Is it time to rebrand (or rethink) the modern data stack? By Oliver Molander | November 2022


Google Trends search data for “modern data stack”

In just a few weeks, it will be exactly ten years since AWS Redshift was first made public via a limited preview. Redshift is considered the OG cloud data warehouse, followed by BigQuery and Snowflake. According to many, this piece of technology paved the way for the “modern data stack”.

Looking at Google Trends data, the modern data stack hype really took off in 2020, when, for example, dbt Labs began raising successive funding rounds, reaching a valuation of $4.2B in February ’22, less than 24 months after their $12.9M Series A in 2020.

Because of the meteoric rise of a powerful data warehouse sitting on top of a cloud platform (Snowflake’s blockbuster IPO in 2020 ensured that anyone working in tech suddenly became aware of this piece of technology), the way the modern data stack is portrayed today by most data, startup, and VC community members (and naturally, vendors) is heavily centered around the cloud data warehouse. In simple terms, it is defined as a set of tools around it.

More often than not, the real-time streaming paradigm is kept completely separate when drawing modern data stacks. But that is another article for another day.

To highlight the key impact (and subsequent focus) of the data warehouse, Matt Turck put it nicely in his well-known Machine Learning, AI, and Data (MAD) Landscape analysis from 2021:

“Today, cloud data warehouses (Snowflake, Amazon Redshift and Google BigQuery) and lakehouses (Databricks) offer the ability to store massive amounts of data in a way that is useful, not entirely cost-prohibitive and does not require an army of very technical people to maintain. In other words, after all these years, it is now finally possible to store and process Big Data.”

So from 2020 onwards, the hype around the modern data stack has been evident, to say the least. And definitions differ depending on agenda and background.

As Benn Stancil puts it in his 2021 blog post “The Modern Data Experience”, writing about the modern data stack:

“For analytics engineers, it’s a transformative change in technology and company organization. For startup founders, it’s a revolution in how companies operate. For VCs, it’s a $100 billion opportunity. For engineers, it’s a dynamic architectural roadmap. For Gartner, it is the foundation of a new data and analytics strategy. For thought leaders, it is a data mesh. For an analyst with an indulgent blog on the Internet, it is a new orientation, a new nomenclature, and a slew of other esoteric similes that only someone living deep within their own navel would care about.”

There is no doubt that the cloud has changed the mindset from storing only useful data to storing all potentially useful data. Streaming technologies like Kafka and Kinesis, cloud data warehouses like Snowflake, Redshift and BigQuery, data lakes like S3 and GCS, and cloud data lakehouses like Databricks have reduced the friction of storing more data from different data sources at high velocity, high cardinality and large volumes.

With this change and the explosion of data coming from a variety of sources, we are faced with a new, inherently human challenge: collaboration between data producers and data consumers, and putting checks and balances in place. Most importantly, it means making sure data is not treated as just a side product, but like any other product or feature.

This shift in mindset has been clearly visible within the data community over the past six months, through an increased focus on data modeling and on data contracts in particular.

Chad Sanderson put it well in his LinkedIn post:

“Data has a massive collaboration problem. Many technical issues are solved in data: storage/compute separation, ELT, orchestration, and so on. What hasn’t been solved is how to get producers and consumers to work together to deliver real business value.”

Often, I hear how we have solved the challenge of moving data from location A to B and can now manage these explosive volumes of data. But on the other hand, I hear over and over again how we have not made huge strides when it comes to making data truly usable and managing data quality at scale.

Enter the data contract, which Chad has been advocating for and has in many ways championed within the data community. In overly simplified terms, a data contract should be considered an agreement between the data producer and the data consumer. It should cover what the data being produced should look like, what SLAs the data should conform to, and the meaning of the data.

Data contracts alone will not solve the data usability and data quality challenges. Rather, they should be considered the foundation upon which to build.
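To make the three ingredients of a data contract named above a little more concrete (what the data looks like, which SLA it conforms to, and what it means), here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation or any vendor's format: the `DataContract` and `FieldSpec` classes, the `orders_placed` event, and its field names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FieldSpec:
    """One field in the contract: its expected type and its meaning."""
    dtype: type
    description: str  # semantics: what the field means to the business


@dataclass(frozen=True)
class DataContract:
    """Hypothetical agreement between a data producer and its consumers."""
    name: str
    fields: dict[str, FieldSpec]  # what the data should look like
    max_staleness: timedelta      # SLA: how fresh the data must be

    def violations(self, record: dict, produced_at: datetime, now: datetime) -> list[str]:
        """Return human-readable contract violations (empty list if none)."""
        problems = []
        for field_name, spec in self.fields.items():
            if field_name not in record:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(record[field_name], spec.dtype):
                problems.append(
                    f"wrong type for {field_name}: expected {spec.dtype.__name__}"
                )
        for field_name in record:
            if field_name not in self.fields:
                problems.append(f"unexpected field: {field_name}")
        if now - produced_at > self.max_staleness:
            problems.append("SLA breached: data is staler than allowed")
        return problems


# Illustrative contract; the event and field names are made up.
orders_contract = DataContract(
    name="orders_placed",
    fields={
        "order_id": FieldSpec(str, "Unique identifier of the order"),
        "amount_eur": FieldSpec(float, "Order total in euros, including VAT"),
    },
    max_staleness=timedelta(hours=1),
)

now = datetime(2022, 11, 1, 12, 0, tzinfo=timezone.utc)
good = {"order_id": "A-1", "amount_eur": 19.9}
bad = {"order_id": 42}  # wrong type, and amount_eur is missing

print(orders_contract.violations(good, produced_at=now - timedelta(minutes=5), now=now))
print(orders_contract.violations(bad, produced_at=now - timedelta(hours=3), now=now))
```

The point of the sketch is that the producer and the consumer can both run the same check, so a breaking change shows up as an explicit violation rather than as a silently broken dashboard. In practice the collaboration around who owns and signs off on the contract matters more than the code itself.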

Back in February, a blog post about the unbundling of Airflow sparked a heated discussion, not only on Hacker News but across the data community.

The unbundling or bundling (and best-of-breed or monolith) debate is a recurring discussion in any technological infrastructure space that reaches a certain level of maturity. I saw the same discussion in the marketing technology space in 2015/16, when the space peaked – inspired by Scott Brinker’s famous MarTech landscape.

I think Marc Lamberti summarized the unbundling vs bundling discussion well in the context of the modern data stack:

It’s not about whether you’re pro-bundling or pro-unbundling.

It is about being able to make accurate, complete and reliable decisions based on data.

The more tools you add to your stack, the more likely you are to lose global context of your data and impact your end-users.

It’s the same as when you assemble an Ikea cabinet. It looks cool, but it takes you hours to assemble a piece of furniture that you could have bought already assembled.

Less is more.

The modern data stack gets a lot of publicity. But there is also a lot of pushback against the term from engineers. What exactly makes up the stack? Data lakes? Data warehouses? dbt? Fivetran? Spark? Kafka? Databricks? Airflow? Observability?

There are endless options and no clear consensus on what a stack really is – although it is represented by the majority as a set of tools and technologies connected to a cloud data warehouse (especially by non-data engineers).

There are more and more voices in the data community, especially among engineers, talking about how the modern data stack hype has gone too far. As Robert Sahlin (Data Engineering Lead at MatHem) noted on Twitter last year:

“Can we please stop naming it the “modern” data stack? There are not so many modern things about that stack. Not sure what to name it, but this setup is best suited to one-person or small data teams that need reporting and dashboards in place.”

I think Chad Sanderson put it well in another of his LinkedIn posts:

“I don’t think the modern data stack is dead. I believe in the power of technology to solve business problems. I think the cloud is a revolutionary technology, and I like most of our modern tools. However, tool adoption at the cost of a consistent strategy leads to an entirely unstable data environment.”

Let’s face it, “modern” has always been a problematic description: it’s like “new.” And just as you end up with names in programming like new_really_new_newest_the_latest_new, “modern” only means something at a specific point in time.

We all understand this naturally.

So, given that we will soon be celebrating the 10th anniversary of the piece of technology that paved the way for the modern data stack hype: is it time to rebrand it, or to rethink it?

I recently spoke to a data engineer (who wishes to remain anonymous) who said that the modern data stack seems to be whatever the market calls the newest thing in data engineering in any given month. It is always the latest “silver bullet”.

Architecture should support the process and the people, rather than reflect the current hype and the next anticipated silver bullet.

[Image: meme of Samuel L. Jackson’s character pointing a gun, captioned “Say ‘modern data stack’ one more time”]

What do you think?
