Big Data to Modern Data Stack: Evolution and revolution

Dhairya Jain
Published in Analytics Vidhya
5 min read · Jun 19, 2021

What is a Modern Data Stack (MDS)?

At a high level, the MDS is the evolution of the old, brittle tools and broken processes of legacy systems (earlier called Big Data), which require constant maintenance and QA, into a modern data system that automates, simplifies, and speeds up companies' ability to get at their data and make sound business decisions.

The Modern Data Stack is a collection of components covering ingestion, transformation, data storage, and BI platforms. Each component is a complete product in itself and solves a specific problem in data processing, so the total scope of the MDS is quite wide. For instance, ingestion tools extract information from various sources, and transformation tools convert that data into a proper format. The collected data is then stored in data warehouses and data lakes and displayed in BI tools that present analytics results to internal and external users.
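To make the flow between components concrete, here is a minimal sketch of the ingestion, transformation, storage, and BI steps. It is only an illustration: an in-memory SQLite database stands in for a cloud data warehouse, and the sample records, table names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from a source system.
RAW_ORDERS = [
    {"order_id": 1, "customer": "acme", "amount": "120.50"},
    {"order_id": 2, "customer": "globex", "amount": "80.00"},
    {"order_id": 3, "customer": "acme", "amount": "39.50"},
]


def ingest(conn, records):
    """Ingestion: land the raw records in the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount)", records
    )


def transform(conn):
    """Transformation: cast types and aggregate into an analysis-ready table."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )


def report(conn):
    """BI layer: query the modelled table, as a dashboard would."""
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


def run_pipeline():
    # SQLite in memory is an assumption; a real stack would use a cloud warehouse.
    conn = sqlite3.connect(":memory:")
    ingest(conn, RAW_ORDERS)
    transform(conn)
    rows = report(conn)
    conn.close()
    return rows
```

Calling `run_pipeline()` returns one revenue row per customer. In a real stack, each function would be a separate product rather than a few lines of Python.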

The modern data stack brings together components that surface insights into performance and consolidate data into a usable form. The genesis of the modern data stack is associated with the launch of Amazon Redshift, which made massively parallel processing (MPP) available at a minuscule cost. Since then, adoption of data tools by much smaller teams has skyrocketed, leading to a series of new products that fostered the growth of the overall ecosystem.

The overall potential for modern data stack companies is much larger now that data technologies are being adopted by a broader set of companies, ranging from medium-sized firms to the very largest multinationals. As MDS technology gains traction, smaller companies are also recognizing its importance for their organizations. However, large enterprises are moving off legacy Big Data infrastructure slowly, and it will take some time for that shift to mature and benefit MDS companies. COVID-19 has had a positive impact here, catalysing the adoption of data infrastructure at a rapid pace, even for large enterprises.

The current market size estimate for MDS companies is 65.7 billion USD as of 2020, growing at 19.4% annually.
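As a rough illustration of what that growth rate implies, the sketch below compounds the 2020 figure forward. It assumes the 19.4% rate stays constant every year, which is an assumption for this example and not a forecast from the estimate itself; the function and constant names are made up for illustration.

```python
BASE_YEAR = 2020
BASE_SIZE_BN = 65.7    # market size in billion USD, as of 2020
ANNUAL_GROWTH = 0.194  # 19.4% per year, assumed constant for this sketch


def projected_size(year, base_size=BASE_SIZE_BN, growth=ANNUAL_GROWTH,
                   base_year=BASE_YEAR):
    """Compound the base market size forward to the given year."""
    if year < base_year:
        raise ValueError("year must not be earlier than the base year")
    return base_size * (1 + growth) ** (year - base_year)
```

For example, `projected_size(2021)` compounds the figure by one year, and `projected_size(2025)` by five.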

Subcategories for MDS

Several tools collectively form the Modern Data Stack. Each tool is a branch for working on data, and together they form the complete system. The branches of the MDS are:

  1. Data Warehouse
  2. Ingestion
  3. Transformation
  4. Business Analytics
  5. Data Lakes
  6. Governance
  7. Data Quality

MDS Map

The first five tools listed above are the minimum required to work with an MDS. The other two are relatively new tools that help professionals track lineage and assure the quality of the data being fed into the system.
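To give a feel for what assuring data quality before data is fed into the system can mean, here is a minimal sketch of two common checks: required fields must be present, and a key must be unique. The function name and the row format are hypothetical and do not reflect how any particular data quality product works.

```python
def check_quality(rows, required, unique_key):
    """Return a list of human-readable issues found in the given rows."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Completeness check: every required field must have a value.
        for field in required:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
        # Uniqueness check: the key must not repeat across rows.
        key = row.get(unique_key)
        if key is not None:
            if key in seen:
                issues.append(f"row {i}: duplicate {unique_key}={key}")
            seen.add(key)
    return issues
```

A pipeline could run a check like this after ingestion and hold back a batch when the returned list is not empty.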

The top five sub-categories in the above list make up 90% of the market. The rest are relatively new categories introduced in recent years and account for only 10% of the market share, with a market size in the same proportion.
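Applied to the 2020 estimate of 65.7 billion USD, the 90/10 split works out as in the short sketch below. The function and dictionary keys are illustrative names only.

```python
TOTAL_MARKET_BN = 65.7  # 2020 estimate, billion USD
CORE_SHARE = 0.90       # share held by the top five sub-categories


def split_market(total_bn=TOTAL_MARKET_BN, core_share=CORE_SHARE):
    """Split the total market between core and newer sub-categories."""
    core = total_bn * core_share
    return {"core_categories": core, "newer_categories": total_bn - core}
```

With the default values this gives roughly 59.1 billion USD for the top five sub-categories and about 6.6 billion USD for the newer ones.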

To simplify this thesis and condense it into four pages, I've concentrated on the top four tools listed above when discussing business models.

Business Models of different MDS Tools

MDS’s Current State, Challenges and Opportunities

MDS Timeline

The MDS has come a long way from its initial days and is currently in its second phase of development. This second phase of the MDS consists of:

  1. Cloud Services
  2. Data Governance and Quality tools
  3. Simplified User Interface

Emerging Themes

Still, a few friction points need to be solved:

  1. Lack of feedback to operational tools
  2. Lack of a horizontal interface for unified data interaction
  3. Data streaming incapabilities
  4. Immature governance

These limitations of the current state of the MDS form the groundwork for opportunities that are yet to be tapped in this space. On top of that, some young startups already address issues such as data quality, the move from ETL to ELT, and tools that converge data warehouses and data lakes into one application, but this arena still holds immense potential for others to explore.

Focus Areas

The following startups are killing it in the MDS space worldwide:

  1. dbt (Transformation)
  2. SeekWell (BI/Analytics)
  3. Tray.io (Data Ingestion/Automation)
  4. Hevo Data [India] (Ingestion and Transformation)
  5. Materialize (Ingestion/Data Streaming)
  6. Census (BI/Analytics)
  7. Fivetran (Ingestion/ETL)
  8. Infoworks.io (Data Warehouse)
  9. Firebolt (Data Warehouse)
  10. Alation (Data Governance)
  11. Atlan [India] (Data Governance, Ingestion, Data Quality)

Dhairya Jain
Analytics Vidhya

Dhairya is a startup worm and enjoys working on upcoming tech.