2023 Highlights in Data

Sergi Gomez
Co-founder of Saivo

I was preparing a year-end review session with my team at Saivo and went through all my notes from my conversations with startup founders and investors throughout 2023. Here are the trends I distilled from all those interactions.

1. Primary needs remain unmet

Despite the excitement around Generative AI, many startups and growing companies are still grappling with the same fundamental needs in analytics:

  • (A) Full business visibility: Even the most fundamental business metrics, the ones that give us clarity on customer engagement, sales conversion rates, unit economics, etc., can be very challenging to obtain because we have to merge multiple data sources. I’ve often heard startup CEOs lament, “I simply don’t know how many new active users we got this month”.
  • (B) Trust in the data: Metrics that we get off the shelf from apps often lack credibility. Many have expressed concerns like, “I don’t trust the MRR shown in my payments app”. Since they are black boxes, we don’t know what assumptions and business rules they applied to get to that number. Instead, companies should get the raw data and create their own data models that accurately reflect their business logic and semantics.
  • (C) Self-service in data consumption: “We empower your business teams to autonomously answer questions and get insights from data”. Despite these promises from BI platforms, I still see business teams depending heavily on tech and data teams for answering ad-hoc questions, exploring the data, creating reports, etc. This creates lots of inefficiencies that prevent organizations from unleashing the full potential of their data. More on that in point 2 below.
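
To make point (B) concrete, here is a minimal sketch of what owning the business logic behind a metric like MRR can look like when you start from raw subscription data. The schema (`Subscription` and its fields) and the four rules are assumptions for illustration; your own rules may well differ, and that is precisely the point.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    """One row of raw subscription data (hypothetical schema)."""
    customer_id: str
    amount: float                # amount charged per billing period
    billing_months: int          # 1 = monthly plan, 12 = annual plan
    start: date
    end: Optional[date] = None   # None means the subscription is still active
    is_trial: bool = False


def mrr(subscriptions, as_of):
    """Monthly recurring revenue on `as_of`, with every rule spelled out."""
    total = 0.0
    for sub in subscriptions:
        if sub.is_trial:
            continue  # Rule 1: trials do not count as recurring revenue
        if sub.start > as_of:
            continue  # Rule 2: the subscription has not started yet
        if sub.end is not None and sub.end <= as_of:
            continue  # Rule 3: the subscription has already ended
        # Rule 4: spread the charge evenly over the billing period
        total += sub.amount / sub.billing_months
    return round(total, 2)


# Made-up sample data, only to exercise the rules above.
SUBSCRIPTIONS = [
    Subscription("a", 100.0, 1, date(2023, 1, 1)),
    Subscription("b", 1200.0, 12, date(2023, 6, 1)),
    Subscription("c", 50.0, 1, date(2023, 1, 1), end=date(2023, 11, 30)),
    Subscription("d", 80.0, 1, date(2023, 12, 1), is_trial=True),
]

print(mrr(SUBSCRIPTIONS, date(2023, 12, 31)))  # 200.0
```

Each rule is one readable line, so when someone asks why the number differs from the one in the payments app, there is something concrete to compare against.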

2. Self-service analytics: We are not there yet

LLMs are pushing self-service to new heights. We can now ask a question in natural language, and an AI-powered app creates the query for us, runs it against our database, returns the resulting values, and can even create nice visualizations. I am seeing many players in the BI space building a conversational UI/UX, from newer ones such as Zenlytic to bigger ones such as ThoughtSpot.

However, if we only use LLMs to write queries against a badly designed data schema, or if we don’t provide the right context, these models will generate inaccurate and inconsistent results. AI needs to be paired with the following pieces:

  • (A) “Semantic LLMs”: This year has confirmed the necessity of the Semantic Layer as an intermediate layer between the database and the LLMs. Put simply, the Semantic Layer gives the AI model the context and the guardrails required to ensure accurate and consistent results — More on the Semantic Layer in point 3 below.
  • (B) The right data modeling: The hard problem in data is still to create foundational datasets of your business that integrate various data sources and are flexible enough that data consumers (or LLMs) can calculate any metric and run any type of analysis on top. You can have a very powerful AI model wrapped in a user-friendly UI to query the data, but if your data schema is wrong, you will not get accurate numbers (“Garbage In, Garbage Out”).
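
The guardrail idea in point (A) can be sketched in a few lines. In this toy example the LLM never writes SQL: it only picks a metric name and dimensions, and a small compiler turns that request into a query from governed definitions. Everything here (`METRICS`, `compile_query`, the table and column names) is hypothetical and is not how dbt or Cube actually implement their semantic layers.

```python
# Toy semantic layer: governed metric definitions (all names are hypothetical).
METRICS = {
    "new_active_users": {
        "table": "fct_user_activity",
        "expression": "COUNT(DISTINCT user_id)",
        "filters": ["is_first_active_month"],
        "dimensions": {"activity_month", "country"},
    },
    "mrr": {
        "table": "fct_subscriptions",
        "expression": "SUM(monthly_amount)",
        "filters": ["NOT is_trial"],
        "dimensions": {"billing_month", "plan"},
    },
}


def compile_query(metric, group_by=()):
    """Turn a (metric, dimensions) request into SQL, rejecting unknown names."""
    if metric not in METRICS:
        raise ValueError(f"Unknown metric: {metric}")
    definition = METRICS[metric]

    unknown = [dim for dim in group_by if dim not in definition["dimensions"]]
    if unknown:
        raise ValueError(f"Dimensions not allowed for {metric}: {unknown}")

    columns = list(group_by) + [definition["expression"] + " AS " + metric]
    sql = "SELECT " + ", ".join(columns) + " FROM " + definition["table"]
    if definition["filters"]:
        sql += " WHERE " + " AND ".join(definition["filters"])
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


# The LLM's job shrinks to choosing these two arguments.
print(compile_query("new_active_users", ["activity_month"]))
```

Because the expression and filters live in one place, two people asking the same question in different words get the same number, and a request for a metric that does not exist fails loudly instead of producing a plausible-looking guess.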

3. Semantic Layer: Slower adoption than expected

If you want an intro to the Semantic Layer, I strongly recommend this series of articles by David Jayatillake.

In 2023, at Saivo we extensively tested Semantic Layer implementations like Cube’s and dbt’s. Our preference is increasingly towards dbt-MetricFlow.

As we implemented the dbt Semantic Layer in many projects, we saw the huge potential (in autonomy, flexibility, and scalability in analytics), and we were very hopeful about a quick adoption in the market.

Yet, when I talked to data leaders, even if they agreed that a centralized source of truth for metrics is very powerful, the concept of the Semantic Layer was unknown to most of them. We are at the Early Adopters phase of the S-curve. For the Semantic Layer to gain broader adoption, two things need to happen:

  • (A) Compatibility with first-class BIs: The most common feedback is, “I like the idea of the semantic layer, but I will not invest in it until it works seamlessly with our existing BI” — More on BIs in point 4 below.
  • (B) Make the Semantic Layer easier to work with: There is too much heavy lifting (YML files, branching, CI/CD…). We need to have greater agility when working on the semantic layer (editing metrics, adding dimensions, etc.) without compromising on control.

4. BI platforms: Established players remain dominant

In my discussions, about 6 out of 10 companies using a BI tool are on Power BI. About 3 out of 10 use other major BIs (e.g., Tableau or Looker), and the remaining 1 out of 10 uses lower-end or modern platforms such as Data Studio or Metabase.

Even if new entrants in the market offer flashy features, customers have invested so much in creating assets on the BI (charts, dashboards, etc.) and in training their colleagues to adopt the tool that the switching cost is incredibly high. The only driver to switch is cutting costs. More on that in point 5 below.

5. Focus on profitability

2023 has been a difficult year for fundraising, especially in the growth stage. This has led to tighter budgets, especially in data initiatives, which are a second-order need for the business.

When I speak with decision-makers, conversations often echo sentiments like, “This is not a priority for us” or “We need to bring the costs of our data infrastructure down”. They don’t need vitamins; they need painkillers, and understandably so.

The macroeconomic indicators projected for 2024, such as interest rates staying almost flat, don’t suggest significant optimism for a substantial upturn in the venture market. Consequently, I expect continued focus on unit economics and profitability, which means a strong focus on ROI for everything related to data. Will this scenario lead to consolidation in the market, as many people say? Maybe, but I hold a nuanced view on this — More on that in point 6. In either case, vendors and service providers need to be ready to demonstrate tangible and high value to businesses.

Interestingly though, the current climate may have a counter-effect. Startups that want to raise have to show strong metrics, maintain meticulous reporting, and illustrate their unit economics with great clarity and accuracy. When I chat with partners at VC firms, a recurring theme starts to emerge: “We need our portfolio companies to be super diligent and effective in their reporting”. In this situation, data becomes a painkiller…

6. Consolidation of the Data Stack

As Jim Barksdale famously said, “There are only two ways to make money in business: One is to bundle; the other is unbundle.”

The question in the data industry is: where do greater opportunities lie? Is it in building “all in one” data platforms (as Microsoft is trying to do with Fabric) or in fostering “best of breed” yet narrowly focused SaaS solutions? Many data leaders have offered insightful perspectives on this debate. I recommend this great article by David Jayatillake.

From my conversations throughout 2023 with startups and small organizations, I see a preference for more integrated, bundled solutions. This trend resonates with the economic realities discussed in point 5 — bear markets and budget constraints.

While I am certainly not qualified to make predictions on the market dynamics, my opinion is that the modern data stack needs to be simplified, especially for small organizations. If you ask me, I would advocate for a leaner data stack like the following:

  1. Airbyte or Fivetran for data integration.
  2. Snowflake or BigQuery as the Data Warehouse.
  3. dbt Cloud for development, pipeline orchestration, CI/CD, documentation, and observability.
  4. BI tool of your choice.

You don’t need more than that.

7. Increased focus on data governance

Earlier in 2023, GitHub’s CEO said that “AI is lowering the cost of building great software”. I foresee this impact in the data space as well. The cost of creating data assets (metrics, charts, dashboards, etc.) will decrease rapidly with Gen AI, and there will be an increasing number of data artifacts within organizations. This is undoubtedly beneficial, but it’s not without its challenges.

Imagine the following scenario: more people creating new metrics (some of which already exist) and creating dashboards (that are rarely consumed). This situation may lead to a certain degree of chaos in the organization. So, what’s the solution? Like it or not, it seems inevitable that governance will become necessary. This idea might seem contrary to the ethos of agile, ‘move fast and break things’ startups. However, in my conversations in 2023, I heard a growing call for it: “We need to bring more control and better collaboration in our data analytics”.
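
As a small illustration of the "metrics that already existed" problem, here is a sketch of one lightweight governance check: flagging metrics in a catalog that share the same table and the same expression under different names. The catalog format and the function names are assumptions for illustration, and the comparison is deliberately naive (case and whitespace only), so it would miss definitions that are logically equivalent but written differently.

```python
from collections import defaultdict

# Hypothetical metric catalog; in practice this might be loaded from YML files.
METRIC_CATALOG = {
    "active_users": {
        "table": "fct_user_activity",
        "expression": "COUNT(DISTINCT user_id)",
    },
    "monthly_actives": {
        "table": "fct_user_activity",
        "expression": "count(distinct  user_id)",
    },
    "mrr": {
        "table": "fct_subscriptions",
        "expression": "SUM(monthly_amount)",
    },
}


def normalize(expression):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(expression.lower().split())


def find_duplicate_metrics(catalog):
    """Return groups of metric names that share a table and an expression."""
    groups = defaultdict(list)
    for name, definition in catalog.items():
        key = (definition["table"], normalize(definition["expression"]))
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(find_duplicate_metrics(METRIC_CATALOG))  # [['active_users', 'monthly_actives']]
```

A check like this could run in CI whenever someone adds a metric, which keeps a bit of control without slowing anyone down.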

Yet, we need to approach data governance more efficiently, without burdening startups with costly subscriptions for specialized tools. The solution, I believe, lies in harnessing the very technology that’s driving this problem — AI itself.

Looking ahead to 2024, it’s clear we’re all in for a challenging year in the data industry, especially for startups. We’ll need to be flexible, think creatively, and keep pushing forward. I’m really interested to see how we all, from small startups to established companies, tackle these challenges and keep growing in the data world.