Why you should not develop a Data Warehouse yourself

  • 24/10/2022
Quite often we hear that an organization chooses to build a Data Warehouse itself. In this blogpost we explain why that isn't the right strategy.

For the past 10 years we have been following the developments in these companies and very regularly, after a few months or years, we are asked to help and think about Data Warehouse (Automation). In this article we share the lessons we have learned over the past few years.

To bring this subject right to the front, I would like to ask you to think about whether you have ever considered developing your own CRM system or accounting program. Maybe a decade ago, but by now everyone is convinced of the benefits of standardization and using tools that are available. Developing these tools yourself is time-consuming, cost-intensive, you need scarce developers and you will probably encounter a lot of complexity during the process. Now back to the subject, why would an organization want to develop its own Data Warehouse?

What is a Data Warehouse?

A brief introduction if you are less familiar with Data Warehousing. A data warehouse is a data management system designed to enable and support business intelligence (BI) activities, particularly analytics. Data warehouses are systems designed to perform searches and analysis and often contain large amounts of historical data. The data in a data warehouse is usually extracted from a variety of sources, such as application logs and transactional applications.

Data Warehouse versus Data Lake

Perhaps when you think of a "data management system" you think of a Data Lake. Hence, we will briefly explain the differences between a Data Lake and a Data Warehouse. Both terms are widely used for storing big data, but they are not interchangeable terms.

A Data Lake is a (huge) collection of raw data whose purpose has not yet been defined. A Data Warehouse is a repository for structured, filtered data that has already been processed for a specific purpose. An example to make the difference concrete is that a Data Lake is like a stock of products in a building. Data Warehouse is the same building with the same products but sorted, tagged and stored.

This is how a Data Warehouse takes your organization forward

Easier to understand and retrieve data - simplified single model. No more duplicate tables, confusing column names or mysterious values. Faster for the data team to use. Less time needed to clean up and transform data to perform analysis.

Friendlier to work with for the 'business users'. Complex connections are reduced and the right column is clear. Reliable, consistent source of answers. Everyone generates insights from the same data; no more varying answers to the same question.

Maintainable with less time and effort. After you adopt naming conventions and a style guide, you can maintain them as you add data. Separated from the transactional data schema. Queries do not affect app performance and are not affected by rapid data changes.

Why not develop a Data Warehouse yourself

I hope that after reading the above information you are convinced of the added value of a Data Warehouse. The organizations that indicate that they want to set up their own Data Warehouse often do so for the following reasons: 1) To be in control and/or 2) Building it themselves seems cheaper.

And there is certainly value in both arguments. It is good to be in control, especially when it comes to data. And if you have someone sitting around, it does seem cheaper to take it on yourself. But it is still good to run through the following scenarios. These are the results of what we see when companies do it themselves.

You don't know what you don't know.

If you've never done a BI implementation before, it's hard to imagine what you could potentially run into. If you decide to do everything yourself it means that you are also responsible for hosting, integrations, performance, version and release management, etc. Do you understand all these things or do you put everything in the hands of one person?

Data without insights is nothing

A data warehouse really provides no value to "the business" unless the data is accessible and the decision makers can get the dashboards, reports, KPIs, alerts and insights they need to do their jobs. But those who built the data warehouse are usually not UI experts or business experts.

That means IT has to look for visualization tools to plug into their data warehouse. This again means additional costs because without visualization, you basically have nothing.

Systems will change and software upgrades will be needed

Critical business applications such as ERP and CRM may change their APIs, system upgrades follow each other faster and faster, and you may discover that the person who built your data warehouse has been promoted or left the company for another job. Corrections may be harder than expected to achieve. Or maybe the database that stores your data needs a new version to make it security compliant. Or maybe the volume of your data has grown so large that backups take too long or even fail. We see customers who have built their own data warehouse still looking for ETL tools, a DBA and a few other expensive resources and people to keep the data engine alive.

Subscribe to our newsletter