What is a Data Warehouse?

Unlock how data warehouses centralize and streamline business data for smarter analysis and decision-making. Learn more

Ceyhun Enki Aksan
Ceyhun Enki Aksan Entrepreneur, Maker

The data management process is quite comprehensive, encompassing many related systems and layers. However, the scope of involvement in each area may vary depending on the brands and projects you’re working with. Data warehouse holds a significant position and concept within this process.

My experiences in the data warehouse context originated from a short-term project involving Segment DataWarehouse. However, since this was a short-term process, the knowledge I gained remained confined to a specific context. The integration of Google Analytics 4 with BigQuery provided a valuable opportunity in this regard. As I continue to gain more knowledge and experiences related to data warehouses and other related concepts, I will share them accordingly. In this article, I will first present my notes on data warehouses.

The primary objective of a data professional can be summarized as performing queries and deriving insights using data transferred through a created data flow (data pipeline), either structured or unstructured. Additionally, you may refer to my article on ETL (Extract, Transform, Load) for further details.

Data worker (data worker) can be defined as the individual responsible for collecting, storing, managing, and analyzing data, either as a primary activity or as part of a broader set of activities. 1

Of course, this holistic process includes numerous internal processes and layers. Let’s continue using the flow metaphor. Similar to a river, data from one or more sources must be temporarily stored in a designated area—such as a lake or ocean—for processing. In short-term stages, data fragments are used for specific purposes. However, if the data is not consolidated in a fixed location, changes in its value (positive or negative) will be lost.

Important Rules:

  1. Maintain the original formatting (markdown, HTML tags, links, etc.)
  2. Keep technical terms and proper nouns as appropriate
  3. Preserve code blocks and technical syntax exactly
  4. Maintain the same tone and style
  5. Only output the translated text, no explanations or comments

The field of data flow, whether short-term or long-term, has various definitions within the context of technology and/or functionality; for example, database, data table, data lake, etc.

Since I intend this article to serve as the foundation for my next post on BigQuery, I will not expand on the topic extensively and will directly discuss the data lake. I will continue addressing other topics related to data management in separate posts.

Data Lake

A data lake (data warehouse), in simple terms, is a centralized storage solution that aggregates data from various sources (internal and external), emerged as a means to reduce the load on the database by enabling business intelligence (BI) activities. It consists of a collection of sub-datasets. Unlike the database structure used in non-analytical applications, a data lake provides the capability to perform fast, large-scale queries and analyses (select operations) involving millions of rows 2 3. The concept of a data lake was first introduced by computer scientist Bill Inmon 4 5.

Thanks to the data lake, data extraction (data extraction), cleansing, transformation, and loading/refreshing processes can be efficiently carried out. However, treating this concept as a final, end-all solution would be incorrect. Other concepts, such as the data lake, have emerged in response to various challenges and should also be considered within the process. To enable updates to data stored in a data lake, ETL / ELT processes must be implemented.

To illustrate a typical process visually:

[mermaid] flowchart TD 1[Operational system] 2[Flat files] 3[APIs] 4[Zapier] 5[CRM] 6[Website] 7[Marketing] 8[Sales] 9[ERP] 10[Apps] 11[Other…] 12((ETL)) 13[Data Warehouse] 14[OLAP] 15[Reporting] 16[Data Mining] 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 —> 12 12 ==> 13 13 —> 14 & 15 & 16 subgraph s1 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 end subgraph s2 14 & 15 & 16 end [/mermaid]

A data warehouse can be described structurally as a data mart, and also as the aggregate of subsets (models) specific to a particular business function (e.g., marketing, sales) or team.

Data mart (data mart) is a subset of data stored within a data warehouse, representing data specific to a single business unit within an organization.

Data warehouses are generally used for the following two purposes:

Accumulation of strategic data : Data is archived for future reuse. A data warehouse is a repository where data can be queried and analyzed.

Capturing business opportunities : It helps identify new opportunities in the market, contributes to competitive advantage, enables efficient project transitions, reduces costs related to operations, inventory, and product expenses, and links data related to different business functions, thereby supporting decision-making and enabling rapid response to information.

A typical data warehouse generally includes the following components:

  • A relational database system designed for data storage and management6
  • An ETL/ELT solution for preparing data for analysis
  • Capabilities for analysis, reporting, and data mining
  • More complex analytical applications that leverage data science and artificial intelligence (AI) algorithms to generate actionable insights7

In general, data warehouses have a three-tiered architecture composed of the following layers:

Lower layer : Typically a relational database system that collects data from various sources and performs extraction, transformation, and loading (ETL) operations.

Middle layer
: This layer provides fast query operations. Three types of OLAP models—ROLAP, MOLAP, and HOLAP—are available in this layer8. The type of OLAP model used depends on the type of existing database system.

Upper layer
: This layer includes a user interface or reporting tool that enables end users to perform operations on their business data.

Cloud Data Warehouse

Data warehouses typically involve a costly process and are usually hosted on on-premises servers. Although on-premises data warehouses offer many advantages (such as security and speed), they are generally less flexible and cannot easily scale to meet future needs9. Cloud data warehouses are also solutions that emerged to collect and store data from various data sources, but they leverage cloud technologies to perform these operations10.

Compared to on-premises data warehouses, cloud data warehouses offer the following advantages11:

  • Flexibility
  • Easy scalability in response to IT or storage requirements
  • Ease of management and usage
  • Cost savings

Leading Cloud Data Warehouse Solutions

The following are leading data warehouse solutions12 13. Among the options, Google BigQuery offers a convenient way to monitor both personal and enterprise-level data, thanks to its integration with Google Analytics 4. However, as time progresses, I will endeavor to gather and share information regarding the other solutions as well.

  • Amazon Redshift
  • Google BigQuery14
  • IBM Db2 Warehouse
  • Microsoft Azure SQL Data Warehouse
  • Oracle Autonomous Data Warehouse
  • SAP Data Warehouse Cloud
  • Snowflake

Although not listed above (due to differences in focus areas), Segment, a web analytics solution, also presents significant options15.

*[DWH]: Data Warehouse

What Are The Benefits Of Cloud Data Warehousing?](https://medium.com/smb-lite/what-are-the-benefits-of-cloud-data-warehousing-a7322947a479?target=_blank&rel=noopener,noreferrer)

Footnotes

  1. European Data Market SMART 2013/0063 Final Report and Final Results of the European Data Market Study measuring the size and trends of the EU data economy
  2. IBM Cloud Education. (2020). Data Warehouse
  3. Jacqueline Biscobing. (2019). Data Warehouse
  4. Bill Inmon. Wikipedia
  5. Data Warehouse. DataBrick
  6. What is a Relational Database? Oracle Türkiye
  7. What is Data Science? Oracle Türkiye
  8. What is OLAP? Sisense
  9. Best Software to Build a Data Warehouse in the Cloud: Features, Benefits, Costs. ScienceSoft
  10. Cloud Data Warehouse vs Traditional Data Warehouse Concepts. Panoply
  11. [SeattleDataGuy. (2020).
  12. 5 steps for choosing a cloud data warehouse. Stitch
  13. Sean Michael Kerner. (2019). Top 8 Cloud Data Warehouses
  14. What is a data warehouse? Google Cloud
  15. Segment DataWarehouse