The Modern Data Estate: Data Lake vs. Data Warehouse

June 10, 2020

A modern data estate should provide multiple methods of ingesting and storing the various data that businesses generate. Data comes at us fast and in many forms. These different forms can include structured, semi-structured, and unstructured data. Let’s look further at these different types of data:

  1. Structured – traditional databases such as the transactional database for your ERP or CRM system with formal column and table definitions
  2. Semi-Structured – files such as XML or JSON that are self-describing with tags for elements and hierarchies
  3. Unstructured – images, video, audio, and other binary data

Traditional data warehouse designs have been around for many decades while the concept, or at least the term, data lake is a somewhat newer construct. Each of these has a place in your organization’s data estate.


A data warehouse is a formal design, frequently based on design guidelines established by either Ralph Kimball or Bill Inmon, that implements for formal ETL (Extract-Transform-Load) process to consume raw, structured data sets and load them into a model designed for reporting. Data warehouses are built on relational databases like Microsoft SQL Server. SQL Server is designed to store structured data into tables with traditional rows and columns but does have capability to store semi-structured data like XML and JSON.

A data lake flips the concept of ETL on its head and implements an ELT (Extract-Load-Transform) process. Ingesting data into the data lake is essentially just throwing everything you think may be valuable at some point into a large storage area regardless of data type or structure. Data lakes can store structured, semi-structured, and unstructured data. Data lakes delivered in Microsoft Azure are built on storage accounts with Data Lake Storage Gen2 enabled when creating the storage account.

The thought behind a data lake is you want to consume all the data and will sort through it at a later point while the data warehouse requires identifying the value upfront with significant investment developing the ingestion. Due to the heavy, upfront investment typically required to develop a data warehouse if it is later determined that you need data that wasn’t brought in initially there is a risk the source data is no longer available and thus, potentially gone forever.

What now?

Data lakes are a cost-effective way to store large amounts of data from many sources. Allowing data of any structure reduces cost because data is more flexible and scalable as the data does not need to fit a specific pattern. However, structured data is easier to analyze because it is cleaner and has a uniform schema to query from. By restricting data to a schema, data warehouses are very efficient for analyzing historical data for specific data decisions. Both a proper data warehouse and a data lake are critical to the future success of your organization and belong in your modern data estate.

Microsoft Dynamics 365 Pre-Built Data Warehouse, DataCONNECT

Building a data warehouse can be very expensive and time consuming to properly review your source systems, design a data model, and create the necessary ETL to process it. MCA Connect developed our DataCONNECT Data Warehouse solution for Microsoft Dynamics AX, Dynamics 365 Finance and Customer Engagement. This solution greatly accelerates the timeline for delivery of a comprehensive data warehouse solution while reducing implementation costs.

DataCONNECT can fuel organizations with fast, accurate information, giving them the ability to predict, adapt and shape operations with precision. You will be able to quickly pull validated data into forecasting models, so you can begin your planning cycles for areas of your business. If you’d like to learn more about how the DataCONNECT Data Warehouse or a data lake can help your company store big data, contact us. One of our experts will be glad to guide you in the right direction.

Author: Kirk Donahoe, Business Analytics Solution Architect

Other articles you might be interested in:
{% set pop_posts = blog_popular_posts(‘default’, 3, ‘iot-analytics’) %} {% for pop_post in pop_posts %}

{{ }}

{% endfor %}

MCA Connect