Approaches to building a Data Lake

Different organizations prefer to build a data lake in different ways, depending on where the organization stands in terms of its business, processes, and systems.

A simple data lake may amount to little more than a central data source that all systems use for all of their data needs. Though this approach is simple and may look very attractive, it is often impractical for the following reasons:

  • This approach is feasible only if the organization is building its information systems from scratch
  • This approach does not solve the problems of existing systems
  • Even if an organization decides to build the data lake with this approach, responsibilities remain unclear and concerns are not separated
  • Such systems often try to do everything in a single shot, but eventually struggle as the demand for data transactions, analysis, and processing grows

A better way to build a data lake is to look at the organization and its information systems as a whole, classify data ownership, and define a unified enterprise model. This approach may bring process-specific challenges and take more effort to define, but it provides the required flexibility and control, clear data definitions, and a separation of concerns between the entities of the various systems in an enterprise. Such data lakes can also have independent mechanisms to capture, process, analyze, and serve enterprise data to the consuming applications.
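
The second approach can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration, not a prescribed design: the Entity and DataLake classes, the "crm" and "billing" system names, and the in-memory lists that stand in for real storage. It shows only how an enterprise model with explicit data ownership might sit alongside separate capture, process, analyze, and serve steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    """One entity of the (hypothetical) unified enterprise model."""
    name: str                 # entity name, e.g. "customer"
    owner: str                # the single system that owns this data
    fields: Tuple[str, ...]   # agreed field names for the entity


class DataLake:
    """Toy in-memory lake with independent capture/process/analyze/serve steps."""

    def __init__(self, model: List[Entity]) -> None:
        self.model: Dict[str, Entity] = {e.name: e for e in model}
        self.raw: Dict[str, List[dict]] = {name: [] for name in self.model}
        self.processed: Dict[str, List[dict]] = {name: [] for name in self.model}

    def capture(self, source: str, entity: str, record: dict) -> None:
        # Only the owning system may contribute records for an entity.
        if source != self.model[entity].owner:
            raise PermissionError(f"{source} does not own {entity}")
        self.raw[entity].append(dict(record))

    def process(self, entity: str) -> int:
        # Conform raw records to the enterprise model by keeping agreed fields only.
        fields = self.model[entity].fields
        self.processed[entity] = [
            {f: r.get(f) for f in fields} for r in self.raw[entity]
        ]
        return len(self.processed[entity])

    def analyze(self, entity: str, metric: Callable[[List[dict]], float]) -> float:
        # Analysis runs on processed data, independently of capture.
        return metric(self.processed[entity])

    def serve(self, entity: str) -> List[dict]:
        # Consumers receive copies, so they cannot mutate the lake's data.
        return [dict(r) for r in self.processed[entity]]


model = [
    Entity("customer", owner="crm", fields=("id", "name")),
    Entity("order", owner="billing", fields=("id", "customer_id", "amount")),
]
lake = DataLake(model)
lake.capture("crm", "customer", {"id": 1, "name": "Ada", "internal_note": "x"})
lake.capture("billing", "order", {"id": 10, "customer_id": 1, "amount": 25.0})
lake.capture("billing", "order", {"id": 11, "customer_id": 1, "amount": 15.0})
lake.process("customer")
lake.process("order")
total = lake.analyze("order", lambda rows: sum(r["amount"] for r in rows))
customers = lake.serve("customer")
```

Because each entity has exactly one owner in this sketch, a capture call from any other system is rejected, which is one simple way to make the ownership classification enforceable. The field "internal_note" is dropped during processing because it is not part of the agreed model.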