- Data Lake for Enterprises
- Tomcy John, Pankaj Misra
- 398 words
- 2021-07-02 22:46:54
Data stores or persistent stores (RDBMS or NoSQL)
This data, whether on premises (enterprise infrastructure) or in the cloud, is stored as structured data in traditional RDBMS or newer-generation NoSQL persistent stores. It enters these stores through business applications. Most of it is scattered in nature, yet enterprises can make sense of each piece of captured data without much trouble. The main issue with a traditional RDBMS arises when the volume of data grows beyond what the store can comfortably handle. At that point, any analysis of the data takes considerable effort and time. As a result, enterprises are forced to segregate the data into production (data that the business application can query and use) and non-production (older data that has been moved out of the production system into separate storage).
Because of this segregation, analysis usually spans only a few years and doesn't give enterprises a long-term view of how the business has performed against certain business parameters. For example, if production holds five years of sales data and non-production storage holds another 15 years, users analyzing sales data see only the last five years. There might be trends that shift every five years, and these can only be detected by analyzing all 20 years of sales data. Most of the time, storing and analyzing such large volumes of data in an RDBMS is not practical. Even where it is possible, it is time consuming and doesn't offer the flexibility an analyst looks for. The analyst is left with a restricted analysis, which can be a big problem if the enterprise relies on this data to tweak its business processes.
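The sales example above can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module and entirely synthetic figures; the table names `sales_production` and `sales_archive`, the view `sales_all`, and the revenue numbers are assumptions made for illustration, not anything prescribed by the text. The point is only that a pattern repeating every five years is invisible when the query can reach just the production table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical production and archive tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_production (year INTEGER, revenue REAL)")
    conn.execute("CREATE TABLE sales_archive (year INTEGER, revenue REAL)")

    # Synthetic data for 2001..2020: revenue steps up once every five years.
    for year in range(2001, 2021):
        revenue = 100.0 + 10.0 * ((year - 2001) // 5)
        # The last five years stay in production; the older 15 are archived.
        table = "sales_production" if year >= 2016 else "sales_archive"
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (year, revenue))

    # A combined view over both stores, standing in for a unified data set.
    conn.execute(
        "CREATE VIEW sales_all AS "
        "SELECT year, revenue FROM sales_archive "
        "UNION ALL "
        "SELECT year, revenue FROM sales_production"
    )
    conn.commit()
    return conn


def revenue_by_five_year_block(conn, source):
    """Average revenue per five-year block for a fixed, trusted table or view name."""
    query = f"""
        SELECT (year - 2001) / 5 AS block, MIN(year), MAX(year), AVG(revenue)
        FROM {source}
        GROUP BY block
        ORDER BY block
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("production only:", revenue_by_five_year_block(conn, "sales_production"))
    print("all 20 years:   ", revenue_by_five_year_block(conn, "sales_all"))
```

Querying `sales_production` alone yields a single five-year block, so there is nothing to compare it against. Querying `sales_all` yields four blocks, and the step change between them becomes visible. In practice the archive would rarely sit in the same database as production, which is exactly the difficulty the paragraph describes.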
Newer-generation NoSQL stores (different databases in this space have different capabilities) offer more flexibility in both analysis and the amount of data that can be stored. They also deliver the kind of performance and other qualities that analysts look for, but they still fall short in certain respects.
Even though the data is stored within individual business applications, there is no single view across the data from the various business applications, and that is what implementing a proper Data lake would bring to the enterprise.
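As a rough illustration of what a "single view" means, the sketch below combines records from two hypothetical business applications, a CRM and a billing system, into one record per customer. The application names, field names, and sample values are all invented for this example, and a real Data lake would do this at far larger scale with dedicated ingestion and storage layers; this only shows the shape of the result.

```python
from collections import defaultdict

# Hypothetical records held separately by two business applications.
crm_records = [
    {"customer_id": 1, "name": "Acme Ltd", "segment": "retail"},
    {"customer_id": 2, "name": "Globex", "segment": "wholesale"},
]
billing_records = [
    {"customer_id": 1, "invoice_total": 1200.0},
    {"customer_id": 1, "invoice_total": 300.0},
    {"customer_id": 3, "invoice_total": 50.0},
]


def build_single_view(crm, billing):
    """Merge per-application records into one entry per customer_id."""
    view = defaultdict(
        lambda: {"name": None, "segment": None, "invoice_total": 0.0}
    )
    # Profile attributes come from the CRM application.
    for rec in crm:
        entry = view[rec["customer_id"]]
        entry["name"] = rec["name"]
        entry["segment"] = rec["segment"]
    # Invoice amounts from billing are summed per customer.
    for rec in billing:
        view[rec["customer_id"]]["invoice_total"] += rec["invoice_total"]
    return dict(view)


if __name__ == "__main__":
    for customer_id, entry in sorted(build_single_view(crm_records, billing_records).items()):
        print(customer_id, entry)
```

Customer 3 appears in billing but not in the CRM, so its profile fields stay `None`. Gaps like this only surface once the data from both applications is viewed together.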