- Data Lake for Enterprises
- Tomcy John, Pankaj Misra
- 2021-07-02 22:47:07
Integration readiness
Many of these distributions provide specific capabilities for integrating with other data systems, both inbound and outbound. This can be a crucial factor when selecting a particular Hadoop distribution:
Figure 02: Hadoop distributions and their features
While Apache Hadoop is a great open source framework for big data processing, enterprises may find it critical to have professional services around these capabilities. For that reason, every enterprise should evaluate how well each of these Hadoop distributions fits its organization, since every organization has its own processes, standards, monitoring and alerting expectations, and, above all, its own skill set. For the purpose of this book, however, we prefer to stay neutral and focus on building a Data Lake with a Lambda Architecture; hence we will use open source Apache Hadoop for all examples and samples.