Data services with data indexes

Data indexes enable fast searches over the data and are generally used by the data services in the data access layer; optionally, they can also be exposed directly as REST/SOAP endpoints. This indexing layer is typically built on Lucene-based indexing engines, which are very fast for searches that must reflect changes in near real time. The indexes can optionally also store and serve the complete data, which helps in certain time-critical use cases because a query can then be answered from the index alone. Because these indexes handle a large share of the real-time service load, they must be designed for performance and scalability.
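As a rough illustration of why such an index answers searches quickly, the following stdlib-only Java sketch implements a tiny in-memory inverted index, the core structure behind Lucene-based engines. It is not the Lucene API: the class name `InvertedIndex`, its methods and the naive tokenizer are hypothetical, and a real engine adds analyzers, relevance scoring, segment merging and persistence. The `storeFullDocs` flag mirrors the option of serving complete data straight from the index.

```java
import java.util.*;

// Hypothetical, minimal inverted index (NOT the Lucene API). Each term maps to
// the set of document IDs that contain it; full documents are optionally stored
// so a search can return complete data without a second lookup.
class InvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, Set<String>> docTerms = new HashMap<>();
    private final Map<String, String> storedDocs = new HashMap<>();
    private final boolean storeFullDocs;

    InvertedIndex(boolean storeFullDocs) {
        this.storeFullDocs = storeFullDocs;
    }

    // Naive tokenizer: lower-cases and splits on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Indexes or re-indexes a document. Old postings are removed first, so the
    // next search already sees the change (a toy version of near-real-time).
    synchronized void index(String id, String text) {
        delete(id);
        Set<String> terms = new HashSet<>(tokenize(text));
        docTerms.put(id, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
        if (storeFullDocs) storedDocs.put(id, text);
    }

    // Removes a document and any postings that become empty.
    synchronized void delete(String id) {
        Set<String> old = docTerms.remove(id);
        if (old == null) return;
        for (String term : old) {
            Set<String> ids = postings.get(term);
            ids.remove(id);
            if (ids.isEmpty()) postings.remove(term);
        }
        storedDocs.remove(id);
    }

    // Returns the sorted IDs of documents containing every query term (AND).
    synchronized List<String> search(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) return List.of();
        Set<String> result = null;
        for (String term : terms) {
            Set<String> ids = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return new ArrayList<>(result);
    }

    // Returns the full stored document, if full-document storage is enabled.
    synchronized Optional<String> stored(String id) {
        return Optional.ofNullable(storedDocs.get(id));
    }
}
```

For example, after indexing `"Alice Smith, London"` as `c1` and `"Bob Smith, Paris"` as `c2`, a search for `smith` returns both IDs, while `smith london` returns only `c1`; re-indexing `c1` with a new city makes `london` return nothing on the very next call. Lucene reaches a comparable effect at scale by refreshing its readers after writes, which this sketch only loosely approximates.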

The most common framework for building data services is Spring Boot, closely followed by Dropwizard. Both can serve JAX-RS 2.0 resources (Dropwizard through its bundled Jersey, Spring Boot through its Jersey starter) and integrate well with service definition tools such as Swagger, which together provide a well-rounded capability for building and publishing REST services.

To maintain the data consistency of the Data Lake, these services should all be read-only. Their primary role is to deliver data, so ideally they should not expose any endpoints that change it; the data should be altered only by the data processing cycles of the Lambda Architecture, as discussed earlier.
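One way to make the read-only rule structural rather than a convention is sketched below in plain Java, assuming a batch or speed-layer job that publishes complete result sets. The names `ReadOnlyLookup`, `SnapshotStore`, `publish` and `ReadOnlyEndpoint` are hypothetical, and the endpoint is deliberately framework-agnostic; in a JAX-RS or Spring service the equivalent is to expose only GET resource methods.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

// Read side: the only operations the service layer is allowed to see.
interface ReadOnlyLookup {
    Optional<String> get(String key);
    int size();
}

// Hypothetical store whose contents are replaced wholesale by a processing
// cycle; nothing on the service path mutates individual entries.
final class SnapshotStore implements ReadOnlyLookup {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Map.of());

    // Intended to be called only by the batch/speed-layer job, never by the REST layer.
    void publish(Map<String, String> batchOutput) {
        current.set(Map.copyOf(batchOutput)); // immutable copy, swapped atomically
    }

    public Optional<String> get(String key) {
        return Optional.ofNullable(current.get().get(key));
    }

    public int size() {
        return current.get().size();
    }
}

// HTTP-facing guard: only safe methods are accepted; anything that would
// modify data is rejected with 405 Method Not Allowed.
final class ReadOnlyEndpoint {
    private static final Set<String> ALLOWED = Set.of("GET", "HEAD");
    private final ReadOnlyLookup lookup;

    ReadOnlyEndpoint(ReadOnlyLookup lookup) {
        this.lookup = lookup;
    }

    // Returns the HTTP status code for a request (simplified: no response body).
    int handle(String method, String key) {
        if (!ALLOWED.contains(method.toUpperCase(Locale.ROOT))) return 405;
        return lookup.get(key).isPresent() ? 200 : 404;
    }
}
```

The endpoint depends only on the `ReadOnlyLookup` interface, so it has no way to reach `publish`; and because each processing cycle replaces the previous snapshot in a single atomic swap, readers never observe a half-applied update.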