Book title: Data Lake for Enterprises
Authors: Tomcy John, Pankaj Misra
Chapter word count: 338
Last updated: 2025-04-04 19:11:42

Sqoop 2 architecture

Sqoop 2 works in much the same way as Sqoop 1, but it is easier to use because it takes the difficult parts of Sqoop 1 away from the user. Alongside the command-line client (the only option in Sqoop 1), it adds a web browser based tool. Sqoop needs to be installed only once, on a single machine, and users can then access it from multiple places. It also exposes a set of RESTful APIs (more details can be found in the Apache Sqoop documentation at https://sqoop.apache.org/docs/1.99.5/RESTAPI.html), which helps with many of the integrations that Sqoop must support to be used effectively in a data lake.
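To make the RESTful access concrete, here is a minimal Python sketch of a client that reads JSON from a Sqoop 2 server. It rests on assumptions that are not stated in this chapter: that the server is reachable at http://localhost:12000/sqoop, and that resources such as version are served under that base path. The helper names build_url and fetch_json are hypothetical. Check the REST API documentation linked above for the resources your Sqoop version actually exposes.

```python
import json
from urllib.request import urlopen

# Assumption: base URL of a Sqoop 2 server. Host, port, and path are
# placeholders; adjust them to match your own installation.
DEFAULT_BASE = "http://localhost:12000/sqoop"


def build_url(base, resource):
    """Join the server base URL and a resource path with exactly one slash."""
    return base.rstrip("/") + "/" + resource.lstrip("/")


def fetch_json(url, opener=urlopen, timeout=10):
    """Issue a GET request and decode the JSON body of the response.

    The opener parameter exists only so the function can be exercised
    without a running server; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, a call such as fetch_json(build_url(DEFAULT_BASE, "version")) would return whatever JSON the server sends for that resource. Because only plain HTTP and JSON are involved, any tool in the data lake that can issue HTTP requests could integrate with Sqoop in the same way.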

The following figure (Figure 05) shows the detailed architecture of Sqoop 2. To make the comparison with Sqoop 1 easier, the parts added by Sqoop 2 are shaded. The figure is based on the Sqoop documentation (https://sqoop.apache.org/docs/1.99.5/) and has been adapted to fit the context of this book.

Figure 05: Conceptual architecture of Sqoop 2

As shown in Figure 05, the shaded sections are new in Sqoop 2. Sqoop 2 introduces a server component and a new browser-based client. This interface shields the user from clunky commands and hides the underlying complexity. Because of the server component, users can now interact with Sqoop from other machines as well, which was not possible with Sqoop 1. A number of components inside the Sqoop server enable these new features, and they are shown in the figure above. One of them is a new block named Metadata, which stores data about data in a repository. Because commonly repeated details are saved there and can be reused, the user does not have to supply them every time.
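The idea behind the Metadata block can be illustrated with a small, purely conceptual Python sketch. It is not Sqoop's implementation. The class MetadataRepository and its methods are hypothetical, and the notions of a saved "link" (connection settings) and a saved "job" (a transfer between two links) are assumptions used here only to show how storing details once on the server removes repetition for the user.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """Toy stand-in for a server-side metadata store (illustrative only)."""

    links: dict = field(default_factory=dict)  # name -> connection settings
    jobs: dict = field(default_factory=dict)   # name -> pair of link names

    def save_link(self, name, settings):
        # Store connection details once so later jobs can refer to them by name.
        self.links[name] = dict(settings)

    def save_job(self, name, from_link, to_link):
        # A job only references saved links; it never repeats their settings.
        for link in (from_link, to_link):
            if link not in self.links:
                raise KeyError(f"unknown link: {link}")
        self.jobs[name] = {"from": from_link, "to": to_link}

    def resolve_job(self, name):
        # Expand the stored references back into full connection settings.
        job = self.jobs[name]
        return {"from": self.links[job["from"]], "to": self.links[job["to"]]}
```

In this sketch, a user would save a database link and an HDFS link once, then define any number of jobs that name those links. Every client that talks to the same server sees the same saved definitions, which is the benefit the shared repository is meant to provide.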