- Mastering Spark for Data Science
- Andrew Morgan Antoine Amend David George Matthew Hallett
- 2025-04-04 19:38:10
Summary
In this chapter, we walked through the full setup of an Apache NiFi GDELT ingest pipeline, complete with metadata forks and a brief introduction to visualizing the resulting data. This section is particularly important because GDELT is used extensively throughout the book, and NiFi is a highly effective means of sourcing data in a scalable, modular way.
In the next chapter, we will get to grips with what to do with the data once it has landed, by looking at schemas and formats.