Cover
Copyright Information
Credits
Foreword
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
Chapter 1. The Big Data Science Ecosystem
Introducing the Big Data ecosystem
Overall architecture
Data technologies
Companion tools
Summary
Chapter 2. Data Acquisition
Data pipelines
Content registry
Quality assurance
Summary
Chapter 3. Input Formats and Schema
A structured life is a good life
GDELT dimensional modeling
Loading your data
Avro
Parquet
Summary
Chapter 4. Exploratory Data Analysis
The problem, principles and planning
Preparation
Exploring GDELT
Summary
Chapter 5. Spark for Geographic Analysis
GDELT and oil
Formulating a plan of action
GeoMesa
Gauging oil prices
Summary
Chapter 6. Scraping Link-Based External Data
Building a web scale news scanner
Named entity recognition
GIS lookup
Names de-duplication
News index dashboard
Summary
Chapter 7. Building Communities
Building a graph of persons
Using the Accumulo database
Community detection algorithm
GDELT dataset
Summary
Chapter 8. Building a Recommendation System
Different approaches
Uninformed data
Building a song analyzer
Building a recommender
Summary
Chapter 9. News Dictionary and Real-Time Tagging System
The mechanical Turk
Designing a Spark Streaming application
Consuming data streams
Processing Twitter data
Fetching HTML content
Using Elasticsearch as a caching layer
Classifying data
Our Twitter mechanical Turk
Summary
Chapter 10. Story De-duplication and Mutation
Detecting near duplicates
Building stories
Story mutation
Summary
Chapter 11. Anomaly Detection on Sentiment Analysis
Following the US elections on Twitter
Analysing sentiment
Using Timely as a time series database
Twitter and the Godwin point
A small step into sarcasm detection
Summary
Chapter 12. TrendCalculus
Studying trends
The TrendCalculus algorithm
Practical applications
Summary
Chapter 13. Secure Data
Data security
Authentication and authorization
Access
Encryption
Data disposal
Kerberos authentication
Security ecosystem
Your secure responsibility
Summary
Chapter 14. Scalable Algorithms
General principles
Spark architecture
Challenges
Plotting your course
Design patterns and techniques
Summary
Last updated: 2021-07-09 18:49:53