- Data Lake for Enterprises
- Tomcy John Pankaj Misra
- 2021-07-02 22:47:10
Checkpointing
Spark Streaming supports both metadata checkpointing and data checkpointing to provide the fault tolerance that critical 24/7 applications require. Metadata checkpointing saves the configuration, the DStream operations, and the incomplete batches, so that the driver can recover the overall process after a failure. Data checkpointing saves the generated RDDs to reliable storage such as HDFS. Checkpointing must be enabled for stateful transformations, which combine data across batches (for example, updateStateByKey), and whenever the driver has to recover from failures. For simple processing without stateful transformations, where some failures can be tolerated, it may not be required.
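As a rough illustration of how the two kinds of checkpointing are typically wired together, the following minimal PySpark sketch keeps a running word count across batches. It is not taken from the book: the checkpoint path, the socket source on localhost:9999, the 10-second batch interval, and the names update_count, create_context, and main are all assumptions chosen for the example.

```python
# Hypothetical checkpoint location; any fault-tolerant store (HDFS, S3, ...) would do.
CHECKPOINT_DIR = "hdfs:///tmp/streaming-checkpoint"


def update_count(new_values, running_count):
    """Add this batch's counts to the running total (state is None for a new key)."""
    return sum(new_values) + (running_count or 0)


def create_context():
    """Build a fresh StreamingContext; used only when no checkpoint exists yet."""
    # Imported lazily so update_count can be used without a Spark installation.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CheckpointSketch")
    ssc = StreamingContext(sc, 10)   # assumed 10-second batch interval
    ssc.checkpoint(CHECKPOINT_DIR)   # turns on checkpointing to this directory

    lines = ssc.socketTextStream("localhost", 9999)  # assumed text source
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .updateStateByKey(update_count))  # stateful, so checkpointing is required
    counts.pprint()
    return ssc


def main():
    from pyspark.streaming import StreamingContext

    # Recreate the context from checkpoint metadata if it exists;
    # otherwise fall back to create_context().
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()
```

Calling main() from a script submitted with spark-submit would start the job. On a restart after a driver failure, getOrCreate should rebuild the DStream graph from the saved metadata rather than call create_context again, which is why all stream setup lives inside that function.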