Updated on 2022-06-01 GMT+08:00

What Should I Do If Running a Checkpoint Is Slow When RocksDBStateBackend is Set for the Checkpoint and a Large Amount of Data Exists?

Issue

What should I do if running a checkpoint is slow when RocksDBStateBackend is set for the checkpoint and a large amount of data exists?

Possible Causes

Customized windows are used and the window state is ListState, with many values stored under the same key. Each time a new value is added, the merge() operation of RocksDB is invoked, and when the window calculation is triggered, all values under the key must be read back.

  • The RocksDB access pattern is merge()->merge()...->merge()->read(), which makes reading the data time-consuming, as shown in Figure 1.
    Figure 1 Time monitoring information
  • When the source operator sends a large amount of data in an instant and all of the data shares the same key, window operator processing slows down. As a result, barriers accumulate in the buffer and snapshot creation is delayed. The window operator then fails to report snapshot success to CheckpointCoordinator in time, so CheckpointCoordinator concludes that the snapshot has failed. Figure 2 shows the data flow.
    Figure 2 Data flow
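To see why the merge()->merge()...->merge()->read() pattern is costly, consider the following plain-Java sketch. It is not Flink or RocksDB code; the class and method names are illustrative. It mimics RocksDB's merge-operator semantics for a list state: each write is a cheap append-only merge record, but a read must fold every accumulated record for the key, so read cost grows linearly with the number of values under that key.

```java
import java.util.*;

// Illustrative in-memory mimic of RocksDB merge-operator behavior for
// ListState. Hypothetical class, for explanation only.
class MergeListStore {
    private final Map<String, List<String>> mergeLog = new HashMap<>();

    // Like RocksDB merge(): O(1) append; existing values are not read.
    void merge(String key, String value) {
        mergeLog.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Like read(): folds all accumulated merge records for the key,
    // so cost is proportional to the number of values under that key.
    List<String> read(String key) {
        return mergeLog.getOrDefault(key, Collections.emptyList());
    }
}

public class Main {
    public static void main(String[] args) {
        MergeListStore store = new MergeListStore();
        // Many values land under one hot key, as in the FAQ scenario.
        for (int i = 0; i < 100000; i++) {
            store.merge("hotKey", "v" + i); // each call is fast
        }
        // Triggering the window forces one expensive full read.
        System.out.println(store.read("hotKey").size());
    }
}
```

A state backend that keeps the list materialized (such as FsStateBackend, which holds working state on the heap) avoids replaying the merge chain on every read.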

Answer

This problem is caused by a defect in RocksDB, a third-party software package that Flink depends on. You are advised to set the checkpoint state backend to FsStateBackend instead.

The following example shows how to set the checkpoint state backend to FsStateBackend in the application code:

 env.setStateBackend(new FsStateBackend("hdfs://hacluster/flink-checkpoint/checkpoint/"));
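A fuller configuration sketch is shown below, assuming Flink's legacy FsStateBackend API (available in releases matching this guide). The checkpoint interval value is illustrative; the HDFS path is taken from the example above.

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (interval is illustrative).
        env.enableCheckpointing(60000);

        // Keep working state on the heap and persist snapshots to HDFS,
        // avoiding RocksDB's merge-chain read cost for large ListState.
        env.setStateBackend(
                new FsStateBackend("hdfs://hacluster/flink-checkpoint/checkpoint/"));

        // ... define sources, window operators, and sinks here ...
    }
}
```

Note that FsStateBackend keeps working state in TaskManager memory, so it suits jobs whose state fits on the heap; very large state may still require RocksDBStateBackend.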