Updated on 2022-06-01 GMT+08:00

HBase File Storage Configuration

Prerequisites

The cluster version is earlier than MRS 3.x.

Scenario

HBase FileStream (HFS) is an independent HBase file storage module. It encapsulates HBase and HDFS interfaces to provide MRS upper-layer applications with file storage, read, and deletion functions.

In the Hadoop ecosystem, both HDFS and HBase have difficulty storing massive numbers of files in some scenarios:

  • If a large number of small files are stored in HDFS, the NameNode will be under great pressure.
  • Some large files cannot be stored directly in HBase because of limitations in HBase APIs and internal mechanisms.

HFS is designed for the mixed storage of massive numbers of small files and some large files in Hadoop. That is, it targets scenarios in which both large numbers of small files (smaller than 10 MB) and some large files (larger than 10 MB) need to be stored in HBase tables.

For such a scenario, HFS provides unified operation APIs similar to HBase function APIs. You must add org.apache.hadoop.hbase.filestream.coprocessor.FileStreamMasterObserver to the hbase.coprocessor.master.classes HBase configuration parameter.

  • If only small files are stored, the native HBase APIs are recommended.
  • HFS APIs operate on both HBase and HDFS at the same time. Therefore, client users must have operation permissions for both components.
  • When HFS stores large files directly in HDFS, it adds metadata to them, so the stored files are not identical to the originals. Read these files through HFS APIs instead of moving them directly out of HDFS.
  • Backup and disaster recovery are not supported for data stored in HDFS by using HFS APIs.
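The size guidance above can be illustrated with a minimal sketch. It is not part of HFS: the function name recommend_api and the returned labels are hypothetical, and the sketch assumes that 10 MB means 10 * 1024 * 1024 bytes and that a file of exactly 10 MB counts as large, which the text does not specify.

```python
from typing import Iterable

# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
SMALL_FILE_LIMIT_BYTES = 10 * 1024 * 1024


def recommend_api(file_sizes: Iterable[int]) -> str:
    """Suggest which API family fits a workload, given its file sizes in bytes.

    Returns "hbase" when every file is smaller than 10 MB (native HBase APIs
    are recommended for small files only) and "hfs" when at least one file
    reaches the limit (mixed small and large files).
    """
    sizes = list(file_sizes)
    if any(size < 0 for size in sizes):
        raise ValueError("file sizes must be non-negative")
    # An empty workload has no large files, so it falls through to "hbase".
    if all(size < SMALL_FILE_LIMIT_BYTES for size in sizes):
        return "hbase"
    return "hfs"
```

For example, a workload of thumbnails of a few hundred kilobytes each would map to "hbase", while the same workload plus a single 50 MB video would map to "hfs".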

Procedure

  1. Log in to MRS Manager.
  2. Choose Service > HBase > Service Configuration, and set Type to All. Choose HMaster > System on the left.
  3. In the hbase.coprocessor.master.classes configuration item, add org.apache.hadoop.hbase.filestream.coprocessor.FileStreamMasterObserver.
  4. Click Save Configuration. In the window that is displayed, select Restart the affected services or instances and click Yes to restart the HBase service.
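If hbase.coprocessor.master.classes already lists other coprocessors, the HFS observer has to be appended to the existing value rather than replace it. The following sketch shows one way to compute the new value before entering it in step 3. It assumes the parameter holds a comma-separated list of class names, and the helper name add_master_coprocessor is hypothetical, not an MRS or HBase API.

```python
HFS_MASTER_OBSERVER = (
    "org.apache.hadoop.hbase.filestream.coprocessor.FileStreamMasterObserver"
)


def add_master_coprocessor(current_value: str,
                           class_name: str = HFS_MASTER_OBSERVER) -> str:
    """Return the comma-separated class list with class_name present once."""
    # Split the existing value, dropping empty entries and stray whitespace.
    classes = [c.strip() for c in current_value.split(",") if c.strip()]
    # Append only if the class is not already configured.
    if class_name not in classes:
        classes.append(class_name)
    return ",".join(classes)
```

Calling add_master_coprocessor("") returns just the HFS observer class name, and calling it on a value that already contains the class returns that value without a duplicate entry.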