
Managing Data Files

Updated at: Dec 31, 2019 GMT+08:00
  • In a cluster of MRS 2.0.1, or of a version earlier than MRS 1.8.7, with Kerberos authentication disabled, you can create and delete folders, as well as import, export, and delete files on the Files tab page. File creation is currently not supported. When Kerberos authentication is enabled, files cannot be managed on the GUI and the Files tab page is not displayed.
  • In a cluster of MRS 1.8.7, or of a version later than MRS 2.0.1, you can create and delete folders, as well as import, export, and delete files on the Files tab page, regardless of whether Kerberos authentication is enabled. File creation is currently not supported. In a cluster with Kerberos authentication enabled, the permissions on folders in the root directory are restricted. To read and write these folders, add a role that has permissions on the folders by referring to Creating a Role. Then modify the user group to which the user who submits the job belongs, and add the new role to that user group, by referring to Related Tasks.
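
The version rules above can be sketched as a small Python check. This is only an illustration of the two bullets: the function names are hypothetical and not part of any MRS API, and versions that fall between MRS 1.8.7 and MRS 2.0.1 are reported as unknown because the text above does not cover them.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'MRS 1.8.7' or '1.8.7' into the tuple (1, 8, 7)."""
    digits = text.upper().replace("MRS", "").strip()
    return tuple(int(part) for part in digits.split("."))


def files_tab_available(version: str, kerberos_enabled: bool) -> Optional[bool]:
    """Hypothetical helper: is the Files tab page usable for this cluster?

    Returns True or False when the rules above apply, and None for
    versions that the rules do not mention.
    """
    v = parse_version(version)
    if v == (2, 0, 1) or v < (1, 8, 7):
        # Files tab page is shown only when Kerberos is disabled.
        return not kerberos_enabled
    if v == (1, 8, 7) or v > (2, 0, 1):
        # Files tab page is shown regardless of Kerberos.
        return True
    return None  # between 1.8.7 and 2.0.1: not covered by the text


print(files_tab_available("MRS 1.8.5", kerberos_enabled=True))
print(files_tab_available("MRS 1.8.7", kerberos_enabled=True))
```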

Background

Data to be processed by MRS is stored in either OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. You can view, manage, and use data through OBS Console or OBS Browser. In addition, you can use the REST APIs to manage or access data. The REST APIs can be used alone or integrated with service programs.

Before creating jobs, upload the local data to OBS for computing and analysis. MRS allows data to be imported from OBS to HDFS for computing and analysis. After the analysis and computing are complete, you can either store the data in HDFS or export it to OBS. HDFS and OBS can store compressed data in the format of bz2 or gz.
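
If you want to store compressed data, one possible way to prepare a local file before uploading it to OBS is sketched below with the Python standard library. The helper name compress_for_upload is hypothetical, and the upload itself is not shown.

```python
import bz2
import gzip
import shutil
from pathlib import Path


def compress_for_upload(source: Path, fmt: str = "gz") -> Path:
    """Write a gz or bz2 copy of source next to it and return the new path."""
    openers = {"gz": gzip.open, "bz2": bz2.open}
    if fmt not in openers:
        raise ValueError("fmt must be 'gz' or 'bz2'")
    target = source.with_name(source.name + "." + fmt)
    # Stream the bytes so that large files are not loaded into memory at once.
    with open(source, "rb") as src, openers[fmt](target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, compress_for_upload(Path("input.csv"), "bz2") writes input.csv.bz2 in the same directory.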

Importing Data

MRS supports data import from OBS to HDFS. This function is recommended if the data size is small, because the upload speed decreases as the file size increases.

Both files and folders containing files can be imported. The operations are as follows:

  1. Log in to the MRS management console.
  2. Choose Clusters > Active Clusters, select a cluster, and click its name to switch to the cluster details page.
  3. Click Files to go to the Files tab page.
  4. Select HDFS File List.
  5. Click the data storage directory, for example, bd_app1.

    bd_app1 is just an example. The storage directory can be any directory on the page. You can create a directory by clicking Create Folder.

    The name of the created directory must meet the following requirements:

    • Contains a maximum of 255 characters.
    • Cannot be empty.
    • Cannot contain special characters (/:*?"<|>\;&,'$).
    • Cannot start or end with a period (.).
  6. Click Import Data and configure the paths for HDFS and OBS.

    When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

    • The path for OBS
      • It must start with s3a://.
      • Files and programs encrypted by the KMS cannot be imported.
      • Empty folders cannot be imported.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of OBS contains a maximum of 255 characters.
    • The path for HDFS
      • It starts with /user by default.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of HDFS contains a maximum of 255 characters.
  7. Click OK.

    View the import progress in File Operation Records. MRS runs the data import operation as a Distcp job. You can check whether the Distcp job is successfully executed on the Jobs tab page.
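
The naming rules listed in steps 5 and 6 can be checked before you open the console. The following Python sketch is an illustration only: the function names are hypothetical, the console performs its own validation, and the sketch tests only the forbidden characters, the space rule, the s3a:// prefix, and the 255-character limits. It does not enforce the list of allowed characters or the default /user prefix.

```python
from typing import List

# Character sets copied from the rules in steps 5 and 6.
FORBIDDEN_FOLDER_CHARS = set('/:*?"<|>\\;&,\'$')
FORBIDDEN_PATH_CHARS = set(";|&><'$*?\\")
MAX_LENGTH = 255
OBS_PREFIX = "s3a://"


def check_folder_name(name: str) -> List[str]:
    """Return the rules that a new folder name breaks (empty list if none)."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_LENGTH:
        problems.append("name is longer than 255 characters")
    if FORBIDDEN_FOLDER_CHARS & set(name):
        problems.append("name contains a forbidden special character")
    if name.startswith(".") or name.endswith("."):
        problems.append("name starts or ends with a period")
    return problems


def check_path(path: str, kind: str) -> List[str]:
    """Return the rules that an OBS (kind='obs') or HDFS path breaks."""
    problems = []
    body = path
    if kind == "obs":
        if path.startswith(OBS_PREFIX):
            body = path[len(OBS_PREFIX):]
        else:
            problems.append("OBS path does not start with s3a://")
    if len(path) > MAX_LENGTH:
        problems.append("full path is longer than 255 characters")
    # Check every directory or file name in the path on its own.
    for part in body.split("/"):
        if FORBIDDEN_PATH_CHARS & set(part):
            problems.append(f"'{part}' contains a forbidden special character")
        if part != part.strip(" "):
            problems.append(f"'{part}' starts or ends with a space")
    return problems


print(check_path("s3a://bucket/input/data file.txt", "obs"))
print(check_path("/user/bad$name", "hdfs"))
```

Returning a list of broken rules instead of a single True or False makes it easier to report every problem in a path at once.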

Exporting Data

After data is processed and analyzed, you can either store the data in HDFS or export it to the OBS system.

Both files and folders containing files can be exported. The operations are as follows:

  1. Log in to the MRS management console.
  2. Choose Clusters > Active Clusters, select a cluster, and click its name to switch to the cluster details page.
  3. Click Files to go to the Files tab page.
  4. Select HDFS File List.
  5. Click the data storage directory, for example, bd_app1.
  6. Click Export Data and configure the paths for HDFS and OBS.

    When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

    • The path for OBS
      • It must start with s3a://.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of OBS contains a maximum of 255 characters.
    • The path for HDFS
      • It starts with /user by default.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of HDFS contains a maximum of 255 characters.

    Ensure that the exported folder is not empty. If an empty folder is exported to the OBS system, the folder is exported as a file. After the folder is exported, its name is changed, for example, from test to test-$folder$, and its type is file.

  7. Click OK.

    View the export progress in File Operation Records. MRS runs the data export operation as a Distcp job. You can check whether the Distcp job is successfully executed on the Jobs tab page.
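
The renaming of empty folders described in step 6 can be spotted after an export with a short Python sketch. It assumes that the suffix always follows the pattern of the single example above (test becomes test-$folder$) and that you already have a list of object names from OBS. Both function names are hypothetical.

```python
from typing import List

# Suffix inferred from the example in step 6 (test -> test-$folder$).
FOLDER_MARKER_SUFFIX = "-$folder$"


def marker_name_for(folder: str) -> str:
    """Name that an empty folder is expected to get after export."""
    return folder.rstrip("/") + FOLDER_MARKER_SUFFIX


def empty_folder_markers(object_names: List[str]) -> List[str]:
    """Return the original folder names of objects that look like markers."""
    return [
        name[: -len(FOLDER_MARKER_SUFFIX)]
        for name in object_names
        if name.endswith(FOLDER_MARKER_SUFFIX)
    ]


print(marker_name_for("test"))
print(empty_folder_markers(["output/part-00000", "test-$folder$"]))
```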

Viewing File Operation Records

When importing or exporting data on the MRS management console, you can choose Files > File Operation Records to view the import or export progress.

Table 1 lists the parameters in file operation records.

Table 1 Parameters in file operation records

| Parameter | Description |
|---|---|
| Created | Time when the data import or export started |
| Source Path | Source path of data. In data import, Source Path is the OBS path. In data export, Source Path is the HDFS path. |
| Target Path | Target path of data. In data import, Target Path is the HDFS path. In data export, Target Path is the OBS path. |
| Status | Status of the data import or export operation: Running, Completed, Terminated, or Abnormal |
| Duration (min) | Total time used by data import or export. Unit: minute |
| Result | Data import or export result: Successful or Failed |
| Operation | View Log: You can click View Log to view log information of a job. For details, see Viewing Job Configurations and Logs. |
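
If you copy file operation records into a script, the parameters in Table 1 map naturally onto a small data structure. The Python sketch below is one possible model. The class and field names are hypothetical, the sample values are made up, and the optional result field rests on the assumption that a running operation has no result yet. The direction is inferred from the rule that the source path is an OBS path (starting with s3a://) in data import.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    RUNNING = "Running"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"
    ABNORMAL = "Abnormal"


class Result(Enum):
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


@dataclass
class FileOperationRecord:
    created: str          # time when the import or export started
    source_path: str      # OBS path for import, HDFS path for export
    target_path: str      # HDFS path for import, OBS path for export
    status: Status
    duration_min: float   # total time used, in minutes
    result: Optional[Result] = None  # assumption: unset while running

    @property
    def direction(self) -> str:
        """'import' if the source is an OBS path, otherwise 'export'."""
        return "import" if self.source_path.startswith("s3a://") else "export"


# Made-up sample record for illustration.
record = FileOperationRecord(
    created="2019-12-31 10:00",
    source_path="s3a://bucket/input",
    target_path="/user/bd_app1",
    status=Status.COMPLETED,
    duration_min=2.0,
    result=Result.SUCCESSFUL,
)
print(record.direction)
```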
