
Introduction to Jobs

Updated at: Nov 06, 2019 GMT+08:00

A job is the unit of program execution that MRS provides for processing and analyzing user data. After a job is created, all of its information is displayed on the Jobs tab page, where you can view the list of all jobs and create and manage jobs. If the Jobs tab is not displayed on the cluster details page, submit jobs in the background instead.

MRS processes data from OBS or HDFS. OBS is an object-based storage service that provides massive, secure, reliable, and cost-effective data storage. MRS can process data in OBS directly. You can view, manage, and use the data on the management console web page or with the OBS client. You can also use the REST APIs on their own or integrate them into service applications to manage and access data.

Before creating a job, upload the local data to OBS so that MRS can use it for computing and analysis. You can also import data from OBS to HDFS for computing and analysis. After the data is processed and analyzed, you can store the results in HDFS or export them from the cluster to OBS. Both HDFS and OBS can also store compressed data in bz2 and gz formats.
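The exact upload workflow depends on your tooling, but as a rough illustration, the following Python sketch uploads a local file to an OBS bucket through OBS's S3-compatible interface. The bucket name, endpoint format, and credentials are placeholders assumed for this example, not values from this guide:

    # Hedged sketch: upload a local data file to OBS via its S3-compatible API.
    # The bucket name, endpoint, and credentials below are placeholders.
    import boto3

    obs = boto3.client(
        "s3",
        endpoint_url="https://obs.<region>.myhuaweicloud.com",  # assumed OBS endpoint format
        aws_access_key_id="<access-key>",
        aws_secret_access_key="<secret-key>",
    )

    # Upload the local file so that MRS jobs can later read it from OBS.
    obs.upload_file("data.csv", "mrs-demo-data", "input/data.csv")

After the upload, a job can reference the object by its OBS path (for example, obs://mrs-demo-data/input/data.csv), or the data can first be imported into HDFS for analysis.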

Job Types

MRS clusters enable you to create and manage the following types of jobs. If a cluster in the Running state fails to create a job, check the health status of related components on the cluster management page. For details, see Viewing and Customizing Cluster Monitoring Metrics.

  • MapReduce: a distributed data processing framework for rapid, parallel processing of massive amounts of data. MRS supports the submission of MapReduce JAR programs.
  • Spark: a distributed in-memory computing framework. MRS supports SparkSubmit, Spark Script, and Spark SQL jobs.
    • SparkSubmit: submits Spark JAR and Python programs, executes Spark applications, and computes and processes user data.
    • Spark Script: submits Spark Script scripts and executes Spark SQL statements in batches.
    • Spark SQL: uses Spark SQL statements (similar to SQL statements) to query and analyze user data in real time (a minimal example follows this list).
  • Hive: an open source data warehouse built on Hadoop. MRS allows you to submit Hive Script scripts and execute Hive SQL statements.
  • Flink: a distributed big data processing engine that can perform stateful computations over both finite and infinite data streams.
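To make the Spark SQL job type more concrete, the following is a minimal PySpark sketch of the kind of SQL such a job runs. The table name, schema, and HDFS path are hypothetical and are not taken from this guide:

    # Hedged sketch of statements a Spark SQL job might execute; all names and paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-sql-example").getOrCreate()

    # Expose previously imported HDFS data as a table, then query it.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS user_logs (user_id STRING, action STRING, ts TIMESTAMP)
        USING parquet LOCATION '/user/example/logs'
    """)
    spark.sql(
        "SELECT action, COUNT(*) AS cnt FROM user_logs GROUP BY action ORDER BY cnt DESC"
    ).show()

    spark.stop()

When submitting through the console, the SQL statements themselves are what you provide for a Spark SQL job; a SparkSubmit job would instead take a packaged JAR or Python program such as the one above.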

Job List

By default, jobs are listed in reverse chronological order, with the most recently submitted jobs displayed at the top. Table 1 describes the parameters of the job list.

Table 1 Parameters of the job list

  • Name: Job name. This parameter is set when the job is added.
  • ID: Unique identifier of a job. This parameter is assigned automatically when the job is added.
  • User Name: Name of the user who submitted the job.
  • Type: Job type. Possible types include:
    • DistCp (data import and export)
    • MapReduce
    • Spark
    • SparkSubmit
    • SparkScript
    • SparkSQL
    • HiveSQL
    • HiveScript
    • Flink
    NOTE:
    • After you import or export files on the Files tab page, the corresponding DistCp job is displayed on the Jobs tab page.
    • Spark, Hive, and Flink jobs can be added only when the Spark, Hive, and Flink components were selected during cluster creation and the cluster is running.
  • Status: Job status. Possible values are Accepted, Running, Completed, Terminated, and Abnormal.
  • Result: Execution result of a job.
    • Undefined: the job is still being executed.
    • Successful: the job has been executed successfully.
    • Terminated: the job was manually stopped during execution.
    • Failed: the job failed to be executed.
    NOTE:
    A successful or failed job cannot be executed again, but you can add or copy it, set the job parameters, and submit it again.
  • Submit Time: Time when the job was submitted.
  • Duration (min): Duration of the job execution, from the time the job starts to the time it is completed or stopped. Unit: minute.
  • Operation: Actions available for the job.
    • View Log: view the job log. For details, see Viewing Job Configurations and Logs.
    • View Details: view job configuration details. For details, see Viewing Job Configurations and Logs.
    • More:
      • Stop: stop a running job. For details, see Stopping Jobs.
      • Copy: copy and re-add a job. For details, see Copying Jobs.
      • Delete: delete a job. For details, see Deleting Jobs.
      • View Result: view the execution result of SparkSql and SparkScript jobs whose status is Completed and result is Successful.
    NOTE:
    • Spark SQL jobs cannot be stopped.
    • Deleted jobs cannot be recovered. Exercise caution when deleting a job.
    • If the system is configured to save job logs to an HDFS or OBS path, the logs are compressed and saved to the specified path after job execution is complete. In this case, the job remains in the Running state after execution finishes and changes to Completed only after the logs are saved successfully. The time required depends on the log size and generally takes a few minutes.
Table 2 Button description

  • Select a time range for job submission to filter jobs submitted within that range.
  • Select a job state from the drop-down list to filter jobs:
    • All (Num): displays all jobs.
    • Completed (Num): displays jobs in the Completed state.
    • Running (Num): displays jobs in the Running state.
    • Terminated (Num): displays jobs in the Terminated state.
    • Abnormal (Num): displays jobs in the Abnormal state.
  • Select a job type from the drop-down list to filter jobs of that type: MapReduce, HiveScript, DistCp, SparkScript, SparkSql, SparkSubmit, Flink, or HiveSql.
  • Enter a job name in the search box and click the search icon to search for a job.
  • Click the refresh icon to manually refresh the job list.

Job Execution Permission Description

For a security cluster with Kerberos authentication enabled, an IAM user must be synchronized before jobs can be submitted on the MRS GUI. After synchronization is complete, MRS generates a cluster user with the same name as the IAM user. Whether a synchronized user has permission to submit jobs depends on the IAM policies bound to the user during synchronization. For details about job submission policies, see Table 1 Permission policy synchronization comparison in Synchronizing IAM Users to MRS.

When a user submits a job that involves the resource usage of a specific component, such as accessing HDFS directories and Hive tables, user admin (MRS Manager administrator) must grant the relevant permission to the user. Perform the following operations:

  1. Log in to MRS Manager as user admin.
  2. Create a role that grants the permissions of the component the user needs. For details, see Creating a Role.
  3. Modify the user group that the job-submitting user belongs to and bind the new component role to that user group. For details, see Related Tasks.

    After the component role bound to the user group to which the user belongs is modified, it takes some time for the role permissions to take effect.
