Getting Started with Hadoop

Updated at: May 16, 2019 20:57
  • MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.

  • This document describes how to submit a wordcount job from scratch in both normal and security clusters. Wordcount is the classic Hadoop example job: it counts the occurrences of each word in large volumes of text.

  • The overall process is as follows: Buy a cluster -> Prepare the Hadoop sample program and data files -> Upload data to OBS -> Create a job -> View the job execution results.

Step 1: Buy a Cluster

1. Log in to the public cloud management console.
2. Choose Service List > EI Enterprise Intelligence > MapReduce Service.
The MRS management console is displayed.
3. Click Buy Cluster to switch to the Buy Cluster page.

(Figures: MapReduce Service console; Buy a Cluster page)

Step 2: Configure the Cluster

1. Select a billing mode. In this example, Billing Mode is set to Pay-per-use.
2. Configure the basic cluster information. See the figures below for reference.
    If Kerberos authentication is disabled, a normal cluster is created. You can submit jobs using the job management function on the console. For details, see Step 6.
    If Kerberos authentication is enabled, a security cluster is created. You must submit jobs in the cluster background instead of using the job management function on the console. For details, see Step 7.
3. Configure the cluster password and advanced settings. See the figures below for reference.

4. After the configuration is complete, click Next.
Click here to view more parameter descriptions.

(Figures: Configure the Cluster-01 through Configure the Cluster-03: select the billing mode and instance configurations)

Step 3: Confirm Your Cluster Configurations

After confirming the information, click Submit. The system automatically creates the cluster for you.
It takes some time to create the MRS cluster. After you submit the cluster creation request, the initial status of the cluster is Starting. After the cluster has been created, its status changes to Running.


Step 4: Prepare the Hadoop Sample Program and Data Files

1. Prepare the wordcount program.
You can download the Hadoop sample program here.
For example, select hadoop-2.7.4.tar.gz, decompress it, and obtain hadoop-mapreduce-examples-2.7.4.jar from the hadoop-2.7.4\share\hadoop\mapreduce directory. This JAR file is the Hadoop sample program.
2. Prepare data files.
There is no format requirement for the data files; any two TXT files will do.
In this example, wordcount1.txt and wordcount2.txt are used.
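For reference, the following commands are a minimal sketch of this step on Linux; the data file contents are illustrative examples, not required values:

# Extract the Hadoop release and locate the sample program
tar -xzf hadoop-2.7.4.tar.gz
ls hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar
# Create two small data files with arbitrary text
echo "hello hadoop hello mapreduce" > wordcount1.txt
echo "hello wordcount" > wordcount2.txt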


Step 5: Upload Data to OBS

1. Log in to the OBS Console and click Create Bucket to create a bucket named mrs-word.
2. Click the bucket name mrs-word to go to the bucket details page. On the Objects page, click Create Folder and create three folders: program, input, and log.
3. Open the program folder and upload the Hadoop sample program downloaded earlier.
4. Open the input folder and upload the data files wordcount1.txt and wordcount2.txt (a command-line alternative is sketched after this list).

5. If the cluster is a normal cluster, go to Step 6.
    If the cluster is a security cluster, go to Step 7.
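If you prefer the command line and have a host with a configured Hadoop client that can reach OBS, the upload can also be done with hadoop fs. This is only a sketch; it reuses the AK/SK-based s3a parameters shown in Step 7:

hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -copyFromLocal hadoop-mapreduce-examples-2.7.4.jar "s3a://mrs-word/program/"
hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -copyFromLocal wordcount1.txt wordcount2.txt "s3a://mrs-word/input/"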


Step 6: Create a Job in a Normal Cluster

1. Log in to the MRS management console and click the name of the cluster created in Step 2. The cluster details page is displayed.
2. On the cluster details page, click the Job Management tab. On the Jobs tab page, click Create. The Create Job page is displayed. If the Job Management tab is not displayed on the cluster details page, submit the job by referring to Step 7.
3. Configure the job parameters as required and click OK. See the figures below for reference; an example configuration follows this list.
After the job has been submitted, its status is Running by default. You do not need to run the job manually.
Click here to view more information.
4. Go back to the Job Management page and view the job execution status on the Jobs tab page. Then go to Step 8 to view the job execution result.
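For reference, a wordcount job for the files prepared above could be configured roughly as follows. The exact field names may vary between console versions; the paths are the folders created in Step 5, and the output directory must not exist yet:

Type:         MapReduce
Program Path: s3a://mrs-word/program/hadoop-mapreduce-examples-2.7.4.jar
Parameters:   wordcount
Import From:  s3a://mrs-word/input/
Export To:    s3a://mrs-word/output/
Log Path:     s3a://mrs-word/log/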

(Figures: Create a Job; View the Job Execution Result)

Step 7: Create a Job in a Security Cluster

1. Log in to the MRS management console and click the name of the cluster created in Step 2. The cluster details page is displayed.

2. On the Nodes tab page, click the name of a Master node to access the Elastic Cloud Server (ECS) management console.

3. Click Remote Login in the upper right corner of the page.

4. Enter the username and password of the Master node as prompted. The username is root and the password is the one set during cluster creation.

5. Run the source /opt/client/bigdata_env command to configure environment variables.

6. Run the kinit <MRS cluster username> command to authenticate the current user, for example, kinit admin.

7. Run the following command to copy the sample program in the OBS bucket to the Master node in the cluster:

hadoop fs -Dfs.s3a.access.key=AK -Dfs.s3a.secret.key=SK -copyToLocal source_path.jar target_path.jar
    Example: hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -copyToLocal "s3a://mrs-word/program/hadoop-mapreduce-examples-XXX.jar" "/home/omm/hadoop-mapreduce-examples-XXX.jar"
The AK/SK pair is used to access OBS. To obtain your AK/SK, click the username in the upper right corner of the management console and choose My Credential > Access Keys.

8. Run the following command to submit the wordcount job. If the job reads input from or writes output to OBS, add the AK/SK parameters:
source /opt/client/bigdata_env;hadoop jar execute_jar wordcount input_path output_path
    Example: source /opt/client/bigdata_env;hadoop jar /home/omm/hadoop-mapreduce-examples-XXX.jar wordcount -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX "s3a://mrs-word/input/*" "s3a://mrs-word/output/"
In the preceding command, input_path is the OBS path that stores the job input files, and output_path is the OBS path for the job output; output_path must be set to a directory that does not exist.
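After the job completes, you can verify the output directly from the Master node. The following sketch reuses the s3a parameters shown above; a MapReduce wordcount job typically writes its results to files named part-r-00000, part-r-00001, and so on:

hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -ls "s3a://mrs-word/output/"
hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -cat "s3a://mrs-word/output/part-r-00000"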

(Figure: Log In to the Master Node)

Step 8: View the Job Execution Result

1. Log in to the OBS Console and go to the output directory of the mrs-word bucket to view the job output files. Download the files to your local PC and open them in a text editor; a sample of the output format follows this list.
2. Go to the log directory of the mrs-word bucket and query the detailed job execution logs by job ID. Download the logs to your local PC and open them in a text editor.
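Each line of a wordcount output file contains a word and its count, separated by a tab. For the illustrative data files sketched in Step 4, the output would look roughly like this:

hadoop      1
hello       3
mapreduce   1
wordcount   1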

Note: Job execution logs of a normal cluster are stored in the log folder of the corresponding OBS bucket. Job execution logs of a security cluster are directly displayed on the command line interface after a job is executed.

(Figures: View the Job Execution Result; View the Job Execution Logs)
