Getting Started with Hadoop
MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.
This guide describes how to use Hadoop to submit a wordcount job. Wordcount, a typical Hadoop job, is used to count the words in texts.
Step 1: Purchase a cluster. -> Step 2: Configure the cluster. -> Step 3: Confirm the cluster configuration. -> Step 4: Prepare the Hadoop sample program and data files. -> Step 5: Upload data to OBS. -> Step 6: Create a job. -> Step 7: View the job execution result.
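Conceptually, wordcount tokenizes each line of the input text and sums the occurrences of every word. A minimal local sketch of that logic in Python (this illustrates what the job computes, not the MapReduce implementation itself):

```python
from collections import Counter

def wordcount(lines):
    """Count whitespace-separated words across all lines,
    mirroring what the Hadoop wordcount example computes."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

print(wordcount(["hello world", "hello hadoop"]))
# "hello" appears twice; "world" and "hadoop" once each.
```

On the cluster, the tokenizing happens in parallel map tasks and the summing in reduce tasks, but the result is the same per-word tally.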
Step 1: Purchase a Cluster
1. Log in to the public cloud management console.
2. Choose Service List > EI Enterprise Intelligence > MapReduce Service.
The MRS management console is displayed.
3. Click Purchase Cluster to switch to the Purchase Cluster page.
Purchase a Cluster
Step 2: Configure the Cluster
1. Select the billing mode. In this example, Billing Mode is set to On-demand.
2. Configure the basic information of the cluster (see the figures below).
3. Configure the cluster password, logging, and advanced settings (see the figures below).
4. After the configuration is complete, click Buy Now.
Click here to view more parameter descriptions.
Configure the Cluster-01
Configure the Cluster-02
Configure the Cluster-03
Step 3: Confirm Your Cluster Configuration
After confirming the information, click Submit Application. The system automatically creates the cluster for you.
It takes some time to create the MRS cluster. The initial status of a created cluster is Starting. After the cluster is created successfully, its status changes to Running.
Confirm Your Cluster Configuration
Step 4: Prepare the Hadoop Sample Program and Data Files
1. Prepare the wordcount program.
You can download the Hadoop sample program here.
For example, download and decompress hadoop-2.7.4.tar.gz. In the hadoop-2.7.4\share\hadoop\mapreduce directory, obtain hadoop-mapreduce-examples-2.7.4.jar, which is the Hadoop sample program.
2. Prepare data files.
There are no format requirements for the data files; you can prepare two TXT files.
In this example, files wordcount1.txt and wordcount2.txt are used.
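Any plain text will do for the two files. A sketch that writes sample wordcount1.txt and wordcount2.txt (the contents are made up for illustration) and previews the combined counts the job should produce, since the job processes every file in the input folder together:

```python
from collections import Counter
from pathlib import Path
import tempfile

# Hypothetical sample contents; any plain text works.
samples = {
    "wordcount1.txt": "It was the best of times\nit was the worst of times\n",
    "wordcount2.txt": "The quick brown fox jumps over the lazy dog\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        Path(tmp, name).write_text(text)

    # Preview the combined counts across both files.
    counts = Counter()
    for name in samples:
        counts.update(Path(tmp, name).read_text().split())

print(counts.most_common(3))
```

Note that the sample job is case-sensitive: "The" and "the" are counted as different words.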
Step 5: Upload Data to OBS
1. Log in to the OBS console and click Create Bucket to create a bucket named mrs-word.
2. Click the bucket name mrs-word to open the bucket. On the Objects page, click Create Folder to create three folders named program, input, and log (see the figure below).
3. Enter the program folder and upload the downloaded Hadoop sample program.
4. Enter the input folder and upload the data files wordcount1.txt and wordcount2.txt.
Upload Data to OBS
Step 6: Create a Job
1. From the left navigation pane of the MRS console, choose Clusters > Active Clusters and click the cluster whose name is mrs_test.
2. Switch to the Job Management page, and click Create on the Job page to create a job.
3. Configure the parameters as required and click OK (see the figure below).
After the job is submitted successfully, the job status is Running by default. You do not need to manually execute the job.
Click here to view more information.
Create a Job
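For reference, the job form fields map onto the classic `hadoop jar` invocation. A sketch assembling that command line from the bucket and folder names used in this guide (the output folder is created by the job itself; the `obs://` path scheme is an assumption here, as some MRS versions address OBS via `s3a://` instead):

```python
# Form-field values from this guide: the program is the sample jar
# uploaded to OBS, and "wordcount" plus the input/output paths are
# the program arguments.
program = "obs://mrs-word/program/hadoop-mapreduce-examples-2.7.4.jar"
args = ["wordcount", "obs://mrs-word/input", "obs://mrs-word/output"]

# Equivalent command line a cluster node would run for this job:
command = " ".join(["hadoop", "jar", program] + args)
print(command)
```

Hadoop refuses to run if the output directory already exists, which is why the guide does not create an output folder in advance.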
Step 7: View the Job Execution Result
1. Switch to the Job Management page. On the Job page, view the job execution status.
2. Log in to the OBS console. Switch to the output directory of the mrs-word bucket to view the job output files. You need to download the files to your local PC and view them in text mode.
3. Switch to the log directory of the mrs-word bucket and query detailed job execution log information based on the job ID. You need to download the logs to your local PC and view them in text mode.
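The reducer output files (typically named part-r-00000 and so on) contain one word and its count per line, separated by a tab. A sketch that parses a downloaded output file back into a Python dictionary:

```python
def parse_wordcount_output(text):
    """Parse Hadoop wordcount reducer output: one 'word<TAB>count' per line."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, count = line.rsplit("\t", 1)
        counts[word] = int(count)
    return counts

# Example output text in the format the job produces.
sample = "hadoop\t1\nhello\t2\nworld\t1\n"
print(parse_wordcount_output(sample))
# {'hadoop': 1, 'hello': 2, 'world': 1}
```

Words in the output are sorted by the framework's key ordering, so the file reads alphabetically rather than by count.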
View the Job Execution Result-01
View the Job Execution Result-02
View the Job Execution Result-03