Getting Started with Hadoop

Updated at: 2019-01-23 13:49
  • MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.

  • This guide describes how to use Hadoop to submit a wordcount job. Wordcount, a typical Hadoop job, is used to count the words in texts.

  • Workflow: Step 1: Purchase a cluster -> Step 2: Configure the cluster -> Step 3: Confirm the cluster configuration -> Step 4: Prepare the Hadoop sample program and data files -> Step 5: Upload data to OBS -> Step 6: Create a job -> Step 7: View the job execution result.
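Conceptually, wordcount tokenizes the input text and counts how often each word occurs. A rough local approximation with standard shell tools (no cluster needed; the file name and contents below are made up for illustration):

```shell
# Illustrative only: approximate the wordcount result with plain shell
# tools. The real job performs the same computation as MapReduce on the cluster.
printf 'hello hadoop\nhello mrs\n' > sample.txt   # made-up sample input

# One word per line, sort so duplicates are adjacent, then count them.
# The output includes lines such as "2 hello".
tr -s ' ' '\n' < sample.txt | sort | uniq -c
```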

Step 1: Purchase a Cluster

1. Log in to the public cloud management console.
2. Choose Service List > EI Enterprise Intelligence > MapReduce Service.
The MRS management console is displayed.
3. Click Purchase Cluster to switch to the Purchase Cluster page.

[Figures: MapReduce Service console; Apply for a VPC; Purchase a Cluster; Apply for an ECS]

Step 2: Configure the Cluster

1. Select the billing mode. In this example, Billing Mode is set to On-demand.
2. Configure the basic cluster information (see the figure on the right).

3. Configure the password, logging, and advanced settings of the cluster (see the figure on the right).

4. After the configuration is complete, click Buy Now.
For descriptions of more parameters, see the MRS product documentation.

[Figures: Configure the Cluster-01 (select the billing mode); Configure the Cluster-02 (select the instance configuration); Configure the Cluster-03 (select the instance configuration)]

Step 3: Confirm Your Cluster Configuration

After confirming the information, click Submit Application. The system automatically creates the cluster for you.
It takes some time to create the MRS cluster. The initial status of a created cluster is Starting. After the cluster is created successfully, its status changes to Running.


Step 4: Prepare the Hadoop Sample Program and Data Files

1. Prepare the wordcount program.
You can download the Hadoop sample program from the official Apache Hadoop website.
For example, download and decompress hadoop-2.7.4.tar.gz. In the hadoop-2.7.4\share\hadoop\mapreduce directory, obtain hadoop-mapreduce-examples-2.7.4.jar, which is the Hadoop sample program.
2. Prepare data files.
There is no format requirement for data files. You can prepare two TXT files.
In this example, files wordcount1.txt and wordcount2.txt are used.
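Any plain text works as input. One way to create the two files used in this example (the words below are made-up sample content):

```shell
# Create two small text files to use as wordcount input.
# The contents are arbitrary sample text; any words will do.
printf 'hello world\nhello hadoop\n' > wordcount1.txt
printf 'hadoop on mrs\nword count example\n' > wordcount2.txt

wc -w wordcount1.txt wordcount2.txt   # quick sanity check of the word counts
```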


Step 5: Upload Data to OBS

1. Log in to the OBS console and click Create Bucket to create a bucket named mrs-word.
2. Click the bucket name mrs-word to go to the bucket details page. On the Objects page, click Create Folder to create the program, input, and log folders. See the figure on the right.
3. Enter the program folder and upload the downloaded Hadoop sample program.
4. Enter the input folder and upload the data files wordcount1.txt and wordcount2.txt.


Step 6: Create a Job

1. From the left navigation pane of the MRS console, choose Clusters > Active Clusters and click the cluster whose name is mrs_test.
2. Switch to the Job Management page and click Create to create a job.
3. Configure parameters as required and click OK. See the figure on the right.
After the job is submitted successfully, the job status is Running by default. You do not need to manually execute the job.
For more information about job parameters, see the MRS product documentation.
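For reference, the parameters of this MapReduce job correspond to a standard hadoop jar invocation of the wordcount example. The sketch below only assembles the command for illustration; the bucket name comes from Step 5, and the obs:// path scheme is an assumption that may differ across MRS versions:

```shell
# Assemble (not run) the command the wordcount job effectively executes.
# BUCKET and the obs:// scheme are assumptions based on Step 5 of this guide.
BUCKET="mrs-word"
JAR="hadoop-mapreduce-examples-2.7.4.jar"   # sample program uploaded to /program
INPUT="obs://${BUCKET}/input"               # folder holding the .txt data files
OUTPUT="obs://${BUCKET}/output"             # must not exist before the job runs

echo "hadoop jar ${JAR} wordcount ${INPUT} ${OUTPUT}"
```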


Step 7: View the Job Execution Result

1. Switch to the Job Management page. On the Job page, view the job execution status.
2. Log in to the OBS console. Go to the output directory of the mrs-word bucket to view the job output files. Download the files to your local PC and open them with a text editor.
3. Go to the log directory of the mrs-word bucket and query the detailed job execution logs based on the job ID. Download the logs to your local PC and open them with a text editor.
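The output is plain text with one word and its count per line, tab-separated (MapReduce result files are typically named part-r-00000, part-r-00001, and so on). Once downloaded, they can be inspected with standard tools; the file below is a simulated sample, not real job output:

```shell
# Simulate a downloaded wordcount output file (the contents are made up).
printf 'hadoop\t2\nhello\t3\nmrs\t1\n' > part-r-00000

# List the most frequent words first: sort numerically on the count column.
sort -k2,2nr part-r-00000
```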

