
Using Spark from Scratch

Updated at: Apr 28, 2020 GMT+08:00

This section describes how to use Spark to submit a SparkPi job. SparkPi, a typical Spark job, calculates the value of pi (π).

Procedure

  1. Prepare the SparkPi program.

    The open-source Spark distribution includes the SparkPi example program. You can download it at https://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz.

    Decompress the package to obtain the spark-examples_2.11-2.1.0.jar file in the spark-2.1.0-bin-hadoop2.7/examples/jars directory. This JAR file contains the SparkPi program, which is sketched below.
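
    SparkPi estimates π with a Monte Carlo method: it scatters random points over a square, counts how many fall inside the inscribed circle, and multiplies that fraction by 4 (the circle-to-square area ratio is π/4). The following Scala sketch is modeled on the open-source SparkPi example; the argument handling is simplified, so treat it as illustrative rather than the exact bundled source.

      import scala.math.random
      import org.apache.spark.sql.SparkSession

      // Simplified sketch of the SparkPi example shipped in
      // spark-examples_2.11-2.1.0.jar: Monte Carlo estimation of pi.
      object SparkPi {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder.appName("Spark Pi").getOrCreate()
          val slices = if (args.length > 0) args(0).toInt else 2  // number of partitions
          val n = math.min(100000L * slices, Int.MaxValue).toInt  // upper bound of sample range
          // Draw points uniformly in the square [-1, 1] x [-1, 1] and count
          // those that land inside the unit circle.
          val count = spark.sparkContext.parallelize(1 until n, slices).map { _ =>
            val x = random * 2 - 1
            val y = random * 2 - 1
            if (x * x + y * y <= 1) 1 else 0
          }.reduce(_ + _)
          // A total of n - 1 points were sampled, and inside/total ~= pi/4,
          // so pi ~= 4.0 * count / (n - 1).
          println(s"Pi is roughly ${4.0 * count / (n - 1)}")
          spark.stop()
        }
      }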

  2. Upload data to OBS.

    1. Log in to the OBS console.
    2. Click Create Bucket to create a bucket. The bucket name must be globally unique; otherwise, the bucket cannot be created. This example uses the bucket name sparkpi.
    3. In the sparkpi bucket, click Create Folder to create the program, output, and log folders.
    4. Go to the program folder, select the program package downloaded in 1, and click Upload.
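
    As an alternative to the console upload in 4, you can script the transfer with the OBS Java SDK (esdk-obs-java), which is callable from Scala. The following is only a hedged sketch: the endpoint, AK, and SK values are placeholders you must replace, and the SDK dependency is assumed to be on the classpath.

      import java.io.File
      import com.obs.services.ObsClient

      // Hedged sketch: upload the example JAR into the program folder of the
      // sparkpi bucket. The endpoint, AK, and SK below are placeholders.
      object UploadSparkPi {
        def main(args: Array[String]): Unit = {
          val endpoint = "https://obs.<your-region>.myhuaweicloud.com" // assumption: your region's OBS endpoint
          val ak = "YOUR_ACCESS_KEY"                                   // assumption: your access key
          val sk = "YOUR_SECRET_KEY"                                   // assumption: your secret key
          val obsClient = new ObsClient(ak, sk, endpoint)
          try {
            // Put the local JAR into the program folder of the sparkpi bucket.
            obsClient.putObject("sparkpi", "program/spark-examples_2.11-2.1.0.jar",
              new File("spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar"))
          } finally {
            obsClient.close() // release the SDK's underlying connections
          }
        }
      }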

  3. Log in to the MRS management console. In the navigation tree on the left, choose Clusters > Active Clusters, and then click the mrs_20160907 cluster, which was created in Creating a Cluster.
  4. Submit a SparkPi job.

    1. Click the Jobs tab, and then click Create to go to the Create Job page. For details, see Running a Spark Job.

      Jobs can be submitted only when the mrs_20160907 cluster is in the Running state.

      A job is executed immediately after it is created successfully.

  5. View the job execution results.

    1. On the Jobs tab page, check whether the job is complete.

      Job execution takes a while. After the job is complete, refresh the job list.

      A successful or failed job cannot be executed again, but you can add or copy it. After setting the job parameters, you can submit the job again.

    2. Go to the OBS directory and query job output information.

      In the sparkpi > output directory of OBS, you can query and download the job output files.

    3. Go to the OBS directory and check the detailed job execution results.

      In the sparkpi > log directory of OBS, you can query and download the job execution logs by job ID.

  6. Terminate a cluster.

    For details, see Terminating a Cluster.
