
Application Development Process

Spark consists of Spark Core, Spark SQL, and Spark Streaming, all of which share the same development process.

Figure 1 and Table 1 describe the phases in the development process.

Figure 1 Spark application development process
Table 1 Description of the Spark application development process

| Phase | Description | Reference |
| --- | --- | --- |
| Understand basic concepts. | Before developing an application, learn the basic concepts of Spark. Depending on your scenario, focus on the concepts of Spark Core, Spark SQL, or Spark Streaming. | Basic Concepts |
| Prepare a development environment. | Spark applications can be developed in Scala, Java, or Python. You are advised to use IDEA and configure the development environment for your language according to the guide. | Preparing a Java Development Environment to Preparing a Python Development Environment |
| Prepare an operating environment. | The Spark operating environment is a Spark client. Install and configure the client according to the guide. | Preparing an Operating Environment |
| Obtain and import a sample project, or create a new project. | Spark provides sample projects for different scenarios. You can import a sample project to learn the application, or create a Spark project according to the guide. | Downloading and Importing a Sample Project |
| Develop the project based on the scenario. | Sample projects are provided in Scala, Java, and Python, covering scenarios such as Streaming, SQL, the JDBC client program, and Spark on HBase. They help you quickly learn the APIs of all Spark components. A minimal Scala sketch follows this table. | Scenario Description to Scala Sample Code |
| Compile and run the application. | Compile the developed application and submit it for running. | Compiling and Running Applications |
| View application running results. | Running results are written to a path you specify. You can also check the application status on the web UI. | Viewing Commissioning Results |
| Optimize the application. | Tune the program based on its running status to meet the performance requirements of your service scenario. After tuning, compile and run the application again. A serialization-tuning sketch also follows this table. | Data Serialization to Spark CBO Tuning |
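
To give a feel for the development phase, the following is a minimal Scala word-count sketch in the spirit of the sample projects referenced above. It is not one of the official samples: the object name WordCount and the input and output paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal word-count sketch. Paths below are hypothetical placeholders.
object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("hdfs:///tmp/input.txt")      // hypothetical input path
      .flatMap(_.split("\\s+"))               // split each line into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)                     // sum the counts per word

    counts.saveAsTextFile("hdfs:///tmp/wordcount-output") // hypothetical output path
    spark.stop()
  }
}
```

Once packaged into a JAR on the client, such an application is typically submitted with the spark-submit script, for example `bin/spark-submit --class WordCount --master yarn wordcount.jar` (the JAR name here is hypothetical).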
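
For the optimization phase, one common step covered under the Data Serialization reference is switching Spark to Kryo serialization. The sketch below shows that configuration only; the MyRecord class and the KryoExample object are hypothetical names used for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only to illustrate Kryo registration.
case class MyRecord(id: Long, name: String)

object KryoExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes lets Kryo write compact IDs instead of full class names.
      .registerKryoClasses(Array(classOf[MyRecord]))

    val spark = SparkSession.builder()
      .appName("KryoExample")
      .config(conf)
      .getOrCreate()

    // ... application logic ...
    spark.stop()
  }
}
```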