
Creating a Cluster

Updated at: Nov 08, 2019 GMT+08:00

To use MRS, you must purchase cluster resources first. This section describes how to create a cluster using MRS.

After registering a HUAWEI CLOUD account, you can create IAM users and user groups on the IAM console and grant them specific operation permissions for fine-grained management of HUAWEI CLOUD resources. For details, see IAM Permissions Management.

Background

Currently, the commercial version of MRS is charged based on ECSs in a cluster. Cluster nodes can be purchased in Yearly/Monthly mode or Pay-per-use mode.

  • Yearly/Monthly: The duration ranges from one month to one year. You need to pay in full when purchasing a cluster.
  • Pay-per-use: Nodes are charged by actual duration of use, with a billing cycle of one hour.
    • The fee covers only the cluster itself. Costs for data storage, bandwidth, and traffic on MRS are excluded.
    • If your account balance is insufficient for fee deduction, you will be notified to renew. Cluster resources are frozen during the retention period and unfrozen after you renew.
    • Yearly/Monthly clusters cannot be restored after being deleted, and you will not receive a refund. Exercise caution when deleting a yearly/monthly cluster.
    • You can continue to use a yearly/monthly cluster after your account is overdue. However, its pay-per-use services will be unavailable. That is, you cannot submit jobs via the OBS system.

Creating an MRS 2.0.3 Cluster

If you want to create a cluster of MRS history versions, follow instructions in Creating a Cluster (History Versions).

  1. Log in to the MRS management console.
  2. Click Buy Cluster and open the Buy Cluster page.

    Note the usage of quotas when you create a cluster. If the resource quotas are insufficient, increase quotas as prompted before creating a cluster.

  3. Configure basic cluster information by referring to the following tables.

    Table 1 Basic cluster configuration information

    Parameter

    Description

    Billing Mode

    MRS provides two billing modes:
    • Pay-per-use
    • Yearly/Monthly

    Region

    Select a region.

    AZ

    An AZ is a physical area that uses independent power and network resources. In this way, applications are interconnected using internal networks but are physically isolated. As a result, application availability is improved. It is recommended that you create clusters in different AZs.

    Select an AZ in the selected region for the cluster.

    Cluster Name

    Cluster name, which is globally unique.

    A cluster name can contain only 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

    The default name is mrs_xxxx, where xxxx is a random combination of four letters and numbers.
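The naming rule above can be checked before a name is typed into the console. The following is a minimal sketch, not part of MRS: the function name `is_valid_cluster_name` is hypothetical, and it assumes that "letters" means ASCII letters. It does not check uniqueness, which only the service can verify.

```python
import re

# Rule from the Cluster Name description: 1 to 64 characters,
# only letters, digits, hyphens (-), and underscores (_).
# Assumption: "letters" means ASCII letters A-Z and a-z.
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the documented character rules."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None
```

For example, a default-style name such as `mrs_a1b2` passes, while a name that contains a space or is longer than 64 characters does not.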

    Cluster Version

    Currently, MRS 1.8.7, MRS 2.0.1 and MRS 2.0.3 are supported.

    The latest version of MRS is used by default.

    Enterprise Project

    Select the enterprise project to which the cluster belongs. To use an enterprise project, create one on the Enterprise Project Management page of the Enterprise Management console.

    The Enterprise Management console of the enterprise project is designed for resource management. It helps enterprises manage cloud-based personnel, resources, permissions, and finance in a hierarchical manner, such as management of companies, departments, and projects.

    Kerberos Authentication

    Indicates whether to enable Kerberos authentication when logging in to MRS Manager. Possible values are as follows:

    • If Kerberos authentication is disabled, you can use all functions of an MRS cluster. You are advised to disable Kerberos authentication in single-user scenarios. If Kerberos authentication is disabled, you can follow instructions in Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled to perform security configuration.

    • If Kerberos authentication is enabled, common users cannot use the file and job management functions of an MRS cluster and cannot view cluster resource usage or the job records for Hadoop and Spark. To use more cluster functions, the users must contact the MRS Manager administrator to assign more permissions. You are advised to enable Kerberos authentication in multi-user scenarios.

    Click the switch icon to disable or enable Kerberos authentication.

    After creating MRS clusters with Kerberos authentication enabled, users can manage running clusters on MRS Manager. For details, see Accessing MRS Manager Supporting Kerberos Authentication.

    Username

    Indicates the username for the administrator of MRS Manager. admin is used by default.

    Password

    Indicates the password of the MRS Manager administrator.

    • Must contain 8 to 32 characters.
    • Must contain at least three types of the following:
      • Lowercase letters
      • Uppercase letters
      • Digits
      • Special characters: `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
      • Spaces
    • Must be different from the username.
    • Must be different from the username written in reverse order.

    Password strength: The color bar shows red, orange, or green to indicate a weak, medium, or strong password, respectively.
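The password rules for the MRS Manager administrator can be expressed as a small checker. This is an illustrative sketch only: the function name `check_manager_password` is hypothetical, the comparison with the username is assumed to be case-sensitive, and letters are assumed to be ASCII. The console remains the authority on what is accepted.

```python
import string
from typing import List

# Special characters copied from the Password description above.
SPECIAL_CHARS = "`~!@#$%^&*()-_=+\\|[{}];:'\",<.>/?"


def check_manager_password(password: str, username: str = "admin") -> List[str]:
    """Return the list of violated rules; an empty list means all rules pass."""
    problems = []
    if not 8 <= len(password) <= 32:
        problems.append("must contain 8 to 32 characters")
    # Count how many of the five character types appear at least once.
    types_present = sum([
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in SPECIAL_CHARS for c in password),
        " " in password,
    ])
    if types_present < 3:
        problems.append("must contain at least three character types")
    # Assumption: the comparison with the username is case-sensitive.
    if password == username:
        problems.append("must differ from the username")
    if password == username[::-1]:
        problems.append("must differ from the username in reverse order")
    return problems
```

Returning a list of violations rather than a single boolean makes it possible to show every failed rule at once.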

    Confirm Password

    Enter the user password again.

    Cluster Type

    MRS provides three types of clusters:
    • Analysis cluster: used for offline data analysis. It provides Hadoop components.
    • Streaming cluster: used for streaming tasks. It provides stream processing components.
    • Hybrid cluster: used for both offline data analysis and stream processing. It provides Hadoop components and stream processing components. You are advised to use a hybrid cluster to perform offline data analysis and stream processing tasks at the same time. (MRS 1.8.5 or later supports hybrid clusters.)
    NOTE:

    MRS streaming clusters do not support job and file management functions.

    Component

    • MRS 2.0.3 supports the following components:
      Components of an analysis cluster:
      • Presto 308: open source and distributed SQL query engine
      • Hadoop 3.1.1: distributed system architecture
      • Spark 2.3.2: in-memory distributed computing framework
      • HBase 2.1.1: distributed column store database
      • Hive 3.1.0: data warehouse framework built on Hadoop
      • Tez 0.9.1: application framework that supports complex directed acyclic graphs (DAGs) of data processing tasks
      • Hue 3.11.0: Hadoop UI that enables users to analyze and process Hadoop cluster data in a browser
      • Loader 2.0.0: tool based on open source Sqoop 1.99.7, designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases

        Hadoop is mandatory, and Spark and Hive must be used together. Select components based on services.

      Components of a streaming cluster:
      • Kafka 1.1.0: distributed message subscription system
      • Storm 1.2.1: distributed real-time computing system
      • Flume 1.6.0: distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
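The selection rule for analysis cluster components ("Hadoop is mandatory, and Spark and Hive must be used together") can be sketched as a check. The function name `check_analysis_components` is hypothetical, and the sketch assumes that "used together" means selecting one of Spark or Hive requires selecting the other.

```python
from typing import Iterable, List


def check_analysis_components(selected: Iterable[str]) -> List[str]:
    """Return the list of violated selection rules for an analysis cluster."""
    chosen = set(selected)
    problems = []
    if "Hadoop" not in chosen:
        problems.append("Hadoop is mandatory")
    # Assumption: Spark and Hive must be either both selected or both omitted.
    if ("Spark" in chosen) != ("Hive" in chosen):
        problems.append("Spark and Hive must be selected together")
    return problems
```

For example, selecting Hadoop, Spark, and Hive passes, while selecting Hadoop and Spark without Hive is reported as a problem.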

    Hive Uses External Data Source to Store Metadata

    Whether to use external data sources to store Hive metadata. Click the switch icon to enable this function. If this function is enabled, Hive metadata is not affected if a cluster becomes abnormal or is deleted. This function applies to scenarios where storage and computing are separated.

    MRS 2.0.3 or later supports this function.

    Data Connection Type

    This parameter is valid only when Hive Uses External Data Source to Store Metadata is enabled. It indicates the type of an external data source.

    • RDS POSTGRES database
    • Local database

    Data Connection Instance

    This parameter is valid only when Data Connection Type is set to RDS POSTGRES database. This parameter indicates the name of the connection between the MRS cluster and the RDS POSTGRES database. This instance must be created before being referenced here. You can click Create Data Connection to create a data connection. For details, see Managing Data Connections.

    VPC

    A VPC is a secure, isolated, and logical network environment.

    Select the VPC in which you want to create the cluster, and click View VPC to view the name and ID of the VPC. If no VPC is available, create one.

    Subnet

    A subnet provides dedicated network resources that are isolated from other networks, improving network security.

    Select the subnet in which you want to create the cluster. You can enter the VPC to view the name and ID of the subnet.

    If no subnet is created under the VPC, click Create Subnet to create one.

    WARNING:

    Do not associate the subnet with the network ACL.

    Security Group

    A security group is a set of ECS access rules. It provides access policies for ECSs that have the same security protection requirements and are mutually trusted in a VPC.

    When you create an MRS cluster, you can select Auto create from the drop-down list of Security Group to create a security group or select an existing security group.

    NOTE:

    When you select a security group created by yourself, ensure that the inbound rule contains a rule in which Protocol is set to All, Port is set to All, and Source is set to a trusted accessible IP address range. Do not use 0.0.0.0/0 as a source address. Otherwise, security risks may occur. If you do not know the trusted accessible IP address range, select Auto create.
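The warning against using 0.0.0.0/0 as a source address can be checked with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions: the function name `is_restricted_source` is hypothetical, and treating any zero-length prefix (including the IPv6 equivalent `::/0`) as unrestricted goes beyond the single address named above.

```python
import ipaddress


def is_restricted_source(cidr: str) -> bool:
    """Return False if the CIDR block matches every address, such as 0.0.0.0/0."""
    # strict=False accepts input with host bits set, such as 10.0.0.5/24.
    network = ipaddress.ip_network(cidr, strict=False)
    # Assumption: any /0 prefix, IPv4 or IPv6, counts as an untrusted source.
    return network.prefixlen != 0
```

The check only rejects the fully open range. Whether a narrower range is actually trusted still depends on your own network plan.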

    EIP

    After binding an EIP to an MRS cluster, you can use the EIP to access the MRS Manager page of the cluster.

    When creating a cluster, you can select an available EIP from the drop-down list and bind it. If no EIP is available in the drop-down list, click Manage EIP to access the EIPs page to purchase one.

    NOTE:
    • The EIP must be in the same region as the cluster.

    Cluster HA

    Cluster HA specifies whether to enable high availability for a cluster. This parameter is enabled by default.

    If you enable this option, the management processes of all components will be deployed on both Master nodes to achieve hot standby and prevent single-node failure, improving reliability. If you disable this option, they will be deployed on only one Master node. As a result, if a process of a component becomes abnormal, the component will fail to provide services.

    • Disabled: When Cluster HA is disabled, there is only one Master node and the number of Core nodes is three by default. You can decrease the number of Core nodes to 1.
    • Enabled: When Cluster HA is enabled, there are two Master nodes and the number of Core nodes is three by default. You can decrease the number of Core nodes to 1.

    Click the switch icon to disable or enable high availability.

    Table 2 Cluster node information

    Parameter

    Description

    Type

    MRS provides three types of nodes:

    • Master: A Master node in an MRS cluster manages the cluster, assigns cluster executable files to Core nodes, traces the execution status of each job, and monitors the DataNode running status.
    • Core: A Core node in a cluster processes data and stores process data in HDFS. Analysis Core nodes are created in an analysis cluster. Streaming Core nodes are created in a streaming cluster. Both analysis and streaming Core nodes are created in a hybrid cluster.
    • Task: A Task node in a cluster is used for computing and does not store persistent data. Yarn and Storm are mainly installed on Task nodes. Task nodes are optional, and the number of Task nodes can be zero. Analysis Task nodes are created in an analysis cluster. Streaming Task nodes are created in a streaming cluster. Both analysis and streaming Task nodes are created in a hybrid cluster.
      When the number of clusters does not change much but the clusters' service processing capabilities need to be remarkably and temporarily improved, add Task nodes to address the following situations:
      • The number of temporary services is increased, for example, report processing at the end of the year.
      • Long-term tasks must be completed in a short time, for example, some urgent analysis tasks.

    Disk LVM

    This parameter is available in the Operation column of a streaming Core node, and only when the node is being created. Click this parameter to enable or disable the disk LVM function. The function status is displayed in the parentheses next to this parameter.

    If LVM is enabled, all disks on a node are mounted as logical volumes. This allows more balanced disk planning, which helps avoid data skew and improves system stability.

    (Optional) Configure Task Node

    Click Configure Task Node to configure the information about the Task node.

    • If a cluster whose billing mode is Yearly/Monthly requires the auto scaling function, select the specifications of the Task nodes to be added after the auto scaling is enabled in Instance Specifications and set Instance Count to 0. Click Auto Scaling in the Operation column of the Task node. On the Auto Scaling page that is displayed, configure auto scaling rules. For details, see Using Auto Scaling in a Cluster. When the instance quantity is greater than 0, the auto scaling function is not supported.
    • If a cluster whose billing mode is Pay-per-use requires the auto scaling function, click Auto Scaling in the Operation column of the Task node. On the Auto Scaling page that is displayed, configure auto scaling rules. For details, see Using Auto Scaling in a Cluster.
      NOTE:
      • The Auto Scaling parameter in the Operation column of the Task node is used to configure an auto scaling rule. The content in the parentheses next to this parameter shows the default node range when auto scaling is enabled, or Disabled when auto scaling is disabled.
      • The price calculator only calculates the price of basic configurations. When Instance Count is set to 0 for Task nodes, the price calculator does not calculate the fee of the Task nodes regardless of whether the number of nodes for auto scaling is configured. The Task nodes added by using the auto scaling function are charged based on the actual usage duration.

    Instance Specifications

    Instance specifications of a node. MRS supports host specifications determined by CPU, memory, and disk space.

    MRS supports instance specifications detailed in ECS Specifications Used by MRS.

    NOTE:
    • Higher instance specifications provide better data processing performance.
    • If you select HDDs for Core nodes, there is no charging information for data disks. The fees are charged with ECSs.
    • If you select HDDs for Core nodes, the system disks (40 GB) of Master nodes and Core nodes, as well as the data disks (200 GB) of Master nodes, are SATA disks.
    • If you select non-HDD disks for Core nodes, the disk types of Master and Core nodes are determined by Data Disk.
    • If Sold out appears next to an instance specification of a node, the node of this specification cannot be purchased. You can only purchase nodes of other specifications.
    • The Master node specification (4 vCPUs 8 GB) is not within the SLA after-sales scope. It is applicable only to the test environment and is not recommended for the production environment.

    Instance Count

    Number of Master, Core, and Task nodes

    For Master nodes:

    • If Cluster HA is enabled, the number of Master nodes is fixed to 2.
    • If Cluster HA is disabled, the number of Master nodes is fixed to 1.

    The minimum number of Core nodes is 1 and the total number of Core and Task nodes cannot exceed 500.

    NOTE:
    • If more than 500 Core nodes and Task nodes are required, contact technical support engineers or invoke a background interface to modify the database.
    • A small number of nodes may cause clusters to run slowly while a large number of nodes may be unnecessarily costly. Set an appropriate value based on data to be processed.
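The node count rules above can be summarized in a short sketch. The function names `master_node_count` and `check_node_counts` are hypothetical, and the limit of 500 is the default documented above, which technical support may raise.

```python
from typing import List

# Default upper limit on Core nodes plus Task nodes, as documented above.
MAX_CORE_PLUS_TASK = 500


def master_node_count(cluster_ha: bool) -> int:
    """Master nodes are fixed: 2 when Cluster HA is enabled, otherwise 1."""
    return 2 if cluster_ha else 1


def check_node_counts(core: int, task: int = 0) -> List[str]:
    """Return the list of violated node count rules."""
    problems = []
    if core < 1:
        problems.append("at least 1 Core node is required")
    if task < 0:
        problems.append("the number of Task nodes cannot be negative")
    if core + task > MAX_CORE_PLUS_TASK:
        problems.append("Core and Task nodes cannot exceed 500 in total")
    return problems
```

For example, the default layout of three Core nodes and no Task nodes passes, while 400 Core nodes plus 101 Task nodes exceeds the limit.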

    Data Disk

    Disk space of Core nodes

    Users can add disks to increase storage capacity when creating a cluster. There are two different configurations for storage and computing:

    • Data storage and computing are performed separately

      Data is stored in OBS, which features low cost and unlimited storage capacity. Because the data is in OBS, clusters can be terminated at any time. The computing performance is determined by OBS access performance and is lower than that of HDFS. This configuration is recommended if data computing is infrequent.

    • Data storage and computing are performed together

      Data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. This configuration is recommended if data computing is frequent.

    The following disk types are supported:

    • SATA: Common I/O
    • SAS: High I/O
    • SSD: Ultra-high I/O

    The disk sizes range from 100 GB to 32,000 GB, with 10 GB added each time, for example, 100 GB, 110 GB.

    NOTE:
    • More nodes in a cluster require higher disk capacity of Master nodes. To ensure stable cluster running, set the disk capacity of the Master node to over 600 GB if the number of nodes is 300 and increase it to over 1 TB if the number of nodes reaches 500.
    • The Master node increases data disk storage space for MRS Manager. The disk type must be the same as the data disk type of Core nodes. The default disk space is 200 GB and cannot be changed.
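The data disk size rule (100 GB to 32,000 GB in 10 GB increments) is simple arithmetic and can be sketched as follows. The function name `is_valid_data_disk_size` is hypothetical.

```python
def is_valid_data_disk_size(size_gb: int) -> bool:
    """Return True if the size is 100 to 32,000 GB and a multiple of 10 GB."""
    # Because the minimum (100 GB) is itself a multiple of 10, stepping by
    # 10 GB from the minimum is the same as requiring a multiple of 10.
    return 100 <= size_gb <= 32000 and size_gb % 10 == 0
```

Sizes such as 100 GB and 110 GB are valid, while 105 GB falls between two increments and is not.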

    Data Disk Encryption

    Whether to encrypt data in the data disk mounted to the cluster. This function is disabled by default. To use this function, you must have the Security Administrator and KMS Administrator permissions.

    Keys used by encrypted data disks are provided by KMS of DEW, which is secure and convenient. You do not need to establish and maintain your own key management infrastructure.

    Click the switch icon to disable or enable the data disk encryption function. For details, see EVS Disk Encryption.

    Data Disk Key Name

    This parameter is mandatory when the Data Disk Encryption function is enabled. Select the name of the key used to encrypt the data disk. By default, the default master key named evs/default is selected. You can select another master key from the drop-down list.

    If disks are encrypted using a CMK, which is then disabled or scheduled for deletion, the disks can no longer be read from or written to, and data on these disks may never be restored. Exercise caution when performing this operation.

    Click View Key List to enter a page where you can create and manage keys.

    Data Disk Key ID

    This parameter is displayed only when the Data Disk Encryption function is enabled. This parameter indicates the key ID corresponding to the selected key name.

    Table 3 Login information

    Parameter

    Description

    Login Mode

    • Password

      You can log in to ECS nodes using a password.

      A password must meet the following requirements:

      1. Must be 8 to 26 characters long.
      2. Must contain at least 3 of the following character types: uppercase letters, lowercase letters, digits, and special characters (!@$%^-_=+[{}]:\,./?), but must not contain spaces.
      3. Cannot be the username or the username spelled backwards.
    • Key Pair

      Keys are used to log in to Master1 of the cluster.

      A key pair, also called an SSH key, consists of a public key and a private key. You can create an SSH key and download the private key for authenticating remote login. For security, a private key can only be downloaded once. Keep it secure.

      Select the key pair, for example SSHkey-bba1.pem, from the drop-down list. If you have obtained the private key file, select I acknowledge that I have obtained private key file SSHkey-bba1.pem and that without this file I will not be able to log in to my ECS. If no key pair is created, click View Key Pair to create or import keys. Then obtain the private key file.

      Configure an SSH key using either of the following two methods:

      1. Create an SSH key

        After you create an SSH key, a public key and a private key are generated. The public key is stored in the system, and the private key is stored in the local ECS. When you log in to an ECS, the public and private keys are used for authentication.

      2. Import an SSH key

        If you have obtained the public and private keys, import the public key into the system. When you log in to an ECS, the public and private keys are used for authentication.
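The requirements for the Password login mode in Table 3 differ from the MRS Manager password rules: the length limit is 26 characters, spaces are not allowed, and the special character set is smaller. The following sketch illustrates them. The function name `check_login_password` is hypothetical, letters are assumed to be ASCII, and the username comparison is assumed to be case-sensitive.

```python
import string
from typing import List

# Special characters copied from the Login Mode description in Table 3.
LOGIN_SPECIAL_CHARS = "!@$%^-_=+[{}]:\\,./?"


def check_login_password(password: str, username: str) -> List[str]:
    """Return the list of violated rules for an ECS node login password."""
    problems = []
    if not 8 <= len(password) <= 26:
        problems.append("must be 8 to 26 characters long")
    if " " in password:
        problems.append("must not contain spaces")
    # Count how many of the four character types appear at least once.
    types_present = sum([
        any(c in string.ascii_uppercase for c in password),
        any(c in string.ascii_lowercase for c in password),
        any(c in string.digits for c in password),
        any(c in LOGIN_SPECIAL_CHARS for c in password),
    ])
    if types_present < 3:
        problems.append("must contain at least 3 character types")
    # Assumption: the comparison with the username is case-sensitive.
    if password in (username, username[::-1]):
        problems.append("cannot be the username or the username spelled backwards")
    return problems
```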

    Table 4 Configuring valid duration

    Parameter

    Description

    Required Duration

    Required duration of the cluster when the billing mode is Yearly/Monthly. The required duration ranges from one month to one year.

    Table 5 Advanced settings

    Parameter

    Description

    Configure

    After you click Configure, the page for adding a job, a tag or a bootstrap action is displayed.

    Skip

    You can set parameters later.

  4. Click Next.
  5. Confirm cluster specifications. If you select the Yearly/Monthly billing mode, click Submit Order. If you select the Pay-per-use billing mode, click Submit to submit a cluster creation task.
  6. Click Back to Cluster List to view the cluster status.

    For details about cluster status during cluster creation, see the Status parameter description in Table 1.

    Cluster creation takes some time. The initial status of the cluster is Starting. After the cluster is created successfully, the cluster status becomes Running.

    Users can create a maximum of 10 clusters at a time and manage a maximum of 100 clusters on the MRS management console.

    The name of a new cluster can be the same as that of a failed or terminated cluster.
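If cluster creation is scripted rather than watched on the console, the transition from Starting to Running described in step 6 can be awaited with a simple polling loop. This is only a sketch: `wait_until_running` and the `fetch_status` callable are hypothetical, this guide does not describe an API for reading the status, and the polling interval is an arbitrary choice. The caller supplies `fetch_status`, which must return the current status string.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    interval_seconds: float = 30.0,
    max_checks: int = 120,
) -> str:
    """Poll until the cluster leaves the Starting status, then return the new status."""
    for _ in range(max_checks):
        status = fetch_status()
        # Starting is the initial status; anything else ends the wait.
        if status != "Starting":
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("cluster is still Starting after the maximum number of checks")
```

The function returns whatever status follows Starting, so the caller should still compare the result with Running and handle any other value as a failed creation.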

Failed to Create a Cluster

If the cluster fails to be created, the failed task is automatically moved to the Manage Failed Task page. Click the icon shown in Figure 1 to access the Manage Failed Task page, and then move the cursor over the icon in the Task Status column, as shown in Figure 2, to view the causes. For details about how to delete the failed task, see Deleting a Failed Task.

Figure 1 Managing failed tasks
Figure 2 Causes

Table 6 provides error codes about cluster creation failure.

Table 6 Error codes

Error Code    Message
MRS.101       Insufficient quota to meet your request. Contact customer service to increase the quota.
MRS.102       The token cannot be null or invalid. Try again later or contact customer service.
MRS.103       Invalid request. Try again later or contact customer service.
MRS.104       Insufficient resources. Try again later or contact customer service.
MRS.105       Insufficient IP addresses in the existing subnet. Try again later or contact customer service.
MRS.201       Failed due to an ECS error. Try again later or contact customer service.
MRS.202       Failed due to an IAM error. Try again later or contact customer service.
MRS.203       Failed due to a VPC error. Try again later or contact customer service.
MRS.400       MRS system error. Try again later or contact customer service.
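When failures are handled in a script, the codes in Table 6 can be kept in a lookup table. The following sketch copies the messages from Table 6. The names `CLUSTER_CREATION_ERRORS` and `describe_creation_error` are hypothetical, and the fallback text for unlisted codes is an assumption.

```python
# Error codes and messages copied from Table 6.
CLUSTER_CREATION_ERRORS = {
    "MRS.101": "Insufficient quota to meet your request. Contact customer service to increase the quota.",
    "MRS.102": "The token cannot be null or invalid. Try again later or contact customer service.",
    "MRS.103": "Invalid request. Try again later or contact customer service.",
    "MRS.104": "Insufficient resources. Try again later or contact customer service.",
    "MRS.105": "Insufficient IP addresses in the existing subnet. Try again later or contact customer service.",
    "MRS.201": "Failed due to an ECS error. Try again later or contact customer service.",
    "MRS.202": "Failed due to an IAM error. Try again later or contact customer service.",
    "MRS.203": "Failed due to a VPC error. Try again later or contact customer service.",
    "MRS.400": "MRS system error. Try again later or contact customer service.",
}


def describe_creation_error(code: str) -> str:
    """Return the documented message for a code, or a fallback for unlisted codes."""
    # Assumption: codes not listed in Table 6 get a generic fallback text.
    return CLUSTER_CREATION_ERRORS.get(code, "Unknown error code: " + code)
```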
