
Constraints and Limitations

On Jobs

  • DLI supports the following types of jobs: Spark SQL, Spark Jar, Flink SQL, and Flink Jar.
  • DLI supports the following Spark versions: Spark 3.1, 2.4, and 2.3 (end of marketing, EOM).
  • DLI supports the following Flink versions: Flink Jar 1.15, Flink 1.12 (EOM), Flink 1.10 (EOM), and Flink 1.7 (end of service or support, EOS).
  • SQL jobs support the Spark and Trino engines.
    • Spark: jobs whose execution engine is Spark.
    • Trino: jobs whose execution engine is Trino.
  • Only the latest 100 jobs are displayed on DLI's SparkUI.
  • A maximum of 1,000 job results can be displayed on the console. To view more or all job results, export the job data to OBS.
  • To export job run logs, you must have the permission to access OBS buckets. You need to configure a DLI job bucket on the Global Configuration > Project page in advance.
  • The View Log button is not available for synchronization jobs and jobs running on the default queue.
  • Only Spark jobs support custom images.
  • An elastic resource pool supports a maximum of 32,000 CUs.
  • Minimum number of CUs for a queue created in an elastic resource pool:
    • General purpose queue: 4 CUs
    • SQL queue: 8 CUs for a Spark SQL queue; 16 CUs for a Trino SQL queue

For details about job constraints, see Job Management.

On DLI Packages

  • A package can be used only in the region where it is purchased; the region cannot be changed after the purchase.
  • A package cannot be unsubscribed after being purchased.
  • During billing, resources in the package are used first. After the package quotas are used up, you are billed for the excess usage on a pay-per-use basis.
  • Resources used before a package is purchased cannot be deducted from the purchased package.
  • After your package expires, you can still use DLI resources, but you are billed for them on a pay-per-use basis.
  • The quota of a storage package is reset every hour. The quota of other types of packages is reset every month.

On Queues

  • A queue named default is preset in DLI for trial use. Its resources are allocated on demand, and you are billed based on the amount of data scanned by each job (unit: GB).
  • Queue types:
    • SQL queue: used to run Spark SQL jobs.
    • General purpose queue: used to run Spark programs, Flink SQL jobs, and Flink Jar jobs.

    The queue type cannot be changed. If you want to use another queue type, purchase a new queue.

  • The billing mode of a queue cannot be changed.
  • The region of a queue cannot be changed.
  • Queues with 16 CUs do not support scale-out or scale-in.
  • Queues with 64 CUs do not support scale-in.
  • When creating a queue, you can select cross-AZ active-active only for yearly/monthly queues and pay-per-use dedicated queues. The price of a cross-AZ queue is twice that of a single-AZ queue.
  • A newly created queue can be scaled in or out only after a job is executed on the queue.
  • DLI queues cannot access the Internet.

    For details about how to access the Internet from an elastic resource pool, see Configuring the Connection Between a DLI Queue and a Data Source on the Internet.

For more constraints on using a DLI queue, see Queue Overview.

On Elastic Resource Pools

  • The billing mode of an elastic resource pool cannot be changed.
  • The region of an elastic resource pool cannot be changed.
  • For a pay-per-use resource pool, the dedicated resource mode is selected by default. The resource pool is billed by calendar hour from the time it is created.
  • Jobs of Flink 1.10 or later can run in elastic resource pools.
  • The network segment of an elastic resource pool cannot be changed after being set.
  • Associating an elastic resource pool with a queue:
    • Only pay-per-use queues (including dedicated queues) can be associated with elastic resource pools.
    • No resources are frozen.
  • Currently, only yearly/monthly elastic resource pools can be scaled.
  • You can view the scaling history of a resource pool only for the last 30 days.
  • Elastic resource pools cannot access the Internet.

    For details about how to access the Internet from an elastic resource pool, see Configuring the Connection Between a DLI Queue and a Data Source on the Internet.

For more constraints on elastic resource pools, see Elastic Resource Pool Overview.

On DLI Storage Resources

DLI can store databases and tables. DLI storage is billed based on the amount of stored data.

On Resources

  • Database
    • default is the built-in database of DLI. You cannot create a database named default.
    • DLI supports a maximum of 50 databases.
  • Table
    • DLI supports a maximum of 5,000 tables.
    • DLI supports the following table types:
      • MANAGED: Data is stored in DLI.
      • EXTERNAL: Data is stored in OBS.
      • View: A view can only be created using SQL statements.
      • Datasource table: The table type is also EXTERNAL.
    • You cannot specify a storage path when creating a DLI table (see the first sketch after this list).
  • Data import
    • Only data stored in OBS can be imported to DLI tables or OBS tables.
    • You can import data in CSV, Parquet, ORC, JSON, or Avro format from OBS into tables created on DLI.
    • To import CSV data into a partitioned table, the partition column must be the last column of each row in the source data (see the second sketch after this list).
    • The encoding of the imported data can only be UTF-8.
  • Data export
    • Data in DLI tables (table type MANAGED) can only be exported to OBS buckets, and the export path must contain a folder.
    • Exported files are in JSON format, and the text encoding can only be UTF-8.
    • Data can be exported across accounts: after account B authorizes account A, account A can read the metadata and permission information of account B's OBS bucket, gains read and write permissions on the path, and can therefore export data to that OBS path.
  • Package
    • A package can be deleted, but a package group cannot be deleted.
    • The following types of packages can be uploaded:
      • JAR: JAR file
      • PyFile: User Python file
      • File: User file
      • ModelFile: User AI model file
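
The storage path constraint on DLI tables can be illustrated with table DDL. The following is a minimal sketch in generic Spark SQL syntax; the table names, columns, and the OBS path obs://mybucket/data/orders/ are hypothetical, and the exact DDL accepted by DLI may differ.

  -- DLI (MANAGED) table: no storage path may be specified;
  -- DLI manages the underlying storage itself.
  CREATE TABLE orders_dli (
    order_id STRING,
    amount   DOUBLE
  );

  -- OBS (EXTERNAL) table: the data resides in OBS,
  -- so an OBS path is required (hypothetical bucket and directory).
  CREATE TABLE orders_obs (
    order_id STRING,
    amount   DOUBLE
  )
  USING csv
  OPTIONS (path 'obs://mybucket/data/orders/');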
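
The CSV import constraint on partitioned tables means that the partition column must come last in every source row. Another hedged sketch, with a hypothetical table and file contents; see Data Management for the actual import procedure:

  -- Partitioned table: dt is the partition column.
  CREATE TABLE sales (
    id     STRING,
    amount DOUBLE
  )
  PARTITIONED BY (dt STRING);

  -- Each row of the source CSV must place the partition
  -- column (dt) last, in the order id, amount, dt:
  --   1001,19.99,2024-01-01
  --   1002,5.50,2024-01-02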

For details about constraints on resources, see Data Management.

On Enhanced Datasource Connections

  • Datasource connections cannot be created for the default queue.
  • Flink jobs can directly access DIS, OBS, and SMN data sources without using datasource connections.
  • Enhanced connections can only be created for yearly/monthly and pay-per-use queues.
  • VPC Administrator permissions are required for enhanced connections to use VPCs, subnets, routes, and VPC peering connections.

    You can set these permissions by referring to Service Authorization.

  • If you use an enhanced datasource connection, the CIDR block of the elastic resource pool or queue cannot overlap with that of the data source.
  • Only queues bound with datasource connections can access datasource tables.
  • Datasource tables do not support the preview function.
  • When checking the connectivity of datasource connections, the constraints on IP addresses are as follows:
    • The IP address must be valid: it consists of four decimal numbers separated by periods (.), each ranging from 0 to 255.
    • During the test, you can add a port after the IP address, separated by a colon (:). The port can contain a maximum of five digits and ranges from 0 to 65535.

      For example, 192.168.xx.xx or 192.168.xx.xx:8181.

  • When checking the connectivity of datasource connections, the constraints on domain names are as follows:
    • The domain name can contain 1 to 255 characters. Only letters, digits, periods (.), underscores (_), and hyphens (-) are allowed.
    • The top-level domain name must contain at least two letters, for example, .com, .net, and .cn.
    • During the test, you can add a port after the domain name, separated by a colon (:). The port can contain a maximum of five digits and ranges from 0 to 65535.

      For example, example.com:8080.

For more constraints on enhanced datasource connections, see Enhanced Datasource Connection Overview.

On Datasource Authentication

  • Only Spark SQL and Flink OpenSource SQL 1.12 jobs support datasource authentication.
  • Flink jobs can use datasource authentication only on queues created after May 1, 2023.
  • DLI supports four types of datasource authentication. Select an authentication type specific to each data source.
      • CSS: applies to CSS clusters of version 6.5.4 or later with the security mode enabled.
    • Kerberos: applies to MRS security clusters with Kerberos authentication enabled.
    • Kafka_SSL: applies to Kafka with SSL enabled.
    • Password: applies to GaussDB(DWS), RDS, DDS, and DCS.

For more constraints on datasource authentication, see Datasource Authentication Introduction.

On SQL Syntax

  • Constraints on the SQL syntax:
    • You are not allowed to specify a storage path when creating a DLI table using SQL statements.
  • Constraints on the size of SQL statements:
    • Each SQL statement should contain fewer than 500,000 characters.
    • The size of each SQL statement must be less than 1 MB.

Other

  • For details about quota constraints, see Quotas.
  • Recommended browsers for logging in to DLI:
    • Google Chrome 43.0 or later
    • Mozilla Firefox 38.0 or later
    • Internet Explorer 9.0 or later

    For details about the compatibility list of more browsers, see Which Browsers Are Supported?