Help Center > > Developer Guide> Tutorial: Importing Data from OBS to a Cluster> Step 2: Creating a Foreign Table

Step 2: Creating a Foreign Table

Updated at: Jun 24, 2019 GMT+08:00
  1. Connect to the DWS database.
  2. Create a foreign table to store the three CSV data files that have been uploaded to OBS in Step 1: Uploading Data to OBS. This table is used to transfer data to ordinary tables in DWS.

    The following foreign table and its parameter settings are provided as an example.

    Parameters defined in a foreign table are used to identify data formats and set the error tolerance for data import. In this tutorial, only key parameter settings are provided. For more information, see CREATE FOREIGN TABLE (for OBS Import and Export).
    DROP FOREIGN TABLE IF EXISTS product_info_ext;
    CREATE FOREIGN TABLE product_info_ext
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    SERVER gsmpp_server 
    OPTIONS(
    LOCATION 'obs://mybucket/input_data/product_info | obs://mybucket02/input_data/product_info',
    FORMAT 'CSV' ,
    DELIMITER ',',
    ENCODING 'utf8',
    HEADER 'false',
    ACCESS_KEY 'access_key_value_to_be_replaced',
    SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
    FILL_MISSING_FIELDS 'true',
    IGNORE_EXTRA_DATA 'true'
    )
    READ ONLY 
    LOG INTO product_info_err 
    PER NODE REJECT LIMIT 'unlimited';
    If the following information is displayed, the foreign table has been created:
    CREATE FOREIGN TABLE

    Parameter settings in the foreign table are as follows:

    • SERVER: It is always set to gsmpp_server. You do not need to change it.
    • LOCATION: The OBS path consists of obs://, a bucket name, and a file path. Example: obs://<bucket_name>/<file_path>.
      Multiple paths (if any) are separated by vertical bars (|). For example:
      LOCATION 'obs://mybucket/input_data/product_info  |  obs://mybucket02/input_data/product_info',
      The following two files having the specified prefix in the mybucket folder will be imported:
      mybucket/input_data/product_info0.csv
      mybucket/input_data/product_info1.csv
    • FORMAT

      It is set to CSV because data to be imported in this tutorial is in CSV format. If the data is in TEXT or ORC format, set this parameter accordingly.

    • DELIMITER

      It is set to , because the data in source data files is separated by commas (,).

    • ENCODING

      It is set to UTF-8.

    • HEADER

      This parameter specifies whether a data file contains a header. It is valid only for data files in CSV format. The first line of data files in Preparing Data Files is not a header. Therefore, it is set to false.

    • ACCESS_KEY and SECRET_ACCESS_KEY

      These parameters specify the AK and SK for a user to access OBS. Set them based on site conditions. To obtain access keys, log in to the management console, hover the cursor on the username in the upper right corner, click My Credential, and click Access Keys. Then, you can view existing access keys or click Add Access Key to create one.

    • FILL_MISSING_FIELDS

      This parameter specifies how to handle the problem that the last column of a row in a source data file is lost during data import. The default value is false or off. This parameter is set to true in this tutorial.

      • true/on: The last column is set to NULL. No error is reported.
      • false/off: Error "missing data for column "tt"" is reported.
        For example, the last column product_comment_content of the second record in the product_info2.csv source data file is lost. If FILL_MISSING_FIELDS is set to false or off, information similar to the following will be displayed in the error table during data import:
        missing data for column "product_comment_content"
    • IGNORE_EXTRA_DATA

      This parameter specifies whether to ignore excessive columns when the number of columns in a source data file exceeds that defined in the foreign table. The default value is false or off. This parameter is set to true in this tutorial.

      • true/on: The excessive columns of a row are ignored. No error is reported.
      • false/off: Error "extra data after last expected column" is reported.

        For example, the number of columns in the third record in the product_info2.csv source data file is greater than that defined for the foreign table. If IGNORE_EXTRA_DATA is set to false or off, information similar to the following will be displayed in the error table during data import:

        extra data after last expected column
    • READ ONLY

      Syntax defined in a foreign table can be used for both importing data to the DWS cluster and for exporting data from the cluster. To import data to the cluster, use READ ONLY for the foreign table. To export data, use WRITE ONLY.

    • LOG INTO

      This parameter specifies the error table that records data format error information during import. You only need to specify the table name, and do not need to create it in advance. DWS automatically creates the table when a foreign table is created and automatically deletes it when the foreign table is deleted.

    • PER NODE REJECT LIMIT

      This parameter specifies the maximum number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.

      It is set to unlimited in this tutorial, indicating that all data format errors during import can be recorded.

Did you find this page helpful?

Submit successfully!

Thank you for your feedback. Your feedback helps make our documentation better.

Failed to submit the feedback. Please try again later.

Which of the following issues have you encountered?







Please complete at least one feedback item.

Content most length 200 character

Content is empty.

OK Cancel