Updated on 2022-12-06 GMT+08:00

What Are the Requirements for Training Data When You Create a Predictive Analytics Project in ExeML?

Requirements on Datasets

  • The names of files in a dataset can contain only letters, digits, hyphens (-), and underscores (_), and the files must be in CSV format. Data files cannot be stored in the root directory of an OBS bucket. Store them in a folder in the bucket, for example, /obs-xxx/data/input.csv.
  • In the file content, separate lines with newline characters (\n or LF) and separate columns with commas (,). The file content cannot include non-English characters (for example, Chinese characters). Column values cannot contain special characters such as commas, line breaks, or quotation marks. It is recommended that column values consist of only letters and digits.
  • Training data
    • Every row of the training data must have the same number of columns, and the dataset must contain at least 100 data records (records that differ in the value of a feature are counted as different records).
    • The training columns cannot contain values in timestamp formats (such as yy-mm-dd or yyyy-mm-dd).
    • A column that has only one distinct value is considered invalid. Ensure that the label column has at least two distinct values and no missing data.

      The label column is the training target specified in a training task. It is the output (prediction item) for the model trained using the dataset.

    • In addition to the label column, the dataset must contain at least two valid feature columns. Ensure that each feature column has at least two distinct values and that less than 10% of its data is missing.
    • The CSV file cannot contain a table header, or the training will fail.
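
The content requirements above can be checked locally before the file is uploaded to OBS. The following Python code is a minimal sketch, not part of ModelArts: the function name check_training_csv, the label_index parameter, and the date pattern are hypothetical, and "at least 100 data records" is assumed to mean at least 100 distinct rows. The sketch does not check the file name or OBS path, and it cannot detect a table header.

```python
import re

# Assumed pattern for timestamp-like values such as yy-mm-dd or yyyy-mm-dd.
DATE_RE = re.compile(r"^(\d{2}|\d{4})-\d{1,2}-\d{1,2}$")


def check_training_csv(text, label_index=-1):
    """Return a list of problems found in headerless CSV text."""
    problems = []
    if not text.isascii():
        problems.append("non-ASCII characters in content")
    if '"' in text:
        problems.append("quotation marks in content")

    # Quotation marks are not allowed, so a plain split on commas is enough.
    rows = [line.split(",") for line in text.split("\n") if line]
    if not rows:
        return problems + ["no data records"]

    width = len(rows[0])
    if any(len(row) != width for row in rows):
        return problems + ["rows have differing column counts"]

    # Assumption: records are counted as distinct rows.
    if len({tuple(row) for row in rows}) < 100:
        problems.append("fewer than 100 distinct data records")

    label = label_index % width
    valid_features = 0
    for col in range(width):
        values = [row[col].strip() for row in rows]
        present = [v for v in values if v]
        missing_ratio = 1 - len(present) / len(values)
        if any(DATE_RE.match(v) for v in present):
            problems.append(f"column {col}: timestamp-like values")
        if col == label:
            if missing_ratio > 0:
                problems.append("label column has missing values")
            if len(set(present)) < 2:
                problems.append("label column has fewer than two values")
        elif len(set(present)) >= 2 and missing_ratio < 0.10:
            valid_features += 1

    if valid_features < 2:
        problems.append("fewer than two valid feature columns")
    return problems


# Usage: 120 records, two feature columns, label in the last column.
sample = "\n".join(f"{i},{i % 7},{i % 2}" for i in range(120))
print(check_training_csv(sample))  # prints []
```

An empty list means that none of the checked requirements is violated. It does not guarantee that training in ExeML will succeed.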
