Updated on 2023-12-14 GMT+08:00

Querying the Details About a Training Job Version

Function

This API is used to obtain the details about a specified version of a training job based on the job ID and version ID.

URI

GET /v1/{project_id}/training-jobs/{job_id}/versions/{version_id}

Table 1 describes the required parameters.
Table 1 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
| job_id | Yes | Long | ID of a training job |
| version_id | Yes | Long | Version ID of a training job |

Request Body

None

Response Body

Table 2 describes the response parameters.
Table 2 Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| is_success | Boolean | Whether the request is successful |
| job_id | Long | ID of a training job |
| job_name | String | Name of a training job |
| job_desc | String | Description of a training job |
| version_id | Long | Version ID of a training job |
| version_name | String | Version name of a training job |
| pre_version_id | Long | ID of the previous version of a training job |
| engine_type | Integer | Engine type of a training job. The mapping between engine_type and engine_name is as follows: 1: TensorFlow; 2: MXNet; 3: Ray; 4: Caffe; 5: Spark_MLlib; 9: XGBoost-Sklearn; 10: PyTorch; 12: Horovod |
| engine_name | String | Name of the engine selected for a training job. Currently, the following engines are supported: Ascend-Powered-Engine, Caffe, Horovod, MXNet, PyTorch, Ray, Spark_MLlib, TensorFlow, XGBoost-Sklearn, MindSpore-GPU |
| engine_id | Long | ID of the engine selected for a training job |
| engine_version | String | Version of the engine selected for a training job |
| status | Integer | Status of a training job. For details about the job statuses, see Job Statuses. |
| app_url | String | Code directory of a training job |
| boot_file_url | String | Boot file of a training job |
| create_time | Long | Time when a training job is created |
| parameter | Array<Object> | Running parameters of a training job. This parameter is a container environment variable when a training job uses a custom image. For details, see Table 3. |
| duration | Long | Training job running duration, in milliseconds |
| spec_id | Long | ID of the resource specifications selected for a training job |
| core | String | Number of cores of the resource specifications |
| cpu | String | CPU memory of the resource specifications |
| gpu_num | Integer | Number of GPUs of the resource specifications |
| gpu_type | String | GPU type of the resource specifications |
| worker_server_num | Integer | Number of workers in a training job |
| data_url | String | OBS path of the dataset used by a training job |
| train_url | String | OBS path of the training job output file |
| log_url | String | OBS URL of the logs of a training job. By default, this parameter is left blank. Example value: /usr/train/ |
| dataset_version_id | String | Dataset version ID of a training job |
| dataset_id | String | Dataset ID of a training job |
| data_source | Array<Object> | Dataset of a training job. For details, see Table 4. |
| model_id | Long | Model ID of a training job |
| model_metric_list | String | Model metrics of a training job, returned as a JSON-formatted string. For details, see Table 5. |
| system_metric_list | Object | System monitoring metrics of a training job. For details, see Table 6. |
| user_image_url | String | SWR URL of a custom image used by a training job |
| user_command | String | Boot command used to start the container of a custom image of a training job |
| resource_id | String | Charged resource ID of a training job |
| dataset_name | String | Name of the dataset of a training job |
| spec_code | String | Resource specifications selected for a training job |
| start_time | Long | Training start time |
| volumes | Array<Object> | Storage volume that can be used by a training job. For details, see Table 11. |
| dataset_version_name | String | Name of the dataset version of a training job |
| pool_name | String | Name of a resource pool |
| pool_id | String | ID of a resource pool |
| nas_mount_path | String | Local mount path of SFS Turbo (NAS). Example value: /home/work/nas |
| nas_share_addr | String | Shared path of SFS Turbo (NAS). Example value: 192.168.8.150:/ |
| nas_type | String | NAS type. Only NFS is supported. Example value: nfs |
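The following is a minimal sketch of how a client might interpret a few of the fields in Table 2. The engine mapping is copied from the engine_type description. The helper names are hypothetical, and treating create_time as a Unix timestamp in milliseconds is an assumption based on the sample value, not something this page states.

```python
from datetime import datetime, timedelta, timezone

# Mapping copied from the engine_type description in Table 2.
ENGINE_NAMES = {
    1: "TensorFlow",
    2: "MXNet",
    3: "Ray",
    4: "Caffe",
    5: "Spark_MLlib",
    9: "XGBoost-Sklearn",
    10: "PyTorch",
    12: "Horovod",
}


def engine_name_for(engine_type):
    """Return the engine name for an engine_type code, or None if unlisted."""
    return ENGINE_NAMES.get(engine_type)


def created_at(create_time_ms):
    """Convert create_time to a UTC datetime.

    Assumption: the value is a Unix timestamp in milliseconds.
    """
    return datetime.fromtimestamp(create_time_ms / 1000, tz=timezone.utc)


def running_time(duration_ms):
    """Convert duration (milliseconds, per Table 2) to a timedelta."""
    return timedelta(milliseconds=duration_ms)
```

Engine names that have no engine_type code in the mapping, such as Ascend-Powered-Engine, can only be read from engine_name directly.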

Table 3 parameter parameters

| Parameter | Type | Description |
| --- | --- | --- |
| label | String | Parameter name |
| value | String | Parameter value |

Table 4 data_source parameters

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | String | Dataset ID of a training job |
| dataset_version | String | Dataset version ID of a training job |
| type | String | Dataset type. obs: Data from OBS is used. dataset: Data from a specified dataset is used. |
| data_url | String | OBS bucket path |
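As an illustration of the type field, the sketch below summarizes one element of the data_source array. The function name is hypothetical. It assumes that an obs entry carries data_url and that a dataset entry carries dataset_id and dataset_version; Table 4 lists these fields but does not state which of them are populated for each type.

```python
def describe_data_source(entry):
    """Summarize one element of the data_source array (Table 4)."""
    source_type = entry.get("type")
    if source_type == "obs":
        # Assumption: OBS entries identify their data by data_url.
        return "OBS path: " + entry.get("data_url", "")
    if source_type == "dataset":
        # Assumption: dataset entries identify their data by ID and version.
        return "dataset {} (version {})".format(
            entry.get("dataset_id"), entry.get("dataset_version"))
    raise ValueError("unexpected data_source type: {!r}".format(source_type))
```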

Table 5 model_metric_list parameters

| Parameter | Type | Description |
| --- | --- | --- |
| metric | JSON Array | Validation metrics of a classification of a training job. For details, see Table 7. |
| total_metric | JSON | Overall validation parameters of a training job. For details, see Table 9. |

Table 6 system_metric_list parameters

| Parameter | Type | Description |
| --- | --- | --- |
| cpuUsage | Array | CPU usage of a training job |
| memUsage | Array | Memory usage of a training job |
| gpuUtil | Array | GPU usage of a training job |
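In the sample response, each array in system_metric_list holds numeric samples encoded as strings. The sketch below, with a hypothetical function name, converts them to floats and reports the peak of each metric. It assumes every element parses as a number; the unit of the values is not specified on this page.

```python
def peak_usage(system_metric_list):
    """Return the maximum sample of each metric as a float, or None if empty."""
    peaks = {}
    for name in ("cpuUsage", "memUsage", "gpuUtil"):
        # Samples arrive as strings such as "3.10" in the sample response.
        samples = [float(v) for v in system_metric_list.get(name, [])]
        peaks[name] = max(samples) if samples else None
    return peaks
```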

Table 7 metric parameters

| Parameter | Type | Description |
| --- | --- | --- |
| metric_values | JSON | Validation metrics of a classification of a training job. For details, see Table 8. |
| reserved_data | JSON | Reserved parameter |
| metric_meta | JSON | Classification of a training job, including the classification ID and name |

Table 8 metric_values parameters

| Parameter | Type | Description |
| --- | --- | --- |
| recall | Float | Recall of a classification of a training job |
| precision | Float | Precision of a classification of a training job |
| accuracy | Float | Accuracy of a classification of a training job |

Table 9 total_metric parameters

| Parameter | Type | Description |
| --- | --- | --- |
| total_metric_meta | JSON | Reserved parameter |
| total_reserved_data | JSON | Reserved parameter |
| total_metric_values | JSON | Overall validation metrics of a training job. For details, see Table 10. |

Table 10 total_metric_values parameters

| Parameter | Type | Description |
| --- | --- | --- |
| f1_score | Float | F1 score of a training job |
| recall | Float | Total recall of a training job |
| precision | Float | Total precision of a training job |
| accuracy | Float | Total accuracy of a training job |
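Because model_metric_list is returned as a string that itself contains JSON, a client needs a second decoding step before it can read the structures in Tables 5 to 10. The following sketch shows one way to do this. The function name is hypothetical, and returning empty results for a blank string is a defensive assumption rather than documented behavior.

```python
import json


def parse_model_metrics(model_metric_list):
    """Decode model_metric_list and return (per_class, overall).

    per_class: one dict per classification, holding its metric_values
               (Table 8) plus its metric_meta under the key "meta".
    overall:   the total_metric_values object (Table 10).
    """
    if not model_metric_list:
        # Assumption: jobs without metrics may return an empty string.
        return [], {}
    decoded = json.loads(model_metric_list)
    per_class = []
    for item in decoded.get("metric", []):
        row = dict(item.get("metric_values", {}))
        row["meta"] = item.get("metric_meta", {})
        per_class.append(row)
    overall = decoded.get("total_metric", {}).get("total_metric_values", {})
    return per_class, overall
```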

Table 11 volumes parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| nfs | No | Object | Storage volume of the shared file system type. Only training jobs running in a resource pool with a shared file system network connected support such storage volumes. For details, see Table 12. |
| host_path | No | Object | Storage volume of the host file system type. Only training jobs running in a dedicated resource pool support such storage volumes. For details, see Table 13. |

Table 12 nfs parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| id | Yes | String | ID of an SFS Turbo file system |
| src_path | Yes | String | Path to an SFS Turbo file system |
| dest_path | Yes | String | Local path to a training job |
| read_only | No | Boolean | Whether dest_path is read-only. true: read-only permission. false: read/write permission (default). |


Table 13 host_path parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| src_path | Yes | String | Local path to a host |
| dest_path | Yes | String | Local path to a training job |
| read_only | No | Boolean | Whether dest_path is read-only. true: read-only permission. false: read/write permission (default). |
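Each element of the volumes array holds either an nfs object or a host_path object. The sketch below, with a hypothetical function name, flattens the array into a list of mounts. It falls back to false when read_only is absent, which matches the documented default.

```python
def list_mounts(volumes):
    """Return (kind, src_path, dest_path, read_only) tuples for a volumes array."""
    mounts = []
    for volume in volumes:
        for kind in ("nfs", "host_path"):
            spec = volume.get(kind)
            if spec is not None:
                mounts.append((
                    kind,
                    spec["src_path"],
                    spec["dest_path"],
                    # read_only is optional and defaults to false.
                    spec.get("read_only", False),
                ))
    return mounts
```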

Sample Request

The following shows how to obtain the details about the job whose job_id is 10 and version_id is 10.

GET https://endpoint/v1/{project_id}/training-jobs/10/versions/10
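The request above could be assembled in Python roughly as follows. This is a sketch only: the function names are hypothetical, and passing a token in the X-Auth-Token header is an assumption about the authentication scheme, which this page does not describe. The endpoint and project ID must be supplied by the caller.

```python
import urllib.request


def build_version_url(endpoint, project_id, job_id, version_id):
    """Fill in GET /v1/{project_id}/training-jobs/{job_id}/versions/{version_id}."""
    # job_id and version_id are of type Long, so coerce them to integers.
    return "https://{}/v1/{}/training-jobs/{}/versions/{}".format(
        endpoint, project_id, int(job_id), int(version_id))


def build_request(endpoint, project_id, job_id, version_id, token):
    """Build a GET request object without sending it."""
    url = build_version_url(endpoint, project_id, job_id, version_id)
    # Assumption: token-based authentication through the X-Auth-Token header.
    return urllib.request.Request(
        url, headers={"X-Auth-Token": token}, method="GET")
```

The returned object can be passed to urllib.request.urlopen to send the request. There is no request body.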

Sample Response

  • Successful response
    {
        "is_success": true,
        "job_id": 10,
        "job_name": "TestModelArtsJob",
        "job_desc": "TestModelArtsJob desc",
        "version_id": 10,
        "version_name": "jobVersion",
        "pre_version_id": 5,
        "engine_type": 1,
        "engine_name": "TensorFlow",
        "engine_id": 1,
        "engine_version": "TF-1.4.0-python2.7",
        "status": 10,
        "app_url": "/usr/app/",
        "boot_file_url": "/usr/app/boot.py",
        "create_time": 1524189990635,
        "parameter": [
            {
                "label": "learning_rate",
                "value": 0.01
            }
        ],
        "duration": 532003,
        "spec_id": 1,
        "core": 2,
        "cpu": 8,
        "gpu_num": 2,
        "gpu_type": "P100",
        "worker_server_num": 1,
        "data_url": "/usr/data/",
        "train_url": "/usr/train/",
        "log_url": "/usr/log/",
        "dataset_version_id": "2ff0d6ba-c480-45ae-be41-09a8369bfc90",
        "dataset_id": "38277e62-9e59-48f4-8d89-c8cf41622c24",
        "data_source": [
            {
                "type": "obs",
                "data_url": "/qianjiajun-test/minst/data/"
            }
        ],
        "user_image_url": "100.125.5.235:20202/jobmng/custom-cpu-base:1.0",
        "user_command": "bash -x /home/work/run_train.sh python /home/work/user-job-dir/app/mnist/mnist_softmax.py --data_url /home/work/user-job-dir/app/mnist_data",
        "model_id": 1,
        "model_metric_list": "{\"metric\":[{\"metric_values\":{\"recall\":0.005833,\"precision\":0.000178,\"accuracy\":0.000937},\"reserved_data\":{},\"metric_meta\":{\"class_name\":0,\"class_id\":0}}],\"total_metric\":{\"total_metric_meta\":{},\"total_reserved_data\":{},\"total_metric_values\":{\"recall\":0.005833,\"id\":0,\"precision\":0.000178,\"accuracy\":0.000937}}}",
        "system_metric_list": {
            "cpuUsage": [
                "0",
                "3.10",
                "5.76",
                "0",
                "0",
                "0",
                "0"
            ],
            "memUsage": [
                "0",
                "0.77",
                "2.09",
                "0",
                "0",
                "0",
                "0"
            ],
            "gpuUtil": [
                "0",
                "0.25",
                "0.88",
                "0",
                "0",
                "0",
                "0"
            ]
        },
        "dataset_name": "dataset-test",
        "dataset_version_name": "dataset-version-test",
        "spec_code": "modelarts.vm.gpu.p100",
        "start_time": 1563172362000,
        "volumes": [
            {
                "nfs": {
                    "id": "43b37236-9afa-4855-8174-32254b9562e7",
                    "src_path": "192.168.8.150:/",
                    "dest_path": "/home/work/nas",
                    "read_only": false
                }
            },
            {
                "host_path": {
                    "src_path": "/root/work",
                    "dest_path": "/home/mind",
                    "read_only": false
                }
            }
        ],
        "pool_id": "pool9928813f",
        "pool_name": "p100",
        "nas_mount_path": "/home/work/nas",
        "nas_share_addr": "192.168.8.150:/",
        "nas_type": "nfs"
    }
  • Failed response
    {
        "is_success": false,
        "error_message": "Error string",
        "error_code": "ModelArts.0105"
    }
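A client can tell the two response shapes apart by checking is_success. The following sketch uses hypothetical names (TrainingJobError, parse_job_version) and treats a missing is_success field as a failure, which is an assumption.

```python
import json


class TrainingJobError(Exception):
    """Raised when the response body reports is_success = false."""

    def __init__(self, error_code, error_message):
        super().__init__("{}: {}".format(error_code, error_message))
        self.error_code = error_code
        self.error_message = error_message


def parse_job_version(body):
    """Decode a response body and return the job version details."""
    payload = json.loads(body)
    # Assumption: a body without is_success is treated as a failure.
    if not payload.get("is_success", False):
        raise TrainingJobError(
            payload.get("error_code"), payload.get("error_message"))
    return payload
```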

Status Code

For details about the status code, see Status Code.