Create Training Pipelines: An Introduction
Training pipelines let you perform custom machine learning (ML) training and automatically create a 'Model' resource based on your training output.
Before you create a pipeline
Before you create a training pipeline on Vertex AI, you need to create a 'Python training application' or a 'custom container' to define the training code and dependencies you want to run on Vertex AI. If you create a Python training application using TensorFlow, scikit-learn, or XGBoost, you can use our pre-built containers to run your code. If you're not sure which of these options to choose, refer to the 'training code requirements' to learn more.
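If you take the Python training application route, the training code itself is ordinary Python that writes its model artifacts to the Cloud Storage location Vertex AI provides. The following is a minimal sketch of such a script, assuming scikit-learn and the AIP_MODEL_DIR environment variable that Vertex AI sets for custom training; it is illustrative only, not a complete application.

```python
# task.py -- a minimal Python training application (illustrative sketch).
import os

import joblib
from google.cloud import storage
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a simple model; a real application would load its own data.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10).fit(X, y)

# Vertex AI sets AIP_MODEL_DIR to a Cloud Storage URI (gs://...) where the
# training pipeline expects model artifacts to be written.
model_dir = os.environ.get("AIP_MODEL_DIR", "")
joblib.dump(model, "model.joblib")

if model_dir.startswith("gs://"):
    # Copy the artifact to the location Vertex AI provided.
    bucket_name, _, prefix = model_dir[len("gs://"):].partition("/")
    blob_name = f"{prefix.rstrip('/')}/model.joblib"
    storage.Client().bucket(bucket_name).blob(blob_name).upload_from_filename(
        "model.joblib"
    )
```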
Training Pipeline Options:
A training pipeline encapsulates training jobs with additional steps. This guide explains two different training pipelines:
1. Launch a CustomJob and upload the resulting model to Vertex AI (see the sketch after this list).
2. Launch a hyperparameter tuning job and upload the resulting model to Vertex AI.
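As a hedged sketch of the first option using the Vertex AI SDK for Python: because a serving container and a model display name are supplied, the pipeline runs the training code as a CustomJob and then uploads the output as a Model resource. The project, bucket, and container URIs below are placeholders, and the pre-built image tags should be checked against the current list.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",               # placeholder project ID
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# A training pipeline that runs task.py on a pre-built scikit-learn
# training container (pipeline option 1).
job = aiplatform.CustomTrainingJob(
    display_name="my-training-pipeline",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
    # Serving container used when the trained model is uploaded.
    model_serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-23:latest"
    ),
)

# Supplying model_display_name makes the pipeline upload the training
# output (written to AIP_MODEL_DIR by the script) as a Model resource.
model = job.run(
    model_display_name="my-model",
    replica_count=1,
    machine_type="n1-standard-4",
)
```

The second option is analogous, with a hyperparameter tuning job taking the place of the CustomJob.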
What a CustomJob includes:
When you create a custom job, you specify settings that Vertex AI needs to run your training code, including:
1. One worker pool for single-node training (WorkerPoolSpec), or multiple worker pools for distributed training.
2. Optional settings for configuring job scheduling (Scheduling), setting certain environment variables for your training code, using a custom service account, and using VPC Network Peering (see the sketch after this list).
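In the Vertex AI SDK for Python, those settings map roughly onto the CustomJob class and its run() call. The sketch below assumes a custom container; the image URI, service account, and network name are placeholders, and the values shown (a single worker pool, a one-hour timeout) are only examples.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# One worker pool for single-node training; the fields a worker pool
# accepts are described in the next list.
worker_pool_specs = [{
    "machine_spec": {"machine_type": "n1-standard-4"},
    "replica_count": 1,
    "container_spec": {
        "image_uri": "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest",
        # Environment variables made available to the training code.
        "env": [{"name": "DATA_FORMAT", "value": "csv"}],
    },
}]

job = aiplatform.CustomJob(
    display_name="my-custom-job",
    worker_pool_specs=worker_pool_specs,
)

# Optional job-level settings: scheduling, a custom service account, and a
# VPC network that has been peered with Google services.
job.run(
    timeout=3600,                       # Scheduling: maximum run time in seconds
    restart_job_on_worker_restart=False,
    service_account="trainer@my-project.iam.gserviceaccount.com",
    network="projects/123456789/global/networks/my-vpc",
)
```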
Within the worker pool(s), you can specify the following settings:
1. Machine types and accelerators.
2. Configuration of what type of training code the worker pool runs: either a Python training application (PythonPackageSpec) or a custom container (ContainerSpec), as sketched below.
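In the dictionary form that both the SDK and the REST API accept, the two kinds of worker pool look roughly like the following sketch. The image URIs, package path, and module name are placeholders, and the exact pre-built executor image tag should be checked against the current list.

```python
# Worker pool running a Python training application (PythonPackageSpec)
# on a machine with a single T4 GPU.
python_package_pool = {
    "machine_spec": {
        "machine_type": "n1-standard-8",
        "accelerator_type": "NVIDIA_TESLA_T4",
        "accelerator_count": 1,
    },
    "replica_count": 1,
    "python_package_spec": {
        "executor_image_uri": "us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-8:latest",
        "package_uris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        "python_module": "trainer.task",
        "args": ["--epochs", "10"],
    },
}

# Alternatively, a worker pool running a custom container (ContainerSpec).
custom_container_pool = {
    "machine_spec": {"machine_type": "n1-standard-8"},
    "replica_count": 1,
    "container_spec": {
        "image_uri": "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest",
        "args": ["--epochs", "10"],
    },
}
```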
If you want to create a standalone custom job outside of a Vertex AI training pipeline, refer to the guide on custom jobs.
Configure your pipeline to use a managed dataset
Within your training pipeline, you can configure your custom training job or hyperparameter tuning job to use a managed dataset. Managed datasets let you manage your datasets with your training applications and models.
To use a managed dataset in your training pipeline:
1. Create your dataset.
2. Update your training application to use a managed dataset.
3. Specify a managed dataset when you create your training pipeline, as in the sketch below.
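A hedged end-to-end sketch of those steps with the Vertex AI SDK for Python, using a tabular dataset created from a CSV file (all names and URIs are placeholders): the dataset is created first and then passed to the training pipeline's run() call, which exposes the data splits to the training code through environment variables such as AIP_TRAINING_DATA_URI.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# Step 1: create a managed (tabular) dataset from data in Cloud Storage.
dataset = aiplatform.TabularDataset.create(
    display_name="my-dataset",
    gcs_source=["gs://my-bucket/data/train.csv"],
)

job = aiplatform.CustomTrainingJob(
    display_name="my-training-pipeline",
    script_path="task.py",  # step 2: update this script to read the AIP_*_DATA_URI variables
    container_uri="us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
)

# Step 3: pass the managed dataset when the training pipeline is created.
job.run(dataset=dataset)
```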
Configure distributed training
Within your training pipeline, you can configure your custom training job or hyperparameter tuning job for distributed training by specifying multiple worker pools.
All the examples on this page show single-replica training jobs with one worker pool. To modify them for distributed training:
1. Use your first worker pool to configure your primary replica, and set the replica count to 1.
2. Add more worker pools to configure worker replicas, parameter server replicas, or evaluator replicas, if your machine learning framework supports these additional cluster tasks for distributed training (see the sketch after this list).
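For example, a hedged sketch of a worker pool list for a parameter-server style setup might look like the following; the machine types, replica counts, and image URI are illustrative only. The position of each pool in the list determines its role: the first pool is the primary replica, the second holds worker replicas, the third holds parameter servers, and a fourth, if present, holds evaluators.

```python
# Hypothetical training image; the structure of the list is what matters.
IMAGE_URI = "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest"

worker_pool_specs = [
    # Worker pool 0: the primary replica (always replica_count 1).
    {
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": IMAGE_URI},
    },
    # Worker pool 1: additional worker replicas.
    {
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 3,
        "container_spec": {"image_uri": IMAGE_URI},
    },
    # Worker pool 2: parameter server replicas, if the framework uses them.
    {
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 2,
        "container_spec": {"image_uri": IMAGE_URI},
    },
]
```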