Merged
1 change: 0 additions & 1 deletion .gitignore
@@ -130,7 +130,6 @@ dmypy.json

src/autogluon/cloud/version.py
VERSION.minor
src/autogluon/cloud/templates/*.yaml
.idea
.vscode
.DS_Store
7 changes: 7 additions & 0 deletions docs/conf.py
@@ -16,11 +16,18 @@
"sphinx_togglebutton", # sphinx-togglebutton.readthedocs.io
"sphinx.ext.autodoc", # www.sphinx-doc.org/en/master/usage/extensions/autodoc.html
"sphinx.ext.autosummary", # www.sphinx-doc.org/en/master/usage/extensions/autosummary.html
"sphinx.ext.extlinks", # www.sphinx-doc.org/en/master/usage/extensions/extlinks.html
"sphinx.ext.napoleon", # www.sphinx-doc.org/en/master/usage/extensions/napoleon.html
"sphinx.ext.viewcode", # www.sphinx-doc.org/en/master/usage/extensions/viewcode.html
"sphinxcontrib.googleanalytics", # github.com/sphinx-contrib/googleanalytics
]

# Pin links to repo files at the current release tag so released docs don't dangle on master.
# Usage in markdown: {repo-file}`src/autogluon/cloud/templates/ag_cloud_sagemaker.yaml`
extlinks = {
"repo-file": (f"https://github.com/autogluon/autogluon-cloud/blob/v{release}/%s", "%s"),
}

# See https://myst-parser.readthedocs.io/en/latest/syntax/optional.html
myst_enable_extensions = ["colon_fence", "deflist", "dollarmath", "html_image", "substitution"]

179 changes: 7 additions & 172 deletions docs/tutorials/autogluon-cloud.md
@@ -1,185 +1,20 @@
# Train and Deploy AutoGluon Models on Amazon SageMaker with AutoGluon-Cloud
# Train and Deploy AutoGluon Models with AutoGluon-Cloud

To help with AutoGluon models training, AWS developed a set of training and inference [deep learning containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#autogluon-training-containers).
The containers can be used to train models with CPU and GPU instances and deployed as a SageMaker endpoint or used as a batch transform job.
AutoGluon-Cloud lets you train, deploy, and run inference with AutoGluon models on AWS using the same APIs you'd use locally. Under the hood, it runs your jobs on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) using AWS's official [AutoGluon deep learning containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#autogluon-training-containers) — so you don't manage any infrastructure yourself.

We offer the [autogluon.cloud](https://github.com/autogluon/autogluon-cloud) module to utilize those containers and [Amazon SageMaker](https://aws.amazon.com/sagemaker/) underneath to train/deploy AutoGluon backed models with simple APIs.
It supports the `tabular`, `timeseries`, and `multimodal` predictors. The examples below use `TabularCloudPredictor`; the others share the same API.

```{attention}
Costs for running cloud compute are managed by Amazon SageMaker, and storage costs are managed by AWS S3. AutoGluon-Cloud is a wrapper to these services at no additional charge. While AutoGluon-Cloud makes an effort to simplify the usage of these services, it is ultimately the user's responsibility to monitor compute usage within their account to avoid unexpected charges.
```


## Installation
`autogluon.cloud` does not come with the default `autogluon` installation. You can install it via:

```bash
pip install autogluon.cloud
```{note}
This tutorial assumes you've already set up AutoGluon-Cloud on AWS. If you haven't, see [Setup](setup.md) first.
```

Also ensure that the latest version of sagemaker python API is installed via:

```bash
pip install -U sagemaker
```

This is required to ensure the information about newly released containers is available.

## Prepare an IAM Role with Necessary Permissions
Currently, AutoGluon-Cloud can use two cloud backends: **Amazon SageMaker** and **Ray (AWS)**.
Here is an overview of the features supported by each backend.

| Feature | SageMaker | Ray (AWS) |
|--------------------------------|---------------|--------------|
| **Supported modalities** | `tabular`, `timeseries`, `multimodal` | `tabular` |
| **Training (single instance)** | ✅ | ✅ |
| **Training (distributed)** | ❌ | ✅ |
| **Inference endpoints** | ✅ | ❌ |
| **Batch inference** | ✅ | ❌ |

AutoGluon-Cloud needs to interact with various AWS resources. For this purpose, we recommend to set up a dedicated IAM role with the necessary permissions. This can be done using one of the following options.

::::{tab-set}
:::{tab-item} CloudFormation (AWS CLI)
:sync: setup-cli
1. Download and review the CloudFormation template from the [AutoGluon-Cloud repository](https://github.com/autogluon/autogluon-cloud/tree/master/cloudformation)
```bash
BACKEND="sagemaker" # Supported options "sagemaker", "ray_aws"
wget https://raw.githubusercontent.com/autogluon/autogluon-cloud/refs/heads/master/cloudformation/ag_cloud_$BACKEND.yaml
```
```{note}
Make sure you review the IAM policy defined in the CloudFormation template, and make necessary changes according to your use case before applying it.
```

2. Deploy the CloudFormation stack
```bash
# Use your preferred stack name; CAPABILITY_NAMED_IAM grants permission to create IAM roles.
aws cloudformation create-stack \
  --stack-name ag-cloud \
  --template-body file://ag_cloud_$BACKEND.yaml \
  --capabilities CAPABILITY_NAMED_IAM
```

3. Review the outputs produced by the stack
```bash
aws cloudformation describe-stacks --stack-name ag-cloud --query "Stacks[0].Outputs"
```
The output should contain the **name of the S3 bucket** and the **ARN of the IAM role** created for AutoGluon-Cloud.
```json
[
{
"OutputKey": "BucketName",
"OutputValue": "ag-cloud-bucket-abcd1234",
"Description": "S3 bucket where AutoGluon-Cloud will save trained predictors"
},
{
"OutputKey": "RoleARN",
"OutputValue": "arn:aws:iam::222222222222:role/ag-cloud-execution-role",
"Description": "ARN of the created IAM role for AutoGluon-Cloud to run on SageMaker"
}
]
```

:::
:::{tab-item} CloudFormation (AWS Console)
:sync: setup-cfn
1. Download and review the CloudFormation template for the backend of your choice from the [AutoGluon-Cloud repository](https://github.com/autogluon/autogluon-cloud/tree/master/cloudformation)
- Template for [SageMaker](https://raw.githubusercontent.com/autogluon/autogluon-cloud/refs/heads/master/cloudformation/ag_cloud_sagemaker.yaml)
- Template for [Ray (AWS)](https://raw.githubusercontent.com/autogluon/autogluon-cloud/refs/heads/master/cloudformation/ag_cloud_ray_aws.yaml)

```{note}
Make sure you review the IAM policy defined in the CloudFormation template, and make necessary changes according to your use case before applying it.
```

2. Log in to the <a href="https://console.aws.amazon.com" target="_blank" rel="noopener noreferrer">AWS Console</a>. Make sure to select the region where you would like to use AutoGluon-Cloud.
4. Go to <a href="https://console.aws.amazon.com/cloudformation/home#/stacks/create" target="_blank" rel="noopener noreferrer">CloudFormation > Stacks > Create stack</a> and create a stack using the CloudFormation template downloaded in Step 1.

5. After the stack is created, go to the *Outputs* tab and view the **name of the S3 bucket** and the **ARN of the IAM role** created for AutoGluon-Cloud
![img](img/stack-outputs.png)

:::
:::{tab-item} Manual
:sync: setup-manual
1. Create an S3 bucket for AutoGluon-Cloud to store predictors. Replace `S3_BUCKET_NAME` with your preferred name for the bucket.
```bash
aws s3 mb s3://S3_BUCKET_NAME
```

2. Generate trust relationship and IAM policy with our utils via the following command
```python
from autogluon.cloud import TabularCloudPredictor # Can be other CloudPredictor as well

TabularCloudPredictor.generate_default_permission(
backend="BACKEND_YOU_WANT", # We currently support "sagemaker" and "ray_aws"
account_id="YOUR_ACCOUNT_ID", # The AWS account ID you plan to use for CloudPredictor.
cloud_output_bucket="S3_BUCKET_NAME" # S3 bucket name where intermediate artifacts will be uploaded and trained models should be saved. You need to create this bucket beforehand.
)
```
```{note}
Make sure you review the trust relationship and IAM policy files, and make necessary changes according to your use case before applying them.
```
In the following steps, make sure to replace `AUTOGLUON-ROLE-NAME` with your desired role name, `AUTOGLUON-POLICY-NAME` with your desired policy name, and `222222222222` with your AWS account number.

3. Create the IAM role.
```bash
aws iam create-role --role-name AUTOGLUON-ROLE-NAME --assume-role-policy-document file://ag_cloud_sagemaker_trust_relationship.json
```
This method will return the **role ARN** that looks similar to `arn:aws:iam::222222222222:role/AUTOGLUON-ROLE-NAME`. Keep it for further reference.

4. Create the IAM policy.
```bash
aws iam create-policy --policy-name AUTOGLUON-POLICY-NAME --policy-document file://ag_cloud_sagemaker_iam_policy.json
```
This method will return the **policy ARN** that looks similar to `arn:aws:iam::222222222222:policy/AUTOGLUON-POLICY-NAME`. Keep it for further reference.

5. Attach the IAM policy to the role.
```bash
aws iam attach-role-policy --role-name AUTOGLUON-ROLE-NAME --policy-arn "arn:aws:iam::222222222222:policy/AUTOGLUON-POLICY-NAME"
```
:::
::::

Make sure to remember:
- **ARN of the IAM role** created for AutoGluon-Cloud
- **Name of the S3 bucket**, where AutoGluon-Cloud will store the training artifacts

After completing the setup, assume the IAM role using AWS CLI or boto3.

::::{tab-set}
:::{tab-item} Python / boto3
:sync: assume-boto3
```python
import boto3

# Replace this with the ARN of your AutoGluon-Cloud IAM role
ROLE_ARN = "arn:aws:iam::222222222222:role/AUTOGLUON-ROLE-NAME"

session = boto3.Session()
credentials = session.client("sts").assume_role(
RoleArn=ROLE_ARN,
RoleSessionName="AutoGluonCloudSession"
)["Credentials"]

boto3.setup_default_session(
aws_access_key_id=credentials["AccessKeyId"],
aws_secret_access_key=credentials["SecretAccessKey"],
aws_session_token=credentials["SessionToken"],
)
```{attention}
SageMaker compute and S3 storage are billed to your AWS account. AutoGluon-Cloud is a free wrapper, but it's your responsibility to monitor usage to avoid unexpected charges.
```
Now when you use `autogluon.cloud` in the same Python script / Jupyter notebook, the correct IAM role will be used.
:::
:::{tab-item} AWS CLI
:sync: assume-cli
See section "Assume the IAM role" in this [tutorial](https://repost.aws/knowledge-center/iam-assume-role-cli).
:::
::::

For more details on setting up IAM roles and policies, refer to this [tutorial](https://aws.amazon.com/premiumsupport/knowledge-center/iam-assume-role-cli/).

## Training
Using `autogluon.cloud` to train AutoGluon backed models is simple and not too much different from training an AutoGluon predictor directly.

Currently, `autogluon.cloud` supports training/deploying `tabular`, `multimodal` and `timeseries` predictors. In the example below, we use `TabularCloudPredictor` for demonstration. You can substitute it with other `CloudPredictors` easily as they share the same APIs.

```python
from autogluon.cloud import TabularCloudPredictor
train_data = "train.csv" # can be a DataFrame as well
7 changes: 7 additions & 0 deletions docs/tutorials/index.md
@@ -3,6 +3,12 @@
::::{grid} 2
:gutter: 3

:::{grid-item-card} Setup
:link: setup.html

Set up AutoGluon-Cloud on AWS in one command with `bootstrap`, or register existing AWS resources you already have.
:::

:::{grid-item-card} AutoGluon-Cloud
:link: autogluon-cloud.html

@@ -21,6 +27,7 @@ maxdepth: 2
hidden: true
---

Setup <setup>
Essentials <autogluon-cloud>
Foundation Models <foundation_model>
Image Modality <image-modality>
150 changes: 150 additions & 0 deletions docs/tutorials/setup.md
@@ -0,0 +1,150 @@
# Set Up AutoGluon-Cloud on AWS

AutoGluon-Cloud trains and deploys models on Amazon SageMaker on your behalf. To do that, every `CloudPredictor` or `FoundationModel` you create needs two AWS resources:

- An **IAM role** that SageMaker assumes to run training and inference jobs.
- An **S3 bucket** to stage training data and store trained models.

You have two options for supplying them:

1. **Save them once** to `~/.autogluon/cloud.yaml`, and AutoGluon-Cloud will pick them up automatically on every call. This is the recommended path — set it up with [`bootstrap`](#bootstrap) or [`register`](#register) below.
2. **Pass them explicitly** to each `CloudPredictor` / `FoundationModel`, e.g. `CloudPredictor(role="arn:aws:iam::...", cloud_output_path="s3://my-bucket/...")`. Useful if you need different roles or buckets per call, or if you don't want a config file on disk.

The rest of this page covers option 1.
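
If you do pass the values explicitly (option 2), a quick format check can catch typos before a job is submitted. The sketch below is stdlib-only and illustrative: `looks_like_role_arn` and `split_s3_path` are hypothetical helper names, not part of `autogluon.cloud`, and the pattern only checks the general shape of an IAM role ARN, not whether the role exists.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Assumed general shape of an IAM role ARN:
# arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def looks_like_role_arn(value: str) -> bool:
    """Return True if the string has the shape of an IAM role ARN."""
    return ROLE_ARN_PATTERN.match(value) is not None


def split_s3_path(value: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix' into (bucket, prefix); raise on anything else."""
    parsed = urlparse(value)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/... path, got {value!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_s3_path("s3://my-bucket/models/run1")` yields the bucket name and key prefix as separate strings, which is handy when the same bucket is reused across several output paths.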

## Commands

AutoGluon-Cloud ships four commands for managing the saved configuration:

| Command | What it does | When to use it |
|---|---|---|
| [`bootstrap`](#bootstrap) | Provisions a role and bucket via CloudFormation, then saves them. | First-time setup with no existing AWS resources. |
| [`register`](#register) | Saves an existing role and bucket without provisioning anything. | Your platform team already gave you a role and bucket. |
| [`status`](#status) | Verifies the saved resources still exist and are accessible. | Sanity-check before training, or after IAM/S3 changes. |
| [`teardown`](#teardown) | Deletes resources created by `bootstrap` and the saved config. | Cleanup when you're done with AutoGluon-Cloud. |

Each command is available both as a CLI subcommand (`autogluon-cloud <command>`) and as a Python function (`from autogluon.cloud import <command>`). The sections below show both forms.

## Install

```bash
pip install -U autogluon.cloud
```

This installs the `autogluon-cloud` CLI alongside the Python API.


## `bootstrap`

Provisions an IAM role and S3 bucket via CloudFormation, then saves them to `~/.autogluon/cloud.yaml`. Use this if you don't already have AWS resources for AutoGluon-Cloud.

`bootstrap` uses the [standard boto3 credential resolution order](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials) to find your AWS credentials, so anything that works for the AWS CLI or boto3 will work here (`aws configure`, `AWS_*` environment variables, an active SSO session, or an instance profile). Run:

::::{tab-set}
:::{tab-item} CLI
:sync: setup-cli
```bash
autogluon-cloud bootstrap
```
:::
:::{tab-item} Python
:sync: setup-py
```python
from autogluon.cloud import bootstrap

bootstrap()
```
:::
::::
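
If `bootstrap` cannot find credentials, it can help to see which of the common local sources are present. The following is a rough stdlib-only sketch, not a reimplementation of boto3's resolution chain: `credential_hints` is a hypothetical helper, it only looks at a few well-known environment variables and the default `~/.aws` files, and it ignores SSO sessions, instance profiles, and any overridden file locations.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def credential_hints(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> List[str]:
    """List which common local AWS credential sources appear to be configured."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    hints = []
    # Static keys exported in the shell.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        hints.append("environment variables")
    # A named profile selected via AWS_PROFILE.
    if env.get("AWS_PROFILE"):
        hints.append(f"named profile '{env['AWS_PROFILE']}'")
    # Files in the default locations, typically written by `aws configure`.
    if (home / ".aws" / "credentials").is_file():
        hints.append("shared credentials file")
    if (home / ".aws" / "config").is_file():
        hints.append("shared config file")
    return hints


if __name__ == "__main__":
    print(credential_hints() or "no local credential sources found")
```

An empty result does not prove that credentials are missing (for example, on an instance with an instance profile), so treat it as a hint only.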

The CloudFormation stack is named `ag-cloud-sagemaker` by default. Subsequent `CloudPredictor` calls pick the saved values up automatically.

```{note}
Review the CloudFormation template before deploying: {repo-file}`src/autogluon/cloud/templates/ag_cloud_sagemaker.yaml`.
```


## `register`

Tells AutoGluon-Cloud to use an IAM role and S3 bucket you already have. Use this when your platform team has provisioned them for you and you want to skip CloudFormation.

::::{tab-set}
:::{tab-item} CLI
:sync: setup-cli
```bash
autogluon-cloud register \
--role arn:aws:iam::222222222222:role/MyAutoGluonRole \
--bucket my-autogluon-bucket \
--region us-east-1
```
:::
:::{tab-item} Python
:sync: setup-py
```python
from autogluon.cloud import register

register(
role="arn:aws:iam::222222222222:role/MyAutoGluonRole",
bucket="my-autogluon-bucket",
region="us-east-1",
)
```
:::
::::

`register` makes no AWS calls — it only persists the values to `~/.autogluon/cloud.yaml`. The IAM role must trust `sagemaker.amazonaws.com` and have permissions equivalent to AWS's [`AmazonSageMakerFullAccess`](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) managed policy plus read/write access to your bucket.
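
Because `register` does not verify anything, you may want to check the role's trust policy yourself. The sketch below shows what a minimal trust policy allowing `sagemaker.amazonaws.com` to assume the role typically looks like, plus a small checker for a policy document you already have. The helper names are hypothetical and not part of `autogluon.cloud`; your organization's policy may include additional statements or conditions.

```python
import json


def sagemaker_trust_policy() -> dict:
    """Minimal trust policy letting the SageMaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value) -> list:
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trusts_sagemaker(policy: dict) -> bool:
    """Return True if any Allow statement lets SageMaker call sts:AssumeRole."""
    for statement in _as_list(policy.get("Statement")):
        principal = statement.get("Principal")
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if (
            statement.get("Effect") == "Allow"
            and "sagemaker.amazonaws.com" in services
            and "sts:AssumeRole" in _as_list(statement.get("Action"))
        ):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(sagemaker_trust_policy(), indent=2))
```

The checker only inspects the trust relationship; it says nothing about whether the role's permission policies grant the SageMaker and S3 access described above.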


## `status`

Verifies that the saved IAM role, S3 bucket, and (if applicable) CloudFormation stack still exist and are accessible.

::::{tab-set}
:::{tab-item} CLI
:sync: setup-cli
```bash
autogluon-cloud status
```
:::
:::{tab-item} Python
:sync: setup-py
```python
from autogluon.cloud import status

reports = status()
```
:::
::::

`ok` means the resource exists; `ok (unverified ...)` means the caller lacks the IAM permission to verify (the resource is probably fine, but `status` couldn't confirm).


## `teardown`

Deletes the CloudFormation stacks created by `bootstrap` and removes `~/.autogluon/cloud.yaml`. Backends added via `register` only have their config entry removed — your existing role and bucket are left untouched.

::::{tab-set}
:::{tab-item} CLI
:sync: setup-cli
```bash
autogluon-cloud teardown
```
:::
:::{tab-item} Python
:sync: setup-py
```python
from autogluon.cloud import teardown

teardown()
```
:::
::::

```{warning}
CloudFormation refuses to delete non-empty S3 buckets. If your bucket holds training artifacts you want to discard, empty it first with `aws s3 rm s3://<bucket> --recursive`.
```


## Where the config lives

`bootstrap` and `register` both write to `~/.autogluon/cloud.yaml`. The file is keyed by backend, so you can have separate entries for different backends side by side. Override the directory with the `AG_CONFIG_DIR` environment variable.
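
To locate the file from your own scripts, the lookup can be sketched as follows. This is an illustration under two assumptions: that the file keeps the name `cloud.yaml` when `AG_CONFIG_DIR` is set, and that an unset or empty variable falls back to `~/.autogluon`. The function name `config_path` is hypothetical and not part of `autogluon.cloud`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the assumed location of the saved AutoGluon-Cloud config file."""
    env = os.environ if env is None else env
    override = env.get("AG_CONFIG_DIR")
    # Assumption: the override replaces the directory only, not the file name.
    base = Path(override).expanduser() if override else Path.home() / ".autogluon"
    return base / "cloud.yaml"


if __name__ == "__main__":
    path = config_path()
    print(f"{path} (exists: {path.is_file()})")
```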