Add support for serverless inference endpoints by shchur · Pull Request #242 · autogluon/autogluon-cloud

shchur · 2026-06-02T14:07:06Z

Issue #, if available:

Description of changes:

Add inference_mode ("realtime" | "serverless") and inference_config kwargs to SagemakerBackend.deploy, CloudPredictor.deploy, and TimeSeriesFoundationModel.deploy. Serverless mode uses ServerlessInferenceConfig with the presets memory_size_in_mb=4096 and max_concurrency=5, which any user-supplied inference_config values override. It also sets LOG_LOCATION and METRICS_LOCATION to /tmp, because TorchServe fails when the root filesystem (/) is read-only.
Add JobPredictionFuture (endpoint/prediction_future.py), a handle with status() and result() methods that predict(wait=False) now returns instead of None.
Bump version 0.4.2 → 0.5.0.
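
For illustration, the preset-plus-override behaviour described above could look roughly like the sketch below. It is not the code from this PR: `resolve_deploy_settings` and `SERVERLESS_PRESET` are hypothetical names, the returned dict layout is made up for the example, and passing the user config through unchanged in realtime mode is an assumption. Only the preset values, the two mode names, and the two environment variables come from the description.

```python
from typing import Any, Dict, Optional

# Preset values quoted in the PR description.
SERVERLESS_PRESET: Dict[str, Any] = {"memory_size_in_mb": 4096, "max_concurrency": 5}

# Environment overrides quoted in the PR description (serverless only).
SERVERLESS_ENV: Dict[str, str] = {"LOG_LOCATION": "/tmp", "METRICS_LOCATION": "/tmp"}


def resolve_deploy_settings(
    inference_mode: str = "realtime",
    inference_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: combine presets with user overrides for a mode."""
    if inference_mode not in ("realtime", "serverless"):
        raise ValueError(f"Unsupported inference_mode: {inference_mode!r}")

    user_config = dict(inference_config or {})

    if inference_mode == "realtime":
        # Assumption: realtime mode has no presets and no extra env vars.
        return {"config": user_config, "env": {}}

    # User-supplied keys win over the presets.
    config = {**SERVERLESS_PRESET, **user_config}
    return {"config": config, "env": dict(SERVERLESS_ENV)}


if __name__ == "__main__":
    settings = resolve_deploy_settings("serverless", {"max_concurrency": 10})
    print(settings["config"])
    print(settings["env"])
```

In this sketch the merged `config` dict holds keyword arguments of the kind that `sagemaker.serverless.ServerlessInferenceConfig` accepts (`memory_size_in_mb`, `max_concurrency`); how the PR actually constructs that object is not shown here.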

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2026-06-02T14:41:51Z

github-actions · 2026-06-02T15:48:38Z

shchur added 3 commits June 2, 2026 14:02

Add serverless inference mode

1cec676

Simplify instance validation logic

6c72127

Fix default instance_type

2ee3288

Fix failing test

a511459

shchur merged commit 4a6949d into autogluon:master Jun 2, 2026
12 checks passed
