Add support for serverless inference endpoints #242

Merged
shchur merged 4 commits into autogluon:master from shchur:add-serverless-inference
Jun 2, 2026

shchur (Collaborator) commented Jun 2, 2026

Issue #, if available:

Description of changes:

  • Add inference_mode ("realtime" | "serverless") and inference_config kwargs to SagemakerBackend.deploy, CloudPredictor.deploy, and TimeSeriesFoundationModel.deploy. Serverless mode builds a ServerlessInferenceConfig from the presets memory_size_in_mb=4096 and max_concurrency=5, with any user-supplied inference_config values overriding them. For serverless endpoints, LOG_LOCATION and METRICS_LOCATION are set to /tmp, because TorchServe fails when the root filesystem (/) is read-only.
  • Add JobPredictionFuture (endpoint/prediction_future.py), a handle with status() / result() methods that predict(wait=False) now returns instead of None.
  • Bump version 0.4.2 → 0.5.0.
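
To illustrate the preset-plus-override behaviour described in the first bullet, here is a minimal sketch in plain Python. It is not the code from this PR: the helper name resolve_inference_settings, the two module-level constants, and the pass-through handling of realtime mode are assumptions made for illustration. Only the preset values, the two environment variable names, and the /tmp location come from the description above.

```python
from typing import Any, Dict, Optional, Tuple

# Preset values quoted in the PR description.
SERVERLESS_PRESET: Dict[str, Any] = {"memory_size_in_mb": 4096, "max_concurrency": 5}

# Per the PR description, TorchServe fails on the read-only root filesystem
# of a serverless endpoint, so logs and metrics are pointed at /tmp.
SERVERLESS_ENV: Dict[str, str] = {"LOG_LOCATION": "/tmp", "METRICS_LOCATION": "/tmp"}


def resolve_inference_settings(
    inference_mode: str = "realtime",
    inference_config: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Hypothetical helper: return (config kwargs, extra environment variables)."""
    if inference_mode not in ("realtime", "serverless"):
        raise ValueError(f"Unknown inference_mode: {inference_mode!r}")

    user_config = dict(inference_config or {})
    if inference_mode == "realtime":
        # Assumption: realtime mode applies no presets and no extra env vars.
        return user_config, {}

    # Later keys win in a dict merge, so user values override the preset.
    return {**SERVERLESS_PRESET, **user_config}, dict(SERVERLESS_ENV)


config, env = resolve_inference_settings("serverless", {"memory_size_in_mb": 6144})
print(config)
print(env)
```

In this sketch the merged dictionary is the kind of keyword set one could pass to sagemaker.serverless.ServerlessInferenceConfig(**config); how the PR actually wires this into deploy is not shown in the description.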

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions Bot commented Jun 2, 2026

Job PR-242-1cec676 is done.
Docs are uploaded to https://d12sc05jpx1wj5.cloudfront.net/PR-242/1cec676/index.html

github-actions Bot commented Jun 2, 2026

Job PR-242-a511459 is done.
Docs are uploaded to https://d12sc05jpx1wj5.cloudfront.net/PR-242/a511459/index.html

@shchur shchur merged commit 4a6949d into autogluon:master Jun 2, 2026
12 checks passed