
Add support for serverless inference #239

Closed
shchur wants to merge 13 commits into autogluon:master from shchur:inference-modes

@shchur (Collaborator) commented Jun 1, 2026

Issue #, if available:

Description of changes:

  • Added inference_mode parameter on CloudPredictor.deploy, TimeSeriesFoundationModel.deploy, and SagemakerBackend.deploy to provision serverless endpoints (inference_mode="serverless") in addition to the default realtime ones
  • Added inference_config: Optional[Dict[str, Any]] for mode-specific overrides, forwarded to sagemaker.serverless.ServerlessInferenceConfig; defaults to memory_size_in_mb=4096, max_concurrency=5
  • deploy now raises ValueError if instance_type is set together with inference_mode="serverless"
  • predict(wait=False) now returns a JobPredictionFuture (with output_path, status(), result()) instead of None; added @overload signatures so type checkers infer pd.DataFrame for predict(...) and JobPredictionFuture for predict(..., wait=False)
  • Renamed backend-internal serve_config to fm_serve_config
  • Renamed tutorial files: autogluon-cloud.md to cloud-predictor.md, foundation_model.md to foundation-model.md
  • Added a "Choosing an Inference Option" section and a "Serverless Endpoint" subsection to the foundation model (FM) tutorial
  • Added unit tests pinning the dispatch contract between inference_mode and sagemaker.Model.deploy(...) kwargs, plus a CI-only integration test deploying Chronos-2 to a serverless endpoint
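
The inference_mode dispatch can be sketched as a pure function that maps `inference_mode` to keyword arguments for `sagemaker.Model.deploy(...)`, followed by a mock-based check in the spirit of the dispatch-contract unit tests listed above. This is a minimal sketch, not the PR's code: `build_deploy_kwargs` and `SERVERLESS_DEFAULTS` are hypothetical names, the serverless settings stay a plain dict instead of a real `sagemaker.serverless.ServerlessInferenceConfig` so the example runs without the SageMaker SDK, and merging user overrides on top of the defaults (rather than replacing them) is an assumption.

```python
from typing import Any, Dict, Literal, Optional
from unittest import mock

# Defaults stated in this PR; merging overrides on top of them is an assumption.
SERVERLESS_DEFAULTS: Dict[str, Any] = {"memory_size_in_mb": 4096, "max_concurrency": 5}


def build_deploy_kwargs(
    inference_mode: Literal["realtime", "serverless"] = "realtime",
    instance_type: Optional[str] = None,
    initial_instance_count: int = 1,
    inference_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: translate inference_mode into sagemaker.Model.deploy kwargs."""
    if inference_mode == "serverless":
        if instance_type is not None:
            raise ValueError('instance_type cannot be set together with inference_mode="serverless"')
        # Overrides win over defaults. A real backend would wrap this dict as
        # sagemaker.serverless.ServerlessInferenceConfig(**config).
        config = {**SERVERLESS_DEFAULTS, **(inference_config or {})}
        return {"serverless_inference_config": config}
    if inference_mode == "realtime":
        return {"instance_type": instance_type, "initial_instance_count": initial_instance_count}
    raise ValueError(f"Unknown inference_mode: {inference_mode!r}")


# Pin the dispatch contract: a Mock stands in for a sagemaker.Model instance
# and simply records the kwargs that deploy() receives.
model = mock.Mock()
model.deploy(**build_deploy_kwargs("serverless", inference_config={"max_concurrency": 10}))
model.deploy.assert_called_once_with(
    serverless_inference_config={"memory_size_in_mb": 4096, "max_concurrency": 10}
)
```

Keeping the mapping in one pure function makes the contract testable without AWS credentials; the mock never provisions anything.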

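The predict(wait=False) change can be illustrated with a small future object plus @overload signatures. This is a minimal sketch, assuming the future polls a job status and loads predictions from `output_path` once the job completes; the status strings follow SageMaker batch transform job statuses, while the internals of `JobPredictionFuture`, the `Predictor` class, the scripted fake job, and the `s3://example-bucket/...` path are hypothetical. The synchronous overload returns a `List[int]` here where the PR's returns `pd.DataFrame`, so the example runs without pandas.

```python
from __future__ import annotations

import time
from typing import Any, Callable, List, Literal, overload


class JobPredictionFuture:
    """Hypothetical handle for a running batch-prediction job."""

    def __init__(self, output_path: str, poll_status: Callable[[], str], load_result: Callable[[str], Any]):
        self.output_path = output_path
        self._poll_status = poll_status
        self._load_result = load_result

    def status(self) -> str:
        return self._poll_status()

    def result(self, poll_interval: float = 0.0) -> Any:
        # Poll until the job reaches a terminal state, then load predictions from output_path.
        while (state := self.status()) not in ("Completed", "Failed", "Stopped"):
            time.sleep(poll_interval)
        if state != "Completed":
            raise RuntimeError(f"Prediction job ended with status {state}")
        return self._load_result(self.output_path)


class Predictor:
    # The overloads let a type checker infer List[int] for predict(data) and
    # JobPredictionFuture for predict(data, wait=False).
    @overload
    def predict(self, data: List[int], wait: Literal[True] = ...) -> List[int]: ...
    @overload
    def predict(self, data: List[int], wait: Literal[False]) -> JobPredictionFuture: ...

    def predict(self, data: List[int], wait: bool = True) -> List[int] | JobPredictionFuture:
        future = self._submit(data)
        return future.result() if wait else future

    def _submit(self, data: List[int]) -> JobPredictionFuture:
        # Stand-in for launching a real job: reports InProgress once, then Completed,
        # and "loads" each input doubled as its prediction.
        states = ["InProgress", "Completed"]

        def poll() -> str:
            return states.pop(0) if len(states) > 1 else states[0]

        return JobPredictionFuture(
            "s3://example-bucket/predictions/", poll, lambda path: [x * 2 for x in data]
        )


predictor = Predictor()
print(predictor.predict([1, 2, 3]))  # blocks until done, prints [2, 4, 6]
future = predictor.predict([1, 2, 3], wait=False)  # returns immediately
print(future.output_path, future.status())  # s3://example-bucket/predictions/ InProgress
print(future.result())  # [2, 4, 6]
```

Returning a future instead of None keeps the non-blocking call useful: the caller can record output_path, check status() later, and call result() only when it actually needs the predictions.
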
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

shchur changed the title from "Add support for serverless and async inference" to "Add support for serverless inference" on Jun 1, 2026
shchur marked this pull request as draft on Jun 2, 2026 at 09:46
shchur closed this pull request on Jun 2, 2026