Cloud serving · Advanced · 6h

AWS SageMaker

Training and deploying models on AWS's ML platform.

What is SageMaker?

SageMaker is AWS's managed platform for the ML lifecycle: training jobs, hyperparameter tuning, and hosting models behind endpoints, all without managing the underlying servers. It handles provisioning the infrastructure so you can focus on the model.
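
As a rough illustration of what "without managing the underlying servers" looks like, a training job is described declaratively and SageMaker provisions the instances, runs the container, and releases the instances when the job ends. The sketch below only builds a request in the shape boto3's `create_training_job` accepts. The helper names, the `train` channel name, the default instance type, and every argument value are placeholder assumptions, not values from this page.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.large",
                               max_runtime_s=3600):
    """Build keyword arguments for the SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        # Bring-your-own-container: the image holds the training code.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",  # assumed channel name
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        # Hard cap on runtime so a stuck job cannot bill indefinitely.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }


def submit(request):
    """Send the request; needs boto3, AWS credentials, and a valid IAM role."""
    import boto3  # imported lazily so the builder works without AWS access
    return boto3.client("sagemaker").create_training_job(**request)
```

Keeping the request builder separate from the API call makes it possible to inspect or unit-test the job definition before anything is provisioned.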

Why it matters

Many companies run their ML on a managed cloud platform rather than hand-rolling servers, and on AWS that platform is usually SageMaker. Knowing how managed training and serving work, and what they cost, rounds out your ability to ship models at company scale, not just on your laptop.

What to learn

  • Training jobs on managed infrastructure
  • Built-in algorithms versus bring-your-own-container
  • Hosting models as endpoints
  • Real-time versus batch transform
  • Autoscaling endpoints
  • The cost model and shutting things down
  • Where SageMaker fits versus a plain container
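
For the autoscaling topic in the list above, endpoint scaling is configured through the separate Application Auto Scaling service rather than on the endpoint itself. The sketch below builds the two requests involved: one registering the endpoint variant as a scalable target, and one attaching a target-tracking policy on invocations per instance. The helper names, the `AllTraffic` variant name, the capacity limits, the target value, and the cooldowns are illustrative assumptions.

```python
def build_autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                               min_instances=1, max_instances=4,
                               invocations_per_instance=100.0):
    """Return (scalable_target_kwargs, scaling_policy_kwargs)."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common,
              "MinCapacity": min_instances,
              "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",  # assumed name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add or remove instances to hold this per-instance load.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType":
                    "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds, assumed
            "ScaleOutCooldown": 60,   # seconds, assumed
        },
    }
    return target, policy


def apply(target, policy):
    """Apply both requests; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access
    aas = boto3.client("application-autoscaling")
    aas.register_scalable_target(**target)
    aas.put_scaling_policy(**policy)
```

Note that `MaxCapacity` is also a cost ceiling: under sustained load the endpoint can grow to that many billed instances.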

Common pitfall

Leaving expensive GPU endpoints or notebook instances running idle. SageMaker bills provisioned resources for as long as they run, whether or not they serve any traffic, so a forgotten endpoint quietly runs up a large bill. Shut down endpoints and instances when they are not in use, and set budget alerts, because managed convenience makes it easy to forget what is running.
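
To make the pitfall concrete, here is a minimal sketch with two parts: the arithmetic for what an idle endpoint costs, and a filter that picks the still-running endpoints out of a `list_endpoints` response. The hourly rate in the usage note is a made-up figure, not a real SageMaker price, and the helper names are hypothetical. `fetch_endpoints` reads only the first page of results.

```python
from datetime import datetime, timezone


def idle_cost_usd(hourly_rate_usd, hours):
    """Cost of leaving one instance provisioned for `hours`."""
    return hourly_rate_usd * hours


def running_endpoints(list_endpoints_response, now=None):
    """Return (name, age_in_hours) for every endpoint that is InService."""
    now = now or datetime.now(timezone.utc)
    found = []
    for ep in list_endpoints_response.get("Endpoints", []):
        if ep["EndpointStatus"] == "InService":
            age_h = (now - ep["CreationTime"]).total_seconds() / 3600
            found.append((ep["EndpointName"], round(age_h, 1)))
    return found


def fetch_endpoints():
    """First page of endpoints; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers work without AWS access
    return boto3.client("sagemaker").list_endpoints(MaxResults=100)
```

With a hypothetical rate of 1.50 USD per hour, `idle_cost_usd(1.5, 24 * 30)` comes to 1080.0 USD for a 30-day month in which the endpoint served nothing. Running `running_endpoints(fetch_endpoints())` at the end of a work session is one way to catch anything left behind.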

Resources

Primary (free):

Practice

In SageMaker, train a small model with a managed training job and deploy it to an endpoint, then call the endpoint for a prediction. Immediately delete the endpoint afterward and confirm it is gone. Done when you have trained, served, and torn down without leaving anything billing.
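
The last two steps of the exercise, calling the endpoint and tearing it down, might look like the sketch below using boto3. It assumes the endpoint already exists and accepts CSV input. The function names, the endpoint name `demo-endpoint`, and the payload are placeholders. The `finally` block is the point of the example: the endpoint is deleted even if the prediction call fails.

```python
def predict_then_teardown(runtime, sm, endpoint_name, payload,
                          content_type="text/csv"):
    """Invoke the endpoint once, then delete it and wait until it is gone."""
    try:
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType=content_type,
            Body=payload,
        )
        prediction = response["Body"].read().decode("utf-8")
    finally:
        # Runs on success and on failure, so nothing is left billing.
        sm.delete_endpoint(EndpointName=endpoint_name)
        # Blocks until the endpoint can no longer be found.
        sm.get_waiter("endpoint_deleted").wait(EndpointName=endpoint_name)
    return prediction


def main():
    """Needs boto3, AWS credentials, and an existing endpoint."""
    import boto3  # imported lazily so the function above is testable offline
    sm = boto3.client("sagemaker")
    runtime = boto3.client("sagemaker-runtime")
    print(predict_then_teardown(runtime, sm, "demo-endpoint", "1.0,2.0,3.0"))
```

Deleting the endpoint does not remove the endpoint configuration or the model record; those can be cleaned up separately with `delete_endpoint_config` and `delete_model`, and any model artifacts in S3 remain until you delete them.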

Outcomes

  • Run a managed training job on SageMaker.
  • Deploy a model as a real-time endpoint.
  • Choose real-time versus batch transform.
  • Control cost by shutting down idle resources.