
NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart

Today we are excited to announce day-zero availability of NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart.

With this launch, you can deploy Nemotron 3 Ultra using a one-click deployment experience. Nemotron 3 Ultra is an open model designed for cutting-edge reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agent workloads. The model is optimized for NVFP4, NVIDIA's 4-bit floating-point format, which makes it much faster and more cost-effective to host.

Introducing NVIDIA Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is a large open language model with 550 billion total parameters, of which 55 billion are active for each token. It is based on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, designed to deliver cutting-edge intelligence at a fraction of the compute cost of dense models of equivalent quality.
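To make the "550 billion total, 55 billion active" idea concrete, here is a minimal, framework-free sketch of top-k expert routing, the general mechanism MoE layers use to run only a few experts per token. The expert count, the value of k, and the toy experts are hypothetical illustration values, not Nemotron 3 Ultra's actual configuration, and the attention and Mamba layers of the hybrid architecture are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Gate weights are renormalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, experts, router_logits, k):
    """Combine the outputs of the k routed experts; all other experts are skipped."""
    out = [0.0] * len(x)
    for idx, weight in route_token(router_logits, k):
        y = experts[idx](x)  # only these k expert functions ever execute
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy setup: 10 experts that each scale their input, with k=1,
# so 1 of the 10 experts runs for this token.
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 11)]
logits = [0.2, -1.0, 3.1, 0.0, 0.5, -0.3, 1.2, 0.9, -2.0, 0.4]
print(route_token(logits, k=1))                   # [(2, 1.0)]
print(moe_layer([1.0, 2.0], experts, logits, k=1))  # [3.0, 6.0]
```

In a real MoE model the router is a learned layer and each expert is a feed-forward block, but the budget logic is the same: compute per token scales with the experts that are selected, not with the total expert count.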

Specification | Details
Architecture | Hybrid Transformer-Mamba MoE
Parameters | 550B total / 55B active
Context length | Up to 1 million tokens
Input/Output | Text in, text out
Precision | NVFP4
Inference speed | 5x faster for long-running agent workflows
Cost | Up to 30% lower for complex agent tasks

Why Agentic AI Needs Purpose-Built Models

Agents don’t respond just once. They plan, call tools, delegate work to subagents, check the results, and iterate over hundreds of turns. Each step adds tokens and compute, so the metrics that matter are task completion at useful accuracy, time to completion, and cost per task.
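A rough way to see why cost per task grows so quickly: if each agent round re-sends the full conversation history to a stateless chat endpoint, the input tokens processed per round keep rising as tool results and replies accumulate. The sketch below assumes exactly that stateless pattern and uses hypothetical token counts; features such as prompt caching, where a serving stack offers them, would change the numbers.

```python
def agent_token_usage(system_tokens, rounds):
    """Tally tokens for an agent loop that re-sends its whole history every round.

    rounds is a list of (new_input_tokens, output_tokens) pairs, for example a
    tool result arriving and the model's next plan or tool call going out.
    Returns (total_input_tokens, total_output_tokens, final_context_tokens).
    """
    context = system_tokens
    total_in = 0
    total_out = 0
    for new_in, out in rounds:
        context += new_in    # the new tool result or user message joins the history
        total_in += context  # the entire history is processed as input on this call
        total_out += out
        context += out       # the model's reply becomes part of the next round's input
    return total_in, total_out, context


# Hypothetical task: a 2,000-token system prompt, then rounds of
# 500 tokens in and 300 tokens out.
for n in (10, 100, 200):
    total_in, total_out, ctx = agent_token_usage(2000, [(500, 300)] * n)
    print(f"{n:>3} rounds: {total_in:,} input tokens, {total_out:,} output tokens, final context {ctx:,}")
```

Output tokens grow linearly with the number of rounds, but input tokens grow roughly quadratically, which is why throughput at long context lengths matters so much for agent workloads.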

Nemotron 3 Ultra addresses this directly. Its MoE architecture activates only 55B of its 550B parameters per forward pass, sustaining high throughput even at context lengths of up to one million tokens. This means agents can sustain planning, tool-calling, and self-correction loops that span hundreds of turns while helping maintain consistency and manage costs.
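The 55B-of-550B figure can be turned into a back-of-the-envelope estimate. The sketch below uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass (ignoring the sequence-length-dependent attention and state-space terms) and assumes 4 bits per weight for NVFP4, ignoring the format's scale factors and any layers kept at higher precision. Treat the results as rough orders of magnitude, not measured numbers.

```python
TOTAL_PARAMS = 550e9   # total parameters (from the specification table)
ACTIVE_PARAMS = 55e9   # parameters active per token (from the specification table)


def forward_flops_per_token(params):
    """Rule of thumb: one multiply and one add per weight, so ~2 FLOPs per parameter."""
    return 2 * params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size

# All weights must still be resident in GPU memory, even though only a fraction run per token.
# Assumption: 4 bits (0.5 bytes) per weight, ignoring NVFP4 scale factors.
weight_bytes_4bit = TOTAL_PARAMS * 0.5

print(f"active fraction: {active_fraction:.0%}")
print(f"MoE forward pass:   ~{moe_flops:.2e} FLOPs per token")
print(f"dense forward pass: ~{dense_flops:.2e} FLOPs per token")
print(f"compute ratio: ~{dense_flops / moe_flops:.0f}x")
print(f"approx. weight memory at 4 bits: ~{weight_bytes_4bit / 1e9:.0f} GB")
```

The last line is a reminder that MoE saves compute, not memory: the full set of weights has to fit in aggregate GPU memory alongside the KV cache and runtime buffers, which is where a compact format such as NVFP4 helps.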

Enterprise Use Cases

Nemotron 3 Ultra excels in workloads that require sustained, multi-step reasoning:

  • Agent orchestrators – coordinate multiple subagents and manage state across long chains of tool calls
  • Coding agents – generate, test, debug, and iterate on code in large repositories
  • Deep research – synthesize information from multiple sources and maintain coherent reasoning over an extended context
  • Complex enterprise workflows – automate multi-stage business processes with decision branching and error recovery
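The orchestrator and enterprise-workflow patterns above share one core shape: a loop in which the model either requests a tool call or returns a final answer, with tool results and errors fed back into the conversation. Below is a minimal sketch of that loop. The JSON action format, the `call_model` function, and the convention of returning tool results as `user` messages are hypothetical choices for illustration; they are not a documented Nemotron 3 Ultra or SageMaker tool-calling interface.

```python
import json


def run_agent(call_model, tools, task, max_rounds=20):
    """Run a minimal plan/act loop and return (final_answer, message_history).

    call_model(messages) -> str is a hypothetical function that must return either
    '{"tool": "<name>", "args": {...}}' or '{"final": "<answer>"}'.
    tools maps tool names to ordinary Python callables.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        action = json.loads(reply)
        if "final" in action:
            return action["final"], messages
        tool = tools.get(action.get("tool"))
        if tool is None:
            # Error recovery: report an unknown tool to the model instead of crashing.
            result = f"error: unknown tool {action.get('tool')!r}"
        else:
            result = tool(**action.get("args", {}))
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    raise RuntimeError(f"agent did not finish within {max_rounds} rounds")


# Scripted stand-in for the model so the loop can run offline.
script = iter([
    '{"tool": "add", "args": {"a": 2, "b": 3}}',
    '{"final": "2 + 3 = 5"}',
])
answer, history = run_agent(lambda messages: next(script), {"add": lambda a, b: a + b}, "Add 2 and 3.")
print(answer)  # 2 + 3 = 5
```

Against a deployed endpoint, `call_model` would wrap a request to the model and extract the reply text; the `max_rounds` cap keeps a confused agent from looping (and billing) indefinitely.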

Getting Started with SageMaker JumpStart

You can deploy Nemotron 3 Ultra through Amazon SageMaker JumpStart with one-click deployment, without having to provision or configure the serving infrastructure yourself.

Prerequisites

Before you begin, make sure you have:

  • An AWS account
  • Appropriate AWS Identity and Access Management (IAM) permissions for SageMaker JumpStart
  • Sufficient service quota for GPU instances (for example, ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
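To check quotas from code rather than the console, you can query the Service Quotas API with boto3. This is a sketch under stated assumptions: it assumes SageMaker endpoint quotas are named in the pattern "<instance type> for endpoint usage", and the helper names are hypothetical. The filtering step is kept separate from the AWS call so it can be tried without credentials.

```python
SUPPORTED_INSTANCE_TYPES = ["ml.p5en.48xlarge", "ml.p5.48xlarge", "ml.g7e.48xlarge"]


def endpoint_quotas(quotas, instance_types=SUPPORTED_INSTANCE_TYPES):
    """Map each instance type to its 'for endpoint usage' quota value, if present.

    Assumes SageMaker quota names follow the pattern '<instance type> for endpoint usage'.
    """
    wanted = {f"{t} for endpoint usage": t for t in instance_types}
    return {wanted[q["QuotaName"]]: q["Value"] for q in quotas if q["QuotaName"] in wanted}


def list_sagemaker_quotas(region_name=None):
    """Fetch applied SageMaker quotas via the Service Quotas API (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so endpoint_quotas() works without boto3 installed

    client = boto3.client("service-quotas", region_name=region_name)
    quotas = []
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Example (makes AWS API calls):
# print(endpoint_quotas(list_sagemaker_quotas("us-east-1")))
```

A value of 0, or a missing entry, means you should request an increase before deploying. `list_service_quotas` reports applied values, so a quota that has never been adjusted may only show up through `list_aws_default_service_quotas` or in the Service Quotas console.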

Important: Deploying this model creates a SageMaker endpoint that incurs charges for as long as it is running. GPU instances like ml.p5en.48xlarge carry a high hourly rate. See Amazon SageMaker AI pricing for details. Don’t forget to delete your endpoint when you are finished to avoid ongoing charges.

Deploy Using SageMaker Studio

  1. Open Amazon SageMaker Studio
  2. In the left navigation pane, choose SageMaker JumpStart
  3. Search for Nemotron 3 Ultra
  4. Select the model card
  5. Choose Deploy
  6. Select your instance type (supported instance types are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
  7. Review the deployment settings (the defaults are sufficient for most use cases)
  8. Choose Deploy to create the endpoint
  9. Wait for the endpoint status to show InService before performing inference
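If you deploy from Studio and want a script to block until the endpoint is ready, a simple polling loop works. This is a sketch: the poll interval and timeout are arbitrary choices, the function name is hypothetical, and `describe_endpoint` is passed in so the logic can be exercised without AWS. boto3 also provides a built-in waiter, `boto3.client("sagemaker").get_waiter("endpoint_in_service")`, that does the same job.

```python
import time


def wait_for_in_service(describe_endpoint, endpoint_name, poll_seconds=30,
                        timeout_seconds=3600, sleep=time.sleep):
    """Poll until the endpoint is InService; raise on Failed or on timeout.

    describe_endpoint should behave like boto3's
    boto3.client("sagemaker").describe_endpoint(EndpointName=...).
    """
    waited = 0
    while True:
        desc = describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown reason")
            raise RuntimeError(f"{endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"{endpoint_name} still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Example (makes AWS API calls; the endpoint name is a placeholder):
# import boto3
# sm = boto3.client("sagemaker")
# wait_for_in_service(sm.describe_endpoint, "<your-endpoint-name>")
```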

Deploy Using the SageMaker Python SDK

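import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    # Verify this ID against the SageMaker JumpStart model catalog before deploying
    model_id="huggingface-reasoning-nvidia-nemotron-3-ultra-550b-a55b-nvfp4",
    # get_execution_role() works inside SageMaker Studio or notebook instances;
    # elsewhere, pass your SageMaker execution role ARN as a string
    role=sagemaker.get_execution_role(),
)
# accept_eula=True accepts the model's end-user license agreement.
# To choose the hardware explicitly, also pass instance_type (for example "ml.p5en.48xlarge").
predictor = model.deploy(accept_eula=True)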

Run inference

payload = {
    "messages": [
        {
            "role": "user",
            "content": "Break this task into subtasks, identify which tools are needed, and run them in sequence.",
        }
    ],
    "max_tokens": 20480,
    "temperature": 0.6,
    "top_p": 0.95,
}
response = predictor.predict(payload)
# The reply text is in the first choice of the chat-completions-style response
print(response["choices"][0]["message"]["content"])
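The predictor object only exists in the session that created it. If you deployed from Studio, or want to call the endpoint from another application, you can send the same request schema through the SageMaker Runtime `invoke_endpoint` API in boto3. The sketch below also keeps a running message history so follow-up questions carry context. It assumes the endpoint accepts and returns the same chat-completions-style JSON used above; `chat` and `ask` are hypothetical helper names, and the endpoint name is a placeholder.

```python
import json


def chat(runtime_client, endpoint_name, messages, max_tokens=20480, temperature=0.6, top_p=0.95):
    """Send one chat request and return the assistant's reply text.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    payload = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    body = json.loads(response["Body"].read())
    return body["choices"][0]["message"]["content"]


def ask(runtime_client, endpoint_name, history, user_text, **params):
    """Append a user turn, call the endpoint, and record the reply in history."""
    history.append({"role": "user", "content": user_text})
    reply = chat(runtime_client, endpoint_name, history, **params)
    history.append({"role": "assistant", "content": reply})
    return reply


# Example (makes AWS API calls; replace the placeholder endpoint name):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# history = []
# print(ask(runtime, "<your-endpoint-name>", history, "Plan the subtasks for migrating a service."))
# print(ask(runtime, "<your-endpoint-name>", history, "Now list the tools each subtask needs."))
```

Because the full history is re-sent on every call, long conversations grow the input size each turn, so it is worth trimming or summarizing old turns in long-running agents.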

Clean up

To avoid incurring unnecessary costs, delete the SageMaker endpoint when you are finished:

predictor.delete_endpoint()
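In the SageMaker Python SDK, `predictor.delete_endpoint()` removes the endpoint and, by default, its endpoint configuration; `predictor.delete_model()` additionally removes the model resource. If you deployed from Studio and have no predictor object, a boto3 sketch like the one below cleans up all three. It assumes a classic endpoint whose production variants reference a `ModelName`; endpoints built on inference components need those components deleted instead, which this sketch does not handle. The helper name is hypothetical.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it references.

    sm_client is expected to be boto3.client("sagemaker").
    Returns (endpoint_config_name, deleted_model_names).
    """
    # Look up the config and model names before the endpoint is gone.
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"] if "ModelName" in v]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names


# Example (makes AWS API calls; the endpoint name is a placeholder):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "<your-endpoint-name>")
```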

Conclusion

NVIDIA Nemotron 3 Ultra brings cutting-edge reasoning to Amazon SageMaker JumpStart with 5x faster inference and up to 30% lower cost for agent workloads. Its hybrid Transformer-Mamba MoE architecture and million-token context window make it purpose-built for the sustained, multi-step reasoning that production agents demand.

Whether you’re building agent orchestrators, coding agents, deep research systems, or complex enterprise automation, Nemotron 3 Ultra is ready to deploy today from SageMaker JumpStart.

Get started now by searching for Nemotron 3 Ultra in Amazon SageMaker JumpStart.

About the Authors

Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As an expert in machine learning services, Dan strives to support clients on their journey to integrating ML workflows effectively, efficiently, and sustainably.

Malav Shastri is a software development engineer at AWS, where he works on the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role is to enable customers to benefit from cutting-edge open source and proprietary foundation models. Malav holds a master’s degree in computer science.

Vivek Gangasani is a worldwide lead for solutions architecture on SageMaker Inference. He leads solution architecture, technical go-to-market (GTM), and outbound product strategy for SageMaker Inference. He also helps enterprises and startups deploy and optimize GenAI models and build AI workflows with SageMaker and GPUs. Currently, he is focused on developing strategies and content for optimizing inference performance and for use cases such as agentic workflows and RAG. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.
