NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart

Today we are excited to announce day-zero availability of NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart.

With this launch, you can deploy Nemotron 3 Ultra using a one-click deployment experience. Nemotron 3 Ultra is an open model designed for cutting-edge reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agent workloads. It is optimized for NVFP4, NVIDIA's 4-bit floating-point format, which makes the model faster and more cost-effective to host.

Introducing NVIDIA Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is a large open language model with 550 billion total parameters and 55 billion active parameters. It is based on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, designed to provide cutting-edge intelligence at a fraction of the compute cost of dense models of equivalent quality.

Specification	Details
Architecture	Hybrid Transformer-Mamba MoE
Parameters	550B total / 55B active
Context length	Up to 1 million tokens
Input/Output	Text in, text out
Precision	NVFP4
Inference speed	5x faster for long-running agent workflows
Cost	Up to 30% lower for complex agent tasks
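The gap between total and active parameters comes from expert routing. The following toy sketch shows one common top-k routing variant in plain Python: a router scores every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights renormalized over the selected experts. It illustrates the general MoE technique, not Nemotron 3 Ultra's actual router; the function names, expert count, and value of k are hypothetical, and the Mamba and attention layers are left out entirely.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, router_weights, k=2):
    """Route input vector x to the top-k experts and mix their outputs.

    experts: list of callables, each mapping a vector to a vector of the same length.
    router_weights: one weight vector per expert; each router logit is a dot product with x.
    Returns (mixed_output, indices_of_selected_experts).
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the k experts with the highest router scores.
    top_k = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top_k])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top_k):
        y = experts[i](x)  # only k of len(experts) experts are evaluated
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top_k


if __name__ == "__main__":
    # Four hypothetical "experts" that just scale their input by 0, 1, 2, 3.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in range(4)]
    router_weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, chosen = moe_forward([1.0, 2.0], experts, router_weights, k=2)
    print(chosen, out)
```

In a real MoE layer the experts are feed-forward networks and routing happens per token, but the cost structure is the same: every expert's weights must be stored, while only the selected experts' weights are used for each token.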

Why Agentic AI Needs Purpose-Built Models

Agents don’t respond just once. They plan, call tools, delegate work to subagents, verify the results, and iterate over hundreds of turns. Each step adds tokens and compute, so the metrics that matter are task completion at useful accuracy, time to completion, and cost per task.
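To see why cost per task grows quickly, consider a simplified agent loop in which every turn re-sends the full conversation history. The sketch below counts the input tokens the model reads across a run. The helper name, system-prompt size, tokens added per turn, and turn count are all hypothetical, and the sketch ignores prompt caching, which would reduce the real figure.

```python
def cumulative_prompt_tokens(system_tokens, tokens_per_turn, turns):
    """Return (total input tokens read, final context length) for an agent run.

    Assumes every turn re-sends the full history (no prompt caching) and that
    each turn appends a fixed number of new tokens (model output plus tool results).
    """
    total = 0
    context = system_tokens
    for _ in range(turns):
        total += context            # the model reads the whole history this turn
        context += tokens_per_turn  # the history grows before the next turn
    return total, context


if __name__ == "__main__":
    total, final_context = cumulative_prompt_tokens(2_000, 1_500, 200)
    print(f"final context: {final_context:,} tokens")
    print(f"input tokens read over the run: {total:,}")
```

With these hypothetical numbers, the history ends at 302,000 tokens, well within a 1-million-token window, but the model reads 30,250,000 input tokens over the run. Because total work grows roughly quadratically with the number of turns, throughput at long context lengths matters as much as per-call latency.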

Nemotron 3 Ultra addresses this directly. Its MoE architecture activates only 55B of its 550B parameters per forward pass, maintaining high throughput even at context lengths of a million tokens. This means agents can sustain planning, tool invocation, and self-correction loops that span hundreds of turns while helping maintain consistency and manage costs.
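The compute saving can be estimated with a common rule of thumb: a forward pass costs roughly 2 FLOPs (one multiply, one add) per parameter used, per token. The sketch below applies that approximation to the figures in the specification table. It is a back-of-the-envelope estimate under that assumption only: it ignores the context-length-dependent cost of attention and Mamba state updates, and it says nothing about memory, since all 550B parameters must still be held on the GPUs.

```python
TOTAL_PARAMS = 550e9   # total parameters, from the specification table
ACTIVE_PARAMS = 55e9   # parameters used per forward pass, from the specification table


def forward_flops_per_token(params):
    # Rule-of-thumb approximation: ~2 FLOPs per parameter per token.
    # Ignores sequence-length-dependent attention/state costs and other overheads.
    return 2 * params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction:.0%}")         # 10%
print(f"MoE:   {moe_flops / 1e9:.0f} GFLOPs per token")   # 110
print(f"dense: {dense_flops / 1e9:.0f} GFLOPs per token") # 1100
```

Under this approximation, each token costs about one tenth of the arithmetic a dense 550B model would need, which is one source of the throughput headroom for long agent loops.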

Enterprise Use Cases

Nemotron 3 Ultra excels in workloads that require sustained, multi-step reasoning:

Agent orchestrators – coordinate multiple subagents and manage state across long tool-call chains

Coding agents – generate, test, debug, and iterate on code across large repositories

Deep research – synthesize information from multiple sources and maintain coherent reasoning across an extended context

Complex business workflows – automate multi-stage business processes with decision branching and error recovery

Getting Started with SageMaker JumpStart

You can deploy Nemotron 3 Ultra through Amazon SageMaker JumpStart with one-click deployment, without having to provision or manage the underlying infrastructure.

Prerequisites

Before you begin, make sure you have:

An AWS account

AWS Identity and Access Management (IAM) permissions to use SageMaker JumpStart and create SageMaker endpoints

Sufficient service quota for GPU instances (for example, ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
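To check the quota prerequisite programmatically, you can filter the SageMaker quotas returned by the Service Quotas `ListServiceQuotas` API. The sketch below works on the list of quota dictionaries (each with `QuotaName` and `Value` keys) so that it runs without AWS credentials. The helper name is hypothetical, and the quota-name pattern `<instance type> for endpoint usage` is an assumption to confirm against the names shown in your Service Quotas console.

```python
def endpoint_quotas(quotas, instance_types):
    """Map each instance type to its endpoint-usage quota value, or None if not found.

    quotas: iterable of dicts shaped like the entries of the "Quotas" list
    returned by ListServiceQuotas (keys "QuotaName" and "Value").
    ASSUMPTION: endpoint quotas are named "<instance type> for endpoint usage".
    """
    wanted = {f"{t} for endpoint usage": t for t in instance_types}
    found = {t: None for t in instance_types}
    for q in quotas:
        t = wanted.get(q.get("QuotaName"))
        if t is not None:
            found[t] = q.get("Value")
    return found


if __name__ == "__main__":
    sample = [
        {"QuotaName": "ml.p5en.48xlarge for endpoint usage", "Value": 1.0},
        {"QuotaName": "ml.p5en.48xlarge for training job usage", "Value": 2.0},
    ]
    print(endpoint_quotas(sample, ["ml.p5en.48xlarge", "ml.g7e.48xlarge"]))
```

With boto3, you could build the input by paginating `boto3.client("service-quotas").get_paginator("list_service_quotas")` with `ServiceCode="sagemaker"` and concatenating each page's `Quotas` list. A value of `None` means no quota with the assumed name was found; `0.0` means you need to request an increase before deploying.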

Important: Deploying this model creates a SageMaker endpoint that incurs charges for as long as it runs. Multi-GPU instances such as ml.p5en.48xlarge are billed at high hourly rates; see Amazon SageMaker AI pricing for details. Delete your endpoint when you are finished to avoid ongoing charges.

Deploy Using SageMaker Studio

Open Amazon SageMaker Studio

In the left navigation pane, choose SageMaker JumpStart

Search for Nemotron 3 Ultra

Select the model card

Choose Deploy

Select your instance type (supported instance types are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)

Review the deployment settings (the defaults are sufficient for most use cases)

Choose Deploy to create the endpoint

Wait for the endpoint status to show InService before running inference
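If you deploy from Studio but want to wait for the endpoint from code, you can poll its status. The following sketch calls `describe_endpoint` until the status is `InService`, stopping early on `Failed` or after a timeout. It assumes a boto3 SageMaker client (or any object with the same `describe_endpoint` method) and an endpoint name copied from the Studio endpoint details page; the helper name and default intervals are hypothetical.

```python
import time


def wait_for_in_service(client, endpoint_name, poll_seconds=30,
                        timeout_seconds=3600, sleep=time.sleep):
    """Poll DescribeEndpoint until the endpoint reports InService.

    client: assumed to behave like boto3.client("sagemaker").
    sleep: injectable so the loop can be tested without real waiting.
    Raises RuntimeError if the endpoint fails, TimeoutError if it takes too long.
    """
    waited = 0
    while True:
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, `wait_for_in_service(boto3.client("sagemaker"), "my-endpoint-name")`, where the endpoint name is a placeholder. boto3 also provides a built-in waiter, `client.get_waiter("endpoint_in_service").wait(EndpointName=...)`; the explicit loop is shown here to make the failure and timeout handling visible.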

Deploy Using the SageMaker Python SDK

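```python
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="huggingface-reasoning-nvidia-nemotron-3-ultra-550b-a55b-nvfp4",  # Verify the ID in the SageMaker JumpStart model catalog
    role=sagemaker.get_execution_role(),  # Works inside SageMaker; elsewhere, pass your execution role ARN
    instance_type="ml.p5en.48xlarge",  # Or ml.p5.48xlarge / ml.g7e.48xlarge, subject to your quota
)

# Accepting the EULA is required; deploy() waits until the endpoint is InService
predictor = model.deploy(accept_eula=True)
```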

Run inference

```python
payload = {
    "messages": [{
        "role": "user",
        "content": "Break this task into subtasks, identify which tools are needed, and run them in sequence."
    }],
    "max_tokens": 20480,  # leave room for long reasoning output
    "temperature": 0.6,
    "top_p": 0.95,
}

response = predictor.predict(payload)

# The generated text is in the first choice's message content
print(response["choices"][0]["message"]["content"])
```
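Agent workloads usually span many turns, so each request must carry the conversation history. The following sketch wraps the request and response shapes from the example above in a helper that appends each user message and assistant reply to a shared list. The helper name is hypothetical; `predict` can be `predictor.predict` or any callable with the same behavior, which also lets you test the helper without an endpoint.

```python
def chat_turn(predict, messages, user_text, **params):
    """Send one user turn with the full history and record the assistant reply.

    predict: callable taking the request dict and returning the parsed response,
             for example predictor.predict.
    messages: list of {"role", "content"} dicts, updated in place.
    params: extra request fields such as max_tokens, temperature, top_p.
    """
    messages.append({"role": "user", "content": user_text})
    # Send a copy so later appends don't alter a request the caller may have kept.
    response = predict({"messages": list(messages), **params})
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply
```

For example, start with `history = []`, call `chat_turn(predictor.predict, history, "Break this task into subtasks.", max_tokens=20480, temperature=0.6, top_p=0.95)`, and keep passing the same `history` list on later turns.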

Clean Up

To avoid incurring unnecessary costs, delete the SageMaker endpoint when you are finished:

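```python
# Delete the model resource, then the endpoint and its endpoint configuration
predictor.delete_model()
predictor.delete_endpoint()
```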
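If the `predictor` object is no longer available (for example, after a Studio deployment or a restarted kernel), you can clean up by endpoint name instead. This sketch looks up the endpoint configuration and model names, then deletes the endpoint, the configuration, and the models. It assumes a boto3 SageMaker client and an endpoint whose production variants reference models directly; the helper name is hypothetical, and endpoints that use inference components are not handled.

```python
def delete_endpoint_resources(client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it references.

    client: assumed to behave like boto3.client("sagemaker").
    Returns (endpoint_config_name, deleted_model_names).
    """
    config_name = client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    variants = client.describe_endpoint_config(EndpointConfigName=config_name)["ProductionVariants"]
    # Variants backed by inference components may have no ModelName; skip those.
    model_names = [v["ModelName"] for v in variants if "ModelName" in v]
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        client.delete_model(ModelName=name)
    return config_name, model_names
```

For example, `delete_endpoint_resources(boto3.client("sagemaker"), "my-endpoint-name")`, where the endpoint name is a placeholder.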

Conclusion

NVIDIA Nemotron 3 Ultra brings cutting-edge reasoning to Amazon SageMaker JumpStart with 5x faster inference and up to 30% lower cost for agent workloads. Its hybrid Transformer-Mamba MoE architecture and million-token context window make it purpose-built for the sustained, multi-step reasoning that production agents demand.

Whether you’re building agent orchestrators, coding agents, deep research systems, or complex enterprise automation, Nemotron 3 Ultra is ready to deploy today from SageMaker JumpStart.

Get started now by searching for Nemotron 3 Ultra in Amazon SageMaker JumpStart.

About the Authors

Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As a machine learning services expert, Dan works to support customers in integrating ML workflows effectively, efficiently, and sustainably.

Malav Shastri is a software development engineer at AWS, where he works on the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role is to enable customers to benefit from cutting-edge open source and proprietary foundation models. Malav holds a master’s degree in computer science.

Vivek Gangasani is a global leader in solutions architecture for SageMaker Inference. He leads solution architecture, technical go-to-market (GTM), and outbound product strategy for SageMaker Inference. He also helps enterprises and startups deploy and optimize GenAI models and build AI workflows with SageMaker and GPUs. Currently, he is focused on developing strategies and content for optimizing inference performance and for use cases such as agentic workflows and RAG. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.
