Nemotron 3 Super is now available as a fully managed and serverless model on Amazon Bedrock, joining the Nemotron Nano models that are already available within the Amazon Bedrock environment.
With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible business value without managing infrastructure complexities. You can power your generative AI applications with Nemotron through the fully managed inference of Amazon Bedrock, using its extensive features and tooling.
This post explores the technical characteristics of the Nemotron 3 Super model and discusses potential application use cases. It also provides technical guidance to get started using this model for your generative AI applications within the Amazon Bedrock environment.
Nemotron 3 Super is a hybrid Mixture of Experts (MoE) model with leading compute efficiency and accuracy for multi-agent applications and for specialized agentic AI systems. The model is released with open weights, datasets, and recipes so developers can customize, improve, and deploy the model on their infrastructure for enhanced privacy and security.
Model overview:
Nemotron 3 Super uses latent MoE, where experts operate on a shared latent representation before outputs are projected back to token space. This approach allows the model to call on 4x more experts at the same inference cost, enabling better specialization around subtle semantic structures, domain abstractions, or multi-hop reasoning patterns.
MTP enables the model to predict several future tokens in a single forward pass, significantly increasing throughput for long reasoning sequences and structured outputs. For planning, trajectory generation, extended chain-of-thought, or code generation, MTP reduces latency and improves agent responsiveness.
To learn more about Nemotron 3 Super’s architecture and how it is trained, see Introducing Nemotron 3 Super: an Open Hybrid Mamba Transformer MoE for Agentic Reasoning.
Nemotron 3 Super helps power various use cases for different industries. Some of the use cases include
Get Started with NVIDIA Nemotron 3 Super in Amazon Bedrock. Complete the following steps to test NVIDIA Nemotron 3 Super in Amazon Bedrock

After completing the previous steps, you can test the model immediately. To truly showcase Nemotron 3 Super’s capability, we will move beyond simple syntax and task it with a complex engineering challenge. High-reasoning models excel at “system-level” thinking where they must balance architectural trade-offs, concurrency, and distributed state management.
Let’s use the following prompt to design a globally distributed service:
"Design a distributed rate-limiting service in Python that must support 100,000 requests per second across multiple geographic regions.
1. Provide a high-level architectural strategy (e.g., Token Bucket vs. Fixed Window) and justify your choice for a global scale. 2. Write a thread-safe implementation using Redis as the backing store. 3. Address the 'race condition' problem when multiple instances update the same counter. 4. Include a pytest suite that simulates network latency between the app and Redis."
This prompt requires the model to operate as a senior distributed-systems engineer — reasoning about trade-offs, producing thread-safe code, anticipating failure modes, and validating everything with realistic tests, all in a single coherent response.

You can access the model programmatically using the model ID nvidia.nemotron-super-3-120b . The model supports both the InvokeModel and Converse APIs through the AWS Command Line Interface (AWS CLI) and AWS SDK with nvidia.nemotron-super-3-120b as the model ID. Further, it supports the Amazon Bedrock OpenAI SDK compatible API.
Run the following command to invoke the model directly from your terminal using the AWS Command Line Interface (AWS CLI) and the InvokeModel API:
aws bedrock-runtime invoke-model
--model-id nvidia.nemotron-super-3-120b
--region us-west-2
--body '{"messages": [{"role": "user", "content": "Type_Your_Prompt_Here"}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}'
--cli-binary-format raw-in-base64-out
invoke-model-output.txt
If you want to invoke the model through the AWS SDK for Python (Boto3), use the following script to send a prompt to the model, in this case by using the Converse API:
import boto3
from botocore.exceptions import ClientError
# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")
# Set the model ID
model_id = "nvidia.nemotron-super-3-120b"
# Start a conversation with the user message.
user_message = "Type_Your_Prompt_Here"
conversation = [
{
"role": "user",
"content": [{"text": user_message}],
}
]
try:
# Send the message to the model using a basic inference configuration.
response = client.converse(
modelId=model_id,
messages=conversation,
inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
)
# Extract and print the response text.
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)
except (ClientError, Exception) as e:
print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
exit(1)
To invoke the model through the Amazon Bedrock OpenAI-compatible ChatCompletions endpoint you can proceed as follows using the OpenAI SDK:
# Import OpenAI SDK
from openai import OpenAI
# Set environment variables
os.environ["OPENAI_API_KEY"] = "<insert your bedrock API key>"
os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime.<AWS region>.amazon.com/openai/v1"
# Set the model ID
model_id = "nvidia.nemotron-super-3-120b"
# Set prompts
system_prompt = “Type_Your_System_Prompt_Here”
user_message = "Type_Your_User_Prompt_Here"
# Use ChatCompletionsAPI
response = client.chat.completions.create(
model= model _ID,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_message}
],
temperature=0,
max_completion_tokens=1000
)
# Extract and print the response text
print(response.choices[0].message.content)
In this post, we showed you how to get started with NVIDIA Nemotron 3 Super on Amazon Bedrock for building the next generation of agentic AI applications. By combining the model’s advanced Hybrid Transformer-Mamba architecture and Latent MoE with the fully managed, serverless infrastructure of Amazon Bedrock, organizations can now deploy high-reasoning, efficient applications at scale without the heavy lifting of backend management. Ready to see what this model can do for your specific workflow?
Manuel Rioux est fièrement propulsé par WordPress