We are excited to announce the availability of the Gemma 3 27B Instruct model through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. With this launch, developers and data scientists can now deploy Gemma 3 27B Instruct, a 27-billion-parameter, instruction-tuned language model, to help accelerate building, experimentation, and scalable deployment of generative AI solutions on AWS.
In this post, we show you how to get started with Gemma 3 27B Instruct on both Amazon Bedrock Marketplace and SageMaker JumpStart, and how to use the model’s powerful instruction-following capabilities in your applications.
Gemma 3 27B is a high-performance, open-weight, multimodal language model from Google designed to handle both text and image inputs with efficiency and contextual understanding. It introduces a redesigned attention architecture, enhanced multilingual support, and extended context capabilities. With its optimized memory usage and support for large input sequences, it is well suited for complex reasoning tasks, long-form interactions, and vision-language applications. With 27 billion parameters and training on up to 14 trillion tokens of text, the model is optimized for tasks requiring advanced reasoning, multilingual capabilities, and instruction following. According to Google, Gemma 3 27B Instruct models are ideal for developers, researchers, and businesses looking to build generative AI applications such as chatbots, virtual assistants, and automated content generation tools.
Key use cases for Gemma 3, as described by Google, include conversational AI such as chatbots and virtual assistants, automated content generation, multilingual applications, and vision-language tasks such as image understanding and visual question answering.
There are two primary methods for deploying Gemma 3 27B on AWS. The first approach involves using Amazon Bedrock Marketplace, which offers a streamlined way to access Amazon Bedrock APIs (Invoke and Converse) and tools such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, Amazon Bedrock Flows, Amazon Bedrock Guardrails, and model evaluation. The second approach is using SageMaker JumpStart, a machine learning (ML) hub offering foundation models (FMs), built-in algorithms, and pre-built ML solutions. You can deploy pre-trained models using either the Amazon SageMaker console or the SDK.
Amazon Bedrock Marketplace offers access to over 150 specialized FMs, including Gemma 3 27B Instruct.
To try the Gemma 3 27B Instruct model using Amazon Bedrock Marketplace, you need the following:

- An AWS account with access to Amazon Bedrock
- Sufficient service quota for ml.g5.48xlarge instances for SageMaker endpoint usage (discussed later in this section)
To deploy the model using Amazon Bedrock Marketplace, complete the following steps:

Information about Gemma 3's features, costs, and setup instructions can be found on its model overview page. This resource includes integration examples, API documentation, and programming samples. The model excels at a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. You can also access deployment guidelines and license details to begin implementing Gemma 3 in your projects.

Although default configurations are typically sufficient for basic needs, you have the option to customize security features such as virtual private cloud (VPC) networking, role-based permissions, and data encryption. These advanced settings might require adjustment for production environments to maintain compliance with your organization’s security protocols.
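If you later deploy the model programmatically with the SageMaker Python SDK (covered in the SageMaker JumpStart section of this post), the same controls map to deployment parameters. The following is a minimal sketch; the subnet, security group, and KMS key identifiers are placeholders to replace with your own resources:

from sagemaker.jumpstart.model import JumpStartModel

# Placeholder VPC resources; replace with your own subnet and security group IDs
vpc_config = {
    "Subnets": ["subnet-0123456789abcdef0"],
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
}

# Attach the endpoint to your VPC at model creation time
model = JumpStartModel(
    model_id="huggingface-vlm-gemma-3-27b-instruct",
    vpc_config=vpc_config,
)

# Encrypt the storage volume attached to the endpoint instance with your KMS key
predictor = model.deploy(
    initial_instance_count=1,
    accept_eula=True,
    kms_key="arn:aws:kms:us-east-2:111122223333:key/example-key-id",  # Placeholder ARN
)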

Prior to deploying Gemma 3, verify that your AWS account has sufficient quota allocation for ml.g5.48xlarge instances. A quota set to 0 will cause deployments to fail.

To request a quota increase, open the AWS Service Quotas console and search for SageMaker. Locate ml.g5.48xlarge for endpoint usage and choose Request quota increase, then specify your required limit value.
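You can also inspect and request this quota programmatically with the Service Quotas API. The following is a minimal sketch using boto3; the quota name string is an assumption based on the console label, so verify it in your account before relying on it:

import boto3

client = boto3.client("service-quotas")

# Find the SageMaker quota for ml.g5.48xlarge endpoint usage by name
target = None
paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if quota["QuotaName"] == "ml.g5.48xlarge for endpoint usage":  # Assumed label
            target = quota

if target is None:
    print("Quota not found; check the name on the Service Quotas console")
elif target["Value"] < 1:
    # Request an increase so at least one endpoint instance can be provisioned
    client.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=target["QuotaCode"],
        DesiredValue=1.0,
    )
else:
    print(f"Current quota value: {target['Value']}")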

You can now use the playground to interact with Gemma 3.

For detailed steps and example code for invoking the model using Amazon Bedrock APIs, refer to Submit prompts and generate responses using the API and the following code:
import boto3

# Create an Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime")

# ARN of the Bedrock Marketplace endpoint (replace with your endpoint ARN)
endpoint_arn = "arn:aws:sagemaker:us-east-2:061519324070:endpoint/endpoint-quick-start-3t7kp"

# Send a single-turn conversation to the model with the Converse API
response = bedrock_runtime.converse(
    modelId=endpoint_arn,
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is Amazon doing in the field of generative AI?"}]
        }
    ],
    inferenceConfig={
        "maxTokens": 256,
        "temperature": 0.1,
        "topP": 0.999
    }
)

# Print the generated text from the model's response
print(response["output"]["message"]["content"][0]["text"])
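Because Bedrock Marketplace endpoints support the Converse APIs, you can also stream responses as they are generated, which is useful for chat experiences. The following is a minimal sketch using converse_stream, reusing the client and endpoint ARN from the previous example:

# Stream the response incrementally with the ConverseStream API
response = bedrock_runtime.converse_stream(
    modelId=endpoint_arn,
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the benefits of generative AI for retailers."}]
        }
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.1}
)

# Each contentBlockDelta event carries a chunk of generated text
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
print()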
SageMaker JumpStart offers access to a broad selection of publicly available FMs. These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can use state-of-the-art model architectures—such as language models, computer vision models, and more—without having to build them from scratch.
With SageMaker JumpStart, you can deploy models in a secure environment. The models can be provisioned on dedicated SageMaker inference instances and can be isolated within your VPC. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker AI, including SageMaker inference for deploying models and container logs for improved observability. With SageMaker AI, you can streamline the entire model deployment process.
There are two ways to deploy the Gemma 3 model using SageMaker JumpStart: through the SageMaker JumpStart UI or programmatically with the SageMaker Python SDK. We examine both deployment methods to help you determine which approach aligns best with your requirements.
To try the Gemma 3 27B Instruct model in SageMaker JumpStart, you need the following prerequisites:

- An AWS account with an Amazon SageMaker AI domain and an AWS Identity and Access Management (IAM) execution role with the required SageMaker permissions
- Sufficient service quota for ml.g5.48xlarge instances for SageMaker endpoint usage
SageMaker JumpStart provides a user-friendly interface for deploying pre-built ML models with just a few clicks. Through the SageMaker JumpStart UI, you can select, customize, and deploy a wide range of models for various tasks such as image classification, object detection, and natural language processing, without the need for extensive coding or ML expertise.

The model browser displays available models, with details like the provider name and model capabilities.



The model details page includes information such as the model description, its capabilities, and license details. Before you deploy the model, we recommend that you review the model details and license terms to confirm compatibility with your use case.
Selecting appropriate instance types and counts is crucial for cost and performance optimization. Monitor your deployment to adjust these settings as needed. Under Inference type, Real-time inference is selected by default. This is optimized for sustained traffic and low latency.

The deployment process can take several minutes to complete.
To use Gemma 3 with the SageMaker Python SDK, first make sure you have installed the SDK and set up your AWS permissions and environment correctly. The following is a code example showing how to programmatically deploy and run inference with Gemma 3:
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

# Initialize the SageMaker session and execution role
session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Specify model parameters
model_id = "huggingface-vlm-gemma-3-27b-instruct"  # Gemma 3 27B Instruct model ID in SageMaker JumpStart
instance_type = "ml.g5.48xlarge"  # Choose an appropriate instance type based on your needs

# Create the model from SageMaker JumpStart
model = JumpStartModel(
    model_id=model_id,
    model_version="*",  # Use the latest model version
    role=role,
    instance_type=instance_type,
)

# Deploy the model to a real-time endpoint
predictor = model.deploy(
    initial_instance_count=1,
    accept_eula=True  # Accepting the EULA is required for deploying foundation models
)
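After deploy returns, you can run a quick smoke test directly from the returned predictor object. This is a minimal sketch, assuming the default JSON serialization that JumpStart configures for the endpoint:

# Quick test using the predictor returned by model.deploy()
payload = {
    "inputs": "What is Amazon doing in the field of generative AI?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.1}
}
print(predictor.predict(payload))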
With your Gemma 3 model successfully deployed as a SageMaker endpoint, you’re now ready to start making predictions. The SageMaker SDK provides a straightforward way to interact with your model endpoint for inference tasks. The following code demonstrates how to format your input and make API calls to the endpoint. The code handles both sending requests to the model and processing its responses, making it straightforward to integrate Gemma 3 into your applications.
import json
import boto3

# Initialize AWS session (ensure your AWS credentials are configured)
session = boto3.Session()
sagemaker_runtime = session.client("sagemaker-runtime")

# Define the SageMaker endpoint name (replace with your deployed endpoint name)
endpoint_name = "hf-vlm-gemma-3-27b-instruct-2025-05-07-18-09-16-221"

payload = {
    "inputs": "What is Amazon doing in the field of generative AI?",
    "parameters": {
        "max_new_tokens": 256,
        "temperature": 0.1,
        "top_p": 0.9,
        "return_full_text": False
    }
}

# Run inference
try:
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )
    # Parse the response
    result = json.loads(response["Body"].read().decode("utf-8"))
    generated_text = result[0]["generated_text"].strip()
    print("Generated Response:")
    print(generated_text)
except Exception as e:
    print(f"Error during inference: {e}")
To avoid incurring ongoing charges for AWS resources used during exploration of Gemma 3 27B Instruct models, it's important to clean up deployed endpoints and associated resources. Complete the following steps:

1. On the SageMaker console, choose Endpoints under Inference in the navigation pane.
2. Select the endpoint you created (for example, gemma3-27b-instruct-endpoint) and delete it.
3. Delete the associated endpoint configuration and model if you no longer need them.

Always verify that all endpoints are deleted after experimentation to optimize costs. Refer to the Amazon SageMaker documentation for additional guidance on managing resources.
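You can also clean up programmatically. The following is a minimal sketch using boto3; the endpoint name is a placeholder to replace with your deployed endpoint's name:

import boto3

sagemaker_client = boto3.client("sagemaker")
endpoint_name = "gemma3-27b-instruct-endpoint"  # Replace with your endpoint name

# Look up the endpoint configuration before deleting the endpoint
config_name = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]

# Delete the endpoint and its configuration to stop ongoing charges
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)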
The availability of Gemma 3 27B Instruct models in Amazon Bedrock Marketplace and SageMaker JumpStart empowers developers, researchers, and businesses to build cutting-edge generative AI applications with ease. With their high performance, multilingual capabilities, and efficient deployment on AWS infrastructure, these models are well suited for a wide range of use cases, from conversational AI to code generation and content automation. By using the seamless discovery and deployment capabilities of SageMaker JumpStart and Amazon Bedrock Marketplace, you can accelerate your AI innovation while benefiting from the secure, scalable, and cost-effective AWS Cloud infrastructure.
We encourage you to explore the Gemma 3 27B Instruct models today by visiting the SageMaker JumpStart console or Amazon Bedrock Marketplace. Deploy the model and experiment with sample prompts to meet your specific needs. For further learning, explore the AWS Machine Learning Blog, the SageMaker JumpStart GitHub repository, and the Amazon Bedrock documentation. Start building your next generative AI solution with Gemma 3 27B Instruct models and unlock new possibilities with AWS!
Santosh Vallurupalli is a Sr. Solutions Architect at AWS. Santosh specializes in networking, containers, and migrations, and enjoys helping customers in their journey of cloud adoption and building cloud-based solutions for challenging issues. In his spare time, he likes traveling, watching Formula 1, and watching The Office on repeat.
Aravind Singirikonda is an AI/ML Solutions Architect at AWS. He works with AWS customers in the healthcare and life sciences domain to provide guidance and technical assistance, helping them improve the value of their AI/ML solutions when using AWS.
Pawan Matta is a Sr. Solutions Architect at AWS. He works with AWS customers in the gaming industry and guides them to deploy highly scalable, performant architectures. His area of focus is management and governance. In his free time, he likes to play FIFA and watch cricket.
Ajit Mahareddy is an experienced Product and Go-To-Market (GTM) leader with over 20 years of experience in product management, engineering, and GTM. Prior to his current role, Ajit led product management building AI/ML products at leading technology companies, including Uber, Turing, and eHealth. He is passionate about advancing generative AI technologies and driving real-world impact with generative AI.