The adoption of generative AI inference has accelerated as organizations build more operational workloads that use AI capabilities in production at scale. To help customers scale their generative AI applications, Amazon Bedrock offers cross-Region inference (CRIS) profiles, which organizations can use to seamlessly distribute inference processing across multiple AWS Regions. This capability delivers higher throughput while you build at scale and helps keep your generative AI applications responsive and reliable, even under heavy load.
We are excited to introduce Global cross-Region inference for Amazon Bedrock and bring Anthropic’s Claude models to India. Amazon Bedrock now offers Anthropic’s Claude Opus 4.6, Claude Sonnet 4.6, and Claude Haiku 4.5 through Global cross-Region inference (CRIS) for customers operating in India. These frontier models deliver a 1-million-token context window and advanced agentic capabilities, so your applications can process vast datasets and complex workflows with speed and intelligence. With this launch, customers using ap-south-1 (Mumbai) and ap-south-2 (Hyderabad) can access Anthropic’s latest Claude models on Amazon Bedrock while benefiting from global inference capacity and highly available inference managed by Amazon Bedrock. With global CRIS, customers can scale inference workloads seamlessly, improve resiliency, and reduce operational complexity. In this post, you will learn how to use Amazon Bedrock Global cross-Region inference for Claude models in India. We walk through the capabilities of each Claude model variant and provide code examples to help you start building generative AI applications immediately.
Global cross-Region inference helps organizations manage unplanned traffic bursts by using inference capacity across commercial AWS Regions (Regions other than the AWS GovCloud (US) Regions and the China Regions) globally. This section explores how the Global cross-Region inference feature works and the technical mechanisms that power it.
Global cross-Region inference is offered through inference profiles. Inference profiles operate on two key concepts:

- Source Region – the Region from which your inference request originates.
- Destination Regions – the Regions where Amazon Bedrock can route and process the request.
To use Anthropic models, Amazon Bedrock offers out-of-the-box global inference profiles. For example:

- global.anthropic.claude-opus-4-6-v1
- global.anthropic.claude-sonnet-4-6
- global.anthropic.claude-opus-4-5-20251101-v1:0
- global.anthropic.claude-sonnet-4-5-20250929-v1:0
- global.anthropic.claude-haiku-4-5-20251001-v1:0

For customers in India using BOM (ap-south-1) and HYD (ap-south-2), the respective source and destination Regions are as follows:
- Source: ap-south-1 (Mumbai) -> Destinations: commercial AWS Regions
- Source: ap-south-2 (Hyderabad) -> Destinations: commercial AWS Regions

For information about choosing between Geographic and Global cross-Region inference, see Choosing between Geographic and Global cross-Region inference in the Amazon Bedrock User Guide.
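You can discover the system-defined global profiles available in your source Region with the ListInferenceProfiles API. The following is a minimal sketch; the `filter_global_profiles` and `list_global_profiles` helper names are our own, not part of the SDK:

```python
def filter_global_profiles(profiles):
    """Keep only profile IDs that use the global. prefix."""
    return [
        p["inferenceProfileId"]
        for p in profiles
        if p.get("inferenceProfileId", "").startswith("global.")
    ]

def list_global_profiles(region="ap-south-1"):
    """List system-defined global inference profiles visible from a source Region."""
    import boto3  # imported here so the pure helper above works without the SDK installed
    bedrock = boto3.client("bedrock", region_name=region)  # control-plane client
    response = bedrock.list_inference_profiles(typeEquals="SYSTEM_DEFINED")
    return filter_global_profiles(response.get("inferenceProfileSummaries", []))

# Usage (requires AWS credentials):
# for profile_id in list_global_profiles():
#     print(profile_id)
```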
As of today, you can implement global CRIS for the following models.
| Name | Model | Inference profile ID | Inference processing destination Regions |
| --- | --- | --- | --- |
| Global Anthropic Claude Opus 4.6 | Claude Opus 4.6 | global.anthropic.claude-opus-4-6-v1 | Commercial AWS Regions |
| Global Anthropic Claude Sonnet 4.6 | Claude Sonnet 4.6 | global.anthropic.claude-sonnet-4-6 | Commercial AWS Regions |
| Global Anthropic Claude Haiku 4.5 | Claude Haiku 4.5 | global.anthropic.claude-haiku-4-5-20251001-v1:0 | Commercial AWS Regions |
| Global Anthropic Claude Sonnet 4.5 | Claude Sonnet 4.5 | global.anthropic.claude-sonnet-4-5-20250929-v1:0 | Commercial AWS Regions |
| Global Anthropic Claude Opus 4.5 | Claude Opus 4.5 | global.anthropic.claude-opus-4-5-20251101-v1:0 | Commercial AWS Regions |
For example, to use Global cross-Region inference with Anthropic’s Claude Opus 4.5, complete the following key steps:

1. Verify that you have been granted access to the model in Amazon Bedrock.
2. Invoke the model with the global inference profile ID (global.anthropic.claude-opus-4-5-20251101-v1:0) instead of a Region-specific model ID. This works with the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream APIs.

Indian enterprises face unique challenges during high-traffic periods: Diwali shopping surges, Dussehra ecommerce spikes, Eid celebrations, Christmas festivities, tax filing deadlines, and cricket tournaments, when customer engagement peaks dramatically. Global cross-Region inference gives customers in India the throughput elasticity needed to handle these demand surges without degradation. During such peak periods, ecommerce platforms, fintech applications, and customer service chatbots experience 3-5x normal traffic. With global CRIS, your applications automatically access inference capacity across commercial AWS Regions.
By routing requests globally, customers in India gain access to a significantly larger capacity pool, moving from Regional tokens-per-minute (TPM) limits to global-scale throughput. This means that your generative AI applications remain responsive and reliable when your business needs them most, without the operational complexity of manual multi-Region orchestration or the risk of customer-facing throttling errors.
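Even with global capacity, individual requests can still be throttled at the peak of a surge, so client-side retries remain good practice. The following is a sketch of a full-jitter exponential backoff wrapper around a Converse call; the `backoff_delay` and `converse_with_retry` helper names are our own illustration:

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=20.0):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def converse_with_retry(client, model_id, messages, max_attempts=5):
    """Call Converse, retrying on throttling with jittered exponential backoff."""
    from botocore.exceptions import ClientError  # ships with boto3
    for attempt in range(max_attempts):
        try:
            return client.converse(modelId=model_id, messages=messages)
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code != "ThrottlingException" or attempt == max_attempts - 1:
                raise  # non-throttling error, or retries exhausted
            time.sleep(backoff_delay(attempt))

# Usage (requires AWS credentials):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="ap-south-1")
# resp = converse_with_retry(
#     client,
#     "global.anthropic.claude-haiku-4-5-20251001-v1:0",
#     [{"role": "user", "content": [{"text": "Hello"}]}],
# )
```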
There are two approaches to invoking models supported by Global cross-Region inference. Let’s look at each approach in detail.
The Amazon Bedrock playgrounds provide a visual interface to experiment with different models and configuration parameters. You can use playgrounds to test and compare models by experimenting with prompts before integrating them into your application.
To get started using playgrounds, complete the following steps:

1. On the Amazon Bedrock console, choose Playgrounds in the navigation pane.
2. Choose a model whose inference profile ID starts with the global. prefix.
3. Enter a prompt and run it to test the model.


To invoke the global models programmatically, you can use the InvokeModel and Converse APIs for real-time requests, and the InvokeModelWithResponseStream and ConverseStream APIs for streaming workloads. The full source code demonstrating these invocation APIs is available in the GitHub repository aws-samples/sample-amazon-bedrock-global-cris.
Invoke an Anthropic Claude model with global cross-Region inference using the Converse API
Let’s walk through invoking the global.anthropic.claude-opus-4-6-v1 inference profile using the Converse API. We recommend the Converse API for conversational applications because it provides a unified interface across models.
```python
import boto3

# Initialize the Bedrock Runtime client for the India (Mumbai) Region
bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

# Global CRIS inference profile ID for Claude Opus 4.6
MODEL_ID = "global.anthropic.claude-opus-4-6-v1"

try:
    print("Invoking Claude Opus 4.6 via Global CRIS...")

    # Use the Converse API for simplified interaction
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[
            {
                "role": "user",
                "content": [{"text": "Explain cloud computing in 2 sentences."}],
            }
        ],
    )

    # Extract and display the response text
    response_text = response["output"]["message"]["content"][0]["text"]
    print("Response:", response_text)

    # Display token usage information
    usage = response.get("usage", {})
    if usage:
        print(f"Input tokens: {usage.get('inputTokens', 'N/A')}")
        print(f"Output tokens: {usage.get('outputTokens', 'N/A')}")
        print(f"Total tokens: {usage.get('totalTokens', 'N/A')}")
except Exception as e:
    print(f"Error: {e}")
    print("Check your AWS credentials and Region configuration.")
```
The following table lists the code samples for the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream APIs for the global CRIS models:
| Model name | Inference profile ID | Invocation API | GitHub sample code |
| --- | --- | --- | --- |
| Global Anthropic Claude Opus 4.6 | global.anthropic.claude-opus-4-6-v1 | Converse | Code |
| | | ConverseStream | Code |
| | | InvokeModel | Code, Advanced usage |
| | | InvokeModelWithResponseStream | Code, Advanced usage |
| Global Anthropic Claude Sonnet 4.6 | global.anthropic.claude-sonnet-4-6 | Converse | Code |
| | | ConverseStream | Code |
| | | InvokeModel | Code |
| | | InvokeModelWithResponseStream | Code |
| Global Anthropic Claude Haiku 4.5 | global.anthropic.claude-haiku-4-5-20251001-v1:0 | Converse | Code |
| | | ConverseStream | Code |
| | | InvokeModel | Code |
| | | InvokeModelWithResponseStream | Code |
| Global Anthropic Claude Sonnet 4.5 | global.anthropic.claude-sonnet-4-5-20250929-v1:0 | Converse | Code |
| | | ConverseStream | Code |
| | | InvokeModel | Code |
| | | InvokeModelWithResponseStream | Code |
| Global Anthropic Claude Opus 4.5 | global.anthropic.claude-opus-4-5-20251101-v1:0 | Converse | Code |
| | | ConverseStream | Code |
| | | InvokeModel | Code |
| | | InvokeModelWithResponseStream | Code |
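The Converse example shown earlier can be adapted for streaming with the ConverseStream API, which emits text deltas as the model generates them. The following is a minimal sketch; the `extract_text` and `stream_response` helpers are our own, not part of the SDK:

```python
def extract_text(event):
    """Return the text delta carried by a ConverseStream event, or '' if none."""
    return event.get("contentBlockDelta", {}).get("delta", {}).get("text", "")

def stream_response(client, model_id, prompt):
    """Stream a response with ConverseStream, printing text as it arrives."""
    response = client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    chunks = []
    for event in response["stream"]:
        text = extract_text(event)
        if text:
            print(text, end="", flush=True)
            chunks.append(text)
    return "".join(chunks)

# Usage (requires AWS credentials):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="ap-south-1")
# stream_response(client, "global.anthropic.claude-sonnet-4-6",
#                 "Explain cloud computing in 2 sentences.")
```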
You can also work with global CRIS models using application inference profiles. A sample is available in the GitHub repository at application-inference-profile/multi_tenant_inference_profile_example.py. To learn more about cross-Region (system-defined) inference profiles and application inference profiles, see Set up a model invocation resource using inference profiles.
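As a sketch of the application inference profile approach, the following creates a per-tenant profile that copies a global system-defined profile with the CreateInferenceProfile API. The helper names and the ARN format are assumptions for illustration; check the API reference for the exact shapes:

```python
def global_profile_arn(region, account_id, profile_id):
    """Assumed ARN format for a system-defined inference profile."""
    return f"arn:aws:bedrock:{region}:{account_id}:inference-profile/{profile_id}"

def create_tenant_profile(region, account_id, tenant, profile_id):
    """Create an application inference profile tagged for per-tenant cost tracking."""
    import boto3  # imported here so the pure helper above works without the SDK installed
    bedrock = boto3.client("bedrock", region_name=region)
    response = bedrock.create_inference_profile(
        inferenceProfileName=f"{tenant}-claude-global",
        modelSource={"copyFrom": global_profile_arn(region, account_id, profile_id)},
        tags=[{"key": "tenant", "value": tenant}],
    )
    return response["inferenceProfileArn"]

# Usage (requires AWS credentials):
# arn = create_tenant_profile("ap-south-1", "123456789012", "acme",
#                             "global.anthropic.claude-haiku-4-5-20251001-v1:0")
```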
When using global cross-Region inference, Amazon CloudWatch and AWS CloudTrail continue to record log entries only in the source Region where the request originated. This streamlines monitoring and logging by maintaining the records in a single Region regardless of where the inference request is ultimately processed.
So far, we have learned how to invoke models supported by global cross-Region inference using the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream APIs, and how to capture usage data from their responses. Next, we will learn how to efficiently capture logs and metrics, improving the overall observability and traceability of requests and responses for the global endpoints. We will build dashboards using CloudWatch Gen AI observability and track request routing with CloudTrail.
This phased approach will help you understand where your requests are being processed and monitor your model performance effectively.
To gain comprehensive visibility into your global CRIS usage, you need to enable model invocation logging and set up monitoring dashboards. This helps you track performance metrics and token usage, and identify which Regions are processing your requests.
To enable model invocation logging, navigate to the Amazon Bedrock page in the AWS Management Console and perform the following actions as shown.

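The same configuration can also be applied programmatically with the PutModelInvocationLoggingConfiguration API. The following is a sketch assuming you already have a CloudWatch log group and an IAM role that Amazon Bedrock can assume for log delivery; the helper names are our own:

```python
def build_logging_config(log_group, role_arn):
    """Build a loggingConfig payload for CloudWatch Logs delivery."""
    return {
        "cloudWatchConfig": {"logGroupName": log_group, "roleArn": role_arn},
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }

def enable_invocation_logging(region, log_group, role_arn):
    """Turn on model invocation logging for the account in the given Region."""
    import boto3  # imported here so the pure helper above works without the SDK installed
    bedrock = boto3.client("bedrock", region_name=region)
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(log_group, role_arn)
    )

# Usage (requires AWS credentials and an existing log group/role):
# enable_invocation_logging("ap-south-1", "/aws/bedrock/invocations",
#                           "arn:aws:iam::123456789012:role/BedrockLoggingRole")
```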
You can use the CloudWatch Gen AI observability dashboard to monitor model invocation performance, tracking metrics such as invocation count, token usage, and errors with out-of-the-box views. Because we already enabled model invocation logging, we can navigate to the CloudWatch page in the AWS Management Console and choose Model Invocations under Gen AI Observability.

With Gen AI observability, you have complete visibility into your Gen AI workload’s performance with key metrics, end-to-end prompt tracing and step-by-step analysis of large language model (LLM) interactions. You can quickly diagnose issues and gain real-time insights into the performance and reliability of your entire AI stack.
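Outside the console, the same invocation metrics can be pulled programmatically from CloudWatch. The following is a sketch assuming the AWS/Bedrock namespace and its Invocations metric with the ModelId dimension; the helper names are our own:

```python
from datetime import datetime, timedelta, timezone

def metric_query(model_id, hours=24):
    """Build a get_metric_statistics request for hourly Bedrock invocation counts."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Sum"],
    }

def invocation_counts(region, model_id):
    """Fetch hourly invocation counts, sorted by timestamp."""
    import boto3  # imported here so the pure helper above works without the SDK installed
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(**metric_query(model_id))
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])

# Usage (requires AWS credentials):
# for point in invocation_counts("ap-south-1", "global.anthropic.claude-opus-4-6-v1"):
#     print(point["Timestamp"], point["Sum"])
```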

To track which Region processed a request, CloudTrail events include an additionalEventData field with an inferenceRegion key that specifies the destination Region. Organizations can monitor and analyze the distribution of their inference requests across the AWS Global Infrastructure.





To track which Region processed an Amazon Bedrock model invocation, you can query your CloudTrail Lake event data store for the inferenceRegion key in the additionalEventData field:
```sql
SELECT eventTime,
       awsRegion,
       element_at(additionalEventData, 'inferenceRegion') AS inferenceRegion,
       eventName,
       userIdentity.arn AS userArn,
       requestId
FROM <REPLACE_WITH_YOUR_EVENT_DATA_STORE_ID>
WHERE eventSource IN ('bedrock.amazonaws.com')
  AND eventName IN ('InvokeModel', 'InvokeModelWithResponseStream', 'Converse', 'ConverseStream')
  AND eventTime >= '2025-11-06 00:00:00'
  AND eventTime <= '2026-03-05 23:59:59'
```
You can observe that the query results show awsRegion, the Region where the inference request originated, and inferenceRegion, the destination Region where the request was actually processed.
From the query results in the event data store for global cross-Region inference requests:

- awsRegion indicates the origin of the request, and inferenceRegion indicates the destination Region where the inference request was processed.
- eventName shows that invocation calls to the global CRIS models using the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream APIs are all captured.
- Some rows have a blank inferenceRegion, which indicates that the inference request was processed in the Region where it originated; in that case, the destination Region is the same as awsRegion.
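Beyond CloudTrail Lake queries, a quick distribution check can be scripted against the CloudTrail LookupEvents API. The `count_inference_regions` helper below is our own illustration; it falls back to awsRegion when inferenceRegion is absent, matching the blank-row behavior described above:

```python
import json
from collections import Counter

def count_inference_regions(events):
    """Tally destination Regions from CloudTrail event records.

    A missing inferenceRegion means the request was served in its source Region.
    """
    counts = Counter()
    for event in events:
        record = json.loads(event["CloudTrailEvent"])  # raw event is a JSON string
        extra = record.get("additionalEventData") or {}
        counts[extra.get("inferenceRegion", record.get("awsRegion"))] += 1
    return counts

def recent_bedrock_events(region="ap-south-1", max_results=50):
    """Fetch recent Amazon Bedrock events from CloudTrail."""
    import boto3  # imported here so the pure helper above works without the SDK installed
    cloudtrail = boto3.client("cloudtrail", region_name=region)
    response = cloudtrail.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "bedrock.amazonaws.com"}
        ],
        MaxResults=max_results,
    )
    return response.get("Events", [])

# Usage (requires AWS credentials and CloudTrail enabled):
# print(count_inference_regions(recent_bedrock_events()))
```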
Amazon Bedrock Global cross-Region inference empowers Indian organizations to build resilient, high-performing AI applications with intelligent request routing across the worldwide infrastructure of AWS. With the monitoring capabilities demonstrated in this post, you gain complete visibility into your application’s performance and can track exactly where your inference requests are being processed.
Start your journey with global CRIS today by implementing the code examples provided for Anthropic’s Claude Haiku 4.5, Sonnet 4.6, and Opus 4.6. You can enable model invocation logging to gain insights through CloudWatch Gen AI observability, and use CloudTrail to track cross-Region request routing. Whether you’re optimizing for performance with global profiles or maintaining compliance with geography-specific profiles, Amazon Bedrock provides the flexibility that your organization needs.
For more information about Global cross-Region inference for Anthropic’s Claude Opus 4.6 and Claude Sonnet 4.6 in Amazon Bedrock, see Increase throughput with cross-Region inference, Supported Regions and models for inference profiles, and Use an inference profile in model invocation.
We and our partners are excited to see what customers build with this AI Inference capability.