This post is cowritten by Jeff Boudier, Simon Pagezy, and Florent Gbelidji from Hugging Face.
Agentic AI systems represent an evolution from conversational AI to autonomous agents capable of complex reasoning, tool usage, and code execution. Enterprise applications benefit from deployment approaches tailored to specific needs, such as managed endpoints that deliver auto-scaling capabilities, foundation model APIs that support complex reasoning, and containerized deployment options for custom integration requirements.
Hugging Face smolagents is an open source Python library designed to make it straightforward to build and run agents using a few lines of code. We will show you how to build an agentic AI solution by integrating Hugging Face smolagents with Amazon Web Services (AWS) managed services. You’ll learn how to deploy a healthcare AI agent that demonstrates multi-model deployment options, vector-enhanced knowledge retrieval, and clinical decision support capabilities.
While we use healthcare as an example, this architecture applies to multiple industries where domain-specific intelligence and reliability are critical. The solution uses the model-agnostic, modality-agnostic, and tool-agnostic design of smolagents to orchestrate across Amazon SageMaker AI endpoints, Amazon Bedrock APIs, and containerized model servers.
Many AI systems face limitations with single-model approaches that can’t adapt to diverse enterprise needs. These systems often have rigid deployment options and inconsistent APIs across different AI services, and they lack the multi-model deployment options needed for optimal model selection.
This solution demonstrates how organizations can build AI systems that address these limitations. The solution allows deployment selection based on operational needs and provides consistent request and response formats across different AI backends and deployment methods. It generates contextual responses through medical knowledge integration and vector search, supporting deployment from development to production environments through containerized architecture.
This healthcare use case illustrates how the AI agent can process complex medical queries for six medications with clinical decision support and AWS security and compliance capabilities.
The solution consists of the following services and features to deliver the agentic AI capabilities:
The following diagram illustrates the solution architecture.

The architecture is a complete integration of the Hugging Face smolagents framework with AWS services. A client web interface connects to a healthcare agent container that orchestrates across three model backends: SageMaker AI with BioM-ELECTRA, Amazon Bedrock with Claude 3.5 Sonnet V2, and a containerized model server with BioM-ELECTRA. The solution includes a vector store powered by OpenSearch Service and a security layer with data encryption at rest and in transit. The security layer also handles IAM access control and authentication, and any medical disclaimers for regulatory compliance.
This solution supports deployment options through smolagents with each backend optimized for different scenarios:
The three backends implement Hugging Face Messages API compatibility, ensuring consistent request and response formats regardless of the selected model service. Users select the appropriate backend based on their requirements; the solution provides deployment options rather than automatic routing.
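To illustrate what this compatibility means in practice, here is a minimal sketch of a shared request shape and a backend dispatcher. The build_request helper, BACKENDS labels, and dispatch stub are illustrative assumptions, not classes from the repository:

```python
# Sketch: the same Messages API-style request works against any backend.
# build_request, BACKENDS, and dispatch are hypothetical, for illustration only.

def build_request(question: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style messages payload shared by all three backends."""
    return {
        "messages": [
            {"role": "system", "content": "You are a healthcare assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }

BACKENDS = ["sagemaker", "bedrock", "containerized"]  # illustrative labels

def dispatch(backend: str, request: dict) -> dict:
    """Route an identical request dict to the selected backend (stubbed here).

    In the real solution, each backend adapter translates this shared format
    into its service-specific call and returns a matching response shape.
    """
    assert backend in BACKENDS, f"unknown backend: {backend}"
    return {"backend": backend,
            "choices": [{"message": {"role": "assistant", "content": "..."}}]}
```

Because the request and response shapes are uniform, swapping backends is a one-argument change rather than an application rewrite.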
The complete implementation is available in the sample-healthcare-agent-with-smolagents-on-aws GitHub repository.
The integration of Hugging Face smolagents with AWS managed services offers significant advantages for enterprise agentic AI deployments.
Organizations can choose the optimal deployment for each use case: Amazon Bedrock for serverless access to foundation models, self-hosted containerized deployment for custom tool integration, or SageMaker AI for specialized domain models. These options help match specific workload requirements rather than imposing a one-size-fits-all approach.
Organizations can optimize their infrastructure choices without changing their agent logic. You can switch between containerized model server, SageMaker AI, and Amazon Bedrock without modifying your application code. This provides deployment options, while maintaining consistent agent behavior.
The CodeAgent approach of smolagents streamlines multi-step operations through direct Python code generation and processing. The following comparison illustrates the difference. A traditional JSON-based agent expresses each step as a separate structured action, typically requiring one LLM call per step:
{
"action": "search",
"parameters": {"query": "drug interactions"},
"next_action": {
"action": "filter",
"parameters": {"criteria": "severity > moderate"}
}
}
The smolagents CodeAgent expresses the same operations as a single generated code block:
# Search and filter in single code block
results = search_tool("drug interactions")
filtered_results = [r for r in results if r.severity > "moderate"]
final_answer(f"Found {len(filtered_results)} severe interactions: {filtered_results}")
The smolagents CodeAgent supports single code blocks to handle multi-step operations, reducing large language model (LLM) calls while streamlining agent development. It provides full control of agent logic across AWS service deployments.
By deploying the application on AWS, you gain access to security features and auto-scaling capabilities that help you meet organizational security requirements and maintain regulatory compliance. Running containerized workloads with Amazon ECS and Fargate helps you achieve reliable operations and optimize costs through automated resource scaling.
Let’s walk through implementing this solution.
Before you deploy the solution, you need the following:
Run the following command to install the required Python packages:
pip install -r healthcare_ai_agent/phase_00_installation/requirements.txt
Set the required environment variables for your AWS Region and resource names before deploying the infrastructure.
export AWS_REGION=us-west-2
export SAGEMAKER_ENDPOINT_NAME=healthcare-qa-endpoint-1
export OPENSEARCH_DOMAIN=healthcare-vector-store
export OPENSEARCH_INDEX=medical-knowledge
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
export SAGEMAKER_MODEL_ID=sultan/BioM-ELECTRA-Large-SQuAD2
export CONTAINERIZED_MODEL_ID=sultan/BioM-ELECTRA-Large-SQuAD2
echo $AWS_REGION
echo $SAGEMAKER_ENDPOINT_NAME
These environment variables are used throughout the deployment and testing processes. Verify that they’re set before proceeding to the next step.
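A small helper can confirm the variables are set before you proceed; this check is an illustrative convenience, not part of the repository:

```python
import os

# The seven variables exported in the previous step.
REQUIRED_VARS = [
    "AWS_REGION",
    "SAGEMAKER_ENDPOINT_NAME",
    "OPENSEARCH_DOMAIN",
    "OPENSEARCH_INDEX",
    "BEDROCK_MODEL_ID",
    "SAGEMAKER_MODEL_ID",
    "CONTAINERIZED_MODEL_ID",
]

def missing_vars(required=REQUIRED_VARS) -> list:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

if missing_vars():
    print(f"Missing environment variables: {', '.join(missing_vars())}")
```

Running this before deployment surfaces a missing variable immediately, instead of partway through infrastructure creation.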
Start by creating the foundational AWS infrastructure components using the SampleAWSInfrastructureManager class from the Smolagents_SageMaker_Bedrock_Opensearch.py implementation.
For automated deployment of the AWS infrastructure components, you can use the enhanced_main function.
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()
# Select option 1 for complete AWS infrastructure deployment
If you prefer to create components individually, you can set up an OpenSearch Service domain for vector-enhanced knowledge retrieval and an Amazon ECS cluster for containerized deployment.
Both the OpenSearch Service domain and Amazon ECS cluster are automatically created as part of the complete AWS infrastructure deployment (Option 1 in enhanced_main). If you’ve already deployed the complete infrastructure, both components are ready and you can skip to the Deploy the Amazon SageMaker AI endpoint section.
Deploy the BioM-ELECTRA-Large-SQuAD2 model to SageMaker AI for specialized medical query processing. Two methods are provided: automated deployment through the enhanced_main function, and manual deployment of the SageMaker AI endpoint.
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 2 for SageMaker endpoint deployment
echo $SAGEMAKER_MODEL_ID
echo $SAGEMAKER_ENDPOINT_NAME
from Smolagents_SageMaker_Bedrock_Opensearch import deploy_sagemaker_endpoint_safe
endpoint_name = deploy_sagemaker_endpoint_safe()
# Verify endpoint status
print(f"✅ Endpoint deployed: {endpoint_name}")
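Once the endpoint is up, you can query it with the SageMaker runtime client. The sketch below assumes the endpoint accepts the Hugging Face question-answering payload format ({"inputs": {"question", "context"}}); the repository's own model classes may wrap this differently:

```python
import json
import os

def build_qa_payload(question: str, context: str) -> str:
    """Question-answering payload in the Hugging Face inference format
    (assumed for the BioM-ELECTRA endpoint; verify against the repository)."""
    return json.dumps({"inputs": {"question": question, "context": context}})

def invoke_endpoint(question: str, context: str) -> dict:
    """Call the deployed endpoint; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the payload helper stays dependency-free
    client = boto3.client(
        "sagemaker-runtime",
        region_name=os.getenv("AWS_REGION", "us-west-2"),
    )
    response = client.invoke_endpoint(
        EndpointName=os.getenv("SAGEMAKER_ENDPOINT_NAME", "healthcare-qa-endpoint-1"),
        ContentType="application/json",
        Body=build_qa_payload(question, context),
    )
    return json.loads(response["Body"].read())
```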
The endpoint is configured with MAX_LENGTH=512 and TEMPERATURE=0.1. Next, configure the two additional backend options using the SampleTripleHealthcareAgent class for model selection based on operational needs.
Configure access to Amazon Bedrock for foundation model integration with Claude 3.5 Sonnet V2.
echo $BEDROCK_MODEL_ID
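For reference, here is a minimal sketch of calling Claude 3.5 Sonnet V2 through the Amazon Bedrock Converse API. The repository's SampleBedrockClaudeModel may use a different call path; this only shows the shape of a direct boto3 call:

```python
import os

def build_converse_messages(question: str) -> list:
    """Messages in the shape expected by the Amazon Bedrock Converse API."""
    return [{"role": "user", "content": [{"text": question}]}]

def ask_claude(question: str) -> str:
    """Invoke Claude 3.5 Sonnet V2 on Bedrock; requires boto3, AWS credentials,
    and model access granted in the Bedrock console."""
    import boto3  # imported lazily so the message helper stays dependency-free
    client = boto3.client(
        "bedrock-runtime",
        region_name=os.getenv("AWS_REGION", "us-west-2"),
    )
    response = client.converse(
        modelId=os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-5-sonnet-20241022-v2:0"),
        messages=build_converse_messages(question),
        inferenceConfig={"maxTokens": 512, "temperature": 0.1},
    )
    return response["output"]["message"]["content"][0]["text"]
```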
Set up the medical knowledge database with six medications and vector embeddings in OpenSearch Service. You can use the enhanced_main() function, which provides an interactive menu for deployment tasks, or initialize manually using the SampleOpenSearchManager class.
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 4 for OpenSearch initialization and medical knowledge indexing
For manual initialization, instantiate the SampleOpenSearchManager class directly instead of using the interactive menu.
The indexed documents include drug_name, content, and content_type fields.
# Verify the indexing completed successfully
print("Medical knowledge base initialized with 6 medications")
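The document shape can be sketched as follows. This uses the drug_name, content, and content_type fields named above, with content paraphrased from the sample responses later in this post; only two of the six medications are shown, and the index_docs helper is illustrative, assuming an opensearch-py client:

```python
def build_medication_docs() -> list:
    """Documents shaped for the medical-knowledge index.

    Two of the repository's six medications are shown; content is paraphrased
    from the sample responses in this post.
    """
    return [
        {"drug_name": "Metformin", "content_type": "side_effects",
         "content": "Side effects: Nausea, Diarrhea. Serious: Lactic acidosis."},
        {"drug_name": "Lisinopril", "content_type": "monitoring",
         "content": "Monitor Blood pressure, Kidney function."},
    ]

def index_docs(client, index_name: str) -> None:
    """Index the documents with an opensearch-py client (client setup omitted)."""
    for i, doc in enumerate(build_medication_docs()):
        client.index(index=index_name, id=str(i), body=doc)
```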
After setting up the core infrastructure, deploy a containerized model server that provides self-hosted model deployment capabilities.
Deploy the containerized model server using the LocalContainerizedModelServer class with BioM-ELECTRA-Large-SQuAD2 for model deployment.
For automated deployment, use the containerized Amazon ECS deployment:
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 3 for ECS container deployment
echo $CONTAINERIZED_MODEL_ID
from Smolagents_SageMaker_Bedrock_Opensearch import LocalContainerizedModelServer
containerized_server = LocalContainerizedModelServer()
containerized_server.start_server()
The containerized model server uses Docker sandboxing for secure code execution.
print(f"✅ Containerized Model Server status: {containerized_server.get_status()}")
Deploy the core healthcare agent using the SampleTripleHealthcareAgent class that demonstrates smolagents integration with deployment capabilities across the three backends.
Set up the main healthcare agent that orchestrates across SageMaker AI, Amazon Bedrock, and containerized model server with smolagents framework integration.
from Smolagents_SageMaker_Bedrock_Opensearch import SampleOpenSearchVectorStore
vector_store = SampleOpenSearchVectorStore()
vector_store_available = vector_store.initialize_client()
from Smolagents_SageMaker_Bedrock_Opensearch import SampleTripleHealthcareAgent
import os
agent = SampleTripleHealthcareAgent(
endpoint_name=os.getenv('SAGEMAKER_ENDPOINT_NAME'),
vector_store=vector_store if vector_store_available else None
)
The agent initializes the following components:
- SampleSageMakerEndpointModel with BioM-ELECTRA integration
- SampleBedrockClaudeModel with the Claude 3.5 Sonnet V2 API
- SampleContainerizedModel with fallback mechanisms
- SampleHealthcareCodeAgent for smolagents integration, configured with max_steps=3 and DuckDuckGoSearchTool integration

The system demonstrates three deployment options based on operational needs.
Each backend includes database fallback with medical knowledge for six medications and Amazon OpenSearch Service provides contextual information across the model backends with similarity matching.
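The database-fallback idea can be sketched in a few lines: prefer the model backend's answer, and fall back to a local medication dictionary when no backend response is available. The function and dictionary below are illustrative, limited to medications mentioned in this post (the repository ships six):

```python
from typing import Optional

# Illustrative local knowledge, paraphrased from this post's sample responses.
FALLBACK_DB = {
    "metformin": "Metformin side effects: Nausea, Diarrhea. Serious: Lactic acidosis.",
    "lisinopril": "Monitor Blood pressure, Kidney function for Lisinopril",
}

def answer_with_fallback(query: str, backend_response: Optional[str] = None) -> str:
    """Prefer the model backend's answer; otherwise answer from local knowledge."""
    if backend_response:
        return backend_response
    for drug, info in FALLBACK_DB.items():
        if drug in query.lower():
            return info
    return "No local knowledge available for this query."
```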
Use the sample_interactive_healthcare_assistant function to test the multi-model deployment.
from Smolagents_SageMaker_Bedrock_Opensearch import sample_interactive_healthcare_assistant
sample_interactive_healthcare_assistant()
You can now run and test the multi-model healthcare agent to observe the multi-backend deployment capabilities of smolagents using the actual medical knowledge database with six medications.
The solution provides multiple ways to interact with the healthcare AI agent based on your preferred development environment.
For an interactive web-based experience:
cd healthcare_ai_agent/streamlit_demo
streamlit run healthcare_app.py
For interactive development and experimentation:
jupyter lab Smolagents_SageMaker_Bedrock_Opensearch.ipynb
For command-line execution:
python Smolagents_SageMaker_Bedrock_Opensearch.py
These methods provide access to the same multi-model healthcare agent functionality with different user interfaces.
Use the enhanced_main function to access testing utilities and validate the multi-model deployment.
Experience the healthcare assistant’s conversational interface with near real-time model switching capabilities.
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
Use the enhanced_main function to compare the three deployment options across different query types.
To run interactive tests:
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()
Choose option 7 (Compare the 3 models) to test the deployment options.
Query: "What are the side effects of metformin?"
Expected deployment: SageMaker AI (specialized medical knowledge)
Response: "Metformin side effects: Nausea, Diarrhea. Serious: Lactic acidosis."
Query: "Compare cardiovascular risks of atorvastatin and simvastatin"
Expected deployment: Amazon Bedrock (complex reasoning with Claude 3.5 Sonnet V2)
Response: Detailed analysis with mechanism of action and monitoring requirements
Query: "Tell me about lisinopril monitoring requirements"
Expected deployment: Containerized Model Server (specialized tools and fallback)
Response: "Monitor Blood pressure, Kidney function for Lisinopril"
Actual responses include vector context from OpenSearch Service when available, showing similarity matching results alongside the model responses.
Use the built-in testing utilities to verify healthcare agent deployment and component status.
To run the component test through enhanced_main:
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()
# Select option 6 for component test
Check the agent container status with get_container_status():
status = agent.get_container_status()
print(status)
# Shows: Healthcare Agent Container, smolagents framework Active,
# Messages API Compatible, 6 medications loaded
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 10 for containerized model server test
To avoid incurring future charges, delete the resources you created using the cleanup utilities provided in the implementation.
aws opensearch delete-domain --domain-name healthcare-vector-store
aws ecs update-service --cluster healthcare-agent-cluster --service healthcare-agent-service --desired-count 0
aws ecs delete-service --cluster healthcare-agent-cluster --service healthcare-agent-service
aws ecs delete-cluster --cluster healthcare-agent-cluster
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()
from Smolagents_SageMaker_Bedrock_Opensearch import cleanup_sagemaker_endpoint
cleanup_sagemaker_endpoint("healthcare-qa-endpoint-1")
containerized_server.stop_server()
aws iam detach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam delete-role --role-name ecsTaskExecutionRole
For production deployments, implementing observability is essential for monitoring agent performance, tracking execution traces, and verifying reliability. Amazon Bedrock AgentCore Runtime provides observability with automatic instrumentation. It captures session metrics, performance data, error tracking, and complete execution traces (including tool invocations). Read more about implementing observability in Build trustworthy AI agents with Amazon Bedrock AgentCore observability.
While we demonstrated a healthcare implementation, this solution applies to multiple industries requiring domain-specific intelligence and reliability. With multi-model deployment, organizations can choose the optimal backend for each use case. This includes managed endpoints for production workloads, foundation models for complex reasoning, or self-hosted deployment for custom integration.
Financial institutions can deploy agents for regulatory compliance, risk assessment, and fraud detection while meeting strict security and audit requirements. This deployment approach supports specialized fraud detection models, complex regulatory analysis, and custom financial tools integration.
Manufacturing organizations can implement intelligent agents for predictive maintenance, quality control, and supply chain optimization. Multi-model deployment allows equipment monitoring, with domain-specific models and complex supply chain reasoning with foundation models.
Energy companies can deploy agents for grid operations, regulatory compliance, and infrastructure management. This approach supports near real-time demand forecasting with specialized models and complex environmental impact analysis with foundation models.
We showed how to build an agentic AI solution by integrating Hugging Face smolagents with AWS managed services. The solution demonstrates multi-model deployment options, vector-enhanced knowledge retrieval, and deployment capabilities so organizations can deploy domain-specific AI agents with AWS security features and compliance controls.
The healthcare use case illustrates how the model-agnostic design of smolagents supports deployment orchestration across Amazon SageMaker AI, Amazon Bedrock, and a containerized model server. Key technical innovations include the messages API compatibility across the backends, smolagents framework integration, and containerized deployment with AWS Fargate.
This solution architecture is extensible to financial services, manufacturing, energy, and other industries where domain-specific intelligence and reliability are critical.
To discuss a project and your requirements, and find the right help for your business needs, reach out to Hugging Face. If you have questions about getting started with AWS, speak with an AWS generative AI specialist to learn how we can help accelerate your business today.
Sanhita Sarkar, PhD, Global Partner Solutions, AI/ML, and generative AI at AWS. She drives AI/ML and generative AI partner solutions at AWS, with extensive leadership experience across edge, cloud, and data center environments. She holds several patents, has published research papers, and serves as chair for technical conferences.
Jeff Boudier, Head of Products at Hugging Face. Jeff was also a co-founder of Stupeflix, acquired by GoPro, where he served as director of product management, product marketing, business development, and corporate development.
Simon Pagezy, Partner Success Manager at Hugging Face. He leads partnerships at Hugging Face with major cloud and hardware companies. He works to make the Hugging Face offerings broadly accessible across diverse deployment environments.
Florent Gbelidji, Cloud Partnership Tech Lead for the AWS account, driving integrations between the Hugging Face offerings and AWS services. He has also been an ML Engineer in the Expert Acceleration Program where he helped companies build solutions with open-source AI.