This post is cowritten with Remi Louf, CEO and technical founder of Dottxt.
Structured output in AI applications refers to AI-generated responses that conform to predefined, validated, and often strictly typed formats. This can include a schema for the output, or rules for how specific fields in the output should be typed and mapped. Structured outputs are essential for applications that require consistency, validation, and seamless integration with downstream systems. For example, banking loan approval systems must generate JSON outputs with strict field validation, healthcare systems need to validate patient data formats and enforce medication dosage constraints, and ecommerce systems require standardized invoice generation for their accounting systems.
This post explores .txt's Outlines framework as a practical approach to generating structured outputs, using a model deployed from AWS Marketplace on Amazon SageMaker.
Structured outputs elevate generative AI from ad hoc text generation to dependable business infrastructure, enabling precise data exchange, automated decisioning, and end-to-end workflows across high-stakes, integration-heavy environments. By enforcing schemas and predictable formats, they unlock use cases where accuracy, traceability, and interoperability are non-negotiable, from financial reporting and healthcare operations to ecommerce logistics and enterprise workflow automation. This section explores where structured outputs create the most value and how they translate directly into reduced errors, lower operational risk, and measurable ROI.
The category of structured output combines multiple types of requirements for how models should produce outputs that follow specific constraint mechanisms. The following are examples of constraint mechanisms:
- Type validation: making sure fields such as transaction_id (string), amount (float), and timestamp (datetime) are present and correctly typed.
- Enumerated values: using enum to force models to select from fixed options, such as categorizing instruments as Percussion, String, Woodwind, Brass, or Keyboard, which removes arbitrary category generation.

In modern applications, AI models are integrated with non-AI processing and business systems. These integrations and junction points require consistency, type safety, and machine readability, because parsing ambiguities or format deviations would break workflows. Interoperability between LLMs and infrastructure components is critical across many common architectural patterns.
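As an illustration, the two constraint mechanisms above can be captured in a single JSON Schema. The field names and category values below are hypothetical examples assembled from this section, not from any specific system:

```python
# Illustrative JSON Schema combining typed required fields and an enum.
# Field names and categories are hypothetical examples, not a real API contract.
transaction_schema = {
    "type": "object",
    "properties": {
        "transaction_id": {"type": "string"},
        "amount": {"type": "number"},
        "timestamp": {"type": "string", "format": "date-time"},
        "category": {
            "type": "string",
            "enum": ["Percussion", "String", "Woodwind", "Brass", "Keyboard"],
        },
    },
    "required": ["transaction_id", "amount", "timestamp"],
}

# A minimal hand-rolled check; a real system would use a full JSON Schema validator.
def has_required_fields(record: dict) -> bool:
    return all(key in record for key in transaction_schema["required"])

record = {"transaction_id": "t-1", "amount": 9.99, "timestamp": "2025-01-27T10:00:00Z"}
print(has_required_fields(record))  # True
```

A schema like this can be handed to a validation library, or, as shown later in this post, passed directly to a model endpoint that enforces it during generation.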
Across high-stakes, integration-heavy domains, structured outputs transform generative models from flexible text engines into reliable business infrastructure. The common thread is operational complexity, integration requirements, and risk sensitivity: predictability, auditability, and system interoperability drive measurable ROI through reduced errors, faster processing, and seamless automation.
Structured output can be achieved in several ways. Most frameworks focus, at their core, on validation: checking whether the output adheres to the requested rules and requirements. If the output doesn't conform, the framework requests a new output and keeps iterating until the model produces the requested structure.
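This validate-and-retry loop can be sketched as follows, with a stand-in generate() function in place of a real model call; all names here are hypothetical:

```python
import json

def generate(prompt: str, attempt: int) -> str:
    # Stand-in for a real LLM call; returns malformed output on the first try
    # to simulate a model that needs a retry.
    outputs = ['{"amount": "lots"}', '{"amount": 12.5}']
    return outputs[min(attempt, len(outputs) - 1)]

def is_valid(text: str) -> bool:
    # Post-generation validation: output must be JSON with a numeric "amount" field.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data.get("amount"), (int, float))

def generate_with_retries(prompt: str, max_attempts: int = 3) -> dict:
    for attempt in range(max_attempts):
        text = generate(prompt, attempt)
        if is_valid(text):
            return json.loads(text)
    raise ValueError("no valid output within retry budget")

print(generate_with_retries("Extract the amount"))  # {'amount': 12.5}
```

Each failed attempt costs a full model call, which is exactly the overhead that the generation-time approach described next avoids.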
Outlines offers an advanced approach called generation-time validation: validation happens as the model produces tokens, shifting it to early in the generation process instead of after completion. While Outlines is not integrated with Amazon Bedrock, understanding it provides insight into cutting-edge structured output techniques that inform hybrid implementation strategies.
Outlines, developed by the .txt team, is a Python library designed to bring deterministic structure and reliability to language model outputs—addressing a key challenge in deploying LLMs for production applications. Unlike traditional free-form generation, developers can use Outlines to enforce strict output formats and constraints during generation, not just after the fact. This approach makes it possible to use LLMs for tasks where accuracy, predictability, and integration with downstream systems are required.
Outlines enforces constraints through three main mechanisms: regular expressions compiled into finite-state machines, JSON schemas (including Pydantic models), and context-free grammars.
During generation, Outlines follows a precise workflow: at each step, it determines which tokens can legally extend the output under the target structure and masks out the invalid ones before sampling.
For example, with a pattern like ^\d*(\.\d+)?$ for decimal numbers, Outlines converts the expression into an automaton that only allows valid numeric sequences to be generated. If 748 has been generated, the system knows the only valid next tokens are another digit, a decimal point, or the end-of-sequence token.
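The automaton idea can be sketched as a small hand-written DFA for that decimal pattern. This is an illustration of the concept, not Outlines' internal implementation:

```python
# Hand-written DFA for ^\d*(\.\d+)?$ - illustrative only, not Outlines internals.
DIGITS = set("0123456789")

# States: "int" = digits before the decimal point (also the start state),
#         "dot" = just consumed '.', "frac" = digits after the point.
TRANSITIONS = {
    "int": {**{d: "int" for d in DIGITS}, ".": "dot"},
    "dot": {d: "frac" for d in DIGITS},
    "frac": {d: "frac" for d in DIGITS},
}
ACCEPTING = {"int", "frac"}  # "int" accepts "748"; "frac" accepts "3.14"

def allowed_next(prefix: str) -> set:
    """Characters (or end-of-sequence) that keep the prefix on a valid path."""
    state = "int"
    for ch in prefix:
        state = TRANSITIONS[state][ch]
    options = set(TRANSITIONS[state])
    if state in ACCEPTING:
        options.add("<eos>")
    return options

print(sorted(allowed_next("748")))  # the ten digits, '.', and '<eos>'
```

After "748" the DFA permits another digit, a decimal point, or end-of-sequence; after "748." only digits are allowed, since ".": the fractional part requires at least one digit. At generation time, such a mask is applied to the model's token logits so invalid tokens can never be sampled.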
Enforcing structured output during generation offers significant advantages for reliability and performance in production environments: it guarantees that the output's structure is valid and can significantly improve performance, because invalid tokens are never sampled and post-hoc retries are unnecessary.
Outlines can be seamlessly integrated into existing Python workflows:
from pydantic import BaseModel

from outlines import models, generate

# Define your data structure
class Patient(BaseModel):
    id: int
    name: str
    diagnosis: str
    age: int

# Load model and create structured generator
model = models.transformers("microsoft/DialoGPT-medium")
generator = generate.json(model, Patient)

# Generate structured output
prompt = "Create a patient record for John Smith, 45, with diabetes"
result = generator(prompt)  # Returns valid Patient instance
print(result.name)  # "John Smith"
print(result.age)   # 45
For more complex schemas:
from datetime import datetime
from enum import Enum

from pydantic import BaseModel

class Status(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    PENDING = "pending"

class User(BaseModel):
    username: str
    email: str
    status: Status
    created_at: datetime

# Generator enforces enum values and datetime format
# (reuses model and generate from the previous example)
user_generator = generate.json(model, User)
You can generate structured output with .txt's Amazon SageMaker real-time inference solution by deploying one of .txt's packaged models, such as DeepSeek-R1-Distill-Qwen-32B, through AWS Marketplace. The following code assumes that you have already deployed an endpoint in your AWS account.
A Jupyter Notebook that walks through deploying the endpoint end-to-end is available in the product repository.
import json
import boto3
# Set this based on your SageMaker endpoint
endpoint_name = "dotjson-with-DeepSeek-R1-Distill-Qwen-32B"
session = boto3.Session()
structured_data = {
"patient_id": 12345,
"first": "John",
"last": "Adams",
"appointment_date": "2025-01-27",
"notes": "Patient presented with a headache and sore throat",
}
payload = {
"messages": [
{
"role": "system",
"content": "You are a helpful, honest, and concise assistant.",
},
{
"role": "user",
"content": f"Create a medical record from the following visit data: {structured_data}",
},
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "Medical Record",
"schema": {
"properties": {
"patient_id": {"title": "Patient Id", "type": "integer"},
"date": {"title": "Date", "type": "string", "format": "date-time"},
"diagnosis": {"title": "Diagnosis", "type": "string"},
"treatment": {"title": "Treatment", "type": "string"},
},
"required": ["patient_id", "diagnosis", "treatment"],
"title": "MedicalRecord",
"type": "object",
},
        },
    },
    "max_tokens": 1000,
}
runtime = session.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="application/json",
Accept="application/json",
Body=json.dumps(payload).encode(),
)
body = json.loads(response["Body"].read().decode("utf-8"))
# View the structured output produced by the model
msg = body["choices"][0]["message"]
content = msg["content"]
medical_record = json.loads(content)
medical_record
This generation-time approach removes the need for the retries that post-generation validation requires.
While Outlines offers generation-time consistency, several other approaches provide structured outputs with different trade-offs:
When using most modern LLMs, such as Amazon Nova, users can define output schemas directly in prompts or tool definitions, supporting type constraints, enumerations, and structured templates within the AWS environment. The following guide shows different prompting patterns for Amazon Nova.
# Example: structured output with Amazon Nova via tool use
import json

import boto3

bedrock = boto3.client('bedrock-runtime')
response = bedrock.invoke_model(
    modelId='amazon.nova-pro-v1:0',
    body=json.dumps({
        "messages": [{"role": "user", "content": [{"text": "Extract customer info from this text..."}]}],
"inferenceConfig": {"maxTokens": 500},
"toolConfig": {
"tools": [{
"toolSpec": {
"name": "extract_customer",
"inputSchema": {
"json": {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"},
"phone": {"type": "string"}
},
"required": ["name", "email"]
}
}
}
}]
}
})
)
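When the call returns, the structured arguments arrive in a toolUse block of the response. The following sketch extracts them from a mocked response body shaped like Nova's documented tool-use output; verify the exact field names against the current Nova documentation before relying on them:

```python
# Mocked response body illustrating the tool-use output shape; a real call
# would instead parse the endpoint response, e.g. json.loads(response["body"].read()).
body = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "toolUse": {
                        "toolUseId": "tooluse_1",
                        "name": "extract_customer",
                        "input": {"name": "Jane Doe", "email": "jane@example.com"},
                    }
                }
            ],
        }
    }
}

def extract_tool_input(body: dict, tool_name: str) -> dict:
    # Scan the assistant message for the named tool's structured arguments.
    for block in body["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use["name"] == tool_name:
            return tool_use["input"]
    raise KeyError(f"no toolUse block for {tool_name}")

customer = extract_tool_input(body, "extract_customer")
print(customer["email"])  # jane@example.com
```

Because the model fills in tool arguments rather than free text, the extracted dictionary already matches the inputSchema declared in the request.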
Post-generation validation open-source frameworks have emerged as a critical layer in modern generative AI systems, providing structured, repeatable mechanisms to evaluate and govern model outputs before they are consumed by users or downstream applications. By separating generation from validation, these frameworks enable teams to enforce safety, quality, and policy constraints without constantly retraining or fine-tuning underlying models.
Language Models Query Language (LMQL) has a SQL-like interface and provides a query language for LLMs, so that developers can specify constraints, type requirements, and value ranges directly in prompts. It is particularly effective for multiple-choice and type constraints.
Instructor provides retry mechanisms by wrapping LLM outputs with schema validation and automatic retry mechanisms. Tight integration with Pydantic models makes it suitable for scenarios where post-generation validation and correction are acceptable.
import boto3
import instructor
from pydantic import BaseModel
# Create a Bedrock client for runtime interactions
bedrock_client = boto3.client('bedrock-runtime')
# Set up the instructor client with Bedrock runtime
client = instructor.from_bedrock(bedrock_client)
# Define the structured response model
class User(BaseModel):
    name: str
    age: int
# Invoke the Claude Haiku model with the correct message structure
user = client.chat.completions.create(
modelId="global.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[
{"role": "user", "content": [{"text": "Extract: Jason is 25 years old"}]},
],
response_model=User,
)
print(user)
# Expected output:
# name='Jason' age=25
Guidance offers fine-grained template-driven control over output structure and formatting, allowing token-level constraints. Useful for consistent response formatting and conversational flows.
Selecting the right structured output approach depends on several key factors that directly impact implementation complexity and system performance.
Organizations can use the structured output paradigm in AI to reliably enforce schemas, integrate with a wide range of models and APIs, and balance post-generation validation versus direct generation methods for greater control and consistency. By understanding the trade-offs in performance, integration complexity, and schema enforcement, builders can tailor solutions to their technical and business requirements, facilitating scalable and efficient automation across diverse applications.
To learn more about implementing structured outputs with LLMs on AWS, explore the resources linked throughout this post.