This blog post is co-written with Gene Arnold from Alation.
To build a generative AI-based conversational application integrated with relevant data sources, an enterprise needs to invest time, money, and people. First, you would need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve data, rank the answers, and build a feature-rich web application. Additionally, you might need to hire and staff a large team to build, maintain, and manage such a system.
Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories, code, and enterprise systems. To do this, Amazon Q Business provides out-of-the-box native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q Business that helps to integrate and synchronize data from multiple repositories into one index. Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including ServiceNow, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more. For a full list of supported data source connectors, see Amazon Q Business connectors.
However, many organizations store relevant information in the form of unstructured data on company intranets or within file systems on corporate networks that are inaccessible to Amazon Q Business using its native data source connectors. You can now use the custom data source connector within Amazon Q Business to upload content to your index from a wider range of data sources.
Using an Amazon Q Business custom data source connector, you can gain insights into your organization’s third-party applications with the integration of generative AI and natural language processing. This post shows how to configure an Amazon Q Business custom connector and derive insights by creating a generative AI-powered conversation experience on AWS using Amazon Q Business while using access control lists (ACLs) to restrict access to documents based on user permissions.
Alation is a data intelligence company serving more than 600 global enterprises, including 40% of the Fortune 100. Customers rely on Alation to realize the value of their data and AI initiatives. Headquartered in Redwood City, California, Alation is an AWS Specialization Partner and AWS Marketplace Seller with Data and Analytics Competency. Organizations trust Alation’s platform for self-service analytics, cloud transformation, data governance, and AI-ready data, fostering innovation at scale. In this post, we will showcase a sample of how Alation’s business policies can be integrated with an Amazon Q Business application using a custom data source connector.
After you integrate Amazon Q Business with data sources such as Alation, users can ask questions from the description of the document. For example,
A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business offers multiple prebuilt data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration. However, if you have valuable data residing in locations that those prebuilt connectors cannot reach, you can use a custom connector.
When you connect Amazon Q Business to a data source and initiate the data synchronization process, Amazon Q Business crawls and adds documents from the data source to its index.
You would typically use an Amazon Q Business custom connector when you have a repository that Amazon Q Business doesn’t yet provide a data source connector for. Amazon Q Business only provides metric information that you can use to monitor your data source sync jobs. You must create and run the crawler that determines the documents your data source indexes. A simple architectural representation of the steps involved is shown in the following figure.

The solution shown here, which integrates Alation’s business policies, is for demonstration purposes only. We recommend running similar scripts only on your own data sources after consulting with the team that manages them, and following the terms of service for the sources that you’re trying to fetch data from. The steps involved for other custom data sources are very similar, except for the part where we connect to Alation and fetch data from it. To crawl and index contents in Alation, you configure an Amazon Q Business custom connector as a data source in your Amazon Q Business application.
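The custom connector flow (start a sync job, upload documents, stop the job) is the same for any source, so it can be sketched as one helper. This is a minimal sketch under stated assumptions, not the code used in this post: the names `chunked` and `sync_documents` are hypothetical, `client` is expected to be a `boto3.client('qbusiness')` instance, and the default `batch_size` of 10 is an assumption about the per-call document limit of `BatchPutDocument` that you should verify against the current API reference.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_documents(client, application_id, index_id, data_source_id, docs, batch_size=10):
    """Upload `docs` inside one sync job and return the documents that failed.

    `client` is assumed to be boto3.client('qbusiness'). The helper name and
    the `batch_size` default are illustrative, not part of the service API.
    """
    ids = {
        "applicationId": application_id,
        "indexId": index_id,
        "dataSourceId": data_source_id,
    }
    execution_id = client.start_data_source_sync_job(**ids)["executionId"]
    failed = []
    try:
        for batch in chunked(docs, batch_size):
            response = client.batch_put_document(
                applicationId=application_id,
                indexId=index_id,
                dataSourceSyncId=execution_id,
                documents=batch,
            )
            # Collect per-document failures reported by the service.
            failed.extend(response.get("failedDocuments", []))
    finally:
        # Always close the sync job, even if an upload raised an exception.
        client.stop_data_source_sync_job(**ids)
    return failed
```

Wrapping the uploads in `try`/`finally` is a design choice: it keeps a failed upload from leaving the sync job open, which would otherwise block the next run.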
For this walkthrough, you should have the following prerequisites:
https://[[your-domain]].alationcloud.com/admin/auth/ and see the OAuth Client Applications. Alation admins can navigate to https://[[your-domain]].alationcloud.com/admin/users/ and change user access if needed.
StartDataSourceSyncJob, BatchPutDocument, and StopDataSourceSyncJob permissions) and the AWS Secrets Manager secret (GetSecretValue). Additionally, it’s recommended that the policy restrict access to only the Amazon Q Business application Amazon Resource Name (ARN) and the Secrets Manager secret created in the following steps.
In your Alation cloud account, create an OAuth2 client application that can be consumed from an Amazon Q Business application.
https://[[your-domain]].alationcloud.com/admin/auth/).



Client_Id and Client_Secret values copied from the previous step.
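Storing the two values as a JSON object lets the loader parse the secret with `json.loads` rather than evaluating it as code. The following is a minimal sketch, assuming the key names `Client_Id` and `Client_Secret` used in this post. The helper names are hypothetical, and `secrets_manager_client` is assumed to be a `boto3.client('secretsmanager')` instance.

```python
import json


def build_secret_string(client_id, client_secret):
    """Serialize the Alation OAuth client credentials as a JSON secret value."""
    return json.dumps({"Client_Id": client_id, "Client_Secret": client_secret})


def parse_secret_string(secret_string):
    """Parse a secret value and check that both expected keys are present."""
    secret = json.loads(secret_string)
    missing = {"Client_Id", "Client_Secret"} - set(secret)
    if missing:
        raise KeyError(f"secret is missing keys: {sorted(missing)}")
    return secret


def store_secret(secrets_manager_client, name, client_id, client_secret):
    """Create the secret; the client is assumed to be boto3.client('secretsmanager')."""
    return secrets_manager_client.create_secret(
        Name=name,
        SecretString=build_secret_string(client_id, client_secret),
    )
```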



In our example, you would create three different sets of Alation policies for a fictional organization named Unicorn Rentals. Grouped as Workplace, HR, and Regulatory, each policy contains a roughly two-page summary of crucial organizational items of interest. You can find details on how to create policies in the Alation documentation.

On the Amazon Q Business side, let’s assume that we want to ensure that the following access policies are enforced. Users and access are set up through the code illustrated in later sections.
| # | First name | Last name | Policies authorized for access |
| 1 | Alejandro | Rosalez | Workplace, HR, and Regulatory |
| 2 | Sofia | Martinez | Workplace and HR |
| 3 | Diego | Ramirez | Workplace and Regulatory |
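The table above can be turned into per-policy principal lists programmatically instead of maintaining one list per policy by hand. This is a minimal sketch: the email addresses are hypothetical placeholders, the function name `principals_by_policy` is an assumption, and the principal dictionaries follow the `user` / `access` / `membershipType` shape used by the loader code in this post.

```python
# User -> policies they may read, mirroring the access table.
# The email addresses are hypothetical placeholders.
USER_POLICIES = {
    "alejandro_rosalez@example.com": ["Workplace", "HR", "Regulatory"],
    "sofia_martinez@example.com": ["Workplace", "HR"],
    "diego_ramirez@example.com": ["Workplace", "Regulatory"],
}


def principals_by_policy(user_policies):
    """Invert user -> policies into policy -> list of ALLOW principals."""
    result = {}
    for user_id, policies in user_policies.items():
        for policy in policies:
            result.setdefault(policy, []).append(
                {"user": {"id": user_id, "access": "ALLOW", "membershipType": "DATASOURCE"}}
            )
    return result


if __name__ == "__main__":
    for policy, principals in principals_by_policy(USER_POLICIES).items():
        print(policy, [p["user"]["id"] for p in principals])
```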














Now let’s load the Alation data into Amazon Q Business using the correct access permissions. The code examples that follow are also available on the accompanying GitHub code repository.
import hashlib
import json
import time

import boto3
import requests
from botocore.exceptions import ClientError
from requests.auth import HTTPBasicAuth

# Retrieve the Alation OAuth client credentials from AWS Secrets Manager.
secrets_manager_client = boto3.client('secretsmanager')
secret_name = "alation_test"
try:
    get_secret_value_response = secrets_manager_client.get_secret_value(
        SecretId=secret_name
    )
    # Parse the secret as JSON instead of calling eval() on retrieved data.
    secret = json.loads(get_secret_value_response['SecretString'])
except ClientError as e:
    raise e

# Alation OAuth endpoints.
base_url = "https://[[your-domain]].alationcloud.com"
token_url = "/oauth/v2/token/"
introspect_url = "/oauth/v2/introspect/"
jwks_url = "/oauth/v2/.well-known/jwks.json/"

# Request an access token using the client credentials grant.
api_url = base_url + token_url
data = {
    "grant_type": "client_credentials",
}
client_id = secret['Client_Id']
client_secret = secret['Client_Secret']
auth = HTTPBasicAuth(username=client_id, password=client_secret)
response = requests.post(url=api_url, data=data, auth=auth)
print(response.json())
access_token = response.json().get('access_token', '')

# Introspect the token to verify that it is valid.
api_url = base_url + introspect_url + "?verify_token=true"
data = {
    "token": access_token,
}
response = requests.post(url=api_url, data=data, auth=auth)
# Build the ACL principal list for each policy and a de-duplicated list of all users.
primary_principal_list = []
workplace_policy_principals = []
hr_policy_principals = []
regulatory_policy_principals = []
principal_user_email_ids = ['[email protected]', '[email protected]', '[email protected]']
workplace_policy_email_ids = ['[email protected]', '[email protected]', '[email protected]']
hr_policy_email_ids = ['[email protected]', '[email protected]']
regulatory_policy_email_ids = ['[email protected]', '[email protected]']
for workplace_policy_member in workplace_policy_email_ids:
    workplace_policy_members_dict = { 'user': { 'id': workplace_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
    workplace_policy_principals.append(workplace_policy_members_dict)
    if workplace_policy_member not in primary_principal_list:
        primary_principal_list.append(workplace_policy_member)
for hr_policy_member in hr_policy_email_ids:
    hr_policy_members_dict = { 'user': { 'id': hr_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
    hr_policy_principals.append(hr_policy_members_dict)
    if hr_policy_member not in primary_principal_list:
        primary_principal_list.append(hr_policy_member)
for regulatory_policy_member in regulatory_policy_email_ids:
    regulatory_policy_members_dict = { 'user': { 'id': regulatory_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
    regulatory_policy_principals.append(regulatory_policy_members_dict)
    if regulatory_policy_member not in primary_principal_list:
        primary_principal_list.append(regulatory_policy_member)
url = "https://[[your-domain]].com/integration/v1/business_policies/?limit=200&skip=0&search=[[Workplace/HR/Regulatory]]&deleted=false"
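import re

def cleanhtml(raw_html):
    # Assumed minimal helper (not shown in this post): strip HTML tags from Alation rich text.
    return re.sub(r'<[^>]+>', '', raw_html)

headers = {
    "accept": "application/json",
    "TOKEN": access_token
}
response = requests.get(url, headers=headers)

# Concatenate the title and description of each policy into one text document.
policy_data = ""
for policy in json.loads(response.text):
    if policy["title"] is not None:
        policy_title = cleanhtml(policy["title"])
    else:
        policy_title = "None"
    if policy["description"] is not None:
        policy_description = cleanhtml(policy["description"])
    else:
        policy_description = "None"
    temp_data = policy_title + ":\n" + policy_description + "\n\n"
    policy_data += temp_data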
qbusiness_client = boto3.client('qbusiness')
application_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
index_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
data_source_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Create an Amazon Q Business user, with a data source alias, for each principal.
for principal in primary_principal_list:
    create_user_response = qbusiness_client.create_user(
        applicationId=application_id,
        userId=principal,
        userAliases=[
            {
                'indexId': index_id,
                'dataSourceId': data_source_id,
                'userId': principal
            },
        ],
    )

# Verify that each user was created with the data source alias.
for principal in primary_principal_list:
    get_user_response = qbusiness_client.get_user(
        applicationId=application_id,
        userId=principal
    )
    for user_alias in get_user_response['userAliases']:
        if "dataSourceId" in user_alias:
            print(user_alias['userId'])

# Start the data source sync job and keep its execution ID for the upload.
start_data_source_sync_job_response = qbusiness_client.start_data_source_sync_job(
    dataSourceId=data_source_id,
    indexId=index_id,
    applicationId=application_id
)
job_execution_id = start_data_source_sync_job_response['executionId']
# Derive a deterministic document ID from the policy text.
workplace_policy_document_id = hashlib.shake_256(policy_data.encode('utf-8')).hexdigest(128)
docs = [
    {
        "id": workplace_policy_document_id,
        "content": {
            'blob': policy_data.encode('utf-8')
        },
        "contentType": "PLAIN_TEXT",
        "title": "Unicorn Rentals – Workplace/HR/Regulatory Policy",
        # Use the principal list that matches the policy being loaded:
        # workplace_policy_principals, hr_policy_principals, or regulatory_policy_principals.
        "accessConfiguration": {'accessControls': [{'principals': workplace_policy_principals}]}
    }
]

# Upload the document as part of the running sync job.
batch_put_document_response = qbusiness_client.batch_put_document(
    applicationId=application_id,
    indexId=index_id,
    dataSourceSyncId=job_execution_id,
    documents=docs,
)

# Stop the sync job after the upload is submitted.
stop_data_source_sync_job_response = qbusiness_client.stop_data_source_sync_job(
    dataSourceId=data_source_id,
    indexId=index_id,
    applicationId=application_id
)
# Poll for up to one hour until the document reaches a final indexing status.
max_time = time.time() + 1*60*60
found = False
try:
    while time.time() < max_time and not found:
        list_documents_response = qbusiness_client.list_documents(
            applicationId=application_id,
            indexId=index_id
        )
        if list_documents_response:
            for document in list_documents_response["documentDetailList"]:
                if document["documentId"] == workplace_policy_document_id:
                    status = document["status"]
                    print(status)
                    if status in ("INDEXED", "FAILED", "DOCUMENT_FAILED_TO_INDEX", "UPDATED"):
                        found = True
        if not found:
            # Wait before polling again, including when the document is not listed yet.
            time.sleep(10)
except Exception as e:
    print("Exception when calling API:", e)
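When an index holds more documents than one `list_documents` call returns, the document being checked may sit on a later page. The following is a minimal sketch of a status check that follows `nextToken` across pages. The function names, the injectable `sleep` parameter, and the default timings are assumptions for illustration, and `client` is expected to be a `boto3.client('qbusiness')` instance.

```python
import time

# Statuses treated as final, matching the ones checked in this post.
TERMINAL_STATUSES = {"INDEXED", "UPDATED", "FAILED", "DOCUMENT_FAILED_TO_INDEX"}


def find_document_status(client, application_id, index_id, document_id):
    """Return the status of `document_id`, following nextToken pages, or None."""
    kwargs = {"applicationId": application_id, "indexId": index_id}
    while True:
        page = client.list_documents(**kwargs)
        for document in page.get("documentDetailList", []):
            if document["documentId"] == document_id:
                return document["status"]
        next_token = page.get("nextToken")
        if not next_token:
            return None
        kwargs["nextToken"] = next_token


def wait_for_document(client, application_id, index_id, document_id,
                      timeout_seconds=3600, poll_seconds=10, sleep=time.sleep):
    """Poll until the document reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = find_document_status(client, application_id, index_id, document_id)
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    return None
```

A return value of `None` from `wait_for_document` means the timeout expired before the document reached a final status.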




Now that the data synchronization is complete, you can start exploring insights from Amazon Q Business. With the newly created Amazon Q Business application, select the Web Application settings tab and navigate to the auto-created URL. This will open a new tab with a preview of the user interface and options that you can customize to fit your use case.








If you’re unable to get answers to any of your questions and get the message “Sorry, I could not find relevant information to complete your request,” check to see if any of the following issues apply:
If none of the above apply, open a support case to get this resolved.
To avoid incurring future charges, clean up any resources created as part of this solution. Delete the Amazon Q Business custom connector data source and client application created in Alation and the Amazon Q Business application. Next, delete the Secrets Manager secret with Alation OAuth client application credential data. Also, delete the user management setup in IAM Identity Center and the SageMaker Studio domain.
In this post, we discussed how to configure the Amazon Q Business custom connector to crawl and index business policies from Alation as a sample. We showed how you can use Amazon Q Business generative AI-based search to enable your business leaders and agents to discover insights from your enterprise data.
To learn more about the Amazon Q Business custom connector, see the Amazon Q Business developer guide. Alation Data Catalog is available for purchase through AWS Marketplace. Speak to your Alation account representative for custom purchase options. For any additional information, contact your Alation business partner.
Alation is an AWS Specialization Partner that has pioneered the modern data catalog and is making the leap into a full-service source for data intelligence. Alation is passionate about helping enterprises create thriving data cultures where anyone can find, understand, and trust data.
Gene Arnold is a Product Architect with Alation’s Forward Deployed Engineering team. A curious learner with over 25 years of experience, Gene focuses on how to sharpen selling skills and constantly explores new product lines.
Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds eight AWS and seven other professional certifications. With over 21 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.
Sindhu Jambunathan is a Senior Solutions Architect at AWS, specializing in supporting ISV customers in the data and generative AI vertical to build scalable, reliable, secure, and cost-effective solutions on AWS. With over 13 years of industry experience, she joined AWS in May 2021 after a successful tenure as a Senior Software Engineer at Microsoft. Sindhu’s diverse background includes engineering roles at Qualcomm and Rockwell Collins, complemented by a Master’s of Science in Computer Engineering from the University of Florida. Her technical expertise is balanced by a passion for culinary exploration, travel, and outdoor activities.
Prateek Jain is a Sr. Solutions Architect with AWS, based out of Atlanta, Georgia. He is passionate about GenAI and helping customers build amazing solutions on AWS. In his free time, he enjoys spending time with family and playing tennis.