Introduction
Small language models (SLMs) often take a back seat to their larger GenAI counterparts, but I’ve been pondering how they could be used in practical, everyday scenarios like AWS account management. While it’s true that GenAI is the industry’s hype word, applying these tools to familiar problems can offer an excellent framework for learning and innovation. That’s exactly what I aim to focus on in this post.
The goal? To enable the Bedrock Agent to handle queries like:
"Get July's costs for Bill's AWS accounts." or
"List all prod accounts"
Here’s how it works: The agent will first fetch all AWS accounts where Bill is tagged as the owner using the Organizations API. It will then use the Cost Explorer API to perform cost analysis for each of those accounts. By orchestrating these steps, the Bedrock Agent acts as a practical assistant for AWS account management tasks.
In the next section, we’ll take a deeper dive into both of these AWS APIs to understand how they work together to achieve our goal.
AWS Organizations API
AWS Organizations Tags offer a powerful way to structure and manage AWS accounts by attaching metadata in the form of tags. These tags can be used to categorize accounts and answer queries like, “Which department owns this account?” or “What environment does this account belong to?”
By leveraging tags such as `Owner` and `Environment`, you can create a simple framework for managing accounts. In this POC, these tags play a central role in enabling the Bedrock Agent to fetch additional information about AWS accounts. That metadata is then translated into AWS account IDs, which are essential for the next step: cost analysis. For more details, see the AWS Organizations Tagging documentation.
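A minimal sketch of that first step, assuming boto3 and permissions for organizations:ListAccounts and organizations:ListTagsForResource (run from the management account or a delegated administrator). The helper names are hypothetical, not taken from the author's repository; the sketch only shows how tag metadata becomes the account IDs the cost query needs.

```python
from typing import Any, Dict, List, Optional


def tags_to_dict(tag_list: List[Dict[str, str]]) -> Dict[str, str]:
    """Convert the Organizations [{'Key': k, 'Value': v}] tag format to a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}


def fetch_accounts_with_tags(client: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Return [{'Id', 'Name', 'Tags'}] for every account in the organization."""
    if client is None:
        import boto3  # lazy import: the pure helpers in this file work without boto3

        client = boto3.client("organizations")

    accounts = []
    for page in client.get_paginator("list_accounts").paginate():
        for account in page["Accounts"]:
            tags: Dict[str, str] = {}
            kwargs: Dict[str, str] = {"ResourceId": account["Id"]}
            while True:  # ListTagsForResource also pages its results via NextToken
                resp = client.list_tags_for_resource(**kwargs)
                tags.update(tags_to_dict(resp["Tags"]))
                if not resp.get("NextToken"):
                    break
                kwargs["NextToken"] = resp["NextToken"]
            accounts.append({"Id": account["Id"], "Name": account["Name"], "Tags": tags})
    return accounts


def filter_accounts_by_tag(accounts: List[Dict[str, Any]], key: str, value: str) -> List[str]:
    """Return the IDs of accounts whose tag `key` equals `value` (case-insensitive)."""
    return [
        a["Id"] for a in accounts
        if a["Tags"].get(key, "").lower() == value.lower()
    ]
```

For example, `filter_accounts_by_tag(fetch_accounts_with_tags(), "Owner", "bill@example.com")` would yield the account IDs to pass on to Cost Explorer (the email is a placeholder).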
Cost Explorer API
For cost analysis, we’re keeping things simple. The Bedrock Agent queries the Cost Explorer API for account costs over a specified period. While this method is far from comprehensive—anyone familiar with AWS cost management knows there’s much more nuance—it serves as a solid starting point.
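As a rough illustration of such a query, this sketch builds the arguments for Cost Explorer's get_cost_and_usage call, filtered to specific linked accounts and grouped per account. The function names and the choice of UnblendedCost are assumptions for illustration; the author's Lambda function may request different metrics.

```python
from typing import Any, Dict, List, Optional


def build_cost_request(account_ids: List[str], start: str, end: str) -> Dict[str, Any]:
    """Build kwargs for get_cost_and_usage: one monthly cost per linked account.

    `start` is inclusive and `end` is exclusive, both in YYYY-MM-DD format.
    """
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        # Restrict the query to the accounts found via Organizations tags
        "Filter": {"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": account_ids}},
        # Break the result down per account instead of returning one total
        "GroupBy": [{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
    }


def get_costs(account_ids: List[str], start: str, end: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Call Cost Explorer (ignores NextPageToken pagination for brevity)."""
    if client is None:
        import boto3  # lazy import so build_cost_request can be used on its own

        client = boto3.client("ce")
    return client.get_cost_and_usage(**build_cost_request(account_ids, start, end))
```

Keep in mind that Cost Explorer API requests are billed per request, so an agent that queries accounts one by one adds a small cost to every question.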
One particularly cool aspect of using a language model interface is its ability to interpret natural language into precise API parameters. For example, the Cost Explorer API expects a "Start date for filtering costs in YYYY-MM-DD format." When a query like "last month's costs" is submitted, the model intelligently converts "last month" into the correct date range and passes it as a parameter to the API. This highlights the value of combining language models with AWS APIs to streamline complex workflows.
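To make that conversion concrete, here is the date arithmetic the model has to get right for "last month", written as a deterministic Python sketch. It assumes today's date is known and that the end date is exclusive, as it is in Cost Explorer's TimePeriod; `last_month_range` is a hypothetical helper name.

```python
from datetime import date, timedelta
from typing import Tuple


def last_month_range(today: date) -> Tuple[str, str]:
    """Return (start, end) for the previous calendar month in YYYY-MM-DD format.

    The end date is exclusive, so the range stops at the first day of the
    current month rather than the last day of the previous one.
    """
    first_of_this_month = today.replace(day=1)
    last_day_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_day_prev.replace(day=1)
    return first_of_prev.isoformat(), first_of_this_month.isoformat()


print(last_month_range(date(2024, 1, 15)))  # ('2023-12-01', '2024-01-01')
```

The year rollover in January is exactly the kind of edge case worth spot-checking when the model does this conversion on its own.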
About Foundation Models
The first version of this solution came to life in the fall of 2024. Initially, I used Claude 3.5 Haiku and Sonnet models. While they delivered excellent performance, it felt like overkill for the small, straightforward prompts in this project. So, when Amazon introduced the Nova models at re:Invent 2024, I jumped at the chance to see how well they’d perform for this use case. For this demo, I opted to use the Amazon Nova Lite model, which proved to be a solid fit.
However, Nova Lite's smaller size did reveal a few limitations:
Accuracy Issues: When validating cost analysis results (e.g., "All dev accounts' costs in November") using a basic calculator, the numbers often didn’t match. Even for simpler queries like "Get account count by environment," the model sometimes returned totals that exceeded the actual number of accounts in the mock dataset.
Token Limitations: Nova Lite has a maximum output of 5,000 tokens. For queries involving tens of accounts with metadata, the responses often exceeded this limit, causing the output to stop mid-sentence. While you can prompt the agent to continue with commands like "go on," this disrupts the workflow.
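The "basic calculator" check from the first point can be scripted so the model's answers are compared against ground truth. A minimal sketch, assuming a get_cost_and_usage response grouped by LINKED_ACCOUNT and accounts shaped like {'Tags': {...}}; the helper names are hypothetical. Decimal is used so currency amounts don't pick up float rounding errors.

```python
from collections import Counter
from decimal import Decimal
from typing import Any, Dict, List


def total_cost(response: Dict[str, Any], metric: str = "UnblendedCost") -> Decimal:
    """Sum the per-account amounts in a grouped get_cost_and_usage response."""
    total = Decimal("0")
    for period in response["ResultsByTime"]:
        for group in period.get("Groups", []):
            total += Decimal(group["Metrics"][metric]["Amount"])
    return total


def count_by_tag(accounts: List[Dict[str, Any]], key: str) -> Dict[str, int]:
    """Count accounts per tag value (e.g. per Environment); untagged ones go to '(none)'."""
    return dict(Counter(a["Tags"].get(key, "(none)") for a in accounts))
```

If the model reports more accounts per environment than `count_by_tag` returns, the discrepancy comes from the model, not from the data.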
That said, Nova Lite's speed is a standout feature. For short prompts, the quick response time makes for a snappy and efficient experience. It also integrates seamlessly with Bedrock Agents and Lambda functions, making it an excellent choice for building lightweight solutions.
For organizations with hundreds of accounts or more complex queries, however, using a larger model would make more sense, as the data volume and accuracy demands increase.
Overview of the Solution
Architecture
This workflow diagram demonstrates how a user's query is processed and translated into API calls.
invoke_agent.py CLI
To interact with the Bedrock Agent, I created a simple Python script called invoke_agent.py. This script serves as a command-line interface, making it easy to submit queries to the agent. Additionally, it prints the input and output token counts for each query, offering insights into the efficiency and resource usage of the interactions.
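A minimal sketch of how such a script might read the agent's streamed answer and tally tokens, assuming boto3's bedrock-agent-runtime client with enableTrace=True. The trace path used for token usage (orchestrationTrace → modelInvocationOutput → metadata → usage) is an assumption based on the InvokeAgent trace format and is read defensively with .get(); this is not the author's actual invoke_agent.py.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def read_completion(events: Iterable[Dict[str, Any]]) -> Tuple[str, int, int]:
    """Join the answer chunks and sum token usage reported in orchestration traces."""
    text_parts = []
    input_tokens = output_tokens = 0
    for event in events:
        if "chunk" in event:
            text_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            # Only some trace events carry usage; missing keys count as zero
            usage = (
                event["trace"].get("trace", {})
                .get("orchestrationTrace", {})
                .get("modelInvocationOutput", {})
                .get("metadata", {})
                .get("usage", {})
            )
            input_tokens += usage.get("inputTokens", 0)
            output_tokens += usage.get("outputTokens", 0)
    return "".join(text_parts), input_tokens, output_tokens


def ask_agent(question: str, agent_id: str, alias_id: str, session_id: str,
              client: Optional[Any] = None) -> Tuple[str, int, int]:
    if client is None:
        import boto3  # lazy import so read_completion works without boto3

        client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        inputText=question,
        enableTrace=True,  # required for the trace events that carry token usage
    )
    return read_completion(response["completion"])
```

During development, `alias_id` can be the built-in test alias `TSTALIASID`, and reusing the same `session_id` keeps follow-up questions such as "go on" in the same conversation.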
Implementation Walkthrough
If you’d like to follow along, all source code is available in this GitHub repository: https://github.com/markymarkus/bedrock-agent-accounts.
To use this solution, you’ll also need access to the amazon.nova-lite-v1:0 Bedrock foundation model. At the time of writing, Amazon Nova models are only available in the us-east-1 region, so be sure to deploy your resources there.
Using Mock vs. Real Data
In my private life, I only manage a handful of accounts—and I’d rather not share their details publicly. To address this, I developed functions to generate mock account and billing data. For this POC, I worked with a simulated AWS Organization containing 30 accounts, each enriched with mock metadata and billing information to provide a realistic testing environment.
By default, this solution uses mock data for account and cost queries. If you’d like to experiment with real account and cost data, you can disable the mock data by deploying the CloudFormation template with the parameter: EnableMockData: false.
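The repository's mock generators aren't reproduced here. As a hypothetical illustration, mock data of this kind could look like the following: 30 accounts with Owner and Environment tags, plus a response shaped like Cost Explorer's get_cost_and_usage output. Owners, account names, and cost ranges are invented placeholders.

```python
import random
from typing import Any, Dict, List

# Placeholder tag values, not the author's dataset
ENVIRONMENTS = ["dev", "test", "prod"]
OWNERS = ["bill@example.com", "alice@example.com", "markus@example.com"]


def mock_accounts(n: int = 30, seed: int = 42) -> List[Dict[str, Any]]:
    """Generate n accounts with 12-digit IDs and Owner/Environment tags."""
    rng = random.Random(seed)  # seeded so every run yields the same mock organization
    accounts = []
    for i in range(n):
        env = rng.choice(ENVIRONMENTS)
        accounts.append({
            "Id": f"{100000000000 + i:012d}",
            "Name": f"{env}-workload-{i:02d}",
            "Tags": {"Owner": rng.choice(OWNERS), "Environment": env},
        })
    return accounts


def mock_costs(accounts: List[Dict[str, Any]], start: str, end: str, seed: int = 42) -> Dict[str, Any]:
    """Return a fake get_cost_and_usage response grouped by LINKED_ACCOUNT."""
    rng = random.Random(seed)
    groups = [
        {
            "Keys": [a["Id"]],
            "Metrics": {"UnblendedCost": {"Amount": f"{rng.uniform(5, 500):.2f}", "Unit": "USD"}},
        }
        for a in accounts
    ]
    return {"ResultsByTime": [{"TimePeriod": {"Start": start, "End": end},
                               "Total": {}, "Groups": groups, "Estimated": False}]}
```

Because the mock responses mirror the real API shapes, the same parsing code can run against mock and real data, which is what makes a simple EnableMockData switch practical.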
Deployment
aws s3 mb s3://temp-bedrock-cf-staging --region us-east-1
aws cloudformation package --s3-bucket temp-bedrock-cf-staging --output-template-file packaged.yaml --region us-east-1 --template-file template.yaml
aws cloudformation deploy --stack-name dev-bedrock-accounts-agent --template-file packaged.yaml --region us-east-1 --capabilities CAPABILITY_IAM
At this point, the Bedrock account agent is ready. Next, retrieve the created agent’s ID from the stack outputs so you can add it to invoke_agent.py:
aws cloudformation describe-stacks --stack-name dev-bedrock-accounts-agent --region us-east-1 --query 'Stacks[0].Outputs[?OutputKey==`AgentId`].OutputValue' --output text
Then paste the ID into the agent_id variable in `invoke_agent.py`.
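If you’d rather skip the copy-paste, the script could look up the ID itself. A hypothetical sketch using boto3's describe_stacks, assuming the stack name and the AgentId output key from the command above; `output_value` and `get_agent_id` are illustrative names.

```python
from typing import Any, Dict, Optional


def output_value(describe_stacks_response: Dict[str, Any], key: str) -> Optional[str]:
    """Return the value of stack output `key`, or None if the stack has no such output."""
    for output in describe_stacks_response["Stacks"][0].get("Outputs", []):
        if output["OutputKey"] == key:
            return output["OutputValue"]
    return None


def get_agent_id(stack_name: str = "dev-bedrock-accounts-agent",
                 client: Optional[Any] = None) -> Optional[str]:
    if client is None:
        import boto3  # lazy import so output_value can be used on its own

        client = boto3.client("cloudformation", region_name="us-east-1")
    return output_value(client.describe_stacks(StackName=stack_name), "AgentId")
```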
...And ACTION!
Finally, here is a recording of queries being run against the Bedrock Agent.
Key takeaways
Throughout this project, several important points emerged. While seasoned AI professionals might find some of these familiar, they proved invaluable for my work and could benefit others as well:
LLMs Don’t Know the Current Date
By default, an LLM has no notion of the current date, and its knowledge stops at its training cutoff (for Claude, 2023). As a result, prompts like "Get last month's costs" are resolved against an outdated date. To resolve this, I modified the invoke_agent.py script to add the current date to promptSessionAttributes.
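A minimal sketch of that fix, assuming boto3's invoke_agent, which accepts a sessionState dict with promptSessionAttributes. These attributes apply to a single InvokeAgent call, so the date is sent with every query. The attribute name currentDate is a hypothetical choice, not necessarily the one invoke_agent.py uses.

```python
from datetime import date
from typing import Any, Dict, Optional


def build_session_state(today: Optional[date] = None) -> Dict[str, Any]:
    """Session state that tells the agent today's date for this request."""
    today = today or date.today()
    # Prompt session attributes are made available to the orchestration prompt,
    # so the model can resolve "last month" relative to this date.
    return {"promptSessionAttributes": {"currentDate": today.isoformat()}}
```

It would be passed alongside the other arguments, e.g. `client.invoke_agent(..., sessionState=build_session_state())`.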
Bedrock Agents Are Token-Hungry
It’s impressive to watch how foundation models reason through API calls, orchestrating actions like retrieving account lists and fetching costs for each account. However, the model has to spell out its reasoning at every step, and that text is carried into the prompt for the next step, consuming a significant number of tokens in the process.
Let the LLM Handle the Work
In earlier iterations of the project, I required the agent to pass an exact email (e.g., "list all accounts where the owner is markus_toivakka@myowndomain.fi") to the Lambda function, which then did the filtering and returned the matching account list. Shifting the filtering logic to the LLM let the agent retrieve the full account list and filter the accounts itself, which allows more flexible queries such as "give all markus's accounts".
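The difference shows up mainly in what the Lambda function returns. A minimal sketch, assuming an action group defined with an OpenAPI schema (whose Lambda response echoes actionGroup, apiPath, and httpMethod and carries a responseBody); the ACCOUNTS data, the /accounts path, and the event values are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical stand-in for the account list; a real handler would query Organizations
ACCOUNTS: List[Dict[str, Any]] = [
    {"Id": "111111111111", "Name": "dev-1",
     "Tags": {"Owner": "markus_toivakka@myowndomain.fi", "Environment": "dev"}},
    {"Id": "222222222222", "Name": "prod-1",
     "Tags": {"Owner": "bill@example.com", "Environment": "prod"}},
]


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # No owner parameter and no filtering: the agent receives every account with
    # its tags and decides itself which ones match a query like "markus's accounts".
    body = json.dumps({"accounts": ACCOUNTS})
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": body}},
        },
    }
```

The price of this flexibility is that the full account list lands in the model's input on every such query.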
Conclusion
This exercise gave me a clearer understanding of the key pieces of the puzzle: how AI Agents can automate tasks and how seamlessly an LLM interface can be added to an existing API. It also highlighted a crucial point, though: LLMs and AI Agents can consume a lot of data, which raises an important trade-off. Should filtering be handled in a Python function, or should the data be ingested and processed by the LLM itself?
While getting the solution described in this blog to work was fairly straightforward, optimizing it for cost efficiency and grasping the broader dimensions of data processing in AI systems presents a much more complex challenge.