Introduction
It is hard to follow the latest news from the gen AI world. The hype is overwhelming, and many of us might feel "AI fatigue". On the other hand, the possibilities opened up by LLMs are truly fascinating.
Even staying away from the latest news outlets, it is clear that one of the hottest topics is so-called "agency". It would be nice to see how models can "do things".
To be honest, I am not that excited by models "doing things" like booking taxis or ordering groceries. I am more intrigued by the possibility of using "agent" capabilities to gather relevant information to answer the user's query.
In this blog post, I would like to explore how an LLM uses provided "tools" to gather data.
General flow
For simplicity's sake, the flow is kept straightforward.
The user sends a query to our app. Then, based on the list of provided "tools", the model decides whether the query can be answered using that toolset. If yes, it calls the picked functions with parameters extracted from the user's query. Once the data is gathered, the answer for the user is prepared based on it.
Stack
The description above fits AWS Bedrock Agents. They are super handy.
I will, however, use Step Functions to construct the flow myself. This way, I won't be limited to the models supported by AWS Bedrock.
As the LLM I use Claude 3. Anthropic released (as a public beta) new APIs for function calls. The new APIs are not yet available on Bedrock, which is why I use them directly from Anthropic.
Finally, I use a Lambda function to glue the logic together and act as the tools.
I utilize AWS CDK as IaC and Rust for the Lambda's code (with cargo lambda).
Solution
Code is available in this repo.
Let's create the AWS CDK project.
mkdir llm_step && cd $_
cdk init --language typescript
The flow I plan to prepare looks like this:
We need to let the model know which tools it can use. If it decides to use them, we are responsible for calling the tools on our side and providing the model with the whole context, including the results of the used tools.
LLM call
AWS Step Functions have built-in actions for calling AWS Bedrock. In my case, I need to call a third-party API and use, well, the Call third-party API action.
The Call third-party API action needs a Connection configured with the proper authentication method. The shape of the expected header is defined in the Anthropic docs.
In CDK it looks like this:
// lib/llm_step-stack.ts
const llmAPIKey = new secrets.Secret(this, "LlmApiKey", {});
const llmDestinationConnection = new events.Connection(
this,
"LlmDestinationConnection",
{
authorization: events.Authorization.apiKey(
"x-api-key",
llmAPIKey.secretValue
),
description: "LLM Destination Connection",
}
);
For some reason, I wasn't able to deploy the stack without creating a separate secret. Based on the AWS CDK examples in the documentation, it should work without one.
Once the connection is in place, I can define the call to the LLM:
// lib/llm_step-stack.ts
const callLlmTask = new tasks.HttpInvoke(this, "Call LLM", {
apiRoot: "https://api.anthropic.com",
apiEndpoint: sfn.TaskInput.fromText("/v1/messages"),
body: sfn.TaskInput.fromObject(getLlmPrompt()),
connection: llmDestinationConnection,
headers: sfn.TaskInput.fromObject({
"Content-Type": "application/json",
"anthropic-version": "2023-06-01",
"anthropic-beta": "tools-2024-04-04",
}),
method: sfn.TaskInput.fromText("POST"),
resultSelector: {
"role.$": "$.ResponseBody.role",
"content.$": "$.ResponseBody.content",
"stop_reason.$": "$.ResponseBody.stop_reason",
},
resultPath: "$.taskResult",
});
Headers, content type, and URL are defined in the docs. I moved the prompt to a separate file to avoid bloating the infrastructure definition. The result of the call maps role, content, and stop_reason. The result is passed as taskResult.
LLM prompt
I am only scratching the surface of prompt preparation. Basically, I followed the instructions from the tool use examples on the Anthropic site.
The API call contains the following fields: model, max_tokens, system, tools, and messages.
system is a field to pass additional instructions to the model.
tools is a list of tools defined with JSONSchema.
messages is a chain creating the conversation between the user and the assistant.
My prompt is defined in a separate file:
export type LLMPrompt = {
model: string;
max_tokens: number;
system: string;
tools: any[];
messages: any[];
};
export const getLlmPrompt = () => {
return {
model: "claude-3-sonnet-20240229",
max_tokens: 400,
system:
"Before answering the question, please think about it step-by-step within <thinking></thinking> tags. Then, provide your final answer within <answer></answer> tags. Skip the introduction and start from <thinking> tag.",
tools: [
{
name: "get_weather",
description:
"Get the current weather in a given location. Weather is defined base on city name and state or country.",
input_schema: {
type: "object",
properties: {
location: {
type: "string",
description:
"The city and state, <example>San Francisco, CA</example> <example>Berlin, Germany</example>",
},
},
required: ["location"],
},
},
{
name: "get_restaurants",
description:
"Get the list of recommended restaurants in the given city. It provides information if the given facility offers outdoor seating. Restaurants are grouped by reviews from guests",
input_schema: {
type: "object",
properties: {
location: {
type: "string",
description:
"The city and state, <example>San Francisco, CA</example> <example>Berlin, Germany</example>",
},
},
required: ["location"],
},
},
],
"messages.$": "$.messages",
};
};
What is important here is to explicitly prompt Claude to document its thinking process. This is a simple way to improve the quality of responses (Opus, the biggest model from the Claude 3 family, will do it automatically).
I use JSONPath notation to let Step Functions know that for this task I will use the messages from the task's input.
Handle LLM response
Each time, the model can decide to use tools or to skip them. I have a Choice task to check whether I need to call the Lambda function with tools or whether I should just pass the final answer. I can decide based on the stop_reason field in the taskResult.
// lib/llm_step-stack.ts
const passAnswer = new sfn.Pass(this, "Answer");
const callToolsTask = new tasks.LambdaInvoke(this, "Call tools", {
lambdaFunction: toolsLambda,
resultSelector: {
"messages.$": "$.Payload",
},
}).next(callLlmTask);
const choiceIfUseTool = new sfn.Choice(this, "Choice if use tool");
choiceIfUseTool.when(
sfn.Condition.stringEquals("$.taskResult.stop_reason", "tool_use"),
callToolsTask
);
choiceIfUseTool.otherwise(passAnswer);
Lambda function for tools
I use Rust for my Lambda. I don't want to paste the whole code into this post (it is in this repo), so let me just highlight a few points.
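Before diving into the details: the entry point itself is standard lambda_runtime / cargo lambda boilerplate. A rough sketch (with the payload typed loosely as serde_json::Value; the actual repo code uses dedicated types) could look like this:
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(handler)).await
}

async fn handler(event: LambdaEvent<Value>) -> Result<Value, Error> {
    // The payload is the state machine state: the previous messages
    // plus the taskResult that contains the tool_use requests.
    let state = event.payload;
    // ...deserialize the tool requests, call the matching tools,
    // and build the extended message list (shown further down)...
    // The returned value becomes $.Payload and is mapped back to
    // $.messages by the resultSelector of the "Call tools" task.
    Ok(state["messages"].clone())
}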
Deserialize LLM answer
The LLM doesn't run any tool itself. It returns a response containing instructions about which tools should be used, but it's our responsibility to call them. The key thing is the structure of those instructions. They might look like this:
{
"id": "msg_01Aq9w938a90dw8q",
"model": "claude-3-opus-20240229",
"stop_reason": "tool_use",
"role": "assistant",
"content": [
{
"type": "text",
"text": "<thinking>I need to use the get_weather, and the user wants SF, which is likely San Francisco, CA.</thinking>"
},
{
"type": "tool_use",
"id": "toolu_01A09q90qw90lq917835lq9",
"name": "get_weather",
"input": {"location": "San Francisco, CA", "unit": "celsius"}
}
]
}
What is awesome is that the LLM answer is structured and can be easily converted into domain types and used further down the pipeline.
One of the neatest features of Rust is its powerful type system. It is easy to define LlmToolRequest, which can be either text or tool_use:
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(untagged, rename_all = "snake_case")]
pub(crate) enum LlmToolRequest {
LlmToolTextRequest(LlmToolTextRequest),
LlmToolUseRequest(LlmToolUseRequest),
}
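For completeness, the text variant wraps the plain text block from the response. Its struct (my sketch, approximating what the repo defines, based on the response shape above) is essentially:
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize, Clone)]
pub(crate) struct LlmToolTextRequest {
    #[serde(rename = "type")]
    pub(crate) request_type: String, // always "text"
    pub(crate) text: String,
}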
In the same way, the input for the given function calls is also deserialized into specific types and can be easily used in the application:
#[derive(Debug, Serialize, Deserialize)]
#[serde(untagged, rename_all = "snake_case")]
pub(crate) enum LlmToolInput {
GetWeather(GetWeatherToolInput),
GetRestaurantsToolInput(GetRestaurantsToolInput),
}
Usually, it is OK to let serde_json deserialize the JSON and figure out all the types. In my example, both tools use the same shape of parameters, so I need an extra step to understand which tool is being called.
To do so, I initially let the input be a generic serde_json::value::Value:
//...
#[derive(Debug, Serialize, Deserialize, Clone)]
pub(crate) struct LlmToolUseRequest {
#[serde(rename = "type")]
pub(crate) request_type: String,
pub(crate) id: String,
pub(crate) name: String,
pub(crate) input: serde_json::value::Value,
}
//...
Then I pattern-match on the tool using the deserialized name:
//...
let tool_result = match req_use.name.as_str() {
"get_weather" => {
let inp = serde_json::from_value::<GetWeatherToolInput>(input).unwrap();
get_weather(req_use.id.clone(), inp)
}
"get_restaurants" => {
let inp = serde_json::from_value::<GetRestaurantsToolInput>(input).unwrap();
get_restaurants(req_use.id.clone(), inp)
}
_ => panic!("unknown tool name"),
};
//...
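The input structs mirror the input_schema definitions from the prompt. Since both tools take only a location, a minimal version (a sketch; the exact definitions are in the repo) could be:
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub(crate) struct GetWeatherToolInput {
    pub(crate) location: String,
}

#[derive(Debug, Serialize, Deserialize)]
pub(crate) struct GetRestaurantsToolInput {
    pub(crate) location: String,
}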
Return the answer from lambda
The result of the Lambda will be passed back to the LLM call step in the state machine. It needs to be constructed from 3 pieces:
- all previous messages created during the flow
- the last LLM response that requested the tools
- the results from the tools
The first two can be passed through as they are. The last one needs a specific shape, including the tool_use_id, and is appended to the previous messages:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01A09q90qw90lq917835lq9",
"content": "15 degrees"
}
]
}
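To make the assembly concrete, here is a rough sketch of how the returned message list can be built, using serde_json values and hypothetical variable and function names (the repo uses dedicated types for this):
use serde_json::{json, Value};

// previous_messages comes from $.messages, assistant_content from $.taskResult.content,
// and tool_results are the outputs of the tool functions serialized to JSON.
fn build_response(
    mut previous_messages: Vec<Value>,
    assistant_content: Value,
    tool_results: Vec<Value>,
) -> Vec<Value> {
    // Append the assistant turn that requested the tools...
    previous_messages.push(json!({
        "role": "assistant",
        "content": assistant_content,
    }));
    // ...and a user turn carrying the tool_result blocks.
    previous_messages.push(json!({
        "role": "user",
        "content": tool_results,
    }));
    previous_messages
}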
The documentation contains examples of passing back errors or empty responses.
It is a good idea to keep the response from a tool simple so the LLM can understand and use it. Sometimes, though, we need to pass more complex data.
As far as I understand, Claude 3 handles content structured with XML tags well. In the get_restaurants function, I return a list of two restaurants with some details. My function looks like this:
//...
pub(crate) fn get_restaurants(
tool_use_id: String,
input: GetRestaurantsToolInput,
) -> LlmToolResult {
println!("checking restaurants for {}", input.location);
LlmToolResult {
result_type: String::from("tool_result"),
tool_use_id,
content: String::from(
r#"
<restaurants>
<restaurant>
<name>Restaurant ABC</name>
<address>Street 111</address>
<phone>12345678</phone>
<website>www.restaurant1.com</website>
<cuisine>Italian</cuisine>
<outdoor>true</outdoor>
</restaurant>
<restaurant>
<name>Restaurant XYZ</name>
<address>Street 999</address>
<phone>987654</phone>
<website>www.restaurant2.com</website>
<cuisine>French</cuisine>
<outdoor>false</outdoor>
</restaurant>
</restaurants>
"#,
),
}
}
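LlmToolResult itself simply maps to the tool_result shape shown earlier. A minimal version, together with a trivial get_weather counterpart, could look like this (again a sketch; the actual code is in the repo):
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub(crate) struct LlmToolResult {
    #[serde(rename = "type")]
    pub(crate) result_type: String, // always "tool_result"
    pub(crate) tool_use_id: String,
    pub(crate) content: String,
}

pub(crate) fn get_weather(tool_use_id: String, input: GetWeatherToolInput) -> LlmToolResult {
    println!("checking weather for {}", input.location);
    // Hardcoded stub - a real implementation would call a weather API.
    LlmToolResult {
        result_type: String::from("tool_result"),
        tool_use_id,
        content: String::from("The weather is sunny, 20 degrees"),
    }
}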
Deploy
The Lambda function is defined in the CDK:
// lib/llm_step-stack.ts
const toolsLambda = new lambda.Function(this, "llm_tools_lambda", {
runtime: lambda.Runtime.PROVIDED_AL2,
code: lambda.Code.fromAsset(
"lib/functions/llm_tools/target/lambda/llm_tools/"
),
handler: "not.required",
memorySize: 256,
timeout: cdk.Duration.seconds(30),
});
In VS Code, we can have a look at the shape of the state machine flow defined with the CDK.
To deploy the solution, once the Rust function is built with cargo lambda, it is enough to run:
cdk bootstrap
npm run build && cdk deploy
Testing
Let's start with a basic query: "What is the current weather in Warsaw".
As expected, the LLM decided to call the tool to check the weather in the given city. Interestingly enough, it figured out in which country Warsaw is located and augmented the input parameters with this knowledge. Here is the full chain of messages, including the final result:
{
"messages": [
{
"content": "What is the current weather in Warsaw",
"role": "user"
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "<thinking>\nTo get the current weather in Warsaw, I can use the \"get_weather\" tool, providing \"Warsaw, Poland\" as the location parameter."
},
{
"type": "tool_use",
"id": "toolu_0192GHrwDaPKDhe5PryN9zqn",
"name": "get_weather",
"input": {
"location": "Warsaw, Poland"
}
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_0192GHrwDaPKDhe5PryN9zqn",
"content": "The weather is sunny, 20 degrees"
}
]
}
],
"taskResult": {
"role": "assistant",
"stop_reason": "end_turn",
"content": [
{
"type": "text",
"text": "\n</thinking>\n\n<answer>\nThe current weather in Warsaw is sunny with a temperature of 20 degrees Celsius.\n</answer>"
}
]
}
}
Let's check something different: "What are football teams in Madrid".
As expected, the model decided not to call the tools. In this case, it simply refused to give an answer:
{
"messages": [
{
"role": "user",
"content": "What are football teams in Madrid"
}
],
"taskResult": {
"role": "assistant",
"stop_reason": "end_turn",
"content": [
{
"type": "text",
"text": "Okay, let me think through this step-by-step:\n<thinking>\nTo get a list of football teams in Madrid, I will need to use some external data source or API that has information on sports teams in different cities and locations. Unfortunately, the tools provided here do not seem to have that kind of information about sports teams.\n\nThe \"get_weather\" tool allows getting weather information for a location, and the \"get_restaurants\" tool provides recommended restaurants in a city, but neither of these contain data about sports teams.\n</thinking>\n<answer>\nUnfortunately, I do not have access to any tools or data sources that can directly provide a list of football teams in Madrid. The tools available to me are limited to getting weather information and restaurant recommendations for a given city location. I do not have a way to lookup and retrieve details about sports teams or clubs based on the city.\n</answer>"
}
]
}
}
OK, nice. Now let's try to make the LLM use both provided tools: "Find a place for a dinner suitable for the current weather in Barcelona".
{
"messages": [
{
"content": "Find place for a dinner suitable for the current weather in Barcelona",
"role": "user"
},
{
"content": [
{
"text": "Okay, let me think through this step-by-step:\n<thinking>\n1. To find a suitable place for dinner based on the current weather in Barcelona, I first need to get the current weather conditions there.\n2. I can use the \"get_weather\" tool to retrieve the weather information for Barcelona:",
"type": "text"
},
{
"id": "toolu_01Bi8u7Ducrn4ECy6mHSEp7v",
"input": {
"location": "Barcelona, Spain"
},
"name": "get_weather",
"type": "tool_use"
}
],
"role": "assistant"
},
{
"content": [
{
"content": "The weather is sunny, 20 degree",
"tool_use_id": "toolu_01Bi8u7Ducrn4ECy6mHSEp7v",
"type": "tool_result"
}
],
"role": "user"
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "3. Since the weather is nice and sunny with moderate temperature, an outdoor / open-air dining option would be ideal.\n4. To get recommendations for restaurants with outdoor seating in Barcelona, I can use the \"get_restaurants\" tool:"
},
{
"type": "tool_use",
"id": "toolu_01MjmMU51eD9Z61XKB7xEz24",
"name": "get_restaurants",
"input": {
"location": "Barcelona, Spain"
}
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01MjmMU51eD9Z61XKB7xEz24",
"content": "\n<restaurants>\n <restaurant>\n <name>Restaurant ABC</name>\n <address>Street 111</address>\n <phone>12345678</phone>\n <website>www.restaurant1.com</website>\n <cuisine>Italian</cuisine>\n <outdoor>true</outdoor>\n </restaurant>\n <restaurant>\n <name>Restaurant XYZ</name>\n <address>Street 999</address>\n <phone>987654</phone>\n <website>www.restaurant2.com</website>\n <cuisine>French</cuisine>\n <outdoor>false</outdoor>\n </restaurant>\n</restaurants>\n"
}
]
}
],
"taskResult": {
"role": "assistant",
"stop_reason": "end_turn",
"content": [
{
"type": "text",
"text": "</thinking>\n\n<answer>\nBased on the sunny and comfortable weather in Barcelona today, an ideal dinner spot would be Restaurant ABC, which offers outdoor/open-air seating and serves Italian cuisine. The restaurant is located at Street 111, with website www.restaurant1.com and phone 12345678. Its outdoor dining area allows you to enjoy your meal while basking in the nice Barcelona weather.\n</answer>"
}
]
}
}
Summary
Creating a flow with Step Functions is generally a pleasant experience. For this simple example, the prepared flow is straightforward, but in a more realistic environment there would be more steps.
A potential limitation is the size of the output that can be passed between tasks, but it can be overcome by using S3 (the native Step Functions integration with Bedrock does this automatically).
It is great to see more structured output from the model. Of course, we could prompt the model to use XML tags in the response, but I wouldn't rely on that. The new Anthropic APIs help with handling the flow.
Even though the overall solution looks just like any standard system, it is not deterministic in any sense. The same query for different cities might result in totally different responses. Sometimes the model decides that it has enough information to provide a response, and sometimes it refuses to answer. It's worth keeping this in mind.
As for the choice of Claude 3 models, I was impressed by the results provided by Sonnet. The biggest one, Opus, does much better (which is not a surprise), especially when it comes to reasoning about its own decisions. In my case, Haiku failed to use the tools in a meaningful way.