At re:Invent 2024, Amazon announced that Amazon Bedrock now supports a new kind of model: the reranker.
In a typical RAG workflow, the user query is converted into a vector by an embedding model. That vector is used to search a vector database, and the results are passed as context in the prompt to an LLM. In a proof of concept, the amount of data being searched and returned is often so small that this workflow works just fine. In production, however, there is usually far more data, and the closest vectors are not always the passages that actually answer the question, so the context passed to the LLM becomes less relevant.
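To make the retrieval step concrete, here is a minimal, dependency-free sketch. The three-dimensional vectors are made up for illustration; in a real pipeline they would come from an embedding model, and the search would run in a vector database rather than over a Python list. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, index, k):
    """Return the k (text, score) pairs in `index` closest to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real embedding models return far more dimensions
index = [
    ("Stockholm is the capital of Sweden", [0.9, 0.1, 0.0]),
    ("Bangkok is the capital of Thailand", [0.1, 0.9, 0.1]),
    ("Oslo is the capital of Norway", [0.8, 0.2, 0.1]),
]
query_vec = [0.85, 0.15, 0.05]  # hypothetical embedding of the question

context = [text for text, _ in vector_search(query_vec, index, k=2)]
prompt = (
    "Answer using this context:\n"
    + "\n".join(context)
    + "\n\nQuestion: What is the capital of Sweden?"
)
print(prompt)
```

Note that the Oslo passage still ends up in the context even though it does not answer the question. That kind of loosely related match is what grows as the data set grows.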
1. Reranker to the rescue
This is where a reranker, or rerank model, comes into the picture. It acts as a second filter between the vector search and the prompt. Let’s say we return up to 50 results from our vector database; most likely, not all 50 of them are relevant to the user’s question.
So we pass the 50 results to our rerank model and ask it for the 5 best matches. The rerank model evaluates each of the 50 documents against the user query and returns the 5 with the highest relevance scores (scores fall between 0 and 1, and the closer to 1, the better the match). We then add these 5 documents to our prompt as context and answer the user’s question with an ordinary LLM. In theory, the result should be more precise and accurate.
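The two-stage filter described above can be sketched without any AWS calls. `rerank` and `toy_score` are hypothetical names, and the word-overlap scorer is only a placeholder that, like a rerank model, returns a score between 0 and 1; a real rerank model scores semantic relevance rather than shared words.

```python
def rerank(query, candidates, score_fn, top_n):
    """Score every candidate against the query and keep the top_n best."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_score(query, doc):
    """Hypothetical stand-in for a rerank model: share of query words found in doc."""
    query_words = set(query.lower().strip("?").split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Candidates as they might come back from the vector search
candidates = [
    "Oslo is the capital of Norway",
    "Stockholm is the capital of Sweden",
    "Bangkok is the capital of Thailand",
]
query = "What is the capital of Sweden?"

top = rerank(query, candidates, toy_score, top_n=1)
context = "\n".join(doc for doc, _ in top)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Only the highest-scoring candidate reaches the prompt; with a real rerank model, `top_n` plays the same role as the `top_n` or `numberOfResults` parameters used in the examples in this post.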
Amazon Bedrock currently supports two rerank models, Amazon Rerank 1.0 and Cohere Rerank 3.5. Below I will show you how to work with them through two methods available in Bedrock’s API.
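The two methods identify the model differently: invoke_model takes a plain model ID (`modelId`), while the rerank method expects a full model ARN (`modelArn`). A small hypothetical helper can derive the ARN from the model ID, following the format used in the second example. Rerank models may not be offered in every region, so check availability for the region you use.

```python
def foundation_model_arn(region: str, model_id: str) -> str:
    """Build a Bedrock foundation-model ARN (the account field is empty)."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


# Model IDs used in the examples in this post
AMAZON_RERANK = "amazon.rerank-v1:0"
COHERE_RERANK = "cohere.rerank-v3-5:0"

print(foundation_model_arn("eu-central-1", AMAZON_RERANK))
# arn:aws:bedrock:eu-central-1::foundation-model/amazon.rerank-v1:0
```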
2. Our options
The first option is the standard “invoke_model” method in the bedrock-runtime API, and the second is the “rerank” method in the bedrock-agent-runtime API.
As you can probably guess, bedrock-agent-runtime is meant for agentic workflows and Amazon Bedrock Knowledge Bases (another managed feature in Bedrock), but its rerank method can be used without involving either of them.
I will show you both because I found that each has its own strengths and weaknesses. For example, the “invoke_model” option is simpler and more straightforward. It lets us pass documents as strings or JSON. But it won’t let us (at least to my knowledge) use rank_fields, which can be very effective when you are working with JSON documents. With rank_fields, you tell the rerank model which keys it should take into consideration when scoring the documents.
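Conceptually, rank_fields narrows each JSON document down to the keys that matter for relevance. The hypothetical helper below only mimics that idea on the client side to show which parts of a document would be considered; it is an illustration, not how Bedrock or the model processes the fields internally.

```python
def fields_for_ranking(doc: dict, rank_fields: list) -> dict:
    """Hypothetical illustration: keep only the keys named in rank_fields."""
    return {key: doc[key] for key in rank_fields if key in doc}


doc = {"id": "doc-42", "city": "Stockholm", "text": "Stockholm is the capital of Sweden."}
print(fields_for_ranking(doc, ["city", "text"]))
# {'city': 'Stockholm', 'text': 'Stockholm is the capital of Sweden.'}
```

An identifier such as the hypothetical `id` key says nothing about whether the document answers the query, so it makes sense to leave it out of rank_fields.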
The “rerank” method in bedrock-agent-runtime, on the other hand, requires a more complex payload, and you can’t mix and match TEXT (string) and JSON documents, but you can leverage rank_fields.
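Much of that payload complexity is boilerplate repeated for every document. A small hypothetical helper like `to_json_sources` can generate the `sources` list from plain dictionaries, using the same INLINE/JSON structure as the second example later in this post. Since the documents can’t be mixed, it only produces JSON sources.

```python
def to_json_sources(docs):
    """Wrap plain dicts in the INLINE/JSON source structure used by rerank."""
    return [
        {
            "type": "INLINE",
            "inlineDocumentSource": {"type": "JSON", "jsonDocument": doc},
        }
        for doc in docs
    ]


sources = to_json_sources(
    [
        {"city": "Stockholm", "text": "Stockholm is the capital of Sweden."},
        {"city": "Bangkok", "text": "Bangkok is the capital of Thailand."},
    ]
)
print(sources[0])
```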
So let’s get started. The first approach in this demonstration is the invoke_model path.
```python
# Example 1: Using invoke_model with mixed string and JSON documents
import boto3
import json

bedrock_client = boto3.client("bedrock-runtime")

# Prepare the documents, mixing strings and dictionaries
documents = [
    "Stockholm is the capital of Sweden",  # String
    {"title": "Bangkok", "text": "Bangkok is the capital of Thailand"},  # Dictionary
    "Oslo is the capital of Norway",  # String
    "Udon Thani is a city in Thailand",  # String
    "Kuala Lumpur is the capital of Malaysia",  # String
    {"title": "Vientiane", "text": "Vientiane is the capital of Laos"},  # Dictionary
]

# Serialize dictionaries to JSON strings before sending them
serialized_documents = [
    json.dumps(doc) if isinstance(doc, dict) else doc for doc in documents
]

# Construct the request body
body = json.dumps(
    {
        "query": "What are three cities in south east asia?",
        "documents": serialized_documents,
        "top_n": 3,
        "api_version": 2,
    }
)

# Invoke the Bedrock model
response = bedrock_client.invoke_model(
    modelId="cohere.rerank-v3-5:0",
    accept="application/json",
    contentType="application/json",
    body=body,
)

# Process the response (the body is a stream of JSON bytes)
response_body = json.loads(response["body"].read())
print(response_body)

# Extract the indices of the matching documents
matching_indices = [result["index"] for result in response_body["results"]]

# Look up the matches in the original list, which still holds the dicts,
# so no JSON deserialization is needed before printing
matching_documents = [documents[i] for i in matching_indices]
print("Matching Documents:")
for doc in matching_documents:
    print(doc)
```
Output for the query “What are three cities in south east asia?”:
```
{'results': [
{'index': 1, 'relevance_score': 0.20046411},
{'index': 4, 'relevance_score': 0.19896571},
{'index': 5, 'relevance_score': 0.18133602}
]}
Matching Documents:
{'title': 'Bangkok', 'text': 'Bangkok is the capital of Thailand'}
Kuala Lumpur is the capital of Malaysia
{'title': 'Vientiane', 'text': 'Vientiane is the capital of Laos'}
```
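The response only contains indices and scores, so the results have to be mapped back to your own documents. The hypothetical helper below does that for either API: invoke_model with Cohere returns `relevance_score`, while the rerank method in bedrock-agent-runtime returns `relevanceScore`, as the two outputs in this post show. The data is copied from the first example and its output.

```python
def pair_results(results, documents):
    """Return (document, score) pairs in the order the reranker ranked them."""
    pairs = []
    for result in results:
        # invoke_model (Cohere) uses "relevance_score";
        # bedrock-agent-runtime's rerank uses "relevanceScore"
        score = result.get("relevance_score", result.get("relevanceScore"))
        pairs.append((documents[result["index"]], score))
    return pairs


# Results and documents copied from the first example and its output
results = [
    {"index": 1, "relevance_score": 0.20046411},
    {"index": 4, "relevance_score": 0.19896571},
    {"index": 5, "relevance_score": 0.18133602},
]
documents = [
    "Stockholm is the capital of Sweden",
    {"title": "Bangkok", "text": "Bangkok is the capital of Thailand"},
    "Oslo is the capital of Norway",
    "Udon Thani is a city in Thailand",
    "Kuala Lumpur is the capital of Malaysia",
    {"title": "Vientiane", "text": "Vientiane is the capital of Laos"},
]

for doc, score in pair_results(results, documents):
    print(f"{score:.3f}  {doc}")
```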
The second approach uses the “rerank” method in bedrock-agent-runtime:
```python
# Example 2: Using rerank with JSON documents and rank_fields
import boto3

# Create the client in the same region as the model ARN used below
bedrock_client = boto3.client("bedrock-agent-runtime", region_name="eu-central-1")

payload = {
    "queries": [
        {
            "textQuery": {"text": "What is the capital of Sweden?"},
            "type": "TEXT",
        }
    ],
    "rerankingConfiguration": {
        "bedrockRerankingConfiguration": {
            "modelConfiguration": {
                "modelArn": "arn:aws:bedrock:eu-central-1::foundation-model/amazon.rerank-v1:0",
                # Tell the model which keys of each JSON document to score on
                "additionalModelRequestFields": {"rank_fields": ["city", "text"]},
            },
            "numberOfResults": 2,
        },
        "type": "BEDROCK_RERANKING_MODEL",
    },
    "sources": [
        {
            "inlineDocumentSource": {
                "jsonDocument": {
                    "city": "Stockholm",
                    "text": "Stockholm is the capital of Sweden.",
                },
                "type": "JSON",
            },
            "type": "INLINE",
        },
        {
            "inlineDocumentSource": {
                "jsonDocument": {
                    "city": "Bangkok",
                    "text": "Bangkok is the capital of Thailand.",
                },
                "type": "JSON",
            },
            "type": "INLINE",
        },
        {
            "inlineDocumentSource": {
                "jsonDocument": {
                    "city": "New Stockholm",
                    "text": "New Stockholm is a city in USA.",
                },
                "type": "JSON",
            },
            "type": "INLINE",
        },
    ],
}

# Invoke the rerank model; each payload key maps directly to a rerank parameter
response = bedrock_client.rerank(**payload)
print(response["results"])

# Extract the indices of the matching documents
matching_indices = [result["index"] for result in response["results"]]

# Look up the matching sources; they are already dicts, so print them as-is
matching_documents = [payload["sources"][i] for i in matching_indices]
print("Matching Documents:")
for doc in matching_documents:
    print(doc)
```
Output from the query “What is the capital of Sweden?”:
```
[
{'index': 0, 'relevanceScore': 0.981979250907898},
{'index': 2, 'relevanceScore': 0.006241259165108204}
]
Matching Documents:
{'inlineDocumentSource': {'jsonDocument': {'city': 'Stockholm', 'text': 'Stockholm is the capital of Sweden.'}, 'type': 'JSON'}, 'type': 'INLINE'}
{'inlineDocumentSource': {'jsonDocument': {'city': 'New Stockholm', 'text': 'New Stockholm is a city in USA.'}, 'type': 'JSON'}, 'type': 'INLINE'}
```
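Before passing the matches to an LLM, you will usually want only the document content rather than the whole source wrapper. The sketch below, built around the hypothetical helper `context_from_sources`, unwraps each jsonDocument and skips weak matches. The 0.5 cutoff is an arbitrary example value, not a recommendation; in this output it drops New Stockholm, whose score is close to zero.

```python
import json


def context_from_sources(sources, results, min_score):
    """Unwrap the JSON documents of reranked sources, skipping weak matches."""
    lines = []
    for result in results:
        if result["relevanceScore"] < min_score:
            continue
        json_doc = sources[result["index"]]["inlineDocumentSource"]["jsonDocument"]
        lines.append(json.dumps(json_doc))
    return "\n".join(lines)


# Sources and results copied from the second example and its output
sources = [
    {"inlineDocumentSource": {"jsonDocument": {"city": "Stockholm", "text": "Stockholm is the capital of Sweden."}, "type": "JSON"}, "type": "INLINE"},
    {"inlineDocumentSource": {"jsonDocument": {"city": "Bangkok", "text": "Bangkok is the capital of Thailand."}, "type": "JSON"}, "type": "INLINE"},
    {"inlineDocumentSource": {"jsonDocument": {"city": "New Stockholm", "text": "New Stockholm is a city in USA."}, "type": "JSON"}, "type": "INLINE"},
]
results = [
    {"index": 0, "relevanceScore": 0.981979250907898},
    {"index": 2, "relevanceScore": 0.006241259165108204},
]

# 0.5 is an arbitrary illustrative cutoff
context = context_from_sources(sources, results, min_score=0.5)
print(context)
```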
I must say that I favour the bedrock-agent-runtime approach because it can leverage rank_fields, but it does add some complexity to the payload sent to Bedrock.
Conclusion
Overall, I am really happy to see Amazon Bedrock add support for not one but two great rerank models. I know this is a feature many people have been waiting for, and I hope people start adding these kinds of models to their RAG workflows, because they can make a big difference to the relevance of the context passed to the LLM.