Retrieval-augmented generation (RAG) is a technique that improves the accuracy and reliability of generative AI models by supplementing their built-in knowledge with facts from external sources. With RAG, large language models (LLMs) can ground their responses on a subject in source material rather than relying on training data alone.
In this article, we’ll demonstrate how to use the RAG technique in a modern application. To do so, we’ll create a Flutter application using Langchain for the LLM framework and pgVector, an open-source Postgres extension for vector similarity search.
Before beginning, you’ll need a few things:
- A good understanding of the Flutter framework and the Dart programming language.
- A Neon account.
- An OpenAI API key (meaning you’ll need an OpenAI account).
- And some cookies and coffee.
Demystifying some concepts
With the aid of databases, especially those that support vector capabilities like Neon, we can use the RAG technique to assist LLMs in delivering accurate answers to an end user. Neon is a fully managed serverless Postgres that separates storage and compute to offer autoscaling, branching, and bottomless storage. Neon is open source under the Apache 2.0 license, and its code is available in the neondatabase organization on GitHub.
Let’s first demystify some concepts, starting with pgVector. pgVector is a Postgres extension for storing vector embeddings and running similarity searches over them. Enabling the pgVector extension in your Neon database lets you store vector embeddings and query them using the inner product (<#>) or cosine distance (<=>) operators.
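To make the two operators concrete, here is a minimal pure-Dart sketch of the quantities they compute. The function names are hypothetical and exist only for illustration; in the application itself, pgVector evaluates these inside Postgres. Note that the `<#>` operator returns the negative inner product, so that smaller values mean closer matches.

```dart
import 'dart:math' as math;

/// Dot product of two equal-length vectors.
double dot(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// What `<#>` computes: the NEGATIVE inner product.
double negativeInnerProduct(List<double> a, List<double> b) => -dot(a, b);

/// What `<=>` computes: cosine distance, i.e. 1 - cosine similarity.
double cosineDistance(List<double> a, List<double> b) {
  final norms = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b));
  if (norms == 0) {
    throw ArgumentError('Cosine distance is undefined for a zero vector');
  }
  return 1 - dot(a, b) / norms;
}
```

Vectors pointing in the same direction have a cosine distance of 0, and orthogonal vectors have a distance of 1, regardless of their lengths.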
Langchain itself is not an LLM but a framework that aids application development with LLMs. Thus, it enables context-aware applications that need language models to reason.
That raises a burning question: How do these parts relate to one another?
RAG applications usually consist of two components: indexing and retrieval.
The indexing process involves loading the external data source, splitting it into smaller pieces, embedding each piece as a vector, and storing the result.
Langchain handles splitting and embedding by providing the application access to OpenAI’s embedding API. Neon comes into play in the storage process.
For the retrieval process, pgVector computes the distance between the query vector and the vectors stored in the Neon database, using its vector index to find the closest matches. Then Langchain uses OpenAI as the LLM to generate the final answer to the query in natural language.
The following sections will cover all the steps in building our application, from creating a Neon database to building the Flutter application. Let us set up a Neon account and create our database without further ado.
Creating a Neon database
After creating a Neon account, as specified earlier, let’s proceed to sign in to the account by selecting one of the methods provided for user authentication.
After successful sign-in, we’ll be redirected to a Create Project screen on the home page, where we are asked to fill in our desired project name, Postgres version, and database name. We can also change the branch name, but let’s leave it as main for now and click Create project.
Afterward, we are redirected to the home page, where a popup shows the connection details for the Neon project we just created. We need these details to access the Neon project from our application, so copy them to a safe file. And with that, we have successfully created a Neon database for our Flutter application.
Neon provides three database management methods: the Neon CLI (command-line interface), the Neon API, and SQL. For SQL, Neon provides an SQL editor for running SQL commands directly in the console. We will use SQL to manage our Neon database, but we’ll do so via a Postgres connection from our application to the Neon database.
The Flutter application is a simple chatbot that responds to queries based on the data from the external data source—in this case, a PDF file. Therefore, in the coming sections, we will clone a Flutter template, connect the template to the Neon database, and add the functionalities to implement the RAG technique within the app.
Creating the Flutter application
To begin, we will use a Flutter template application containing a display area, a text area where we will type our query, and a drawer with a button to upload our desired PDF.
To clone the project, run the command below in a terminal:
git clone https://github.com/muyiwexy/neon_rag_with_langchain.git
After cloning the project, run the following command:
flutter pub get
This command obtains all the dependencies listed in the pubspec.yaml file in the current working directory, along with their transitive dependencies.
This project uses the Model View Controller (MVC) architecture to handle specific development aspects of the application. The architecture helps us maintain readability by separating the business (core) logic from the UI (presentation layer).
To make things easier to locate, here’s an ASCII representation of the lib folder structure:
lib/
├─ home/
│ ├─ controller/
│ ├─ model/
│ ├─ view/
│ │ ├─ widgets/
│ │ │ ├─ display_area.dart
│ │ │ ├─ text_area.dart
│ │ ├─ home_page.dart
│ ├─ view_model/
├─ core/
│ ├─ dependency_injection/
├─ main.dart
Since we are using the MVC architecture, the UI code is placed in the lib/home/view folder. To proceed, we need to add some external dependencies necessary for building the application to the pubspec.yaml file.
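dependencies:
  # `any` is a placeholder constraint; pin real versions in your project.
  file_picker: any
  flutter_dotenv: any
  langchain: any
  langchain_openai: any
  path_provider: any
  postgres: any
  provider: any
  syncfusion_flutter_pdf: any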
After successfully doing this, we’ll create an abstraction for all the services needed throughout this project. Let’s call this abstract class LangchainService; within it, we will declare the steps involved in implementing the RAG technique. Next, locate the lib/home/view_model folder and create a Dart file named langchain_service.dart within it. To define the abstraction, add the code below to the file:
abstract class LangchainService {
  // Methods are added in the sections that follow.
}
Indexing
Load
The load process involves integrating the document into the system, which is usually done offline. To achieve this, we will do the following:
- Use the file_picker package to select files from a local device.
- Use the syncfusion_flutter_pdf package to read the document (PDF) and convert it to text.
- Use the path_provider package to find commonly used locations on the filesystem, such as the temp or AppData directories.
Unlike the other services, the load process is offline; thus, we will perform this operation separately from the other processes. To load a file, create an index_notifier.dart file in the lib/home/controller directory. Next, we create a ChangeNotifier class, IndexNotifier, with a final field of type LangchainService. Also, we will create two private String variables, _filepath and _fileName, and a getter for the _fileName variable.
class IndexNotifier extends ChangeNotifier {
  // Injected once through the constructor, so it can be final.
  final LangchainService langchainService;

  IndexNotifier({required this.langchainService});

  String? _filepath;
  String? _fileName;

  String? get fileName => _fileName;
}
Because it extends ChangeNotifier, this class will be one of two classes that handle state management for the application. Next, we will implement a method that returns a Document from the Langchain package. We will use the method to pick a PDF document from our local device and assign the file path and name to the String variables created earlier.
Also, we will have a Future function that converts the PDF to text, which is then loaded as a Document using the TextLoader class from Langchain.
import 'dart:io';

import 'package:file_picker/file_picker.dart';
import 'package:flutter/material.dart';
import 'package:langchain/langchain.dart';
import 'package:path_provider/path_provider.dart';
import 'package:syncfusion_flutter_pdf/pdf.dart';

class IndexNotifier extends ChangeNotifier {
  // Fields, constructor, and getter as shown above.

  // Lets the user pick a PDF, converts it to text, and loads it as a Document.
  Future<Document> _pickedFile() async {
    final result = await FilePicker.platform
        .pickFiles(type: FileType.custom, allowedExtensions: ['pdf']);
    if (result == null) {
      throw Exception("No file selected");
    }
    _filepath = result.files.single.path;
    _fileName = result.files.single.name.replaceAll('.pdf', '').toLowerCase();
    final textfile =
        _filepath!.isNotEmpty ? await _readPDFandConvertToText() : "";
    final loader = TextLoader(textfile);
    final documents = await loader.load();
    // The text file is loaded as a single document.
    return documents.last;
  }

  // Extracts the PDF text and writes it to output.txt, returning that path.
  Future<String> _readPDFandConvertToText() async {
    final file = File(_filepath!);
    final bytes = await file.readAsBytes();
    final document = PdfDocument(inputBytes: bytes);
    final text = PdfTextExtractor(document).extractText();
    final localPath = await _localPath;
    final createFile = File('$localPath/output.txt');
    final res = await createFile.writeAsString(text);
    document.dispose();
    return res.path;
  }

  // Directory where the app may store its own files.
  Future<String> get _localPath async {
    final directory = await getApplicationDocumentsDirectory();
    return directory.path;
  }
}
With the code above, we can load a PDF as a Langchain Document.
Split and embed
Now, we need to split and embed the document and store it. To split and embed a Langchain document, we will return to the abstraction created in langchain_service.dart. There, we will update it with the code below:
abstract class LangchainService {
  List<Document> splitDocToChunks(Document doc);
  Future<List<List<double>>> embedChunks(List<Document> chunks);
}
We will create another file within the same directory called langchain_service_impl.dart to implement this abstraction. Within this file, we’ll implement the LangchainService abstraction created earlier. splitDocToChunks takes a Document parameter, which is the value returned from the _pickedFile method in the IndexNotifier class earlier, and reads its page content.
Then, we use a RecursiveCharacterTextSplitter object to split that text into chunks of up to 1000 characters and return them as a Document list.
Next, we will pass the Document list to the embedChunks method, which creates vector embeddings for the list and returns them as a List<List<double>>.
Below is how the code should look:
import 'package:langchain/langchain.dart';
import 'package:langchain_openai/langchain_openai.dart';

import 'langchain_service.dart';

class LangchainServicesImpl extends LangchainService {
  final OpenAIEmbeddings embeddings;

  LangchainServicesImpl({
    required this.embeddings,
  });

  @override
  List<Document> splitDocToChunks(Document doc) {
    final text = doc.pageContent;
    const textSplitter = RecursiveCharacterTextSplitter(chunkSize: 1000);
    final chunks = textSplitter.createDocuments([text]);
    return chunks
        .map(
          (e) => Document(
            id: e.id,
            // Replace newlines with spaces. The JavaScript-style '/\n/g'
            // literal would not match newlines in a Dart RegExp.
            pageContent: e.pageContent.replaceAll('\n', ' '),
            metadata: doc.metadata,
          ),
        )
        .toList();
  }

  @override
  Future<List<List<double>>> embedChunks(List<Document> chunks) async {
    // One embedding vector is returned per chunk, in the same order.
    final embedDocs = await embeddings.embedDocuments(chunks);
    return embedDocs;
  }
}
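To build some intuition for what splitting does, here is a deliberately simplified pure-Dart chunker. It is not the algorithm RecursiveCharacterTextSplitter uses; it is a hypothetical sketch that cuts text into pieces of at most `chunkSize` characters and prefers to break at a space so words stay intact.

```dart
/// Splits [text] into chunks of at most [chunkSize] characters, preferring
/// to break at the last space inside each window.
List<String> splitIntoChunks(String text, {int chunkSize = 1000}) {
  if (chunkSize <= 0) {
    throw ArgumentError('chunkSize must be positive');
  }
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    var end = start + chunkSize;
    if (end >= text.length) {
      // Last window: take whatever is left.
      end = text.length;
    } else {
      // Back up to the nearest space so a word is not cut in half.
      final lastSpace = text.lastIndexOf(' ', end);
      if (lastSpace > start) end = lastSpace;
    }
    final chunk = text.substring(start, end).trim();
    if (chunk.isNotEmpty) chunks.add(chunk);
    start = end;
  }
  return chunks;
}
```

When a window contains no space at all, the sketch falls back to a hard cut at `chunkSize`, so the loop always makes progress.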
Equally, we will update the IndexNotifier class to control the state of our application while going through all these processes.
Store
So far, we’ve successfully enabled loading, splitting, and embedding the PDF document. Now, we need to store the split and embedded data, which is where the Neon database we created earlier comes in. To do this, we will update the LangchainService abstraction with the code below:
abstract class LangchainService {
  // the abstraction above
  Future<bool> checkExtExist();
  Future<bool> checkTableExist(String tableName);
  Future<String> createNeonVecorExt();
  Future<String> createNeonTable(String tableName);
  Future<String> deleteNeonTableRows(String tableName);
  Future<void> storeDocumentData(Document doc, List<Document> chunks,
      List<List<double>> embeddedDoc, String tableName);
}
The checkExtExist method checks if the vector extension exists and returns the result as a boolean. Likewise, the checkTableExist method checks if a table (named after the private String variable _fileName created earlier) exists within the Neon database and returns the result as a boolean. Both methods are implemented in the langchain_service_impl.dart file.
Note: Earlier, we mentioned that Neon allows us to write SQL commands directly on the console through its SQL Editor. Equally, we can execute these SQL commands programmatically from Flutter using the postgres package.
The methods createNeonVecorExt, createNeonTable, and deleteNeonTableRows handle, respectively, the creation of the pgVector extension, the creation of a Neon database table (named after the private String variable _fileName created earlier), and the deletion of any stored rows (for the case where the user wants to update a document whose table already exists). When creating the Neon table, we will also create a vector index using the ivfflat algorithm from the pgVector extension. This algorithm provides efficient approximate nearest neighbor search over high-dimensional data like embeddings.
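As a rough illustration, the SQL behind these methods might look like the statements below. This is a sketch under stated assumptions, not the article's actual implementation: the helper names, the column layout (id, metadata, embedding), the index name, and the `lists = 100` setting are all hypothetical, and the default of 1536 dimensions assumes OpenAI's text-embedding-ada-002 model. The helpers only build SQL text; executing it is left to the postgres package.

```dart
/// Accepts only plain lowercase identifiers, because the table name is
/// interpolated directly into the SQL text below.
String safeTableName(String name) {
  if (!RegExp(r'^[a-z_][a-z0-9_]*$').hasMatch(name)) {
    throw ArgumentError.value(name, 'name', 'not a plain SQL identifier');
  }
  return name;
}

/// Returns true when the pgvector extension is already installed.
const checkExtensionSql =
    "SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'vector');";

/// Installs pgvector; does nothing if it is already present.
const createExtensionSql = 'CREATE EXTENSION IF NOT EXISTS vector;';

/// Returns true when a table with this name exists.
String checkTableSql(String tableName) =>
    'SELECT EXISTS (SELECT 1 FROM information_schema.tables '
    "WHERE table_name = '${safeTableName(tableName)}');";

/// Creates the table with a fixed-size vector column (assumed layout).
String createTableSql(String tableName, {int dimensions = 1536}) {
  final table = safeTableName(tableName);
  return 'CREATE TABLE IF NOT EXISTS $table ('
      'id TEXT PRIMARY KEY, metadata JSONB, '
      'embedding vector($dimensions));';
}

/// Creates an ivfflat index that uses cosine distance.
String createIndexSql(String tableName, {int lists = 100}) {
  final table = safeTableName(tableName);
  return 'CREATE INDEX IF NOT EXISTS ${table}_embedding_idx ON $table '
      'USING ivfflat (embedding vector_cosine_ops) WITH (lists = $lists);';
}

/// Clears previously stored rows before re-indexing a document.
String deleteRowsSql(String tableName) =>
    'DELETE FROM ${safeTableName(tableName)};';
```

Because the table name comes from a file name, it may contain spaces or hyphens; in this sketch such names are rejected, so they would need to be normalized first.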
For storeDocumentData, we will pass the Langchain Document, the chunks, the embedded chunks, and the table name, and execute an INSERT command inside a transaction.
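The data preparation for that INSERT can be sketched in pure Dart as follows. The function names, the id scheme, and the choice to keep the chunk text under a `pageContent` key in the metadata are assumptions made for illustration. What is not an assumption is the text format pgVector accepts for a vector value: a bracketed, comma-separated list such as `[0.1,0.2,0.3]`.

```dart
import 'dart:convert';

/// Formats an embedding in pgvector's text form, e.g. `[0.1,0.2,0.3]`.
String toVectorLiteral(List<double> embedding) => '[${embedding.join(',')}]';

/// Builds one row per chunk, pairing each chunk with its embedding.
/// Column names (id, metadata, embedding) are an assumed layout.
List<Map<String, Object?>> buildRows(
  List<String> chunkTexts,
  List<List<double>> embeddings, {
  required String idPrefix,
}) {
  if (chunkTexts.length != embeddings.length) {
    throw ArgumentError('Expected one embedding per chunk');
  }
  return [
    for (var i = 0; i < chunkTexts.length; i++)
      {
        'id': '${idPrefix}_$i',
        'metadata': jsonEncode({'pageContent': chunkTexts[i]}),
        'embedding': toVectorLiteral(embeddings[i]),
      },
  ];
}
```

Each map can then be bound to the parameters of a single INSERT statement; running all of them inside one transaction means a failure partway through leaves no partially stored document behind.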
Now, we will update the IndexNotifier to use the new LangchainService methods. We will use checkExtExist and checkTableExist as conditions that decide whether to run createNeonVecorExt, createNeonTable, or deleteNeonTableRows.
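The decision logic described above can be sketched as a small pure function. The `SetupStep` enum and `planSetup` function are hypothetical names, not part of the project; the idea is only to show which service method should run for each combination of check results.

```dart
/// The setup actions the notifier may need to run before storing data.
enum SetupStep { createExtension, createTable, deleteRows }

/// Decides which steps to run from the two existence checks:
/// - no extension yet      -> create it
/// - table already exists  -> clear its rows so the document can be re-stored
/// - table does not exist  -> create it
List<SetupStep> planSetup({
  required bool extensionExists,
  required bool tableExists,
}) =>
    [
      if (!extensionExists) SetupStep.createExtension,
      if (tableExists) SetupStep.deleteRows else SetupStep.createTable,
    ];
```

In the notifier, the two booleans would come from awaiting checkExtExist and checkTableExist, and each planned step would map to createNeonVecorExt, createNeonTable, or deleteNeonTableRows before storeDocumentData is called.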
We have successfully stored the PDF data within the database table as an id (text), metadata (Map or JSON), and embedding.
To utilize the ChangeNotifier class within our application, we will mount it using Provider for dependency injection. In this process, we will connect the Neon database and our Flutter application using the postgres package.
The way to do this is by wrapping the initial stateless widget in main.dart with a MultiProvider. Doing this mounts our Providers and ChangeNotifierProviders to the widget tree, allowing us to monitor the state of our application easily. Thus, we will head to the lib/core/dependency_injection/ folder, create a file called provider_locator.dart, and define a ProviderLocator class in it.
The ProviderLocator class does the following:
- Defines a method getProvider that:
  - Creates a LangchainService instance.
  - Returns a MultiProvider with a LangchainService provider and a ChangeNotifierProvider for IndexNotifier.
- Defines a method _createLangchainService that:
  - Creates a PostgreSQL connection.
  - Creates an OpenAIEmbeddings instance.
  - Creates an OpenAI instance.
  - Returns a LangchainServicesImpl instance with the created connection, embeddings, and OpenAI.
- Defines a method createPostgresConnection that:
  - Tries to establish a PostgreSQL connection with specified settings from the Neon connection details earlier.
  - If the connection fails, it retries up to a maximum number of times.
  - If the connection is not established after maximum retries, it throws an exception.
- Defines a method _createEmbeddings that returns an OpenAIEmbeddings instance.
- Defines a method _createOpenAIConnection that returns an OpenAI instance.
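The retry behavior of createPostgresConnection can be illustrated with a generic helper. This is a sketch, not the project's code: `retry`, `maxAttempts`, and `delay` are hypothetical names, and `action` stands in for whatever call opens the Postgres connection.

```dart
/// Runs [action], retrying on failure until [maxAttempts] is reached.
/// The last error is rethrown once the attempts are exhausted.
Future<T> retry<T>(
  Future<T> Function() action, {
  int maxAttempts = 3,
  Duration delay = const Duration(seconds: 2),
}) async {
  if (maxAttempts < 1) {
    throw ArgumentError('maxAttempts must be at least 1');
  }
  var attempt = 0;
  while (true) {
    attempt++;
    try {
      return await action();
    } catch (_) {
      // Give up after the final attempt; otherwise wait and try again.
      if (attempt >= maxAttempts) rethrow;
      await Future<void>.delayed(delay);
    }
  }
}
```

A fixed delay keeps the sketch short; a growing delay between attempts is a common alternative when the database may need time to wake up.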
Note: For security reasons, we will use a .env file to store our secret keys. Kindly follow this article to learn more about how to use flutter_dotenv.
Now, let’s update the main.dart file so that the root widget is wrapped with the MultiProvider returned by getProvider.
Retrieval
Retrieval is commonly divided into two steps:
- Retrieve: We embed the user query and compare its vector with the vectors stored in the database, using cosine similarity to find the closest matches. The closest results are then passed to the second step.
- Generate: After getting the closest results, we give them to the LLM as context so that it generates a response based on that particular information.
To do this programmatically, we will head to langchain_service.dart and add the code below to the abstraction:
abstract class LangchainService {
  // the methods declared earlier
  Future<String> queryNeonTable(String tableName, String query);
}
The method above returns a string response by following the retrieval process described earlier. Its implementation does the following:
- Implements a method queryNeonTable that:
  - Embeds the query using the embeddings object.
  - Executes a SQL query on the connection to get similar items from the specified table.
  - Converts the result into a list of Metadata objects.
  - If Metadata is not empty, it concatenates the page content, creates a StuffDocumentsQAChain object, and calls it with the concatenated content and the original query to get a response.
  - If Metadata is empty, it returns a default message: “Couldn’t find anything on that topic”.
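The two halves of queryNeonTable can be approximated in pure Dart as shown below. This is a hedged sketch rather than the article's implementation: the function names, the selected columns, the default limit of 3, and the prompt wording are assumptions, and the prompt builder only mimics the idea of "stuffing" retrieved text into a single prompt, which StuffDocumentsQAChain handles in the real application.

```dart
/// Message returned when no similar rows are found.
const fallbackAnswer = "Couldn't find anything on that topic";

/// Builds a nearest-neighbor query ordered by cosine distance (`<=>`).
/// The embedding is inlined as a pgvector literal; it contains only numbers.
String similarityQuerySql(
  String tableName,
  List<double> queryEmbedding, {
  int limit = 3,
}) {
  if (!RegExp(r'^[a-z_][a-z0-9_]*$').hasMatch(tableName)) {
    throw ArgumentError.value(tableName, 'tableName', 'invalid identifier');
  }
  final vector = '[${queryEmbedding.join(',')}]';
  return 'SELECT id, metadata FROM $tableName '
      "ORDER BY embedding <=> '$vector'::vector LIMIT $limit;";
}

/// Concatenates retrieved page contents into one prompt for the LLM.
/// Returns null when nothing was retrieved, so the caller can answer
/// with [fallbackAnswer] instead of calling the model.
String? buildStuffedPrompt(List<String> pageContents, String question) {
  if (pageContents.isEmpty) return null;
  final context = pageContents.join('\n\n');
  return 'Answer the question using only the context below.\n\n'
      'Context:\n$context\n\nQuestion: $question';
}
```

Ordering by `<=>` in ascending order returns the rows with the smallest cosine distance first, which are the most similar chunks.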
We will then create a separate ChangeNotifier class to handle the state of the query. This follows the same pattern as the IndexNotifier class, with some slight changes. Here is the code:
import 'package:flutter/material.dart';

// Path follows the folder layout shown earlier (lib/home/view_model).
import '../view_model/langchain_service.dart';

// A single chat exchange: the user's query and the generated response.
class Message {
  String? query;
  String? response;

  Message({required this.query, this.response = ""});
}

enum QueryState {
  initial,
  loading,
  loaded,
  error,
}

class QueryNotifier extends ChangeNotifier {
  final LangchainService langchainService;

  QueryNotifier({required this.langchainService});

  final List<Message> _messages = [];

  final _messagesState = ValueNotifier<List<Message>>([]);
  ValueNotifier<List<Message>> get messageState => _messagesState;

  final _queryState = ValueNotifier<QueryState>(QueryState.initial);
  ValueNotifier<QueryState> get queryState => _queryState;

  Future<void> userqueryResponse(String tableName, String query) async {
    // Show the query immediately; the response is filled in later.
    _messages.add(Message(query: query));
    _messagesState.value = List.from(_messages);
    try {
      _queryState.value = QueryState.loading;
      final response = await langchainService.queryNeonTable(tableName, query);
      final List<Message> updatedMessages = List.from(_messages);
      updatedMessages.last.response = response;
      _messagesState.value = updatedMessages;
      _queryState.value = QueryState.loaded;
    } catch (e) {
      // Surface the error state briefly, then reset to initial.
      debugPrint(e.toString());
      _queryState.value = QueryState.error;
      await Future.delayed(const Duration(milliseconds: 2000));
      _queryState.value = QueryState.initial;
    }
  }
}
The code above does the following:
- Defines a Message class with query and response fields.
- Defines an enum called QueryState with states: initial, loading, loaded, and error.
- Creates a QueryNotifier class that extends ChangeNotifier:
  - Initializes a LangchainService object.
  - Maintains a list of Message objects.
  - Defines ValueNotifier objects for messagesState and queryState.
  - Defines a method userqueryResponse that:
    - Adds a new Message to _messages.
    - Sets the queryState to loading.
    - Calls the queryNeonTable method of langchainService to get a response.
    - Updates the last message’s response and sets queryState to loaded.
    - Handles errors by setting queryState to error, then back to initial after a delay.
Afterward, we will update the getProvider method in the provider_locator.dart file by adding another ChangeNotifierProvider to the MultiProvider. Here is the updated code:
class ProviderLocator {
  // provider tree
  static Future<MultiProvider> getProvider(Widget child) async {
    final langchainService = await _createLangchainService();
    return MultiProvider(
      providers: [
        Provider<LangchainService>.value(value: langchainService),
        // IndexNotifier
        ChangeNotifierProvider<IndexNotifier>(
          create: (_) => IndexNotifier(langchainService: langchainService),
        ),
        // QueryNotifier
        ChangeNotifierProvider<QueryNotifier>(
          create: (_) => QueryNotifier(langchainService: langchainService),
        ),
      ],
      child: child,
    );
  }
}
That is it — we should have the result for the application as below:
Here is a link to the repository containing all the code.
Conclusion
Retrieval-augmented generation (RAG) enhances LLMs by supplying them with retrieved source material, which helps keep responses factual and in context. Combining a vector-capable database like Neon with the RAG technique and Langchain makes LLM-powered applications considerably more capable, leading to more useful virtual assistants, data analysis tools, and more.
In conclusion, integrating RAG with pgVector and Langchain is a practical way to build applications whose answers are grounded in your own data.
Resources
Here are some resources that will guide you more in this journey: