Recently, AI coding assistants have gained popularity, promising to revolutionise the way we code (get us all fired). The tide began in 2021 with GitHub Copilot, and ChatGPT took it by storm in late 2022.
These two products represent the two flavours of assistant:
- Text-completion plugins - you type in the IDE and get suggestions that, once accepted, insert a code snippet at the cursor position. Examples: GitHub Copilot, Tabnine, Amazon CodeWhisperer.
- Chats - you converse in a separate window and then copy and paste snippets into the IDE. Examples: GitHub Copilot X, ChatGPT.
In this blog post, I will explore the capabilities of Sourcegraph's Cody, an AI coding assistant that leverages codebase understanding to provide contextualized suggestions and recommendations.
Current Constraints of AI Coding Assistants
While Copilot drew cautious interest, it was ChatGPT that blew my mind. Being able to write runnable code in unfamiliar languages, ideate and iterate over my own code, discuss optimisation problems, save time bootstrapping new projects, or quickly fix rarely touched parts of a solution - it was true magic!
Yet the backlash didn't take long. I found myself using AI for coding less often. I can outline two major reasons:
- Crisis of faith. Too often LLM hallucinations and inaccuracies derailed the process: catching tricky bugs at runtime, spending more time debugging. After all, reading code that you didn't write requires more effort.
  - IMO writing code is also a learning process; it gives time and depth. Generating code hinders that aspect.
- Too much work putting together relevant context. When defining a problem for ChatGPT you must invest focus and effort into writing exhaustively what it is that you want, making sure you provide everything that's needed, and copying and pasting code snippets from various parts of the solution.
With the first part there's little that can be done (at least for now): anyone using AI should get used to extensively reviewing its output and build that skill.
For the second part, the task seems to be a puzzle rather than a mystery.
I even tried creating my own AI plugin for VSCode, which partly solved the second problem (it tries to smartly prime the OpenAI model with surrounding code and plugs the output directly into the IDE).
Still, a solution that lets a Large Language Model easily navigate the entire codebase, gives it an understanding of the code structure, locates relevant dependencies, and builds a good, exhaustive context for the given task... That seemed like the next level, one that could bring my trust in AI coding back and increase my use of AI tooling.
Codebase-aware Assistants
The first tool that specifically pitched its superpowers via solution-wide scope was Replit. Yet the feature is only available as part of their online IDE, and there are no plugins for your IDE of choice.
Hence the second tool.
Cody
...is an AI coding assistant that writes code and answers questions for you by reading your entire codebase and the code graph.
Cody uses a combination of Sourcegraph's code graph and Large Language Models (LLMs) to eliminate toil and keep human devs in flow. You can think of Cody as your coding assistant who has read through all the code in open source, all the questions on StackOverflow, and your own entire codebase, and is always there to answer questions you might have or suggest ways of doing something based on prior knowledge.
To start using the assistant you need to download the desktop app, log in, and point it at a repository folder. From that point you can start "chatting with your code" via the UI. For closer integration there are IDE plugins, which use the Cody app as a local server executing the requests. I used the one for VSCode (Cody AI).
My Trials
I tested Cody on two different Flutter repositories: a small, freshly created project (https://github.com/maxim-saplin/ambilytics) and a larger, longer-lived project (https://github.com/maxim-saplin/data_table_2). My trials aimed to assess Cody's abilities in the daily routine of solving the task at hand.
First Impressions
After pointing Cody to one of my GitHub repos, it could navigate the full directory structure and answer questions about the overall codebase. However, it still exhibited the typical LLM limitations around factual consistency that I've seen with ChatGPT.
When I asked it to list all the `.dart` files, it omitted the test files located in a separate `/tests` folder. My next question was whether there were any tests, and this time it was able to list those files (again, a pattern I noticed in ChatGPT: ask a follow-up question and the LLM corrects its previous error). Lastly, when I asked it to clarify this inconsistency, it gave a nonsensical response.
So while Cody scans the full codebase, it doesn't necessarily develop a coherent understanding of it.
Task 1: Import Missing Dependencies 🍊
I started with the smaller repo. My first task was to add multiple missing dependencies in the currently open `.dart` file (copied and pasted into the solution folder) and update the `pubspec.yaml` file (think package.json). It was a partial success: most dependencies were identified, some incorrectly (`flutter_test` instead of `test`). Cody correctly identified `pubspec.yaml` as the file to be changed (even though it was not open in the editor). I had to manually insert the dependencies (no automation). Moreover, the suggested versions were outdated (stale training data and no connection to the package manager).
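For reference, here is roughly what the relevant sections of `pubspec.yaml` look like in a Flutter project; the version number is illustrative, not the one Cody suggested:

```yaml
# pubspec.yaml - dependency sections (illustrative versions)
dependencies:
  flutter:
    sdk: flutter

dev_dependencies:
  # `test` is the plain Dart testing package; `flutter_test` ships with the
  # Flutter SDK and is meant for widget tests. Mixing the two up is exactly
  # the mistake Cody made.
  test: ^1.24.0
  flutter_test:
    sdk: flutter
```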
Task 2: Generate Unit Tests 🍅
My second task was to generate unit tests from scratch using Cody's Recipes tab. Unfortunately, the tests it produced wouldn't compile - it referenced non-existent variables and imported the wrong testing packages. When I asked why Cody suggested inserting test-specific code into production code, it offered to use mocks.
Later, when I wrote these tests myself (with no help from AI), I realized it required some upfront design and refactoring to make the code testable. Cody's strictly bottom-up approach resulted in low-quality output; a more experienced developer would start top-down with a testing strategy.
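For comparison, here is the baseline a generated test should clear - a minimal compiling Dart unit test using the `test` package (the function under test is a hypothetical stand-in for project code):

```dart
import 'package:test/test.dart';

// Hypothetical function under test - stands in for real project code.
int add(int a, int b) => a + b;

void main() {
  test('add returns the sum of its arguments', () {
    // A generated test must at least reference real symbols and import
    // the right testing package to compile - Cody's output did neither.
    expect(add(2, 3), equals(5));
  });
}
```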
That looks like a failure to me.
Task 3: Code Smell Recipe 🍊
Cody did better at finding code smells - it flagged some unused imports and long methods for me to clean up. This kind of task plays more to the LLM's strengths since it's just pattern matching on the existing code rather than synthesizing brand new code.
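To give a hypothetical flavour of what that looks like in Dart, an unused import is the simplest case:

```dart
// Unused import: nothing below references dart:math -
// the kind of smell an LLM can spot by pattern matching alone.
import 'dart:math';

void greet(String name) {
  print('Hello, $name!');
}
```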
The suggestions were helpful, though not integrated into my IDE. I had to manually scan the transcript, search for each occurrence in the file, and apply each recommendation, rather than just clicking on a suggestion and jumping to the right place.
Again, a minimal productivity boost, yet the task got done - a partial success to me.
Task 4: Code Completion 🍅
Unfortunately, it was slow to the point of being unusable compared to CodeWhisperer. I didn't even notice the feature right away because it couldn't keep up with my typing, and auto-suggestions were hard to come by. Failure.
Task 5: Fixing Failing Unit Test 🍅
For my fifth task I switched back to the larger repository and asked Cody to fix a failing unit test.
I copied and pasted the error message from the failing test. The expectation was that the LLM would review both the class being tested and the unit test and gain a deep understanding of both parts.
The root cause: the test expected the wrong icons (the correct icons can be found in a private class at the bottom of the same file).
Cody failed, giving generic directions and no fixed code snippet.
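To illustrate the kind of concrete fix I was hoping for, here is a hypothetical sketch (not the actual data_table_2 code) using `flutter_test`; the correction amounts to asserting the icon the widget actually renders:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('sort arrow shows the ascending icon', (tester) async {
    // Hypothetical widget tree standing in for the real sortable header.
    await tester.pumpWidget(
      MaterialApp(home: const Icon(Icons.arrow_upward)),
    );

    // The failing test expected the wrong icon; the fix is to assert
    // the icon the widget actually renders.
    expect(find.byIcon(Icons.arrow_upward), findsOneWidget);
  });
}
```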
Task 6: Generating Simple Unit Test 🍅
My final task involved generating a simple unit test. Cody failed to find the correct context, and the tests it created didn't make sense and tested nothing. Failure.
Test Results
🍏 Success: 0/6
🍊 Partial success: 2/6
🍅 Failure: 4/6
Conclusion
Cody fails to impress.
Cody's code-structure awareness brings little benefit. It can't understand the codebase well, pick up local practices, or find relevant dependencies.
First of all, code awareness didn't give any edge. I expected better results and more work done with less effort on my end (explaining and prepping). Instead it felt like I was still using ChatGPT with IDE integration; there was little to no code awareness in our dialogs.
Secondly, I expected more automation, with the tool making changes across multiple parts of the solution rather than my copy-and-pasting everywhere. While there may be no such promise, this is the intuitive "next big thing" feature that would bring AI coding assistants closer to real developers. Not yet.