~ 2 months ago I came across Cursor.sh, "The AI-first Code Editor".
— "Why would anyone create yet another editor?" was my first thought.
— "There are plenty of AI coding extensions for VSCode" was the second one.
— "Nah, it's not worth the time,"
Yet, against the odds, I downloaded the app, opened my fresh experiment project in Python, and started coding... And now Cursor is my daily driver replacing VSCode :)
Over the weekend I built this private UI for the OpenAI chat API, entirely in Cursor, with ~90% of the code generated by GPT-4 Turbo.
IMO, the team behind Cursor (and there are just 6 of them!) managed to squeeze the most out of (1) prompt engineering and the GPT-3.5/4 models and (2) frictionless integration into VSCode.
Coming next:
- A few words about AI Coding
- Getting Started with Cursor, Pricing
- Coding with Cursor
- Cursor vs Cody AI
- Why pick Cursor (or any other tool) over Copilot?
- Why did they fork VSCode?
A few words about AI Coding
Over the past year, I have tried quite a few AI coding assistants: GitHub Copilot, Amazon CodeWhisperer, Sourcegraph Cody, Google IDX, and several more VSCode extensions that required an OpenAI key and offered either a chat window in the IDE or editor integrations (inline chat and generation, refactoring or explanations of selected code, etc.). Still, I never settled on any of them and kept using my very own and very basic cptX plugin for code generation/explanation in the IDE, or chatting with GPT in a separate (large) window.
Looking back, I can list a few reasons why I reverted to a very limited use of AI assistants:
- Ergonomics - small chat windows, long lists of 'recipes' buried somewhere in the sidebar, inline chat that jumps around, overwhelming diff representation
- Lack of language support (hello to CodeWhisperer and Dart/Flutter).
- Stability - e.g. my trial of Project IDX did not go well; I wasted time troubleshooting and fixing their cloud IDE (and even after it was fixed, the AI part was mediocre). The same goes for other options, especially the smaller/open-source VSCode plugins, which happened to be buggy.
- Lack of trust - AI generation was often unreliable. In many cases the produced code had small, hard-to-spot bugs. Tasks that required generating more than one small code snippet (e.g. creating unit tests from scratch) often turned into a dead end, with me going and doing everything manually.
Here is where AI coding did work for me:
- Code explanation in unfamiliar code bases - select a line of code and ask what it does, or ask about the currently open file, and you will likely get a very good breakdown of the code
- Polyglot workflows - with AI you can easily start coding in new languages or augment your projects with capabilities that require stepping outside the primary language (e.g. Bash scripting for automation, YAML for CI/CD, etc.). If you have a good grasp of any one programming language, it helps a lot: transferring the learned concepts is not that hard, and you can communicate with the LLM efficiently by giving problem statements and then providing feedback while fixing/iterating on the generated code
- Small, self-contained, stereotyped tasks - if you see that a certain code block must be changed in a certain way, it is likely a good task for an AI assistant (e.g. "add docstring for the below function" or "add padding to the selected widget"). The same goes for small single-file tasks (e.g. "generate a Python script to list all files in the directory and count their total size" - see the sketch right after this list).
- Semantic navigation over the code base - many tools now promise that you can "talk to your code base". What it means is that there is an embedding index and the ability to find code snippets that might be relevant to a specific request. E.g. "How do I change the default window size" for a desktop app works ~70% of the time. More complex requests that go into interdependent code spanning multiple files will likely not lead to a result.
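To make the "small single-file task" point concrete, here is roughly what a prompt like the directory-size one tends to yield. This is a hand-written sketch of typical model output, not actual Cursor/GPT-4 output:

```python
# Sketch: list all files under a directory and count their total size.
import os
import sys

def list_files_with_total_size(root: str) -> int:
    """Print every file under `root` with its size and return the total size in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            total += size
            print(f"{path}\t{size} bytes")
    return total

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"Total size: {list_files_with_total_size(root)} bytes")
```

A task of this size is exactly where assistants shine: the spec fits into one sentence, the result fits into one file, and any bug is easy to spot by running it once.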
Update, March 2024. I have recently been reflecting on the uses, and here are a few items that can be added to the list above:
- Starting fresh - overcoming the "tyranny of the blank page"
- Saving on "low-entropy keystrokes" - using autocomplete to swiftly complete what looks like a trivial ending to whatever you are currently typing
- Learning a new language - a mix of 'Code explanation' and 'Polyglot workflows'. I find an LLM very useful when navigating a codebase in an unfamiliar language: just select certain lines and ask what this construct does in the language and what else there is to tell/demonstrate about it
- Hyper-casual 101, exploring new areas or topics - if you know nothing about LLM fine-tuning, just ask straight away (and then proceed to Googling to find more substantial material :)
Getting Started with Cursor, Pricing
Cursor happens to be a fork of VSCode with (almost) no differences from the original. The out-of-the-box experience was great for me. There was just one welcome screen that offered to import all my current VSCode plugins and settings, and it took me less than a minute to get started.
The IDE came preconfigured to use the Cursor-provided GPT-4 model with 50 GPT-4 requests. Very soon I ran out of the free tier. Next, I could purchase a Cursor subscription:
And by the way, the prices do not look competitive compared to GitHub Copilot:
Yet (unlike GitHub Copilot) Cursor allows using 3rd-party models. I switched to a GPT-4 Turbo deployment in Azure (the November version with 128k context) and used that instead.
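Before pointing Cursor at the deployment, it is worth checking that the endpoint actually responds. Here is a minimal sketch using the openai Python SDK (v1.x); the endpoint, key, and deployment name below are placeholders, not values taken from Cursor itself:

```python
# Sanity-check an Azure OpenAI GPT-4 Turbo deployment before wiring it into an IDE.
# Endpoint, API key, and deployment name are placeholders - substitute your own.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder resource URL
    api_key="<AZURE_OPENAI_API_KEY>",
    api_version="2023-12-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4-turbo",  # the Azure *deployment* name, not the model family
    messages=[{"role": "user", "content": "Reply with a single word: pong"}],
)
print(response.choices[0].message.content)
```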
Coding with Cursor
Cursor provides a familiar VSCode experience, the only distinct UI difference being the horizontal toolbar that replaces the original vertical one (this can be changed via settings).
What I liked about Cursor's UX is its simplicity and efficiency. There are 2 shortcuts (I'll be referring to macOS) that enable the AI workflows:
- Cmd + K brings up an inline chat popup (in the editor or the Terminal)
- Cmd + L toggles the sidebar with Chat inside the IDE (also putting any selected text/code there)
And those 2 actions are more than enough to reach every needed feature.
Features:
- Chat in the right sidebar - Cmd + L or the small "Toggle AI pane" button in the top right corner
  - Using the current file as the context (default mode)
  - Using relevant code snippets found in the codebase - Cmd + Enter
  - Interpreter mode that can execute multiple steps and make changes to multiple files/parts of a file - available through the top-right drop-down, must be enabled first via the "More" tab
  - AI reviewer - scans the Git diff and gives an AI evaluation, must be enabled first via the "More" tab
- Inline chat - put the cursor in the editor (or select a code block), hit Cmd + K, write instructions, and get the code inserted/changed. Why is it called chat? Because you can write follow-up instructions and iterate on the code.
- Terminal chat - works the same as Inline chat, this time using Cmd + K while in the Terminal in the bottom pane
  - macOS users who like their terminal clean might struggle with the Cmd + K shortcut being overridden (by default that is how you clear the terminal)
- Own OpenAI model (OpenAI or Azure) - hit the "Gear" button in the top right corner and scroll to the "Advanced" section
  - It doesn't seem that self-deployed/private models (not reachable from the internet) can be used, as Cursor uses its own backend to make calls to the model
- Smaller horizontal toolbar replacing the VSCode vertical one
  - Can be reverted to the default VSCode view via the "workbench.activityBar.orientation" configuration (set to "vertical")
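If it helps, this is roughly what that override looks like in settings.json; the key name is taken from Cursor's own configuration, so treat it as version-dependent:

```json
{
  // Restore the classic vertical activity bar in Cursor (key may change between versions)
  "workbench.activityBar.orientation": "vertical"
}
```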
Missing Features:
- No AI autocompletion as you type (the way the original Copilot works)
- VSCode Dev Containers are broken (although available through the UI)
Cursor vs Cody AI
Back in the summer of 2023, I thought that the next-gen AI coding assistant would be the one that understands the code base, can navigate it, and forms knowledge of the solution's structure and contents. That's how I ended up trying Cody by Sourcegraph. The marketing materials promised to leverage code understanding: before ChatGPT made the headlines, Sourcegraph had built their business around code parsing/search tech, and it seemed like a natural integration.
The code understanding capability didn't turn out to be a game changer. I don't think Sourcegraph's code search made any difference either; they ended up using embeddings and RAG, which is the most popular approach to enriching an LLM with custom data.
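For readers unfamiliar with the approach, here is the gist of "talk to your code base" via embeddings and RAG. This is a toy sketch using the openai SDK; real assistants add chunking, a persistent index, and reranking, and the file contents here are made up:

```python
# Toy sketch of embeddings + RAG over a codebase (heavily simplified).
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# 1. Index: embed every code snippet once (toy in-memory "index").
snippets = {
    "window.py": "def set_default_size(width, height): ...",
    "main.py": "app = App(); app.run()",
}
index = {path: embed(code) for path, code in snippets.items()}

# 2. Retrieve: embed the question and pick the most similar snippet (cosine similarity).
question = "How do I change the default window size?"
q = embed(question)
best = max(index, key=lambda p: float(np.dot(index[p], q)
                                      / (np.linalg.norm(index[p]) * np.linalg.norm(q))))

# 3. Generate: hand the retrieved snippet to the chat model as context.
answer = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user",
               "content": f"Code from {best}:\n{snippets[best]}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```

Whether the retrieved snippets are actually relevant is what makes or breaks the feature, which matches the ~70% success rate I mentioned earlier.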
Cody is available as a VSCode extension (as well as for a few other IDEs), and in terms of features it is very close to Cursor; the major differences are the LLM it bundles (Anthropic's Claude models) and the autocomplete feature, as in the original GitHub Copilot.
I decided to test the code understanding side-by-side with an early version of this project.
Using chat I asked 2 questions:
- List all files in the solution
- Please evaluate their contents and summarize their purpose and role in the solution
And Cursor was a clear winner:
- Cody missed some of the files, hallucinated, and made up the non-existing file model.py
- Cursor listed all important files (it only stumbled on the non-crucial .env.template)
- The summary provided by Cursor happened to be light years ahead: it caught the gist of the solution and provided valid explanations for each of the files. Cody produced some generic text of zero value.
- One nifty feature of Cursor: you can click on a file name in chat and navigate to it in the IDE. It seems a trivial but important feature, yet for some reason it is missing in Cody.
Screenshots:
This single use case can't be representative of all scenarios, and there can be excuses for Cody, such as its use of Claude 2 (which is inferior to GPT-4)... It is really hard to evaluate AI assistants in developer workflows consistently.
Yet after using both products for some time (and I am not talking only about codebase search) I ended up with the impressions I shared at the beginning of the article (squeezing the most out of prompt engineering, frictionless integration into VSCode). It is the attention to detail, the minimal UX, and the level of execution that set Cursor apart, not just from Cody but from the other AI assistants I tried as well.
Why pick Cursor (or any other tool) over Copilot?
It is a hard sell. At almost the same price, GitHub Copilot is a big name. It is the pioneer of AI-assisted coding, backed by Microsoft, which has exclusive access to SoTA models from OpenAI; with their resources/data/talent they are positioned better than anyone else to innovate in AI coding.
With the above facts, it is easy to think that GitHub Copilot is superior in its core capability - the quality of code generation - and, as an outcome, that it should be the best tool for ramping up engineering productivity. Yet if you try to research the subject, you won't find a lot of (1) comprehensive side-by-side comparisons of AI coding assistants or (2) research results on GitHub Copilot's influence on productivity.
Speaking of Copilot's effects on productivity, I have come across this study suggesting that Copilot significantly helps with small/isolated/boilerplate tasks - which is not quite representative of typical dev workflows. The most recent one focuses on developer satisfaction; it was touched upon at GitHub Universe '23, with a Shopify case study presented. And frankly, it sounds like the team gave up on finding any hard evidence of productivity effects and ended up with the "GitHub Copilot makes developers happier" argument.
If even the big players say it is the UX that matters... I can name a few reasons why one might prefer Cursor over Copilot. If you use VSCode and don't need any other IDE (Copilot supports multiple IDEs), the advantages of Cursor would be:
- Frictionless start, no credit card is needed, great for beginners who want to try AI coding
- It has free quotas for GPT-3.5 and GPT-4 requests
- It can use your OpenAI or Azure OpenAI endpoints for inference
- The UI/UX might be better for your taste
  - I don't like how Copilot is loaded with multiple options/recipes; Cursor is cleaner
  - I don't like Copilot's side-by-side diff representation
- It has an experimental "Interpreter mode" feature, which lets you use Cursor in an "agentic" way where the IDE tries to perform multiple steps/updates - that might suit your workflows
  - I tried the feature; it didn't quite work for me, though I can see how it could be a good productivity booster for some stereotyped flows
Btw, here's an example of a diff generated by Copilot's inline chat:
And speaking of the core capability - code generation - my subjective opinion is that 90% of the quality comes from the model used and 10% from the implementation. Most of the assistants out there are roughly the same in this respect. We have yet to see new models introduced (cheaper, faster, with bigger context windows and fewer hallucinations) and new ways to work with them (agents equipped with tools, etc.).
There are many examples of how big corporations hinder innovation and are slow to adopt new ideas. With Microsoft/Copilot it took ~9 months between the introduction of the GPT-4-powered Copilot X (March 2023) and the release of GitHub Copilot Chat (December). By summer, the VSCode Marketplace was already loaded with decent extensions integrating chat and GPT-3.5/4 into the IDE. The next big thing in AI-assisted engineering may well come from smaller and more agile teams.
Why did they fork VSCode?
There are plenty of AI coding assistants available as extensions in the VSCode Marketplace. I can speculate that the reason for forking is the limitations imposed on extensions: showing a custom pop-up with rich UI, placing an arbitrary view in the right sidebar, showing a popup in the Terminal (and being able to read the output there) - those are a few of the constraints I can name (as someone who has created a VSCode extension in the past).
The developers made a tradeoff: accept the burden of keeping the fork in sync with upstream VSCode (and eventually having some features break) in exchange for the freedom to alter the UX and implement the most nuanced integrations.
And IMO that is one of the reasons behind the overall good impression and satisfaction with Cursor's UX.