Prompt Engineering is Dead – and We're Glad We Never Did It

Lilly - Mar 30 '23 - Dev Community

In the past, generative models like GPT-3 often required significant tinkering with the input (prompts) to produce the desired outputs. Earlier versions of these models were not always consistent in their responses, so getting good results meant carefully crafting the input. This kind of tinkering is called "prompt engineering," a skill humans have been developing to work effectively with these models, but it might not be necessary anymore.

Prompt engineering isn’t necessary anymore

Newer generative AI models such as Midjourney v4 and ChatGPT have improved significantly in this regard, and no longer require extensive prompt tweaking to generate consistent outputs. Yet entire companies, such as Copy.ai and Jasper, have been built around the concept of high-quality prompt engineering. The UIs for these products often focus on translating conversational language into prompt "tricks" and "hacks."

Companies building on top of these foundational generative models must ask themselves: what is our competitive moat? If the underlying technology is accessible and cheap, what differentiated value does a business build on top of it?

The accuracy (or lack thereof) of Large Language Models (LLMs)

The natural weakness of these models is that they require extensive data to become good at generating outputs. For example, a model like ChatGPT was trained on data only up to 2021, so it is not aware of anything that happened after its training cutoff.

A screenshot where we asked ChatGPT about Contenda

The biggest problem with LLMs is verifying accuracy. As these models get better at the craft of writing, it becomes harder to spot misinformation. Often, the output is something that sort of sounds right (tools like Grammarly and LanguageTool will be happy with it), but anything deeper than a surface-level read reveals its inaccuracy. This leads to a lot of "SEO fluff": content that draws hits and eyes in your direction but doesn't actually provide value to your end users.

What does it look like to avoid these issues?

At Contenda, we don't do prompt engineering, because our "prompts" come from the existing content that our users bring us. We're labeled as a generative AI company, yes, but our tech stack focuses on Content CI/CD: we test, monitor, and alert for inaccurate information, and we make it faster and easier for people to confidently publish high-quality content.
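To make the "Content CI/CD" idea concrete, here's a minimal sketch of what an automated accuracy gate could look like. Everything in it is hypothetical (the function names and the simple number/proper-noun overlap heuristic are our illustration, not Contenda's actual pipeline): the point is only that regenerated content gets tested against its source before it ships.

```python
# Hypothetical sketch of a "Content CI/CD" accuracy check.
# The heuristic (flag generated sentences whose numbers or capitalized terms
# never appear in the source) is illustrative only.
import re


def extract_facts(text: str) -> set[str]:
    """Pull out numbers and capitalized terms as rough 'fact' tokens."""
    numbers = re.findall(r"\b\d[\d,.]*\b", text)
    proper_nouns = re.findall(r"\b[A-Z][a-zA-Z]+\b", text)
    return set(numbers) | set(proper_nouns)


def flag_unsupported_sentences(source: str, generated: str) -> list[str]:
    """Return generated sentences containing 'facts' absent from the source."""
    source_facts = extract_facts(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", generated):
        if extract_facts(sentence) - source_facts:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    source = "Contenda was founded in 2021. It turns videos into blog posts."
    generated = "Contenda, founded in 2019, turns videos into blog posts."
    for sentence in flag_unsupported_sentences(source, generated):
        print("NEEDS REVIEW:", sentence)  # a CI job could fail the build here
```

In practice a check like this would sit alongside monitoring and alerting, so that anything flagged gets a human review before the content is published.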

When ChatGPT was released, our team tried it right away to see how it might improve the content we generate. Because of our approach of reusing and regenerating existing content, we didn't see enough improvement to change course. That was exciting for us, and it's why we feel strongly about this strategy as our path forward. Companies working in the generative AI space have to ask themselves: are they relying on the limitations of existing LLMs to succeed? How do you future-proof and scale your content's accuracy so that it keeps providing value to the people reading it?
