The Evolution of Language Models: From T9 to GPT-3+

Bala Madhusoodhanan - Oct 16 '23 - Dev Community

Intro:
Organizations around the globe are now testing a novel technology, the transformer model, in the expectation that it could transform sectors such as media, finance, legal services, professional industries, and public services like education. This blog gives a brief history, looks at what happens under the hood, and covers the challenges and controversies.

History:
A language model estimates how likely a given piece of text is to appear, based on probabilities learned from a collection of text, often referred to as a corpus.
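
To make the idea concrete, here is a minimal sketch of a bigram model, one of the simplest probability-driven language models: it counts which word follows which in a corpus and turns those counts into probabilities. The three-sentence corpus and the function names (`train_bigram_model`, `next_word_probability`, `predict_next`) are made up for illustration; this is a toy, not how any particular predictive-text product is implemented.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count how often each word follows another across a list of sentences."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts


def next_word_probability(model, prev, nxt):
    """Estimate P(nxt | prev) as a relative frequency; 0.0 if prev was never seen."""
    candidates = model.get(prev)
    if not candidates:
        return 0.0
    return candidates[nxt] / sum(candidates.values())


def predict_next(model, prev):
    """Return the most frequent follower of prev, or None if prev was never seen."""
    candidates = model.get(prev)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy corpus, only here to have something to count.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
print(round(next_word_probability(model, "the", "cat"), 3))
```

The model only knows what it has counted: a word it never saw gets probability 0.0, which is one reason larger corpora and richer models took over.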

What happens under the hood:

Transformers are like super readers. They look at a whole bunch of words at the same time, not just one word at a time. This helps them see the big picture and find patterns across a sentence, which makes them better at translating or generating text. Because the words are processed together rather than one after another, the work can be done in parallel, so these models can be trained and run faster.
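
The "look at all the words at once" step is usually called self-attention. Below is a minimal sketch of scaled dot-product self-attention in plain Python. It is heavily simplified: a real transformer learns separate query, key, and value projections and stacks many attention heads and layers, whereas this sketch uses the input vectors directly for all three roles. The 2-dimensional word vectors are hand-picked, hypothetical values, not real embeddings.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs.

    Assumption: no learned projection matrices, to keep the sketch short.
    Returns (outputs, weights), where weights[i][j] is how much word i
    attends to word j.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Compare this word against every word in the sentence at once.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new representation is a weighted mix of all word vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy vectors for a three-word sentence.
words = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

outputs, weights = self_attention(vectors)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Each row of weights sums to 1, and every word's output depends on every other word in the sentence, which is the "big picture" behaviour described above.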

I would recommend this article for a more detailed explanation: How

Challenges and Controversies:
"Form vs. Meaning" refers to the idea that language models, including neural-network models such as transformers, are excellent at processing the form or structure of language but do not truly "understand" it the way humans do. They generate text from statistical probabilities and patterns, without grasping concepts, emotions, or nuances as a human would.

  • Environmental Costs: All these models need compute, which has a direct impact on CO2 emissions. If the problem can be solved with classic NLP, prefer that approach.

  • Financial Cost: GPU compute has a direct monetary cost. Encourage the team to report training time and sensitivity to hyperparameters.

  • Risks from the dataset behind the foundation model: The model inherits bias from the quality of its dataset (Wikipedia: only 10 to 16% of the represented facts are associated with women; Reddit: the user base is male-dominated, ~70%; age skew: more data from Gen Z than from Millennials, and more from Millennials than from Gen X).

  • Value alignment: It is difficult to align the model's behavior with a given set of values.

  • Denigration, stereotype threat, hate speech: harms to reader, harms to bystanders. Synthetic text can enter conversations without anyone being accountable for it.
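
The environmental and financial cost points above can be turned into a rough estimate before any training starts. Here is a back-of-the-envelope sketch: cost scales with GPU-hours, and emissions scale with the energy drawn times the carbon intensity of the local grid. The function name and every input value in the example call are hypothetical placeholders; plug in your own cloud price, hardware power draw, data-center overhead (PUE), and grid figures.

```python
def estimate_training_footprint(
    gpu_count,
    hours,
    usd_per_gpu_hour,
    watts_per_gpu,
    pue,
    kg_co2_per_kwh,
):
    """Rough cost and CO2 estimate for one training run.

    pue is the data-center power usage effectiveness (overhead multiplier),
    and kg_co2_per_kwh is the carbon intensity of the electricity used.
    """
    gpu_hours = gpu_count * hours
    cost_usd = gpu_hours * usd_per_gpu_hour
    energy_kwh = gpu_hours * watts_per_gpu / 1000 * pue
    co2_kg = energy_kwh * kg_co2_per_kwh
    return {
        "gpu_hours": gpu_hours,
        "cost_usd": cost_usd,
        "energy_kwh": energy_kwh,
        "co2_kg": co2_kg,
    }


# Placeholder inputs for illustration only, not measurements.
report = estimate_training_footprint(
    gpu_count=8,
    hours=24,
    usd_per_gpu_hour=2.0,
    watts_per_gpu=300,
    pue=1.5,
    kg_co2_per_kwh=0.4,
)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Reporting numbers like these next to training time and hyperparameter sensitivity makes it easier to compare a large model against a classic NLP baseline.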

Further Reading:
Bender Parrot Research
Guide for LLM
