GPT-3 Explained in Under 3 Minutes


So you’ve seen some incredible GPT-3 demos on Twitter (if you haven’t, where have you been?). OpenAI’s massive machine learning model is capable of writing its own op-eds, poetry, essays, and even working code:

This is mind blowing.

With GPT-3, I built a layout generator where you just describe any layout you want, and it generates the JSX code for you.


- Sharif Shameem (@sharifshameem) July 13, 2020

Here’s #gpt3 writing some SQL for me.

- Ayush Patel (@ayushpatel34) July 19, 2020

=GPT3()… the spreadsheet function to rule them all.

Impressed with how well it pattern matches from a few examples.

The same function looked up state populations, peoples’ twitter usernames and employers, and did some math.

- 🍉 Paul Katsen (@pavtalk) July 21, 2020

To use GPT-3 right now, you first have to apply to be whitelisted by OpenAI. But the applications of the model seem endless: you could ostensibly use it to query a SQL database in plain English, automatically comment code, automatically generate code, write catchy article headlines, write viral Tweets, and a whole lot more.

But what exactly is going on under the hood of this remarkable model? Here’s a (short) peek inside.

GPT-3 is a language model powered by neural networks. A language model is a model that predicts how likely a sentence is to appear in the real world. For example, a language model can rate the sentence “I take my dog for a walk” as more likely to exist (i.e. on the Internet) than the sentence “I take my banana for a stroll.” This holds for sentences and phrases alike and, more generally, for any sequence of characters.
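To make "how likely a sentence is" concrete, here is a minimal sketch of a toy language model. It is nothing like GPT-3 internally: it only counts adjacent word pairs (a bigram model with add-one smoothing), and the four-sentence corpus is made up for illustration. It still captures the basic idea of assigning a higher score to the dog sentence than to the banana one.

```python
import math
from collections import Counter

# Tiny made-up corpus standing in for "the Internet" (an assumption for illustration).
CORPUS = [
    "i take my dog for a walk",
    "i take my dog for a stroll",
    "i walk my dog every morning",
    "i eat a banana for breakfast",
]

def train_bigram_counts(sentences):
    """Count context words and adjacent word pairs, with start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    return unigrams, bigrams

def log_probability(sentence, unigrams, bigrams, vocab_size):
    """Sum of add-one-smoothed log P(word | previous word) over the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

UNIGRAMS, BIGRAMS = train_bigram_counts(CORPUS)
VOCAB_SIZE = len({w for s in CORPUS for w in s.split()} | {"</s>"})

likely = log_probability("i take my dog for a walk", UNIGRAMS, BIGRAMS, VOCAB_SIZE)
unlikely = log_probability("i take my banana for a stroll", UNIGRAMS, BIGRAMS, VOCAB_SIZE)
print(likely > unlikely)  # the dog sentence gets the higher score
```

The scores are log-probabilities, so both are negative; what matters is which one is closer to zero.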

Like other language models, GPT-3 is trained on an unlabeled text dataset (in this case, the training data includes Common Crawl and Wikipedia, among others). The text itself supplies the answers: GPT-3 repeatedly has to predict the next word using only the words that come before it. (Models such as BERT use a related exercise, in which words are removed at random from the text and the model fills in the gaps from the surrounding context.) It’s a simple training task that yields a powerful and generalizable model.
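As a rough sketch of how unlabeled text turns into training examples, the snippet below splits a raw sentence into (context, target) pairs for next-word prediction, the objective GPT-style models are trained on. The function name is made up for illustration, and splitting on whitespace is a simplification: real models work on sub-word tokens rather than whole words.

```python
def next_word_examples(text):
    """Turn raw text into (context, target) pairs; the 'label' is simply the next word."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in next_word_examples("I take my dog for a walk"):
    print(" ".join(context), "->", target)
```

No human labelling is involved: every sentence on the Internet already contains its own answers.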

The GPT-3 model itself is a transformer-based neural network. This architecture gained popularity roughly 2–3 years ago and is the basis for the prominent NLP model BERT and for GPT-3’s predecessor, GPT-2. In terms of architecture, GPT-3 isn’t really innovative! So, what makes it so special?

It’s really big. I mean really big. With 175 billion parameters, it was the largest language model ever built at the time of its release (an order of magnitude larger than its nearest competitor!), and it was trained on the largest dataset of any language model. This appears to be the primary reason GPT-3 sounds so intelligent and human.
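For a sense of scale, here is a back-of-the-envelope calculation of how much space the weights alone would take. The 175 billion figure is from above; storing each parameter as a 16-bit float (2 bytes) is an assumption made for this sketch, not a statement about how OpenAI stores the model.

```python
def weights_size_gb(parameters, bytes_per_parameter):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return parameters * bytes_per_parameter / 1e9

GPT3_PARAMETERS = 175_000_000_000
BYTES_PER_PARAMETER = 2  # assumption: 16-bit floats

print(f"{weights_size_gb(GPT3_PARAMETERS, BYTES_PER_PARAMETER):.0f} GB")  # prints "350 GB"
```

That is far more than fits in the memory of a single typical GPU, which helps explain why the model is offered through an API rather than as a download.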

But now comes the truly wonderful part. Thanks to its enormous size, GPT-3 can do what no other model can do (well): perform specific tasks without any special tuning. You can ask GPT-3 to be a translator, a programmer, a poet, or a famous novelist, and it can do it with fewer than 10 training examples provided by the user (you). Damn.

This is why machine learning practitioners are so excited about GPT-3. Other language models, such as BERT, require an elaborate fine-tuning step in which you gather, say, thousands of examples of French-English sentence pairs to teach the model how to translate. To adapt BERT to a given task (such as translation, summarization, or spam detection), you first have to find a large training dataset (on the order of thousands or tens of thousands of examples), which can be difficult or impossible depending on the task. With GPT-3, you don’t need to do that fine-tuning step. This is the crux of the matter. People are thrilled about GPT-3 because it lets them tackle new language tasks with little or no task-specific training data.
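To show what "a handful of examples instead of fine-tuning" can look like in practice, here is a minimal sketch that assembles a few-shot translation prompt. The function name and the "English:/French:" layout are illustrative assumptions, not part of OpenAI's API; the idea is only that the examples live in the text you send, and the model is asked to continue it.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a plain-text prompt: an instruction, a few worked examples, then the query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # left open for the model to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("good morning", "bonjour"), ("thank you", "merci")],
    "I take my dog for a walk",
)
print(prompt)
```

Switching tasks means swapping the instruction and the examples, not retraining the model.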

Today, GPT-3 is in private beta. You can get an API key by applying here:

Originally published on June 6, 2021.

Data Science Enthusiast