Translation Impact: GPT-4 vs. GPT-3.5

OpenAI has released GPT-4, an upgraded generative AI. In 24 of the 26 languages tested, including low-resource languages, it outperforms the English-language performance of GPT-3.5 and other models. Stay tuned for more information on GPT-4's performance in other languages.
Gabriel Fairman
3 min

OpenAI just released GPT-4, an upgraded version of its state-of-the-art generative AI.

According to ChatGPT itself, the difference between the models is:

“I am an advanced AI language model with a larger dataset, more parameters, and better fine-tuning, which enables me to deliver more accurate and relevant responses. Despite these improvements over GPT-3, I still have limitations and may not provide perfect answers.”

In plain language, it’s bigger, better and meaner. What does this mean for translation and localization? According to OpenAI's research:

“Many existing ML benchmarks are written in English. To get an initial sense of capability in other languages, we translated the MMLU benchmark—a suite of 14,000 multiple-choice problems spanning 57 subjects—into a variety of languages using Azure Translate (see Appendix). In the 24 of 26 languages tested, GPT-4 outperforms the English-language performance of GPT-3.5 and other LLMs (Chinchilla, PaLM), including for low-resource languages such as Latvian, Welsh, and Swahili:”

In simple terms: imagine two students taking the same big test, with 14,000 multiple-choice questions in 57 subjects like math, science, and history. The first student, GPT-3.5, takes the test in English.

To make it harder, the second student, GPT-4, takes the test in 26 different languages, with the questions translated from English by a machine translation tool.

Surprisingly, in 24 of those 26 languages the second student scores higher than the first student did in English. That includes languages with relatively little training data available, like Latvian, Welsh, and Swahili.

That's what happened when OpenAI tested GPT-4. It is good at understanding and answering questions even when they are not in English: in most of the languages tested, it did better than GPT-3.5, Chinchilla, and PaLM did in English.

Let’s compare a few real-world examples in Brazilian Portuguese.

| English | Brazilian Portuguese (GPT-3) | Brazilian Portuguese (GPT-4) |
| --- | --- | --- |
| The cat's out of the bag. | O gato está fora da bolsa | O segredo foi revelado. |
| Let's call it a day. | Vamos encerrar por hoje. | Vamos encerrar por hoje. |
| I'm head over heels for him. | Eu estou apaixonada por ele | Estou perdidamente apaixonada por ele. |
| I get the hang of it. | Eu entendi | Estou pegando o jeito. |
| Hit me up. | Me avise ou "Me ligue" | Me chama. |
| It really took the wind out of my sails. | Realmente me desanimou ou "Realmente me deixou sem fôlego" | Isso realmente tirou meu ânimo. |
| They are two peas in a pod. | Eles são dois pombinhos | Eles são como duas ervilhas em uma vagem. |
| It takes two to tango. | É preciso de dois para dançar o tango ou "É preciso de duas pessoas para dançar o tango" | Um é pouco, dois é bom. |
| It's a piece of cake. | É moleza | É moleza. |
| That costs an arm and a leg. | Isso custa uma fortuna | Isso custa os olhos da cara. |

A quick analysis of this limited sample of 10 idiomatic sentences in US English reveals no major differences between the output of GPT-3 and GPT-4. This demonstrates that:

GPT-3 was already quite impressive to begin with.

One must pay close attention to details in order to discern the differences between GPT-3 and GPT-4.

When examining the subtle differences between the two, GPT-4 was in one case significantly better than GPT-3 but also notably worse in another instance. In most cases, it was either identical or slightly better.
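For readers who want to repeat this kind of side-by-side check on their own samples, here is a minimal Python sketch. It only tallies which rows of the table above are textually identical between the two models; it does not judge translation quality. The helper names (`normalize`, `split_rows`) and the choice to ignore case and trailing punctuation are assumptions made for illustration, not part of any established method.

```python
# Rows copied from the comparison table: (English, GPT-3 output, GPT-4 output).
ROWS = [
    ("The cat's out of the bag.", "O gato está fora da bolsa", "O segredo foi revelado."),
    ("Let's call it a day.", "Vamos encerrar por hoje.", "Vamos encerrar por hoje."),
    ("I'm head over heels for him.", "Eu estou apaixonada por ele",
     "Estou perdidamente apaixonada por ele."),
    ("I get the hang of it.", "Eu entendi", "Estou pegando o jeito."),
    ("Hit me up.", 'Me avise ou "Me ligue"', "Me chama."),
    ("It really took the wind out of my sails.",
     'Realmente me desanimou ou "Realmente me deixou sem fôlego"',
     "Isso realmente tirou meu ânimo."),
    ("They are two peas in a pod.", "Eles são dois pombinhos",
     "Eles são como duas ervilhas em uma vagem."),
    ("It takes two to tango.",
     'É preciso de dois para dançar o tango ou "É preciso de duas pessoas para dançar o tango"',
     "Um é pouco, dois é bom."),
    ("It's a piece of cake.", "É moleza", "É moleza."),
    ("That costs an arm and a leg.", "Isso custa uma fortuna", "Isso custa os olhos da cara."),
]


def normalize(text: str) -> str:
    """Ignore case and trailing punctuation (an assumption of this sketch)."""
    return text.strip().rstrip(".!?").lower()


def split_rows(rows):
    """Return the English sentences whose two outputs match, and those that differ."""
    identical, different = [], []
    for english, gpt3, gpt4 in rows:
        bucket = identical if normalize(gpt3) == normalize(gpt4) else different
        bucket.append(english)
    return identical, different


identical, different = split_rows(ROWS)
print(f"{len(identical)} identical, {len(different)} different")
```

On this sample the script reports 2 identical rows ("Let's call it a day." and "It's a piece of cake.") and 8 that differ. The rows that differ still need a human reviewer to decide whether the GPT-4 version is better, worse, or equivalent.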

The concepts that applied to GPT-3 hold even truer for GPT-4. With more parameters and data, prompts become increasingly important. The slightest variation in the wording of a prompt can yield substantially different results. As the AI model becomes more refined, greater precision is required from the user.
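To make the point about prompt variation concrete, the sketch below builds two prompts for the same sentence that differ only in one instruction. The template, the `build_prompt` helper, and both instruction strings are hypothetical and chosen for illustration; the sketch does not call any model and is not the prompt used to produce the table above.

```python
def build_prompt(sentence: str, target_language: str, instruction: str) -> str:
    """Assemble a translation prompt. The template wording is illustrative only."""
    return (
        f"{instruction}\n"
        f"Target language: {target_language}\n"
        f"Text: {sentence}"
    )


SENTENCE = "It takes two to tango."
TARGET = "Brazilian Portuguese"

# Two hypothetical instructions that differ only slightly in wording.
plain_prompt = build_prompt(SENTENCE, TARGET, "Translate the text.")
idiom_aware_prompt = build_prompt(
    SENTENCE,
    TARGET,
    "Translate the text, replacing idioms with the closest natural equivalent.",
)

for name, prompt in [("plain", plain_prompt), ("idiom-aware", idiom_aware_prompt)]:
    print(f"--- {name} ---")
    print(prompt)
    print()
```

Sending both variants to the same model and comparing the outputs side by side is one simple way to see how much a small change in wording affects the translation.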

Regarding other languages mentioned in OpenAI's research, we will soon conduct our own investigation into their performance and return with findings about translation quality and opportunities in the context of GPT-3 vs. GPT-4.
