Best Practices

GPT-4 vs. GPT-3. OpenAI Models' Comparison and Potential when it comes to Translation

ChatGPT has changed perceptions around the globe regarding the capabilities of artificial intelligence. In broad terms, it went from being clumsy and robotic to being humanlike in an uncanny way.
Gabriel Fairman
3 min
Table of Contents

ChatGPT has changed perceptions around the globe regarding the capabilities of artificial intelligence. In broad terms, it went from being clumsy and robotic to being humanlike in an uncanny way.

How does this impact possibilities in the translation sphere?

Are humans no longer necessary?

Can I just process everything through ChatGPT and never spend another dime on my translations?

This article will explore the potential and limitations of translations offered by large language models.

Want to talk to ChatGPT?

Talking to a generative pre-trained transformer such as GPT3/4 or other similarly capable ai chatbots can be both illuminating and frightening. An answer to your prompt can result in surprise and user dismay. GPT 4 is capable of passing the BAR exam in the 90th percentile and also capable of huge blunders such as fabricating incorrect facts or accepting illogical arguments.

It's all contingent on parameters and training data, but in English GPT 3 and 4 clearly show the potential to shake how we go about our jobs and professional jobs.

But how does it perform in other languages?

How does it work with more context or less context?

Is GPT 3 significantly different from GPT 4?

Machine translation but without a machine translation system

The history of machine translation has clearly shown the limits of technology in our world. When the ideas were first introduced back in the 80s, some were incredibly bullish about not requiring a human anymore to generate accurate translations.

It was just a matter of evolving different statistical and rule-based models in order to create translations that sounded how a human would naturally speak.

But that was clearly not the case. Models improved, and access to knowledge improved, but language just proved to be too nuanced and riddled with exceptions so that engineers could explain it to a machine. Models were not based in a prompt-and-response dialogue but on a single output for input in English text or any other language encompassed by the engine.

Choosing a Model and Estimating a Translation Cost

While choosing a model may seem like the first logical step to translate with gpt 3 for example, it's quite possible that it should be the last step in your translation journey.

You can work for instance with plain text and simply insert it into a gpt 3 or work with openai's api in order to import and export your content for example but that will overlook some of the key challenges around translations.

The first obvious one, for example, is the loss of formatting. As you dig deeper into large language models and other artificial intelligence systems the importance of knowledge management also becomes more critical.

How will you manage past translations?

How will you fine-tune your engine so that it takes into account your terminology or your SEO?

AI chatbots however powerful they may be were not designed originally as translation engines. GPT 3 translation is amazing in so many contexts. It can often present its users with translations that seem human and can sound even better than the English text or any other original language but by itself, it will be nearly impossible to operate at scale or recurringly.

A translation management system that is deeply integrated to artificial intelligence is needed in order to ensure that you extract the key benefits presented by gpt 3 or 4.

Want to talk to, the latest AI by the creators of Google LaMDA?

Chatbot conversations such as powered by Google's LaMDA clearly show how it's possible to have a human-like conversation without any specific context. It's possible to talk to a character that you either build or preselect in a way that seems genuinely human at times.

It's important to talk about LaMDA because often people have a limited perspective on how many systems are out there. Here are a few examples of others:
BERT: Bidirectional Encoder Representations from Transformers, developed by Google AI

  • XLNet: Generalized Autoregressive Pretraining for Language Understanding, developed by Carnegie Mellon University and Google AI
  • XLM-RoBERTa: A Robustly Optimized BERT Pretraining Approach, developed by Facebook AI
  • Cohere: An LLM that can be fine-tuned for specific domains and tasks, developed by Cohere AI
  • GLM-130B: Generative LLM with 130 billion parameters, developed by Huawei
  • Meta LLaMA AI: Meta LLM Architecture with 1.6 trillion parameters, developed by Meta AI
  • Chinchilla: A large language model that can generate natural-sounding text in multiple languages, developed by LG AI Research
  • LaMDA: Language Model for Dialogue Applications, developed by Google Research
  • PaLM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation, developed by DeepMind

GPT 3 was the first model that managed to break the public perception that artificial intelligence was something just for engineers. Gpt 3 was the first language model that showed the world that science fiction isn't that much fiction after all.

Input types in GPT-4 and GPT-3

When it comes to translation, a user can input text in the chat gpt 3 or 4 interface (they ae the same user interface) and get back translated text when preceded by a translation prompt. A user is limited to plain text though.

Any formatting will be lost and the only way to preserve that is by working with a translation management system such as BWX that natively parses files so that information can be exchanged with GPT 3 and 4 without any loss of formatting or other meta-textual information.

Fine-tuning of the OpenAI models

The quality of output of a translation is based on the sophistication of the language model mainly based on parameters and size of training data. But it is also based on the quality of the prompt as well as the additional context provided by a person (human user).

The more guidance the language model receives, the better gpt 3 or 4 is able to power words that make sense to the reader.

If you describe in your prompt for example that you want the translation to sound informal and catchy, gpt 3 will "talk" back in a corresponding way and take the freedom to write its response in ways that resonate with your prompt.

That is why at Bureau Works our machine learning engineers squeeze the best out of zero-shot and few-shot learning.

Without getting too much in detail as to the mechanics of machine learning types and its ability to interpret messages and respond in a way that makes sense for its users, at Bureau Works we have focused on building responses from the ground up.

AI must be able to write messages that are contextualized within the users' ideas and purposes.

If the user for instance wants the translation to sound natural even if it deviates semantically from the original so-called English text, the response may seem strange but it may be perfect for how a person on the internet should react when reading the output generated by the ai.

BWX is natively capable of holding an interaction with gpt 3 or 4 as well as other ai models that will take into account:

  • past translations stored in your translation memory
  • terminology
  • domain
  • formatting/tagging

Through this fine-tuning, we are able to create an environment in which the system can process data and provide an answer that is more aligned with expectations within the context of that interaction.

A fine-tuned message to gpt 3 or 4 will generate words that describe similar concepts with a twist that makes all the difference. Take a look at the following example:

Original Summary:

Bureau Works is a cloud-based translation management system that transforms complex translation and localization management into simple and predictable activities.

Now with Apple Twist:

Bureau Works is a powerful translation management system that lets you handle any translation and localization project with ease and confidence. It’s cloud-based, so you can access it anytime, anywhere.

Now with Kurt Vonnegut Twist:

Bureau Works cares about translation and localization. We think you should too. It’s not easy to reach a global audience with your message. But we can help you do it better and faster than ever before.

Now with Gen Z Twist:

Bureau Works is lit. It’s a system that helps you translate and localize your stuff online. You can use it for anything: websites, apps, emails, documents, etc. You can start for free or choose a plan that fits your vibe.

It's through these responses that gpt 3 is able to talk in many different ways about the same thing. It's the nature of writing. The same concept can be expressed in countless different ways and each way has the ability to evoke different human emotions regardless of whether they were conceived by an ai or not.

GPT-4 vs. GPT-3 model's capabilities

GPT4 is a natural evolution of GPT 3. GPT 4 has about 570 times more parameters than GPT 3 - 175 billion vs. 100 trillion. This results in more consistent dialogue and fewer errors when comparing the different ai models. GPT 4 can also analyze visual input. Both models were not trained to translate but both of them can write incredibly naturally sounding texts word by word.

Defining the context of GPT-4 vs. GPT-3 conversation

Context refers to all the information that informs the text but is not the text itself. In a web of meaning, context, refers to all of threads that support the web as a whole whereas text is the central part of the web.

When it comes to AI, and meaning in general, context changes everything. If I tell the ai for instance that up is down and down as up as context for example and then tell the ai that I am on the seventh floor and want to go to the eigth floor, the ai will expmain to me that I must go down.

The AI will say this because I have given it context that guides how it approaches the text. This is an extreme example of context but in translation conversations with an ai model such as gpt 3, I can define the context as certain words that should be used or avoided, certain tones that should be followed such as formal or informal, or information about the overarching goals for the text that gpt 3 should take into account.

Setting Up the API

When it comes to working with gpt 3 or 4 through BWX, there is no need to worry about setting up the API. BWX has a native API with access to gpt 3 endpoints and a proprietary framework that allows information in this process to be exchanged with gpt 3 or 4 in the most effective way possible.

The beauty is that our API also leverages other ai models other than gpt 3 such as LaMDA and LLaMA. Our API uses a combination of machine learning models in order to maximize relevance of gpt 3 and 4 writing regardless of the prompt used.

Want to use GPT-3?

Using gpt 3 or 4 with BWX is very straightforward. Our web integration is enabled as a default setting in our web app. This article only explores the basic overview and capabiltiies of our gpt 3 web integration but you can find more information in case you are interested in our web support library at

You can sign up for a free trial and of course you can cancel at anytime before the end of your trial period and you will not be charged.

GPT-3 vs. GPT-4 – Key takeaways

As far as GPT 3 vs. GPT 4 applied to translations we have not been able to notice dramatic deviations in writing examples. Each model is capable of producing relevant conversations.

We have a 55 page report filled with examples by gpt 3 that can be downloaded here in case you are interested and we are working on a new report that will compare in detail how each model performs.

As soon as this research preview is available we will notify you through our newsletter in case you are interested. Each model regardless of how it is trained whether through few shot, zero shot, or other kinds of machine learning methods, is capable of producing feeds that can at times startle internet users due to their uncanny human nature.

According to OpenAI, GPT 4 performs dramatically better than GPT 3 in certain scenarios but as far as writing goes, the level is relatively similar. More to come when we publish our detailed report comparing each model.

Want to use Jurassic-1, the largest and most sophisticated language model ever released for general use by developers?

According to Bing, "Jurassic-1 is the name given to a couple of auto-regressive Natural Language Processing (NLP) models developed by Israel’s AI21 Labs1. Jurassic-1 models are highly versatile, capable of both human-like text generation, as well as solving complex tasks such as question answering, text classification and many others23.

Jurassic-1 models come in two sizes, where the Jumbo version, at 178 billion parameters, is the largest and most sophisticated language model ever released for general use by developers23."

GPT 4 is an ai model with nearly 3 times more parameters than Jurassic-1 JUMBO. Jurassic-1 used to be larger than gpt 3 but it was surpassed by gpt 4. Being larger than gpt 3 did not mean that it was able necessarily to produce better results.

It was the trained gpt 3 model afterall that made the world open its eyes to the true web and real world applications of an ai model. As far as integrations go, our current state and projections point to BWX integrating with multiple models in order to extract the best that each model delivers.

Cost of using GPT-4 vs. GPT-3

For information on Open AI API costs please visit:

As far as BWX goes there is no difference in working with GPT 3 vs. GPT 4. Our API naturally routes content through the most effective model from a cost-benefit perspective.

This article does not go into technical detail about how our integration with gpt 3 works but it can be found in the web in our support library here:

GPT-4 is currently more expensive than GPT-3 as this model uses far more computing power in order to produce its responses.

GPT-3.5 Turbo currently costs $0.002 / 1K tokens whereas the 32k GPT 4 prompt model costs $0.06 / 1K tokens$0.12 / 1K tokens, approximately 30 times more.

However the BWX plan for Translators includes the GPT integration as part of the base price that starts at $9 per user with a maximum processing of 200,000 words per user per monthly cycle.

Zero-Shot Translation

Based on our studies so far, zero-shot translations do not produce consistent and reliable results. As amazing as gpt 3 may be as a pre-trained language model, translating without additional context produces mixed results.

We find better results in either highly trained environments or those driven by few-shot learning.

With zero-shot translation gpt 3 or 4 are both likely to produce inconsistent results. As this article has shown, they are both trained by web content and will produce different outputs for the same input and prompt ocasionally by design.

Translation is all about leveraging and building on knowldge bases in order to promote brand consistency, seo, intelligbility and other requirements for user acceptance.

Want to build it yourself?

Building an integration with GPT 3 is relatively simple. The API is well documented and the endpoints perform well. There are challenges around the rate limits, text size, and other constrainsts around the API architecture. But most importantly, building out an integration with GPT 3 or 4 is the easy part.

With the proper context GPT 3 or 4 are both capable in theory of creating the integration themselves. The challenging part is that the integration can get you text in and text out but it will have clear limits around preservation of formatting, maintaining terminological consistency, referring to material translated in the past as reference, learning from user input and other scenarios that are material to producing high quality translations at scale. So can you do it yourself? Certainly.

Will it be effective and scalable? Doubtful. One of the key learnings so far from GPT 3 and 4 is that as powerful as they are, as human like they may sound, they require guidance, structure and the right elements in order to deliver consistent peak performance. That's why we recommend you try BWX for free for 14 days and see for yourself the magic you can extract from such a deeply woven integration with GPT in an environment that was designed from the ground up to produce high quality translations at scale with minimal human intervention.

Gabriel Fairman
Founder and CEO of Bureau Works, Gabriel Fairman is the father of three and a technologist at heart. Raised in a family that spoke three languages and having picked up another three over the course of his life, he has always been fascinated with the role language plays in identity and the creation of meaning. Gabriel loves to cook, play the guitar, tennis, soccer, and ski. As far as work goes, he enjoys being at the forefront of innovation and mobilizing people and teams together toward a mission. In recognition of his outstanding contributions, Gabriel was honored with the 2023 Innovator of the Year Award at LocWorld Silicon Valley.
Translate twice as fast impeccably
Get Started
Our online Events!

Try Bureau Works Free for 14 days

ChatGPT Integration
Get started now
The first 14 days are on us
Free basic support