Computer Assisted Translation: Meaning and Benefits
A good computer-assisted translation (CAT) tool works as the translator’s best friend and assistant. It will typically:
- Provide translators with an environment where it’s easy to see the source text and the translation, typically side by side
- Remind translators of previously translated content
- Allow translators to search through previously translated content
- Remind translators of key terminology
- Remind translators of missing formatting and other components
- Preview how the translations will render
- Record newly translated sentences in the translation memory
- Record new key brand terms in the glossary
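To make the “remind”, “search”, and “record” features above concrete, here is a minimal sketch of a translation memory in Python. It is a toy, not how any particular CAT tool works: the `TranslationMemory` class and its methods are hypothetical, and the character-level similarity from `difflib.SequenceMatcher` stands in for the match scoring (usually word- and tag-aware) that real tools implement in their own ways.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores source/target pairs, returns fuzzy matches.

    The character-level SequenceMatcher ratio is only a stand-in for the
    scoring a real CAT tool would use.
    """

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}

    def record(self, source: str, target: str) -> None:
        # Store a newly confirmed translation so it can be reused later.
        self.entries[source] = target

    def lookup(self, source: str, min_score: int = 50) -> list[tuple[int, str, str]]:
        # Score every stored source against the new sentence (0-100) and
        # return (score, stored_source, target) at or above min_score, best first.
        results = []
        for stored_source, target in self.entries.items():
            score = round(SequenceMatcher(None, source, stored_source).ratio() * 100)
            if score >= min_score:
                results.append((score, stored_source, target))
        return sorted(results, reverse=True)


tm = TranslationMemory()
tm.record("Click Save to keep your changes.", "Haga clic en Guardar para conservar los cambios.")
print(tm.lookup("Click Save to keep your changes."))  # exact (100%) match
print(tm.lookup("Click Save to keep all changes."))   # fuzzy match below 100%
```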
CAT tools, when properly configured, can more than double translation productivity while also improving translation quality. The key is how the CAT tool gets set up: file parsing, segmentation, translation memories, glossaries, machine translation, and dozens of other factors all need to be examined.
KEY TAKEAWAY: The CAT tool is like a saw. Give the saw to someone who doesn’t know woodworking and you won’t see beautiful furniture; it’s more likely they’ll get hurt.
With that in mind, let’s analyze some of the key issues around CAT tools.
Issue 1: Poor Parsing/Segmentation
When you feed a file into a CAT tool, the tool processes it, chewing through the text and stripping away all the encoding so that the translator has something “clean” to work with. Whether the file is XML, XLIFF, DOCX, or YAML, the overall process is the same. The challenge is that some files are written in ways that produce messy output, which can ultimately become impossible for translators to handle. Formatting can turn into a mass of tags that require careful handling, variables and code may show up as translatable text, and line breaks can be misread as sentence breaks, leaving translators with fragmented segments that are untenable to work with. This happens more often than people in localization realize, and it leads to the first myth-buster: the CAT tool won’t fix everything. In fact, despite its potential for far greater productivity, it opens the hood and can introduce even more complex problems into your localization workflow. Without proper localization engineering, the CAT tool can aggravate parsing and segmentation issues that would otherwise be negligible outside the CAT tool environment.
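The line-break problem described above is easy to reproduce. The sketch below, with hypothetical function names and example text, contrasts a naive segmenter that treats every line break as a segment boundary with one that first collapses layout line breaks and then splits on sentence-ending punctuation. CAT tools apply configurable segmentation rules (often expressed in the SRX standard) with exceptions for abbreviations and similar cases; the regular expression here is only a rough stand-in.

```python
import re

# Hypothetical source text from a file where hard line breaks were used for
# layout, not to end sentences.
RAW = "Save your work before\nclosing the editor. Unsaved changes\nwill be lost."


def naive_segments(text: str) -> list[str]:
    # Treats every line break as a segment boundary: the failure mode above.
    return [line.strip() for line in text.splitlines() if line.strip()]


def sentence_segments(text: str) -> list[str]:
    # Collapse layout line breaks into spaces, then split after ., ! or ?
    # followed by whitespace. No abbreviation handling: a rough stand-in only.
    flat = re.sub(r"\s*\n\s*", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


print(naive_segments(RAW))
# ['Save your work before', 'closing the editor. Unsaved changes', 'will be lost.']
print(sentence_segments(RAW))
# ['Save your work before closing the editor.', 'Unsaved changes will be lost.']
```

Collapsing line breaks is only safe once someone has confirmed they are purely layout; in a file where each line is a separate UI string, the naive behavior would be the correct one. Making that call per file format is the localization engineering work the CAT tool cannot do on its own.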
Issue 2: Translation Memory Setup
How clearly you set up your knowledge base will be a determining factor in whether your CAT tool experience succeeds. When it comes to translation memory (TM), in my opinion, less is more. I often see clients and translators trying to squeeze the most out of translation memories by stacking several of them together to maximize the amount of content leveraged during the translation process. The problem is that users are often unsure of the quality of a given translation memory. Sometimes they know the quality is iffy, so they apply a penalty to that translation memory. A penalty downgrades every match from that TM by a set amount, so that segments that would otherwise be 100% matches get reviewed and fuzzy matches drop into a lower range. While this is fine in principle, in practice it creates an error-prone process for translators.
Translation memories are meant to establish the true north of the linguistic corpus. If the translator would naturally choose certain words but the TM uses different wording to describe the same concept, the TM should always prevail. Working with shaky TMs introduces doubt and confusion into the translation process. Yes, in theory, a larger translation memory lets you leverage more content and save more time and money, but we have seen time and time again that translation memories are an all-or-nothing deal: either they are crystal-clear benchmarks or they detract from the overall quality of the translation process.
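The sketch below illustrates why penalties make stacked TMs error-prone. It assumes a simple model in which a penalty subtracts a fixed number of points from every match a TM returns; the `TMHit` class, the `apply_penalties` helper, the TM names, and the 10-point penalty are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TMHit:
    tm_name: str
    score: int  # raw match score, 0-100
    target: str


def apply_penalties(hits: list[TMHit], penalties: dict[str, int]) -> list[TMHit]:
    """Subtract each TM's penalty (in points) from its hits and re-rank them.

    Hypothetical helper: the 'subtract points' model is an illustrative
    assumption, not a specific tool's behavior.
    """
    adjusted = [
        TMHit(h.tm_name, max(0, h.score - penalties.get(h.tm_name, 0)), h.target)
        for h in hits
    ]
    return sorted(adjusted, key=lambda h: h.score, reverse=True)


hits = [
    TMHit("legacy_vendor_tm", 100, "Guardar cambios"),
    TMHit("approved_tm", 92, "Guarde los cambios"),
]
ranked = apply_penalties(hits, {"legacy_vendor_tm": 10})
for h in ranked:
    print(h.tm_name, h.score, h.target)
# approved_tm 92 Guarde los cambios
# legacy_vendor_tm 90 Guardar cambios
```

Once the penalty is applied, the former 100% match from the legacy TM lands at 90% and sits just below a 92% match from the approved TM that uses different wording. The translator now faces two near-equal suggestions and no clear true north, which is exactly the doubt a shaky TM introduces.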
Issue 3: Multiple Linguists Working Together
Many CAT tools do not focus on collaboration between translators working on the same set of files at the same time. When translators work in a local environment on exported localization kits, they are working in the dark when it comes to the linguistic choices made by their peers. This can lead to inconsistencies and poor knowledge sharing, which ultimately overburdens the review process with the task of standardizing translations. Review maximizes quality when it is about re-reading, flagging, and fixing mistakes. As its scope expands into rewriting, the odds of introducing new mistakes, rather than catching existing ones, increase. CAT tools that share translation memory feeds in real time with translators in different locations contribute greatly to sound knowledge management and to overall quality at scale.
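A toy simulation shows the difference between exported kits and a shared TM. It is a deliberate simplification: two hypothetical translators, Ana and Ben, handle the same sentence one after the other rather than concurrently, and each has a personal wording preference (formal “Su” versus informal “Tu” in Spanish). With private TMs, each translator sees only their own work; with a shared TM, Ben sees Ana’s confirmed translation as an exact match.

```python
SOURCE = "Your subscription has expired."

# Each hypothetical translator's natural wording when no earlier translation is visible.
OWN_CHOICE = {
    "ana": "Su suscripción ha caducado.",
    "ben": "Tu suscripción ha vencido.",
}


def translate_same_sentence(shared: bool) -> dict[str, str]:
    """Return each translator's final target for SOURCE.

    shared=False models exported localization kits (one private TM each);
    shared=True models a TM that is updated as soon as a segment is confirmed.
    """
    shared_tm: dict[str, str] = {}
    local_tms: dict[str, dict[str, str]] = {name: {} for name in OWN_CHOICE}
    results = {}
    for name in ("ana", "ben"):  # Ana confirms her segment first
        tm = shared_tm if shared else local_tms[name]
        # Reuse an exact match if this translator can see one; otherwise
        # fall back on personal wording.
        target = tm.get(SOURCE, OWN_CHOICE[name])
        tm[SOURCE] = target  # confirm the segment into whichever TM is visible
        results[name] = target
    return results


print(len(set(translate_same_sentence(shared=False).values())))  # 2 variants for review to reconcile
print(len(set(translate_same_sentence(shared=True).values())))   # 1 consistent variant
```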
Issue 4: Glossaries
With terminology, you run into issues analogous to those with translation memories, and with knowledge management in general: less is more. We have observed use cases with huge glossaries reaching into the tens of thousands of terms, as well as use cases that keep glossaries down to a few hundred terms, and we can say with certainty that smaller glossaries contribute to greater overall translation program quality.
For knowledge management to work properly, it needs to be verifiable. If you have a huge terminology list, a terminology check will return hundreds of false-positive alerts. For example, a term may be listed in its singular form but translated in the plural because of the context. The more terms, the more noise, and the noisier the process becomes, the harder it is to verify the things that actually need verifying. That’s why it’s important to trim glossaries down to what matters: brand-specific concepts, product names that should not be translated, niche-specific concepts that require a certain level of standardization, and SEO-sensitive terminology. Nice-to-have terms should not be part of a glossary. The glossary should contain only must-have terms so that it can effectively guide and manage the overall translation process.
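The singular/plural false positive is easy to reproduce with a naive exact-match terminology check. The glossary entry, the example sentences, and the `naive_term_check` function below are hypothetical; real QA checks may handle some inflection, but any check that compares exact target strings will produce this kind of noise.

```python
import re

# Hypothetical glossary entry: English term -> required Spanish term.
GLOSSARY = {"workspace": "espacio de trabajo"}


def naive_term_check(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Flag glossary terms found in the source whose exact target form is missing.

    The source match is prefix-based, so 'workspaces' still counts as the term,
    but the target check is exact and knows nothing about inflection.
    """
    alerts = []
    for src_term, tgt_term in glossary.items():
        in_source = re.search(rf"\b{re.escape(src_term)}", source, re.IGNORECASE)
        if in_source and tgt_term not in target.lower():
            alerts.append(f"'{src_term}' should be translated as '{tgt_term}'")
    return alerts


# Singular in, singular out: no alert.
print(naive_term_check("Open your workspace.", "Abra su espacio de trabajo.", GLOSSARY))
# Plural in, correct plural out: flagged anyway (a false positive).
print(naive_term_check("Create two workspaces for your team.",
                       "Cree dos espacios de trabajo para su equipo.", GLOSSARY))
```

One such alert is easy to dismiss; multiplied across a glossary of tens of thousands of terms, alerts like it bury the genuine errors, which is the argument for keeping the glossary small.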
Issue 5: Machine Translation
Some people think of machine translation (MT) as the holy grail of translation, while others consider it the root of all evil when it comes to translation quality. Our data shows that most top machine translation engines, such as Google, Microsoft, Amazon, and DeepL, produce more reliable translation feeds than a 50-74% match coming from a translation memory. This reliability tends to keep increasing over time, with machine translation feeds requiring fewer edits to match the quality of professional human translation. Machine translation is key to CAT tool productivity: translators working with a hybrid knowledge management feed that includes both translation memory and machine translation can see productivity gains of over 30% compared to translators using translation memory alone.
There is also the argument that machine translation is great for technical texts but terrible for marketing or copywriting. While that is true, we argue that even when the machine translation grotesquely mistranslates a sentence or concept, it still benefits overall translation quality, because it opens a dialogue with the translator, who can use the machine translation as a sounding board for ideas or, at worst, as a source of a few good laughs.
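One simple way to combine the two feeds is to decide which suggestion leads based on the TM match score. The sketch below assumes a single cut-off: TM matches at 75% or above are shown first, and anything lower defers to the MT suggestion, in line with the observation above that top MT engines beat 50-74% TM matches. The `choose_suggestion` function and the threshold are illustrative assumptions, and the MT text is passed in as a plain string because calling any specific engine’s API is out of scope here.

```python
from typing import Optional

# Assumed cut-off: TM matches below this score are treated as less useful
# than an MT suggestion. Tune it against your own data.
MT_PREFERRED_BELOW = 75


def choose_suggestion(
    tm_score: Optional[int], tm_target: Optional[str], mt_target: str
) -> tuple[str, str]:
    """Return (origin, text) for the suggestion to show the translator first."""
    if tm_score is not None and tm_target is not None and tm_score >= MT_PREFERRED_BELOW:
        return ("TM", tm_target)
    # No TM match, or only a low fuzzy match: lead with machine translation.
    return ("MT", mt_target)


print(choose_suggestion(100, "Guardar cambios", "Guarde los cambios"))  # ('TM', 'Guardar cambios')
print(choose_suggestion(62, "Guardar archivo", "Guarde los cambios"))   # ('MT', 'Guarde los cambios')
print(choose_suggestion(None, None, "Guarde los cambios"))              # ('MT', 'Guarde los cambios')
```

The translator can still be shown both suggestions; the ranking only decides which one leads, so that a low fuzzy match does not automatically outrank a usable MT suggestion.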
Even the most amazing CAT tool will not solve problems magically by itself, just as a saw and a hammer will not build a beautiful chair by themselves. The people using the tools, and their knowledge of how to handle them, play a critical role in each tool’s overall success. While a great tool opens the path to greater productivity, savings, and overall quality, expecting those results without properly mastering the tool will likely lead you into troubled waters quickly.