If you are considering switching TMSs (Translation Management Systems) and someone told you that “it’s easy! Just download your TMX and TBX files and upload them onto your new TMS!” – chances are that you are in for a ride. Switching TMSs, although ostensibly simple, opens up a Pandora’s box of pain. But in this article we will focus on my favorite source of pain and terror when it comes to TMS migration stress: translation memory leveraging and adoption.

The biggest issue resides in Translation Memory Leveraging. This is industry lingo for how the sentences stored in your knowledge base compare to newly processed content. Translation memories generate tremendous savings and time efficiencies. That’s one of the reasons why clients are willing to invest bullishly in enterprise-grade TMSs.

If you translated “The cat is on the couch.” and now have new content that also has “The cat is on the couch.” your TMS will recognize that this sentence is a perfect match and maybe pre-confirm or even lock that content so that you don’t have to touch it ever again.

The challenge is that TMSs store a lot more information than just the words you see.

Even a simple DOCX file has its own encoding and embeds code for formatting, styles, and other matters that are unobservable to the naked eye.


So the sentence “The cat is on the couch.” could be stored as “{1}The cat{1} is on the {2}couch{2}.” If that’s the case and you reprocess “The cat is on the couch.”, you will not get a perfect match but maybe a 95 to 99% match. And while that may seem acceptable, if you have a large content base that operates across multiple languages, this change in leveraging behavior could mean hundreds of thousands of segments requiring re-processing in dozens of languages – a negative outcome for whoever picks up the tab.
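To make that intuition concrete, here is a toy sketch in Python. Real TMSs use proprietary fuzzy-match algorithms, so `SequenceMatcher` merely stands in for the general idea: embedded tags drag an otherwise identical sentence below 100%.

```python
from difflib import SequenceMatcher

def match_percent(tm_segment: str, new_segment: str) -> float:
    """Rough character-based similarity, scaled to a percentage.
    Real TMSs use proprietary fuzzy-match scoring; this is only a toy model."""
    return round(SequenceMatcher(None, tm_segment, new_segment).ratio() * 100, 1)

stored = "{1}The cat{1} is on the {2}couch{2}."  # segment as saved by the legacy TMS
incoming = "The cat is on the couch."            # same sentence, freshly parsed

print(match_percent(incoming, incoming))  # 100.0 – a perfect match
print(match_percent(stored, incoming))    # the tags drag the score well below 100
```

The exact percentage doesn’t matter; the point is that content identical to the human eye no longer leverages at 100% once the stored form carries invisible markup.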

So what should you do in light of this? Before you migrate, ensure that you have extensively tested translation memory leveraging over a wide range of your content repositories and file types to establish a baseline expectation for leveraging. This will allow you to address any potential issues before you have fully committed to your migration and can no longer turn back.
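As a sketch of what that baseline test can produce, the snippet below compares two hypothetical leverage reports (match band → word count) exported from the legacy and the candidate TMS. The band names and numbers are invented for illustration.

```python
# Hypothetical leverage reports: match band -> word count, one per TMS.
legacy_report = {"100%": 120_000, "95-99%": 8_000, "85-94%": 4_000, "no match": 10_000}
candidate_report = {"100%": 90_000, "95-99%": 34_000, "85-94%": 6_000, "no match": 12_000}

def leverage_drift(legacy: dict, candidate: dict) -> dict:
    """Word-count delta per match band; a negative 100% band means lost leverage."""
    return {band: candidate.get(band, 0) - legacy.get(band, 0) for band in legacy}

print(leverage_drift(legacy_report, candidate_report))
```

A large negative delta in the 100% band, mirrored by growth in the fuzzy bands, is exactly the regression pattern worth investigating before committing to the migration.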

What to do if my baseline doesn’t perform as expected?

If your baseline doesn’t perform as expected, the first step is to look for the root cause of the loss of leveraging. It will reside either in your Translation Memory, which carries the imprint of your legacy TMS’s parsing and segmentation strategy, or in your new TMS’s parsing and segmentation strategy.

Before we go any further let’s define parsing and segmentation.

Parsing

Parsing is how any given system breaks up the data that is fed into it. In spoken language, for instance, parsing is how our ears distinguish between sounds to understand separate words. Parsing in localization refers to the rules that a TMS follows to break down source content into translatable content: how the TMS records formatting, how it records variables, and how it separates code from translatable content.
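As a rough illustration of what a parser does, here is a toy Python function that swaps inline formatting markup for numbered placeholder tags. Real TMS parsers are far more elaborate; the tag notation here simply mirrors the {1}…{1} examples used in this article.

```python
import re

def protect_formatting(text: str) -> str:
    """Toy parser: replace each pair of inline formatting tags with a numbered
    placeholder, the way a TMS hides markup from translators. Illustrative only."""
    counter = 0
    def repl(match):
        nonlocal counter
        counter += 1
        return "{%d}%s{%d}" % (counter, match.group(1), counter)
    return re.sub(r"<(?:b|i|u)>(.*?)</(?:b|i|u)>", repl, text)

print(protect_formatting("<b>The cat</b> is on the <i>couch</i>."))
# → {1}The cat{1} is on the {2}couch{2}.
```

Two TMSs fed the same DOCX can emit different placeholder layouts from this step alone, which is where the leverage mismatch begins.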

Segmentation

Segmentation refers solely to what constitutes a sentence. Is a sentence defined by certain punctuation? Will line breaks result in a new sentence? These different rules will also produce different leveraging. Say you translated in your legacy TMS a PPTX file where several of your sentences wrap around the text box, and that TMS understood those line breaks as sentence breaks; if your new TMS disregards the line break as a sentence break, you will lose leveraging for those sentences. Line-break loss can be even more dramatic than parsing loss because the sentence can become entirely unrecognizable.
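The two behaviors can be sketched with a toy segmenter. The rules below are deliberately simplistic stand-ins for real TMS segmentation engines: sentence-final punctuation always breaks, and line breaks either break (legacy behavior in the PPTX example) or collapse to a space (new behavior).

```python
import re

def segment(text: str, newline_breaks: bool) -> list[str]:
    """Toy segmenter: a line break either ends a segment (newline_breaks=True)
    or collapses into a space; punctuation like . ! ? always ends one."""
    text = text.replace("\n", "\u0000" if newline_breaks else " ")
    parts = re.split(r"(?<=[.!?])\s+|\u0000", text)
    return [p.strip() for p in parts if p.strip()]

slide_text = "The cat is on\nthe couch."           # wrapped inside a PPTX text box
print(segment(slide_text, newline_breaks=True))    # ['The cat is on', 'the couch.']
print(segment(slide_text, newline_breaks=False))   # ['The cat is on the couch.']
```

Neither segment from the first run matches the single segment from the second, so a TM built under one rule set leverages poorly under the other – exactly the “entirely unrecognizable” scenario described above.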

What can you do about this?

Luckily you can handle these issues with several different approaches:

The brute-force approach

In this approach, you just bite the bullet and own the fact that TMS migrations are messy and some losses are expected. You simply accept the loss of leveraging and understand that, at the beginning, translating will become more expensive and time-consuming, but that this will soon level out and return to previous levels once your content base has mostly adjusted to your new reality. This approach works particularly well for small programs (Program Size = Content Base Size x Languages), but it will create serious budget issues if you are operating larger programs.

The Translation Memory approach

In this approach, you can have TMS engineers analyze encoding patterns in your Translation Memory and look for opportunities to apply rule-based scripts that will transform these patterns into patterns that are more compatible with your future TMS standards.

So let’s say, for instance, your TM has “{1}The cat{1} is on the {2}couch{2}.” but your new TMS ignores tag {2} and would record the same sentence as “{1}The cat{1} is on the couch.”

By addressing the TM, you would run a script that understands under which conditions the {2} tag was introduced and, if you are confident that you can remove that {2} without further implications, you run that script.
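A minimal sketch of such a rule-based cleanup, assuming the placeholder tags are stored as literal {n} text (a real cleanup would operate on the TMX file’s inline elements such as <bpt>/<ept> rather than on plain strings):

```python
import re

def strip_tag(segment: str, tag_number: int) -> str:
    """Remove one specific placeholder tag pair from a stored TM segment.
    Run only after confirming that the tag carries nothing you need to keep."""
    return re.sub(r"\{%d\}" % tag_number, "", segment)

tm_segment = "{1}The cat{1} is on the {2}couch{2}."
print(strip_tag(tm_segment, 2))  # → {1}The cat{1} is on the couch.
```

In practice you would run this across an exported TMX, re-import the result into a test TM, and re-measure leveraging before touching production data.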

You can work iteratively and test until you reach a point where you see diminishing marginal returns and it no longer makes economic sense to further the Translation Memory optimization efforts. This approach works well for programs of all sizes but it does introduce the possibility of fundamentally changing the TM in unexpected ways that may result in undesired and most importantly unforeseeable consequences. 

The new parsing and segmentation strategy approach

In this approach, you can address how your future TMS parses and segments content and adjust these rules so that they more closely resemble your Translation Memory. So in the same example as above, instead of cleaning the TM of tag {2} you would ensure that your future TMS also introduces tag {2} in a given encoding situation to maximize leverage. This approach is the most dynamic as you are leaving the Translation Memory untouched and can have a tailored approach for each content type.

So you can have, for instance, different parsing strategies for YAML vs. DOCX files, which will provide you with maximum flexibility and predictability moving forward. The challenge with this approach is that it’s typically more complex to implement, and not all TMSs offer the necessary flexibility over parsing and segmentation strategies.
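As a hypothetical illustration of per-file-type rules – the profile names and flags below are invented, and real TMS settings vary by vendor – the decision surface looks something like this:

```python
# Hypothetical per-file-type parsing/segmentation profiles; purely illustrative.
PARSING_PROFILES = {
    "docx": {"inline_tags": "paired", "newline_breaks": False},
    "pptx": {"inline_tags": "paired", "newline_breaks": True},   # text-box wrapping
    "yaml": {"inline_tags": "none",   "newline_breaks": True},   # one value per line
}

def profile_for(filename: str) -> dict:
    """Pick a parsing profile from the file extension; default to DOCX-like rules."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return PARSING_PROFILES.get(ext, PARSING_PROFILES["docx"])

print(profile_for("slides.pptx"))
```

Tuning each profile to reproduce the legacy TMS’s behavior for that file type is what lets the new system leverage the old TM without touching it.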

Conclusion

Migrating TMSs is no walk in the park. It’s more like a complex trek through the Peruvian Amazon going up the Andes. It’s important to flag and address potential complications early on because, as with anything in localization, problems mushroom and compound very quickly. Make sure you truly stress test your TMS so that you know exactly what you are getting into, and create optimization strategies from the get-go.

 

Published On: June 24th, 2022 / Categories: Business Practices, Business Translation /

Gabriel Fairman
