Translation memory software: Everything you need to know
What is translation memory software?
Translation memory (TM) software is a type of computer-assisted translation (CAT) tool that stores translated phrases and sentences in a database called a translation memory. This database can then be used to automatically match and pre-translate similar phrases in future translations, which helps increase consistency and efficiency in the translation process. Translation memory software can also track changes and revisions over time and share information between translators working on the same document or project.
How does it work?
Translation memory software breaks down a source text file (the text to be translated) into segments, usually sentences or phrases. Each segment is then compared to the segments in the translation memory database to see if there is a match. If a match is found, the corresponding translation from the database is automatically inserted into the target text (the translated version of the source text).
The software typically uses a process called "fuzzy matching" to find matches, which means it can identify segments that are similar but not exactly the same. This is useful because it allows for matches even if there are slight differences in wording or formatting between the source text and the segments in the translation memory.
When a segment has no match in the translation memory, the software flags it as "untranslated" so the translator can provide a new translation.
The software also allows storing the new translations in the translation memory. This way, the new translations can be reused in future translations, increasing the efficiency and consistency of the translation process.
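The workflow described above (segment, look up, fall back to a fuzzy match, flag the rest) can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CAT tool is implemented: the class and function names are hypothetical, the sentence splitter is deliberately naive, the 0.75 threshold is an assumption, and the similarity score comes from the standard library's `difflib.SequenceMatcher` rather than a commercial matching algorithm.

```python
import re
from difflib import SequenceMatcher


def split_into_segments(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


class TranslationMemory:
    """Hypothetical in-memory TM: source segment -> stored translation."""

    def __init__(self):
        self.entries = {}

    def add(self, source, target):
        self.entries[source] = target

    def lookup(self, segment, threshold=0.75):
        """Return (translation, score) for the best match, or (None, 0.0)."""
        if segment in self.entries:
            return self.entries[segment], 1.0
        best_source, best_score = None, 0.0
        for source in self.entries:
            score = SequenceMatcher(None, segment, source).ratio()
            if score > best_score:
                best_source, best_score = source, score
        if best_source is not None and best_score >= threshold:
            return self.entries[best_source], best_score
        return None, 0.0


def pretranslate(text, tm):
    """Label every segment as 'exact', 'fuzzy' or 'untranslated'."""
    results = []
    for segment in split_into_segments(text):
        translation, score = tm.lookup(segment)
        if translation is None:
            status = "untranslated"
        elif score == 1.0:
            status = "exact"
        else:
            status = "fuzzy"
        results.append((segment, translation, status))
    return results


tm = TranslationMemory()
tm.add("Click the Save button.", "Haga clic en el botón Guardar.")
text = "Click the Save button. Click the Cancel button. Restart the device now."
for row in pretranslate(text, tm):
    print(row)
```

Once the translator confirms a new translation for an untranslated segment, calling `tm.add(...)` stores it so that later documents can reuse it.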
Overall, translation memory software can be a valuable tool for freelance translators, translation agencies, and other organizations that frequently translate large volumes of text.
When was it invented?
Translation memory software was first developed in the 1980s. The first commercial Translation Memory system was called "Metatex" and was developed by the company "Star" and released in 1989. The concept of translation memory technology was based on storing previous translations together in a database and reusing them for future projects with similar phrases or sentences, which would help increase consistency and efficiency in the translation process.
Since the introduction of the first TM systems, the technology has continuously evolved and improved, with new features and capabilities being added over time. For example, many modern TM systems now include advanced fuzzy matching algorithms, support for a wide range of file formats, and the ability to integrate with other CAT tools and machine translation systems.
Nowadays, translation memory software is widely used in the industry as a standard translation tool, helping increase efficiency, consistency, and cost savings in the translation process.
The best translation memory software in 2023
- SDL Trados - The most used translation memory software in the industry. Born out of the fusion of Trados and SDL.
- MemoQ - Loved by translators, MemoQ has a great translation editor UI.
- Phrase - Originally Memsource, it was one of the first cloud-first translation management systems. It was renamed Phrase after Memsource acquired Phrase. Supports both file-based and string-based localization.
- Lokalise - Primarily focused on developer use cases and integrations with other platforms such as GitHub and Figma. Intuitive UI and growing market presence.
- Bureau Works - Originally meant as an enterprise-only translation management, quality management, and business management system, Bureau Works brings together all the aspects needed to manage small to large-scale localization use cases, covering everything from quoting to payment automation.
- Smartling - One of the most adopted enterprise translation management systems, focusing on web-proxy translations and offering services as well.
- Transifex - Initially prominent in the open-source community, Transifex has large adoption by developers and also offers a web-proxy solution.
Translation Memory, Glossary, and Style Guides
Translation Memory (TM), Glossary, and Style Guides are tools used to improve the consistency and quality of translations. However, each one serves a different purpose:
- Translation Memory (TM) is a database of previously translated phrases and sentences that can automatically match and pre-translate similar phrases in future translations. The main purpose of TM is to increase consistency and efficiency in the translation process.
- Glossary is a list of terms and their translations specific to a particular subject or industry. It helps to ensure consistency in using specific terminology throughout a translation. A glossary can also help ensure the translations are accurate and appropriate for the context.
- Style Guides are documents that provide guidelines and rules for writing and formatting text in a specific language or for a specific audience. They can help ensure consistency in grammar, punctuation, formatting, and other language-specific conventions.
In summary, while Translation Memory and Glossary are mainly focused on the consistency and accuracy of the translations, Style Guides are mainly focused on the consistency and clarity of the text. They all work together to improve the quality of translations and ensure that the translated text is consistent and appropriate for the target language and audience.
Translation memory vs. machine translation: How do they differ?
Translation Memory (TM) and Machine Translation (MT) are tools that can be used to translate text, but they work in different ways and have different strengths and weaknesses.
Translation Memory (TM) is a database of previously translated phrases and sentences that can automatically match and pre-translate similar phrases in future translations. Its main purpose is to increase consistency and efficiency in the translation process. It works best on repetitive and similar content and is primarily a tool that human translators use to aid them in their work.
On the other hand, Machine Translation (MT) uses algorithms and neural networks to automatically translate text from one language to another. The main advantage of MT is that it can quickly translate large volumes of text and work with multiple languages. However, the quality of the translations produced by MT can vary widely depending on the complexity of the text and the quality of the training data used to train the MT system. It also lacks the cultural and idiomatic understanding a human translator has.
In summary, Translation Memory (TM) is used for repetitive and similar content, mainly by human translators to aid them in their work. Machine Translation (MT) is used primarily for large volumes of text and can work with multiple languages, but its quality may vary widely depending on the text's complexity and the training data's quality. The two can be used together in a translation process, where the TM is used to enhance the MT output, making the resulting translations more accurate and more consistent.
When should I use a translation memory?
It is best to use a translation memory (TM) when you have repetitive content that needs to be translated manually or when you need to maintain consistency across multiple translations. Some specific situations in which a TM can be beneficial are:
- When you need to translate a large volume of similar or identical content. For example, if you have a website, bilingual database, or product catalog that needs to be translated into multiple languages, a TM can help ensure consistency and efficiency by reusing previously translated content.
- When you need to translate multiple versions of the same document. For example, if you have a user manual that needs to be updated frequently, a TM can help ensure consistency by reusing existing translations from previous versions of the manual.
- When you need to translate content that is updated frequently. For example, if you have a news website or a blog, a TM can help ensure consistency and efficiency by reusing translations from previously translated articles.
- When you need to work with multiple translators on the same project. A TM can help ensure consistency by allowing translators to see and reuse translations from other team members.
Overall, Translation memory software can be a valuable tool for translators, translation agencies, and other organizations that frequently translate large volumes of text and need to maintain consistency and efficiency across multiple translations.
Creative texts are not the best match for a TM, but…
Creative texts, such as literature, poetry, or advertising copy, are not the best match for a Translation Memory (TM) because they often contain a high degree of creativity, figurative language, and idiomatic expressions that cannot be easily translated through a simple word-for-word translation.
In addition, creative texts often have a unique style and tone that is difficult to capture in a TM. For example, a creative text may use metaphors, puns, and other literary devices that are difficult to translate consistently and accurately.
However, even when dealing with creative texts, a TM can still be used to improve consistency and efficiency in the translation process. One way to do this is by using a TM to store translations of common phrases and sentences that are used throughout the text. This can help ensure consistency in terminology and phrasing throughout the translation. This is true, for instance, for slogans or disclaimers that often recur in these texts.
It's also possible to use a TM to pre-translate certain text segments and then have a professional translator review and edit the translations to ensure they are accurate and appropriate for the current context.
Overall, while creative texts may not be the best match for a TM, a TM can still be used to improve consistency and efficiency in the translation process. Using it in conjunction with human translation and editing is important to ensure the best possible outcome.
Different repetitive content types can benefit from translation memory tools
There are many types of repetitive content that can benefit from using a Translation Memory (TM) tool. Here are some examples:
Technical documentation
Technical manuals, user guides, and other types of technical documentation often contain a high degree of repetition, with many similar phrases and sentences used throughout the text. A TM can help ensure consistency and efficiency in translating these types of documents.
Software and website localization
Websites and software applications often contain a high degree of repetition, such as menu items, error messages, and other text types used repeatedly throughout the interface. A TM can help ensure consistency and efficiency in the localization process of these types of projects.
Marketing and advertising materials
Brochures, advertising copy, and other marketing materials often contain a high degree of repetition, such as slogans, product names, and other types of text used repeatedly throughout the materials. A TM can help ensure consistency and efficiency in translating these types of documents.
Legal documents
Legal documents, such as contracts, agreements, and legal forms, often contain a high degree of repetition, such as legal terms and phrases used repeatedly throughout the document. A TM can help ensure consistency and efficiency in translating these types of documents.
Medical and scientific documents
Medical and scientific documents often contain a high degree of repetition, such as technical terms and phrases used repeatedly throughout the document. A TM can help ensure consistency and efficiency in translating these types of documents.
In summary, any content with repetitive segments, be it text, phrases, or sentences, can benefit from using a Translation Memory (TM) tool, as it will help ensure consistency and efficiency in the translation process while saving time and costs.
What are the benefits of translation memory?
Translation memory allows you to:
- Never translate the same sentence again, which:
  - saves money by not translating the same content again
  - saves time by not counting it as translatable content, reducing turnaround time
  - increases consistency because it references past translations
- Recognize similar sentence structures, which:
  - saves money by offering partially translated sentences, so translators only have to "fill in the blanks"
  - saves time by offering partially translated sentences
  - increases consistency by referencing similar past translations
Applications in legal and financial texts
- Contract Translation: TM technology can be used to translate legal contracts and agreements, ensuring that all translations are consistent, accurate, and up-to-date. Reusing previously translated segments from the TM database can help to reduce the time and resources required to translate new contracts while also improving the consistency and accuracy of the translations.
- Legal Terminology: Legal texts often contain specialized terms and phrases specific to the legal field. TM technology can help ensure consistency in using these terms by storing and reusing them in the TM database. This helps ensure that all legal translations are consistent and accurate while reducing the risk of errors and inconsistencies.
- Litigation Support: TM technology can be used to support litigation by providing a centralized database of translations that the legal team can easily access. This can be especially helpful in multi-lingual litigation, where the translation of key documents and evidence is crucial. By leveraging the benefits of TM technology, legal teams can quickly retrieve relevant translations and ensure that they are accurate and up-to-date.
- Financial Reports: TM technology can be used to translate financial reports, ensuring that all translations are consistent, accurate, and up-to-date. Reusing previously translated segments from the TM database can help to reduce the time and resources required to translate new reports while also improving the consistency and accuracy of the translations.
- Financial Terminology: Financial texts often contain specialized terms and phrases specific to the finance industry. TM technology can help ensure consistency in using these terms by storing and reusing them in the TM database. This helps ensure that all financial translations are consistent and accurate while reducing the risk of errors and inconsistencies.
- Regulatory Compliance: In the finance industry, compliance with regulations is crucial. TM technology can help ensure that translations comply with regulations by providing a centralized database of translations that can be easily reviewed and updated. This helps ensure that all financial translations are accurate and up-to-date while reducing the risk of errors and inconsistencies.
Technical documentation and product manuals
- User Manuals: User manuals for technical products often contain detailed instructions and specifications. TM technology can help ensure that all manual translations are consistent, accurate, and up-to-date by storing and reusing previously translated segments in the TM database. This helps ensure that users can access accurate and consistent user manuals in multiple languages.
- Technical Specifications: Technical specifications for products often contain detailed information about the product's features, functionality, and performance. TM technology can help ensure that all technical specification translations are consistent, accurate, and up-to-date by storing and reusing previously translated segments in the TM database. This helps to ensure that all stakeholders, including engineers, technicians, and sales teams, have access to accurate and consistent technical specifications in multiple languages.
- Software Localization: Technical documentation for software products often includes user interfaces, help files, and release notes. TM technology can help to ensure that all translations of the software's technical documentation are consistent, accurate, and up-to-date by storing and reusing previously translated segments in the TM database. This helps ensure that software users can access accurate and consistent software documentation in multiple languages.
Perfect and fuzzy matches
In the context of Translation Memory (TM) software, a "perfect match" and a "fuzzy match" refer to the level of similarity between a segment of the source text (the text to be translated) and a segment in the translation memory database.
A "perfect match" occurs when the source text segment is an exact match with a segment in the translation memory database (including all the formatting and code surrounding the text). This means that the wording, formatting, and context of the two segments are identical. A perfect match is considered to be the highest quality match, as the translation stored in the translation memory can be automatically inserted into the target text with a high degree of confidence that it is correct.
A "fuzzy match" occurs when the source text segment is similar, but not identical, to a segment in the translation memory database. This means that the wording, formatting, or context of the two segments may differ while the overall meaning is close. Fuzzy matches are considered to be lower-quality matches than perfect matches, as the translation from the translation memory may need to be reviewed and adjusted to ensure that it is appropriate for the current context. The software typically uses a percentage match score to indicate the level of similarity between the source text and the segment in the database.
In summary, a perfect match is an exact match between the source text segment and the segment in the TM, while a fuzzy match is similar but not the same. A perfect match is the highest quality match, and its translation can be automatically inserted with high confidence. A fuzzy match is a lower-quality match, and its translation may need to be reviewed before being inserted.
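One common way to turn "similar but not identical" into a percentage is an edit distance. The sketch below uses a character-level Levenshtein distance normalized by the longer segment's length; this is only an illustration. Real TM tools use their own scoring (often word-based, with extra handling for tags and formatting), and the function names and the 75% cut-off are assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def match_percentage(segment, tm_source):
    """Similarity as a 0-100 score; 100 only when the strings are identical."""
    if segment == tm_source:
        return 100
    longest = max(len(segment), len(tm_source))
    score = round(100 * (1 - levenshtein(segment, tm_source) / longest))
    return min(score, 99)  # never report 100 for a non-identical pair


def classify(score, fuzzy_threshold=75):
    """Assumed bands: 100 is perfect, threshold..99 is fuzzy, below is no match."""
    if score == 100:
        return "perfect"
    if score >= fuzzy_threshold:
        return "fuzzy"
    return "no match"


score = match_percentage("Press the red button.", "Press the green button.")
print(score, classify(score))
```

Capping non-identical pairs at 99 is a design choice made here so that rounding on very long segments can never promote a near-match to a perfect match.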
What are some best practices for using a translation memory?
Here are some best practices for using a Translation Memory (TM) to improve the consistency and efficiency of your translations:
- Start by creating a high-quality TM: Before you begin using a TM, it's essential to ensure that you have a high-quality translation memory with accurate and appropriate translations. This can be done by creating a TM from previously translated content or by having a professional translator review and edit the TM.
- Use consistent terminology: To ensure consistency in your translations, it's important to use consistent terminology throughout your TM. This can be achieved by creating a glossary of terms and their translations and using it to pre-translate similar phrases and sentences in your TM.
- Regularly update and maintain your TM: As you continue to translate new content, it's important to regularly update and maintain your TM by adding new translations and removing outdated or incorrect translations. This will help ensure your TM stays current and accurate over time.
- Use TM in conjunction with other tools: Translation Memory software works best when used in conjunction with other tools such as glossary and style guides. They can help to ensure consistency and accuracy in your translations.
- Use the appropriate TM match level: Translation memory software allows you to set the match level for each segment. Using the appropriate match level for each segment is important to ensure you use the most appropriate translation. For example, use a perfect match for a segment that has been translated previously and a fuzzy match for a segment that is similar but not exactly the same.
- Review and edit the translations: Always review and edit the translations generated by the TM to ensure they are accurate and appropriate for the current context.
- Train your TM with your specific content: If you are working on a particular domain or industry, it is important to train your TM with specific content and terminology to improve the accuracy and consistency of the translations.
By following these best practices, you can ensure that your translations are consistent and efficient while also maintaining the highest possible level of quality.
Pay attention to the source text even before translation in order to benefit the most from translation memory
There are several ways to optimize source language content to maximize the usage of a Translation Memory (TM) tool:
- Use consistent terminology: To ensure consistency in your translations, it's important to use consistent terminology throughout your source language content. This can be achieved by creating a glossary of approved terms and following it whenever source content is written.
- Use consistent formatting: Consistent formatting of the source language content can help the TM to recognize and match segments more easily. For example, using the same punctuation, capitalization, and spacing can help the TM to identify and match similar segments more accurately.
- Use consistent structure: Having a consistent structure in the source language content can also help the TM to recognize and match segments more easily. For example, using the same headings, bullet points, and numbered lists can help the TM to identify and match similar segments more accurately.
- Use consistent context: Consistent context in the source language content can also help the TM to recognize and match segments more easily. For example, if you are translating a user manual that uses exactly the same instructions each time a specific step appears, the TM will be able to match those instructions more easily.
- Avoid idiomatic expressions: Idiomatic expressions are difficult to translate and often don't match the TM; avoiding them or using them sparingly is best.
- Use placeholders: Placeholders can be used in the source language content to represent variables such as dates, numbers, and names. This can help to ensure consistency in the translation of these variables, even if they change in the source language content.
By following these best practices, you can optimize your source language content to maximize the usage of your TM and improve the consistency and efficiency of your translations.
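The placeholder idea from the list above can be sketched as follows: variables such as dates and numbers are swapped for placeholders before the TM lookup, so two segments that differ only in those values reduce to the same template. The regular expressions and the function name are illustrative assumptions (ISO-style dates and plain numbers only), not the behavior of any specific tool.

```python
import re

# Illustrative pattern: ISO dates first, then plain integers or decimals.
VARIABLE_PATTERN = re.compile(
    r"(?P<DATE>\b\d{4}-\d{2}-\d{2}\b)|(?P<NUM>\b\d+(?:\.\d+)?\b)"
)


def to_template(segment):
    """Replace variables with {DATE}/{NUM}; return the template and the values."""
    values = []

    def replace(match):
        values.append(match.group(0))
        return "{" + match.lastgroup + "}"

    return VARIABLE_PATTERN.sub(replace, segment), values


first, first_values = to_template("Order 1042 ships on 2024-01-15.")
second, second_values = to_template("Order 77 ships on 2023-12-01.")
print(first)            # same template for both segments
print(first == second)  # so a TM keyed on the template sees an exact match
```

The extracted values are kept in order so they can be put back into the translated template after the match.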
Examples of sentences that could be rewritten to maximize translation memory leveraging:
Here are 10 examples of sentence pairs that are worded differently but could be rewritten to be identical to each other:
- "The cat sat on the mat" and "The feline lounged on the rug."
- "I will meet you at the park," and "We will rendezvous at the greenspace."
- "I have a red car," and "I possess a vehicle of the red hue."
- "Please give me the book" and "Kindly provide me with the tome."
- "I am going to the store," and "I am heading to the establishment."
- "He is a tall man" and "He is of a towering stature."
- "The dog barked loudly," and "The canine made a loud noise."
- "I will call you later," and "I will contact you subsequently."
- "I am happy" and "I am in a state of joy."
- "The food was delicious," and "The cuisine was delightful to taste."
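The sentences in each pair differ in structure or wording but convey the same message. Because a Translation Memory system matches on wording rather than meaning, it will usually treat them as different segments. If the source text consistently uses one version of each sentence, the TM can match every occurrence and provide the same translation, ensuring consistency throughout the text.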
How to create a quality translation memory?
Creating a high-quality Translation Memory (TM) is essential for ensuring consistency and efficiency in your translations. Here are some steps you can take to create a high-quality TM:
- Start with a clean slate: Before creating your TM, it's essential to start with a clean slate by removing outdated or inaccurate translations from your database.
- Gather and organize your source content: Collect all the source content you want to include in your TM and organize it to make it easy to find and reference. This could be in the form of a spreadsheet, a database, or a document management system.
- Use consistent terminology: To ensure consistency in your translations, it's important to use consistent terminology throughout your source content. This can be achieved by creating a glossary of terms and their translations and making sure authors and translators follow it.
- Translate your source content: Use professional translators to translate your source content. Make sure that the translators are experienced in your industry and subject matter.
- Review and edit the translations: Have a professional translator review the translations to ensure they are accurate and appropriate for the current context.
- Store your translations in the TM: Once they have been reviewed and edited, store them in your TM in a way that makes them easy to find and reference.
- Keep your TM current: As you continue to translate new content, it's important to regularly update and maintain your TM by adding new translations and removing outdated or incorrect translations.
- Domain-specific content: If you are working on a specific domain or field, use a single TM for that field and do not mix that with other domains.
How to manage a huge translation memory
Managing a large Translation Memory (TM) can be challenging, but several strategies can help to make the process more manageable:
- Use a dedicated TM software: Use a reliable TM software designed to handle large databases of translations. This will make it easier to search, retrieve, and update translations and keep track of changes to the TM over time.
- Use categories and labels: Organize your translations into categories and labels to make finding and retrieving specific translations easier. This can be based on the type of content, the subject matter, or the language of the translation.
- Use filters: Use filters to narrow down your search results and make it easier to find specific translations. This can be based on the source or target language, the type of content, or the match level.
- Use search and replace functionality: Use the search and replace functionality to update multiple translations at once quickly. This can be useful when a term or phrase needs to be updated or corrected in multiple translations.
- Use versioning: Use versioning to keep track of changes to the TM over time. This can be useful when you need to roll back to an earlier version of a translation or when you need to compare different versions of a translation.
- Regularly review and clean the TM: Regularly review and clean the TM by removing outdated, inaccurate, or duplicate translations. This will help to ensure that your TM stays current and accurate over time.
- Use a team of translators: When managing a large TM, it's important to have a team of translators who can help to review, update, and maintain the TM. This will help to ensure that the TM stays current and accurate over time.
By following these strategies, you can effectively manage a large TM and ensure that your translations are accurate, consistent, and up-to-date.
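Two of the strategies above, filtering by category and regularly cleaning out outdated or duplicate entries, can be sketched on a simple in-memory list. The `TMEntry` fields, the function names, and the cut-off date are hypothetical; a real TM would typically live in a database or a TMX file.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TMEntry:
    source: str
    target: str
    domain: str    # category label, e.g. "web" or "legal"
    updated: date  # last time the translation was confirmed


def clean(entries, cutoff):
    """Drop entries older than cutoff and duplicates, keeping the newest copy.

    Two entries count as duplicates when source, target and domain all match.
    """
    seen = set()
    kept = []
    for entry in sorted(entries, key=lambda e: e.updated, reverse=True):
        key = (entry.source, entry.target, entry.domain)
        if entry.updated < cutoff or key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return kept


def filter_by_domain(entries, domain):
    """Narrow the TM to one category before searching or reviewing it."""
    return [e for e in entries if e.domain == domain]


entries = [
    TMEntry("Hello", "Hola", "web", date(2015, 1, 1)),
    TMEntry("Hello", "Hola", "web", date(2022, 6, 1)),
    TMEntry("Hello", "Hola", "web", date(2023, 6, 1)),
    TMEntry("Terms of use", "Términos de uso", "legal", date(2023, 1, 1)),
]
cleaned = clean(entries, cutoff=date(2020, 1, 1))
print(len(cleaned), [e.domain for e in cleaned])
```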
How to maintain the integrity of a translation memory
This is the hardest thing in translation memories. They work beautifully when they are small and domain-specific and can hinder the translation process when they become too large, broad, or old.
Most translation programs do not have clear parameters for when a segment will be written to a translation memory. Standard practice is that every time a translator confirms a segment, it gets written to the translation memory. As a result, the translation memory also contains incorrect suggestions that either were never reviewed or were reviewed post-delivery outside the translation memory tool. This is known as translation memory pollution.
In order to maximize leverage, many programs also create a single flat translation memory repository. While this is simple to set up, it's terrible to maintain because you cannot isolate and solve contamination issues. If you segregate your TM by content type, such as Web vs. Support vs. Legal, or by department or any other content management hierarchy that makes sense, it will be a lot easier to ensure the health of each of these translation memories.
You can occasionally have TM health checks performed by having samples of the TM evaluated by linguists, mapping out potential issues such as inconsistent tagging, incorrect translations, or even incorrect language pairs. This allows you to map out patterns for these issues and then look for and resolve those patterns.
The big lesson here is that bigger is not better. People get "greedy" and want to maximize returns on TMs by making them as large as possible, but TMs should be seen first and foremost as quality and consistency drivers. Savings should be a much-desired side-effect, not the end goal in and of itself.
What to do if the translation memory is not 100% reliable
Most translation memories are not 100% reliable. This creates an accountability problem, as translators can wash their hands and claim that the TM feed introduced the error. A reliable workaround is to pay translators a reduced rate to at least re-read every 100% match, so they get more overall context and can flag potential TM issues. While this will increase the overall translation cost by some percentage, it can also be a significant translation quality driver.
When is MT better than TM?
This is a question in flux. A few years ago, the answer would have been a categorical NEVER. These days, based on our studies, MT is better than TM unless you have a match rate above 80%. Below an 80% match, a translator is more likely to have a smaller edit distance (the effort required to take the TM or MT feed from partial to perfect) when starting from the MT feed. At Bureau Works, we are working to unify MT and TM into a single feed with a confidence score to make machine translation output more reliable and productive.
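The 80% rule of thumb above can be written down as a simple routing decision. This is only a sketch of the rule as stated in this article, with hypothetical names; it is not the unified confidence score mentioned above, and the right threshold will depend on your content and MT engine.

```python
def choose_feed(tm_match_percent, tm_suggestion, mt_suggestion, threshold=80):
    """Prefer the TM suggestion only when its match rate is above the threshold."""
    if tm_suggestion is not None and tm_match_percent > threshold:
        return "TM", tm_suggestion
    return "MT", mt_suggestion


print(choose_feed(95, "Guardar cambios", "Salvar cambios"))
print(choose_feed(60, "Guardar cambios", "Salvar cambios"))
```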
Use automated quality assurance (QA) checks before committing translations to the TM
This is an important best practice, as running automated QA checks prior to committing translations to the TM minimizes pollution and contamination. One of the challenges is that many translators still work in desktop instances that are not cloud-controlled, meaning that organizations may not have granular control over when a sentence gets written to a TM. It's common practice for agencies to send out a LocKit (Localization Kit) containing pre-translated bilingual files (typically in XLIFF format), a glossary, and a TM. Translators do the translation jobs, and the agency then receives these assets back and typically imports the translation units into a larger TM. There are typically few checks and balances involved in this process.
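As a rough illustration of gating TM writes behind automated checks, the sketch below runs three simple checks (empty target, mismatched inline tags, mismatched numbers) and only stores the pair when none of them fires. The checks, patterns, and function names are assumptions chosen for brevity. Production QA tools cover far more, and the number check here would wrongly flag locale-specific number formatting.

```python
import re

TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")     # simple inline tags like <b>
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")    # digits, optional separators


def qa_issues(source, target):
    """Return a list of issue labels; an empty list means the pair passed."""
    issues = []
    if not target.strip():
        issues.append("empty target")
    if sorted(TAG_PATTERN.findall(source)) != sorted(TAG_PATTERN.findall(target)):
        issues.append("tag mismatch")
    if sorted(NUMBER_PATTERN.findall(source)) != sorted(NUMBER_PATTERN.findall(target)):
        issues.append("number mismatch")
    return issues


def commit_if_clean(tm, source, target):
    """Write to the TM (a plain dict here) only when no QA issue is found."""
    issues = qa_issues(source, target)
    if not issues:
        tm[source] = target
    return issues


tm = {}
print(commit_if_clean(tm, "Press <b>Start</b> 2 times.", "Pulse <b>Inicio</b> 2 veces."))
print(commit_if_clean(tm, "Wait 5 seconds.", "Espere 3 segundos."))
print(len(tm))
```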
Include in the TM any change done outside of it
This is another huge TM management issue. There is typically a last mile of content editing that takes place outside of the translation management system environment. Lawyers may make changes to legal translations, while marketers may make changes to the copy. These changes will rarely flow back into the TM, creating a problem where the same error may persist in future translations and resulting in an overall feeling of TM unreliability. In this sense, it's fundamental for an organization to work with a cloud-based TM content management system such as Bureau Works that allows for end-to-end collaboration between translators and stakeholders and enforces strict guidelines that ensure changes occur within the proper TM editing environment.
One key difference in TM configurations is whether or not you allow multiple translations for the same segment. If you allow them, you may have one source sentence with multiple alternative translations. This will either force you to have translators review the feed every single time to pick the correct translation or require a heuristic mechanism such as defaulting to the latest time stamp. Alternatively, you can set a TM to overwrite the existing translation of a segment with any new translation. Both paths have their pros and cons. I am of the opinion that less is more and would go for overwriting 100% of the time. There are exceptions where this will break down, such as the same segment meaning different things in different contexts. Still, I would rather deal with exceptions separately than create an entire mechanism to take into account a few exceptions here and there. This is use-case sensitive, though, and it's all about choosing the path of least resistance.
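The two configurations can be contrasted in a small sketch. Both classes are hypothetical and exist only to show the trade-off: the overwriting TM keeps a single translation per source segment, while the multi-target TM keeps every alternative and falls back to the latest time stamp when nobody picks one by hand.

```python
from datetime import datetime


class OverwritingTM:
    """One translation per source segment; the last write wins."""

    def __init__(self):
        self.entries = {}

    def add(self, source, target, timestamp):
        self.entries[source] = (target, timestamp)

    def lookup(self, source):
        entry = self.entries.get(source)
        return entry[0] if entry else None


class MultiTargetTM:
    """Keeps every translation; lookup defaults to the newest time stamp."""

    def __init__(self):
        self.entries = {}

    def add(self, source, target, timestamp):
        self.entries.setdefault(source, []).append((target, timestamp))

    def alternatives(self, source):
        return [target for target, _ in self.entries.get(source, [])]

    def lookup(self, source):
        candidates = self.entries.get(source)
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[1])[0]


older, newer = datetime(2022, 1, 1), datetime(2023, 1, 1)

single = OverwritingTM()
single.add("Home", "Casa", older)
single.add("Home", "Inicio", newer)
print(single.lookup("Home"))

multi = MultiTargetTM()
multi.add("Home", "Casa", older)
multi.add("Home", "Inicio", newer)
print(multi.alternatives("Home"), multi.lookup("Home"))
```

The multi-target version preserves context-dependent alternatives (here "Home" as a house versus a home page) at the cost of needing either a human choice or a heuristic on every lookup.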
Most TM systems allow you to set penalties to a certain TM. A penalty will downgrade the match level accordingly. So if you apply a 1% penalty, this will cause a 100% match to be downgraded to a 99% match. The challenge here is that, in my opinion, a translation memory is an all-or-nothing kind of deal. Either it is reliable, and there are no penalties attached, or it's not reliable, and having a penalty won't solve your problems. Translators are trained to anchor stylistic decisions based on the linguistic corpus contained in a TM, and even if matches are downgraded, they will still contaminate the translator's writing process. The penalty is ultimately a way for project managers to believe they can salvage a contaminated TM. Still, I think it's best to find ways to clean and maintain the TM, even if it's more costly than applying penalties to it.
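The arithmetic of a penalty is simple, as the sketch below shows. The function names and the "only unpenalized 100% matches are inserted without review" policy are assumptions for illustration; actual tools expose penalties through their own settings.

```python
def apply_penalty(match_percent, penalty_percent):
    """Downgrade a match score by a flat penalty, never going below zero."""
    return max(0, match_percent - penalty_percent)


def can_auto_insert(match_percent, penalty_percent=0):
    """Assumed policy: only a score of exactly 100 after penalties is auto-inserted."""
    return apply_penalty(match_percent, penalty_percent) == 100


print(apply_penalty(100, 1))    # a 1% penalty turns a 100% match into 99
print(can_auto_insert(100))     # unpenalized perfect match
print(can_auto_insert(100, 1))  # penalized match now needs review
```

Note that the penalty changes only how the match is labeled and routed. The suggestion the translator sees is the same text, which is why a penalty cannot repair a contaminated TM.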
TM in the future
As a static database, the TM will likely be replaced by more dynamic and larger data sets, such as Large Language Models. Linear matches governed by Levenshtein edit distances are likely to be replaced by dynamic confidence scores that use natural language processing and other models to infer the quality of a feed, regardless of whether it derives from a computational model or a TM.
TMs revolutionized the translation industry by adding a cybernetic super-memory that translators could tap into. This was a game changer, particularly when it came to large technical documentation in the 80s. Suddenly, it was feasible to rely on technology for consistency that humans could not deliver. This also resulted in time-to-market efficiencies for translation units as well as overall savings. In many ways, the TM is the cornerstone for the birth of Localization as an industry.
The TM, like most processes, looks good from far but is far from good. When you drill into all the particularities of managing TM quality over many years, vendors, and content types, it's anything but trivial to build out and manage a reliable TM that drives the good stuff in and keeps the bad stuff out. And when you get into all of the parameters required to properly manage a TM, such as penalties, tagging, duplicate treatment, and exception management, you quickly run into a knowledge gap pickle. Engineers will have the knowledge to manage the parameters from a technical perspective, but they will typically lack the business expertise required to align these technical decisions to business goals. Executives, on the other hand, will typically drive the business goals without clarity on the technical implications. This gap will typically result in TM mismanagement, pollution, contamination, and an overall loss in quality and leveraging. TMs, though seemingly simple, are far from trivial and will play a critical role in determining your localization program's success or lack thereof.