Chance encounter of localization and E-Learning on the dissecting table – Part 3 – Word

In the finale of our first series about E-Learning, which focuses mainly on language engineering, Word is our protagonist, the arch-adversary of XML-based localization. Microsoft has been taking steps towards reinventing Word as a mature document processing tool, but its legacy still bogs it down and prevents it from being a full-fledged content management solution. After 30 years, it is still green around the gills. Mentioning Word in the same paragraph as content management might sound odd to those of you who are familiar with Word’s capabilities, design and purpose; however, it is still widely used for creating localization-sensitive material and documentation because it is accessible and has a gentle learning curve.

Show-stoppers?

Microsoft has been relying on its MUI (Multilingual User Interface) technology to offer localized versions of its products, but Windows 8 and Office 2013 are the first versions that let users choose the UI language at no extra charge. Previously, only the high-priced, enterprise-level editions came with this feature. This is crucial in the case of Word, because the normal template, and therefore every template that relies on it, pulls information from the user interface and even from Windows itself, such as the keyboard layout, which is an obvious deficiency in localizability. One of the major milestones in making Word a more self-contained, localization-aware application came with the 2007 version, when locale-dependent information such as units, fonts, page sizes and others became independent of the Windows/Office locale for the first time. On the other hand, even in Word 2013, built-in styles other than the headings still cannot be referenced by ID from the interface, and all style references are literal by default: “Heading 1” in English and “Überschrift 1” in German, for example.
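
To make the literal style names concrete, here is a minimal Python sketch of the kind of lookup a post-processing script might need. It is only an illustration: the `HEADING_BASE` table and the `localize_heading` function are hypothetical names, and only the English/German pair is taken from the example above.

```python
import re

# Only the English/German pair comes from the text above;
# other locales would need their own entries.
HEADING_BASE = {"en": "Heading", "de": "Überschrift"}


def localize_heading(style_name, source, target):
    """Translate a literal 'Heading N' style name between two UI languages."""
    pattern = re.escape(HEADING_BASE[source]) + r" (\d)"
    match = re.fullmatch(pattern, style_name)
    if match is None:
        return style_name  # not a built-in heading name; leave it untouched
    return f"{HEADING_BASE[target]} {match.group(1)}"


print(localize_heading("Heading 1", "en", "de"))
```

Any style that is not a numbered heading is returned unchanged, because those are the names that cannot be resolved by ID in the first place.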

Theoretically, localizing at the XML level could circumvent the shortcomings of the native Word environment, but in practice it is usually not a feasible option. As mentioned in the previous post, the assortment of tags is too large to identify even paragraphs with one hundred percent confidence, and the chance of corruption makes the whole endeavour too risky. So can VBA or the API be the loophole we are looking for? Unfortunately not, because both depend on the whole Word entourage. Moreover, some settings, such as style sets and themes, cannot even be addressed because the associated functions are missing. What all this boils down to is that language engineers are not given much leeway in this double-edged issue, and they have to resort to a mix of VBA, XML hacking and manual adjustments if they do not want to keep adding comments to the support ticket.

If you happen to find a quicker, less convoluted way, please let us know. In the meantime, here is an outline of what we think all this translates into.

Downplay

Step 1: Extract all content that your CAT tool does not support

Some WYSIWYG elements can be problematic for CAT software to extract, such as certain text boxes, media, variables and references. memoQ takes a lazy approach to elements it cannot parse (for example Word 2010-specific objects marked up in XML with the w14 prefix) and can export files successfully by converting these to the latest version it recognizes, whereas Trados 2007 and Studio 2009 allow the import, only to fail irreversibly upon export. The upside of being lazy in this case is that even unsupported fields, such as a Table of Authorities in Word 2010, can be treated as standard {XE} index elements, so they do not have to be collected and removed in order to be processed in the CAT environment.

All the same, metadata may cause problems or end up completely ignored. The citation database, content controls and others will not make it into the translatable bulk. Again, memoQ can parse content controls as fields because of its looser method, but Trados tends to elegantly cut off the remainder of the paragraph, including the content control, without any notification.

Therefore, it is safest to use a macro that exports such content and inserts an ID in its place, so that the content can be put back during post-processing.
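
The export-and-reinsert idea can be sketched as follows. This is not the macro itself, which would run inside Word; it is a minimal Python illustration working on a raw XML string. The placeholder format, the function names and the assumption that content controls appear as non-nested `<w:sdt>` blocks are all ours.

```python
import re

# Hypothetical placeholder format; any token the CAT tool leaves untouched would do.
PLACEHOLDER = "[[OBJ-{:04d}]]"

# Assumption: content controls appear as non-nested <w:sdt>...</w:sdt> blocks.
SDT_PATTERN = re.compile(r"<w:sdt>.*?</w:sdt>", re.DOTALL)


def extract_objects(xml_text, pattern=SDT_PATTERN):
    """Replace each match with an ID placeholder and return the removed content."""
    store = {}

    def stash(match):
        key = PLACEHOLDER.format(len(store) + 1)
        store[key] = match.group(0)
        return key

    return pattern.sub(stash, xml_text), store


def restore_objects(xml_text, store):
    """Put every stored object back where its placeholder sits."""
    for key, original in store.items():
        xml_text = xml_text.replace(key, original)
    return xml_text
```

The store has to be saved alongside the file sent to translation, and the IDs must survive the CAT round trip unchanged, otherwise the objects cannot be matched up again in post-processing.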

Step 2: Translation

This phase blends into the standard localization workflow, but the translators, language leads and project managers have to keep in mind the considerations from our first post about managing E-Learning localization projects.

Step 3: Recreation in the local environment to ensure native default values

Making sure that all document settings and the layout conform to the target locale can be a very time-consuming business. Therefore, we built a tool that does most of the work automatically, but because of the restrictions of Word it cannot process all language settings, and themes and quick style sets have to be modified manually.

Even so, without relying on the built-in Word templates there would be an astronomical number of options to keep track of, so the first step is to set the editing language and UI to the target locale. This ensures that our tool can capitalize on the normal template and, where applicable, the UI references.

After the UI and editing language have been switched, our tool creates a new document, now based on the target language’s normal template, and copies all the content from the translated, but still structurally unlocalized file. It asks for the user’s input to set the theme and style set, and then it adjusts style settings, custom styles, paragraphs, page sizes, margins, indentations, tabs, default fonts (intended substitutions are predefined in its configuration), default units (for example, converting points to characters and lines, rounded to the nearest .25, for paragraph spacing and indentation in East Asian languages), table widths and others. In the first pass it reinserts the objects removed from the file during pre-processing and converts paragraph settings; if character-level formatting is enabled, it also retains sub-paragraph formatting.
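
The unit conversion mentioned above can be illustrated with a small Python sketch. It is not the tool’s code: the function names are hypothetical, and the line height of 12 pt and character width of 10.5 pt are assumed defaults that would really depend on the target document’s fonts and grid settings.

```python
def round_to_quarter(value):
    """Round to the nearest 0.25 (exact ties go to the even quarter in Python)."""
    return round(value * 4) / 4


def points_to_lines(points, line_height_pt=12.0):
    """Express a spacing in lines; the 12 pt line height is an assumption."""
    return round_to_quarter(points / line_height_pt)


def points_to_chars(points, char_width_pt=10.5):
    """Express an indentation in characters; the 10.5 pt width is an assumption."""
    return round_to_quarter(points / char_width_pt)


print(points_to_lines(10))   # 10 pt of spacing expressed in lines
print(points_to_chars(21))   # 21 pt of indentation expressed in characters
```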

Step 4: Small scale formatting

At this point, the document can be considered localized, but to make sure that the layout is correct, a DTP operator has to skim through it and make the final corrections.

Step 5: Final adjustments

Not only does the Word interface have to be in the target language, but so does the keyboard layout; otherwise paragraph endings will change language. But switching to, for example, an Arabic layout would be too much fun, and it is no surprise that language engineers cling to the last ray of hope and never actually do so. Fortunately, this can be fixed easily in the XML by adjusting the language code with regular expressions. If the document contains hidden elements, such as custom bookmarks starting with an underscore, they also need to be added directly in the XML.
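
As a rough illustration of the regular-expression fix, the following Python sketch rewrites the `w:val` attribute of `w:lang` elements, which is where WordprocessingML stores the language of a run. The function name and the choice to replace only one source language across the whole document are our assumptions; a real script would unzip the .docx file, process `word/document.xml` and zip it again.

```python
import re

# Captures the w:val attribute value inside a <w:lang .../> element.
LANG_VAL = re.compile(r'(<w:lang\b[^>]*?\bw:val=")([^"]*)(")')


def set_language(xml_text, source_lang, target_lang):
    """Replace source_lang with target_lang in every matching w:lang element."""

    def swap(match):
        if match.group(2) == source_lang:
            return match.group(1) + target_lang + match.group(3)
        return match.group(0)  # a different language; leave it as it is

    return LANG_VAL.sub(swap, xml_text)


sample = '<w:rPr><w:lang w:val="en-US" w:eastAsia="ja-JP"/></w:rPr>'
print(set_language(sample, "en-US", "de-DE"))
```

Only the `w:val` attribute is touched here; the `w:eastAsia` and `w:bidi` attributes would need the same treatment for East Asian or right-to-left targets.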

And the crowd goes wild

Let us wrap up this epic story with a piece of advice. If you need your documentation localized at any point in the future, and you are not necessarily a fan of turmoil and excitement in your business life, choose a more fitting tool for the purpose. In the long term, the expense of a dedicated tool will definitely pay off.
