Data as Asset – Web Architectures #3

This post is part of the series on data as an asset; see the previous posts here and here.


Ruby on Rails, ColdFusion, JSP, ASP, ASP.NET et al.

Virtually all mainstream back-end application environments include an internationalization framework with a high level of abstraction, both in their API calls, which make custom solutions unnecessary, and in how localizable data is accessed and structured. These frameworks support many locales out of the box and keep application logic separate from data by storing strings in external resources: Rails uses YAML files; ColdFusion, which runs on Java EE, and JSP rely on resource bundles; ASP.NET compiles .resx files into resource assemblies.
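As a rough illustration of the resource-bundle approach, the Java sketch below keeps the strings outside the application logic and looks them up by locale at runtime with java.util.ResourceBundle. The class names (I18nSketch, Messages, Messages_de) and the strings are hypothetical; a real JSP project would more commonly keep them in .properties files on the classpath, and the bundles are nested classes here only so the sketch fits in one file. Keys missing from the German bundle are inherited from the base bundle, so a partially translated locale still renders.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical demo; in a real project these bundles would usually be
// Messages.properties / Messages_de.properties files on the classpath.
class I18nSketch {

    // Base bundle: used when no locale-specific bundle matches.
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Welcome back, {0}!"},
                {"logout", "Log out"},
            };
        }
    }

    // German bundle: overrides only the keys it defines ("\u00fc" is u-umlaut).
    public static class Messages_de extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Willkommen zur\u00fcck, {0}!"},
            };
        }
    }

    // Look up class-based bundles only and skip the fallback to the JVM's
    // default locale, so the result depends only on the requested locale.
    private static final ResourceBundle.Control CONTROL =
        ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_CLASS);

    static String text(String key, Locale locale, Object... args) {
        ResourceBundle bundle =
            ResourceBundle.getBundle(Messages.class.getName(), locale, CONTROL);
        // MessageFormat fills the {0}, {1}, ... placeholders using the locale's conventions
        return new MessageFormat(bundle.getString(key), locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(text("greeting", Locale.GERMAN, "Anna")); // German greeting
        System.out.println(text("greeting", Locale.FRENCH, "Anna")); // no French bundle: base text
        System.out.println(text("logout", Locale.GERMAN));           // inherited from the base bundle
    }
}
```

The application code only ever refers to keys such as "greeting"; adding a language means adding a bundle, not touching the logic.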

Creating locale-specific clusters of pages is possible (for example, with JSP-based servlets), but symmetric sites, which use the same layout and content in every language, lend themselves to a single internationalized structure. Separate clusters make sense when the local pages differ significantly from each other, which is rarely the case; otherwise they create redundancy that can only be reduced by extracting the unique strings, which in turn complicates the localization process.

Content Management Systems

CMSs do what they promise: they provide an application layer on top of the back end. Be wary of any service that offers a quick-and-dirty localization solution by manipulating data that does not come directly from the back end. Dynamic content cannot be managed with the front end as input, so such services simply cannot handle it. They might be an option for static, very simple personal sites, but they should be avoided for commercial purposes.

Today WordPress dominates the overall market by a wide margin, though not the corporate space, where Sitecore, EpiServer and the like take the lead. Some CMS offerings integrate localization tools, but their features are limited; for best results, dedicated software is often required.

When choosing a CMS, take the following into consideration:

  • 1:n correlation: Plural differences are taken into account, since one source string may require several target forms depending on the count;
  • Tag order: Inline formatting tags are position-independent and can be reordered by the translator;
  • Context: Contextual information is provided so that the translation team can correctly identify how each string is used;
  • Encoding: This may be a moot point today, but make sure that no character corruption occurs and that the system does not rely on legacy local code pages.
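The first two criteria can be illustrated with Java's standard java.text.MessageFormat, used here only as a stand-in for whatever message syntax a given CMS exposes; the class name PluralAndOrderSketch and the strings are hypothetical. The choice subformat selects a phrase by count, and the indexed placeholders ({0}, {1}) let the German pattern move the folder name and its <b> tag to the front without any code change. Note that ChoiceFormat only matches numeric ranges; languages with more complex plural rules, such as Polish or Russian, generally need CLDR-based plural support such as ICU's PluralFormat.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical source string and translation, as they might be stored in a CMS.
class PluralAndOrderSketch {

    // English source: {0} = number of files, {1} = folder name.
    // The choice subformat picks a phrase by count instead of a hard-coded "file(s)".
    static final String EN =
        "{0,choice,0#No files|1#One file|1<{0,number,integer} files} in <b>{1}</b>";

    // German target: the folder name and its <b> tag move to the front.
    // Indexed placeholders make this reordering possible without touching the code.
    static final String DE =
        "In <b>{1}</b>: {0,choice,0#keine Dateien|1#eine Datei|1<{0,number,integer} Dateien}";

    static String render(String pattern, Locale locale, int count, String folder) {
        // The locale also controls number formatting inside {0,number,integer}
        return new MessageFormat(pattern, locale).format(new Object[] {count, folder});
    }

    public static void main(String[] args) {
        System.out.println(render(EN, Locale.ENGLISH, 1, "Reports")); // One file in <b>Reports</b>
        System.out.println(render(EN, Locale.ENGLISH, 3, "Reports")); // 3 files in <b>Reports</b>
        System.out.println(render(DE, Locale.GERMAN, 0, "Berichte")); // In <b>Berichte</b>: keine Dateien
    }
}
```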

Depending on your requirements, it may be important that the system can be seamlessly integrated with:

  • Machine translation: Integrated, on-the-fly MT for “good-enough” content, such as support tickets;
  • Procedural content generation: Web stores, travel sites, aggregators, etc. may require code that generates product or service descriptions from a predefined set of relational data by assembling fragments, without human intervention. Unlike machine translation, the result is grammatically correct as long as the fragments are, and its variation is deterministic.
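A minimal sketch of fragment assembly follows, under loudly stated assumptions: the Hotel row, the fragment texts and the selection rule (row id modulo the number of variants) are all hypothetical. Humans write and translate each fragment once; the code only selects and fills them, which is why the output stays grammatical and why the same row always produces the same text.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical fragment-assembly generator for hotel descriptions.
class FragmentAssemblySketch {

    // Hypothetical row from a relational "hotels" table.
    static final class Hotel {
        final int id;
        final String name;
        final String city;
        final int stars;
        final boolean hasPool;

        Hotel(int id, String name, String city, int stars, boolean hasPool) {
            // The English fragments are only grammatical for 1-5 stars ("a 4-star" works,
            // "a 8-star" would not), so out-of-range data is rejected up front.
            if (stars < 1 || stars > 5) {
                throw new IllegalArgumentException("stars must be between 1 and 5");
            }
            this.id = id;
            this.name = name;
            this.city = city;
            this.stars = stars;
            this.hasPool = hasPool;
        }
    }

    // Per-language fragments, written and translated once by humans.
    // Positional specifiers (%1$s = name, %2$d = stars, %3$s = city) let each
    // variant place the values in whatever order its language needs.
    static final class Fragments {
        final List<String> openings;
        final String pool;

        Fragments(List<String> openings, String pool) {
            this.openings = openings;
            this.pool = pool;
        }
    }

    static final Map<String, Fragments> FRAGMENTS = Map.of(
        "en", new Fragments(List.of(
                "%1$s is a %2$d-star hotel in %3$s.",
                "Located in %3$s, %1$s offers %2$d-star comfort."),
            "Guests can use the outdoor pool."),
        "de", new Fragments(List.of(
                "%1$s ist ein %2$d-Sterne-Hotel in %3$s.",
                "In %3$s bietet %1$s %2$d-Sterne-Komfort."),
            "Das Hotel hat einen Pool."));

    static String describe(Hotel hotel, String language) {
        Fragments fragments = FRAGMENTS.get(language);
        if (fragments == null) {
            throw new IllegalArgumentException("no fragments for language: " + language);
        }
        // The variant is derived from the row id: the same row always gets the same
        // text (deterministic variance), while neighbouring rows read differently.
        String opening = fragments.openings.get(Math.floorMod(hotel.id, fragments.openings.size()));
        String text = String.format(Locale.ROOT, opening, hotel.name, hotel.stars, hotel.city);
        return hotel.hasPool ? text + " " + fragments.pool : text;
    }

    public static void main(String[] args) {
        Hotel aurora = new Hotel(2, "Hotel Aurora", "Lisbon", 4, true);
        Hotel verde = new Hotel(3, "Casa Verde", "Porto", 3, false);
        System.out.println(describe(aurora, "en")); // Hotel Aurora is a 4-star hotel in Lisbon. Guests can use the outdoor pool.
        System.out.println(describe(verde, "en"));  // Located in Porto, Casa Verde offers 3-star comfort.
        System.out.println(describe(verde, "de"));  // In Porto bietet Casa Verde 3-Sterne-Komfort.
    }
}
```

The range check on stars reflects a general design constraint: fragments are only guaranteed to be grammatical for the data they were written for.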

If the feature is not included by default, most commercial systems can be connected to the translation provider’s ecosystem through third-party middleware such as ClayTablet. Interoperability solutions almost always produce standalone output that can be distributed outside the CMS, for example XML or XLIFF files. Alternatively, the content can be accessed directly through a protocol, but this option raises security concerns because it gives outside parties direct access to the intranet. However, relying on standalone files does not mean that the transfer cannot be direct and automated. Content connectors detect data dumps and send the files straight to the translation tool, which can in turn leverage existing translations and determine the translatable delta. If turnaround time is critical, the tool can also assign the job to selected translators automatically, without involving project management.
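The delta analysis performed by the translation tool can be sketched roughly as follows. The export format (segment ID mapped to source text), the exact-match translation memory, and the names DeltaSketch and Analysis are simplifying assumptions; commercial tools also apply fuzzy matching, which this sketch omits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "translatable delta" step run after a content
// connector hands a CMS export to the translation tool.
class DeltaSketch {

    // Outcome of comparing one export with the translation memory.
    static final class Analysis {
        // segment id -> translation reused from memory (no translator needed)
        final Map<String, String> leveraged = new LinkedHashMap<>();
        // segment id -> source text that still has to be translated
        final Map<String, String> delta = new LinkedHashMap<>();
    }

    /**
     * @param export segment id -> source text, as parsed from the exported XML/XLIFF file
     * @param memory source text -> approved translation (exact matches only in this sketch)
     */
    static Analysis analyze(Map<String, String> export, Map<String, String> memory) {
        Analysis result = new Analysis();
        for (Map.Entry<String, String> segment : export.entrySet()) {
            String translation = memory.get(segment.getValue());
            if (translation != null) {
                result.leveraged.put(segment.getKey(), translation);
            } else {
                result.delta.put(segment.getKey(), segment.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> export = new LinkedHashMap<>();
        export.put("home.title", "Welcome");
        export.put("home.cta", "Book your stay now");
        Map<String, String> memory = Map.of("Welcome", "Willkommen");
        Analysis analysis = analyze(export, memory);
        System.out.println("Reused: " + analysis.leveraged);   // Reused: {home.title=Willkommen}
        System.out.println("To translate: " + analysis.delta); // To translate: {home.cta=Book your stay now}
    }
}
```

Only the delta would be routed to translators; the leveraged segments can be written back to the CMS without human involvement.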


Final words

The technologies discussed in these two posts are only a subset of the mainstream approaches, let alone of all available options. Whatever the implementation, the gist of the message is the same: a universal design decision to create abstractions. The short-term benefits of treating linguistic information as an independent pool of data are self-evident: development, maintenance and localization cycles shrink, while modularity and reusability grow.

This concept can be an asset on a strategic scale as well, but let’s keep you on your toes until next time!
