By
Arle Lommel
LISA Publications Manager
www.lisa.org
Unicode has held out the promise
of simplified multilingual workflows, improved publishing
support for the world’s languages, and elimination
of many hassles that now plague work in the GILT
community. In this article Arle Lommel looks at
changes in the past two years regarding Unicode
support in applications localizers typically deal
with.
In 2001
I wrote about Unicode implementation and OpenType
in the LISA Newsletter (available
for LISA members). I reported then, in essence,
that Unicode had made little or no impact on how
most localizers worked, but that some
changes were on the horizon.
The problem
with Unicode so far has been that most of the work
on it has gone into the back end of systems.
While this makes perfect sense from an implementation
standpoint (the front end can’t support Unicode
if the back end doesn’t), it does mean that
Unicode has so far affected most computer users
very little. Unicode support in a database or an
operating system makes little
difference if the applications people are using
either ignore Unicode or, even worse, corrupt Unicode
data passed to them.
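The difference between an application that ignores Unicode, one that corrupts it, and one that passes it through intact can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample string and the choice of Latin-1 as the "legacy" encoding are invented for the example and do not describe any particular product.

```python
# Illustrative only: simulate how a Unicode-unaware application might treat UTF-8 data.
# The sample text and the use of Latin-1 as the legacy encoding are assumptions.
original = "Ελληνικά и русский"  # Greek and Russian sample text

utf8_bytes = original.encode("utf-8")

# Case 1: the application ignores Unicode and reads the bytes as Latin-1.
# Nothing is lost, but the user sees gibberish instead of the text.
misread = utf8_bytes.decode("latin-1")

# Case 2: the application converts to a single-byte encoding on save and
# replaces every character it cannot represent. The data is corrupted.
flattened = original.encode("latin-1", errors="replace").decode("latin-1")

# Case 3: a Unicode-aware application round-trips the data unchanged.
round_trip = utf8_bytes.decode("utf-8")

print(misread)
print(flattened)
print(round_trip == original)
```

In the first case the original can still be recovered by re-reading the bytes as UTF-8; in the second it cannot, which is why corruption is the worse of the two failures.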
Unfortunately
the vast majority of applications on the market
still assume a monolingual world where Unicode is
unimportant. Until the key software most users deal
with can support Unicode, most users will not be
able to take advantage of Unicode. As but one example,
Quark XPress is a staple of almost every publisher
in the world, yet it does not support Unicode in
any meaningful way, so no level of operating system
or font support for Unicode will make one whit of
difference to a publisher doing everything in Quark
XPress.
Most multilingual
computer users are still using the same fundamental
technologies for international text that they were
using over a decade ago, and the time- and labor-saving
potentials of Unicode were, until recently, by and
large still empty promises. That said, recent developments
in end-user Unicode support (as opposed
to systems Unicode support) indicate that Unicode
support is truly entering the mainstream. Both the
number of consumer applications supporting Unicode
(and OpenType, which is likely to be the default
Unicode implementation for most users) and the
quality of their implementations have risen dramatically
in the past few years.
Fonts
In 2001,
when I last wrote on this subject, Adobe had 21
fonts available in OpenType format (out of hundreds
in the company’s font library), but Adobe
was actively porting their fonts from PostScript
Type 1 to OpenType. This process is now complete,
and Adobe no longer sells PostScript Type 1 fonts.
Most of these conversions are to “Standard”
fonts, i.e., the fonts are identical in their glyph
complements to the older versions; but a significant
number of the fonts have been converted to “Pro”
versions that contain additional characters, including,
in some cases, full support for all European languages
that use Roman script, as well as Cyrillic and Greek
scripts, plus historical character variants and
dingbat (decorative) character forms.
Other major type foundries
(an outdated term if ever there was one!) have made
similar conversions, so now there are literally
thousands of fonts in OpenType format to choose
from, rather than the slim selection of a few years
ago. In addition, those who wish to build their
own OpenType fonts now have a solid font-editing
choice in FontLab 4.5 (see the review
in this issue), so legacy custom fonts can be converted
to the new format and take advantage of the Unicode
underpinnings of OpenType.
Both Microsoft
and Apple have also bundled Unicode-rich fonts with
their operating systems and make liberal use of
these fonts, enabling many of the basic applications
made by both companies to support a truly amazing
array of languages (see notes on each operating
system below). The bundling of Unicode fonts with
operating systems (and the implementation of system
calls that recognize these fonts) is a major milestone
in the progress of Unicode, for these fonts provide
developers with access to resources that would be
prohibitively expensive otherwise, and allow various
applications to “talk” to each other.
This level of system support is what is needed to
open the gates for more and more Unicode-enabled
products to enter the market.
Operating Systems
A few
years ago OpenType and Unicode support in major
operating systems seemed half-baked at best. Despite
claims that OSes were Unicode “under the hood,”
this did not translate into usable support
in most instances, and input and display support were
spotty. Even if the “innards”
of the OS were Unicode, it did not mean much to
users if they could not see or access this support.
Fortunately OS support for Unicode has improved
dramatically since 2001. Because localizers overwhelmingly
deal with just two platforms, Microsoft Windows
and Apple’s Mac OS, I will focus on these
two OSes. Unix Unicode support varies by Unix “flavor”
and installation - Unix installations tend to be
much less uniform than Windows or Mac OS installations
- and is beyond the scope of this article.
Windows
Windows
has supported OpenType and Unicode since at least
Windows 98, but Windows XP has taken this support
to a new level. Within the desktop environment scripts
can be mixed and matched with little difficulty,
and input method support is excellent. Applications
such as WordPad that rely on system text-handling
calls inherit Unicode support from Windows, and
so are now inherently multilingual. Unfortunately
OS support does not automatically make most legacy
applications Unicode-capable, and many major DTP
and drawing applications do not support anything
but the traditional single-byte font range at this
point. These applications will need to be re-engineered
to take advantage of OS Unicode support.
Macintosh
Unicode support in Apple’s
Mac OS X is very similar
to that of Windows XP. Within the desktop environment
scripts can be freely mixed within file names, and
even bi-directional text is handled properly. In
the example below a text file has been given a name
consisting of Japanese, Greek, Hebrew, Devanagari,
Hangul (Korean) and Arabic characters. While most
of the name is random characters, it can be seen
that they coexist quite nicely.
Figure
1. An example of a multiple-script file name under
Mac OS X.
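To make the idea of a mixed-script name concrete, the following Python sketch inspects a file name of the same kind and reports which scripts and which right-to-left characters it contains. The file name is invented for the example (it is not the one in the figure), and taking the first word of each character's Unicode name is only a rough stand-in for a real script lookup.

```python
import unicodedata

# A made-up file name mixing the scripts mentioned above (not the one in Figure 1).
file_name = "日本語_αβγ_שלום_हिन्दी_한국어_عربي.txt"

def script_of(ch):
    """Rough script label: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

# Collect script labels for letters and combining marks only, in order of appearance.
scripts = []
for ch in file_name:
    if unicodedata.category(ch)[0] in "LM":
        label = script_of(ch)
        if label not in scripts:
            scripts.append(label)

# Bidirectional classes "R" (Hebrew) and "AL" (Arabic) mark right-to-left characters,
# which is what makes correct display of such a name non-trivial.
rtl_chars = [ch for ch in file_name if unicodedata.bidirectional(ch) in ("R", "AL")]

print(scripts)
print(len(rtl_chars), "right-to-left characters")
```

A system that handles such a name properly has to store all of these characters in one encoding and also reorder the right-to-left runs for display.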
Unfortunately,
as with Windows, most legacy applications are unable
to take advantage of the OS’s Unicode support.
Applications written in the fully native OS X “Cocoa”
environment, including most of Apple’s bundled
applications, seem to deal with Unicode data very
well, while those written in the “Carbon”
environment used to port OS 9 applications to OS
X do not seem to make use of the OS support. Most
major applications are Carbon ports from OS 9 at
this point, so real Unicode support is spotty. This
means, unfortunately, that the large amount of DTP
work done on Macintoshes is still using the same
text and font technology it was a decade ago…
Applications
This
section will focus on two classes of applications:
web browsers and word processing/DTP applications.
I have chosen to focus on these applications, rather
than other sorts of applications, for the simple
reason that browsers and DTP applications represent
the destination for a large percentage of the work
GILT companies do, thus real Unicode support in
these areas would have a disproportionately large
impact on GILT companies.
Web Browsers
Web browser support
is a perpetual problem for localizers given the
wide variety of browsers and the number of users
of antiquated browsers (such as Netscape 4.7). The
good news is that the latest browsers from Microsoft
and Netscape support Unicode very well, and the
two account for the majority of potential users
of web content. How well these browsers work, however,
depends not only on the browsers’ own capability,
but also on the resources (such as fonts) of the
operating system under which they run. Unicode support
will get even better as the browser developers implement
recommendations of the W3C’s Internationalization
Working Group (see Richard Ishida’s presentation
on W3C
internationalization activity for more
information on the recommendations).
Even Unicode-capable
browsers differ in their capabilities and may or
may not be able to display certain scripts. In the
following example, two different Unicode-capable
browsers, Microsoft Internet Explorer 5.2 and Apple’s
Safari (both under Mac OS X), show striking differences
in their display capabilities. Internet
Explorer fails to render Arabic, Hebrew, or
Devanagari in an intelligible manner and replaces
a number of Greek characters with incorrect Roman
glyphs, so that the whole Greek line looks
odd, to say the least. Both browsers fail to
display certain characters in the Kazakh line
because the fonts available to them do not include
these characters. Note as well how each browser handles
the Kazakh characters it cannot display: Internet
Explorer renders them as ?,
while Safari renders them with a glyph that identifies
them as unavailable Cyrillic characters.

Figure
2. Different Unicode-capable browsers differ in
their ability to actually render Unicode text. (The
example screenshots are of Richard Ishida’s
presentation on W3C internationalization activity.)
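The Kazakh case can be approximated in code. The sketch below uses the Windows-1251 code page as a stand-in for a font or application that covers only basic Cyrillic; that substitution is an assumption made for illustration, since actual font coverage cannot be checked with the standard library alone. The sample phrase is also chosen just for this example.

```python
# Assumption: Windows-1251 (cp1251) stands in for a resource that only
# covers the basic Cyrillic repertoire. The sample phrase is illustrative.
sample = "Қазақ тілі"  # "Kazakh language"

def missing_from(text, encoding):
    """Return the distinct characters of text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

missing = missing_from(sample, "cp1251")

# Replacing unsupported characters with "?" mimics the less helpful of the
# two fallback behaviors described above.
fallback = sample.encode("cp1251", errors="replace").decode("cp1251")

print(missing)
print(fallback)
```

The letters that drop out are exactly the ones Kazakh adds to the basic Cyrillic alphabet, which is why text in such languages is a useful test of how complete Unicode support really is.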
At present
the overwhelming majority of web browsers in use
do not support Internationalized Resource Identifiers
(IRIs), the internationalized replacement for URI
web addresses, so developers cannot count on being
able to use IRIs for some time, and are still limited
to US-ASCII URIs for web addresses.
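In practice, staying within US-ASCII URIs means that any non-ASCII address has to be converted before it is published. A minimal Python sketch of such a conversion follows, assuming the usual approach of IDNA encoding for the host name and percent-encoded UTF-8 for the rest. The function name `iri_to_uri` and the example address are hypothetical, and the sketch ignores details such as user names and passwords in the address.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Hypothetical helper: map an internationalized address to an ASCII-only URI.

    Assumes the address has a host part; user info is not handled.
    """
    parts = urlsplit(iri)
    # Host names are converted with IDNA (Punycode) rather than percent-encoding.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port:
        host = f"{host}:{parts.port}"
    # Everything else is percent-encoded as UTF-8; existing escapes are kept.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))

uri = iri_to_uri("http://bücher.example/käse?q=ü")
print(uri)
```

The result is safe for any browser, but it is no longer readable to the user, which is the cost of not being able to rely on IRI support.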
In short,
Unicode browsers are here, but they are not yet
perfect, and Unicode text cannot be relied upon
in all circumstances on the web. This is an area
of rapid change, however, and just a few years ago
browser support was much less reliable than it is
now. My prediction is that within two years, most
users’ needs for Unicode capability in browsers
will be met, and reliance on legacy code pages for
web content will be increasingly unnecessary.
Word Processing/DTP Applications
Microsoft Office
The
latest versions of Microsoft Office for Windows
support Unicode TrueType fonts quite well, but do
not support any OpenType advanced features beyond
those required for specific scripts; Roman-script
Office documents under Windows cannot automatically
substitute glyphs or take advantage of any of the
advanced typographic features of OpenType. Most
business users of Office are unlikely to need anything
beyond basic language support for various scripts,
however, so Office’s support for OpenType
is adequate for its intended audience.
Office
X for Macintosh does not seem to support Unicode
font display or input, and instead relies on old
code pages for its international support. This means
that Office documents created on Windows may not
display properly on the Macintosh (although Office
does pass Unicode data through unharmed and does
not crash or corrupt the data, so a file opened
under Mac Office X can be returned to Windows without
damage). Office X will recognize Roman OpenType
fonts, but utilizes only the 232 characters available
in a traditional single-byte Roman font. (It will
correctly interpret and use CJK OpenType fonts, however,
so Asian-language users of Office are in luck.)
Quark XPress
Quark
XPress is the undisputed king of page layout programs
(despite Adobe InDesign’s strong showing in
this area), and real Unicode support in Quark would
mark a major turning point in Unicode acceptance.
Unfortunately it seems that the recent release of
Quark XPress 6.0 has not introduced Unicode or OpenType
support. This is disappointing,
but also consistent with Quark’s requirement
that users purchase separate language versions of
the software to handle various scripts. Besides
hampering multilingual work in Quark,
the lack of real Unicode and OpenType support
also means that Quark cannot take advantage of
the advanced typographical capabilities afforded
by OpenType.
Adobe InDesign
Although
Adobe’s “Quark killer” has not
even come close to breaking Quark’s dominance
in the DTP market, it does have vastly superior
Unicode and multilingual support when compared to
Quark XPress. The support is a bit quirky, but no
other mainstream DTP application on the market comes
even close to InDesign’s Unicode support.
InDesign was built around OpenType and Unicode,
and it is only with OpenType fonts that InDesign’s
capabilities reach their maximum potential. It supports
a large number of advanced OpenType features and
allows free mixing of scripts (with the notable
exception of bi-directional scripts like Arabic
and Hebrew), and can readily import and display
Microsoft Office files that include Unicode text
(something, as noted above, that Office for Mac
cannot do).
Strangely,
at least under Mac OS X, Unicode input methods for
text are not supported, while older code page based
input methods work fine (presumably this is because
InDesign on Mac is a Carbon application that cannot
fully make use of OS X’s native resources).
Use of an “Insert Character” palette
does allow characters to be selected and input,
but this is really not an option for serious text
entry. However, if text can be entered into Microsoft
Office, InDesign can import that, so there are routes
to get Unicode text into InDesign.
InDesign’s
Unicode support should help make InDesign an attractive
platform for localization. It would be especially
attractive for projects that require multiple languages
to be supported within a single document. It is
not hard to conceive of projects that would require
four or more separate copies of Quark XPress’s
various localized versions that could, in principle,
be handled with a single standard installation of
InDesign (e.g., a project in English with small
amounts of text in Simplified Chinese, Japanese
and Korean would need to be opened in four separate
copies of Quark - one to work on each language -
while it could be done entirely within the English
version of InDesign with no special installation
or plug-ins).
Adobe Photoshop
Photoshop’s
OpenType support is similar to that of InDesign.
It supports some of OpenType’s advanced layout
features (such as ligatures and old style figures),
but not as many as InDesign supports, and it suffers
from the same input method restrictions as InDesign
(at least under Mac OS X), while lacking an Insert
Character palette. This means that Photoshop cannot
access the full character complement of many OpenType
fonts (it cannot access the Greek characters in
Adobe’s flagship MinionPro, for example).
For languages with appropriate input methods, however,
the OpenType support is solid and adequate.
Figure
3. Example of multilingual OpenType text in Adobe
Photoshop. A single OpenType font (Adobe MinionPro)
is used to display text in English, Russian, and
Hungarian. Unicode characters not available to non-Unicode
applications are shown in red.
Adobe Illustrator
Adobe
Illustrator has limited multiple script support
for TrueType-flavored OpenType fonts at this time,
while PostScript-flavored OpenType fonts are treated
essentially as old-fashioned single-byte fonts.
Illustrator does, however, work very well with CJK
OpenType fonts and has additional language-specific
support for Japanese in the English version.
Adobe Acrobat
For
obvious reasons Adobe Acrobat depends on other applications
for OpenType support, but output from OpenType-aware
applications to PDF via Acrobat Distiller is generally
flawless and accurate. Getting text back out of
Acrobat, however, is harder, and most characters
not found in old-fashioned single-byte fonts are
lost when text is exported from Acrobat. Most language
professionals know that PDFs are not always a final
destination format, however, so this is one area
where Unicode/OpenType support could stand to improve.
Other Applications
Most
applications not mentioned above will support at
least the basic Roman range of OpenType fonts, and
some may have support for other code ranges, but
support tends to be quite basic. This will likely
change in the next few years as more and more applications
begin to take advantage of the OS-level support
made available in recent operating systems.
Summary
Although
meaningful Unicode support is far from universal,
real end-user support for Unicode has become a reality,
and is improving. There are real options for those
wanting to use Unicode today, and these options
are getting better all the time. Unicode is finally
beginning to live up to its promise. While there
is a long way to go before Unicode is pervasive
in everything we do, we are no longer waiting for
Unicode to have an impact on the bread and butter
of the GILT industry.
Reprinted
by permission from the Globalization Insider,
15 July 2003, Volume XII, Issue 3.2.
Copyright
the Localization Industry Standards Association
(Globalization Insider: www.localization.org,
LISA: www.lisa.org)
and S.M.P. Marketing Sarl (SMP) 2004