e-ternals.com


Nanos

Data formats and ISO

Nanos reads and writes the UTF-8 and UTF-16 data formats, as defined by the International Organization for Standardization (ISO).

Background
UTF-8 and UTF-16 are language-independent text data formats that are also independent of proprietary software and font technologies. ISO has adopted them as standard data formats for text in all languages and writing systems. Both are Unicode-based, so Unicode fonts are a prerequisite. The font technology used may change (from operating system to operating system, or over time), as long as it implements the character ranges defined in Unicode, which ISO has adopted as the standard for character ranges and writing systems.

In the age of global data exchange and digital long-term archives, data quality depends largely on data compatibility and sustainability. For this reason, international norms and regulations have been formulated under the guidance of ISO in collaboration with national governments, the academic world and major commercial organisations. These norms and regulations define standard data formats for various purposes. Operating systems and software packages need to comply with them to be commercially viable, so the ISO norms governing data formats have become de facto binding for all modern operating systems and software packages.

UTF-8 is the default text data format for the Internet standard XML and for many operating systems and software packages, which is why we have also made it the default format for Nanos. Unless otherwise specified, Nanos opens and saves files in the UTF-8 format. (The other options are UTF-16 and ANSI.)
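
As a rough illustration of the two Unicode formats, the following Python sketch encodes a short Greek word in UTF-8 and in UTF-16 and decodes it again. It is not Nanos code: the sample word and all variable names are assumptions made for this example.

```python
# Illustrative sketch only; names and sample text are assumptions, not Nanos code.
# "Ελληνικά" written with escapes so that the accented alpha is the
# single precomposed code point U+03AC.
text = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"

utf8_bytes = text.encode("utf-8")    # each of these Greek letters takes 2 bytes
utf16_bytes = text.encode("utf-16")  # Python prepends a 2-byte byte order mark

print(len(text), len(utf8_bytes), len(utf16_bytes))

# Both formats carry the same characters and decode to the identical string.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16") == text
```

The byte sequences differ between the two formats, but the decoded text is identical, which is what makes either format suitable for exchange between systems.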

For compatibility with older operating systems and software packages, Nanos also writes ANSI files (a legacy, pre-Unicode text data format named after the American National Standards Institute).
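
The practical limit of such legacy formats can be sketched in Python. The text above does not say which code page Nanos uses for ANSI output, so this example assumes the Windows Greek code page (`cp1253`). The helper name `fits_legacy` is hypothetical.

```python
# Illustrative sketch; "cp1253" is an assumed stand-in for "ANSI",
# and fits_legacy is a hypothetical helper, not part of Nanos.
monotonic = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
polytonic = "\u1f00"  # GREEK SMALL LETTER ALPHA WITH PSILI

def fits_legacy(text, codec="cp1253"):
    """Return True if every character of text exists in the legacy code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(fits_legacy(monotonic))  # monotonic Greek is covered by cp1253
print(fits_legacy(polytonic))  # polytonic Greek is not
```

Under this assumption, monotonic text survives a round trip through the legacy format, while polytonic text needs UTF-8 or UTF-16.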

Nanos is independent of font technologies, as long as the Unicode font and character range standard is adhered to. All modern operating systems adhere to the Unicode standard, which the International Organization for Standardization (ISO) has adopted as ISO/IEC 10646 (with additional sub-norms defining international scripts, including monotonic and polytonic Greek). This universal adoption of Unicode by ISO makes it possible to run Nanos on a wide variety of operating systems and with a wide variety of fonts. Currently, the most widespread font technologies are:

  • PostScript (ATM1 and ATM3)
  • TrueType
  • OpenType

Nanos can be used with all of these. Nanos is supplied with TrueType fonts, because they are compatible with all the operating systems for which Nanos is currently available:

  • Linux
  • Mac OS X
  • Solaris
  • Windows

Nanos outputs fully ISO-compliant and font-technology-independent data, so even if font technologies change in future (as they are bound to do), Nanos output can still be read correctly by future operating systems and software packages. The only requirement is that future software developers continue to adhere to international norms and regulations as laid down by ISO. Considering the overwhelming significance of ISO norms for the economic viability of developed societies and international relations, we expect that this requirement will be met as long as there is Greek national as well as international demand for data in the Greek language and alphabet. The data type which Nanos outputs is the data type most suitable for long-term digital archives and libraries.

OpenType
Please note: Nanos does not require any OpenType fonts, although it is fully compatible with OpenType. Unfortunately, OpenType is a proprietary technology with innumerable possibilities for undocumented features. For instance, the Minion Pro font has been published in a special OpenType format with features that cannot be interpreted without special insider information. Without this information, operating systems and software packages cannot use the font without the risk of computer crashes or data loss. Such fonts are therefore filtered out as hazardous by conscientious programmers. Java Runtime Environments automatically eliminate all such troublemakers from their font lists.

OpenType's undocumented features are mostly employed to implement a technological aberration: composing final characters out of smaller components at runtime. In other words, OpenType does not supply ready-made characters, only the information with which operating systems and/or software packages are supposed to calculate the final characters. Insiders will recognize the old problem of composite characters, a technical dead end that was rejected by all European nations in favour of precomposite characters. OpenType goes even further back in its failure than the old idea of composite characters, because composite characters were at least laid down in a now superseded Unicode regulation. OpenType features are not even stipulated by Unicode, still less by any ISO standard. In fact, ISO standardization of OpenType would mean that all the undocumented features currently used by the makers of OpenType fonts would have to be either retrospectively sanctioned, which is impossible (they are immense in number and bound to contradict each other from company to company), or abolished (which would abolish the fonts with them).

Why would any sensible software company reinvent the old failure that was composite characters? Probably because OpenType makes it possible to hide the data of the final characters from the view of others (such as other font makers). Many characters can no longer simply be copied by users to create other fonts, because the data are no longer there; only the (proprietary and hard-to-interpret) information telling certain authorized software companies how to compose them remains. Unfortunately, this understandable attempt at protecting intellectual property generates potential breakdowns of computing reliability and flexibility at the users' end.

Thankfully, no OpenType features are necessary for Greek (in fact, they are not necessary for anything we can think of). If we used OpenType's specific possibilities to make our font work pirate-proof, or if we integrated OpenType-specific features into our text output formats, we would make our output incompatible with operating systems and software packages which use other font technologies, such as TrueType or PostScript. Furthermore, our output would contain features which are not part of any ISO standard, so data exchange even with other OpenType-based systems would be compromised, and our output would not qualify for use in long-term digital archives and libraries. For these reasons, we strongly advise against the use of any OpenType-based technology for projects with an interest in data sustainability and continuity as well as cross-platform compatibility. This leads us to the following topic:

Why Nanos uses precomposite Greek characters in preference over composite ones

In the early days of Unicode (long before it was adopted by the International Organization for Standardization) there was a variety of Unicode Greek which made use of so-called "composite characters". These are Greek characters consisting of two or more components, such as lowercase alpha with acute accent. Initially, operating systems and software packages used two or more Unicode characters to represent such composite characters (for example, one Unicode character for the lowercase alpha, and one more Unicode character for the acute accent).

This approach (that of Unicode version 1.0) was rejected by the Greek government on the grounds that it resulted in uncontrolled accents and other diacritics floating around on the computer screen and on printed paper, because the same Unicode accent character was placed over all vowel characters. For example, the acute accent that went over the lowercase alpha was also placed over the lowercase iota, upsilon, and so forth. The results were ugly, and often completely unacceptable. Worse, they were unpredictable, because each operating system and software package was free to invent its own rules for placing the two (or more) components in relation to each other.

After the intervention of the Greek national authorities, the Unicode organization abandoned the composite approach for Greek and replaced it with the precomposite approach that had long before been universally adopted for Latin-alphabet languages such as French, Spanish or Czech. In the precomposite approach (since Unicode version 2.0), each Greek Unicode character is an independent unit, including all diacritics. For example, there is a single character that contains both the lowercase alpha and the acute accent. This makes it possible for font designers to craft each combination of character and diacritic(s) individually, in shape as well as in relative positioning, overall character width, and all other character and font parameters. The precomposite approach therefore makes it possible to design Greek fonts with the same professional quality as, say, fonts for French or Spanish, for which the precomposite approach has never been in question.

It is only natural, therefore, that Nanos should follow national Greek requirements and international recommendations to use only precomposite Unicode characters for Greek. In fact, we use the precomposite approach rigorously for all scripts and languages, not just for Greek. The reasons are the same as those which led the Greek government to reject the composite approach in Unicode version 1.0: composite characters are aesthetically unacceptable and lead to technically unpredictable data. They are therefore in direct conflict with the aims of the International Organization for Standardization. Precomposite characters are not only perfectly and completely designed. They are also unambiguous, individual and fully documented units of writing that fulfil all requirements of research environments, digital archives and libraries.
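
The difference between the two representations can be made concrete with a short Python sketch. It uses the standard `unicodedata` module and Unicode normalization form NFC as a stand-in for "converting to precomposite characters". This is an assumption made for illustration, not a description of how Nanos works internally. The helper names are hypothetical.

```python
import unicodedata

# Illustrative sketch; helper names are hypothetical and not part of Nanos.
precomposed = "\u03ac"      # one code point: alpha with acute accent (tonos)
composite = "\u03b1\u0301"  # two code points: alpha + COMBINING ACUTE ACCENT

def has_combining_marks(text):
    """Return True if text contains any separate combining diacritic."""
    return any(unicodedata.combining(ch) for ch in text)

def to_precomposed(text):
    """Compose base letters and diacritics where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)

# The two strings look alike on screen but are different data.
print(precomposed == composite)
print(has_combining_marks(composite), has_combining_marks(precomposed))
print(to_precomposed(composite) == precomposed)
```

A check like `has_combining_marks` could be used to verify that a text contains precomposite characters only. Note that NFC can only compose combinations for which Unicode defines a precomposed character.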

Copyright © 2005-2008 Gunthard Mueller. All Rights Reserved.
All trademarks and registered trademarks acknowledged. We do not assume any responsibility for any content that may be encountered on any other web site linked to from here, nor for any content to which these web sites may be linking.