OII Standards and Specifications List






Document Interchange Standards

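This section of the OII Standards and Specifications List provides information on the following standards used to interchange formatted and unformatted documents:

  • DSSSL
  • ECMA 262 (ECMAScript)
  • HTML
  • IPTC IIM
  • ODA
  • OPI
  • PDF
  • PostScript (Levels 1, 2 and 3)
  • RTF
  • SGML
  • SPDL
  • TEI
  • TeX DVI
  • XML
  • Vendor-specific Page Description Formats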

Standards for document interchange are prepared by both private and public organizations. The following public bodies are active in this area:

  • ISO/IEC JTC1/WG4 -- JTC1 is the first (and only) Joint Technical Committee of ISO and IEC, and deals with Information Technology. WG4 is the working group of JTC1 responsible for Document description languages
  • ITU -- International Telecommunication Union (formerly CCITT: Comité Consultatif International Télégraphique et Téléphonique)
  • ECMA -- European Computer Manufacturers Association
  • EWOS EG SMMI -- European Workshop for Open Systems Expert Group on Structured Multimedia Interchange
  • W3C -- World Wide Web Consortium.


Section Contents
OII Home Page
OII Index
OII Help

DSSSL

Expanded name
Document Style Semantics and Specification Language

Area covered
Language for describing the way that text and graphics should be presented to users in a two-dimensional environment

Sponsoring body and standard details

Characteristics/description
Language used to associate formatting rules with the elements of a structured document encoded using SGML. It consists of two parts: a tree transformation language that can be used to reorder structured documents prior to presentation, and a formatting process that associates formatting instructions with specific "tree nodes" in the document to be presented.

Both parts of DSSSL are specified using a variant of Scheme, a dialect of the LISP list processing language. DSSSL extends the basic IEEE-defined Scheme semantics by adding functions that can transform tree structures and provide the types of information about page dimensions, formatting rules, and language typically required by a text formatter.
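
As a rough illustration of the formatting half of this process, the Python sketch below walks a document tree and attaches formatting characteristics to each node according to its element type. It is not DSSSL: the `Node` class, the `RULES` table and the characteristic names are hypothetical stand-ins for DSSSL construction rules and flow objects, and the tree transformation language is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for an element node in a parsed SGML document."""
    gi: str                                       # generic identifier (element type name)
    children: list = field(default_factory=list)  # child Node objects, in document order
    text: str = ""                                # character data, if any


# Hypothetical construction rules: element type -> formatting characteristics.
# In DSSSL itself a rule would be written in Scheme, along the lines of
#   (element title (make paragraph font-size: 18pt (process-children)))
RULES = {
    "title": {"flow": "paragraph", "font-size": 18, "font-weight": "bold"},
    "para": {"flow": "paragraph", "font-size": 10},
}
DEFAULT = {"flow": "sequence"}  # unmatched elements simply pass their content through


def process(node: Node) -> dict:
    """Walk the tree depth-first, attaching formatting characteristics to each node."""
    spec = RULES.get(node.gi, DEFAULT)
    return {
        **spec,
        "element": node.gi,
        "text": node.text,
        "children": [process(child) for child in node.children],
    }


doc = Node("chapter", [Node("title", text="Introduction"), Node("para", text="Body text.")])
print(process(doc))
```

Elements with no rule of their own, such as `chapter` here, fall back to a plain sequence so that their children are still processed.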

A DSSSL processor does not necessarily format a document. It can simply define the information that a proprietary formatter needs to know to process a structured document. A DSSSL formatting specification is an interchangeable piece of information that can be passed from formatter to formatter so that the same general rules for presenting the associated data can be used by each output device.

In a fully standardized environment a structured document coded in SGML would have its formatting specifications written in DSSSL. These rules would be used by an application-specific formatter to produce an SPDL output file that can be used to drive a printer.

Usage (Market segment and penetration)
A number of products were available in beta-form at the end of 1996. The SGML Open consortium have defined a subset of DSSSL called the DSSSL Online Application Profile that will be used by most vendors as their starting point for implementing a full DSSSL system. This subset has been adopted as the basis of the formatting specification language for XML.

Further details available from:
ISO or local national standards bodies, or the SGML Users' Group, PO Box 361, Swindon, Wiltshire SN5 7BF, UK.

To view the text submitted to ISO, visit http://occam.sjf.novell.com:8080/dsssl/

Proposal to use XML and ECMAScript as alternative notation
Multimedia and Hypermedia Standards Activity, September 1997



ECMA 262

Expanded name
ECMAScript: A general purpose, cross-platform programming language

Area covered
JavaScript-based scripting language

Sponsoring body and standard details

Characteristics/description
ECMAScript is an object-oriented programming language for performing computations and manipulating computational objects within a host environment. ECMAScript is based on several originating technologies, the best known being JavaScript (Netscape Communications) and JScript (Microsoft Corporation).

An ECMAScript object is an unordered collection of properties, each with zero or more attributes that determine how the property can be used. Properties are containers that hold other objects, primitive values, or methods. ECMAScript defines a collection of built-in object types, including the Global object, Object objects, Functions, Arrays, Strings, Booleans, Numbers, the Math object and Dates. There are, however, no provisions in the specification for input of external data or output of computed results. Instead, these are supplied by the host environment: a web browser, for example, provides an ECMAScript host environment for client-side computation that includes objects representing windows, menus, pop-ups, dialog boxes, text areas, anchors, frames, history, cookies, and input/output.
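
The property model can be made concrete with a small Python sketch. The attribute names ReadOnly, DontEnum and DontDelete are taken from the first edition of ECMA-262 rather than from the text above, and the `ESObject` class is a hypothetical illustration of their effect, not how any ECMAScript engine is implemented.

```python
from dataclasses import dataclass


@dataclass
class Property:
    """One named property and its attributes (names follow ECMA-262, 1st edition)."""
    value: object
    read_only: bool = False    # ReadOnly: attempts to assign are silently ignored
    dont_enum: bool = False    # DontEnum: skipped when properties are enumerated (for-in)
    dont_delete: bool = False  # DontDelete: the delete operator has no effect


class ESObject:
    """Sketch of an ECMAScript object as a collection of named properties."""

    def __init__(self):
        self._props = {}  # dict order is incidental; the model itself is unordered

    def define(self, name, value, **attrs):
        """Create a property with explicit attributes, as the host or built-ins would."""
        self._props[name] = Property(value, **attrs)

    def get(self, name):
        prop = self._props.get(name)
        return prop.value if prop is not None else None  # None stands in for undefined

    def put(self, name, value):
        prop = self._props.get(name)
        if prop is None:
            self._props[name] = Property(value)  # new properties get no attributes
        elif not prop.read_only:
            prop.value = value

    def delete(self, name):
        prop = self._props.get(name)
        if prop is not None and prop.dont_delete:
            return False
        self._props.pop(name, None)
        return True

    def enumerable_names(self):
        return [name for name, prop in self._props.items() if not prop.dont_enum]


o = ESObject()
o.define("PI", 3.14159, read_only=True, dont_enum=True, dont_delete=True)
o.put("x", 1)
o.put("PI", 3)  # ignored: PI is read-only
print(o.get("PI"), o.enumerable_names())
```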

Usage (Market segment and penetration)
The development of ECMAScript started in November 1996. The standard was made public in June 1997, at which time a number of implementations were already available.

ECMAScript has been submitted to ISO/IEC JTC 1 for adoption under the fast-track procedure.

The ECMAScript specification has been adopted as the evaluation and function definition specification for the XML Style Language (XSL).

Further details available from:
ECMA

Use of JavaScript objects by Microsoft
Multimedia and Hypermedia Standards Activity, September 1997



HTML

Expanded name 
HyperText Markup Language

Areas covered 
Markup of text and related data interchanged over the World Wide Web (WWW)

Standard details

Characteristics/description 
The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.

HTML has been in use by the World Wide Web (WWW) global information initiative since 1990. Version 2.0 (RFC 1866) roughly corresponds to the capabilities of HTML in common use prior to June 1994.

A draft for an extended version (4.0) of the HTML specification was released to the public on 8th July 1997 (revised 7th November 1997). The new draft includes facilities for multilingual data presentation, interactive elements and objects, and control of presentation using cascading style sheets.

ISO have drafted a standard that formalizes a set of HTML tags that are well supported for use in the creation of documents for which a stable distribution platform is required.

Usage (Market segment and penetration)
HTML is the data format that has made the World Wide Web possible. Most formatted documents delivered over the World Wide Web are coded using this language.

Further details available from:
Most of the current development is being undertaken by the World Wide Web Consortium (W3C). Details of their current work on extending HTML can be found at http://www.w3.org/pub/WWW/MarkUp/Activity.html.

Release of Version 4.0 Specification to the public
Multimedia and Hypermedia Standards Activity, July 1997
Embedding font information in HTML cascading style sheet specifications
Multimedia and Hypermedia Standards Activity, August 1997
Alignment of ISO-HTML with HTML 4.0 specification
DOM provides standardized API for HTML
OII Standards and Specifications Activity Report, October 1997
Publication of 2nd Edition of Cascading Style Sheets specification
HTML 4.0 given Proposed Recommendation status
OII Standards and Specifications Activity Report, November 1997
Unified Web Site Accessibility Guidelines
OII Standards and Specifications Activity Report, December 1997



IPTC IIM

Expanded name
International Press Telecommunications Council - Information Interchange Model

Area covered
Tagged envelope structure for carrying image files

Sponsoring body and standard details
Developed by the IPTC, whose members are news agencies and digital wirephoto services, and their customers

Characteristics/description
The IPTC Information Interchange Model provides a mechanism for carrying either raw pixel data, or standardized or proprietary data formats from a registered list, and supplying the additional descriptive data necessary for the distribution and use of the images in newspaper production environments.
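
The tagged structure can be sketched in Python. The byte layout used here (a 0x1C tag marker, a record number, a dataset number and a two-byte big-endian length, followed by the data) reflects our understanding of the IIM's standard dataset format; extended-length datasets are deliberately not handled, the helper names are hypothetical, and the record and dataset numbers in the usage lines (2:05 for an object name, 2:120 for a caption) should be checked against the IIM specification.

```python
import struct

TAG_MARKER = 0x1C  # every IIM dataset starts with this byte


def encode_dataset(record: int, dataset: int, data: bytes) -> bytes:
    """Encode one dataset using the standard (short, two-byte length) form."""
    if len(data) > 0x7FFF:
        raise ValueError("extended-length datasets are not handled in this sketch")
    return struct.pack(">BBBH", TAG_MARKER, record, dataset, len(data)) + data


def decode_datasets(buf: bytes):
    """Yield (record, dataset, data) tuples from a buffer of consecutive datasets."""
    pos = 0
    while pos < len(buf):
        marker, record, dataset, length = struct.unpack_from(">BBBH", buf, pos)
        if marker != TAG_MARKER:
            raise ValueError(f"expected tag marker 0x1C at offset {pos}")
        if length & 0x8000:
            raise ValueError("extended-length datasets are not handled in this sketch")
        pos += 5  # skip the five-byte dataset header
        yield record, dataset, buf[pos:pos + length]
        pos += length


# Usage: an object name and a caption for a news photo.
envelope = encode_dataset(2, 5, b"Flood") + encode_dataset(2, 120, b"River bursts its banks")
for record, dataset, data in decode_datasets(envelope):
    print(f"{record}:{dataset:03d} {data.decode('ascii')}")
```

Because each dataset carries its own length, a receiver can skip datasets it does not understand, which is what lets one envelope carry both descriptive text and the image data itself.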

Usage (Market segment and penetration)
This format is being implemented by the majority of newsphoto service providers as they replace their analogue services with digital ones.

Further details available from:
International Press Telecommunications Council, 8 Sheet Street, Windsor, Berkshire SL4 1BG, UK




ODA

Expanded name
Open Document Architecture and Interchange Format

Area covered
The interchange of business documents

Sponsoring bodies and standard details

Characteristics/description
ODA defines an architecture that describes typical business documents in terms of their content and two hierarchical structures: a logical structure and a layout structure. Documents can be interchanged in formatted form (using the layout structure only), in processable form (using the logical structure only) or in formatted-processable form (by interchanging both structures). Both formatting and structure information can be composed of two sets of information: generic data and document-specific instructions.
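
The rule for naming the three interchange forms can be written down directly. The `OdaDocument` class and its dictionary-based structures below are hypothetical simplifications; real ODA logical and layout structures are typed object hierarchies defined by the standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OdaDocument:
    """Hypothetical container: either structure may be absent when interchanged."""
    logical: Optional[dict] = None  # e.g. chapters, sections and paragraphs
    layout: Optional[dict] = None   # e.g. pages, frames and blocks


def interchange_form(doc: OdaDocument) -> str:
    """Name the ODA interchange form implied by which structures are present."""
    if doc.logical is not None and doc.layout is not None:
        return "formatted-processable"
    if doc.layout is not None:
        return "formatted"
    if doc.logical is not None:
        return "processable"
    raise ValueError("an ODA document must carry at least one structure")


letter = OdaDocument(logical={"letter": ["sender", "date", "body"]},
                     layout={"pages": [{"frames": ["header", "body"]}]})
print(interchange_form(letter))
```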

The key characteristics of a particular class of document are defined in a Document Application Profile (DAP), which is defined as part of an International Standardized Profile (ISP). Three levels of ISP are currently defined: for simple document structures (with or without raster graphics), enhanced document structures and extended document structures.

Usage (Market segment and penetration)
To date there have been relatively few applications of the ODA standard. The document structure adopted by ODA is adequate for business correspondence, and sufficient for 'content-driven' publications such as books, journals and reports where the design is based on rectangles.

Further details available from:
ISO or local national standards bodies.

Amendment to Part 7, Additional content codings for bi-level and multi-level images
Multimedia and Hypermedia Standards Activity, April 1997
Submission of final text of ISP 15124-1 for publication
Multimedia and Hypermedia Standards Activity, September 1997



OPI

Expanded name
Open Prepress Interface

Area covered
PostScript language comment conventions for placement of publication-quality, separated images

Sponsoring body and standard details
Proprietary specification developed by Aldus Corporation, who are now owned by Adobe Systems Inc

Characteristics/description
The Open Prepress Interface (OPI) allows image file separations to be incorporated into either a PostScript or a non-PostScript environment, thereby allowing traditional prepress systems to work alongside DTP 'front-ends'.

In a typical high-end scenario, the prepress customer takes original photographs to a colour prepress vendor before creating the publication in which the photographs will be placed. The prepress vendor creates two versions of each scanned image: a high-resolution version, which is stored on disk or tape, and a lower resolution colour Tag Image File Format (TIFF) version, which is sent to the customer.

The prepress customer places the TIFF files into the publication, using DTP software to size, position, and crop the image as needed. OPI-compatible software includes special PostScript language comments to specify each image's filename and positioning, as well as any size and cropping adjustments made by the customer. The prepress system will use these comments to place the high-resolution images into the publication at the correct size and position.
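
A prepress system's first step can be sketched as scanning the PostScript for these comments. The `%ALD` keyword prefix follows the OPI 1.3 convention as we understand it, and the particular keywords and values in the usage lines are illustrative; Adobe's OPI specification defines the full set and their exact value formats. The `parse_opi_comments` helper is hypothetical.

```python
def parse_opi_comments(ps_text: str, prefix: str = "%ALD") -> dict:
    """Collect '%ALD<Keyword>: <value>' comment lines from PostScript text.

    Only comments that start a line are recognised; values are returned as raw strings.
    """
    fields = {}
    for line in ps_text.splitlines():
        if line.startswith(prefix) and ":" in line:
            keyword, _, value = line[len(prefix):].partition(":")
            fields[keyword.strip()] = value.strip()
    return fields


# Usage with an illustrative fragment of OPI-commented PostScript.
sample = """%!PS-Adobe-3.0
%ALDImageFileName: Scans:Cover.tif
%ALDImageCropRect: 0 0 1200 800
%ALDImagePosition: 72 720 504 720 504 432 72 432
showpage
"""
fields = parse_opi_comments(sample)
crop = [float(v) for v in fields["ImageCropRect"].split()]
print(fields["ImageFileName"], crop)
```

Because the information travels as PostScript comments, an ordinary printer simply ignores it and prints the low-resolution placeholder, while an OPI-aware prepress system uses it to substitute the high-resolution image.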

Usage (Market segment and penetration)
As OPI is vendor-specific its market penetration is governed by the success of the supporting companies. The most common examples of its commercial application can be found in newspapers and magazines.

Further details available from:
Adobe Systems Europe Ltd, Adobe House, Mid New Cultins, Edinburgh EH11 4DU, Scotland

For further details visit Adobe's website at http://www.adobe.com.




PDF

Expanded name
Portable Document Format

Area covered
Page description language: a derivative of PostScript

Sponsoring body and standard details

  • Proprietary standard developed by Adobe Systems Inc.
  • Portable Document Format Reference Manual, Addison Wesley Longman, November 1996, ISBN 0 201 62628 4

Characteristics/description
Adobe's Portable Document Format (PDF) allows preformatted pages to be interchanged over a network.

Key features in PDF are a set of hot links, thumbnail icons of pages, chapter outlines and page annotations. The chapter outlines feature enables information to be added to a document, e.g. summaries and indexing information. Thumbnail icons of document pages facilitate fast browsing and random access. Page annotations act as electronic Post-Its and are user-specific; they are not integrated with the document.

PDF has a set of markers for these hyperfacilities, which can either be added to existing PostScript files or passed down from 'front-end' text-processing packages into the final PostScript.

The conversion from PostScript to PDF can be carried out using software such as the Distiller program, which is part of Adobe's Acrobat suite of software. Any hyperfacility markers already present in the PostScript are carried over during the conversion. Alternatively, hyperfacility markers can be added manually from PDF viewers. A reverse process enables printable PostScript files to be recovered from PDF files.
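
One widely used way of carrying such markers in PostScript is Adobe's `pdfmark` operator, which Acrobat Distiller interprets during conversion (the text above does not name it). The Python sketch below generates outline (bookmark) entries in that form; the helper names are hypothetical, and only the `/OUT` outline mark is shown.

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript string literal, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def outline_pdfmarks(entries) -> list:
    """Emit one '/OUT pdfmark' line per (title, page) bookmark entry."""
    return [f"[/Title {ps_string(title)} /Page {page} /OUT pdfmark" for title, page in entries]


# Prologue commonly used so that printers without pdfmark simply discard the marks.
PDFMARK_GUARD = "/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse"

print(PDFMARK_GUARD)
for line in outline_pdfmarks([("Introduction", 1), ("Results (draft)", 7)]):
    print(line)
```

The guard line binds `pdfmark` to `cleartomark` on interpreters that lack it, so the same file still prints normally on an ordinary PostScript printer.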

PDF viewers allow users to view distilled PDF pages on a chosen platform. Functionality of the viewer includes panning, zooming, scrolling, skipping pages and navigating around the document. Existing hyperfacility markers can be used to move from point to point and new markers can be inserted by each user as required.

Usage (Market segment and penetration)
Adobe are now supplying version 3 of the Acrobat product range, which conforms to version 1.2 of the PDF specification.

Further details available from:
Adobe Systems Europe Ltd, Adobe House, Mid New Cultins, Edinburgh EH11 4DU, Scotland

For further details visit Adobe's website at http://www.adobe.com.

Possible standardization as Prepress Digital Data Exchange standard
Multimedia and Hypermedia Standards Activity, May 1997



PostScript (Levels 1, 2 and 3)

Area covered
Formatted text files incorporating vector and raster graphics in a form suitable for processing on a compatible printer

Sponsoring body and standard details
Proprietary standard developed by Adobe Systems Inc
Level 1 released 1985
Level 2 released 1990
Level 3 released 1997

A full specification can be obtained from PostScript Language Reference Manual 2nd edition, Addison Wesley Longman, December 1990, ISBN 0 201 18127-4

Characteristics/description
PostScript can be considered from several points of view:

  • as a general-purpose programming language with powerful built-in graphics primitives
  • as a page-description language that includes programming features
  • as an interactive system for controlling raster output devices (displays and printers)
  • under restricted conditions as an interchange format. An arbitrary PostScript file cannot be known to be editable on receipt, though it is possible if enough restrictions are placed on the originator of the file.

PostScript's most obvious language features are that it is a stack-based interpreted language which is heavily oriented toward graphics and typography. This design makes it useful as a device-independent page description language for imaging on raster devices. The language evolved from a printer control language into a communications medium on host computers.
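
The stack-based evaluation model can be shown with a toy interpreter for a handful of PostScript's arithmetic and stack operators (`add`, `sub`, `mul`, `div`, `dup`, `exch`, `pop`). It is a hypothetical Python sketch only: real PostScript adds procedures, dictionaries, strings and the graphics operators, none of which are modelled here.

```python
def run(program: str) -> list:
    """Evaluate a tiny subset of PostScript-style postfix code on an operand stack."""
    stack = []
    binary = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,  # PostScript div always yields a real number
    }
    for token in program.split():
        if token in binary:
            b = stack.pop()  # the top of the stack is the second operand
            a = stack.pop()
            stack.append(binary[token](a, b))
        elif token == "dup":
            stack.append(stack[-1])
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif token == "pop":
            stack.pop()
        else:
            stack.append(float(token) if "." in token else int(token))  # numeric literal
    return stack


print(run("3 4 add 2 mul"))
```

Operands are pushed as they are read and each operator consumes its arguments from the top of the stack, so `3 4 add 2 mul` leaves `[14]` without any parentheses or precedence rules.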

In the PostScript imaging model, later content completely replaces earlier content pixel by pixel, which means that sophisticated operations, such as image merging, cannot be expressed in standard PostScript alone. The Standard Page Description Language (SPDL) extends Level 2 functionality by adding document production attributes.
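
The opaque "painter's" model can be pictured with a tiny Python sketch on a grid of pixel values; the `paint` helper and the grid are hypothetical. Each fill simply overwrites whatever it covers, so the final value of a pixel depends only on the last shape painted over it.

```python
def paint(canvas, x0, y0, w, h, value):
    """Fill a rectangle: every covered pixel is overwritten, with no blending."""
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            canvas[y][x] = value


canvas = [[0] * 4 for _ in range(3)]  # a 4 x 3 page, 0 = unpainted
paint(canvas, 0, 0, 3, 2, 1)          # first shape
paint(canvas, 2, 1, 2, 2, 2)          # later shape overlapping the first
for row in canvas:
    print(row)
```

Where the two rectangles overlap, nothing of the first shape survives; a merge would need some combination of the old and new values, which this model never computes.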

Usage (Market segment and penetration)
Since its release in 1985 PostScript has been used by high-end printers for the office and commercial printing markets.

Further details available from:
Adobe Systems Europe Ltd, Adobe House, Mid New Cultins, Edinburgh EH11 4DU, Scotland

For further information visit Adobe's website at http://www.adobe.com.




RTF

Expanded name
Rich Text Format

Area covered
Interchange of formatted text and graphics using only the standard ASCII character set, in a form suitable for interchange over the Internet

Sponsoring body and standard details
Proprietary standard developed by Microsoft Corporation

Characteristics/description
The RTF specification details the ASCII representation required for most of the low-level functions supported by Microsoft's Word word processing package. Information about the fonts used, page layout and document management can be stored as part of the header information for each RTF file.

Before a file is converted into RTF any macros used to create the file must be expanded. For example, the names of styles used to create a Word document are not transmitted, only the lower-level formatting changes that resulted from the application of each style.

Images are converted from their binary form into a sequence of hexadecimal digits and letters, each character representing 4 bits of the binary image (two characters per byte). Attributes about the form and content of the image can be attached to the image using a generalized object representation.
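
A minimal Python sketch of writing RTF by hand shows how both text and pictures are reduced to 7-bit characters: special characters become control sequences, and picture data is written as hexadecimal text, each character carrying 4 bits. The helper names are hypothetical; `\rtf1`, `\ansi`, `\fonttbl`, `\pict` and `\pngblip` are standard RTF control words, though a real picture group would normally also give sizes (for example `\picw` and `\pich`).

```python
def rtf_escape(text: str) -> str:
    """Escape RTF's special characters; accented Latin-1 letters become \\'hh escapes."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)         # backslash and braces must be escaped
        elif code < 128:
            out.append(ch)                # plain 7-bit ASCII passes through
        elif 0xA0 <= code <= 0xFF:
            out.append("\\'%02x" % code)  # range where Latin-1 and Windows ANSI agree
        else:
            raise ValueError(f"character {ch!r} is not handled in this sketch")
    return "".join(out)


def rtf_picture(png_bytes: bytes) -> str:
    """Embed image bytes as hexadecimal text: two characters per byte."""
    return "{\\pict\\pngblip " + png_bytes.hex() + "}"


def rtf_document(paragraph: str, png_bytes: bytes = b"") -> str:
    """Build a minimal RTF file holding one paragraph and an optional picture."""
    body = rtf_escape(paragraph) + "\\par "
    if png_bytes:
        body += rtf_picture(png_bytes) + "\\par "
    return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0 " + body + "}"


print(rtf_document("Café {draft}", b"\x89PNG"))
```

The four bytes passed in the usage line are only the start of a PNG signature, used to show the encoding; they are not a displayable image.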

Usage (Market segment and penetration)
Originally developed to allow Microsoft Word files to be interchanged between different platforms, RTF has now become one of the most commonly supported interchange formats between proprietary word processing systems.

Most word processors and desktop publishing systems provide an option that allows their documents to be converted to RTF. However, as Microsoft update the RTF specification each time they release a new level of software there is no guarantee that a new format of RTF file can be read by any other word processing program.

It should be noted that any function of a word processor that cannot be expressed in RTF will be lost during the conversion process.

Further details available from:
Microsoft Corporation, 16011 NE 36th Way, Redmond, Washington 98073-9717, USA

Limited details of the RTF specification are provided in Microsoft's Word for Windows Technical Reference manual, which must be obtained direct from Microsoft.




SGML

Expanded name
Standard Generalized Markup Language

Area covered
Document structuring and interchange

Sponsoring body and standard details

Characteristics/description
SGML provides an object-oriented method for describing documents (and other information objects with appropriate characteristics). The standard defines a set of semantics for describing document structures, and an abstract syntax for formally coding document type definitions. Apart from defining a default (concrete) syntax, based on the ISO 646 code set, that can be used for text and markup identification when no alternative is specified, SGML does not suggest any particular way in which documents should be structured but allows users to define the structure they require for document capture or presentation.

Each SGML document starts with a Document Type Definition (DTD) or a pointer to an externally stored DTD. Externally stored files, which can contain either SGML-coded data or non-SGML data (coded in a declared notation), can be referenced using public identifiers that can conform to the rules for Public Text Object Identifiers specified in ISO/IEC 9070.
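
Because SGML content models are essentially regular expressions over element names, a simplified validity check can be sketched in Python by rewriting each model as a regular expression. The DTD fragment, the `CONTENT_MODELS` table and the `validate` function are hypothetical; tag omission, `&` groups, inclusions, exclusions and attributes are all ignored.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # child Element objects only


# Hypothetical DTD fragment, e.g.
#   <!ELEMENT chapter - - (title, para+)>
#   <!ELEMENT (title | para) - - (#PCDATA)>
# with each content model rewritten as a regular expression over child element names.
# Character data (#PCDATA) is not modelled, so an empty pattern means "no child elements".
CONTENT_MODELS = {
    "chapter": r"title( para)+",
    "title": r"",
    "para": r"",
}


def validate(elem: Element) -> list:
    """Return error messages for elements whose children break their content model."""
    errors = []
    model = CONTENT_MODELS.get(elem.name)
    if model is None:
        errors.append(f"undeclared element: {elem.name}")
    else:
        sequence = " ".join(child.name for child in elem.children)
        if re.fullmatch(model, sequence) is None:
            errors.append(f"{elem.name}: children '{sequence}' do not match {model!r}")
    for child in elem.children:
        errors.extend(validate(child))
    return errors


print(validate(Element("chapter", [Element("para"), Element("title")])))
```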

SGML is a language for coding hierarchical structures and so can be used to mark up hierarchically structured data of the type typically found in books. It is also possible to use SGML to recode the grammar of other hierarchically structured data sets, such as CGM (Computer Graphics Metafile). Advantages to such an approach include:

  • full hypertext linking into and out of the graphics is enabled using the same markup methods and processing used for text elements
  • positioning of text into graphics can be done using the same tools and techniques used to position text
  • the SGML markup would enable re-use of graphic objects
  • the graphic specification would become independent of the presentation system.

Usage (Market segment and penetration)
SGML has made its principal impact in markets making use of structured textual information. This has particularly included those markets managing and producing technical documentation, although not exclusively so. SGML was given early impetus by its adoption within the US Defense Department's CALS initiative, and its use within the FORMEX standard developed by the Office of Official Publications of the European Community. Its take up elsewhere has steadily increased, especially following the arrival of the World Wide Web, where it has been used as the formal basis for HTML and XML. It is to be expected that adoption of the related HyTime and DSSSL standards will increase SGML's market penetration.

Further details available from:
ISO or local national standards bodies, or the SGML Users' Group, PO Box 361, Swindon, Wiltshire SN5 7BF, UK.

Details of current JTC1/WG4 work, and resources related to their standards, can be obtained online from http://www.ornl.gov/sgml/WG8/home.htm. A World Wide Web server providing up-to-date information on SGML is provided by the SGML Open vendors' consortium at www.sgmlopen.org/sgml/.

Extension to 'Internet-enable' SGML
Guidelines for accessing data and metadata represented in SGML from databases, knowledge bases and search tools
OII Standards and Specifications Activity Report, December 1997



SPDL

Expanded name
Standard Page Description Language

Area covered
Formatted text files incorporating vector and raster graphics in a form suitable for processing on a compatible printer

Sponsoring body and standard details

Characteristics/description
SPDL has its origins in the desire to provide a complete set of standard interchange languages for all stages of the traditional publishing process. SGML provides the language used in interchange at the authoring and editorial stages. DSSSL provides the language for specifying to the typesetter (formatter) how the document is to be composed and presented. SPDL provides the language that enables the style and layout decisions of the formatter to be realised on a variety of imaging surfaces (screen, paper, film, etc).

Usage (Market segment and penetration)
The new SPDL standard, being effectively an international reference version of PostScript, is likely to benefit greatly from the already significant presence of PostScript across a large range of markets.

Further details available from:
ISO or local national standards bodies

Further details can be obtained online from http://www.ornl.gov/sgml/WG8/home.htm. This site includes pointers to software for validating and processing SPDL files.




TEI

Expanded name
Text Encoding Initiative

Area covered
Encoding scheme for complex textual structures

Sponsoring body and standard details
Association for Computers and the Humanities, Association for Computational Linguistics, and the Association for Literary and Linguistic Computing

Characteristics/description
Major international initiative within the academic community to provide a standard set of SGML tag definitions which can be used to represent all kinds of electronic information, in particular the datasets generated and used by research projects in linguistics, literature and the humanities in general.

Because of its emphasis on research applications, the TEI DTD is highly modular and extensible. The DTD is also unusual in its concern for bibliographic information. All TEI documents include a header which identifies the work and gives details of its source, as well as documenting the encoding and editorial practices applied.
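
The shape of a TEI header can be sketched with Python's standard `xml.etree.ElementTree`. TEI documents of this period were encoded in SGML, so the XML serialisation here is for convenience only; the element names (`teiHeader`, `fileDesc`, `titleStmt`, `publicationStmt`, `sourceDesc`, `encodingDesc`, `editorialDecl`) follow the TEI Guidelines as we understand them, while the `tei_header` helper and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def tei_header(title: str, publisher: str, source: str, editorial_note: str) -> ET.Element:
    """Build a minimal TEI header: a file description plus an encoding description."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")          # identifies the work and its source
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = title
    ET.SubElement(ET.SubElement(file_desc, "publicationStmt"), "p").text = publisher
    ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "p").text = source
    encoding_desc = ET.SubElement(header, "encodingDesc")  # documents editorial practice
    ET.SubElement(ET.SubElement(encoding_desc, "editorialDecl"), "p").text = editorial_note
    return header


header = tei_header("Sample text", "Example Project", "Transcribed from a printed edition.",
                    "Original spelling retained; line breaks not recorded.")
print(ET.tostring(header, encoding="unicode"))
```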

Basic tag sets are provided for prose, poetry, drama, speech, dictionaries and terminological databases, and a method has been defined for creating customised mixes from these basic sets.

Additional tag sets are provided to capture information related to linking, analysis (including feature structure analysis), certainty, transcriptions, critiques, names and dates, nets (graphs, digraphs, trees, etc), figures and corpora. Additional tags can also be defined for use by individual research projects.

Auxiliary tag sets are provided for writing system declarations, feature system declarations and tag set documentation. The first of these allows for the detailed documentation of any user-defined transliteration scheme used within a document; the second provides formal definitions of any feature structure annotation provided; the last defines a scheme for the production of SGML-based technical documentation.

Usage (Market segment and penetration)
Becoming widely accepted in the academic community, particularly amongst librarians, as the best way to encode text that might be applicable to a range of research applications. Many of the concepts used are relevant to traditional publishers and to other organizations that require in-depth analysis of text, or wish to harmonize complex DTDs for different applications.

Further details available from:
The full text of the Guidelines, and of the TEI DTD itself, are freely available from http://www.uic.edu/orgs/tei/, and also for sale, both in printed form, and as an electronic book.




TeX DVI

Expanded name
TeX Device Independent File Format

Area covered
File format used to interchange documents formatted by TeX between different output devices.

Sponsoring body and standard details
Proprietary specification, developed by Donald Knuth of Stanford University, which is distributed under the auspices of the American Mathematical Society

Characteristics/description
Output produced by TeX formatters for interchange between printing devices. Commonly used to produce mathematical and other scientific texts.

The TeX primitives provide a very powerful set of typesetting controls. They are, however, difficult to use in their raw form as they form a fully-fledged programming language, which includes facilities for defining your own character shapes. As TeX also provides powerful facilities for compiling structured sets of macros, most users generate documents that are coded using TeX macro sets, of which LaTeX is by far the most popular.

Work is currently underway to extend LaTeX to provide the type of facilities typically provided through SGML and HyTime. The SIMSIM TeX macro package can be used to convert SGML documents into TeX format.

Usage (Market segment and penetration)
TeX is widely used in the academic environment. For document interchange most people exchange unformatted TeX files, but this can lead to problems if different macro packages (or different versions of the same package) have been used. DVI files avoid this problem because the document is interchanged in its formatted form. Note, however, that a DVI file only names the fonts it uses and does not contain the fonts themselves, so it relies on the receiver having the same fonts as the document's generator. Another problem is that the DVI file can be considerably longer than the source document!
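
The font limitation is visible in the DVI format itself: a font definition command records a checksum, sizes and a font name, but no character shapes. The Python sketch below decodes the preamble and a `fnt_def1` command from hand-built bytes; the opcode values (247 and 243) and field layouts are taken from the DVI format as documented in Knuth's TeX: The Program, and the helper names are hypothetical.

```python
import struct

PRE = 247       # DVI opcode introducing the preamble
FNT_DEF1 = 243  # DVI opcode for a font definition with a one-byte font number


def parse_preamble(buf: bytes) -> dict:
    """Decode 'pre i[1] num[4] den[4] mag[4] k[1] x[k]' at the start of a DVI file."""
    opcode, ident, num, den, mag, k = struct.unpack_from(">BBIIIB", buf, 0)
    if opcode != PRE or ident != 2:
        raise ValueError("not a DVI file")
    return {"num": num, "den": den, "mag": mag, "comment": buf[15:15 + k].decode("ascii")}


def parse_fnt_def1(buf: bytes, pos: int) -> dict:
    """Decode 'fnt_def1 k[1] c[4] s[4] d[4] a[1] l[1] n[a+l]' at offset pos.

    The definition carries only a checksum, sizes and a font name: no glyph shapes.
    """
    opcode, number, checksum, scale, design, area_len, name_len = struct.unpack_from(
        ">BBIIIBB", buf, pos)
    if opcode != FNT_DEF1:
        raise ValueError("not a fnt_def1 command")
    start = pos + 16                                  # fixed part of the command is 16 bytes
    name = buf[start + area_len:start + area_len + name_len].decode("ascii")  # skip the area
    return {"number": number, "checksum": checksum, "scale": scale, "design": design,
            "name": name}


# Usage with hand-built bytes (num/den as TeX writes them; 655360 sp = 10 pt).
pre = struct.pack(">BBIIIB", PRE, 2, 25400000, 473628672, 1000, 4) + b"demo"
font = struct.pack(">BBIIIBB", FNT_DEF1, 0, 0x12345678, 655360, 655360, 0, 5) + b"cmr10"
print(parse_preamble(pre))
print(parse_fnt_def1(font, 0))
```

A DVI driver on the receiving side has to find a font called `cmr10` locally; the checksum only lets it detect that the font it found differs from the one TeX used.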

Further details available from:
American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904, USA

AMS maintain a WWW page that points to a wide range of TeX resources at http://www.ams.org/tex.




XML

Expanded name
The eXtensible Markup Language

Area covered
Encoding scheme for delivery of complex documents over the Internet

Sponsoring body and standard details

Characteristics/description
Subset of SGML designed to be transmissible over the Internet in such a way that document browsers do not need to access the document type definition to validate the document before display. As well as requiring all documents to be "well-formed" (for example, every element must have both its start and end tags present), the specification provides XML-specific attributes and processing instructions that can be used to control the way documents are presented to users.
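
Well-formedness can be checked without any document type definition, which is what lets an XML browser display a document it has not validated. The Python sketch below uses the standard library's `xml.etree.ElementTree`, whose parser checks well-formedness but does not validate against a DTD; the `is_well_formed` helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML; no DTD is fetched or validated against."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<memo><to>All</to><body>Hello</body></memo>"))
print(is_well_formed("<p>Line one<br>Line two</p>"))
```

The `<br>` case shows the difference from HTML, where `<br>` needs no end tag because the HTML DTD declares it empty; without a DTD, XML requires the empty-element form `<br/>` (or an explicit end tag).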

Usage (Market segment and penetration)
The specification was first released only in November 1996. Prototype products from a number of vendors were displayed at SGML '96 on the day of the announcement. XML is now widely accepted as the way forward for the World Wide Web, and is becoming a standard part of software development kits.

Further details available from:
World Wide Web Consortium (W3C) and IETF

Use by Microsoft for Channel Definition Format (CDF) and Web Connectivity
Multimedia and Hypermedia Standards Activity, March 1997
Proposed use for Maths on the WWW
Multimedia and Hypermedia Standards Activity, April 1997
Multimedia and Hypermedia Standards Activity, May 1997
Proposed use for Lite EDI
Multimedia and Hypermedia Standards Activity, May 1997
Proposed use for Netscape's Meta Content Framework and Microsoft's XML-Data specifications
XML API (XAPI) proposal
Multimedia and Hypermedia Standards Activity, June 1997
Use in proposed IETF standards (OSD, DRP and RDF)
Multimedia and Hypermedia Standards Activity, August 1997
Release of XML Style Language proposal
Multimedia and Hypermedia Standards Activity, September 1997
DOM provides standardized API for XML
OII Standards and Specifications Activity Report, October 1997
Release of new draft
Use for Synchronized Multimedia Integration Language
OII Standards and Specifications Activity Report, November 1997
Release of proposed recommendation
Relevance of Document Object Model (DOM) to XML
OII Standards and Specifications Activity Report, December 1997
Use in Open Trading Protocol (OTP) specification
Name Spaces in XML published as W3C Note
XML-Data published as W3C Note
Draft Simple API for XML (SAX)
OII Standards and Specifications Activity Report, January 1998



Vendor-specific Page Description Formats

Area covered
Proprietary page description interchange formats

Sponsoring body and standard details
None: each standard is supported by its principal developer

Characteristics/description
The following standards have been developed by software suppliers to allow descriptions of document contents and layout to be interchanged between software packages and output devices:

  • GQ -- Epson's page description language
  • PCL -- Hewlett-Packard's Printer Control Language

Usage (Market segment and penetration)
Widely implemented in DTP and desktop applications.

Further details available from:
The developer.




This information set on OII standards is maintained by Martin Bryan of The SGML Centre and Man-Sze Li of IC Focus on behalf of European Commission DGXIII/E.

File last updated: January 1998
