
OII Guide to Labelling, Rating and Filtering

This OII Guide discusses labelling in the context of electronic data, and rating and filtering as specific applications of labelling. It first examines the relationships between these three concepts, then discusses each in turn, and concludes with a brief discussion of the implications for searching.

1. Relationships between Labelling, Rating & Filtering

While these three terms are often used together in a single discussion, it is of crucial importance to clearly distinguish between them:

  • Labelling is a means of describing what is in the content associated with the label without users having to open the container to examine the contents.
  • Rating is a process of assigning values to content based on certain assumptions/criteria. Rating is an application of labelling in that the results of rating could be stored in labels.
  • Filtering is a process of excluding things (blocking) which have certain properties. Filtering is an application of labelling in that these properties could be based on information stored in the label, such as the results of rating.

Labelling, rating and filtering could be done in a variety of ways and could be carried out by a variety of parties. This guide in particular seeks to identify the different mechanisms and parties involved within the scope of labelling.

2. Labelling

Labelling by definition provides data about data. Labelling is therefore a specific instance of applying metadata. However, it should be noted that a label does not necessarily contain all the metadata that is relevant to the content associated with the label. The key to a labelling system is therefore the kind of data provided in the label and what that data actually says. Both are crucial for identifying the content to the user and for enabling the user to decide whether he wishes to go a step further: to open the container and access the content.

Labelling can be contentious where a value judgement is involved in the information provided in the label. Take the following example of the information provided on a cigarette packet:

  1. 5mg Tar, 0.5mg Nicotine
  2. Luxury mild
  3. Tobacco seriously damages health

It could be said that different degrees of value judgement are involved. Assertion #1 is probably the most objective of the three. Note that objectivity is also linked with the precision of assigning values. The measurements of tar and nicotine can be independently validated. In contrast, what is "luxury mild" for some could be, say, "appallingly strong" for others. Assertion #3 obviously has the implicit message to (potential) users of the content to take certain actions (stop smoking, smoke less). The same message is arguably also implicit in Assertion #1. However, it would only be obvious to those who understand the significance of the presence of tar and nicotine in the content. Moreover, unless users know what constitutes an "acceptable" level of tar and nicotine, they cannot use the data to determine a course of action.

The ability of the user to grasp the relevance and significance of the information provided in the label is a major factor for the value judgement involved in responding to the label. However, a user may not necessarily trust the veracity of the information provided. A user may also choose to ignore the information provided in the label altogether.

It is important to appreciate that regardless of what information is provided in the label, it is ultimately for the user to decide how to make use of that information in respect of access to the content. Cigarette labelling is entirely different from banning cigarettes, where a (potential) user is denied access to cigarettes through the decision of others.

The same considerations apply equally to labelling in the realm of electronic data. The Internet contains a wide range of materials, and any materials that are openly available on the Internet, particularly those available via the World Wide Web, are openly and uniformly accessible to users of the Internet. This is in contrast to traditional media, where there are usually means of identifying the content without having to access it first, by virtue of other conditions such as the context in which the content is placed, e.g. pornographic magazines placed behind the counter of a store. The labelling of materials on the Internet is therefore vital for Internet users to be able to identify content, so that they can judge whether they wish to access it.

In the realm of electronic data, labelling has the following major requirements:

  • The format of the label -- the format needs to be supported by the server which distributes the label and by the client software (typically a browser) which processes the label
  • Links between the label and the data to which the label is applied -- there are in principle three kinds of links:
    • external, explicit links
    • internal, implicit links (typically the label is embedded in the header of a document; see the example after this list)
    • stored in a separate third document (although this raises the question of whether it is labelling or whether the third document becomes a kind of content in its own right)
  • Generic description of the data to which the label is applied (i.e. metadata)
  • Label assignation and distribution -- labelling on the Internet could in principle be carried out by anyone: content creators/publishers, third parties and end users. Obviously, the same content could be labelled by different parties and therefore attract different labels. Labels assigned by the content providers are generally embedded in the document, whereas labels assigned by third parties are usually distributed via label bureaux.
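
By way of illustration, a content provider might embed a PICS label in the header of an HTML document as follows. This is only a sketch: the service URL is the rating service identifier published by RSACi, but the page address, contact details and rating values shown are hypothetical.

    <head>
    <meta http-equiv="PICS-Label" content='
      (PICS-1.1 "http://www.rsac.org/ratingsv01.html"
        l by "webmaster@example.com"
        for "http://www.example.com/index.html"
        on "1998.07.01T08:15-0500"
        r (n 0 s 0 v 1 l 2))'>
    <title>An example page</title>
    </head>

A PICS-aware client can read such a label without retrieving or rendering the rest of the content; labels assigned by third parties would instead be fetched from a label bureau.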

The most widely deployed labelling system on the Internet today is the Platform for Internet Content Selection (PICS), developed by the World Wide Web Consortium (W3C). Technical details of PICS are covered on the OII web site (OII Guide to Metadata and Metadata Interchange Standards section of the OII Standards and Specifications List), which the reader is advised to consult.

PICS provides a common format for labelling, and is independent of labelling vocabulary and criteria, as well as of label assignation. There are two important applications of PICS: content rating and software that uses rating systems to filter content. These are discussed in the following sections.

PICS labels can describe anything that can be named with a Uniform Resource Locator (URL), including File Transfer Protocol (FTP), Gopher, Usenet newsgroups, as well as e-mail messages from discussion lists (but not normal e-mail messages). A URL scheme for Internet Relay Chat (IRC) is under development at W3C.
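
As a simple illustration of how client software might locate a label embedded in this way, the following Python sketch fetches a page and collects the contents of any PICS-Label META elements. It is a minimal sketch, assuming the label is embedded as in the example in section 2; real PICS clients also handle labels delivered in HTTP headers and labels obtained from label bureaux, and would go on to parse the label text itself.

    import urllib.request
    from html.parser import HTMLParser

    class PICSLabelFinder(HTMLParser):
        """Collects the content of any <meta http-equiv="PICS-Label"> element."""
        def __init__(self):
            super().__init__()
            self.labels = []

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                a = dict(attrs)
                if (a.get("http-equiv") or "").lower() == "pics-label":
                    self.labels.append(a.get("content") or "")

    def embedded_labels(url):
        """Return the raw PICS labels embedded in the page at 'url' (hypothetical helper)."""
        with urllib.request.urlopen(url) as response:
            page = response.read().decode("latin-1")
        finder = PICSLabelFinder()
        finder.feed(page)
        return finder.labels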

3. Rating

As discussed in section 1, rating is a process of assigning values to content based on certain assumptions/criteria. The results of the rating process can be stored in a variety of ways. They could be stored in a label, as a banner, as a flag, etc. This section focuses on rating as an application of labelling.

In addition, the results of the rating process can be used in a variety of ways, including for filtering, which is discussed in the following section.

There are two basic approaches to content rating:

  • Self-rating: content providers evaluate their own content
  • Third-party rating: interested third parties evaluate the content published by others.

The PICS labelling system supports both approaches. A PICS label can carry multiple rating results, regardless of who carries out the rating and the specific rating scheme that is used.

Examples of self-rating services that are compliant with the PICS labelling system include:

Examples of third party rating services that are compliant with the PICS labelling system include:

Among these, RSACi (sponsored by the Recreational Software Advisory Council), SafeSurf and NetShepherd are the most widely used today. RSACi in particular has been implemented in the latest versions of Microsoft's Internet Explorer and Netscape's Navigator browsers. It has been claimed that some 77,000 websites had been rated with RSACi at the time of writing (July 1998), including 60% of the top 100 sites, and that this list is expanding by 4,000 websites per month. NetShepherd has teamed up with AltaVista and Catholic Telecom, Inc.

Fundamental to rating services are the criteria that are used for rating -- the categories and gradations within the categories that are used for classifying content. By way of illustration (see also the sketch after this list):

  • RSACi: rating categories include "Violence", "Nudity", "Sex", and "Language", with 5 ratings within each category
  • SafeSurf: rating categories include "Age Range", "Profanity", "Heterosexual Themes", "Homosexual Themes", "Nudity", "Violence", "Sex, Violence, and Profanity", "Intolerance", "Glorifying Drug Use", "Other Adult Themes" and "Gambling", with 9 distinctions for each category
  • NetShepherd: rating categories include maturity levels ("General", "Child", "Pre-teen", "Teen", "Adult", and "Objectionable") and quality levels (1-5 stars).
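
These vocabularies lend themselves to representation as data. The following Python sketch records the categories and value ranges exactly as described in the list above; it is illustrative only, and omits the abbreviated category names that each service actually transmits in its labels.

    # Rating vocabularies as described above (illustrative only; the exact
    # numbering conventions of each service are assumptions).
    RATING_SERVICES = {
        "RSACi": {
            "categories": ["Violence", "Nudity", "Sex", "Language"],
            "values": range(5),          # 5 ratings per category, assumed 0-4
        },
        "SafeSurf": {
            "categories": ["Age Range", "Profanity", "Heterosexual Themes",
                           "Homosexual Themes", "Nudity", "Violence",
                           "Sex, Violence, and Profanity", "Intolerance",
                           "Glorifying Drug Use", "Other Adult Themes",
                           "Gambling"],
            "values": range(1, 10),      # 9 distinctions per category, assumed 1-9
        },
        "NetShepherd": {
            "maturity": ["General", "Child", "Pre-teen",
                         "Teen", "Adult", "Objectionable"],
            "quality": range(1, 6),      # 1-5 stars
        },
    }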

Rating services have been the subject of intense debate because of their perceived civil liberty implications. Such discussions are outside the scope of this guide. However, it should be noted that both publishers of family-oriented materials and publishers of adult materials are generally in support of rating services. One obvious advantage is that rating enables content to reach its intended audience more effectively. It should be noted that the message "You must be 18 or over to enter this site", which is typically displayed on the home page of adult sites, is an example of rudimentary self-rating and self-labelling -- i.e. this is a site of the category "Adult". However, as discussed in the cigarette packet example in section 2, it is entirely up to the user to determine what action is to be taken regarding this information. On the other hand, the typical need for users to provide credit card information prior to having access to the materials on adult sites constitutes a kind of filtering, which imposes restrictions on access to the content.

Another debate sparked off by rating services concerns the relative advantages and disadvantages of self-rating versus third party rating, and the need or otherwise to monitor the operation of one or both types of rating services (and by whom). Such a discussion is an aspect of the wider civil liberty debate and is similarly outside the scope of this guide. However, it is important to note, first, that the specific results of any rating service are intrinsically different from, and independent of, how the results are to be used (e.g. for filtering, as discussed in the following section). Secondly, the validity of specific rating results, from both self-rating and third party rating, is relative to the assumptions and criteria of the individual rating service. The validity of a rating result in any rating service says nothing about the validity of the assumptions and criteria upon which the rating service is based.

Any content provider who is considering self-rating and wishes to use one or more of the existing self-rating systems needs to consider a number of major issues. Although the content provider could in principle subscribe to all the available self-rating services, the resource requirement for performing the evaluation process is a practical constraint:

  • The categories used by the self-rating service -- e.g. would the categorisation of a rating service misrepresent the nature of the content?
  • The popularity of the self-rating service -- this could present the content provider with a dilemma: he either accepts the ratings of the popular services, even if those ratings misrepresent his material, or refuses to rate his material, with the consequence that his material might become unavailable to some users (see the filtering section below)
  • User-friendliness of the self-rating service -- the resource requirement for performing the initial as well as subsequent self-rating as the material grows and/or evolves.

Concerns raised by third party rating services include:

  • The content provider may not know that his material has been rated, and/or what the ratings are
  • Inaccuracy in the ratings
  • Lack of adjudication procedures where the content provider disagrees with the third party ratings
  • Lack of regular/timely updates of the ratings
  • Cost of rating (although most rating services are currently free, some fear that this will change over time).

Any individual or organisation could in principle start their own rating service. However, the resource requirement for such an undertaking should not be under-estimated. In addition to establishing the criteria for rating, any rating service is only relevant if it is widely used. The creator of the rating service must convince both content providers and users of content to adopt his rating service. He also needs to create third party ratings for content on the Web (and there are over a million websites on the Internet).

A major point about rating is that the very nature of the market for rating is biased against any new rating service which relies on self-rating. Users will only choose a rating service that has many sites rated; conversely, content providers will generally only adopt a rating service for self-rating if it already has many users. Secondly, in so far as Microsoft ships browsers pre-configured with RSACi, and Netscape is following suit, this exerts a decisive influence in practice on the choice of rating systems made by users and content providers alike.

The criteria used for rating inevitably reflect a particular cultural perspective, and from a European viewpoint the English-based, mono-cultural rating services (which account for the vast majority of rating services available today) are severely inadequate. There should in principle be a variety and diversity of rating services, using different categorisations as well as categorisations in different languages, to cater for the cultural and linguistic requirements of particular communities, including ethnic minorities. Both the resource requirements for creating rating services and the market logic of rating services, however, present enormous challenges to meeting this objective.

There is yet a further dimension to the limitation of rating services. Just as labelling cannot conceivably provide an exhaustive description of the data to which the label is applied, a rating system cannot provide a comprehensive valuation in respect of all the possible assumptions or criteria that can be applied to a particular content. Moreover, the applicability of specific assumptions and criteria is often context dependent, depending on, for example, to whom the information is being made available, for what purpose the user is accessing the information, and under what conditions the user is accessing the information. An individual may access the same information for very different purposes (e.g. law enforcement, legitimate research, monitoring of children, prurient interests) and under different conditions (e.g. during paid working hours or in leisure time). No rating system, irrespective of its objectives and scope, can cater for all the possible variations of the context in which it is applied.

4. Filtering

As discussed in section 1, filtering is a process of excluding things (blocking) which have certain properties. The key points of filtering are:

  • What to filter
  • How to filter (i.e. mechanisms of filtering).

What to filter is a question of selection -- what are the selection criteria and who should determine them? These criteria could be determined "upstream" by governments or "downstream" by users of the information. Alternatively, they could be determined by intermediaries, notably the Internet service providers in the context of Internet content filtering. Concerns about filtering generally relate to upstream control, which is commonly termed censorship. In contrast, downstream filtering by the user, or at his request by his service provider, is the antithesis of censorship -- the user has total control over the choice of information that he, or those in his care, wishes to access. Note however that if the user who exercises control over information access is different from the end-user of the information, there is censorship involved (e.g. for children, censorship by parents or teachers).

The question of what to filter is closely related to how to filter -- i.e. at what point of the information transfer chain should the control (the blocking) be exercised.

In the context of electronic data, filtering can be carried out in a variety of ways, for example by the following mechanisms (the first three are illustrated in the sketch after this list):

  • Black listing -- the programme checks a list of banned Internet sites, and denies users access to any site which is on the banned list
  • White listing -- the programme checks a list of allowed Internet sites (e.g. "portals" for children), and denies users access to any site which is not on the allowed list
  • Keywords -- the programme checks a list of banned words/phrases, and denies users access to any site which contains the banned text
  • Rating -- the programme looks up a rating attached to a particular Web page, and denies users access to the page if the rating does not match the parameters that have been set up in the browser.
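
The first three of these mechanisms reduce to simple membership and substring tests. The following Python sketch illustrates them; the site and keyword lists are placeholders, and real filtering products normalise URLs and page text far more carefully.

    from urllib.parse import urlparse

    BANNED_SITES = {"banned.example"}        # black list (placeholder entries)
    ALLOWED_SITES = {"kids-portal.example"}  # white list (placeholder entries)
    BANNED_WORDS = {"badword"}               # keyword list (placeholder entries)

    def blocked(url, page_text, mode="blacklist"):
        """Return True if access to the page should be denied."""
        host = urlparse(url).hostname or ""
        if mode == "blacklist" and host in BANNED_SITES:
            return True
        if mode == "whitelist" and host not in ALLOWED_SITES:
            return True
        # keyword check: deny access if the page contains any banned text
        text = page_text.lower()
        return any(word in text for word in BANNED_WORDS)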

There is a myriad of filtering techniques, and many of the techniques for filtering textual information were widely available and used prior to the advent of the Web. The Web has created particular interest in filtering based on labels because of its ability to readily transmit vast quantities of images, which at present can only be filtered via labelling.

Filtering is an application of labelling in that the properties used for blocking could be based on information stored in the label, such as the results of rating. This is the area of filtering which is covered within the scope of this guide. It should be noted that many filtering products available today provide monitoring, warning and tracking features, which are outside our scope. (As discussed in section 3, the need for users to provide credit card information prior to gaining access to the content of adult sites is an example of filtering which is not based on the results of rating the content. Instead an entirely different kind of rating -- one which is not stored in the label -- is involved here, namely the credit worthiness of the user, based on the credit ratings given and held by the credit card company.)

The key point of filtering as an application of labelling is the range of rating services that can be supported by the filtering system. A distinction needs to be drawn between a standalone filtering system -- a complete filtering solution provided by a single vendor in which the filtering decisions are made by the vendor -- and a protocol-based filtering system, which enables the user to make the filtering decisions. The PICS labelling system supports protocol-based filtering in that a user of PICS-based software can choose which of the PICS-compliant rating services he wishes to adopt. The filtering is then based on the parameters that the user sets in relation to the category gradations of the chosen rating service. A website whose PICS-compliant rating exceeds a certain value on the rating scale, or which does not have a PICS label, would then be blocked.
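
This decision rule can be sketched in a few lines of Python, assuming that the ratings portion of a PICS label (e.g. "r (n 0 s 0 v 1 l 2)") has already been parsed into category/value pairs; the parsing itself is omitted, and the thresholds shown are illustrative user settings against RSACi-style category letters.

    # Maximum acceptable value per category, as set by the user for the
    # chosen rating service (illustrative RSACi-style letters:
    # n = nudity, s = sex, v = violence, l = language).
    THRESHOLDS = {"n": 0, "s": 0, "v": 2, "l": 1}

    def allow(ratings):
        """ratings: dict of category -> value parsed from a PICS label,
        or None if the document carries no label for the chosen service."""
        if ratings is None:
            return False   # unlabelled documents are blocked
        return all(ratings.get(category, 0) <= limit
                   for category, limit in THRESHOLDS.items())

    # Example: allow({"n": 0, "s": 0, "v": 1, "l": 2}) returns False,
    # because the language rating (2) exceeds the threshold (1).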

Examples of client-based PICS-compliant filtering software include:

Filtering software vendors vary greatly in the amount of information and control that is made available to users. Many vendors do not publish the (third party) ratings that they use. Some vendors provide detailed descriptions of the criteria used for rating and filtering. Some might allow users to add sites to the filtered lists, either in their own software or by sending sites to the vendor for review. Some might also provide password-protected control to users to override the filter. However, to the extent that a PICS-based system denies user access based on rating values and does not provide any contextual detail, there is a practical limit to the amount of information and control that could be directly and readily made available to users of PICS-based filtering software in respect of specific instances of filtering.

There is no doubt that the sophistication and user-friendliness of filtering software will become (even) better. From a user's perspective, areas of consideration include:

  • Ease of installation and un-installation
  • Degree of customisation, e.g. filter configuration, configuration for multiple users and with multiple levels of protection, facilities for a master user, override control, password protection and change, etc
  • Additional (non-rating related) facilities, such as time management options for monitoring usage, a variety of warning messages, tracking and logging of sites accessed, etc
  • Cost of software and updates.

In general, there is a trade-off between customisation and ease of configuration, and between flexibility and ease of use. It is also often commented that those for whom the filtering software is primarily intended (notably children) are generally more adept at installing and configuring the software than those who decide on implementing it (i.e. their parents and teachers).

5. Searching

Labelling, rating and filtering have important implications for searching mechanisms. At present, the ability of searching mechanisms to cater for searching by label, and by the rating information given in a label, is severely limited. There is no known searching service which provides access to the different labels and ratings ascribed to a particular content, which would provide users with informed choices between alternatives. As discussed in section 4, searching is an integral element in the process of filtering -- the results of searching determine at what point of the information transfer chain the user is denied access to the content.

However, no searching mechanism, and therefore also no filtering, is 100% foolproof. Better searching mechanisms, for example through more widespread implementation of a common set of metadata, would improve the accuracy as well as the efficiency of filtering. Searching could also influence the adoption of rating systems, in that a search engine which listed entries according to the ratings of a rating system would greatly enhance the usage of that rating system. No such search engines are known to be available at present.

Note: In 1998 the European Commission published "Labelling, rating and filtering: an overview" (DE EN ES FR IT) as part of its "Action plan on promoting safer use of the Internet".




This information set on OII standards is maintained by Martin Bryan of The SGML Centre and Man-Sze Li of IC Focus on behalf of the European Commission DGXIII/E.

File last updated: November 1998
