Gateway to Educational Materials (GEM): metadata for networked information discovery and retrieval

Stuart A. Sutton

Syracuse University, School of Information Studies
Syracuse, NY, U.S.A.

Enhanced access to educational materials on the Internet for the nation’s teachers and students is one of President Clinton’s second-term goals. In pursuit of this goal, the National Library of Education (NLE) identified lesson plans and teacher guides as a critical area in which library and information science expertise should be applied in order to improve the organization and accessibility of large collections of educational materials that are already available on various federal, state, university, non-profit, and commercial Internet sites. The U.S. Department of Education and the NLE charged the ERIC Clearinghouse on Information and Technology at Syracuse University with the task of spearheading a project to develop an operational framework to provide the nation’s teachers with “one-stop/any-stop” access to this vast pool of educational materials. The Gateway to Educational Materials (GEM) project pursues this aim through the development and deployment of a metadata element set and accompanying procedures for its use.

Metadata; Dublin Core; Educational objects

1. GEM project objectives and element set

The four major objectives addressed by the GEM project were to: (1) define a semantically rich metadata profile and domain-specific controlled vocabularies necessary to the description of educational materials on the WWW; (2) develop a concrete syntax and well-specified practices for its application using current HTML specifications; (3) design and implement a set of harvesting tools for retrieving the metadata stored as HTML meta tags; and (4) encourage the design of a number of prototype interfaces to GEM metadata.

From the outset, GEM developed around emerging standards for networked information discovery and retrieval (NIDR). The Dublin Core Element Set (DC) became the base referent for the GEM element set. One of the underlying assumptions of the DC founders was that it would be extensible in two fundamental ways: (1) additional elements could be added to meet the needs of particular domains, and (2) its elements could be enriched through the use of a broad range of qualifying “schemes” and “types” (the “Canberra Qualifiers” [6]). The GEM element set is an example of both of these extensions.

A GEM package of 8 elements was added to the 15-element DC package: (1) Audience, (2) Cataloging Agency, (3) Duration, (4) Essential Resources, (5) Educational Level, (6) Pedagogy, (7) Quality Assessments, (8) Academic Standards. In addition, a number of GEM controlled vocabularies and form schemes were defined. Many of the elements have an array of types that modify element semantics. The Quality Assessments and Academic Standards elements may exist independently of the descriptive metadata for a resource, thus permitting third-party agencies with appropriate expertise to handle quality assessments and standards mappings.
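The combined element set can be sketched as two small vocabularies, one per package. This is only an illustration: the 15 Dublin Core names are the standard ones, but the single-word spellings used for the 8 GEM elements and the helper functions `namespace_of` and `unknown_elements` are assumptions made for this sketch, not identifiers defined by GEM.

```python
# The 15 elements of the Dublin Core package.
DC_ELEMENTS = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})

# The 8 GEM elements named in the text.
# ASSUMPTION: these run-together spellings are illustrative only.
GEM_ELEMENTS = frozenset({
    "Audience", "CatalogingAgency", "Duration", "EssentialResources",
    "EducationalLevel", "Pedagogy", "QualityAssessments", "AcademicStandards",
})


def namespace_of(element):
    """Return 'DC' or 'GEM' for a known element name; raise ValueError otherwise."""
    if element in DC_ELEMENTS:
        return "DC"
    if element in GEM_ELEMENTS:
        return "GEM"
    raise ValueError(f"unknown element: {element!r}")


def unknown_elements(record):
    """Return the sorted keys of a record (a dict) that belong to neither package."""
    known = DC_ELEMENTS | GEM_ELEMENTS
    return sorted(key for key in record if key not in known)
```

A cataloging tool could use a check of this kind to reject element names that fall outside the 23-element profile before any metadata is written.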

2. Creating GEM metadata

While GEM metadata can be created with any text editor, a publicly available metadata-generating module called GEMCat was developed to ease creation by letting the cataloger focus solely on content. GEMCat is currently implemented for Windows 95/NT, and a cross-platform Java version is nearing completion.

In GEM, metadata is stored in one of two ways. First, where the resource being described is an HTML-tagged document, the GEM metadata can be saved within the resource as meta tags. Second, where internal storage of the metadata is either impossible or undesirable, it can be saved as meta tags in a separate HTML document that references the resource being described.
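The two storage options can be illustrated with a minimal sketch. It assumes the meta tags have already been produced as strings. The function names `embed_in_resource` and `external_record` are hypothetical, and the use of a `DC.identifier` tag to point a separate document at the resource is an assumption of this sketch, since the text does not say how the reference is expressed.

```python
import re
from html import escape


def embed_in_resource(html_text, meta_tags):
    """Option 1: insert the meta tags right after the resource's opening <head> tag."""
    match = re.search(r"<head(\s[^>]*)?>", html_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <head> element found; internal storage is not possible")
    block = "\n" + "\n".join(meta_tags) + "\n"
    return html_text[:match.end()] + block + html_text[match.end():]


def external_record(resource_url, meta_tags):
    """Option 2: build a separate HTML document that holds the meta tags.

    ASSUMPTION: the link back to the described resource is carried in a
    DC.identifier tag; the source text does not specify the mechanism.
    """
    identifier = '<META NAME="DC.identifier" CONTENT="{}">'.format(
        escape(resource_url, quote=True)
    )
    lines = ["<HTML>", "<HEAD>", identifier, *meta_tags, "</HEAD>",
             "<BODY></BODY>", "</HTML>"]
    return "\n".join(lines)
```

The first function leaves the resource otherwise untouched, which matters when the cataloger does not control the rest of the page. The second is the fallback for non-HTML resources or for pages that cannot be edited.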

3. Syntax

Since HTML currently rests at the heart of the web, the GEM Working Group focused on both its evolution and on other relevant initiatives of the World Wide Web Consortium. The changes in HTML’s ability to accommodate richly structured metadata through meta tags have been chronicled elsewhere [6]. From the beginning of the DC dialog, it has been recognized that only the simplest implementations of metadata can be accommodated effectively by HTML 2.0 meta tags (see [6]). Because HTML 2.0 limits the relevant META attributes to NAME and CONTENT, the only means of dealing with additional information (e.g., “schemes” and “types”) was what is called “overloading content” [6], of the form: <META NAME="GEM.subject" CONTENT="(SCHEME=GEM) (TYPE=levelOne) Science">. The first-generation syntax of GEMCat was based on content overload.
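To show what content overload costs a consumer of the metadata, the following sketch separates the qualifiers from the actual value in an overloaded CONTENT string. It handles only the parenthesized `SCHEME=` and `TYPE=` prefixes seen in the example above. The function name `parse_overloaded_content` is hypothetical, and the pattern is an assumption about the format rather than a published grammar.

```python
import re

# Matches one leading "(KEY=value)" qualifier, e.g. "(SCHEME=GEM)".
_QUALIFIER = re.compile(r"\s*\((SCHEME|TYPE)=([^)]*)\)")


def parse_overloaded_content(content):
    """Split an overloaded CONTENT string into (qualifiers, value).

    Qualifiers are read from the front of the string until the first
    piece of text that is not a "(SCHEME=...)" or "(TYPE=...)" group.
    """
    qualifiers = {}
    pos = 0
    while True:
        match = _QUALIFIER.match(content, pos)
        if match is None:
            break
        qualifiers[match.group(1).lower()] = match.group(2).strip()
        pos = match.end()
    return qualifiers, content[pos:].strip()
```

Every harvester has to repeat this kind of string surgery, which is the practical argument for moving scheme and type information out of CONTENT.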

HTML 4.0 comes close to eliminating the content overload problem through the addition of a SCHEME attribute to the META element. When this attribute is combined with “type” information appended to the NAME value, content overload is eliminated. The following GEM metadata example illustrates the integration of “scheme” and “type” information in HTML 4.0 meta tags:

<META NAME="DC.subject.levelOne.1" SCHEME="GEM" CONTENT="Science">
<META NAME="DC.subject.levelTwo.1" SCHEME="GEM" CONTENT="Biological sciences">
<META NAME="DC.subject.levelTwo.1" SCHEME="GEM" CONTENT="Life sciences">
<META NAME="DC.subject.levelTwo.1" SCHEME="GEM" CONTENT="Technology">
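Tags of this shape are simple to generate mechanically. The sketch below builds one HTML 4.0 META tag from its parts, placing the type in NAME and the scheme in SCHEME. The function name `make_meta_tag` and its parameters are hypothetical, and treating the trailing number in NAME as a sequence number is an assumption, since the text does not explain it.

```python
from html import escape


def make_meta_tag(namespace, element, content, type_=None, seq=None, scheme=None):
    """Build one HTML 4.0 META tag with the type in NAME and the scheme in SCHEME.

    ASSUMPTION: `seq` models the trailing number seen in names such as
    "DC.subject.levelOne.1"; its exact meaning is not given in the text.
    """
    name_parts = [namespace, element]
    if type_ is not None:
        name_parts.append(type_)
    if seq is not None:
        name_parts.append(str(seq))

    attrs = [("NAME", ".".join(name_parts))]
    if scheme is not None:
        attrs.append(("SCHEME", scheme))
    attrs.append(("CONTENT", content))

    # Escape attribute values so quotes or ampersands cannot break the tag.
    rendered = " ".join(
        '{}="{}"'.format(key, escape(value, quote=True)) for key, value in attrs
    )
    return "<META {}>".format(rendered)
```

For example, `make_meta_tag("DC", "subject", "Science", type_="levelOne", seq=1, scheme="GEM")` yields a tag with the same NAME, SCHEME and CONTENT values as the first line of the example above.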

The GEM Working Group is watching closely the World Wide Web Consortium’s work on both the Resource Description Framework (RDF) (see [4]) and Extensible Markup Language (XML) (see [2]). Both of these initiatives promise a rich structural environment for GEM metadata and mark the migration path for GEMCat.

4. Distributing GEM metadata

As the web matures as a publishing environment and generally accepted metadata schemes serving specific subject and practice domains evolve, the existing (and future) WWW crawling programs such as Alta Vista, Excite, InfoSeek, Lycos and Webcrawler will provide increasingly efficient and effective access to information. In addition to these general retrieval services, a number of services fashioned to meet the needs of specific domains are also emerging [3]. Readily available tools such as Harvest make local “harvesting” of metadata possible and its extension to multiple web sites serving a specific community has been demonstrated [1]. Based on these models, GEM metadata can be distributed through two mechanisms: (1) through future harvesting by general purpose web crawlers, and (2) through harvesting of select repositories by means of a GEM harvester. The result of the latter is the GEM Union Catalog (GUC) that provides access to the collections of a consortium of “high integrity” repositories.

The rationale for the GUC can be found in the following observation of Lagoze, Lynch and Daniel [5] in their exploration of issues surrounding the Dublin Core (1996, p. 6):

“[T]he use of the Dublin Core in a limited context might produce very positive results. For example, assume a set of ‘high-integrity sites’. Administrators at such sites might tag their documents . . . with Dublin Core metadata elements using a set of well-specified practices that include relatively controlled vocabularies and regular syntax. Retrieval effectiveness across these high-integrity sites would probably be significantly better (assuming harvesting and retrieval tools that make use of the metadata) than the unstructured searches available now through Lycos and Alta Vista.”

The growing GEM Consortium is just such a set of “high-integrity sites.”
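The second distribution mechanism, harvesting selected repositories into a union catalog, can be sketched with Python's standard `html.parser` module. This is not the GEM harvester itself. The names `MetaHarvester`, `harvest` and `build_union_catalog` are hypothetical, the `DC.` and `GEM.` name prefixes are assumed from the examples in Section 3, and the pages are passed in as already-fetched text so that the sketch stays independent of any network access.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect META tags whose NAME starts with one of the given prefixes."""

    def __init__(self, prefixes=("DC.", "GEM.")):
        super().__init__()
        self.prefixes = tuple(prefixes)
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith(self.prefixes):
            self.records.append({
                "name": name,
                "scheme": attributes.get("scheme"),
                "content": attributes.get("content") or "",
            })


def harvest(html_text):
    """Return the list of DC/GEM metadata records found in one HTML page."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.records


def build_union_catalog(pages):
    """Map each resource URL to its harvested records; `pages` is {url: html_text}."""
    return {url: harvest(text) for url, text in pages.items()}
```

Because the harvester keeps only names with a recognized prefix, ordinary tags such as `keywords` are ignored, which is one way the controlled practices of the participating sites translate into a cleaner catalog.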

5. Resource discovery

In the project’s first phase, two prototype interfaces to a test database of GEM metadata were developed. At Syracuse, a search and browse environment called GemAccess was built using PLWeb, a full-text, relevance-ranking search engine by Personal Library Software. At the University of Washington, an interface driven by a relational database was developed.

6. Conclusion

As the World Wide Web grows exponentially, discovery and retrieval of useful educational materials become more problematic. The GEM project seeks to meet the needs of educators, students and parents through development and wide deployment of the GEM standard in the form of a metadata element set, an accompanying array of controlled vocabularies, and a well-defined set of practices for their application. The developmental work of the first phase of the project is largely complete. Full-scale application of GEM by Consortium members has begun.


[1] Beckett, D. and N. Smith, The ACademic DireCtory–AC/DC, Ariadne, 1996.

[2] Bray, T., J. Paoli, and C. Sperberg-McQueen, Extensible Markup Language (XML): W3C Working Draft 07-Aug-97, 1997.

[3] Dempsey, L., Meta detectors, Ariadne, 1996.

[4] Iannella, R., Application of RDF for extensible Dublin Core metadata, 1997.

[5] Lagoze, C., C. Lynch, and R. Daniel, Jr., The Warwick framework: a container architecture for aggregating sets of metadata, TR96-1593, 1996.

[6] Weibel, S. and R. Iannella, The 4th Dublin Core Metadata Workshop Report, D-Lib Magazine, 1997.


Dublin Core Home Page. URL:
ERIC Clearinghouse on Information and Technology. URL:
Gateway to Educational Materials (GEM). URL:
Personal Library Software. URL:
U.S. Department of Education. URL:
U.S. National Library of Education. URL:
World Wide Web Consortium. URL: