XML and RDF: the promise and the reality of new Web architectures


Jon Mason (Office of the AVC-IT, The University of Melbourne/EdNA)


Chris Blackall (University of Melbourne)


Brian Denehy (Australian Defence Force Academy)
Renato Iannella (DSTC Pty Ltd and University of Queensland)
Carl Lagoze (Cornell University)
Stuart Weibel (OCLC Office of Research)

The Internet and the World Wide Web have provided a dynamic and unprecedented global information infrastructure, which continues to develop as new networks proliferate, many of them established to foster collaboration and co-operation. Education Network Australia (EdNA) is an exemplar in this respect. Facilitating this growth has been the initial simplicity of Web architectures: the HTML standard and the HTTP protocol have both significantly lowered the barrier to entry for publishers and information users on the Internet. The relative ease of creating HTML documents and serving them to a global audience has been a revelation to many users and a major factor in the phenomenal growth of the Web. However, this relative simplicity is set to change. The World Wide Web Consortium (W3C) is developing a new generation of Web standards and architectures aimed at improving the Web's capacity for delivering effective resource discovery, standards that will at the same time radically change the Web as we now know it (see W3C Data Formats by Tim Berners-Lee at the W3C homepage).

Clearly, the impact of the Web on information infrastructure has been dramatic since its inception. But before the holy grail of smooth and transparent access to networked information resources can be achieved, even on a national scale, a number of major problems need to be addressed and solved. Within the Higher Education sector, for example, where library information systems and I.T. are converging at a rapid rate, these problems are clearly not just technical. Neil McLean, chief librarian at Macquarie University, has recently said:

One of the most critical issues, in terms of both organising and sharing information resources in the electronic environment, is to determine the size and nature of the integration necessary to offer optimal levels of service. The library world has always been driven by a curious mixture of autonomous initiatives and bold co-operative ventures. The opportunities and threats inherent in our current environment are powerful incentives to rethink the nature of the service paradigms, the practical parameters for delivery and the organisational/political alliances necessary to fund and operate the services. (McLean, Global Access To Scholarly Information: The Quest For Sustainable Solutions, Macquarie University, Australia, 1997.)

The panel will discuss the impact of the new W3C architectures, in particular XML (eXtensible Markup Language), a generalized descriptive markup language which will significantly extend the functionality of the Web, and the Resource Description Framework (RDF), an application of XML for describing metadata that will assist resource discovery and Web site management. The XML standard and the "family" of XML applications, such as RDF, promise a major evolutionary step in the progress of the Web, and are being greeted enthusiastically in many quarters. But will the new standards deliver on that promise? They may assist in delivering more effective resource discovery, but will the extra functionality be useful to entry-level publishers on the Web? The new Web architectures will help solve some of the shortcomings of first-generation Web systems, but will they overcomplicate the Web for many users and publishers? Will the solutions provided by these new developments also bring new problems, overwhelming many who have adopted the Web primarily because of its simplicity? Who will really benefit from the new architectures? Who can afford to climb the XML learning curve? What will first-generation HTML publishers need to know to take advantage of XML? What about legacy first-generation HTML files and sites? Who will build the XML tools? How will HTML and XML publishers best integrate metadata into their publications? Which metadata standards should be used, and where should the metadata reside? How will the Web facilitate meaningful collaboration beyond basic communications to resource sharing and development?
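To make the relationship between XML, RDF and metadata concrete, the fragment below is a minimal sketch of an RDF description carrying Dublin Core elements, in the RDF/XML syntax being developed by the W3C. The resource URL and element values are invented for illustration, and the exact syntax remains subject to the evolving working drafts.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: an RDF description of a Web resource
     using Dublin Core elements. URL and values are hypothetical. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.edu/report.html">
    <dc:title>Annual Report</dc:title>
    <dc:creator>Example University</dc:creator>
    <dc:date>1998-06-30</dc:date>
  </rdf:Description>
</rdf:RDF>
```

Because the description is itself well-formed XML, it can be parsed by generic XML tools, while the RDF model gives its elements a shared, machine-processable meaning across metadata communities.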

These are just some of the questions the Panel will explore when they discuss the important issues, challenges and opportunities that the next generation of W3C architectures offers the Web community.

Panelists have been selected for both their expertise and their contrasting viewpoints in this area, and the debate promises to be a must for anyone wishing to come to grips with the range of issues concerning the next generation of Web publishing and resource discovery.

Take-home message of the panel

A more complex and powerful Web architecture doesn't have to mean a less user-friendly or functional Web.
The new W3C architectures, XML and RDF, offer a great deal to new and existing members of the Web community. For major Web publishers, the transition from first-generation HTML to XML will require careful planning and consideration. For Web developers and programmers, XML offers substantially greater expressive power. For "newbie" Web publishers and markup tag-shy users, XML will enable more sophisticated Web development tools and applications that hide the complexity of XML (and even HTML). Metadata and RDF will make information retrieval on the Web much easier.


Brian Denehy (Information Services Division, Australian Defence Force Academy)

Position statement: Just as research libraries cannot function without the organising work of the catalogue, so the Web requires organisation that will suit the different individuals seeking to use it. Finding a balance between ease of use for the author during creation and fine discrimination in retrieval for the intended audience is a task which requires more support from both the underlying protocols and infrastructure and the authoring tools. Those who neglect the effort of describing their Web documents face the high likelihood that those documents will rapidly disappear into the vast sea of retrievals.

Biographical information: Brian Denehy is a senior member of the Information Services Division at University College, Australian Defence Force Academy. He has been or is currently involved in a number of Australian initiatives concerning networked information infrastructure, including the Australian Vice-Chancellors' Committee's working party on Electronic Publishing, the Metaweb and Pandora projects, and other AARNet and AARNet2 information infrastructure related projects. He is currently concerned with resource discovery and authentication issues, as well as caching and mirroring of network resources.

Renato Iannella (DSTC Pty Ltd and University of Queensland)

Position statement: Interoperability is the critical issue for metadata. We need mechanisms for heterogeneous metadata to coexist and to be semantically interchanged. These include metadata schema registries; schema inheritance and extensibility; support for internationalisation; and standards for metadata repositories. XML will bring better structure to the Web, and it is desperately needed. RDF will exploit XML and provide a syntactic level of interoperability. The next level is semantic: metadata communities on the Web need to progress metadata to a higher level of semantics, covering not just the element set but the content and values of those elements.

Biographical information: Renato Iannella is a Senior Research Scientist at the Distributed Systems Technology Centre (DSTC) based in Brisbane, and is Leader of the Resource Discovery Unit, which investigates technologies used in the discovery, access, and retrieval of electronic resources on the Internet. Renato is an active contributor to the World Wide Web Consortium (W3C) working group on Metadata (the Resource Description Framework) and the Internet Engineering Task Force (IETF) working group on Uniform Resource Identifiers. Renato also consults to Government agencies on Whole-of-Government search architectures and strategies.

Carl Lagoze (Cornell University)

Biographical information: Carl Lagoze is the Director of Digital Library Research in the Computer Science Department at Cornell University. In this capacity, Mr. Lagoze leads a number of digital library research efforts, including NCSTRL, an internationally distributed digital library of computer science technical reports and research publications. The technical basis of NCSTRL is Dienst, a protocol and system for distributed digital libraries, which was developed by Mr. Lagoze with Jim Davis of Xerox. Mr. Lagoze is internationally recognized for his work on metadata standards and frameworks, most notably the Dublin Core and the Warwick Framework. Current research work includes the design and prototyping of digital library object repositories, protocols for meta-searching, and techniques for distributed searching and resource discovery. Mr. Lagoze has spoken internationally on these research topics and is an active participant in a number of digital library standards efforts.

Stuart Weibel (OCLC Office of Research)

Biographical information: Dr. Weibel is leader of the Dublin Core Metadata Initiative, an international and interdisciplinary effort to build consensus around the semantics necessary to support discovery of electronic resources. The workshop series that supports this initiative has focussed the efforts of several hundred participants from sixteen countries on four continents on this task, resulting in a model that is being tested in more than three dozen pilot projects in ten countries. The effort has helped to marshal support for a metadata architecture (the Resource Description Framework) that will serve many other metadata description models as well.


Chris Blackall (Office of the AVC-IT, The University of Melbourne/EdNA)

Biographical information: Chris Blackall is a Project Officer for Education Network Australia (EdNA), a national, cross-sector, Web-based educational directory service. EdNA uses the Dublin Core metadata standard to assist the automation of directory management and cataloguing services. Chris is based at the University of Melbourne, where he has previously held I.T. management and training positions.
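The Dublin Core practice EdNA draws on can be illustrated by the common convention of embedding "DC."-prefixed META tags in the head of an ordinary HTML document, which directory and cataloguing software can then harvest. The fragment below is a sketch only; the resource and its element values are invented for illustration.

```html
<!-- Illustrative sketch: Dublin Core metadata embedded in an HTML
     document head via META tags. All values are hypothetical. -->
<head>
  <title>Introductory Statistics Course Notes</title>
  <meta name="DC.Title" content="Introductory Statistics Course Notes">
  <meta name="DC.Creator" content="A. Example">
  <meta name="DC.Subject" content="statistics; education">
  <meta name="DC.Date" content="1998-03-01">
</head>
```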