Requirements for multimedia markup and style sheets on the World Wide Web

Jacco van Ossenbruggen (a), Anton Eliëns (a), Lloyd Rutledge (b) and Lynda Hardman (b)

(a) Vrije Universiteit, Faculty of Mathematics and Computer Sciences,
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands

(b) P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

The unequivocal acceptance of Cascading Style Sheets by all major browser vendors clearly indicates the need for separating content and style issues in HTML documents. However, in search of ever-richer content, providers rely on platform-specific plug-ins and browser-specific extensions such as dynamic HTML and scripting to achieve the desired (multimedia) functionality. We consider such an ad-hoc approach to delivering multimedia content inadequate. We therefore propose a multimedia document markup language (SMIL, which has recently been submitted to W3C) and discuss the requirements for an associated style sheet language supporting temporal and spatial constraints, hyperlinking among continuous media objects and adaptivity with respect to Quality of Service. We also propose extensions to current style sheet languages that meet these requirements, so that the presentation of multimedia content can be specified in a declarative manner.

Synchronized hypermedia; Style sheets; SMIL

1. Introduction

The advantages of separating structure from style and layout issues are well known. Authors can quickly adapt their documents by applying an alternative style to the same document, without the need to edit the document itself. Additionally, reuse of style sheets generally leads to smaller documents that are easier to maintain and faster to download. These advantages could, in theory, be applied to multimedia presentations as well. Both HTML and CSS are, however, geared towards page-based output, and are not suited for defining the synchronized presentation of hyperlinked media items, for which we use the term hypermedia.

Our group participated in the development of SMIL, the Synchronized Multimedia Integration Language [2]. SMIL has been developed by the W3C Working Group on Synchronized Multimedia (SYMM [3]) to provide a declarative, open and platform independent format to disseminate multimedia presentations over the Web.

The focus of SMIL is on the temporal scheduling of the individual media objects. In contrast to HTML, whose document model is based on a spatial hierarchy, the document model of SMIL is based on temporal composition. A SMIL document describes a tree of parallel and sequential elements, which can have additional attributes to define more precise synchronization constraints.

In the following we discuss the requirements for a style sheet language for hypermedia document formats such as SMIL.

2. Requirements for hypermedia style sheets

The requirements for style sheet languages for hypermedia documents such as SMIL differ from those for more traditional text-based documents. While style sheets are expected to be applicable to SMIL documents, current style sheet languages such as CSS, XSL and DSSSL do not meet these requirements. A style sheet language needs to be able to define a mapping from a hypermedia document model onto the output model of the play-out environment. Due to their page-based output model, current style sheet languages are unable to satisfy the following fundamental hypermedia style requirements:

2.1. Temporal constraints

In addition to the spatial dimensions of text, hypermedia contains another, temporal, dimension. SMIL documents define a hierarchy in which media items are grouped in parallel or sequentially synchronized elements. These elements can carry more refined temporal constraints that specify hard and soft synchronization, introduce delays, synchronize begin and end times, and define looping behavior. Current style sheet languages do not support the specification of such temporal relations, because these cannot be expressed within page-based output models, which include only spatial dimensions.
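To illustrate what a temporal output model has to compute, the following Python sketch derives begin and end times from a small tree of parallel and sequential groups. It is only an illustration under simplifying assumptions: the class names `Media`, `Par` and `Seq`, the `delay` field and the `schedule` function are hypothetical and are not part of SMIL or of any style sheet language, every media item is assumed to have a known duration, and a parallel group is assumed to end when its last child ends.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Media:
    name: str
    duration: float      # intrinsic duration in seconds (assumed known)
    delay: float = 0.0   # extra wait before this item begins


@dataclass
class Par:
    children: List["Node"] = field(default_factory=list)
    delay: float = 0.0


@dataclass
class Seq:
    children: List["Node"] = field(default_factory=list)
    delay: float = 0.0


Node = Union[Media, Par, Seq]


def schedule(node: Node, start: float = 0.0) -> Tuple[float, list]:
    """Return (end_time, [(name, begin, end), ...]) for the subtree."""
    begin = start + node.delay
    if isinstance(node, Media):
        end = begin + node.duration
        return end, [(node.name, begin, end)]

    timeline = []
    end = begin
    if isinstance(node, Par):
        # All children start together; the group ends with the last child.
        for child in node.children:
            child_end, items = schedule(child, begin)
            timeline += items
            end = max(end, child_end)
    else:
        # Seq: each child starts when the previous one has ended.
        for child in node.children:
            end, items = schedule(child, end)
            timeline += items
    return end, timeline


# A video played in parallel with two consecutive captions,
# the second caption delayed by one second.
presentation = Par([
    Media("video", 10.0),
    Seq([Media("caption1", 4.0), Media("caption2", 4.0, delay=1.0)]),
])
total, timeline = schedule(presentation)
```

In this example the captions occupy the intervals 0 to 4 and 5 to 9 seconds while the video runs from 0 to 10. None of these values can be written down as a property in a purely spatial, page-based output model.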

2.2. Spatial constraints

The specific spatial layout of the media items of a hypermedia document requires a layout model other than the text-flow model. Resizing the browser's window, for example, should not result in re-formatting the document to generate a new text flow, but in resizing the media items being played. This may involve constraints to preserve aspect ratios, or cropping of media items which do not support scaling. Because their output models are based on a flow model, the layout requirements of hypermedia documents cannot be specified by CSS1, DSSSL or XSL. A proposal for explicitly positioning HTML elements on the browser window has recently been developed as part of CSS2 [1]. To avoid requiring style sheets to specify the position of every individual HTML element in the document, positioning information is inherited down the document tree. This inheritance mechanism works for HTML documents because the logical hierarchy specified by the document tree (more or less) mirrors the spatial layout hierarchy of the renderer. For multimedia, the assumption that the logical grouping of a document is based on the spatial hierarchy of the presentation does not hold. Inheritance of spatial layout properties down a tree which reflects the temporal structure leads to unexpected and potentially unwanted results.
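The resizing behavior described above can be made concrete with a small Python sketch. The function `fit` and its `scalable` parameter are hypothetical names chosen for this illustration. The sketch assumes that a scalable item is scaled uniformly to the largest size that fits its region, and that a non-scalable item keeps its size and is cropped to the region.

```python
from typing import Tuple


def fit(media_w: int, media_h: int,
        region_w: int, region_h: int,
        scalable: bool = True) -> Tuple[int, int]:
    """Return the (width, height) at which a media item is displayed."""
    if not scalable:
        # The item cannot be scaled: keep its size and crop what overflows.
        return min(media_w, region_w), min(media_h, region_h)
    # Uniform scaling: one factor for both axes preserves the aspect ratio.
    scale = min(region_w / media_w, region_h / media_h)
    return round(media_w * scale), round(media_h * scale)


# A 320x240 video in a window resized to 640x360 keeps its 4:3 ratio.
scaled = fit(320, 240, 640, 360)
# A non-scalable 320x240 item in a 200x300 region is cropped horizontally.
cropped = fit(320, 240, 200, 300, scalable=False)
```

Here `scaled` is `(480, 360)` and `cropped` is `(200, 240)`. A flow-based model has no place to state such per-item constraints, since it would instead re-flow the content to the new window size.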

2.3. Hyperlinking within multiple active media streams

In hypertext, a window usually displays a single document where, on traversal, the destination of a link replaces the complete source document, or is displayed in a new window. In hypermedia documents, multiple streams of media items might be active simultaneously and link traversal should not necessarily affect all of them. As a consequence, links have to define what their context [4] is in terms of their source (i.e. which of the currently active streams are affected) and in terms of their destination (i.e. which of the streams of the destination will be activated). Synchronization constraints between objects belonging to the streams involved might further complicate the link processing. Link traversal might lead users to a point somewhere in the middle of another document. This might involve fast-forwarding the presentation to start it at the right moment, and such behavior should be expressible in a style sheet language.
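A minimal Python sketch may clarify what a link context has to express. The `Link` class, its fields and the `traverse` function are hypothetical names introduced for this illustration; they are not taken from SMIL or from [4]. The sketch assumes that the player state is a mapping from active stream names to playback positions in seconds, and it ignores synchronization constraints between streams.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Link:
    source_context: FrozenSet[str]       # active streams stopped on traversal
    destination_context: FrozenSet[str]  # destination streams that are started
    offset: float = 0.0                  # position (s) at which they start


def traverse(active: Dict[str, float], link: Link) -> Dict[str, float]:
    """Return the new player state after following the link."""
    # Streams outside the source context keep playing undisturbed.
    state = {name: pos for name, pos in active.items()
             if name not in link.source_context}
    # Destination streams start at the offset ("fast-forwarding").
    for name in link.destination_context:
        state[name] = link.offset
    return state


active = {"video": 12.0, "narration": 12.0, "music": 40.0}
link = Link(
    source_context=frozenset({"video", "narration"}),
    destination_context=frozenset({"detail_video"}),
    offset=30.0,
)
after = traverse(active, link)
```

After traversal the background music continues at 40 seconds, the video and narration are stopped, and the destination video starts 30 seconds into its presentation. A hypertext-style link, which replaces the whole document, cannot express this partial replacement.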

2.4. Adaptive environments

In adaptive environments, the style sheet conversion might also be based on information about how to adapt to user characteristics (e.g. level of expertise) or changing system resources (e.g. network bandwidth). The hypermedia style sheet could be used to indicate how to deal with limited resources (by specifying alternatives or QoS negotiation protocols) on different platforms, thus making the document source independent of platform specific details.
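As an illustration of specifying alternatives, the following Python sketch picks a media source based on the bandwidth currently available. The function `select_alternative`, the bandwidth figures and the source names are hypothetical. The sketch assumes that the author lists alternatives in order of preference, each with a minimum required bandwidth, and that the first alternative the environment can sustain is played.

```python
from typing import List, Optional, Tuple


def select_alternative(alternatives: List[Tuple[int, str]],
                       available_kbps: int) -> Optional[str]:
    """Return the first source whose required bandwidth is available.

    `alternatives` holds (required_kbps, source) pairs in the author's
    order of preference. Returns None if no alternative can be played.
    """
    for required_kbps, source in alternatives:
        if required_kbps <= available_kbps:
            return source
    return None


alternatives = [
    (1500, "video-high"),
    (300, "video-low"),
    (0, "still-image"),
]
on_modem = select_alternative(alternatives, 50)
on_isdn = select_alternative(alternatives, 500)
```

With these values `on_modem` is `"still-image"` and `on_isdn` is `"video-low"`. The document source only names the content, while the choice among the alternatives is made from information about the play-out environment.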

3. Conclusions

Since CSS is a style language tailored to HTML documents, we consider CSS to be too limited to be applicable to more complex hypermedia documents. The applicability of XSL and DSSSL to hypermedia is limited by their page-based output model (see Table 1). Despite their limitations, we think both XSL and DSSSL can provide the fundamentals for a future hypermedia style sheet language. To overcome the limitations described above, we propose a number of extensions.

Table 1. Style sheet features
Language | Model                           | Temporal positioning | Spatial positioning         | Linking
CSS1     | page-based direct display       | -                    | relative to text flow       | anchor characteristics
CSS2     | page-based direct display       | -                    | idem + absolute positioning | idem
DSSSL    | page-based flow tree            | -                    | idem + absolute positioning | idem + adding new links
XSL      | page-based flow tree + HTML/CSS | -                    | idem + absolute positioning | idem + adding new links
XSL/SMIL | flow tree extended by SMIL      | SMIL sync primitives | CSS + SMIL basic            | idem + adding new links


First, a set of hypermedia output objects needs to be defined as an extension to the DSSSL and XSL flow object tree. This extended tree should be able to model the temporal structure of a hypermedia presentation as well as its spatial layout. Additionally, it should extend the linking output model of XSL in order to provide for hyperlinking within multiple synchronized streams of continuous media. Secondly, we need an extension to the DSSSL and XSL query language which provides access not only to the input document, but also to the run-time play-out environment and to a profile of the current user.

To provide an initial version of such an extension, we propose to add the core synchronization elements of the new SMIL format to the DSSSL and HTML/CSS core objects of XSL, as listed in the last row of Table 1. This will provide a simple hypermedia output model for SMIL documents and XML document types other than SMIL. It will enable XSL implementations to apply style definitions to these documents and generate synchronized hypermedia output which will be readily playable on future Web browsers.

This extension, though, has some disadvantages. First, both the XSL and SMIL formats are in a very early stage of development, and it will take a considerable amount of time before the specifications of these formats stabilize and the formats themselves are implemented. The same holds for the features which will be used for defining the spatial layout mechanisms. Secondly, the current SMIL proposal provides only limited support for hyperlinking. Explicit specification of link contexts, as advocated in [4], is not yet supported.


References

[1] B. Bos, H.W. Lie, C. Lilley, and I. Jacobs, Cascading style sheets, level 2, November 1997, Work in progress; W3C Working Drafts are available at

[2] S. Bugaj, D. Bulterman, L. Hardman, J. Jansen, R. Lanphier, N. Layaida, J. Marsh, A. Rao, L. Rutledge, W. ten Kate, J. van Ossenbruggen, M. Vernick, and J. Yu, Synchronized Multimedia Integration Language (SMIL), November 1997, edited by Philipp Hoschka, Work in progress; W3C Working Drafts are available at

[3] W3C SYMM Working Group, W3C activity on synchronized multimedia; more information at

[4] L. Hardman, D.C.A. Bulterman, and G. van Rossum, Links in hypermedia: the requirement for context, in: Proceedings of ACM Hypertext '93 (Seattle), ACM, November 1993, pp. 183–191.