Authors: Lobin, Henning; Rehm, Georg
From Open Source to Open Information: Collaborative methods in creating XML-based markup languages
Until the beginning of the last decade, the Internet was primarily used by scientific, educational, and
military organisations for the exchange of information such as data files and electronic mail. The
advent of the easy-to-use hypertext system World Wide Web has, however, begun a new era of the world-spanning
computer network.
In our paper we examine a part of the Information Marketplace (Dertouzos, 1997) that will give users of
the World Wide Web a wide range of new possibilities for gathering information, a task that is
predominantly carried out using search engines.
One major property of search engines is a lack of semantic certainty that results from both the
absence of structure in the indexed documents and insufficient methods of information
extraction and information retrieval.
When using a search engine, a user is almost always confronted with hundreds or thousands of
documents, but their actual relevance to the keywords entered is not guaranteed.
The aforementioned lack of structure in Web documents will be overcome in the next few years by an
increased use of XML and a simultaneous move away from HTML, which only allows a rather coarse
annotation of textual elements such as headlines, tables, or paragraphs.
However, the new structural variety and freedom of XML carries the danger of continuously
reinventing the wheel: because XML allows concrete markup languages comparable to HTML to be defined
freely, many proprietary XML-based annotation schemata will emerge, which will make automatic
information extraction by search engines difficult. After all, a large part of the success of the
Internet, and especially of the World Wide Web, is based on the standardization of markup languages.
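The extraction problem described above can be illustrated with a minimal sketch. It assumes two invented schemata (the element names book/author and publication/writer are hypothetical and not taken from any real markup language) that encode the same bibliographic record, and shows that an extractor needs a separate mapping for every schema it is supposed to understand.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, invented schemata describing the same bibliographic record.
DOC_A = "<book><author>Dertouzos, Michael</author><title>What will be</title></book>"
DOC_B = "<publication><writer>Dertouzos, Michael</writer><name>What will be</name></publication>"

# Without a shared vocabulary, a mapping like this must be maintained per schema.
AUTHOR_PATHS = {"book": "author", "publication": "writer"}


def extract_author(xml_text):
    """Return the author string, or None if the root element is an unknown schema."""
    root = ET.fromstring(xml_text)
    path = AUTHOR_PATHS.get(root.tag)
    if path is None:
        return None  # proprietary schema the extractor has never seen
    element = root.find(path)
    return element.text if element is not None else None
```

With a single agreed-upon schema the mapping table would shrink to one entry; every additional proprietary schema adds another entry that someone has to write and maintain.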
In our paper we will outline a development that will counteract this XML babel. The driving force
behind this development is a paradigm in software development that has been successful for almost 20
years now. This paradigm, called Open Source (DiBona et al., 1999; Raymond, 1999), made possible,
among other things, the free operating system Linux; it will also give new and decisive impulses to
the deployment of quasi-standardized XML-based markup languages.
These impulses will result in what we call Open Information. Our paper will first give a brief
introduction highlighting the current state of the art in annotating information for use on the
World Wide Web.
Besides XML, we will look at recently emerged standards for making metainformation more
explicit and for building conceptual hierarchies.
Subsequently, we will take a look at the roots, the motivation and the current understanding of the
phenomenon of Open Source.
The main part of the talk will combine the paradigm of Open Source software development with the
collaborative creation and maintenance of XML-based markup languages and annotation schemata for
Electronic Publishing.
References:
Dertouzos, Michael (1997): What will be. New York: HarperEdge.
DiBona, Chris; Ockman, Sam and Stone, Mark (eds.) (1999): Open Sources: Voices from the Open Source Revolution. Cambridge etc.: O'Reilly.
Raymond, Eric S. (1999): The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. Cambridge etc.: O'Reilly.