10.1993b FYI France Essay :


December, 1996

Internet Digital Libraries

by Jack Kessler, kessler@well.sf.ca.us

For Connexions: the Interoperability Report,
v. 10 n. 12, December, 1996, p. 2, available from the Connexions Archive in elegant .pdf at,



Anyone -- certainly any printed journal -- in possession of an archive and an information channel as rich as those of Connexions will be looking nowadays at both print and digital media. Text and images and many other things besides all can be stored and searched and retrieved and used now, in print and online and in perhaps too many different ways.


Digital Libraries and the Babel of the Internet

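The Digital Libraries term is in use today to describe projects which come in a variety of flavors: "about 400,000" entries for it in Internet indexes like AltaVista [1] justify some curiosity as to just what the term actually means. The projects seem to fall into at least four broad categories: 1) Systems projects, 2) Computer projects, 3) Information projects, and 4) Library projects.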

These four categories themselves might be multiplied endlessly. There are that many Digital Libraries projects, some so named by their creators and others so labeled by outside observers, already in operation, and their number is growing rapidly. There are Digital Libraries in Thailand and Australia and Japan, and Digital Libraries projects under way ranging from local efforts to organize slide collections to international work on the collections of the Vatican Library. [7]

The effort to understand what all of this has in common is what interests me most. There is an old saying in academia that "If something is everything then maybe it's nothing." The saying applies increasingly to many aspects of the digital information revolution.

The range of interests in digital information is in fact enormous. The World-Wide Web is not the whole Internet, and the Internet is not even William Gibson's entire Matrix; HTML is not the whole world of marked-up text, and SGML is not the entire picture in the digital representation of information. There are so very many projects, and so many of them are unaddressed by current standards efforts and even by current terminology. There is a growing, crying need to discriminate and to define. There seems to be a real need to define something -- some common element, hopefully -- which all of the various Digital Libraries efforts are trying to grasp. The term never would be in use -- in common, in so many different places -- if a lot of people did not at least sense that there was some common good purpose which it might serve.


Data versus Information

To define this general Digital Libraries purpose better, another distinction is useful, that between data and information: information as data organized and presented so that users can use it -- information as the bits with knowledge added, data as just the bits without the brains.

The outstanding current digital information problem arises from the fact that today's Internet, to today's newest digital information user -- the general public -- is just data: just the raw material without enough "value added," just the bits without the brains. If information is defined as bits which have been organized and presented "so that users can use it," then today's information-overloaded, on-again/off-again, hypertext loop-plagued Internet is not "information": at least it is not very usable to the new general public users, many of whom never have been online or even near a computer before.


New markets: Users, Clients and Customers

This was not so much of a problem back when all Internet users were engineers possessing vast computer knowledge and experience. But since the "acceptable use restriction" gloves were taken off in 1992, a whole new general public market -- that 63% of US households which still do not have a computer, and even a sizable majority of the households which do -- has begun to discover the delights and confusions of online digital information. Organizing and presenting information for this general public market is a whole new exercise, very different from the same effort formerly aimed primarily at computer engineers.

Digital Libraries currently, then -- the 1) Systems projects, the 2) Computer projects, the 3) Information projects, and the 4) Library projects -- all might best be viewed in light of their responses to this latest challenge, that of converting online digital data into information for this new general public group of users. The problem no longer is so much the older one of converting other types of information into data, nor is it the "storage and searching and retrieval" of that data. It is the problem of getting the data -- eventually, somehow -- into information formats acceptable to the entirely new class of general public users.

The "value added" needed for general public users is less technical. It includes "images" and "links" and "sound," of course, rather than just dumb printed text installed online -- but it has less to do now with the technology, on which so much successful work already has been done, and more to do with the users themselves: with the approaches and psychology of sales and marketing and customer service and professional assistance.

Stanford's Terry Winograd, one principal architect of a leading Digital Libraries project, refers to the "glue" which holds digital information together [8]. This glue no longer is one of the high-grade, overly-sensitive, and very-expensive adhesives of the Internet's earlier test-bed era. Now, in the coming decade of America Online and Network Computers and Netscape and @Home and TCI and Viacom, we are talking Elmer's -- lots of it.

The shift has been from research and applications development which used to focus upon the technology, to research and applications development which now focuses upon the users. Technology enthusiasts should be happy. Xerox PARC [9] has been preaching for some time that digital technology would succeed only by becoming 1) ubiquitous, 2) inexpensive, and 3) invisible: i.e., found everywhere, assumed to be useful, taken for granted, like the telephone and the television and the toaster. That day has arrived, for the general public at least.

It makes basic marketing sense. You can sell more units and services to more people this way. You can lower margins and raise volume, and realize the marketer's magic economies of scale.


Something new: The un-interested General Public User

The key to any marketing, though, is to understand the user -- the client, the customer -- thoroughly. This was not so much of a problem back when digital information was used only by engineering students and professionals. Back then there was a single user profile, and one which was fundamentally friendly toward the technology. Most engineers, faced with a computer and an information system, were fascinated, and wanted very much to learn to use it all no matter what it did.

This is not the case now, though, with the new general public users. They only want the "information". Yesterday's Internet users were in love with the idea of the Internet, nearly regardless of what information -- or data -- it might or might not contain. Today, that interest no longer is there. Today's users do not "want to know how the car works," they just "want to drive it."

It is not that general public users are less intelligent, or even less educated, than their computer engineering forebears were. It is just that they have other interests: car repair, stamp collecting, changing a diaper, going to the beach -- they are busy with those. Since the demise of acceptable use restrictions, online digital information increasingly is having to address an entirely new phenomenon: the un-interested user. Today's Digital Libraries -- all of them, the 1) Systems projects, the 2) Computer projects, the 3) Information projects, and the 4) Library projects -- are designing for this un-interested general public user, or at least the more up-to-date ones among them are.


"R" going one way, "D" going another?

This has distressed some members of the online digital information community. It may even have caused a rift. There are plenty of sophisticated applications under development -- vast numbers of high-bandwidth and otherwise-expensive ideas and projects -- which would be entirely derailed by a total migration of online digital information to a world populated only by America Online and Network Computers and Netscape and @Home and TCI and Viacom.

As UC Berkeley's Stephen Cohen says, "People forget that in 'R&D', companies don't do the 'R', they only do the 'D'" [10]. People also forget that we would not have the "D" -- that present world of America Online and Network Computers and Netscape and @Home and TCI and Viacom -- if it had not been for the Internet testbed "R" which preceded it.

Proposals under way for a high-speed, research-oriented Internet II [13] -- already being called "Son of Internet" by some, "Grendel" by others -- suggest that online digital information may split, with high-speed research applications going one way, and the less-expensive and less-capable general public market going the other.

But Digital Libraries can help with the higher-end efforts as well. The MBone, one of the more promising high-bandwidth transmission projects, already has its catalog -- of past and future transmissions -- under development online, and its archive, and all of the attendant problems of the categorization and classification and indexing and abstracting and search and retrieval of these by users [11].

Similar questions come up with the Internet's URNs and URLs and domain names, with the proliferating SGML DTDs, and with the W3 indexing system of hidden META tags, which looks increasingly like the MAchine-Readable Cataloging (MARC) format used for years to catalog printed books.

These -- catalogs and archives and categorization and classification and indexing and abstracting and search and retrieval -- all are traditional library questions. They were questions asked in the past about illuminated manuscripts and about printed books, and they are being asked now about digital "documents" and online information. They have less to do with the "digital" side of the Digital Libraries equation than with the fuzzier, less clearly defined "libraries" side.

They have to do with users, and with what it takes to make data usable to a user as information: whether this is data recorded in ink on parchment or registered as bytes in a bitstream, and whether the eventual information is to be used by "high-tech researchers" or by members of the general public.

Digital Libraries, then, is something of a misnomer. As used currently, it describes too many things. But a term applied to so many things presumably points to at least one thing which they all have in common. That something, I suggest, is the conversion of data into information, the latter being "whatever is useful to the current group of users."

One great challenge of the 1990s is that this "current group of users" suddenly has exploded out to include not only the traditional specialists but also a general public which is un-interested in the underlying technique. Digital Libraries methods for coping with this challenge can help meet the demands of more technically sophisticated and/or interested users as well.


Something else new: the international un-interested General Public User

One other great late-1990s challenge for digital information, then, is that users very suddenly are located in many nations around the world.

There are radio modems in use suddenly in Cambodia; China is online; Mozambique just acquired connectivity; the Internet's 9+ million host count jumped to 12+ million just in the first six months of 1996 [12], and great percentage growth figures now may be found outside the US. I send out an e-journal myself, every month, to readers in over 70 countries, which reaches all of them in milliseconds: a publication and distribution miracle which must have W.H. Smith and W.R. Hearst and H.R. Luce all turning in their graves -- such broad distribution was their great dream, too.

The great challenge of the coming decade is going to be that of ensuring not that all sorts of users everywhere will have access to digital information -- this problem rapidly is being solved now -- but that the flood of such information does not split users into "high-end" and "low-end," "digital knowledge" and "digital ignorance," "the digitally empowered" and "the digitally disenfranchised."

It all will be digital: "print" already is -- "photography" and "sound" and "TV" and "cinema" and "telephony" all are getting there. There are many remaining technical challenges and problems, from distributed processing and the scalability of high-speed transmission, to multilingual techniques and the development of object-oriented programming and relational databases. But increasingly now the key problem is not how to digitize, but how to organize and present whatever is digitized to users. This is the fundamental problem-in-common to which all Digital Libraries are dedicated.


Internet Digital Libraries: The international dimension

There is a lot of Digital Libraries work under way now, in a lot of places, all of it falling into at least the four categories suggested here: 1) Systems projects, 2) Computer projects, 3) Information projects, and 4) Library projects. I just now am publishing a book which attempts a first overall view:

The book does not address the Digital Libraries problem so much theoretically or philosophically, as I have done a bit here, as it does internationally: it is filled with examples of current Digital Libraries work under way in places like Chiang Mai, Thailand, and Surabaya, Indonesia, and Lyon, France -- coupled with the suggestion, implied throughout and declared directly whenever I can, that the development of any Digital Libraries solutions now will be, and will have to be, internationalist in its approach.

A great deal has been written on the general subject, however. Some of these references are given in my book. Other materials by me and by others, with references and live links and nice pictures and even bibliographies and resource lists, may be found at:

The general problem of Digital Libraries has been dealt with recently by thinkers and writers as diverse as Blaise Cronin, Michael Buckland, Walt Crawford, Clifford Lynch, Kenneth Dowlin, Michel Melot, Jesse Shera and Wilfred Lancaster.

If you embrace the broad definition which I am encouraging -- that the conversion of "data" into "information" is what is involved here -- there are resources of interest to you about the 15th century transition from whatever-preceded-it to print (Elizabeth Eisenstein), and about any time and place where one mode of expression has been succeeded by or even simply influenced by another (Walter Ong, Jack Goody, Pierre Levy, Marshall McLuhan, Henri-Jean Martin, Erich Auerbach, etc., etc...).

There is no real need to reinvent the wheel on all of this. Not all of it has been invented already, in the past and elsewhere, but there is a lot which can be learned -- and money which can be saved -- by looking. Digital technology may be new, but its newest un-interested general public user has not changed much in a very long time. The Digital Libraries problem no longer is whether to choose the new digital medium over traditional print, but how to choose it, and use it, effectively for the users.



JACK KESSLER has been an Internet Trainer and Consultant since 1991. He studied philosophy, politics, law, and library and information science at Yale, Oxford, and the University of California, and spent 15 years in the importing business in East and South Asia, the US and Europe. Since 1992 he has edited FYI France, ISSN 1071-5916, a monthly e-journal of digital information news about France and Europe. He writes frequently on subjects ranging from the Minitel and the Internet and W3 to libraries and librarianship, printing history, and the information professions. Beginning in 1997 he will edit the FYI France Online Service, http://www.fyifrance.com (already online), an experiment in combining W3/passive and e-mail/active marketing techniques in the commercial provision of online information service. Nowadays he is nearly impossible to reach except via e-mail to: kessler@well.sf.ca.us






