Jack Kessler, kessler@well.sf.ca.us

April 15, 2007 issue. This file presents an archive copy of the issue of the FYI France ejournal, ISSN 1071-5916, which was distributed via email on April 15, 2007.


The European Digital Library, europeana.eu


An initial installment in the French & European response to Google's Digital Library effort now is online, at,

and it is impressive: [tr. JK]

-- there even is a "Ma bibliothèque" feature, the little green box in the upper right corner, offering,


The initial interface is simple and clear: a "Onebox", in fact, for the intuitively-inclined --

-- via which "kessler" currently pulls up fully 30 short-format entries, 10 per page -- there were lots of Kesslers and Koestlers and Keszlers and Köstlers, back in 19th c. Hungary, and the Széchényi must offer quite a few of those, mostly normalized to "Kessler" -- one Keszler too though, I see, plus 10 Koestlers -- and "Köstler" interestingly pulls up one "LE COMTE KOSTIA", one "Comte Kostla", a "Kôstler", and a "Der Kirchun Josu kostl".

So I suppose the Onebox is looking at everything OCR'd, then: very useful for certain types of searching, but I wonder how that will scale up to a really large collection?... searching the whole BnF's digitized fulltext, bibliographies and footnotes and all, non-indexed, on a string such as *hugo*...
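One guess at why "Köstler" also retrieves "Kôstler": the index may fold accents and case before comparing strings. That is only an assumption, since the site does not say so in anything quoted here. The sketch below uses Python's standard unicodedata module; the function name fold is made up for illustration.

```python
import unicodedata

def fold(text):
    """Strip combining accents and case so that accented variants compare equal.

    Hypothetical helper: an assumption about how a search index *might*
    normalize names, not a description of what europeana.eu actually does.
    """
    # NFKD splits a letter such as "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Both spellings from the search above collapse to the same key.
same_key = fold("Köstler") == fold("Kôstler")
```

Folding of this kind would explain the "Kôstler" hit but not "KOSTIA" or "Kostla", which look more like approximate matching against imperfect OCR.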

The entries retrieved offer a promising-looking term frequency statistic -- also, beneath that, a relevance ranking composed of slider-bar plus percentage, i.e.,

-- although I can't figure out why that last entry which has 7 "kesslers" in it is ranked #8 while the previous two, which have only 2 or 1, are ranked higher, or how the item ranked #1 came to contain "100% kesslers"... They're using Apache Lucene: "Le document disposant du score le plus élevé obtient une pertinence de 100%", they say [the document with the highest score is given a relevance of 100%]... But I suppose someone is checking all this out, and that there is some sort of logical explanation for all of it: gotta get the ranking-algorithm right, so I hope they have.
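A possible logical explanation, sketched under loud assumptions: if the percentage is simply each raw score divided by the top score, as the site's sentence suggests, then the #1 item reads 100% by construction. And if the raw score penalizes long documents, a text with 7 hits can rank below one with 2. The toy_score formula and the three documents below are invented for illustration; this is not Lucene's actual scoring.

```python
import math

def toy_score(term_count, doc_length):
    """Hypothetical score: dampened term count divided by a length penalty.

    An assumption made for illustration, not Lucene's real formula.
    """
    return math.sqrt(term_count) / math.sqrt(doc_length)

def relevance_percentages(raw_scores):
    """Scale raw scores so that the best-scoring document shows 100%."""
    top = max(raw_scores.values())
    return {name: round(100 * score / top) for name, score in raw_scores.items()}

# Invented documents: (occurrences of "kessler", total words in the text).
docs = {
    "short_note": (2, 200),
    "pamphlet": (1, 400),
    "long_volume": (7, 70000),
}

raw = {name: toy_score(count, length) for name, (count, length) in docs.items()}
percent = relevance_percentages(raw)
ranking = sorted(percent, key=percent.get, reverse=True)
```

In this made-up case the long volume with 7 occurrences lands last, because its hits are diluted across far more text, while the short note with only 2 gets the 100%.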

The site offers full explanations, and fascinating background as to the procedures followed and choices made so far, at,


And for the not-so-intuitively-inclined, us non-Mac-users, two boxes offer indexing: very useful, although problematic too, perhaps, as the collection scales up in size and complexity --

-- so for example the collection currently holds 51 documents from the 16th c., 3929 in Hungarian, 1041 from the NLPortugal --

-- and 25 "Generalities" documents concern "printing & publishing", while 2 concern "bibliophilie", and currently 11 documents are about "Earth sciences", 2 are on "English literature", 249 are on "Geography and voyages"...

And for both "Criteria" and "Themes", simply clicking on the relevant category conveniently brings up the short-form catalog entries with links to the documents themselves:

-- and the digitization appears to be very legible -- original scanning resolution being the key, 300dpi being very often too little, 1200dpi being often but not always too much...

The Hungarian digital fulltexts, curiously but also very interestingly, come up via an interface in Hungarian offering,

-- both Zipped and non, if my Hungarian gets that right -- but all of which appears to be sitting in Budapest, for now, or at least my San Francisco DSL retrieval of 13.75MB crawled *very* slowly toward me from wherever the file currently is residing, about 2 long minutes to download that .pdf, I'd say --

-- although when it finally reached me I couldn't really read it, owing to Finno-Ugric deficiencies with my own ancient familial tongue, but the cave-exploration images came across well and do look fascinating. I expect download delays eventually will be improved via systems improvements and mirror sites, though.

The NLPortugal texts, too, are reached via an interface different from both the BnF and the Széchényi... so what they have done here, for europeana.eu, is link distributed databases in a true "virtual" union catalog... fascinating...

So an NLPortugal text offers, for example,

-- my "user's" question being whether any of this, or any of the Hungarian texts, really will be amenable to annotation and other text treatment in my personal "Mes étiquettes" fiddling, within my own little "Ma bibliothèque"... Not yet, apparently: for now I can annotate BnF documents, but it seems the Széchényi and the NLPortugal documents still are "out there" in the Internet aether somewhere -- viewable and downloadable and therefore usable, but not yet amenable to the full "Ma bibliothèque" personalized treatment -- haircut but no full spa treatment, yet -- that will have to await "standards" meetings and protracted negotiations of the future, I expect.

Still though, all three, the NLPortugal and the Hungarian Széchényi and the BnF, are indexed and presented together, here, in one place. The interface is, as I said initially, simple and clear, and its presentation of three collections as far-flung in European terms as they can be is very impressive.


A Note:

Even more significant, perhaps, is that the Europeans have presented this in direct response to Google's digital library efforts. See the extensive discussion in the press and presse, throughout Europe and in the anglophone digital library world, on "le défi Google" and Continental Europe's reaction to it...

That prompts the more general thought that the controversy involved may have been, may still be, simply another instance of glass half empty / glass half full: yes there has been a "challenge", here, as J-J Servan-Schreiber would have put it, but sometimes a challenge is what is needed to get others moving -- and in this case move they did, and it is a very useful and interesting result.

Google's approach was different. These Europeans -- the BnF here, together with its Hungarian and Portuguese partners -- are approaching the same goal with different ideas, different mindsets and preconceived notions, different strategies and tactics, different tools -- even a different "Onebox", maybe.

But long live les différences... alors... If we analyze both, side-by-side, each side will learn something, and ultimately both will benefit. The "virtual" union fulltext database offered by the Europeans may be the better way to go, or one as good, or one simply different; the unified and simply-operated Google approach may be very useful too; perhaps both have a valuable role to play going forward, then, serving different publics or perhaps serving the same but in different ways.

So the more the merrier, and the better for us the users... Now instead of "just Google", the pioneer, we *also* have "the Europeans", with their differences whatever those will be. And these are only two, after all: we also need a Chinese online digital library, and an online digital library from India, or several or even many of each -- who else will bring us adequate treatment of the eccentricities of Kannada and Uighur texts, or challenges at least to the accepted treatments of same in use in the rest of the world -- some things simply are approached differently, in different places on the planet, and virtual digital libraries can reach them all.

So congratulations to the team(s) assembling europeana.eu : may their efforts increase, and thrive, and may we all learn much we never knew about Portuguese bibliography, and Hungarian speleology, and search & retrieval generally, as a result. Their new site offers much which ought to be of great interest to both Googlers and the many others who will want to contribute time and effort to digital libraries.


And a pps., suggesting a digital-libraries viewpoint:

Archilochus figured the world is composed of both foxes, who have many ideas, and hedgehogs, who have one big idea...

France is focussed on its présidentielles next weekend: la nation goes to the urnes on Sunday the 22nd, in what promises to be its most significant election since 1958 -- "On Elise", as Le Canard enchaîné puts it.

But the focus of foreigners, at least, perhaps should be more on the *ongoing* nature of things French: the very immediacy of our digital media leads us too often to believe that all details are significant -- but just because a thing appears on CNN doesn't make it so, or at least that doesn't make a thing important.

Of course a presidential election may be an important thing; but then presidents and governments come and go -- constitutions, too -- while much remains the same far longer, in a place as large and as old as France. Via our modern media we see too much detail, perhaps -- "The world is too much with us", the poet said -- we lose sight of immense forests, and of sometimes-encroaching wastelands, in our daily fascinations with individual trees.

So GoogleEarth can be of great help, in this effort to view both the forests and their trees: try focussing on France from Space, using GoogleEarth, then zoom in to see some tiny detail and zoom out again -- France is a big place, with a lot going on inside it, and not all of that in Paris, en dépit...

Wikimapia, too: the comments there of small folks, and simply the indications of what to them seems important, show up well: every little person in Rajasthan nowadays is scribbling something onto Wikimapia about his own tiny village, it seems, ditto villagers in the Vaucluse -- it is a fascinating process, and one taking place very independently of the more mediagenic events surrounding the Elysée Palace and its occupants...

Isaiah Berlin's modern update of Archilochus' notion provides some instruction, and some comfort, in any politically tumultuous time. Berlin was in love with politics, and he masterfully morphed the old Greek's more general image into a metaphor, describing the daily lives of political writers and politicians and their endless controversies, everywhere. Berlin's "The Hedgehog and the Fox" (1953) points out that both "hedgehog" and "fox" views are valid, although we very often do not know which one we ourselves represent, and we do tend to vacillate between the two.

It often is not the details but the generalities, then, which matter: not the science but the music, what used to be called the Harmony of the Spheres -- and there is a harmony, in all of this, no matter which way a given election swings. It is just that the level of focus necessary for finding that harmony tends to vary.

Aux urnes, then: the results there may merit detailed attention, as they do in the US too and in other human "jurisdictions". The political foxes may provide a harmonious result. But even if not, the hedgehog view from farther out still will be instructive, and may provide comfort: even if one has to look at it from *very* far out in Space... from 16,000 miles out, as GoogleEarth does.

From Space, at least, it still is Archie MacLeish's Earth: "small and blue and beautiful in that eternal silence where it floats" -- and the Hexagone still is one of the more beautiful and promising portions of our little planet -- as demonstrated by that view of it from Space, and by ongoing cultural achievements such as europeana.eu


Jack Kessler




