by Jack Kessler, kessler@well.sf.ca.us

Dec 15, 1992 issue.
From: Jack Kessler 
Subject: Online French Fulltext, and InfoSci statistics! (15 Dec 92)

December 15, 1992

	FYIFrance:  Online French Fulltext, and InfoSci statistics!

edited by: 	Jack Kessler

I've asked around, and no one seems to have heard either of FRANTEXT or
of its North American incarnation, ARTFL. 2500 major French language
fulltexts online, armed with "absolute and relative term occurrence
frequency" statistics for information scientists to ponder, would seem
to be a pretty interesting and valuable resource for a lot of folks.
What follows, then, is my own translation of a fascinating forthcoming
article on FRANTEXT, details on reaching both FRANTEXT and ARTFL, and
information about a new fulltext book which may be of great interest to
librarians, French students, information scientists, rare book
scholars, and network hackers alike. Joyeux Noe:l.  *<:-)}

Jack Kessler


"The Point of View of a FRANTEXT User: FRANTEXT at the
Bibliothe`que Publique d'Information" (citation appears below)

by Jacques Lemarignier, BPI, Paris (with permission)

FRANTEXT has been in use for five years at the Bibliothe`que Publique
d'Information. It has attracted a very diverse following, composed
essentially of students, professors, researchers, and professionals
from the worlds of cinema, publishing, and advertising. Great numbers
of users have been able to pose questions of great variety -- in their
complexity, their level of difficulty, the object of their search
(doctoral or master's thesis, publication in an anthology, simple
curiosity) -- and they have greatly appreciated the results obtained.

This impressive output can be explained by the remarkable flexibility
of the software, which can adapt itself to such a variety of questions.
It might be interesting to recount these many possibilities, placing
ourselves in the point of view of the user, who can translate a
question into search terms and use all of the resources of FRANTEXT to
surmount any difficulties found. The choice of several examples of
increasing complexity will help us comprehend what one can obtain from
this database:

The search for a quotation is the simplest and fastest operation.  If
it is found in one of the 2500 texts of the corpus -- which is not
always the case -- one can identify it in several seconds. In addition
to amazing the user, one renders her a great service.

A documentalist from Gallimard publishers, who is preparing an edition
of the correspondence of Drieu La Rochelle, had looked in vain for a
phrase cited without a reference in one of the letters:

"Il tenait a` l'affu^t les douze ou quinze sens
Qu'un faune peut braquer sur les plaisirs passants"

In an instant these two lines were retrieved on the screen of the
terminal, with the citation to their source, Victor Hugo, _La le'gende
des sie`cles_, "Le Satyre", and the page of the cited edition.

A film-maker was struck in her childhood by a poem which she wished to
use for a scene in a film, but she could remember only a single verse
with any accuracy, and she had forgotten entirely both the author's
name and the title. With the same ease, FRANTEXT showed her that the
verse was taken from one of the _Poe`mes barbares_ of Leconte de Lisle,
"Les elfes".

FRANTEXT also can render appreciable and rapid service in illuminating
the history of a word. It is believed that "ordinateur" is a recent
word, derived from the invention of the electronic calculator. But a
search for this word on FRANTEXT shows us immediately an older use:

"The insufficiency of all purely mechanical solutions is a new motive
for us to resort to a pre-established arrangement. Why should we make
such vain and ridiculous efforts to show ourselves as being the
*ordinateur*? Is it not always necessary that the collection of
secondary causes points at last to a resolution in the first cause, of
which the sublime and consoling idea so entirely satisfies and
completes the heart and the spirit?"

--Charles Bonnet, _Contemplation de la Nature_, 1764, page 267, part

Here the adjective "ordinateur" signifies the Supreme Being, reminding
us of the "Dieu horloger" of Voltaire. It is this word, fallen into
disuse, which has been taken up, two centuries later, to translate the
English word "computer". These later occurrences begin in the year
1960, using the modern definition. In 1964 one already speaks of an
"ordinateur e'lectronique" (_Histoire ge'ne'rale des sciences_, t.3,
vol.2, 1964, page 108). Then the examples multiply, and the word
becomes common.

Searches of the same type, for uncommon meanings or usage of words,
often are of great interest. If one is interested in the concept of
"Lumie`res", for example, which is used to characterize the entire 18th
century, now called the "Sie`cle des Lumie`res", FRANTEXT shows us that
its intensive use extends from 1750 to 1850, with no pause at 1799. The
following table, of word-occurrence frequencies by chronological eras
of twenty years, can be obtained in less than a minute:


NOTE: relative frequencies are expressed in millionths

Total absolute frequency: 3226
Maximum frequency: 652, in the period 1780-1799

Scale: an asterisk represents an absolute frequency of 20

               abs. rel.
1700-1719:      41   17  ***
1720-1739:      55    9  ***
1740-1759:     246   36  **************
1760-1779:     460   46  *************************
1780-1799:     652   86  **********************************
1800-1819:     517   93  ***************************
1820-1839:     479   43  *************************
1840-1859:     315   26  *****************
1860-1879:     278   24  ***************
1880-1899:     183   18  **********

The absolute frequency represents the number of usages of the term
"lumie`res" in the texts of the corpus, divided up by periods of 20
years; the relative frequency is the relation between the number of
usages in the texts considered and the total number of words in these
same texts. Thus, in the period 1780-1799, "lumie`res" is used 652
times in the texts of the corpus of FRANTEXT: that is the absolute
frequency. Dividing 652 by the total number of words in the texts for
this period, one obtains 86 millionths: that is the relative
frequency.
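
The two measures can be sketched in a few lines of Python. This is only
an illustration under assumed inputs, not FRANTEXT's own software: the
function names and the sample figures are invented, and each text is
reduced to a (year, word count, occurrence count) triple. The article
does not give the word total for 1780-1799; 652 occurrences at 86
millionths would imply roughly 7.6 million words.

```python
from collections import defaultdict


def period_start(year, span=20):
    """Return the first year of the span-year period containing `year`."""
    return year - (year % span)


def frequencies_by_period(texts, span=20):
    """Aggregate (year, total_words, occurrences) triples by period.

    Returns {period_start: (absolute, relative_in_millionths)}.
    """
    hits = defaultdict(int)
    words = defaultdict(int)
    for year, total_words, occurrences in texts:
        start = period_start(year, span)
        hits[start] += occurrences
        words[start] += total_words
    return {
        start: (hits[start], round(hits[start] / words[start] * 1_000_000))
        for start in sorted(hits)
    }


# Invented sample figures for illustration; these are not FRANTEXT data.
sample = [
    (1782, 400_000, 30),
    (1795, 600_000, 56),
    (1804, 500_000, 40),
]
table = frequencies_by_period(sample)
for start, (absolute, relative) in table.items():
    print(f"{start}-{start + 19}: {absolute:5d} {relative:4d}")
```

The relative figure is what makes periods comparable: a period with
fewer texts in the corpus can show a lower absolute count and still a
higher relative frequency.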

The figures and the diagram show us the immense increase in usage of
the term "lumie`res" up to 1799, notably in the last 20 years of the
18th century, then the slow and steady diminution of its use.  One
clearly sees the influence of the term "lumie`res" during the French
Revolution, and the persistence of this influence during the Second
Empire and the Restoration.

Another table, obtained just as rapidly, can show the absolute and
relative frequencies of the term for each author of the same period,
except for those authors who didn't use the term at all.  Between 1750
and 1800 the authors who made greatest use of the term are not those of
whom one first would think: the Abbe' Pre'vost (144 usages), the Abbe'
Barthe'le'my (138 usages), the Abbe' Ge'rard (104 usages), Bernardin de
Saint-Pierre (101 usages), Diderot (89 usages), Condorcet (82 usages).
This shows that, if the ideas of the "Lumie`res" were developed by
philosophers who are famous today, they were taken into the language
and popularised largely by writers who are less known or nearly
forgotten today, but who had, in their time, a very great influence,
and who played an essential role in the preparation of the Revolution.
Between 1800 and 1850, when Madame de Stae:l and Chateaubriand come to
the fore, with respectively 194 and 160 usages, the concept of
"Lumie`res" becomes fixed in the popular imagination and takes on a
mythical value.

Subject-searching is more complex, but FRANTEXT has such flexibility
that the result is excellent. If one is searching within the works of a
single author, or within a single work, it is enough to enumerate the
terms which define the theme. Thus, in a study of the color "blanche"
in _Madame Bovary_, one first automatically searches all the words
formed on the terms "blanc" and "pa^le"; to this list one may add
"blafard", "livide", "ble^me", "cadave'reux", "neige", "lumie`re",
"immacule'", "candeur". From the result appears an evolution from
lively colors, often contrasting, through to pale and livid hues,
paralleling the gradual fall of Emma into suicide and nothingness.
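
A search of this kind, all the words formed on a few stems plus a list
of related terms, might be approximated as follows. This is a sketch
and an assumption throughout: FRANTEXT's actual query language is not
shown in the article, the function names are invented, the sample
sentence is made up rather than quoted from Flaubert, and the code uses
real accented characters instead of the ASCII transliteration used in
this text.

```python
import re


def theme_pattern(stems, words):
    """Compile a regex matching any word that begins with one of `stems`,
    or that is exactly one of `words`."""
    stem_part = [re.escape(s) + r"\w*" for s in stems]
    word_part = [re.escape(w) for w in words]
    return re.compile(
        r"\b(?:" + "|".join(stem_part + word_part) + r")\b",
        re.IGNORECASE,
    )


def find_theme(text, pattern):
    """Return every matching word form, in order of appearance."""
    return [m.group(0) for m in pattern.finditer(text)]


# Stems cover inflected and derived forms (blanc, blanche, blancheur...).
pattern = theme_pattern(["blanc", "pâl"], ["blafard", "livide", "blême", "neige"])

# Invented sample sentence, for illustration only.
sample = "Ses ongles blancs, la blancheur de sa peau, un ciel pâle et la rivière livide."
matches = find_theme(sample, pattern)
print(matches)
```

Matching on stems rather than whole words is what lets a short list of
terms gather every form of the theme in one pass.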

As shown by its 218 examples, the color "blanche" has great importance
and symbolic value. At first it signifies health, propriety, elegance,
beauty, purity. When Charles sees Emma for the first time, he first
notices the whiteness of her fingernails, which surprises him; "her
hand was not beautiful, in fact it was perhaps a bit pale"; as a
counterpoint, her eyes are brown and appear to be black, her look clear
and direct, "reaching you frankly, with a strong candour". The black
here is placed into a natural opposition, as a sign of life, of force
and of beauty. A few lines further, Charles gazes at "her neck which
extended from her black collar" and "her hair, for which the two black
ribbons seemed each to be of the same piece". A bit later, after a warm
spell which has melted the snow, Charles visits Emma, of whom "her
parasol, of gently reflective silk, blocking the sun, highlighted the
white skin of her figure with the play of its reflections" (I,2). Love
and happiness are associated with an intense and alive light, primarily
one that is pale and white.

During the early part of the marriage of Charles and Emma, one finds an
entire range of colors, from black to white, in an atmosphere of
happiness. Charles, "was staring at the sunlight, passing between the
bedcovers and her blonde hair"; her eyes, "black as shadow and deep
blue as daylight, contained layers of color which, in deeper hues in
the depths of her eyes, shone with enameled brightness on the surface"

As soon as Emma breaks through the sterile dream of this imagined ideal
world, the white and the black come to oppose themselves in the usual
way, thus, "Emma wished to live...like the ladies...who passed their
days ...in watching the approach from the depths of the country a
soldier with a white plume in his hat, galloping on his black horse"
(I,6). The world then becomes more pale, less real. White carries her
away, the image of her dream: "She wandered, her desperate eyes upon
the solitude of her life, searching for any white sail in the mists of
her horizon" (I,9).

The colors follow the movements of the soul of Emma in the novel.  The
progression from dream, to boredom, to disgust with life, to despair,
is expressed by the intrusion of pallor: the washed-out tint, the
whitened day, wan shades, the pale sky, the livid river; finally the
eyes of the corpse of Emma, "disappearing in a viscous pallor".

A minute examination of the 218 occurrences obtained, with reference
back to the text of the Belles Lettres edition, for which the page is
indicated each time, allows one to enrich these few remarks, to specify
each nuance of color, and to place each in its context. Using FRANTEXT,
a researcher can obtain an exhaustive list of examples in twenty
minutes.

(Next: conclusion of Lemarignier's description of FRANTEXT, how
FRANTEXT and ARTFL may be reached online, and a new book on online
fulltext work in France.)

Jack Kessler



This is the conclusion of Jacques Lemarignier's description of
FRANTEXT, a collection of 2500 French classical fulltexts, which may be
reached online either in Europe or from ARTFL in North America. Access
instructions appear below, as does a description of some of the very
interesting work in online fulltext being done today in France.


(Lemarignier on FRANTEXT, continued:)

If a subject search is attempted over the entire FRANTEXT corpus,
serious problems arise because of the multiple meanings of certain
terms. Such is the case for a question about "WC's", which might take
one wandering through equally specialized terms such as "latrines" or
"gogues"; but then, arriving at "cabinets", one would wander off into
"ministerial cabinets" and related terms. One can, however, limit a
search of the entire corpus to the novel, poetry and the theater to
discipline the search in this sense. The sorting of texts by literary
genre often permits the limitation of word usage to the sense desired,
by isolating a given semantic field.
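
The effect of restricting a search by genre can be sketched as below.
Everything here is hypothetical: the `Text` record, the genre labels,
the `search` function and the two sample texts are invented for
illustration and say nothing about how FRANTEXT stores or tags its
corpus.

```python
from dataclasses import dataclass


@dataclass
class Text:
    title: str
    genre: str  # assumed labels, e.g. "roman", "poesie", "theatre", "traite"
    body: str


def search(corpus, word, genres=None):
    """Return titles of texts containing `word` as a whole token.

    If `genres` is given, only texts of those genres are considered.
    """
    wanted = None if genres is None else set(genres)
    return [
        t.title
        for t in corpus
        if (wanted is None or t.genre in wanted)
        and word.lower() in t.body.lower().split()
    ]


# Invented sample corpus, for illustration only.
corpus = [
    Text("Roman A", "roman", "il se rendit aux cabinets au fond du jardin"),
    Text("Traite B", "traite", "les cabinets ministeriels de cette epoque"),
]
everything = search(corpus, "cabinets")
literary = search(corpus, "cabinets", genres=["roman", "poesie", "theatre"])
print(everything, literary)
```

The unrestricted search returns both texts; limiting it to literary
genres drops the ministerial sense without changing the search term.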

In another case, however, the examples found serve to enrich the list
of words which define a theme. This is what happened with the research
of texts which might illustrate and clarify the perception which the
French have of the Arabs. About thirty words gradually were found:
"Arabe", "Turcs", "Islam", "islamisme", "Mahomet", "Maures",
"Sarrasins", "musulman", "mahome'tanisme", "Coran", "Alcoran",
"mosque'e", "he'gire", "houri", "calife", "be'douin", "se'rail",
"pacha", "eunuque", "Avicenne", "Allah", "La Mecque", "Egypte",
"Damas", "Maroc", "Tunisie", "Palmyre", "Constantinople", "Bagdad". The
terms were used in part or entirely according to the period, the
authors or the texts which particularly interested the users. The
results revealed extreme reactions with regard to the Arabs, very
favorable or very hostile, almost never neutral, and citations such as
the following:

"I never rejoice for our victories over the Arabs...I love these
people, rough, persistent, lively, the final type of primitive
societies, who, halting at mid-day, lying in the shade, beneath the
bellies of their camels, smoking their 'chibouk', scoff at our grand
civilization which quivers in its own rages." Flaubert,
_Correspondance_, August 6, 1846.

"I demand in the name of humanity the destruction of the black stone,
to throw the bits to the wind, the destruction of Mecca, and the
desecration of the tomb of Mohammed. This is the way to demoralise
fanaticism." Flaubert, _Correspondance_, March 1, 1878.

"This evening, at the home of Daudet, Larroumet spoke up curiously for
Morocco, which is the last refuge of old Islam and where torture has a
ferocious quality surpassing that of the tortures of China." Goncourt,
_Journal_, t.4, 1896 (November, 1895).

FRANTEXT cannot, certainly, respond to all needs. Its limits are those
of its corpus, which encompasses 2500 French language texts, from the
16th century to our day, and which had no original purpose other than
to offer an assortment of the French language in its different levels
and over the course of its evolution. No citation may be found which is
not from a text included in the corpus, and no subject search of a work
is possible unless that work is part of the corpus. But experience
shows that one rarely must eliminate these sorts of questions, which
proves, one more time, that the texts were remarkably well-chosen.

This database is useful in a great number of situations, and it offers
immense resources, which marvelously complement, easily and quickly,
more traditional means of research. The results please those who try
it. This brief recounting of typical questions perhaps will give
incentive, to both researchers and the merely curious, to experiment
with the riches of this resource.

(original in French by:) 

Jacques Lemarignier, Bibliothe`que Publique d'Information
Centre Georges Pompidou, 19, rue Beaubourg, 75197 Paris Cedex 04

Jacques Lemarignier may be contacted via e-mail c/o Jacques Faule
at faule@univ-rennes1.fr, or via fax to (Paris) 44-78-12-15.

The above article will appear in its French original as follows:

Jacques Lemarignier, "Le point de vue d'un interrogateur sur FRANTEXT:
FRANTEXT a` la Bibliothe`que Publique d'Information", in _Les banques
de donne'es litte'raires, comparatistes et francophones_, edited by
Alain Vuillemin, Limoges: Presses de l'Universite' de Limoges et du
Limousin, January 1993 (forthcoming)

This book contains enough that is exciting and new -- there appears to
be a great deal going on in France in online fulltext -- to make
worthwhile the listing here of its table of contents (again, with

"Avant-propos", by Jean Claude Vareille, Pre'sident de l'Universite' de

"L'informatique litte'raire: de quelques effets corollaires", by
Jacques Fontanille, Doyen de la Faculte' des Lettres et Sciences
Humaines de l'Universite' de Limoges


"La lecture assiste'e par ordinateur et la station de lecture de la
bibliothe`que de France", by Jacques Virbel, CNRS, Institut de
Recherche en Informatique de Toulouse, U.Paul-Sabatier (Toulouse I)

"Le re'seau 'litte'ratures francophones' de l'UREF et la recherche
bibliographique", by Jean-Louis Joubert, Universite' de Paris-Nord
(Paris XIII), Universite' des Re'seaux d'Expression Franc,aise,
Coordonateur du re'seau 'Litte'ratures francophones'

"Des banques de donne'es sur les e'tudes litte'raires francophones", by
Claire Panijel, URFIST de Paris-Ecole des Chartes

"Te'le'informatique et litte'rature franc,aise", by Jacques Faule,
Bibliothe`que Publique d'Information, Centre Georges Pompidou

"Banques de donne'es et recherche litte'raire: proble`mes et
perspectives", by Claude Cazale-Be'rard, U.de Paris X - Nanterre

"La boite en valise ou le poste de travail du litte'raire", by Henri
Behar, Universite' de la Sorbonne Nouvelle (Paris III)


"Les litte'ratures d'expression franc,aise", by Jacques Chevrier,
Universite' du Val-de-Marne (Paris XII)

"Le programme 'LIMAG' (litte'ratures maghre'bines)", by Charles Bonn,
Universite' de Paris-Nord (Paris XIII)

"Aux sources de 'LIMAG': regard porte' sur la cre'ation d'une banque de
donne'es", by Fe'riel Kachoukh, U.Paris-Nord (Paris XIII)

"Projet de cre'ation d'un lieu ressource dans le domaine de la
litte'rature maghre'bine d'expression franc,aise", by Fe'riel Kachoukh,
Universite' de Paris-Nord (Paris XIII)

"'LITAF': une banque de donne'es de litte'ratures africaines", by
Virginie Coulon, Universite' de Bordeaux I

"'LITAF': petit manuel pratique", by Virginie Coulon, U.Bordeaux I

"La base de donne'es bibliographique 'langue et culture en Louisiane'",
by Maguy Grassin, Universite' de Limoges

"Le point de vue d'un interrogateur: FRANTEXT a` la Bibliothe`que
Publique d'Information", by Jacques Lemarignier, Bibliothe`que Publique
d'Information, Centre Georges Pompidou

"Peut-on re'gler son compte a` la 'raison'?", by Etienne Brunet,
Universite' de Nice

"Le vert de Saint-John Perse", by Eveline Caduc, U. de Nice

"ARIEL", by Pierre Brunel, Universite' de Paris-Sorbonne (Paris IV)

"L'aventure du projet 'ARIEL' ou la gene`se de la banque de donne'es
comparatistes et francophones 'Ariel-litte'ral' de l'universite' de
Paris-Sorbonne (Paris IV) 1981-1991", by Alain Vuillemin, Universite'
de Limoges

"'SPIRIT': aide a` la constitution de bases de donne'es
bibliographiques", by Fre'de'ric Foussier, INSTN-CEA-Universite' de
Paris-Sud (Paris XI)

"Projet d'une banque de donne'es des 'exempla' me'die'vaux", by
Marie-Anne Polo de Beaulieu, CNRS


"Re'alisation partage'e d'une e'dition de texte a` distance", by
Fre'de'ric Foussier, INSTN-CEA-Universite' de Paris-Sud (Paris XI)

"'EL HADJ': une maquette de banque de donne'es litte'raires,
e'ditoriale et bilingue, en litte'rature compare'e", by Alain
Vuillemin, Universite' de Limoges

"Pour un syste`me de stylistique informatise'", by Bernard Gicquel,
Universite' du Maine

"Bases de donne'es et ge'ne'ration de textes", by Jean-Pierre Balpe,
Universite' de Paris VIII

"La banque de donne'es d'histoire litte'raire", by Michel Bernard,
Universite' de la Sorbonne-Nouvelle (Paris III)

"La base de donne'es iconographiques des vide'odisques des manuscrits
de la bibliothe`que Vaticane", by Je'ro^me Baschet, Ecole des Hautes
Etudes en Sciences Sociales (Paris)

FRANTEXT at the BPI, Paris

FRANTEXT may be consulted at the BPI library at the Centre Pompidou,
Paris (first floor, Bureau 8 - literature), for a fee, from 1 to 5 pm.
Responses are printed out.

FRANTEXT in Europe

Subscriptions are available from the Institut de la Langue Franc,aise,
Tre'sor Ge'ne'ral des Langues et Parlers Franc,ais, (Centre National
de la Recherche Scientifique), 52, boulevard de Magenta, 75010 Paris,
telephone (Paris) 42-45-00-77. FRANTEXT's own publicity lists 183
million word-occurrences, 2330 works, 3241 "treated texts", of which
20% are non-literary texts taken from 70 disciplines from the 19th and
20th centuries, 900 authors, 450 publishers, and 53 operating
public-access sites in addition to the BPI, including sites throughout
Europe and in Japan.

FRANTEXT in North America -- the ARTFL database and service

ARTFL is the "North American antenna" for Frantext, according to its
director, Mark Olson (e-mail: mark@gide.uchicago.edu, telephone:
312-702-8488). It contains a copy of the FRANTEXT database, which it
makes available via telnet (to artfl.uchicago.edu) to subscribers (US$
500 per year -- 40 major campuses currently are subscribed) together
with a special, improved interface, and e-mail, ftp, and offline
photocopying services. Olson freely distributes extensive user
documentation and a good bibliography on his ARTFL service and on the
general FRANTEXT concept.

n.b. Does any of this have SGML markup? How friendly is it to ASCII?
How easy is it to get to? (You can get to it on both Minitel and the
Internet: that's pretty easy.) How inexpensive will it be?  Will
certain types of scholar prefer it to the printed book? Will certain
types of reader? I've neither answered nor frankly asked any of these
questions yet of FRANTEXT. But it is interesting that it's there, and
that it already is as accessible as it apparently is. It's not the only
thing becoming available now in France in online fulltext, moreover, as
the headings shown above from the book in which M. Lemarignier's
article will appear indicate.


