A Welcome and a Caution (2024)

Ian Lancashire, editor,
Isabelle Zhu, assistant editor,

with contributions from

Julia DaSilva, Paramita Dutta, Xueqi (Sherry) Fan, Sharine Leung, Sky Li, Kristie Liu, Tim Alberdingk Thijm, and Shirley Wang

Lexicons of Early Modern English (LEME) is a research project about Early Modern English lexicography and lexicology. The database and corpus, their survey of language texts, the mapping of headwords in LEME texts to those in the OED, historical inquiry into how the monarch and his advisers determined what word-books should be published, and above all the structure of a neglected genre – the word-entry – are the subjects of this research program.[1] LEME texts came online as early as 1996 in the Early Modern English Dictionaries Database (EMEDD) as a response to interest from other researchers. Today LEME texts hold over 1.1 million word-entries in over 30,000 pages of text from 1475 to 1755. Coverage begins with late medieval manuscript vocabularies (legal, chemical, herbal, and Latin) and closes with Samuel Johnson’s great dictionary in 1755. Increased demand for the encoded texts themselves has led to the publication of the first part of the corpus, about half of more than 400 language texts surviving from 1475 to 1625 alone, excluding re-editions.

Only two dozen of these texts are dictionaries, half of them bilingual Latin-English in nature, six of them bilingual French-, Italian-, Spanish-, Welsh-, and Anglo-Saxon-English, and only three of them monolingual English hard-word glossaries derived from bilingual texts. Most LEME texts in this period are of different genres: Latin-English grammars, hard-word glossaries, herbals, spelling lists, glossed texts, collections of definitions, concordances, and nearly two hundred antiquarian manuscript essays about etymologies delivered before the Elizabethan Society of Antiquaries. What do these texts have in common? They are all collections of word-entries, variously structured, differently purposed. Tudor-Stuart English education loved words, and its young men spent years studying them to the exclusion of most other useful arts and sciences. Educators, lawyers, secretaries, courtiers, entertainers, and even tradesmen earned their living by their ability with words.

Studying historical languages in grammar school opened up to literate English people four centuries ago an amazing network linking time-present English words (whether taken consciously from Latin and Greek, or descending from them, or simply taken from other current vernaculars) with such time-past words. John Florio’s title, Worlde of Words (1598, 1611), memorably announces a revelation of the lexical entanglement of English with other tongues, driven by the general borrowing of foreign words by English speakers and writers. Borrowings led to the study of etymology, a tool used by scholars to uncover the past of everything that had a name.

LEME is hardly alone in building an edifice of language texts in this period to assist research. Jürgen Schäfer in Early Modern English Lexicography (OUP, 1989) analyzed hard-word glossaries up to 1640. He rightly claimed that they contributed many new words to English, thousands more than (say) Shakespeare had. However, Schäfer did not analyze multilingual lexical works, which map classical and modern continental languages to English, or spelling lists, herbals, and definitions. Nor did he cover the period 1641-1755 or release his digital texts. The Salamanca Corpus also disseminates diachronic dialect texts in digital form from ca. 1500 to 1950, most of them later than the Early Modern period. The English Dialect Dictionary (1898-1905) project, recently placed online by Manfred Markus, creates a searchable database of a great late-nineteenth-century dictionary based on the work of antiquarians going back to the Early Modern period. The Seville Corpus of Northern English includes texts up to the sixteenth century, most very early. The original of diachronic corpora is of course the Helsinki Corpus, which has inspired other Finnish projects such as the Corpus of Early English Correspondence, the Corpus of Early English Medical Writing, the Helsinki Corpus of Older Scots, and the Helsinki Corpus of British English Dialects. Another early diachronic example is the Lampeter Corpus of Early Modern English Tracts. Varieng at the University of Helsinki offers up-to-date information on all these. The making and use of corpora dominate linguistics research in Europe but are less common in America. However, LEME is a close relative of the Early English Books Online / Text Creation Partnership (EEBO/TCP) because both have digitized lexical STC and Wing materials. Like EEBO, ECCO releases images of large 18th-century dictionaries, such as ones by John Kersey, Nathan Bailey, Samuel Johnson, and Joseph Nicol Scott, but not their encoded texts, as LEME does.

LEME does not exist to collect and disseminate early lexical texts. It is a tool to understand their word-entries. Attending closely to word-entries of the time enables us to recognize how different the Early Modern view of language was from that of the past two centuries. The alien quality of the Early Modern view of language is betrayed by two oddities of usage: grammarians said that nouns were names for things, and rhetoricians explained that only things could be defined. How then could words be explained or defined? Where was word-meaning? Today we go to dictionaries to understand what a headword means, and we find an answer in a description of the non-word thing it names and that ideation forms in our minds. The Early Modern English, however, did not confuse the meaning of a thing denoted or named by a noun with the meaning of that noun. Thomas Wilson in his vade-mecum Art of Logic (1550) solved this conundrum for us by explaining that words were things too (!) and thus, like the other things that they named, they could themselves be defined. Yet the definition of a word consisted only of other words. Wilson, following a humanist tradition, averred that the perfect definition of a word, in fact, was its etymology, the word from which it ultimately descended. The unpacking of what appeared alien lay in the structure of the word-entry’s double definition. This was a popular theme in the decades that followed.[2] It is perhaps no surprise that practical kings and counsellors had little enthusiasm for delving into ancient word-histories by publishing monolingual English dictionaries. Even Sir Philip Sidney thought that it was absurd to think that his contemporaries needed a dictionary to teach them their own language. We need not muse for long why the etymon-obsessed Elizabethan Society of Antiquaries did not last long in the reign of James I.

This distinctly unmodern view of language has implications for anyone studying it and its literature. For one, it calls into question any gloss of an Early Modern English word that assumes our own ideas about word-entries and meaning. We are free to impose on long-dead Early Moderns our own ideas of how word-entries worked, but should we? For another, it requires an encoding system for LEME lexical texts that respects the beliefs of those alive four hundred years ago.

Before I leave you to these texts, a word about why LEME is giving them away. LEME is no free transcription service. Our federal research agency does not require us to give our research materials away, although it probably will do so soon. Yet how could anyone restrict access to a corpus whose texts are uncopyrighted and whose semantics are archaic? I applied to SSHRC in 2016 to open up LEME and accordingly adopted a very generous Creative Commons license (Attribution 4.0), which gives everyone (commercial and research users) free access to the XML-encoded texts and the freedom to develop them. Our first shared encoded text has been my TEI-encoded transcription of Samuel Johnson’s dictionary (1755), which I gave to the University of Central Florida’s NEH-funded Samuel Johnson edition in late December 2019.

You too are most welcome to use our materials. Only remember that the cui bono[3] that launched LEME was those who wanted to learn from research about Early Modern English.

Ian Lancashire

Toronto

21 January 2020

[1] Thirty-one essays document this program (1992-2019), which gradually unravels a theory of Early Modern word-entries that suggests a pervasive, unsettling anachronism in our understanding.

[2] For example, Blundeville 1599.

[3] Latin “as a benefit to whom?”