
LEME

Lexicons of Early Modern English

Lemmatizing

LEME very much respects how the Early Modern English thought about their own language, although the distinctive features of that thinking are quite different from those of contemporary lexicography. Here we link the two lexical systems because English is one language. That link is the LEME lexeme element held by the <form> and <xpln> tags, which ties Early Modern English spellings to the corresponding modern spellings (and parts of speech) of headwords in the Oxford English Dictionary. We did not design our own spelling system for Early Modern English because we do not speak the language of its contemporaries. If John Hart had no standing to persuade Elizabeth’s council to accept his new orthography, how can we? Scores of OED lexicographers, over a century and a half, have already devised headwords for the history of English in all periods and places. Today, tools and historical scholarship enable us to know more about the history of Early Modern English than its speakers could have.

How could we lemmatize consistently? Manual look-up, even in the OED database, was time-consuming and error-prone. In 2015 the OED itself offered us, without our asking, the leverage to do so: an Excel file of all 97,800 headwords found to be active in Early Modern English between 1475 and 1625, each accompanied by first and last dates of occurrence, innovating author, number of quotations, and entry URL. With this, a program could lemmatize most old spellings correctly. That program was written in 2018-19 by Xueqi (Sherry) Fan.
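To make the idea of lemmatizing against a dated headword list concrete, here is a minimal sketch in Python. It is not the LEME lemmatizer: the class and function names, the spelling rules (u/v, i/j, i/y, final -e), and the sample rows with their placeholder dates are all assumptions made for illustration only. It shows one plausible approach, which is to reduce both old spellings and modern headwords to a shared comparison key, then accept a match only when exactly one headword is active in the year of the source text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Headword:
    """One row of a headword list (hypothetical field names)."""
    lemma: str       # modern headword spelling
    pos: str         # part of speech
    first_date: int  # first year of occurrence
    last_date: int   # last year of occurrence


def spelling_key(word: str) -> str:
    """Collapse a few old-spelling variations into one comparison key.

    Illustrative rules only (assumptions, not LEME's actual rules):
    v -> u, j -> i, y -> i, and a final -e dropped from longer words.
    """
    key = word.lower()
    key = key.replace("v", "u").replace("j", "i").replace("y", "i")
    if len(key) > 3 and key.endswith("e"):
        key = key[:-1]
    return key


def build_index(headwords):
    """Group headwords by the comparison key of their modern spelling."""
    index = {}
    for hw in headwords:
        index.setdefault(spelling_key(hw.lemma), []).append(hw)
    return index


def lemmatize(spelling: str, year: int, index) -> Optional[Headword]:
    """Return the single headword active in `year` that matches `spelling`.

    Returns None when there is no match or more than one, so that
    unresolved spellings can be left for a human editor.
    """
    candidates = [
        hw for hw in index.get(spelling_key(spelling), [])
        if hw.first_date <= year <= hw.last_date
    ]
    return candidates[0] if len(candidates) == 1 else None


# Invented sample rows; the dates are placeholders, not OED data.
SAMPLE = [
    Headword("love", "n.", 1475, 1625),
    Headword("joy", "n.", 1475, 1625),
    Headword("under", "prep.", 1475, 1625),
]
INDEX = build_index(SAMPLE)
```

With this sample, `lemmatize("loue", 1590, INDEX)` resolves to the headword `love` and `lemmatize("vnder", 1590, INDEX)` to `under`, while an unknown spelling or a year outside a headword's date range yields `None`. Returning `None` for ambiguous cases is a deliberate choice in the sketch: a program of this kind can resolve most old spellings but not all of them.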

Ed. Ian Lancashire and Isabel Zhu, with contributions from Julia DaSilva, Paramita Dutta, Xueqi Fan, Sky Li, Kristie Lui, Annika Sparrell, Timothy Aberdingk Thijm, and Shirley Wang
© 2025 Ian Lancashire