
LEME

Lexicons of Early Modern English

LEME XML Tagset

Structural Elements

Root

Attributes: NONE

Children:

  • Structural: <leme>
  • Textual: NONE

The <root> element is the document element of LEME-XML: as the XML standard requires, it is the single top-level element, and all other elements are nested inside it. Because LEME-DB already treats <leme> as the single top-level element of all LEME-DB elements, <root> may contain only one <leme> element.

Leme

Attributes: @no: string

Children:

  • Structural: <section> <wordentry> <wordgroup1> <heading> <closing>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <set> <sic> <term> <xref>

The <leme> element sits directly beneath <root> and contains all the other elements of LEME-XML. It has a single attribute, @no, which should contain a number taken from the text’s bibliographic page.
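The two top-level rules above (one <leme> inside <root>, and @no on <leme>) can be checked mechanically. The following is a minimal sketch using Python’s standard xml.etree.ElementTree module; the function check_top_level and the sample @no value are hypothetical illustrations, not part of the LEME tools.

```python
import xml.etree.ElementTree as ET


def check_top_level(xml_text):
    """Return a list of problems with the <root>/<leme> structure (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "root":
        problems.append(f"document element is <{root.tag}>, expected <root>")
    # <root> may contain exactly one <leme> and nothing else.
    lemes = [child for child in root if child.tag == "leme"]
    if len(lemes) != 1:
        problems.append(f"<root> contains {len(lemes)} <leme> elements, expected 1")
    others = [child.tag for child in root if child.tag != "leme"]
    if others:
        problems.append(f"<root> also contains {others}")
    # <leme> must carry @no.
    for leme in lemes:
        if "no" not in leme.attrib:
            problems.append("<leme> is missing its @no attribute")
    return problems


# Hypothetical skeleton; the @no value is invented.
print(check_top_level('<root><leme no="0"><section type="title"/></leme></root>'))  # []
```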

Section

Attributes: @type?: string

Children:

  • Structural: <wordentry> <wordgroup1> <heading> <closing>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <section> element surrounds a generic section of the dictionary, such as a title page, preface or dedication. The section’s type may optionally be given in the @type attribute.

Set

Deprecated: do not include

Attributes: @tag: string, EITHER @lang: langstr OR @font: fontface

Children: NONE

The <set> element provides initial information on the global parameters of the document. It sets the default language (with a @lang attribute) or default font (with a @font attribute) for the given @tag for the rest of the document. If no <set> appears, the LEME defaults are used (per the database processor).

Wordgroup1

Attributes: @type: grouptype, @lang?: langstr, @object?: string

Children:

  • Structural: <wordentry> <wordgroup2> <alpha> <heading>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <wordgroup1> element surrounds a division of the dictionary, such as the words beginning with the letter “A”. Its @type attribute specifies what kind of grouping is being made (see grouptype). The element may also have an optional @lang attribute, giving the language of the enclosed content, and an optional @object attribute, which typically contains an editorially spelled uppercase form of the group’s header, such as “A”. This element may contain word-entries directly, or it may nest them inside <wordgroup2> elements. If a word-group’s type is alphabetical, its first element should generally be an <alpha> element.
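Because the <alpha> rule is only a general one, a checker is best treated as producing warnings. The sketch below, using Python’s xml.etree.ElementTree, lists alphabetical word-groups (grouptype “a” or “alphabetic”) that are not opened by an <alpha>. It also accepts a <heading> that wraps the <alpha>, since the Heading entry allows that arrangement. The function names and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET

ALPHABETIC = {"a", "alphabetic"}  # grouptype values for alphabetical word-groups


def starts_with_alpha(group):
    """True if the group's first child is <alpha>, or a <heading> containing <alpha>."""
    children = list(group)
    if not children:
        return False
    first = children[0]
    if first.tag == "alpha":
        return True
    return first.tag == "heading" and first.find("alpha") is not None


def groups_missing_alpha(leme):
    """Return the @object (or '?') of alphabetical word-groups not opened by <alpha>."""
    missing = []
    for tag in ("wordgroup1", "wordgroup2", "wordgroup3"):
        for group in leme.iter(tag):
            if group.get("type") in ALPHABETIC and not starts_with_alpha(group):
                missing.append(group.get("object", "?"))
    return missing


# Invented sample, for illustration only.
sample = ET.fromstring(
    '<leme no="0">'
    '<wordgroup1 type="a" object="A"><alpha>A</alpha><wordentry type="h"/></wordgroup1>'
    '<wordgroup1 type="alphabetic" object="B"><wordentry type="h"/></wordgroup1>'
    "</leme>"
)
print(groups_missing_alpha(sample))  # ['B']
```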

Wordgroup2

Attributes: @type: grouptype, @lang?: langstr, @object?: string

Children:

  • Structural: <wordentry> <wordgroup3> <alpha> <heading>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <wordgroup2> element is a subdivision of <wordgroup1>. In all other respects it is identical to <wordgroup1>. It may only occur inside a <wordgroup1> tag.

Wordgroup3

Attributes: @type: grouptype, @lang?: langstr, @object?: string

Children:

  • Structural: <wordentry> <alpha> <heading>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <wordgroup3> element is a subdivision of <wordgroup2>. In all other respects it is identical to <wordgroup2>. It may only occur inside a <wordgroup2> tag.
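The nesting rule for the three word-group levels can be expressed as a required-parent table. This is a minimal sketch in Python’s xml.etree.ElementTree; misplaced_wordgroups is a hypothetical helper, and it assumes “inside” means the immediate parent, which matches the child lists given above.

```python
import xml.etree.ElementTree as ET

# Each nested word-group level may appear only directly inside the level above it.
REQUIRED_PARENT = {"wordgroup2": "wordgroup1", "wordgroup3": "wordgroup2"}


def misplaced_wordgroups(root):
    """Return (child, parent) tag pairs that break the word-group nesting rule."""
    errors = []
    for parent in root.iter():
        for child in parent:
            required = REQUIRED_PARENT.get(child.tag)
            if required is not None and parent.tag != required:
                errors.append((child.tag, parent.tag))
    return errors


good = ET.fromstring(
    '<leme no="0"><wordgroup1 type="a"><wordgroup2 type="a">'
    '<wordgroup3 type="a"/></wordgroup2></wordgroup1></leme>'
)
bad = ET.fromstring(
    '<leme no="0"><wordgroup1 type="a"><wordgroup3 type="a"/></wordgroup1>'
    '<wordgroup2 type="a"/></leme>'
)
print(misplaced_wordgroups(good))  # []
print(misplaced_wordgroups(bad))   # [('wordgroup2', 'leme'), ('wordgroup3', 'wordgroup1')]
```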

Wordentry

Attributes: @type: entrytype, @joinnext?: string, @anchor?: string

Children:

  • Structural: <form> <xpln>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <wordentry> element surrounds each complete word-entry. Its @type attribute identifies the entry’s function (see entrytype). Its two optional attributes link word-entries. @joinnext takes a positive integer n (1 or greater) and marks the current word-entry and the following n entries as related, or “joined”; only the first entry in the sequence should carry @joinnext. No text should appear between joined word-entries outside the <wordentry> elements and their children. @anchor gives a string value that an <xref> element can use to identify this word-entry (see Xref).
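To make the @joinnext arithmetic concrete: joinnext="2" on one entry joins it with the next two, giving a run of three. The sketch below groups sibling <wordentry> elements into such runs using Python’s xml.etree.ElementTree. It assumes joined entries share the same parent element; joined_runs and the placeholder forms are hypothetical.

```python
import xml.etree.ElementTree as ET


def joined_runs(parent):
    """Group the <wordentry> children of `parent` into runs linked by @joinnext.

    An entry with joinnext="n" is joined with the n entries that follow it;
    any entry without @joinnext forms a run of one.
    """
    entries = [child for child in parent if child.tag == "wordentry"]
    runs, i = [], 0
    while i < len(entries):
        raw = entries[i].get("joinnext")
        n = 0 if raw is None else int(raw)
        if raw is not None and n < 1:
            raise ValueError(f"@joinnext must be 1 or greater, got {raw!r}")
        runs.append(entries[i : i + n + 1])
        i += n + 1
    return runs


# Placeholder entries, for illustration only.
sample = ET.fromstring(
    '<wordgroup1 type="a">'
    '<wordentry type="h" joinnext="2"><form>a</form></wordentry>'
    '<wordentry type="h"><form>b</form></wordentry>'
    '<wordentry type="h"><form>c</form></wordentry>'
    '<wordentry type="h"><form>d</form></wordentry>'
    "</wordgroup1>"
)
print([[e.find("form").text for e in run] for run in joined_runs(sample)])
# [['a', 'b', 'c'], ['d']]
```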

Form

Attributes: @lang?: langstr, @number?: string, @location?: EITHER “text” OR “margin”, @lexeme?: string, @type?: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <form> element contains the headword or headwords of the word-entry. It takes several optional attributes: the common @lang attribute; a deprecated @number attribute; @location, which specifies whether the form appears in the main body of the text or in the margin (the default is “text”); @lexeme, which lists the modern dictionary lexemes associated with the form, separated by “|” characters; and a deprecated @type attribute, whose only expected use is to mark a form given as an erratum for another.

Although every <wordentry> should contain one <form> element, the schema allows the <form> to be omitted for compatibility reasons. However, LEME-DB may not correctly process a LEME document in which a <wordentry> lacks a <form>.
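Since the schema will not catch a missing <form>, a separate check is useful before loading a document into LEME-DB. The sketch below, in Python’s xml.etree.ElementTree, finds such entries and splits a <form>’s @lexeme value on “|”. The helper names and the sample entries and lexeme values are invented for illustration; the exact spelling of LEME lexeme values is not specified here.

```python
import xml.etree.ElementTree as ET


def entries_without_form(root):
    """Return every <wordentry> that has no direct <form> child."""
    return [entry for entry in root.iter("wordentry") if entry.find("form") is None]


def form_lexemes(form):
    """Split a <form>'s @lexeme value on "|" into a list of lexemes."""
    value = form.get("lexeme", "")
    return [lexeme.strip() for lexeme in value.split("|") if lexeme.strip()]


# Invented entries, for illustration only.
sample = ET.fromstring(
    '<wordgroup1 type="a">'
    '<wordentry type="h"><form lexeme="abandon|forsake">Abandon</form>'
    "<xpln>to forsake</xpln></wordentry>"
    '<wordentry type="g"><xpln>an entry with no form</xpln></wordentry>'
    "</wordgroup1>"
)
print(len(entries_without_form(sample)))     # 1
print(form_lexemes(sample.find(".//form")))  # ['abandon', 'forsake']
```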

Xpln

Attributes: @lang?: langstr, @location?: EITHER “text” OR “margin”, @lexeme?: string, @type?: string

Children:

  • Structural: <subform> <subxpln>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <xpln> element contains the general explanation of the word-entry’s headword(s). It takes the same @lang, @location, @lexeme and @type attributes as <form>, used in the same way. Unlike <form>, however, <xpln> may also contain <subform> and <subxpln> elements, which pair up related information within the explanation.
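One way a processor might read <subform>/<subxpln> pairs out of an explanation is sketched below with Python’s xml.etree.ElementTree. It rests on an assumption not stated in this tagset: that each pair is encoded as a <subform> followed by the <subxpln> that explains it. The function subentry_pairs and the placeholder text are hypothetical.

```python
import xml.etree.ElementTree as ET


def subentry_pairs(xpln):
    """Pair each <subform> with the next <subxpln> among an <xpln>'s children.

    Assumes (hypothetically) that each pair is encoded as a <subform>
    followed by its <subxpln>; unmatched elements are skipped.
    """
    pairs, pending = [], None
    for child in xpln:
        if child.tag == "subform":
            pending = "".join(child.itertext())
        elif child.tag == "subxpln" and pending is not None:
            pairs.append((pending, "".join(child.itertext())))
            pending = None
    return pairs


# Placeholder content, for illustration only.
xpln = ET.fromstring(
    "<xpln>general sense; <subform>form one</subform> <subxpln>sense one</subxpln>; "
    "<subform>form two</subform> <subxpln>sense two</subxpln></xpln>"
)
print(subentry_pairs(xpln))  # [('form one', 'sense one'), ('form two', 'sense two')]
```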

Subform

Attributes: @lang?: langstr

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <subform> element is a simplified version of the <form> element that appears within an <xpln> element. Unlike <form>, it may take only a @lang attribute.

Subxpln

Attributes: @lang?: langstr

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <subxpln> element is a simplified version of the <xpln> element that appears within an <xpln> element. Unlike a regular <xpln>, it may take only a @lang attribute.

Heading

Attributes: NONE

Children:

  • Structural: <alpha>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <heading> element contains text that acts as a heading, i.e., text that begins a section of the dictionary. It should always be the first element of the containing <section> or word-group. A heading may contain an <alpha> element, but need not; generally, if an <alpha> element would be the only content of the <heading>, the <heading> tag is unnecessary.

Closing

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <closing> element contains text that acts as a closing, i.e., ending a section of the dictionary. It should always be the last element of the containing <section>.

Alpha

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <alpha> element contains text that typically serves as the heading of a word-group element (<wordgroup1>, <wordgroup2> or <wordgroup3>) whose contents are organized alphabetically. In those cases, its content will typically be the section identifier, e.g. “A” or “AB”. It has no attributes and contains only textual elements.

Textual Elements

Blockquote

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <blockquote> element contains some quoted text, formatted as a block quote. It is rarely used.

Br

Deprecated: do not include

Attributes: NONE

Children: NONE

The <br> element identifies a line break in the text. Line breaks are already conveyed by the spacing of the text itself, so the tag is unnecessary.

Cit

Attributes: @work?: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <cit> element contains a citation or bibliographic reference. The @work attribute optionally provides the name of the source work. It is rarely used.

Class

Attributes: @type: classtype, @lang?: langstr

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <class> element classifies a word-entry using the given @type (see classtype). It is generally non-essential. It should normally not have a @lang attribute; the exception is when @type is “etymlang”, in which case @lang is required, although <etymlang> is preferred in that case.

<class> elements should generally appear only within <form> and <xpln> elements. In this respect the LEME-XML schema is more flexible than LEME-DB.

Damage

Attributes: @type?: string, @source?: string, @extent?: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <damage> element identifies damaged text. The optional attributes @type, @source and @extent describe the nature of the damage. The @type attribute gives what the damaged text may say, although generally it will simply be marked “unclear”; if the text is legible, <emend> is preferred. The @extent attribute estimates how many letters are affected by the damage.

Editoraddition

Attributes: NONE

Children:

  • Structural: <subform> <subxpln>
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <editoraddition> element identifies text that has been added by the LEME editor, usually to supply text that the original document elides. For example, if two entries share the same explanation but the original prints it only once, the explanation stays in the first entry and this tag may be used to include it in the second.

Emend

Attributes: @err: string, @source?: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <emend> element identifies text corrected by the editor. The @err attribute specifies what the incorrect original text looked like, while the actual contents of the <emend> element should be the corrected form. The @source attribute describes where the correction comes from. Generally, <emend> should not have any children.

Etym

Attributes: @lang: langstr

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <etym> element identifies etymological information in a word-entry; its @lang attribute identifies the language of the words concerned. <etym> should be used only within <form>, <xpln>, <subform> and <subxpln> elements, although some very old documents do not follow this rule. New documents should also avoid placing <etym> in <form> or <subform> elements, and should instead place it at the start of the <xpln> or <subxpln> element.
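The two placement rules above (where <etym> is allowed, and where new documents should put it) can be checked by walking up from each <etym> to its nearest <form>, <xpln>, <subform> or <subxpln> ancestor. This is a sketch in Python’s xml.etree.ElementTree, which has no parent pointers, so a child-to-parent map is built first. check_etym_placement is hypothetical, and lang="x" in the usage sample is a placeholder, not a real LEME language string.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"form", "xpln", "subform", "subxpln"}  # where <etym> may appear
PREFERRED = {"xpln", "subxpln"}                    # where new documents should put it


def check_etym_placement(root):
    """Return (errors, warnings) lists of misplaced <etym> elements.

    errors:   <etym> with no <form>, <xpln>, <subform> or <subxpln> ancestor.
    warnings: <etym> whose nearest such ancestor is <form> or <subform>.
    """
    parent = {child: p for p in root.iter() for child in p}  # child -> parent map
    errors, warnings = [], []
    for etym in root.iter("etym"):
        node = parent.get(etym)
        while node is not None and node.tag not in ALLOWED:
            node = parent.get(node)
        if node is None:
            errors.append(etym)
        elif node.tag not in PREFERRED:
            warnings.append(etym)
    return errors, warnings


# Placeholder sample; lang="x" is not a real language string.
sample = ET.fromstring(
    '<wordentry type="h">'
    '<form>A <etym lang="x">in form</etym></form>'
    '<xpln><etym lang="x">in xpln</etym> text</xpln>'
    '<etym lang="x">outside</etym>'
    "</wordentry>"
)
errors, warnings = check_etym_placement(sample)
print([e.text for e in errors], [e.text for e in warnings])  # ['outside'] ['in form']
```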

Etymlang

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <etymlang> element identifies a linguistic abbreviation used in an etymological context. It is equivalent to, and preferred over, the <class type="etymlang"> element. It generally surrounds the name of the language (preferred) but may also precede it.

Expan

Attributes: @type?: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <expan> element identifies an abbreviation expanded by the editor. The @type attribute contains the original, abbreviated form, while the content of <expan> should be the expanded form of the abbreviation.
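Because <emend> keeps the original in @err and <expan> keeps it in @type, with the editor’s reading as content in both cases, a processor can produce either an edited or a diplomatic (as-printed) text from the same markup. The sketch below shows this with Python’s xml.etree.ElementTree. The function reading and the sample sentence are hypothetical; every other element simply contributes its text.

```python
import xml.etree.ElementTree as ET

# Attribute that records what the original printed, for each editorial element.
ORIGINAL_ATTR = {"emend": "err", "expan": "type"}


def reading(elem, diplomatic=False):
    """Flatten `elem` to plain text.

    diplomatic=False: edited text (the content of <emend> and <expan>).
    diplomatic=True:  text as printed (@err of <emend>, @type of <expan>).
    """
    parts = [elem.text or ""]
    for child in elem:
        attr = ORIGINAL_ATTR.get(child.tag)
        if diplomatic and attr is not None and attr in child.attrib:
            parts.append(child.get(attr))
        else:
            parts.append(reading(child, diplomatic))
        parts.append(child.tail or "")
    return "".join(parts)


# Invented sample, for illustration only.
xpln = ET.fromstring(
    '<xpln>to <emend err="forsakc">forsake</emend> <expan type="ye">the</expan> world</xpln>'
)
print(reading(xpln))                   # to forsake the world
print(reading(xpln, diplomatic=True))  # to forsakc ye world
```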

Expression

Deprecated: use <term> instead

Attributes: @lang: langstr

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <expression> element identifies foreign text of the @lang language. It is equivalent to <term>.

F

Attributes: @type: fontface

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <f> element identifies the current font face of the document, as given by its @type attribute (see fontface). Because LEME-DB does not require <f> tags to be closed, the LEME-DB to LEME-XML preprocessor makes most <f> tags in LEME-XML empty, so that the output is well-formed. Since a font face generally carries a semantic meaning, it is considerably more useful to identify that meaning (etymology, foreign word, inflection, marginal note) than simply to record the face.
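When migrating from empty <f> markup to semantic elements, it helps to recover which stretches of text were set in which face. The sketch below does this with Python’s xml.etree.ElementTree, under two stated assumptions: each empty <f type="..."/> switches the face for all following text until the next <f>, and text before the first <f> is in a default face (“r” here). font_runs and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def font_runs(elem, default="r"):
    """Split the text under `elem` into (fontface, text) runs in document order.

    Assumes each <f type="..."/> switches the face for all following text until
    the next <f>; the starting face `default` is also an assumption.
    """
    runs = []
    face = default

    def emit(text):
        if text:
            runs.append((face, text))

    def walk(node):
        nonlocal face
        if node.tag == "f":
            face = node.get("type", face)
        emit(node.text)
        for child in node:
            walk(child)
            emit(child.tail)

    walk(elem)
    return runs


# Invented sample, for illustration only.
xpln = ET.fromstring('<xpln><f type="r"/>to forsake, <f type="i"/>relinquere</xpln>')
print(font_runs(xpln))  # [('r', 'to forsake, '), ('i', 'relinquere')]
```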

Hungword

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: NONE

The <hungword> element is a simple tag to identify portions of the text where a word or words extend over the end of the line and are noted below by the lexicographer. It may contain nothing but text.

I

Deprecated: use <f> instead

Attributes: NONE

Children: NONE

The <i> element identifies a section of italicized text. Like <f>, and unlike in LEME-DB, it is closed in LEME-XML so that the document remains well-formed. It is deprecated.

Infl

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <infl> element identifies inflections given in the text. Typically, an <infl> element will contain only text, but occasionally <emend> or <expan> tags.

Lemeformat

Attributes: NONE

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <lemeformat> element identifies a quotation formatted as verse. It marks sequences whose lineation the LEME database processor reproduces literally, exactly as it occurs between the tags.

Lemenote

Attributes: @type?: “display”

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <lemenote> element is a note left by the editor for the word-entry. The @type attribute is optional, and only supports a “display” type. It should generally only appear inside a <wordentry>.

Lemepagenote

Attributes: @type?: “display”

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <lemepagenote> element is a note left by the editor for comments on this particular page. The @type attribute is optional, and only supports a “display” type. It should generally only appear outside <wordentry> tags.

Note

Attributes: @lang: langstr, EITHER @type: notetype OR @location: notetype

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <note> element contains a note by the lexicographer on the text. The @type and @location attributes are used interchangeably and both identify where the note appears on the page (see notetype). Generally, a <note> tag should only appear inside a structural element.

Ornament

Attributes: @type: string

Children: NONE

The <ornament> element describes an image or ornamentation that appears on the page. The @type attribute contains a very brief description of the ornament.

P

Deprecated: do not include

Attributes: NONE

Children: NONE

The <p> element identifies a paragraph in the text. Paragraphs are already conveyed by the spacing of the text itself, so the tag is unnecessary.

Pb

Attributes: @no?: string, @pg?: string, @fol?: string, @sig?: string

Children: NONE

The <pb> element identifies a page break in the text. Whereas LEME-DB uses <page> tags, LEME-XML uses <pb> tags to conform to XML standards. The four optional attributes @no, @pg, @fol and @sig describe the number, page, folio and signature of the page, respectively. The @sig attribute is preferred for encoding location.
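Since <pb> elements are empty milestones, the page on which a word-entry begins is the last <pb> that precedes it in document order. The sketch below, in Python’s xml.etree.ElementTree, applies that idea and labels pages by @sig when present, as recommended above. The fallback order @pg, @fol, @no is an assumption, and the helper names and sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def page_label(pb):
    """Label a <pb> by @sig if present, otherwise by @pg, @fol or @no (in that order)."""
    for attr in ("sig", "pg", "fol", "no"):
        value = pb.get(attr)
        if value:
            return f"{attr}={value}"
    return None


def entries_by_page(leme):
    """Pair each <wordentry> with the label of the last <pb> before it starts."""
    current, result = None, []
    for elem in leme.iter():  # document order
        if elem.tag == "pb":
            current = page_label(elem)
        elif elem.tag == "wordentry":
            result.append((current, elem))
    return result


# Invented sample, for illustration only.
leme = ET.fromstring(
    '<leme no="0"><pb sig="A2r"/><wordentry type="h"><form>a</form></wordentry>'
    '<pb pg="5"/><wordentry type="h"><form>b</form></wordentry></leme>'
)
print([(label, e.find("form").text) for label, e in entries_by_page(leme)])
# [('sig=A2r', 'a'), ('pg=5', 'b')]
```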

Sic

Deprecated: use <emend> instead

Attributes: @corr: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

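The <sic> element identifies incorrect text that appears in the dictionary, with the correction given in its @corr attribute. It records the same information as <emend>, but the other way round: <sic> contains the original, incorrect text and gives the correction in @corr, whereas <emend> contains the correction and gives the original in @err. It is therefore deprecated.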
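Because <sic> and <emend> hold the same two readings in opposite places, converting older markup is a matter of swapping content and attribute. The sketch below does this in place with Python’s xml.etree.ElementTree. It handles only <sic> elements containing plain text and reports the rest for manual review; sic_to_emend and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def sic_to_emend(root):
    """Rewrite deprecated <sic corr="X">err</sic> in place as <emend err="err">X</emend>.

    Only <sic> elements containing plain text are converted; the number of
    <sic> elements left untouched (because they have child elements) is returned.
    """
    skipped = 0
    for sic in list(root.iter("sic")):
        if len(sic):  # has child elements: leave for manual review
            skipped += 1
            continue
        original, corrected = sic.text or "", sic.get("corr", "")
        sic.tag = "emend"
        sic.attrib.clear()  # <sic> carries only @corr
        sic.set("err", original)
        sic.text = corrected
    return skipped


# Invented sample, for illustration only.
xpln = ET.fromstring('<xpln>to <sic corr="forsake">forsakc</sic> all</xpln>')
print(sic_to_emend(xpln))                     # 0
print(ET.tostring(xpln, encoding="unicode"))  # <xpln>to <emend err="forsakc">forsake</emend> all</xpln>
```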

Term

Attributes: @lang: langstr

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <term> element identifies foreign text of the @lang language. Typically, it identifies when a temporary change in language occurs. It can occur anywhere in the document. Note that <term> may also be used with the @lang specified as “quo”, as an alternative to <cit>.

Xref

Attributes: @type: EITHER “external” OR “notwe”, EITHER @lexeme: string OR @anchor: string OR @target: string

Children:

  • Structural: NONE
  • Textual: <br> <blockquote> <cit> <class> <damage> <editoraddition> <emend> <expan> <expression> <etym> <etymlang> <f> <hungword> <i> <infl> <lemeformat> <lemenote> <lemepagenote> <note> <ornament> <p> <pb> <sic> <term> <xref>

The <xref> element identifies cross-references in the document to other entries. The text between the <xref> tags is treated as a “link” to another word-entry in the lexicon, and the element’s attributes locate the target, either relatively or absolutely. The @type attribute marks <xref> elements that point to other lexica (“external”) or to information in the same lexicon that lies outside a word-entry (“notwe”). The @lexeme, @anchor and @target attributes determine which word-entry is referenced. The @lexeme attribute expects a value that matches the @lexeme attribute of a <form>. The @anchor attribute expects either a value that matches the @anchor attribute of a <wordentry>, or a non-zero integer giving how many entries “up” (if negative) or “down” (if positive) to count to reach the referenced entry. The @target attribute gives the <form> text to which the <xref> points directly, which avoids problems caused by introductory terms such as “vide” or “lok” in the <xref> element’s text.
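The resolution rules for internal cross-references can be sketched as follows with Python’s xml.etree.ElementTree. Assumptions, all labelled here: an @anchor that looks like a signed integer is read as a relative offset rather than a named anchor; a @lexeme match succeeds against any of the “|”-separated lexemes on a <form>; and @target, “external” and “notwe” links are left unresolved. resolve_xref, entry_index and the sample entries are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def entry_index(node, entries):
    """Index of the <wordentry> in `entries` that contains `node`, or None."""
    for i, entry in enumerate(entries):
        if any(descendant is node for descendant in entry.iter()):
            return i
    return None


def resolve_xref(xref, entries):
    """Return the <wordentry> an internal <xref> points to, or None if unresolved.

    `entries` lists every <wordentry> in document order. Handles @anchor
    (a named anchor or a signed offset) and @lexeme; @target is not handled.
    """
    if xref.get("type") in ("external", "notwe"):
        return None  # target lies outside this lexicon's word-entries
    anchor = xref.get("anchor")
    if anchor is not None:
        if re.fullmatch(r"[+-]?\d+", anchor):  # relative offset: -1 is the entry above
            here = entry_index(xref, entries)
            target = None if here is None else here + int(anchor)
            if target is None or target == here or not 0 <= target < len(entries):
                return None
            return entries[target]
        return next((e for e in entries if e.get("anchor") == anchor), None)
    lexeme = xref.get("lexeme")
    if lexeme is not None:
        for entry in entries:
            form = entry.find("form")
            if form is not None and lexeme in form.get("lexeme", "").split("|"):
                return entry
    return None


# Invented entries, for illustration only.
group = ET.fromstring(
    '<wordgroup1 type="a">'
    '<wordentry type="h" anchor="abandon"><form lexeme="abandon">Abandon</form></wordentry>'
    '<wordentry type="h"><form>Forsake</form><xpln><xref anchor="-1">vide Abandon</xref></xpln></wordentry>'
    '<wordentry type="h"><form>Leave</form><xpln><xref anchor="abandon">Abandon</xref> '
    '<xref lexeme="abandon">Abandon</xref></xpln></wordentry>'
    "</wordgroup1>"
)
entries = list(group.iter("wordentry"))
print([entries.index(resolve_xref(x, entries)) for x in group.iter("xref")])  # [0, 0, 0]
```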

LEME-XML Attribute Values

Fontface

  • “r” – regular (or Roman)
  • “bl”, “bk” or “br” – black letter
  • “i” or “I” – italic
  • “l”, “oe”, “f” – deprecated faces

A fontface may also be described by combining these values, e.g. “bli” for black-letter italic.
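A processor reading combined values such as “bli” has to split them back into their parts. The sketch below assumes, without confirmation from this tagset, that combinations are plain concatenations of the codes listed above, and resolves them left to right, trying two-letter codes before one-letter ones so that “bl” is not misread. decode_fontface and its descriptive labels are hypothetical.

```python
# Face codes from the Fontface list; the labels are descriptive only.
FACE_CODES = {
    "r": "regular",
    "bl": "black letter", "bk": "black letter", "br": "black letter",
    "i": "italic", "I": "italic",
    "l": "deprecated (l)", "oe": "deprecated (oe)", "f": "deprecated (f)",
}


def decode_fontface(value):
    """Split a fontface value such as "bli" into its component faces.

    Assumes combined values are plain concatenations of the codes above, read
    left to right, trying two-letter codes before one-letter codes.
    """
    faces, i = [], 0
    while i < len(value):
        for size in (2, 1):
            code = value[i : i + size]
            if len(code) == size and code in FACE_CODES:
                faces.append(FACE_CODES[code])
                i += size
                break
        else:  # no code matched at position i
            raise ValueError(f"unrecognized fontface code at {value[i:]!r}")
    return faces


print(decode_fontface("bli"))  # ['black letter', 'italic']
```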

Notetype

  • “e” or “endnote”
  • “f” or “footnote”
  • “m” or “margin”
  • “lm” or “lmargin”
  • “rm” or “rmargin”

Entrytype

  • “h” or “headword”
  • “g” or “gloss”
  • “d” or “definition”

Grouptype

  • “a” or “alphabetic”
  • “t” or “topical”
  • “u” or “undifferentiated”
  • “b” or “bilingual”
  • “p” or “polyglot”
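Each of the Notetype, Entrytype and Grouptype lists pairs a short code with a long name, so a processor can normalize either spelling to one canonical value. The sketch below does this in Python and also reflects the rule that a <note> may give its position in either @type or @location. The helper names are hypothetical; the value tables are copied from the lists above.

```python
import xml.etree.ElementTree as ET

# Short codes and long names from the Notetype, Entrytype and Grouptype lists.
NOTE_TYPES = {"e": "endnote", "f": "footnote", "m": "margin", "lm": "lmargin", "rm": "rmargin"}
ENTRY_TYPES = {"h": "headword", "g": "gloss", "d": "definition"}
GROUP_TYPES = {"a": "alphabetic", "t": "topical", "u": "undifferentiated",
               "b": "bilingual", "p": "polyglot"}


def normalize(value, table):
    """Map a short code or a long name to the long name; raise on anything else."""
    if value in table:
        return table[value]
    if value in table.values():
        return value
    raise ValueError(f"unknown value {value!r}")


def note_position(note):
    """A <note> may give its position in either @type or @location."""
    value = note.get("type") or note.get("location")
    return None if value is None else normalize(value, NOTE_TYPES)


print(normalize("a", GROUP_TYPES))                       # alphabetic
print(note_position(ET.Element("note", location="rm")))  # rmargin
```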

Classtype

  • “type”
  • “accent”
  • “archaic”
  • “author”
  • “borrowing”
  • “compound”
  • “contraction”
  • “derivation”
  • “dialect”
  • “diminutive”
  • “distinction”
  • “etymlang”
  • “gender:androgynous”
  • “gender:common”
  • “gender:doubtful”
  • “gender:female”
  • “gender:male”
  • “gender:male&female”
  • “gender:neutral”
  • “gender:omnia”
  • “inflection”
  • “kind”
  • “mnemonic”
  • “neologism”
  • “new”
  • “term”
  • “number:plural”
  • “old”
  • “word”
  • “pos”
  • “pronounce”
  • “proverbial”
  • “register”
  • “semantic”
  • “spelling”
  • “unknown”

Langstr

For the complete list of valid language strings, see Abbreviations for Languages.

Ed. Ian Lancashire and Isabel Zhu, with contributions from Julia DaSilva, Paramita Dutta, Xueqi Fan, Sky Li, Kristie Lui, Annika Sparrell, Timothy Aberdingk Thijm, and Shirley Wang
© 2025 Ian Lancashire