{"id":36,"date":"2019-10-31T16:15:54","date_gmt":"2019-10-31T16:15:54","guid":{"rendered":"https:\/\/leme.utoronto.ca\/?page_id=36"},"modified":"2020-01-28T17:26:23","modified_gmt":"2020-01-28T17:26:23","slug":"encoding","status":"publish","type":"page","link":"https:\/\/leme.utoronto.ca\/?page_id=36","title":{"rendered":"From Database to LEME-XML"},"content":{"rendered":"\n<p>LEME maintains three separate encodings for each document\nin its database: an \u201cXML-like\u201d database encoding (henceforth referred to as\n\u201cLEME-DB\u201d); a strict XML-compliant encoding (\u201cLEME-XML\u201d); and a TEI-compliant\nXML encoding (\u201cLEME-TEI\u201d). The first two encodings are of primary\ninterest to the LEME editorial staff.<\/p>\n\n\n\n<p>The LEME-DB encoding was produced by Ian Lancashire\nand Marc Plamondon for the LEME 1.0 database (2006), based on the Early\nModern English Dictionaries Database encoding (1996). LEME-XML, by Timothy\nAberdingk Thijm, is produced programmatically from LEME-DB and hence must\nremain in some sense \u201cbackwards compatible\u201d with it. LEME-TEI, by Sky Li, is produced\nprogrammatically from LEME-XML. 
Yet LEME encoding is rooted less in the\ndatabase than in the original structures of its old lexical texts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Basics<\/h2>\n\n\n\n<p>Timothy Aberdingk Thijm describes the basics of the\nLEME encodings below. His catalogue of\nthe valid XML tags of the intermediary LEME-XML encoding, with their contents,\nappears in the appendix.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Basics<\/h4>\n\n\n\n<p>The primary unit of LEME\u2019s texts is the word-entry,\nwhich represents a basic lexicographic unit of the text. Word-entries have a\nform, which stores the headword and its lexemes, and an explanation, which\nstores the elaboration of the form by the author. I am deliberately avoiding\ncalling this a \u201cdefinition\u201d, as LEME\u2019s texts tend to predate this distinction,\nleading to a diversity of word-entry styles. Some dictionaries can contain\ncomplex explanations of word-entries, extended parenthetical remarks, fused\nmonolingual and bilingual word-entries, embedded etymological information, and\nobsolescent language theory (more discussion of what belongs in a form or\nexplanation can be found below).<\/p>\n\n\n\n<p>Proceeding up from the word-entry, LEME identifies\nword-groups, which usually mark the lexicographic subsections of each\ndictionary (e.g., word-group \u201cA\u201d, word-group \u201cAb\u201d). These word-groups fall into\nsections, which may contain remarks on the contents or text that does not\nresemble a word-entry. All sections belong to a LEME element which encloses the\nentire text and identifies it.<\/p>\n\n\n\n<p>Throughout each of these units (word-entry, form,\nexplanation, word-group, section, LEME) are various minor textual elements,\nindicating, for instance, damage, marginal notes (by the lexicographer or LEME\npersonnel), font changes, foreign words and etymologies, and page breaks. 
In general, LEME does not restrict\nthese elements to particular parts of the text,\nalthough exceptions exist when converting to LEME-TEI (see \u201cNotes on TEI\u201d, below).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">LEME-XML Tags<\/h4>\n\n\n\n<p>Each XML tag used by LEME-XML is described <a href=\"https:\/\/leme.utoronto.ca\/?page_id=78\">here<\/a>. This description is based on the edition of the LEME-XML RELAX NG schema current in August 2018.<\/p>\n\n\n\n<p>Each element is identified by its tag name. Tag names are written in <strong>bold font <\/strong>with angle brackets, e.g. <strong><code>&lt;root&gt;<\/code><\/strong>. I include each tag\u2019s possible XML attributes and their values. XML attributes and values are likewise written in <strong>bold font<\/strong> with an \u201c@\u201d symbol before the attribute name and a colon before the description or list of values, e.g. <strong><code>@no: string<\/code><\/strong>. Optional attributes include a question mark before the colon, e.g. <strong><code>@type?: string<\/code>. <\/strong>I also include each tag\u2019s possible child XML tags and a description of when and how it is used. <\/p>\n\n\n\n<p>Children are divided into two groups: structural and\ntextual. Structural tags indicate the structure of the lexicon; they are all\nreferenced in the Basics section above. Textual tags are quite diverse and\ncover all remarks on the text, such as annotating words as foreign, citations,\ndamage, notes and so forth. Generally, the LEME-XML schema is more permissive\nthan need be in terms of what children are allowed: LEME-DB\u2019s specification may\nclaim to disallow a child that is allowed by LEME-XML. This gives\nLEME-XML some degree of flexibility in case a LEME-DB file disobeys\nthe standard. Hence, the list of children is permissive and shows all possible\nchildren that can go in the element. 
See the description for what <em>should<\/em> or should not go in the element,\nregardless of what the schema accepts.<\/p>\n\n\n\n<p>Some tags are deprecated. While they may still appear in older documents, their use should generally be avoided. Where relevant, the entry notes which tag is preferred instead. When referred to by tag name, deprecated tags are written in <strong>bold font<\/strong> and <em>italics<\/em>, e.g. <strong><em><code>&lt;set&gt;<\/code><\/em><\/strong>.<\/p>\n\n\n\n<p>If an attribute accepts only a particular set of values, these are listed after the elements and identified by an alias such as \u201c<strong>entrytype<\/strong>\u201d or \u201c<strong>langstr<\/strong>\u201d. Where the term <strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/String_(computer_science)\">string<\/a><\/strong> is used, it means <em>any<\/em> sequence of Unicode characters, enclosed by quotation marks.<\/p>\n\n\n\n<p>Below is an annotated example.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Wordgroup1\n(the name of the tag, with the first letter capitalized)<\/h3>\n\n\n\n<p>Attributes: <strong><code>@type: grouptype<\/code><\/strong><code>, <\/code><strong><code>@lang?: langstr, @object?: string<\/code> <\/strong>(the list of attributes and their allowed values)<\/p>\n\n\n\n<p>Children:\n(the possible children of this element; any child listed may appear in any\norder)<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Structural: <strong><code>&lt;wordentry&gt; &lt;wordgroup2&gt; &lt;alpha&gt; &lt;heading&gt;<\/code><\/strong><\/li><li>Textual: <strong><em><code>&lt;br&gt; <\/code><\/em><code>&lt;blockquote&gt; &lt;cit&gt; &lt;class&gt; &lt;damage&gt; &lt;editoraddition&gt; &lt;emend&gt; &lt;expan&gt; <\/code><em><code>&lt;expression&gt;<\/code><\/em><code> &lt;etym&gt; &lt;etymlang&gt; &lt;f&gt; &lt;hungword&gt; <\/code><em><code>&lt;i&gt; <\/code><\/em><code>&lt;infl&gt; &lt;lemeformat&gt; &lt;lemenote&gt; &lt;lemepagenote&gt; &lt;note&gt; &lt;ornament&gt; 
<\/code><em><code>&lt;p&gt; <\/code><\/em><code>&lt;pb&gt; &lt;sic&gt; &lt;term&gt; &lt;xref&gt;<\/code><\/strong><\/li><\/ul>\n\n\n\n<p>The <strong><code>&lt;wordgroup1&gt;<\/code> <\/strong>element surrounds a section of the dictionary such as words beginning with the letter \u201cA\u201d. It has a <strong><code>@type<\/code> <\/strong>attribute which specifies what grouping is being made (see <strong>grouptype<\/strong>). The element may also have an optional <strong><code>@lang<\/code> <\/strong>attribute to specify the language of the enclosed content, and an optional <strong><code>@object<\/code> <\/strong>attribute which typically contains an editorially spelled uppercase form of the group\u2019s header, such as \u201cA\u201d\u2026. (The description of the element)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><em>Notes on TEI<\/em><\/h4>\n\n\n\n<p>The LEME-TEI conversion system produces a valid TEI-encoded\ntext from a LEME-XML-encoded one. LEME-TEI texts do not necessarily use\nall the most common TEI conventions, and in fact may seem sparse compared to\nstandard TEI documents as far as tagging goes. This is a deliberate\ncompromise between LEME\u2019s and TEI\u2019s structures. The LEME-TEI encoding uses the schema of the TEI\nP5 Guidelines, and makes use of the <strong>core, dictionaries, figures, header, linking <\/strong>and <strong>textstructure <\/strong>modules as provided\nthrough the <a href=\"http:\/\/www.tei-c.org\/Roma\/\">Roma<\/a>\nweb tool.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>LEME maintains three separate encodings for each document in its database: an \u201cXML-like\u201d database encoding (henceforth referred to as \u201cLEME-DB\u201d); a strict XML-compliant encoding (\u201cLEME-XML\u201d), and a TEI-compliant XML encoding (\u201cLEME-TEI\u201d). 
The LEME-DB encoding was produced by Ian Lancashire and Marc Plamondon in 2006 for the LEME 1.0 database (2006), based on the Early Modern [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"class_list":{"0":"post-36","1":"page","2":"type-page","3":"status-publish","5":"entry"},"_links":{"self":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages\/36","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=36"}],"version-history":[{"count":7,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages\/36\/revisions"}],"predecessor-version":[{"id":1247,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages\/36\/revisions\/1247"}],"wp:attachment":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=36"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}