{"id":425,"date":"2019-12-05T18:58:45","date_gmt":"2019-12-05T18:58:45","guid":{"rendered":"https:\/\/leme.utoronto.ca\/?page_id=425"},"modified":"2020-01-22T18:47:46","modified_gmt":"2020-01-22T18:47:46","slug":"editorial-method","status":"publish","type":"page","link":"https:\/\/leme.utoronto.ca\/?page_id=425","title":{"rendered":"Editorial Practice"},"content":{"rendered":"\n<p>LEME uses MySQL database structures and an XML-encoded corpus because its method derives from corpus linguistics (McEnery and Hardie 2012). Conclusions about diachronic English, a language whose speakers are no longer alive to answer questions, are best based on text samples chosen from a large collection of word-entries written by them. LEME assembles these samples from a multi-genre body of bilingual, monolingual, and polyglot dictionaries, spelling lists, definition collections, and glossaries that serve fields as diverse as medicine, botany, law, husbandry, horsemanship, architecture, education, surveying, and navigation. <\/p>\n\n\n\n<p>The LEME database lists 1,387 different books and manuscripts in its primary bibliography (not counting routine re-editions), 30,529 pages of text from 272 transcribed works, 1,135,142 word-entries, 1,216,763 encoded forms and sub-forms, and 231,431 lexemes (or lemmatized terms). Each text has a text id number, which can be found in two places: the LEME 2.0 filename of the text, and the pop-up entry window for a word retrieved from that text. To retrieve the permanent URL for a word-entry, search for a word in the text concerned, click on any entry in the generated hits to open a pop-up window for that entry, and then click the icon at the bottom center, (-), to display the permanent URL. 
For example, Sir Thomas Elyot&#8217;s word-entry on &#8220;hyphen&#8221; in the 1538 edition has the URL<\/p>\n\n\n\n<p style=\"text-align:center\"><a href=\"https:\/\/leme.library.utoronto.ca\/lexicon\/entry\/53\/7464\">https:\/\/leme.library.utoronto.ca\/lexicon\/entry\/53\/7464<\/a><\/p>\n\n\n\n<p>in which 53 is the text id, and 7464 is the word-entry id. This URL will retrieve this word-entry from any workstation online <em>not<\/em> running LEME.<\/p>\n\n\n\n<p>The LEME Corpus website\nlists all language texts 1475-1625, whether edited or not, in analytic\nparagraphs, and indexes them by chronology, subject, and proper name. Once a\ntext has been uploaded into TSpace, it opens with a metadata page that\nidentifies the text and its LEME id.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><em>Principles <\/em><\/h4>\n\n\n\n<p>The\neditorial principles that govern LEME are intended to represent lexical\ninformation as the Early Modern period understood it. LEME uses a present-day\nEnglish alphabet but otherwise keeps to old spelling and original lineation.<a href=\"#ftn1\" id=\"ftnref1\">[1]<\/a> Its texts are not\ndiplomatic. They do not reproduce display elements (font and illustrations) and\nbibliographical information <em>in the text <\/em>(running\ntitles, signatures, page and folio numbers, and catchwords), and they treat\nsome common letter-forms (e.g., long-<em>s<\/em>,\ndifferent forms of <em>r<\/em>, ligatures such\nas <em>ct<\/em>) as the common single\ncharacters we use today. Images of most original texts, readily available\nonline at EEBO\/TCP, deliver these details. It is true that font sometimes\nusefully identifies language, but unlike even critical editions LEME identifies\nthe language of all words explicitly in a form, explanation, or term tag. <\/p>\n\n\n\n<p>Not all expanded later editions of a lexical text, updated by a lexicographer, have been transcribed, but we recognize that they should be. 
Occasionally an EEBO image-set was damaged; in those cases we used text from another copy of the same edition, or even from a later edition, and noted the fact. Most LEME texts have been entered and encoded in the LEME lab. <\/p>\n\n\n\n<p>The earliest LEME transcriptions date from the early 1990s. They include very large texts like Palsgrave, Cotgrave, Thomas Thomas, Florio (1598), and Minsheu (1599). I normally chose the earliest copytext, although a TCP transcription available for a late edition was sometimes too valuable not to use. We adapt many texts available in EEBO\/TCP and the Internet Archive and have outsourced the entry of large dictionaries to various firms, most recently to Apex CoVantage. We have certainly not encoded everything, desirable though that is. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><em>Tools<\/em><\/h4>\n\n\n\n<p>The University of Toronto Library has chosen the programming languages of our Web database software. We use programmer&#8217;s editors, UltraEdit and Notepad++, to input and encode texts in Unicode-based well-formed XML. BabelMap has proved very capable in helping us write with non-Roman character sets, especially Greek and Hebrew. The database editorial-tools page gives us a processing function to validate our XML-like database encoding and to add lemma elements to form and explanation tags. A stand-alone program devised by Tim Thijm does the same for our corpus texts. LEME accepts headwords in the online OED as standard lemmatized forms that everyone should follow. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><em>Encoding <\/em><\/h4>\n\n\n\n<p>LEME encoding is just a start on what can be done. It had to impose one encoding language on everything, and because its texts are highly variable in structure (both as a result of different functions and of a lexicographer&#8217;s familiarity with the history of word-entries), the simpler the set of tags, the fewer errors in applying them. 
Thus LEME operates with as minimal an XML tag-set as possible.<a id=\"ftnref2\" href=\"#ftn2\">[2]<\/a> For example, most lexicographers of the period do not indicate senses, and attempts to delineate them seem speculative. Our lexicographical tag-set grew slowly. In the early 1990s, I used COCOA (Oxford Text Concordance) tags but soon shifted to SGML, following recommendations of the Text Encoding Initiative. LEME eventually deprecated some SGML tags, such as those for font, part of speech, gender, and grammatical inflections explicitly stated in a text. Early Modern English lexicographers from time to time employed cross-references between word-entries, but often the target for them could not be found, and so I was content to tag the cross-reference without linking to its supposed target.<a id=\"ftnref3\" href=\"#ftn3\">[3]<\/a> The LEME.xml tag-set follows the database tags closely.<\/p>\n\n\n\n<p>Our\nToronto librarians and some others have expressed concern that we have not\nreleased the texts in Text Encoding Initiative (TEI) encoding, without question\nan excellent language for humanities texts that use the XML standard. We have\nseveral reasons for choosing LEME.xml tags. First, our current LEME-to-TEI conversion\nprogram encounters many exceptions and cannot yet be automated. Second,\nalthough the recommended TEI subset of tags concerns dictionaries, the number\nof actual LEME dictionaries is overwhelmed by other genres that introduce\nword-entries (e.g., grammars, herbals, spelling lists, treatises with\ndefinitions, and concordances). Third, the Early Modern English period worked steadily but\nslowly toward formalizing its lexicographical structures. 
Fourth, the dominant\ntheory of the word-entry at this time involves two quite different definitions,\none for the headword (a network of other words) and one for the thing named by\nthe headword (the so-called logical definition found in classical rhetoric).\nHow TEI could implement this structure and yet remain within its dictionary\nencoding subset is questionable. Thus\nthe structure, naming, and function of many tags in LEME texts depart from\nthose recommended by the Dictionaries module in TEI P5. To most researchers,\nthese differences will not be self-explanatory, but to historical lexicologists\nthey are significant. In such circumstances, LEME recognizes that user\nconvenience can trump scholarly considerations. The Creative Commons 4.0\ncopyright designation enables researchers or institutions themselves to revise\nthe encoding along with the texts. TEI aficionados can replace the LEME\nencoding language with TEI if they wish.<\/p>\n\n\n\n<p>LEME\nlemmatizes English headwords and other important English words in word-entries\nin an additional .xml file for each text. For researchers in non-English\nlanguages, English headwords may be undesirable, but LEME is not able to lemmatize\nheadwords in other languages. Form and explanation tags include elements for\nlemmas (lexemes) that follow OED headword spellings. No such standard existed\nin the Early Modern period, and it would have been unwise to impose editorially\na set of arbitrary spelling conventions. Not a few radical reforms of English\nspelling systems failed in the Early Modern period.<\/p>\n\n\n\n<p>LEME\nemends errors in the text lightly, usually only for typos and foul case, and\nretains the erroneous form in a tag. LEME also expands contractions without\nidentifying their marks of abbreviation because there is no standard for naming\nthem in the way that Unicode names language characters. 
My attempt to use, in an expansion\ntag, an arbitrary encoding for the shapes of abbreviated characters (e.g.,\n&#8220;a+_&#8221; for &#8220;a-macron,&#8221; that is, expanded <em>am<\/em> or <em>an<\/em>) has recently been deprecated. One spelling is often abbreviated\nby quite different characters, and the same abbreviation may be expanded into\nquite different spellings.<a href=\"#ftn4\" id=\"ftnref4\">[4]<\/a> In any case, outside the\nLatin and English dictionaries surviving from the early fifteenth century, most\nexpansions are obvious to readers. We have benefited from EEBO\/TCP editorial\nguidelines on special characters and scholarly papers on Renaissance Greek\nligatures and abbreviations. A diligent attempt has been made to reproduce\nGreek and Hebrew characters, but readers are advised that illegibility and\nLEME&#8217;s editorial unfamiliarity with these languages may have produced some odd\nresults. <\/p>\n\n\n\n<p>We welcome all corrections.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><em>Textual\nCommentary<\/em><\/h4>\n\n\n\n<p>In database lemenotes at the bottom of a word-entry, we registered English words antedating the earliest OED citation or &#8220;not found&#8221; in the OED. A &#8220;not found&#8221; lemenote does not mean that the questioned word-form is not in the OED; it only means that LEME researchers have not located it there. The database, unfortunately, does not give the date when we viewed the OED; and if the OED has added our suggested information to its word-entry after we viewed it, LEME will not know. The OED is in a state of constant updating and can be counted on to attend to readers&#8217; suggestions. For this reason, we are remapping our LEME headwords to OED headwords. This step, while time-consuming, offers the best mechanism for comparing the vocabulary recorded in Early Modern word-entries with the considerably larger vocabulary that the OED has assembled. 
In addition, this step enables us to record the date on which we identified, in a LEME text, either an antedating or a word that we have not found in the OED.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p><a href=\"#ftnref1\" id=\"ftn1\">[1]<\/a> It is important to be as explicit as one can in distinguishing an end-of-line hyphen as soft or hard.<\/p>\n\n\n\n<p><a href=\"#ftnref2\" id=\"ftn2\">[2]<\/a> Tags in texts processed by the LEME 2.0 database are XML-like. The tagged entries can be viewed in the pop-up windows that display the word-entries returned by a search request.<\/p>\n\n\n\n<p><a href=\"#ftnref3\" id=\"ftn3\">[3]<\/a> Researchers can\nuse a search function to locate these.<\/p>\n\n\n\n<p><a href=\"#ftnref4\" id=\"ftn4\">[4]<\/a> The prospect of encoding abbreviations as documented in Cappelli is daunting, but groups are working on it. See Joel Fredell, Charles Borchers IV, and Terri Ilgen, &#8220;TEI P5 and Special Characters Outside Unicode,&#8221; <em>Journal of the Text Encoding Initiative<\/em> 4 (2013). https:\/\/journals.openedition.org\/jtei\/727<\/p>\n","protected":false},"excerpt":{"rendered":"<p>LEME uses MySQL database structures and an XML-encoded corpus because its method derives from corpus linguistics (McEnery and Hardie 2012). Conclusions about diachronic English, a language whose speakers are no longer alive to answer questions, are best based on text samples chosen from a large collection of word-entries written by them. 
LEME assembles them [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"class_list":{"0":"post-425","1":"page","2":"type-page","3":"status-publish","5":"entry"},"_links":{"self":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages\/425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=425"}],"version-history":[{"count":33,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages\/425\/revisions"}],"predecessor-version":[{"id":1019,"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=\/wp\/v2\/pages\/425\/revisions\/1019"}],"wp:attachment":[{"href":"https:\/\/leme.utoronto.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}