
London Word Tagging

Gabriel Bodard edited this page Dec 15, 2025 · 2 revisions

All original (Greek, Latin, Punic, etc.) characters in the edition, including restored/supplied words, should be tagged with one of the following elements. This is enforced in Oxygen by the project-specific Schematron file. Spaces and modern punctuation may be left untagged; ancient punctuation should use <g/>.

See also EpiDoc Guidelines on Words and Lemmatisation.

  • w - a lexical word, not known to be a proper name etc.
    • if an incomplete word, use the attribute @part
      • part="I" - the initial part of a word (i.e. the end is missing or unresolvable)
      • part="M" - the middle part of a word (i.e. the beginning and end are both missing or unresolvable)
      • part="F" - the final part of a word (i.e. the beginning is missing or unresolvable)
      • (rarely) part="Y" - an obviously incomplete word, where it is unclear whether the surviving part is initial, medial or final
    • You do not need to lemmatise this word, as this will be done automatically at a later stage. If you know it will be problematic (rare, fragmentary, irregular spelling, etc.) you may lemmatise manually if you wish, using <w lemma="acerbus">akerbus</w>.
  • name - a personal name (including cognomina; but not "Imperator").
    • an imperial cognomen such as "Sarmaticus" should be tagged as a name.
    • name cannot take the @part attribute in TEI, so in the case of an incomplete name, a <seg> element needs to appear inside the name, with @part (values as for words, above)
    • a name that is both an onomastic label and a meaningful dictionary word may be tagged as both name and w, although this is seldom done in our projects
  • placeName - a name of a place
    • if a proper adjective - @type="ethnic" (although TEI suggests this is more properly orgName)
    • for a colony name, e.g. colonia Septimia Lepcis Magna - tag "colonia" as a word, "Septimia" as a name and "Lepcis Magna" as a placeName:
      • i.e. <w>colonia</w> <name>Septimia</name> <placeName>Lepcis Magna</placeName>
  • num - a numeral
  • g - a non-alphabetic symbol such as "denarius," "leaf" or "year" (one for which no Unicode code-point exists, which is not easy to type, or which is not traditionally printed as a character in Leiden)
  • abbr - an abbreviation for which we do not know the expansion; e.g. "υ(...)"
  • orig - none of the above: text that we cannot resolve in any way (only if the editor has put, or would put, this word in uppercase in Leiden)
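
The element list above can be illustrated with a short fragment. This is only a sketch: apart from akerbus and υ, which come from the list above, the words are invented, the wrapping <ab> is there only to keep the fragment well-formed, and the @value on <num> and the @ref on <g> are assumptions that the list above does not prescribe.

```xml
<ab>
  <!-- Complete lexical word; lemmatisation is left to the automatic pass -->
  <w>fecit</w>
  <!-- Manual lemma for an irregular spelling (example taken from the list above) -->
  <w lemma="acerbus">akerbus</w>
  <!-- Incomplete word whose end is lost: part="I" -->
  <w part="I">fec</w><gap reason="lost" extent="unknown" unit="character"/>
  <!-- Incomplete name: @part sits on a seg inside name; the name sits inside persName -->
  <gap reason="lost" extent="unknown" unit="character"/><persName type="attested"><name><seg part="F">lius</seg></name></persName>
  <!-- Proper adjective derived from a place name -->
  <placeName type="ethnic">Lepcitanus</placeName>
  <!-- Numeral; @value is an assumption, not required by the list above -->
  <num value="3">III</num>
  <!-- Symbol; the @ref value is hypothetical, check the project's own symbol list -->
  <g ref="#leaf"/>
  <!-- Abbreviation whose expansion is unknown -->
  <abbr>υ</abbr>
</ab>
```

Note that the incomplete name is wrapped in <persName>, because a bare <name> would be rejected by the Schematron rules described at the end of this page.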

Labelling a person

In addition to tokenization tagging, as mentioned above, any reference to a person (which may be made up of names and/or words/placenames, etc.) should be tagged as <persName>. Each <persName> must take one of the following @type values:

  • attested - any person attested other than emperors, consuls, gods etc.
  • ruler - a member of the imperial or ruling families (in former projects "emperor")
  • divine - a god, hero, angel, personification or other divine entity
  • other - mostly historical or literary figures (rarely used)
  • consular - only if a consul/archon/priest cited for dating (even more rarely used)

Example:

 <persName type="attested">
     <name type="praenomen"><expan>M<ex>arcus</ex></expan></name>
     <name type="gentilicium">Iulius</name>
     <name type="cognomen">Aurelianus</name>
 </persName>
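
A reference to a person can also mix names and ordinary words. The following sketch assumes a ruler whose titulature includes "Imperator" (tagged as a word, per the list above) and the imperial cognomen "Sarmaticus" (tagged as a name). The choice of names and the placement of "Imperator" inside <persName> are assumptions made for illustration, not a project ruling.

```xml
<persName type="ruler">
    <!-- "Imperator" is not treated as a name, so it is a lexical word -->
    <w>Imperator</w>
    <name>Marcus</name>
    <name>Aurelius</name>
    <!-- imperial cognomen, tagged as a name -->
    <name>Sarmaticus</name>
</persName>
```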

Schematron enforcement

As noted above, the project-specific Schematron schema will flag an error if the tokenization rules above are not followed. In other words, if any alphabetic character in Latin or Greek (or indeed any other character except for recognised punctuation) is not tagged with one or more of <w>, <name>, <placeName>, <num>, <g>, <abbr> or <orig>, a validation error will occur.

In addition, any <name> tag must appear inside either <persName> or <placeName>, or a validation error will occur.
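
For readers unfamiliar with Schematron, the second rule could be expressed roughly as follows. This is a minimal sketch and not the project's actual Schematron file: the queryBinding, the use of the ancestor axis and the message text are all assumptions.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Bind the tei: prefix to the TEI namespace so it can be used in XPath -->
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Fires once for every name element in the edition -->
    <sch:rule context="tei:name">
      <!-- Fails when the name has no persName or placeName ancestor -->
      <sch:assert test="ancestor::tei:persName or ancestor::tei:placeName">
        A name element must appear inside persName or placeName.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With a rule of this kind, <name>Iulius</name> on its own would be reported, while the same element wrapped in <persName type="attested"> would pass.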
