ufdatastudio/mixtec-parser

Mixtec Parser

This repo contains a parser and interpreter for a limited grammar intended for Mixtec codices. It is implemented using ANTLR4.

Context-Free Grammar (CFG):

S ::= (Sent end)+
Sent ::= Clause | obj (Date Clause | Clause) | Date (obj Clause | Clause)
Clause ::= Clause_f+ (Date_tail | Obj_tail | ɛ)
Date_tail ::= Date (obj Clause_f+ | Clause_f+ | ɛ)
Obj_tail ::= obj (Date Clause_f+ | Clause_f+ | ɛ)
Date ::= y (nd | ɛ)
Clause_f ::= h ( nd | Near_date | ɛ )
Near_date ::= near_obj (nd | ɛ)
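
None of the rules above is recursive, so the language they describe is regular and can be checked with an ordinary regular expression. The following is a minimal Python sketch of such a recognizer. It is not the ANTLR4 implementation in this repository; the single-character codes and the `accepts` helper are assumptions made for illustration only.

```python
import re

# Single-character code for each terminal (an encoding chosen for this sketch,
# not something defined by the repository).
CODES = {"h": "h", "y": "y", "nd": "n", "obj": "o", "near_obj": "k", "end": "e"}

# Each pattern mirrors one grammar rule; "?" stands in for the epsilon branches.
DATE = "yn?"                                                  # Date
NEAR_DATE = "kn?"                                             # Near_date
CLAUSE_F = f"h(?:n|{NEAR_DATE})?"                             # Clause_f
CLAUSE_FS = f"(?:{CLAUSE_F})+"                                # Clause_f+
DATE_TAIL = f"{DATE}(?:o{CLAUSE_FS}|{CLAUSE_FS})?"            # Date_tail
OBJ_TAIL = f"o(?:{DATE}{CLAUSE_FS}|{CLAUSE_FS})?"             # Obj_tail
CLAUSE = f"{CLAUSE_FS}(?:{DATE_TAIL}|{OBJ_TAIL})?"            # Clause
SENT = (f"(?:{CLAUSE}"
        f"|o(?:{DATE}{CLAUSE}|{CLAUSE})"
        f"|{DATE}(?:o{CLAUSE}|{CLAUSE}))")                    # Sent
S = re.compile(f"(?:{SENT}e)+")                               # S


def accepts(tokens):
    """Return True if the token sequence is derivable from S."""
    try:
        encoded = "".join(CODES[t] for t in tokens)
    except KeyError:
        # A token name outside the six terminals can never be accepted.
        return False
    return S.fullmatch(encoded) is not None
```

For example, `accepts(["y", "nd", "h", "nd", "end"])` holds because the sequence is a Date followed by a Clause, while `accepts(["nd", "end"])` does not because an nd must follow a y, h, or near_obj token.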

Terminal Symbols (Tokens)

h = human figures

y = year symbol

nd = name-date symbol. Represents a name when associated with a human figure and a date when associated with a year

obj = a drawn object that is not a person, year, or name-date and that is not associated with an h token (see near_obj). Examples include toponyms, tables, incense, ballcourts, temples, cities, etc.

near_obj = an object that is not a year or name-date but is still associated with a specific person, such as a weapon, headdress, torch, throne, epithet, umbilical cord, etc. A near_obj has a possessive or prepositional relationship with the associated h token. For example, a person's epithet belongs to them; a person sits on a throne; a person holds a torch. Saying that a person is "at" a place is an exception to the near_obj association because toponyms apply to everyone in the scene. More generally, an object that would have any sort of prepositional relationship to multiple human figures in a scene should be treated as a general object (obj) and handled at interpretation time.

end = end of sentence token
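
Since an nd token means a name or a date depending on what it follows, an interpreter has to resolve that from context. The sketch below shows one way this could be done over a flat token list; the function name `label_nd` and its return format are hypothetical and are not taken from this repository.

```python
def label_nd(tokens):
    """Label each nd token as a 'name' or a 'date' from its left context.

    Returns a list of (index, role) pairs, where role is 'name', 'date',
    or None when the nd has nothing valid to attach to.
    """
    roles = []
    pending = None  # role the next nd would take, if any
    for i, tok in enumerate(tokens):
        if tok == "h":
            pending = "name"      # an nd after a human figure is a name
        elif tok == "y":
            pending = "date"      # an nd after a year symbol is a date
        elif tok == "near_obj":
            # A near_obj may sit between an h and its nd; it never follows a y.
            pending = "name" if pending == "name" else None
        elif tok == "nd":
            roles.append((i, pending))
            pending = None        # each h or y takes at most one nd
        else:
            pending = None        # obj and end break the association
    return roles
```

Given `["y", "nd", "h", "near_obj", "nd", "end"]`, the nd at index 1 is labeled as a date and the nd at index 4 as a name.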

Notes About the Tokenizer

The parser in this repository expects the tokenization step to guarantee the following:

- An nd associated with a year is always placed after that year in the tokenized data, regardless of how it is drawn. The same applies to an nd associated with a human figure, where it represents that person's name.

- near_obj tokens are always placed after the h token they are associated with during tokenization but before any nd that might be associated with that person.

- The tokenizer inserts the end of sentence token, which is purely a meta-token, at the end of each scene. Scenes should correspond roughly to sentences.
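
The ordering rules above can be sketched in Python as follows. The `Human`, `Year`, and `Obj` records and the `tokenize` function are hypothetical stand-ins for whatever the real tokenizer reads from the codex data; they only illustrate the order in which tokens are emitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Human:
    name: Optional[str] = None      # name-date glyph naming this person, if any
    near_obj: Optional[str] = None  # object associated with this person, if any


@dataclass
class Year:
    date: Optional[str] = None      # name-date glyph giving the date, if any


@dataclass
class Obj:
    label: str = ""                 # general object, e.g. a toponym


Glyph = Union[Human, Year, Obj]


def tokenize(scenes: List[List[Glyph]]) -> List[str]:
    """Emit grammar tokens for each scene in the order the parser expects."""
    tokens: List[str] = []
    for scene in scenes:
        for glyph in scene:
            if isinstance(glyph, Human):
                tokens.append("h")
                if glyph.near_obj is not None:
                    tokens.append("near_obj")  # after h, before the name
                if glyph.name is not None:
                    tokens.append("nd")        # name always follows its person
            elif isinstance(glyph, Year):
                tokens.append("y")
                if glyph.date is not None:
                    tokens.append("nd")        # date always follows its year
            else:
                tokens.append("obj")
        tokens.append("end")                   # meta-token closing the scene
    return tokens
```

A scene holding a dated year and a named person with a throne would therefore come out as `y nd h near_obj nd end`, whatever the drawn layout of the glyphs.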

About

Symbolic Parser and Interpreter for XML representation of Mixtec Codices
