Mixtec Parser

This repo contains a parser and interpreter for a limited grammar intended for Mixtec codices. It is implemented using ANTLR4.

Context-Free Grammar (CFG):

S ::= (Sent end)+
Sent ::= Clause | obj (Date Clause | Clause) | Date (obj Clause | Clause)
Clause ::= Clause_f+ (Date_tail | Obj_tail | ɛ)
Date_tail ::= Date (obj Clause_f+ | Clause_f+ | ɛ)
Obj_tail ::= obj (Date Clause_f+ | Clause_f+ | ɛ)
Date ::= y (nd | ɛ)
Clause_f ::= h ( nd | Near_date | ɛ )
Near_date ::= near_obj (nd | ɛ)
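The rules above can be decided with a single token of lookahead. That makes a hand-written recursive-descent recognizer a convenient way to sanity-check token streams. The sketch below is a minimal accept/reject example, not the parser shipped in this repository. It assumes each token is a plain string named after its terminal ("h", "y", "nd", "obj", "near_obj", "end"). The names `Recognizer`, `ParseError`, and `recognize` are hypothetical, and the sketch builds no parse tree. Each method mirrors one production.

```python
from typing import List


class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


class Recognizer:
    """Hypothetical accept/reject sketch of the CFG above; builds no tree."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens  # assumed: plain strings named after terminals
        self.pos = 0

    def peek(self) -> str:
        # Current token, or "$" once the input is exhausted.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def expect(self, kind: str) -> None:
        # Consume a mandatory token or fail.
        if self.peek() != kind:
            raise ParseError(f"expected {kind!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def accept(self, kind: str) -> bool:
        # Consume an optional token (the "| ɛ" branches); report whether it matched.
        if self.peek() == kind:
            self.pos += 1
            return True
        return False

    def s(self) -> None:
        # S ::= (Sent end)+
        while True:
            self.sent()
            self.expect("end")
            if self.peek() == "$":
                return

    def sent(self) -> None:
        # Sent ::= Clause | obj (Date Clause | Clause) | Date (obj Clause | Clause)
        if self.accept("obj"):
            if self.peek() == "y":
                self.date()
        elif self.peek() == "y":
            self.date()
            self.accept("obj")
        self.clause()

    def clause(self) -> None:
        # Clause ::= Clause_f+ (Date_tail | Obj_tail | ɛ)
        self.clause_f_plus()
        if self.peek() == "y":
            self.date_tail()
        elif self.peek() == "obj":
            self.obj_tail()

    def clause_f_plus(self) -> None:
        # Clause_f+ : one or more h-headed fragments.
        self.clause_f()
        while self.peek() == "h":
            self.clause_f()

    def date_tail(self) -> None:
        # Date_tail ::= Date (obj Clause_f+ | Clause_f+ | ɛ)
        self.date()
        if self.accept("obj") or self.peek() == "h":
            self.clause_f_plus()

    def obj_tail(self) -> None:
        # Obj_tail ::= obj (Date Clause_f+ | Clause_f+ | ɛ)
        self.expect("obj")
        if self.peek() == "y":
            self.date()
            self.clause_f_plus()
        elif self.peek() == "h":
            self.clause_f_plus()

    def date(self) -> None:
        # Date ::= y (nd | ɛ)
        self.expect("y")
        self.accept("nd")

    def clause_f(self) -> None:
        # Clause_f ::= h (nd | Near_date | ɛ);  Near_date ::= near_obj (nd | ɛ)
        self.expect("h")
        if self.accept("nd"):
            return
        if self.accept("near_obj"):
            self.accept("nd")


def recognize(tokens: List[str]) -> bool:
    """Return True if the token list is a valid S under the grammar."""
    parser = Recognizer(tokens)
    try:
        parser.s()
    except ParseError:
        return False
    return parser.pos == len(tokens)
```

For example, `recognize(["y", "nd", "obj", "h", "nd", "end"])` accepts a dated scene at a place with one named person. `recognize(["obj", "end"])` is rejected because every sentence needs at least one h.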

Terminal Symbols (Tokens)

h = human figure

y = year symbol

nd = name-date symbol. It represents a name when associated with a human figure and a date when associated with a year.

obj = a drawn object that is not a person, year, or name-date and that is not associated with an h token (see near_obj). Examples include toponyms, tables, incense, ballcourts, temples, cities, etc.

near_obj = an object that is not a year or name-date but is still associated with a specific person, such as a weapon, headdress, torch, throne, epithet, or umbilical cord. A near_obj has a possessive or prepositional relationship with its associated h token: a person's epithet belongs to them, a person sits on a throne, a person holds a torch. Saying that a person is "at" a place is an exception to the near_obj association, because toponyms apply to everyone in the scene. More generally, objects that have any sort of prepositional relationship to multiple human figures in a scene should be treated as general objects (obj) and handled at interpretation time.

end = end-of-sentence token
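The terminal definitions above imply two small rules. First, an nd is read as a date when it follows a y and as a name when it follows an h (or the near_obj placed after that h). Second, a non-human, non-date object is a near_obj only when it relates to exactly one person and is not a toponym. The sketch below encodes both. It is illustrative only: the `Tok` enum, `nd_roles`, and `classify_object` are hypothetical names that need not match `tokens.py`. The `related_figures` count is assumed to come from whoever annotates the scene.

```python
from enum import Enum
from typing import List, Optional


class Tok(str, Enum):
    """The six terminal symbols (hypothetical representation)."""
    H = "h"                # human figure
    Y = "y"                # year symbol
    ND = "nd"              # name-date symbol
    OBJ = "obj"            # free-standing object (toponym, temple, ...)
    NEAR_OBJ = "near_obj"  # object tied to one specific person
    END = "end"            # end-of-sentence meta-token


def nd_roles(tokens: List[Tok]) -> List[str]:
    """Label each nd as "date" or "name" from the token just before it.

    Relies on the tokenizer conventions: a year's nd follows its y, and a
    person's nd follows its h or the near_obj placed after that h.
    """
    roles: List[str] = []
    previous: List[Optional[Tok]] = [None] + list(tokens)
    for prev, tok in zip(previous, tokens):
        if tok is not Tok.ND:
            continue
        if prev is Tok.Y:
            roles.append("date")
        elif prev in (Tok.H, Tok.NEAR_OBJ):
            roles.append("name")
        else:
            raise ValueError(f"nd not attached to a year or person: {prev!r}")
    return roles


def classify_object(related_figures: int, is_toponym: bool) -> Tok:
    """Choose obj or near_obj for a drawn object.

    related_figures: number of human figures the object has a possessive or
    prepositional relationship with (assumed to be supplied by annotation).
    """
    if is_toponym or related_figures != 1:
        return Tok.OBJ  # shared by the scene; resolved at interpretation time
    return Tok.NEAR_OBJ
```

Raising on an orphaned nd, rather than guessing, keeps tokenizer mistakes visible instead of silently mislabeling a date as a name.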

Notes About the Tokenizer

The parser in this repository expects the tokenizer to perform the following additional steps.

- An nd associated with a year is always placed after that y token in the tokenized data, regardless of how it is drawn. The same holds for an nd associated with a human figure, which represents that person's name.

- During tokenization, a near_obj token is always placed after the h token it is associated with, but before any nd associated with that person.

- The tokenizer inserts the end-of-sentence token, which is purely a meta-token, at the end of each scene; scenes should correspond roughly to sentences.
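The ordering rules above amount to linearizing each annotated scene. The sketch below shows one way to do it. It assumes a hypothetical annotation format in which every drawn element is a `Glyph` record, and each nd or near_obj points (via `owner`) to the h or y it belongs to. It also assumes free-standing glyphs are already listed in reading order. None of these names come from the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    """Hypothetical annotation of one drawn element in a scene.

    kind:  "h", "y", "nd", "obj", or "near_obj"
    owner: index (within the same scene list) of the h or y glyph that this
           nd or near_obj belongs to; None for free-standing glyphs.
    """
    kind: str
    owner: Optional[int] = None


def linearize_scene(glyphs: List[Glyph]) -> List[str]:
    """Emit one scene's tokens in the order the parser expects.

    Free-standing glyphs keep their list order (assumed reading order).
    Attached glyphs are moved right after their owner, near_obj before nd,
    regardless of where they were drawn. The scene is closed with "end".
    """
    tokens: List[str] = []
    for i, glyph in enumerate(glyphs):
        if glyph.owner is not None:
            continue  # emitted together with its owner
        tokens.append(glyph.kind)
        attached = [g.kind for g in glyphs if g.owner == i]
        tokens += [k for k in attached if k == "near_obj"]
        tokens += [k for k in attached if k == "nd"]
    tokens.append("end")
    return tokens


def linearize(scenes: List[List[Glyph]]) -> List[str]:
    """Concatenate several scenes, each closed by its own end token."""
    return [tok for scene in scenes for tok in linearize_scene(scene)]
```

For instance, a name glyph drawn before its person still ends up after the person's h and near_obj. The scene `[Glyph("nd", owner=1), Glyph("h"), Glyph("near_obj", owner=1)]` yields `["h", "near_obj", "nd", "end"]`.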
