RRGparbank is a treebank of syntactic structures based on Role and Reference Grammar (RRG; Van Valin and LaPolla 1997; Van Valin 2005).

The corpus contains parallel text based on George Orwell's novel 1984 and its translations. Part of the data is taken from the multilingual dataset Multext-East (Erjavec 2017); the rest comes from further translations into languages that Multext-East does not cover. So far, the parallel treebank covers English (entire novel) as well as German, French, and Russian (seed data only). Farsi (seed data only) is currently being integrated.

Size of the treebank. All English data and the seed data in all languages (except for Farsi) is annotated at least once. The annotation is still ongoing.
                         EN       EN-SEED  DE-SEED  FR-SEED  RU-SEED  FA-SEED
Number of sentences      6 737    1 450    1 454    1 555    1 416    1 476
Average sentence length  18.2     16.4     16.1     15.9     12.5     15.2

The annotation is being undertaken within the ERC project TreeGraSP at the University of Düsseldorf.

An online parser, rrgparser, is available with a multilingual model trained on four languages of RRGparbank (en, de, fr, ru).

If you use RRGparbank, please cite Bladier et al. (2022).

References
