RRGparbank is a treebank of syntactic structures based on Role and Reference Grammar (RRG; Van Valin and LaPolla 1997; Van Valin 2005).
The corpus contains parallel texts based on George Orwell's novel 1984 and its translations. The data is taken partly from the multilingual dataset Multext-East (Erjavec 2017) and partly from further translations into languages that Multext-East does not cover. So far, the parallel treebank covers English (entire novel) as well as German, French, and Russian (seed data only). Farsi (seed data only) is currently being integrated.
| | EN | EN-SEED | DE-SEED | FR-SEED | RU-SEED | FA-SEED |
---|---|---|---|---|---|---|
Number of sentences | 6 737 | 1 450 | 1 454 | 1 555 | 1 416 | 1 476 |
Average sentence length | 18.2 | 16.4 | 16.1 | 15.9 | 12.5 | 15.2 |
The annotation is being undertaken within the ERC project TreeGraSP at the University of Düsseldorf.
An online parser with a multilingual model trained on four RRGparbank languages (en, de, fr, ru) is available here: rrgparser.
If you use RRGparbank, please cite Bladier et al. (2022).