RRGparbank is a treebank of syntactic structures based on Role and Reference Grammar (RRG; Van Valin and LaPolla 1997; Van Valin 2005).
The corpus contains parallel text based on George Orwell's novel 1984 and translations thereof. The data is partly taken from the multilingual dataset Multext-East (Erjavec 2017) and partly added by integrating further translations into languages that are not covered in Multext-East. So far, the parallel treebank covers English (entire novel), German (only seed data), French (only seed data), and Russian (only seed data). Farsi (only seed data) is currently being integrated.
|Number of sentences||6 737||1 450||1 454||1 555||1 416||1 476|
|Average sentence length||18.2||16.4||16.1||15.9||12.5||15.2|
The annotation is being undertaken within the ERC project TreeGraSP at the University of Düsseldorf.
An online parser, with a multilingual model trained on four languages of RRGparbank (en, de, fr, ru), is available here: rrgparser.
If you use RRGparbank, please cite Bladier et al. (2022).