MARCELL Polish legislative subcorpus v2

The Polish corpus contains 27485 documents of 21 types, representing universally binding legal acts (laws, regulations, etc.) or binding internal acts (such as resolutions of the Sejm, the Senate and some state administration bodies, e.g. the Council of Ministers). The documents date from 1972 to 2021, and the set covers only documents that are in effect.

The data were retrieved from Dziennik Ustaw and Monitor Polski, the official and publicly available sources of Polish law, which publish Acts of Parliament, Regulations of the Ministers, uniform acts and amendments. The data were converted from editable PDF files to text format (an XML version of the documents was unavailable), tokenized and morphologically analysed with the Morfeusz2 morphological analyser, disambiguated with the Concraft-pl tagger, annotated with named entities using Liner2, and dependency-parsed with the COMBO parser. Additional scripts were written to annotate IATE terms and EuroVoc descriptors.
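
The processing order described above can be pictured as a chain of stages. The following is only a minimal sketch, not the scripts actually used to build the corpus: the Stage class, the mark helper, the layer names and the sample document are hypothetical, no real tool is invoked, and placing the IATE and EuroVoc annotation last is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory record for one legal act (not the corpus file format).
Document = Dict[str, object]


@dataclass(frozen=True)
class Stage:
    name: str                            # processing step named in the description
    tool: str                            # tool the description attributes to it
    run: Callable[[Document], Document]  # placeholder for the real tool call


def mark(layer: str) -> Callable[[Document], Document]:
    """Return a placeholder step that only records which layer was added."""
    def step(doc: Document) -> Document:
        layers = list(doc.get("layers", []))
        layers.append(layer)
        return {**doc, "layers": layers}
    return step


# Stage order follows the description; the final position of the
# IATE/EuroVoc step is an assumption.
PIPELINE: List[Stage] = [
    Stage("text extraction", "PDF-to-text conversion", mark("text")),
    Stage("tokenization and morphological analysis", "Morfeusz2", mark("morphology")),
    Stage("disambiguation", "Concraft-pl", mark("tags")),
    Stage("named entity recognition", "Liner2", mark("named_entities")),
    Stage("dependency parsing", "COMBO", mark("dependencies")),
    Stage("IATE and EuroVoc annotation", "custom scripts", mark("terms")),
]


def process(doc: Document) -> Document:
    """Apply every stage in order and return the annotated document."""
    for stage in PIPELINE:
        doc = stage.run(doc)
    return doc


result = process({"id": "example-act", "layers": []})
print(result["layers"])
```

Keeping each stage as a separate entry makes the dependency between layers explicit: disambiguation needs the morphological analysis, and parsing needs the disambiguated tags.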

Under Polish law, pursuant to Article 4(1) of the Act of 4 February 1994 on copyright and related rights, normative acts and their official drafts are not subject to copyright and are therefore in the public domain.