Legislative Rule Extraction

In this post, we discuss the annotation of information from legislation, for example, to create a rule book from legislation. There are two distinct tasks and two tools. First, we want to take the original legislation and annotate it; for this, we use GATE. Second, we want to transform the output of GATE, using the annotations, into some alternative, web-compatible format; for this, we use EXtensible Stylesheet Language Transformations (XSLT). This is presented in STUB. John Cyriac of compliancetrack outlined the problem that is addressed in these two posts. (See introductory notes on this and related posts.)
Sample legislation and text
The text we are working with is a sample from Insurance and Reinsurance (Solvency II) from the European Parliament.
SUBJECT MATTER AND SCOPE
Article 1
Subject matter
This Directive lays down rules concerning the following:
1) the taking-up and pursuit, within the Community, of the self-employed activities of direct insurance and insurance;
2) the supervision in the case of insurance and reinsurance groups;
3) the reorganisation and winding-up of direct insurance undertakings.
Article 2
Scope
1. This Directive shall apply to direct life and non-life insurance undertakings which are established in the territory of a Member State or which wish to become established there. It shall also apply to reinsurance undertakings, which conduct only reinsurance activities, and which are established in the territory of a Member State or which wish to become established there with the exception of Title IV.

There are additional articles which we do not work with. The article is not a logical statement (an If, then statement), but identifies the matters which the directive is concerned with. Each statement of the article may be understood as a conjunct: the rules concern a, b, and c. However, this is not yet relevant to our analysis. See the separate post about rule extraction for conditionals.
Target result
We want to annotate the first article, picking out each section for extraction. In particular, for a practitioner to use the extraction, he should have it in a format which identifies the following:
Reference Code: Article 1
Title: Subject Matter
Level: 1.0
Description: This Directive lays down rules concerning the following:
Level: 1.1
Description: the taking-up and pursuit, within the Community, of the self-employed activities of direct insurance and reinsurance;
Level: 1.2
Description: the supervision in the case of insurance and reinsurance groups;
Level: 1.3
Description: the reorganisation and winding-up of direct insurance undertakings;

Output
The output of GATE appears in the following figure:
Annotating the structure of legislative rules
GATE
To get this output, we used the files and application state in GATELegislativeRulebook.tar.gz.
Text
The text is a fragment of the legislation above and is found in the SmallRulebookText.tex file.
Lists
We use the following lists in addition to standard ANNIE lists, meaning that a lists.def file ought to incorporate the files. This is the resource ListGaz given in the .xgapp file (though this may require some additional fiddling and files to work).

  • roman_numerals_i-xx.lst: It has majorType = roman_numeral. This is a list of roman numbers from i to xx.
  • rulebooksectionlabel.lst: It has majorType = rulebooksection. This is a list of section headings such as: Subject matter, Scope, Statutory systems, Exclusion from scope due to size, Operations, Assistance, Mutual undertakings, Institutions, Operations and activities.

The list of section headings is taken from the legislation, which presumably follows standard guidelines for section heading labels. For the list of roman numerals, there are more general methods using Regex to match well-formed numerals (see Roman Numerals in Python and Regex for Roman Numerals); however, for our purposes it is simpler to use limited lists rather than Regex. In either case, several problems arise, as we see later.
JAPE rules

  • ListArticleSection.jape: What is annotated with Article (from the lookup) and a number is annotated ArticleFlag.
  • ListFlagLevel1.jape: The string number followed by a period of closed parenthesis is annotated ListFlagLevel1.
  • ListFlagLevel1sub.jape: A number followed by a letter followed by a period is annotated ListFlagLevel1sub.
  • ListFlagLevel2.jape: A string of lower case letters followed by a closed parenthesis is annotated ListFlagLevel2.
  • ListFlagLevel3.jape: A roman number from a lookup list followed by a closed parenthsis is annotated ListFlagLevel3.
  • RuleBookSectionLabel.jape: Looks up section labels from a list and annotates them SectionType. For example, Subject matter, Scope, and Statutory systems.
  • ListStatement01.jape: A string which occurs between SectionType annotation and a colon is annotated ListStateTop.
  • ListStatement02.jape: A string which occurs between a ListFlagLevel1 and a semicolon is annotated SubListStatementPrefinal.
  • ListStatement03.jape: A string which occurs between a ListFlagLevel1 and a period is annotated SubListStatementFinal.

Application order
The order of application of the processing resources is:

  • Document Reset PR
  • ANNIE Sentence Splitter
  • ANNIE English Tokeniser
  • ListGaz
  • RulebookSectionLabel:
  • ListArticleSection
  • ListStatement01
  • ListFlagLevel01
  • ListStatement02
  • ListStatement03

Additional issues
This example does not show the other list flag levels (e.g. using letters, roman numerals etc.), nor the results on other parts of the legislation.
While the result for the specific text is attractive, there is much work to be done. The lists and rules overgenerate. For example, the rules indicate that avrt is a level flag because v is recognised as a roman numeral. In other cases, too long a passage is selected as the statement at the top of the list. Yet, the example is still useful to demonstrate a proof of concept, particularly in conjunction with the post on XSLT.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

One thought on “Legislative Rule Extraction”

Leave a Reply

Your email address will not be published. Required fields are marked *