Over the last couple of months, I have had discussions about text mining and annotating rules in legislation with several people (John Sheridan of The Office of Public Sector Information, Richard Goodwin of The Stationery Office, and John Cyriac of Compliance Track). While nothing concrete has yet resulted from these discussions, it is clearly a “hot topic”.
In the course of these discussions, I prepared a short outline of the issues and approaches, which I present below. Comments, suggestions, and collaborations are welcome.
Vision, context, and objectives
One of the main visions of artificial intelligence and law has been to develop a legislative processing tool. Such a tool has several related objectives:
- [1.] To guide the drafter to write well-formed legal rules in natural language.
- [2.] To automatically parse and semantically represent the rules.
- [3.] To automatically identify and annotate the rules so that they can be extracted from a corpus of legislation for web-based applications.
- [4.] To enable inference, modeling, and consistency testing with respect to the rules.
- [5.] To reason with respect to domain knowledge (an ontology).
- [6.] To serve the rules on the web so that users can use natural language to input information and receive determinations.
While no such tool exists, there has been steady progress in understanding the problems and developing working software solutions. In early work (see The British nationality act as a logic program (1986)), an act was manually translated into a logic program, allowing one to draw inferences given ground facts. Haley is a software and service company that provides a framework which partially addresses [1.], [2.], [4.], and [6.] (see Policy Automation). Some research addresses aspects of [3.] (see LKIF-Core Ontology). Finally, there are XML annotation schemas for legislation (and related input support), such as The Crown XML Schema for Legislation and Akoma Ntoso, both of which require manual input. Despite these advances, much progress remains to be made. In particular, no results fulfill [3.].
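To make the idea of drawing inferences from ground facts concrete, here is a minimal sketch of forward chaining over hand-written rules. It is written in Python rather than a logic programming language, and the rule names and facts are hypothetical, heavily simplified stand-ins, not the content of any actual act.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    conclusion: str
    conditions: tuple  # every condition must hold for the rule to fire


# Hypothetical, simplified rules for illustration only.
RULES = [
    Rule("is_citizen", ("born_in_country", "parent_is_citizen")),
    Rule("may_apply_for_passport", ("is_citizen",)),
]


def infer(facts, rules):
    """Apply rules repeatedly until no new conclusion can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion in derived:
                continue
            if all(condition in derived for condition in rule.conditions):
                derived.add(rule.conclusion)
                changed = True
    return derived


ground_facts = {"born_in_country", "parent_is_citizen"}
print(sorted(infer(ground_facts, RULES)))
```

The point of the sketch is only that, once rules are in a formal representation, conclusions follow mechanically from the facts supplied. The manual translation step is what the remaining objectives aim to reduce.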
In consideration of [3.], the primary objective of this proposal is to use the General Architecture for Text Engineering (GATE) framework to automatically identify and annotate legislative rules in a corpus. The annotation should support web-based applications and be consistent with semantic web markup languages for rules, e.g., RuleML. A subsidiary objective is to define an authoring template which can be used within existing authoring applications to manually annotate legislative rules.
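As a rough illustration of what automatic identification and annotation could look like, the following sketch marks sentences that contain a modal cue as rules and records stand-off annotations with character offsets. It uses plain Python regular expressions, not GATE or its pattern language, and the cue lists, the `Annotation` type, and the `annotate_rules` function are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    kind: str   # "Rule", "Modal", or "Condition"
    text: str   # the covered text


# Hypothetical cue lists; a real study would derive them from the corpus.
MODAL = re.compile(r"\b(shall not|must not|shall|must|may)\b", re.IGNORECASE)
CONDITION = re.compile(r"\b(if|where|unless|provided that)\b", re.IGNORECASE)
SENTENCE = re.compile(r"[^.;]+[.;]?")  # naive sentence splitter


def annotate_rules(text):
    """Return stand-off annotations for sentences that contain a modal cue."""
    annotations = []
    for sentence in SENTENCE.finditer(text):
        raw = sentence.group()
        if MODAL.search(raw) is None:
            continue  # no modal cue, so the sentence is not treated as a rule
        stripped = raw.strip()
        start = sentence.start() + (len(raw) - len(raw.lstrip()))
        annotations.append(Annotation(start, start + len(stripped), "Rule", stripped))
        for kind, pattern in (("Modal", MODAL), ("Condition", CONDITION)):
            for cue in pattern.finditer(raw):
                annotations.append(Annotation(
                    sentence.start() + cue.start(),
                    sentence.start() + cue.end(),
                    kind,
                    cue.group(),
                ))
    return annotations


sample = ("Where a person is an employer, the person must register. "
          "The office is open daily.")
for ann in annotate_rules(sample):
    print(ann.kind, ann.start, ann.end, repr(ann.text))
```

Keeping the annotations separate from the text, indexed by offsets, leaves the source document untouched and lets the same spans be exported later to whatever markup is chosen. Surface cues like these will both over- and under-generate (for instance, "may" also matches the month name), which is why the manual analysis of rule forms and the validation phase matter.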
Benefits
Attaining these objectives would:
- Support automated creation, maintenance, and distribution of rule books for compliance.
- Contribute to the development of a legislative processing tool.
- Make legislative rules accessible for web-based applications. For example, given other annotations, one could identify rules that apply with respect to particular individuals in an organisation along with relevant dates, locations, etc.
- Enable further processing of the rules such as removing formatting, parsing the content of the rules, and representing them semantically.
- Allow an inference engine to be applied over the formalised rule base.
- Make legislation more transparent and communicable among interested parties such as government departments, EU governments, and citizenry.
Scope
To attain the objectives, we propose the following phases, where the numbers represent weeks of effort:
- Create a relatively small sample corpus to scope the study.
- Manually identify the forms of legislative rules within the corpus.
- Develop or adapt an annotation scheme for rules.
- Apply the analysis tools of GATE and annotate the rules.
- Validate that GATE annotates the rules as intended.
- Apply the annotation system to a larger corpus of documents.
For each phase, we would produce a summary of results, noting where difficulties are encountered and how they might be addressed.
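The validation phase could, for example, compare automatically produced rule spans against the manually identified ones using precision and recall. The sketch below assumes exact-match comparison of `(start, end)` character spans. The function name and the sample spans are hypothetical, and a real evaluation might also credit partial overlaps.

```python
def precision_recall(gold, predicted):
    """Compare predicted spans with gold spans using exact matching."""
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical spans: manual (gold) annotations and automatic ones.
gold_spans = [(0, 10), (20, 30), (40, 50)]
auto_spans = [(0, 10), (20, 30), (60, 70), (80, 90)]

precision, recall = precision_recall(gold_spans, auto_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision would suggest the patterns are too permissive, while low recall would point to rule forms in the corpus that the patterns do not yet cover.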
Extending the work
The work can be extended in a variety of ways:
- Apply the GATE rules to a larger corpus with more variety of rule forms.
- Process the rules for semantic representation and inference.
- Take into consideration defeasibility and exceptions.
- Develop semantic web applications for the rules.
By Adam Wyner
Distributed under the Creative Commons Attribution-Non-Commercial-Share Alike 2.0