This post collects notes on, and links to, several other posts about legal information annotation and extraction using the General Architecture for Text Engineering system (GATE). The material in the posts was presented at my tutorial at JURIX 2009, Rotterdam, The Netherlands; the slides are available here. See the GATE website or my slides for introductory material about NLP and text annotation. For particulars about NLP and legal resources, see the posts and files at the links below.
The Posts
The following posts discuss different aspects of legal information extraction using GATE (live links indicate live posts):
- Legislative rule extraction: annotates (non-conditional) rules from legislation.
- Using XSLT to re-represent GATE output: uses the information extracted from legislation and reformats it using XSLT.
- Information extraction of conditional rules: illustrates using GATE to annotate conditional statements of the form If P, then Q, which are common in law.
- Information extraction of legal case factors: shows how to annotate factors of a case, which are prototypical fact patterns and crucial to case-based reasoning.
- Information extraction of legal case features using lists and rules: illustrates annotation of case features, which include a range of information about a legal case such as jurisdiction, parties, decision, and others; the post provides lists and rules to annotate this information.
- Information extraction of legal case features using an ontology: shows how to use an ontology (rather than lists and rules) to annotate text.
- Information extraction with ANNIC: uses a GATE plugin to analyse and query annotations such as case factors.
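To give a rough feel for what annotating a conditional of the form If P, then Q involves, here is a minimal sketch in Python. It is not the GATE application or the JAPE rules from the linked post; the regular expression, the `Conditional` type, and the `annotate_conditional` function are all hypothetical names introduced only for illustration, and the pattern assumes a single sentence with one explicit "if ..., then ..." pair. Real legal text is far messier than this.

```python
import re
from typing import NamedTuple, Optional


class Conditional(NamedTuple):
    """Hypothetical annotation: the two parts of an 'If P, then Q' sentence."""
    antecedent: str  # the P part
    consequent: str  # the Q part


# Assumption: one sentence, one explicit "if ..., then ..." pair.
# The antecedent is everything up to the first ", then".
_PATTERN = re.compile(
    r"^\s*if\s+(?P<p>.+?),\s*then\s+(?P<q>.+?)\.?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def annotate_conditional(sentence: str) -> Optional[Conditional]:
    """Return the antecedent and consequent, or None if the pattern does not match."""
    match = _PATTERN.match(sentence)
    if match is None:
        return None
    return Conditional(match.group("p").strip(), match.group("q").strip())


if __name__ == "__main__":
    example = "If the tenant fails to pay rent, then the landlord may terminate the lease."
    print(annotate_conditional(example))
```

In GATE itself, the same idea would be expressed with gazetteer lists and JAPE rules over token annotations rather than a single regular expression; see the linked post for that version.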
Prototypes
The samples presented in the posts are prototypes only. No doubt there are other ways to accomplish similar tasks. The material is not as streamlined or cleanly presented as it could be, and each section is but a very small fragment of a much larger problem. In addition, there are better ways to present the lists and rules “in one piece”; however, during development and for discussion, it seems more helpful to keep the elements separate. Nonetheless, as a proof of concept, the samples make their point.
If there are any problems, contact Adam Wyner at adam@wyner.info.
Files
The posts are intended to be self-contained and to work with GATE 5.0. The archive files include the .xgapp file, which is a saved application state, along with text/corpus, the lists, and JAPE rules needed to run the application. In addition, the archive files include any graph outputs as reference. As noted, one may need to ‘fiddle’ a bit with the gazetteer lists in the current version.
Graphics
Graphics in the posts can be viewed in a larger and clearer size by right-clicking on the graphic and selecting View Image. The Back button on your browser will close the image and return you to the post.
License
The materials are released under the following license:
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0
If you want to commercially exploit the material, you must seek a separate license from me. That said, I look forward to further open development on these materials; see my post on Open Source Legal Information.