This post reports initial steps in legal case factor annotation. We first give a very brief and highly simplified overview of case based reasoning using case factors, then present how case factors can be identified using text mining. (See introductory notes on this and related posts.)
Case based reasoning background
In Common Law legal systems, such as those of the USA and UK, judges make decisions concerning a case; we can say the judges make the law. This is in contrast to Civil Law legal systems, as in Europe (excluding the UK) or elsewhere, in which legislatures make the law, which judges must then follow. In practice, neither kind of legal system is purely common law or purely civil law: the USA and UK have laws made by legislatures; in Europe, the application of legislative acts in particular circumstances (refining the law to apply to the facts) takes on aspects of common law.
In a Common Law system, judges and lawyers argue using case based reasoning: a current undecided case is argued with respect to precedent cases, which are cases that have already been decided by a court and are accepted as “good law”. In essence, if the current case were exactly like a particular precedent case in all essential ways, then the current case ought to be decided as the precedent case was. Where the current case varies, one must argue comparatively with respect to other precedents. Among the ways in which cases are compared and contrasted, we find the case factors, where factors are prototypical fact patterns of a case. A judge decides a case in virtue of its facts, along with the applicable laws and precedents. It is, therefore, crucial to be able to identify the facts of a case in order to compare and contrast cases.
In AI and Law, case based reasoning has a long and well developed history and literature (see the work of Hafner, Rissland, Ashley, and Bench-Capon, among others). We make specific reference to Aleven’s 1997 Ph.D. Thesis. Given an analysis of cases in terms of factors, one can reason about how a current undecided case should, according to the precedents, be decided. However, a central problem is the knowledge bottleneck: how to analyse cases in terms of factors. By and large, this has been manual labour. In the CATO database of cases discussed in Aleven 1997 (about 140 cases concerning intellectual property), the factors are manually annotated. There has been some effort to automate textual identification of factors in cases (see Bruninghaus and Ashley), but this is done with case summaries, not “actual” cases; moreover, the database, annotation, and other system supports are unavailable, so the results of their experiments are not independently verifiable and cannot be developed further by other researchers.
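To make the idea of comparing cases by factors concrete, here is a minimal Python sketch. It is a deliberate simplification and not the algorithm of CATO or any other system: each case is reduced to a set of factor labels, and precedents are ranked by how many factors they share with the current case. The case names, the factor labels (`secrecy-efforts`, `disclosure-to-outsiders`), and the outcomes are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    factors: frozenset  # factor labels annotated on the case
    outcome: str        # "plaintiff" or "defendant"; "" if undecided


def compare(current: Case, precedent: Case) -> dict:
    """Split factors into those shared with the precedent and those that differ."""
    return {
        "shared": current.factors & precedent.factors,
        "only_current": current.factors - precedent.factors,
        "only_precedent": precedent.factors - current.factors,
    }


def rank_precedents(current: Case, precedents: list) -> list:
    """Order precedents by number of shared factors, most similar first."""
    return sorted(
        precedents,
        key=lambda p: len(current.factors & p.factors),
        reverse=True,
    )


# Hypothetical cases and factor labels, invented for illustration only.
precedents = [
    Case("Precedent A", frozenset({"secrecy-efforts", "disclosure-to-outsiders"}), "plaintiff"),
    Case("Precedent B", frozenset({"disclosure-to-outsiders"}), "defendant"),
]
current = Case("Current case", frozenset({"secrecy-efforts", "disclosure-to-outsiders"}), "")

best = rank_precedents(current, precedents)[0]
differences = compare(current, precedents[1])
```

The `only_current` and `only_precedent` sets are where comparative argument would start: they are the respects in which the current case varies from a precedent. Counting shared factors treats all factors as equally important, which real case based argument does not.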
Factors in text
In the CATO system, texts of case decisions are presented to the student along with a menu of factors; the student associates the factors with the text, in effect annotating the case as a whole with the factors, but without marking the linguistic expressions which gave rise to the annotation. The factors are not extracted. The CATO system has other components to support case based argumentation, but these are not relevant to our discussion at this point.
Factors are legal concepts that range over facts. While Aleven 1997 has 27 factors and a factor hierarchy, we only look at two factors in order to give a flavour of our approach.
- Description: The plaintiff took efforts to maintain the secrecy of its information.
- The factor applies if: The plaintiff limited access to and distribution of information. Examples: nondisclosure agreements, notification that the information is confidential, securing the information with passwords and secure storage facilities, secure document distribution systems, etc.
- Description: The information was disclosed to outsiders or was in the public domain. The plaintiff either did not have secret information or did not have an interest in maintaining the secrecy of information.
- The factor applies if: The plaintiff disclosed the product information to licensees, customers, subcontractors, etc.
- The factor does not apply if: Plaintiff published the information in a public forum; or all we know is that plaintiff marketed a product from which the information could be ascertained by reverse engineering.
Aleven 1997 illustrates the association of factors with textual passages in a case.
Given the factor description, we make lists and rules which at least highlight candidate passages in the case which might be relevant.
The results of annotating terms and sentences appear in:
Note that the disclosure sentence seems to be a reasonable candidate for the disclosure factor, but the secrecy sentence is a discussion about the factor rather than a presentation of the factor itself. As we have said, at this point we provide candidate expressions for the factors; further work must be done to annotate the text automatically with greater accuracy.
The lists, JAPE rules, graphics, and application state are in the archive. See the related post Information Extraction with ANNIC which uses a GATE plugin to further analyse the results so they can be improved.
To highlight the relevant passages, we created Lookup lists and then JAPE rules. To create the Lookups, we turned to disclosure and secret in WordNet, taking the SynSets of each, as well as looking at hypernyms (superordinate terms). Making a selection, we created lists using the infinitival, lower case form. This gave us two lists — disclosure.lst and secret.lst.
- disclosure.lst: announce, betray, break, bring out, communicate, confide, disclose, discover, divulge, expose, give away, impart, inform, leak, let on, let out, make known, pass on, reveal, tell
- secret.lst: confidential, confidentiality, hidden, private, secrecy, secret
In the gazetteer itself, disclosure.lst has a majorType disclose, and secret.lst has a majorType secret. With these lists, we homogenize the alternative words for these concepts. It is important that these particular lists are integrated into a lists.def file; in our example this is ListGaz, which is not included in the distribution. As the application uses the Flexible Gazetteer (not discussed here), we can Lookup alternative morphological forms of words in the lists.
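For readers without GATE to hand, the effect of the two lists can be approximated in plain Python. The sketch below is only an analogue and is not how GATE's gazetteers work: the list entries are the ones given in this post, the dictionary keys stand in for the majorType, and a crude optional suffix on the first word of each entry stands in for the morphological matching that the Flexible Gazetteer provides. The function names are our own.

```python
import re

# Entries copied from disclosure.lst and secret.lst; the key plays the role of majorType.
GAZETTEER = {
    "disclose": [
        "announce", "betray", "break", "bring out", "communicate", "confide",
        "disclose", "discover", "divulge", "expose", "give away", "impart",
        "inform", "leak", "let on", "let out", "make known", "pass on",
        "reveal", "tell",
    ],
    "secret": ["confidential", "confidentiality", "hidden", "private", "secrecy", "secret"],
}

# Crude stand-in for morphological analysis: a few regular English endings.
SUFFIXES = r"(?:e|es|ed|ing|s)?"


def entry_pattern(entry: str) -> str:
    """Regex for one list entry, allowing simple inflections on its first word."""
    head, _, rest = entry.partition(" ")
    stem = head[:-1] if head.endswith("e") else head
    pattern = re.escape(stem) + SUFFIXES
    if rest:
        pattern += r"\s+" + re.escape(rest)
    return pattern


def compile_gazetteer(gazetteer: dict) -> dict:
    """Build one case-insensitive regex per majorType."""
    return {
        major_type: re.compile(
            r"\b(?:" + "|".join(entry_pattern(e) for e in entries) + r")\b",
            re.IGNORECASE,
        )
        for major_type, entries in gazetteer.items()
    }


def lookup(text: str, compiled: dict) -> list:
    """Return (start, end, majorType, matched text) tuples ordered by start offset."""
    hits = []
    for major_type, regex in compiled.items():
        for match in regex.finditer(text):
            hits.append((match.start(), match.end(), major_type, match.group(0)))
    return sorted(hits)


COMPILED = compile_gazetteer(GAZETTEER)
hits = lookup("The plaintiff disclosed the confidential design to customers.", COMPILED)
```

The suffix trick misses irregular forms such as "told" or "gave away" and doubled consonants such as "letting", which is one reason to prefer a proper morphological analyser over a shortcut like this.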
Then we write JAPE rules so that we can more easily identify occurrences of these terms. The first rules make the majorType into an annotation for the annotation set, highlighting any occurrence of the terms; we could have skipped this, but it is worthwhile to see where and how the terms appear. The second rules classify sentences as relating to disclosure and secrecy.
- SecretFactor01.jape: Annotates any word from the secret.lst.
- DisclosureFactor01.jape: Annotates any word from the disclosure.lst.
- SecretFactorSentence01.jape: Annotates any sentence which contains an annotation Secret.
- DisclosureFactorSentence01.jape: Annotates any sentence which contains an annotation Disclosure.
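The two stages of rules can be imitated in a few lines of Python. This is a rough analogue under stated assumptions, not a translation of the JAPE files in the archive: the term sets are abbreviated and already inflected, the sentence splitter is a naive regular expression and not the ANNIE splitter, and the sentence labels `DisclosureSentence` and `SecretSentence` are names we have made up for the illustration.

```python
import re

# Abbreviated, already-inflected term sets for illustration only.
TERMS = {
    "Disclosure": {"disclose", "disclosed", "reveal", "revealed", "divulge", "divulged"},
    "Secret": {"secret", "secrecy", "confidential", "confidentiality"},
}


def split_sentences(text: str) -> list:
    """Very rough splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotate_terms(sentence: str) -> list:
    """First stage: label each word that appears in one of the term sets."""
    annotations = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        token = match.group(0).lower()
        for label, terms in TERMS.items():
            if token in terms:
                annotations.append((label, match.group(0)))
    return annotations


def annotate_sentences(text: str) -> list:
    """Second stage: a sentence gets '<label>Sentence' if it contains that label."""
    results = []
    for sentence in split_sentences(text):
        labels = sorted({label + "Sentence" for label, _ in annotate_terms(sentence)})
        results.append((sentence, labels))
    return results


sample = (
    "The plaintiff disclosed the drawings to its customers. "
    "Employees signed confidentiality agreements. "
    "The court adjourned."
)
annotated = annotate_sentences(sample)
```

As with the JAPE rules, a sentence is flagged merely for containing a term, so a sentence that discusses a factor is treated the same as one that presents it. The naive splitter would also break on abbreviations such as "v." in case names, which a real sentence splitter handles better.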
The order of application of the processing resources is:
- Document Reset PR
- ANNIE Sentence Splitter
- ANNIE English Tokeniser
As we have already pointed out, the annotations highlight potentially relevant passages. Further refinement is needed. This would be clearer were one to look at more applications of the annotation. It will also be important to consider more factors on more cases and across more domains of case law.
By Adam Wyner
Distributed under the Creative Commons Attribution-Non-Commercial-Share Alike 2.0