Presentation at Legal Know-how Workshop, Nov. 10, 2010

I have been invited to make a presentation on Textual information extraction and ontologies for legal case-based reasoning at a Legal Know-how Workshop, which is an industry-oriented event organised by the International Society for Knowledge Organization UK (ISKO-UK).
Date: 10 November 2010
Time: 13:30-19:00
Venue: University College London
Medical Sciences Building
A. V. Hill Lecture Theatre
Gower Street
London, WC1E 6BT
See the workshop website for registration fee (either free or under £25) and booking.
This will be a very interesting opportunity to hear from and talk with industry consultants and experts about the latest developments in legal knowledge management. My thanks to Stella Dextre Clarke of ISKO-UK for organising the event and inviting me to take part.


13:30 Registration
14:00 Welcome from ISKO-UK by Stella Dextre Clarke
14:05 Legal knowledge – the practitioner’s viewpoint
Melanie Farquharson, 3Kites Consulting

This session will focus on the practical situations in which lawyers look for knowledge in order to deliver legal services to their clients. It will identify some typical ‘use cases’ and consider ways in which knowledge can be delivered to the practitioner – even without them having to look for it.

14:35 Why lawyers need taxonomies – adventures in organising legal knowledge
Kathy Jacob & Lynley Barker, Pinsent Masons LLP;
Graham Barbour & Mark Fea, LexisNexis

This presentation will cover the practical issues encountered by a law firm in its quest to improve findability of one of its key resources – knowledge and information. We will discuss our approach to building taxonomies, the tools and processes deployed and how we anticipate our taxonomy will be applied and consumed by lawyers and publishers.
The LexisNexis part of the presentation will focus on the challenges of building and applying legal taxonomies to suit the breadth and depth of content they provide online. It will also examine ways in which taxonomies can be surfaced in the user interface and help to drive compelling functionality that improves the user’s search experience.

15:20 Taxonomy management at Clifford Chance
Mats Bergman, Clifford Chance

This talk will describe how taxonomy management works in practice at Clifford Chance. As an increasing number of core knowledge resources are making use of the same set of firm-wide taxonomies, the increased interdependencies necessitate the implementation of a controlled process for updating the taxonomies. A simple governance model will be presented. Some thoughts will follow on the evolution of taxonomy development within a larger organisation and the current challenge of using social tagging in conjunction with controlled vocabularies.

15:50 Refreshments (Lower Refectory)
16:20 Textual information extraction and ontologies for legal case-based reasoning
Adam Wyner, University of Liverpool

This talk gives a brief overview of current developments and prospects in two related areas of the legal semantic web for legal cases – textual information extraction and ontologies. Textual information extraction is a process of automatically annotating and extracting textual information from the legal case base (precedents), thereby identifying elements such as participants, the roles the participants play, the factors which were considered in arriving at a decision, and so on. The information is valuable not only for search (to find applicable precedents), but also to populate an ontology for legal case-based reasoning. An ontology is a formal representation of key aspects of the knowledge of legal professionals with which we can reason (e.g. given an assertion that something is a legal case, we can infer other properties) and with respect to which we can write rules (e.g. reasoning using case factors to arrive at a legal decision). Since it is expensive to manually populate an ontology (meaning to read cases and input the data into the ontology), we use textual information extraction to automatically populate the ontology. We conclude with an appeal for open source, collaborative development of legal knowledge systems among partners in academia, industry, and government.

17:00 Collaboration across boundaries
Gwenda Sippings & Gerard Bredenoord, Linklaters LLP

In this presentation, we will look at approaches to managing legal know-how in a major global law firm. We will describe several boundaries that we have to consider when organising our know-how, including boundaries between professionals, countries, internal and external resources and the well debated boundary between information and knowledge. We will also share some of the ways in which we are making our know-how available to the fee earners and other professionals in the firm, using social and technological solutions.

17:35 Reconciling the taxonomy needs of different users
Derek Sturdy, Tikit Knowledge Services

The last decade has seen the development of a substantial number of legal know-how and knowledge databases. It has also shown up a serious question on whether the metadata, and especially the taxonomies, that are applied to the various knowledge items, should be tailored to the particular needs of end-users, or whether, so to speak, "one size can fit all". In particular, this talk will discuss the overlapping, but discrete, needs of those using knowledge resources primarily for legal drafting and document production, and of those conducting legal research, and will address the relative value today, (as opposed to in 2000), of the effort put into internal metadata creation for those two sorts of end-users.

By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Instructions for GATE's Onto Root Gazetteer

In this post, I present User Manual notes for GATE’s Onto Root Gazetteer (ORG) and references to ORG. In Discussion of GATE’s Onto Root Gazetteer, I discuss aspects of Onto Root Gazetteer which I found interesting or problematic. These notes and discussion may be of use to those researchers in legal informatics who are interested in text mining and annotation for the semantic web.
Thanks to Diana Maynard, Danica Damljanovic, Phil Gooch, and the GATE User Manual for comments and materials which I have liberally used. Errors rest with me (and please tell me where they are so I can fix them!).
Onto Root Gazetteer links text to an ontology by creating Lookup annotations which come from the ontology rather than a default gazetteer. The ontology is preprocessed to produce a flexible, dynamic gazetteer; that is, it is a gazetteer which takes into account alternative morphological forms and can be added to. An important advantage is that text can be annotated as an individual of the ontology, thus facilitating the population of the ontology.
Besides being flexible and dynamic, ORG has some advantages over other gazetteers:

  • It is more richly structured (see it as a gazetteer containing other gazetteers)
  • It allows one to relate textual and ontological information by adding instances.
  • It gives one richer annotations that can be used for further processes.

In the following, we present the step by step instructions for ‘rolling your own’, then show the results of the ‘prepackaged’ example that comes with the plugin.
Step 1. Add (if not already used) the Onto Root Gazetteer plugin to GATE following the usual plugin instructions.
Step 2. Add (if not already used) the Ontology Tools (OWLIM Ontology LR, OntoGazetteer, GATE Ontology Editor, OAT) plugin. ORG uses ontologies, so one must have these tools to load them as language resources.
Step 3. Create (or load) an ontology with OWLIM (see the instructions on the ontologies). This is the ontology that is the language resource that is then used by Onto Root Gazetteer. Suppose this ontology is called myOntology. It is important to note that OWLIM can only use OWL-Lite ontologies (see the documentation about this). Also, I succeeded in loading an ontology only from the resources folder of the Ontology_Tools plugin (rather than from another drive); I don’t know if this is significant.
Step 4. In GATE, create processing resources with default parameters:

  • Document Reset PR
  • RegEx Sentence Splitter (or ANNIE Sentence Splitter, but that one is likely to run slower)
  • ANNIE English Tokeniser
  • ANNIE POS Tagger
  • GATE Morphological Analyser

Step 5. When all these PRs are loaded, create an Onto Root Gazetteer PR and set its initial parameters. The mandatory ones are as follows (though some are set as defaults):

  • Ontology: select previously created myOntology
  • Tokeniser: select previously created Tokeniser
  • POSTagger: select previously created POS Tagger
  • Morpher: select previously created Morpher.

Step 6. Create another PR which is a Flexible Gazetteer. In the initial parameters, it is mandatory to select the previously created OntoRootGazetteer for gazetteerInst. For another parameter, inputFeatureNames, click on the button on the right and, when prompted with a window, add ‘Token.root’ in the provided text box, then click the Add button. Click OK, give a name to the new PR (optional) and then click OK.
Step 7. To create an application, right click on Application, New –> Pipeline (or Corpus Pipeline). Add the following PRs to the application in this order:

  • Document Reset PR
  • RegEx Sentence Splitter
  • ANNIE English Tokeniser
  • ANNIE POS Tagger
  • GATE Morphological Analyser
  • Flexible Gazetteer

Step 8. Run the application over the selected corpus.
Step 9. Inspect the results. Look at the Annotation Set with Lookup and also the Annotation List to see how the annotations appear.
Small Example
The ORG plugin comes with a demo application which sets up not only all the PRs and LRs (the text, corpus, and ontology), but also the application itself, ready to run. This is the file exampleApp.xgapp, which is in the resources folder of the plugin (Ontology_Based_Gazetteer). To run it, start GATE with a clean slate (no other PRs, LRs, or applications), go to Applications, right click and choose Restore application from file, then load the file from the folder just given.
The ontology which is used for an illustration is for GATE itself, giving the classes, subclasses, and instances of the system. While the ontology is loaded along with the application, one can find it here. The text is simple (and comes with the application): language resources and parameters.
FIGURE 1 (missing at the moment)
FIGURE 2 (missing at the moment)
One can see that the token “language resources” is annotated with respect to the class LanguageResource, “resources” is annotated with GATEResource, and “parameters” is annotated with ResourceParameter. We discuss this further below.
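To make the role of Token.root concrete, here is a minimal sketch in Python of matching text against ontology class names by root form. It is a toy and not how ORG is implemented: the crude_root function is only a stand-in for the GATE Morphological Analyser, the camel-case splitting is an assumption about how class names might be turned into phrases, and only full class names are matched, so it does not reproduce the annotations described above for “resources” and “parameters” on their own.

```python
import re

# Class names borrowed from the example above; all function names are hypothetical.
ONTOLOGY_CLASSES = ["LanguageResource", "GATEResource", "ResourceParameter"]


def split_camel_case(name):
    """Split a class name such as 'GATEResource' into lower-case words."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+"
    return [word.lower() for word in re.findall(pattern, name)]


def crude_root(token):
    """Rough stand-in for a morphological analyser: strip a plural 's'."""
    token = token.lower()
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def build_gazetteer(class_names):
    """Map the root form of each class name to the class it came from."""
    gazetteer = {}
    for name in class_names:
        key = tuple(crude_root(word) for word in split_camel_case(name))
        gazetteer[key] = name
    return gazetteer


def lookup(text, gazetteer):
    """Return (matched text, class name) pairs, matching on token roots."""
    tokens = re.findall(r"[A-Za-z]+", text)
    roots = [crude_root(token) for token in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    matches = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            key = tuple(roots[start:start + length])
            if len(key) == length and key in gazetteer:
                phrase = " ".join(tokens[start:start + length])
                matches.append((phrase, gazetteer[key]))
    return matches


if __name__ == "__main__":
    gaz = build_gazetteer(ONTOLOGY_CLASSES)
    print(lookup("language resources and resource parameters", gaz))
```

Because both the class names and the text are reduced to roots before comparison, the plural “language resources” in the text still matches the singular class name LanguageResource.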
One further aspect is important and useful. Since the ontology tools have been loaded and a particular ontology has been used, one can not only see the ontology (open the OAT tab in the window with the text), but one can annotate the text with respect to the ontology — highlight some text and a popup menu allows one to select how to annotate the text. With this, one can add instances (or classes) to the ontology.
One can consult the following for further information about how the gazetteer is made, among other topics:

See the related post Discussion of GATE’s Onto Root Gazetteer.

London GATE Users Group

At the recent GATE Summer School in Sheffield, there was some discussion among people from London to form an occasional, informal users group where GATE users based in London can arrange to meet to go over tutorials, develop tutorials, discuss how we work with GATE, help one another with problems, and generally have a bit of a blab over tea with others who have similar interests.
As the informal organiser of this informal group, I thought my blog (which touches on topics related to text analytics) might be an acceptable place to announce and maintain the group. If things really get going, then perhaps the group will hive off to its own site.
I would like to suggest Thursday, August 20 in the early evening (e.g. 19:00) as our first meeting time. Likely the meeting would be till 20:30. Place (somewhere in central London — Covent Garden/Leicester Square) to be announced. Please let me know if this time and vicinity suits you, as we are looking to have more than one person show up.
Likely people will bring laptops, but we’ll try to arrange a projector as well for public show and tell. If you have something you would like to discuss or show, that would be good, but we can always find something to do and discuss.
It is an open group, and if you would like to be kept informed of any upcoming meetings, please send an email to Adam Wyner. Feel free also to join this blog as one way to keep in touch with this group.
The group currently has the following participants:

  • Dipti Garg (Fizzback)
  • Hercules Fisherman (Fizzback)
  • Adam Wyner (University College London)
  • Auhood Alfaries (Brunel University)
  • Helen Flatley (EqualMedia)
  • Gerhard Brey (King’s College London)
  • Daniel Elias (Hawk Ridge Capital Management)
  • Renato Souza (Universidade Federal de Minas Gerais)

We look forward to our first meeting and to hearing from other people who may be interested in working with GATE. Comments on this topic are very welcome.
Adam Wyner

Legal Taxonomy

In this post, I comment on Sherwin’s recent article Legal Taxonomy in the journal Legal Theory. It is a very lucid, thorough, and well-referenced discussion of the state of the art in taxonomies of legal rules. By considering how legal taxonomies organise legal rules, we better understand current conceptions of legal rules by legal professionals. My take-away message from the article is that the analysis of legal rules could benefit from some of the thinking in Linguistics and Computer Science, particularly in terms of how data is gathered and analysed.
Below, I briefly outline ideas concerning taxonomies and legal rules. Then, I present and comment on the points Sherwin brings to the fore.
Taxonomy is the practice and science of classifying items in a hierarchical IS-A relationship, where the items can be almost anything. The IS-A relationship is also understood in terms of subtypes and supertypes. For example, a car is a subtype of vehicle, and a Toyota is a subtype of car; we can infer that a Toyota is a subtype of vehicle. Each subtype has more specific properties than the supertype. In some taxonomies, one item may be a subtype of several supertypes; for example, a car is both a subtype of vehicle and a subtype of objects made of metal. However, not all vehicles are made of metal, nor are all things made from metal vehicles, which indicates that these types are distinct. Taxonomies are more specific than the related ontologies, in which a range of relationships beyond the IS-A relationship may hold among the items, such as is owned by or similar. In addition, ontologies generally introduce properties of elements in the class, e.g. colour, engine type, etc. Classifications in scientific domains such as Biology or Linguistics are intensely debated and revised. One would expect this to be even more true in the legal domain, which is comprised of intellectual evidence rather than empirical evidence as in the physical sciences, and where the scientific method is not applied.
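The inference from Toyota to vehicle can be sketched in a few lines of Python. This is only an illustration of transitive IS-A reasoning over the car example; the dictionary layout and function names are my own assumptions, not a standard representation.

```python
# Direct IS-A links from the example; an item may have several supertypes.
IS_A = {
    "Toyota": {"car"},
    "car": {"vehicle", "metal object"},
    "vehicle": set(),
    "metal object": set(),
}


def supertypes(item, is_a=IS_A):
    """Return every supertype reachable from item by following IS-A links."""
    found = set()
    stack = list(is_a.get(item, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, ()))
    return found


def is_subtype(item, candidate, is_a=IS_A):
    """True if candidate is a direct or inferred supertype of item."""
    return candidate in supertypes(item, is_a)


if __name__ == "__main__":
    print(is_subtype("Toyota", "vehicle"))        # inferred through car
    print(is_subtype("vehicle", "metal object"))  # the two supertypes stay distinct
```

A car sits under both vehicle and metal object, yet neither of those is a subtype of the other, which is the sense in which the types are distinct.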
Legal Rules
First, let us be clear about what a legal rule is, using an example that follows Professor David E. Sorkin. A legal rule is a rule which determines whether some proposition holds (say of an individual) contingent on other propositions (the premises). For example, the State of Illinois assault statute specifies: “A person commits an assault when, without lawful authority, he engages in conduct which places another in reasonable apprehension of receiving a battery.” (720 ILCS 5/12-1(a)). We can analyse this into the legal rule:

    A person commits assault if

      1. the person engages in conduct;
      2. the person lacks lawful authority for the conduct;
      3. the conduct places another in apprehension of receiving a battery; and
      4. the other person’s apprehension is reasonable.

Optimally, each of the premises in a rule should be simple and be answerable as true or false. In this example, where all four premises are true, the conclusion, that the person committed assault, is true.
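Read this way, the rule is a conjunction of four true-or-false premises. A minimal sketch in Python follows; the class and field names are hypothetical labels for the four premises, and the sketch ignores everything that makes real cases hard, such as disputed or graded premises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssaultFacts:
    """One true/false answer per premise of the rule (names are illustrative)."""
    engaged_in_conduct: bool
    lacked_lawful_authority: bool
    caused_apprehension_of_battery: bool
    apprehension_was_reasonable: bool


def commits_assault(facts):
    """The conclusion holds only when all four premises are true."""
    return all(
        [
            facts.engaged_in_conduct,
            facts.lacked_lawful_authority,
            facts.caused_apprehension_of_battery,
            facts.apprehension_was_reasonable,
        ]
    )


if __name__ == "__main__":
    all_true = AssaultFacts(True, True, True, True)
    unreasonable = AssaultFacts(True, True, True, False)
    print(commits_assault(all_true))
    print(commits_assault(unreasonable))
```

If any single premise is false, for instance the apprehension was not reasonable, the conclusion does not follow.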
There are significant issues even with such simple examples since each of the premises of a legal rule may itself be subject to further dispute and consideration; the premises may be subjective (e.g. was the conduct intentional), admit degrees of truth (e.g. degree of emotional harm), or application of the rule may be subject to mitigating or aggravating circumstances. The determination of the final claim follows the resolution of these subsidiary disputes and considerations. In addition, some legal rules need not require all of the premises to be true, but allow a degree of counterbalancing evaluation of the terms.
The Sources of Legal Rules
Sherwin outlines the sources of the rules:

      Posited rules, which are legal rules as explicitly given by a legal authority such as a judge giving a legal decision.
      Attributed rules, which are legal rules that are drawn from a legal decision by a legal researcher rather than by a legal authority in a decision. The rule is implicit in the other aspects of the report of the case.
      Ideal rules, which are rules that are ‘ideal’ relative to some criteria of ideality, say morally or economically superior rules.

Purposes of Classification
In addition, we have the purposes or uses of making a classification of legal rules.

      Facilitating the discussion and use of law.
      Supporting the critical evaluation of law
      Influencing legal decision-making

In the first purpose, the rules are sorted into classes, which helps to understand and manage legal information. In Sherwin’s view, this is the most basic, formal, and least ambitious goal, yet it relies on having some taxonomic logic in the first place. In the second purpose, the rules are evaluated to determine if they are serving the intended purpose as well as to identify gaps or inconsistencies. As Sherwin points out, the criteria of evaluation must then also be determined; however, this then relates to the criteria which guide the taxonomy in the first place, a topic we touch on below. The final purpose is a normative one, where the classification identifies the normal circumstances under which a rule applies, thereby also clarifying those circumstances in which the rule does not apply. Sherwin points out that legal scholars vary in which purpose they find attractive and worth pursuing.
While I can appreciate that some legal scholars might not find the ‘formal’ classification of interest, I view it from a different perspective. First, any claim concerning the normative application of one rule instead of another rests entirely on the intuitive presumption that the rules are clearly different. This is a distinction that the first level can help to clarify. Similar points can be made for other relationships among rules. Second, focusing on the latter stage does not help to say specifically why one rule means what it does and has the consequences as intended; yet surely this is in virtue of the specific ‘content’ of the rule, which again is clarified by a thoroughgoing analysis at the first stage. Third, if there is going to be any progress in applied artificial intelligence and law, it will require the analytical elements defined at the first stage. Fourth, as the study of Linguistics has shown, close scrutiny at the first stage can help to reveal the very issues and problems that are fundamental to all higher stages. Fifth, providing even a small, clear sample of legal arguments analysed along the lines of the first stage can give the community of legal scholars a common ‘pool’ of legal arguments to fruitfully consider at the later stages; along these lines, it is notable how few concrete, detailed examples Sherwin’s paper discusses. Not surprisingly, some of the issues Sherwin raises about the purposes of different ‘levels’ of analysis also appear in the linguistic literature. In my view, though the first stage may not be interesting to most legal professionals, there are very good reasons why it should be.
Criteria of Taxonomy
Several different criteria which guide the taxonomy of legal rules are discussed.

      Intuitive similarity: whether researchers claim that two rules are subtypes of one another.
      Evolutionary history: the legal rule is traced in the history of the law.
      Formal classification: the logical relations among categories of the law.
      Function based: a function from the problem to a set of solutions.
      Reason based: the higher-level reasons that explain or justify a rule.

Sherwin criticises judgements based on intuitive similarity on the grounds that taxonomers may be relying on false generalisations rather than their own intuitions, and that intuition can be arbitrary and without reason. This is also the sort of criticism levelled at a large segment of linguistic research, and it has been shown to be misleading. Of course, one must watch for false classifications and try to provide a justification for classifying one element in one class and not another. One way to do this, as in psycholinguistics, is to provide tests run over subjects. Another way is to refine the sorts of observations that lead to classifications. In general, all that we currently know about language, from dictionaries, to grammars, to inference rules, is based on linguistic intuitions. Some, such as the rules of propositional logic, have been so fixed that they now seem to exist independent of any linguistic basis.
The issue here is somewhat related to classification by formal logical relations. It is unclear what Sherwin thinks logical relations are and how they are applied. What we do have more clarity on are some of the criteria for such a formal taxonomy: accounting for all legal materials, a strict hierarchy, consistent interpretation of classes, and no overlap of categories. This is but one way to consider a formal hierarchy; indeed, there is a separate and very interesting question about what formal model of classification best suits a legal taxonomy. Yet, this issue is not explored in the article.
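Some of these criteria can at least be stated as mechanical checks. The Python sketch below is my own illustration, not something from Sherwin’s article: it assumes a taxonomy given as a map from each category to its single parent, plus a map from categories to the rules filed under them, and it tests for a single root, no cycles, full coverage, and no overlap. The category and rule names are hypothetical, and consistent interpretation of classes is not something such a check can capture.

```python
def check_taxonomy(parent, assignments, rules):
    """Return a list of problems; an empty list means the checks pass.

    parent: category -> parent category (None for the root)
    assignments: category -> set of rules filed under that category
    rules: every rule the taxonomy is meant to account for
    """
    problems = []

    # Strict hierarchy: exactly one root and no cycles on the way up.
    roots = [c for c, p in parent.items() if p is None]
    if len(roots) != 1:
        problems.append(f"expected one root category, found {len(roots)}")
    for category in parent:
        seen = set()
        node = category
        while node is not None:
            if node in seen:
                problems.append(f"cycle through {category}")
                break
            seen.add(node)
            if node not in parent:
                problems.append(f"unknown parent {node}")
                break
            node = parent[node]

    # Coverage and no overlap: each rule is filed in exactly one category.
    for rule in sorted(rules):
        homes = sorted(c for c, filed in assignments.items() if rule in filed)
        if not homes:
            problems.append(f"{rule} is not classified")
        elif len(homes) > 1:
            problems.append(f"{rule} appears in several categories: {', '.join(homes)}")
    return problems


if __name__ == "__main__":
    parent = {"law": None, "private law": "law", "tort": "private law", "contract": "private law"}
    assignments = {"tort": {"assault", "negligence"}, "contract": {"breach of promise", "negligence"}}
    rules = {"assault", "negligence", "breach of promise", "unjust enrichment"}
    for problem in check_taxonomy(parent, assignments, rules):
        print(problem)
```

Other formal models, for example ones that allow a rule to sit under several categories, would need different checks, which is part of the open question about which model best suits a legal taxonomy.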
The function based approach seems to have meta categories. For example, the rule above can be seen as a function from circumstances to a classification of a person as having committed an assault. However, this is not what appears to be intended in Sherwin’s discussion. Rather, there are meta-functional categories depending on higher level problems and solutions. The examples given are Law as a Grievance-Remedial Instrument and Law as an Administrative-Regulatory Instrument. For me, this is not quite as clear as Sherwin makes it appear.
The reason approach organises rules according to an even higher level of the rule: the justification or explanation of the rule. Some of the examples are that a wrongful harm imposes an obligation for redress, that deterring breaches of promises facilitates exchange, or that a rule promotes public safety. In my view, these are what people (e.g. Professor Bench-Capon) in AI and Law would call values which are promoted by the legal rule. Sherwin discusses several different ways that reason based classification is done: intended, attributed, and ideal rationales. In my view, the claimed differences are not all that clear or crucial to the classification. In some cases, the rationale of a legal rule is given by the adjudicator. However, where this is not so, the rationale is implicit and must be interpreted, which is to give the intended rationale. In other cases, legal researchers examine a body of law and provide rationales, which is the attributed rationale. In this sense, the intended and attributed rationales are related (both interpreted), but achieved by different methods (study of one case versus study of a body of cases and considerations about the overall purpose of the law). Finally, there are ideal rationales, which set out broad, ideal goals of the legal rule, which may or may not be ‘ideally’ achievable. In this, the difference between intended/attributed and ideal is whether the rationale is analysed out of cases (bottom-up) or provided legislatively (top-down). In the end, the result is similar: legal rules are classified with respect to some rationale. The general problem with any such rationale is just how it is systematically given and itself justified so as to be consistent and not to yield conflicting interpretations of the same legal rule. Finally, Sherwin seems to think that there is some intrinsic conflict or tension between formal classification and reason based classification. I don’t agree. Rather, the difference is in the properties and methods being employed to make the classification, which are not inherently in conflict. Likely, a mixed approach will yield the most insights.
Copyright © 2009 Adam Wyner