Session I of "Automated Content Analysis and the Law" Workshop

Today is session I of the NSF-sponsored workshop on Automated Content Analysis and the Law. The theme of today’s meeting is the state of judicial/legal scholarship, with two aims:

  • Identify the theoretical and substantive puzzles in legal and judicial scholarship which might benefit from automated content analysis
  • Discuss the kinds of data and measures required to address these puzzles, which automated content analysis could provide.

Further comments will follow later in the day, after the session.
–Adam Wyner
Copyright © 2009 Adam Wyner

London GATE Users Group

At the recent GATE Summer School in Sheffield, there was some discussion among London-based attendees about forming an occasional, informal users group where GATE users in London can arrange to meet to go over tutorials, develop tutorials, discuss how we work with GATE, help one another with problems, and generally have a bit of a blab over tea with others who have similar interests.
As the informal organiser of this informal group, I thought my blog (which touches on topics related to text analytics) might be an acceptable place to announce and maintain the group. If things really get going, then perhaps the group will hive off to its own site.
I would like to suggest Thursday, August 20 in the early evening (e.g. 19:00) as our first meeting time. The meeting would likely run until 20:30. The place (somewhere in central London, around Covent Garden/Leicester Square) is to be announced. Please let me know if this time and vicinity suit you, as we are looking to have more than one person show up.
People will likely bring laptops, but we’ll also try to arrange a projector for public show and tell. If you have something you would like to discuss or show, that would be good, but we can always find something to do and discuss.
It is an open group, and if you would like to be kept informed of any upcoming meetings, please send an email to Adam Wyner (adam@wyner.info). Feel free also to join this blog as one way to keep in touch with this group.
The group currently has the following participants:

  • Dipti Garg (Fizzback)
  • Hercules Fisherman (Fizzback)
  • Adam Wyner (University College London)
  • Auhood Alfaries (Brunel University)
  • Helen Flatley (EqualMedia)
  • Gerhard Brey (King’s College London)
  • Daniel Elias (Hawk Ridge Capital Management)
  • Renato Souza (Universidade Federal de Minas Gerais)

We look forward to our first meeting and to hearing from other people who may be interested in working with GATE. Comments on this topic are very welcome.
Cheers!
Adam Wyner

Participating in One-Lex: Managing Legal Resources in the Semantic Web

Later this summer, I’ll be participating in the summer school Managing Legal Resources in the Semantic Web, September 7 to 12 in San Domenico di Fiesole (Florence, Italy). This program will focus on several aspects of legal document management:

  • Drafting methods, to improve the language and the structure of legislative texts
  • Legal XML standards, to improve the accessibility and interoperability of legal resources
  • Legal ontologies, to capture legal metadata and legal semantics
  • Formal representation of legal contents, to support legal reasoning and argumentation
  • Workflow models, to cope with the lifecycle of legal documentation

While I’m familiar with several of these areas, I’m using this opportunity to fill in gaps in my knowledge.

NSF-sponsored workshop: Automated Content Analysis and the Law

I was invited to participate in an NSF-sponsored workshop, Automated Content Analysis and the Law, held August 3 and 4 at NSF headquarters in Arlington, VA and organised by Georg Vanberg (UNC).
There are two sessions planned. The first session will focus on identifying the theoretical and substantive puzzles in legal and judicial scholarship that might benefit from automated content analysis, as well as the data and measurements required to address them. The second session will focus on the state of automated content analysis and natural language processing, exploring the extent to which current technology can address the issues raised in the first session and what further developments might be needed.
There is an interesting mix of people, with a strong emphasis on legal scholarship bearing on the US Supreme Court and on opinion mining. I had an email exchange with Georg, the workshop organiser, about this, and we agree that attention ought to turn from the Supreme Court to lower levels of the legal system. I also suggested that participants consider some of the following points, which bear on the motives and objectives of these lines of research in terms of who is being served and how the data or conclusions would be used.
Questions for Discussion

  • What sorts of artifacts and technologies (if any) will emerge from the research?
  • How does the research relate to the Semantic Web?
  • What public service does the research provide or support?
  • How does this research relate to:
    • E-discovery
    • Textual legal case based reasoning
    • Legislative XML Markup
    • Other research communities, e.g. ICAIL and JURIX

Participants

  • Scott Barclay (NSF) – Barclay@uamail.albany.edu
  • Cliff Carrubba (Emory) – ccarrub@emory.edu
  • Skyler Cranmer (UNC) – skylerc@email.unc.edu
  • Barry Friedman (NYU) – friedmab@juris.law.nyu.edu
  • Susan Haire (NSF) – shaire@nsf.gov
  • Lillian Lee (Cornell) – llee@cs.cornell.edu
  • Jimmy Lin (Maryland) – jimmylin@umd.edu
  • Stefanie Lindquist (Texas) – SLindquist@law.utexas.edu
  • Will Lowe (Nottingham) – will.lowe@nottingham.ac.uk
  • Andrew Martin (Wash U) – admartin@wustl.edu
  • Wendy Martinek (NSF) – wemartin@nsf.gov
  • Kevin McGuire (UNC) – kmcguire@unc.edu
  • Wayne McIntosh (Maryland) – wmcintosh@gvpt.umd.edu
  • Burt Monroe (Penn State) – blm24@psu.edu
  • Kevin Quinn (Harvard) – kevin_quinn@harvard.edu
  • Jonathan Slapin (Trinity College) – jonslapin@gmail.com
  • Jeff Staton (Emory) – jkstato@emory.edu
  • Georg Vanberg (UNC) – gvanberg@unc.edu
  • Adam Wyner (University College London) – adam@wyner.info

General Architecture for Text Engineering Summer School

Next week I’m attending a week-long summer school on the General Architecture for Text Engineering (GATE). GATE is an open-source and extensible toolkit for text mining, which has been used in a variety of areas. After having worked with people who had their “hands on” the tools, I decided it would better suit me to be able to work with the material myself. I’ve been looking forward to this summer school for some time and am excited at the prospect of applying GATE tools to a database of legal cases as well as developing an ontology.

ICAIL 2009 Workshops

Last week, I attended the 12th International Conference on Artificial Intelligence and Law (ICAIL) in Barcelona, Spain. Tuesday through Thursday were given over to the main conference, while Monday and Friday were for workshops. This post makes a few remarks about the workshops.
On Monday, I attended two workshops. In the morning, I was at Legal Ontologies and Artificial Intelligence Techniques (LOAIT), while in the afternoon, I attended Modeling Legal Cases, where I presented a paper, An OWL Ontology for Legal Cases with an Instantiation of Popov v. Hayashi. On Friday, I was at the morning workshop, Natural Language Engineering of Legal Argumentation (NaLELA), which I organised with Tom van Engers and where I presented a paper by Tom and me outlining our approach to engineering argumentation.
At LOAIT, we heard reports about ongoing projects to provide legal taxonomies or ontologies for mediation, norms, business process specification, legislative markup, and acquisition of ontologies. Most of this work is still in the research phase, though some of it has been applied to samples in the domain of application.
At the modeling workshop, we heard a paper by Bex, Bench-Capon, and Atkinson about how to represent motives in an argumentation framework. Basically, the idea is to make motives explicit by representing them as nodes in an argumentation graph; as such, motives can be attacked and reasoned with. Trevor Bench-Capon modeled dimensions in property law in an argumentation framework. This paper was particularly helpful to me in finally getting a grip on what dimensions are; in effect, they are finer-grained factors with an ordering over them. For example, possession ranges from an animal roaming free, through a chase being started, hot pursuit, and mortally wounding the animal, to actual bodily possession. In another paper, Trevor modeled a set of US Supreme Court cases, raising a series of important questions about how the reasoning of the Supreme Court could be modeled. Douglas Walton’s paper gave some samples of argumentation schemes. Henry Prakken presented an analysis, in his argumentation framework, of a case concerning disability assessment. Finally, Kevin Ashley gave an overview of some aspects of legal case-based reasoning which, he claims, ought to have some ontological representation. This paper is relevant to my research on ontologies, as Kevin pointed out a range of elements which may be considered for inclusion in an ontology. My main reservation is that there ought to be some clear distinction between the ontology (domain knowledge) and the rules that apply to the elements of the ontology.
More information about the NaLELA workshop can be found at the website for Natural Language Engineering of Argumentation.
There were three other workshops I did not have time to attend: E-discovery/E-disclosure; privacy and protection in web-based social networks; and legal and negotiation decision support systems.

Legal Taxonomy

Introduction
In this post, I comment on Sherwin’s recent article Legal Taxonomy in the journal Legal Theory. It is a very lucid, thorough, and well-referenced discussion of the state of the art in taxonomies of legal rules. By considering how legal taxonomies organise legal rules, we better understand current conceptions of legal rules among legal professionals. My take-away message from the article is that the analysis of legal rules could benefit from some of the thinking in Linguistics and Computer Science, particularly in terms of how data is gathered and analysed.
Below, I briefly outline ideas concerning taxonomies and legal rules. Then, I present and comment on the points Sherwin brings to the fore.
Taxonomies
Taxonomy is the practice and science of classifying items in a hierarchical IS-A relationship, where the items can be most anything. The IS-A relationship is also described in terms of subtypes and supertypes. For example, a car is a subtype of vehicle, and a Toyota is a subtype of car; we can infer that a Toyota is a subtype of vehicle. Each subtype has more specific properties than its supertype. In some taxonomies, one item may be a subtype of several supertypes; for example, a car is both a subtype of vehicle and a subtype of objects made of metal. However, not all vehicles are made of metal, nor are all things made of metal vehicles, which indicates that these types are distinct. Taxonomies are more restricted than the related notion of ontologies, in which a range of relationships beyond IS-A may hold among the items, such as is owned by. In addition, ontologies generally introduce properties of the elements in a class, e.g. colour, engine type, etc. Classifications in scientific domains such as Biology or Linguistics are intensely debated and revised. One would expect this to be even more true in the legal domain, which rests on intellectual rather than empirical evidence (as in the physical sciences) and where the scientific method is not applied.
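To make the IS-A relation concrete, here is a minimal Python sketch of a taxonomy as a set of subtype edges with transitive inference; the type names are illustrative only, not drawn from any existing taxonomy.

    # A minimal sketch of a taxonomy as a set of IS-A edges.
    # The type names are illustrative only.
    SUPERTYPES = {
        "toyota": {"car"},
        "car": {"vehicle", "metal_object"},  # one item, several supertypes
        "vehicle": set(),
        "metal_object": set(),
    }

    def is_a(subtype, supertype):
        """True if subtype IS-A supertype, directly or transitively."""
        if subtype == supertype:
            return True
        return any(is_a(parent, supertype)
                   for parent in SUPERTYPES.get(subtype, ()))

    print(is_a("toyota", "vehicle"))        # True, inferred via car
    print(is_a("car", "metal_object"))      # True, a direct edge
    print(is_a("vehicle", "metal_object"))  # False, the types are distinct

An ontology would extend such a structure with further relations (e.g. is owned by) and properties (e.g. colour, engine type).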
Legal Rules
First, let us be clear about what a legal rule is, using an example following Professor David E. Sorkin. A legal rule is a rule which determines whether some proposition holds (say, of an individual) contingent on other propositions (the premises). For example, the state of Illinois assault statute specifies: “A person commits an assault when, without lawful authority, he engages in conduct which places another in reasonable apprehension of receiving a battery.” (720 ILCS 5/12-1(a)). We can analyse this into the legal rule:

    A person commits assault if

      1. the person engages in conduct;
      2. the person lacks lawful authority for the conduct;
      3. the conduct places another in apprehension of receiving a battery; and
      4. the other person’s apprehension is reasonable.

Optimally, each of the premises in a rule is simple and answerable as true or false. In this example, when all four premises are true, the conclusion that the person committed assault is true.
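Here is a minimal Python sketch of the rule as a strict conjunction of boolean premises; the parameter names are my paraphrases of the premises above, not terms from the statute.

    # A sketch of the assault rule: the conclusion holds only if
    # all four premises hold. Parameter names paraphrase the premises.
    def commits_assault(engages_in_conduct,
                        lacks_lawful_authority,
                        causes_apprehension_of_battery,
                        apprehension_is_reasonable):
        return (engages_in_conduct
                and lacks_lawful_authority
                and causes_apprehension_of_battery
                and apprehension_is_reasonable)

    print(commits_assault(True, True, True, True))   # True
    print(commits_assault(True, True, True, False))  # False, one premise fails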
There are significant issues even with such simple examples, since each of the premises of a legal rule may itself be subject to further dispute and consideration; the premises may be subjective (e.g. whether the conduct was intentional), may admit degrees of truth (e.g. the degree of emotional harm), or the application of the rule may be subject to mitigating or aggravating circumstances. The determination of the final claim follows the resolution of these subsidiary disputes and considerations. In addition, some legal rules do not require all of the premises to be true, but instead allow a degree of counterbalancing evaluation of the terms.
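One way to picture counterbalancing evaluation, purely as an illustration of my own rather than anything proposed by Sherwin or the statute, is to let each premise contribute a degree of support and let the conclusion turn on whether the combined support clears a threshold:

    # A hypothetical counterbalancing rule: premises contribute degrees
    # of support in [0, 1] rather than strict truth values.
    def weighed_conclusion(premises, threshold=0.5):
        """True if the average support across premises clears the threshold."""
        return sum(premises.values()) / len(premises) >= threshold

    # Illustrative degrees only; real weights would need legal justification.
    print(weighed_conclusion({"harm": 0.8, "intent": 0.3, "mitigation": 0.6}))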
The Sources of Legal Rules
Sherwin outlines three sources of legal rules:

      Posited rules, which are legal rules as explicitly given by a legal authority such as a judge giving a legal decision.
      Attributed rules, which are legal rules that are drawn from a legal decision by a legal researcher rather than by a legal authority in a decision. The rule is implicit in the other aspects of the report of the case.
      Ideal rules, which are rules that are ‘ideal’ relative to some criteria of ideality, say morally or economically superior rules.

Purposes of Classification
In addition, there are the purposes or uses of a classification of legal rules:

      Facilitating the discussion and use of law
      Supporting the critical evaluation of law
      Influencing legal decision-making

Under the first purpose, the rules are sorted into classes, which helps us understand and manage legal information. In Sherwin’s view, this is the most basic, formal, and least ambitious goal, yet it relies on having some taxonomic logic in the first place. Under the second purpose, the rules are evaluated to determine whether they serve their intended purpose, as well as to identify gaps or inconsistencies. As Sherwin points out, the criteria of evaluation must then also be determined; however, this relates back to the criteria which guide the taxonomy in the first place, a topic we touch on below. The final purpose is a normative one, where the classification identifies the normal circumstances under which a rule applies, thereby also clarifying those circumstances in which the rule does not apply. Sherwin points out that legal scholars vary in which purpose they find attractive and worth pursuing.
While I can appreciate that some legal scholars might not find the ‘formal’ classification of interest, I view it from a different perspective. First, any claim concerning the normative application of one rule instead of another rests entirely on the intuitive presumption that the rules are clearly different, a distinction that the first level can help to clarify. Similar points can be made for other relationships among rules. Second, focusing on the later stages does not help to say specifically why one rule means what it does and has the consequences intended; yet surely this is in virtue of the specific ‘content’ of the rule, which again is clarified by a thoroughgoing analysis at the first stage. Third, if there is going to be any progress in applied artificial intelligence and law, it will require the analytical elements defined at the first stage. Fourth, as the study of Linguistics has shown, close scrutiny at the first stage can help to reveal issues and problems that are fundamental to all higher stages. Fifth, providing even a small, clear sample of legal arguments analysed along the lines of the first stage can give the community of legal scholars a common ‘pool’ of legal arguments to fruitfully consider at the later stages; along these lines, it is notable how few concrete, detailed examples Sherwin’s paper discusses. Not surprisingly, some of the issues Sherwin raises about the purposes of different ‘levels’ of analysis also appear in the linguistic literature. In my view, though the first stage may not be interesting to most legal professionals, there are very good reasons why it should be.
Criteria of Taxonomy
Several different criteria which guide the taxonomy of legal rules are discussed.

      Intuitive similarity: researchers’ intuitive judgements that two rules belong together, for example that one is a subtype of the other.
      Evolutionary history: the legal rule is traced in the history of the law.
      Formal classification: the logical relations among categories of the law.
      Function-based: a function from a problem to a set of solutions.
      Reason-based: the higher-level reasons that explain or justify a rule.

Sherwin criticises judgements based on intuitive similarity on the grounds that taxonomers may be relying on false generalisations rather than their own intuitions, and that intuition can be arbitrary and without reason. This is also the sort of criticism levelled at a large segment of linguistic research, and it has been shown to be misleading. Of course, one must watch for false classifications and try to provide a justification for classifying one element in one class and not another. One way to do this, as in psycholinguistics, is to run tests over subjects. Another way is to refine the sorts of observations that lead to classifications. In general, all that we currently know about language, from dictionaries, to grammars, to inference rules, is based on linguistic intuitions. Some of these, such as the rules of propositional logic, have become so fixed that they now seem to exist independently of any linguistic basis.
The issue here is somewhat related to classification by formal logical relations. It is unclear what Sherwin thinks logical relations are and how they are applied. What we do have more clarity on are some of the criteria for such a formal taxonomy: accounting for all legal materials, a strict hierarchy, consistent interpretation of classes, and no overlap of categories. This is but one way to consider a formal hierarchy; indeed, there is a separate and very interesting question about what formal model of classification best suits a legal taxonomy. Yet, this issue is not explored in the article.
The function-based approach seems to rely on meta-level categories. For example, the rule above can be seen as a function from circumstances to a classification of a person as having committed an assault. However, this does not appear to be what is intended in Sherwin’s discussion. Rather, there are meta-functional categories depending on higher-level problems and solutions. The examples given are Law as a Grievance-Remedial Instrument and Law as an Administrative-Regulatory Instrument. For me, this is not quite as clear as Sherwin makes it appear.
The reason-based approach organises rules according to an even higher-level aspect of the rule: its justification or explanation. Examples include redressing wrongful harm, deterring breaches of promise so as to facilitate exchange, and promoting public safety. In my view, these are what people in AI and Law (e.g. Professor Bench-Capon) would call values which are promoted by a legal rule. Sherwin discusses several different ways that reason-based classification is done: intended, attributed, and ideal rationales. In my view, the claimed differences are not all that clear or crucial to the classification. In some cases, the rationale of a legal rule is given by the adjudicator. Where this is not so, the rationale is implicit and must be interpreted, which is to give the intended rationale. In other cases, legal researchers examine a body of law and provide rationales, which gives the attributed rationale. In this sense, the intended and attributed rationales are related (both are interpreted), but they are achieved by different methods: study of one case versus study of a body of cases together with considerations about the overall purpose of the law. Finally, there are ideal rationales, which set out broad, ideal goals of the legal rule, goals which may or may not be ‘ideally’ achievable. Here the difference between intended/attributed and ideal rationales is whether the rationale is analysed out of cases (bottom-up) or provided legislatively (top-down). In the end, the result is similar: legal rules are classified with respect to some rationale. The general problem with any such rationale is just how it is systematically given, and itself justified, so as to be consistent and not to yield conflicting interpretations of the same legal rule. Finally, Sherwin seems to think that there is some intrinsic conflict or tension between formal classification and reason-based classification. I don’t agree. Rather, the difference lies in the properties and methods employed to make the classification, which are not inherently in conflict. Likely, a mixed approach will yield the most insights.
Copyright © 2009 Adam Wyner

AI and Law contacts in Boston, MA

On a recent visit to Boston, Massachusetts, USA, I had the opportunity to visit the Berkman Center for Internet & Society. It so happened that Kevin Ashley of the University of Pittsburgh was visiting the center for a meeting about legal document assembly. Kevin is a well-known expert in AI and Law, specialising in case-based systems for legal reasoning that are geared towards helping law students learn to reason about the law. Kevin and I have met before and share an interest in ontologies and case-based reasoning. We discussed research trends in legal case-based reasoning, funding sources, and lines of collaborative research. I also found out that Richard Susskind, author of The End of Lawyers?, was giving a talk at the Berkman center the same day, so I attended that talk, which was a distillation of his recent book. As it turned out, Edwina Rissland of the University of Massachusetts at Amherst, another key figure in AI and Law (and Kevin’s former thesis advisor), was also at Susskind’s talk and also participating in the legal document assembly meeting. I’d not met her before, so it was a treat to have a brief chat. Finally, I met with Carole Hafner of Northeastern University, another central figure in legal case-based reasoning. Carole was particularly helpful in drawing my attention to some of the earlier key articles on these topics by her and by Edwina. I meet with Carole every time I’m in the Boston area to get her views on AI and Law. In all, a very sociable and informative series of discussions.
Copyright © 2009 Adam Wyner

Further Considerations on "The End of Lawyers"

In a previous post on Susskind’s The End of Lawyers, I briefly outlined some of the technologies Susskind discusses, then pointed out several fundamental technologies which he does not discuss in depth, but which I believe will have significant impact on the legal profession.
One topic which I did not discuss in that post was why the legal profession is so slow to adopt legal technology. Susskind points out several reasons:

      Billable hours — the legal profession makes money by the hour, which is a disincentive to make legal processes more efficient.
      Conservatism and status — the legal profession has a long and distinguished tradition, which changes slowly.
      Government funding — while governments may recognise the value of legal technologies, investing in them is another matter (though see the recent e-Government awards in the previous post).
      Information availability — only recently have legal documents (legislation, cases, other government information) been made publicly and electronically available.

I think (and believe my colleagues in AI and Law would agree) that these are very significant contributing factors in the slow adoption of legal technologies by legal professionals, firms, and governments. But there are others, and I believe that by identifying them, we can make progress toward addressing the problems that they raise.
To help clarify the issues, we can compare and contrast the legal profession with another very ancient and prestigious profession: medicine. Doctors, medical organisations, and medical researchers have adopted and advanced technologies very rapidly and on a large scale, yet the technologies are, at their core, similar to those available to legal professionals. Technologically, then, there is little reason why the legal profession has not also adopted these technologies or more aggressively sought to adapt them.
While there are systems to support reasoning by doctors and the filing and retrieval of medical records, let me focus on two technologies which are equally available, fundamental, and influential for the legal and medical professions: information extraction and ontologies.
In the medical field, there are large corpora of textual information from which relevant information must be extracted. The corpora are, and have been for some time, publicly available. There are academic and industry groups that develop software systems to extract the information (e.g. the National Centre for Text Mining and Linguamatics, among others). Through links, one can find conferences, other groups, and government organisations; the interest is deep, widespread, and of high value. Moreover, medical ontologies are very far advanced, for example the Systematised Nomenclature of Medicine Clinical Terms and the Foundational Model of Anatomy, among others.
In the legal field, corpora of textual information are only just beginning to be available. There has been some research on information extraction in the legal field, and there has been some work on legal ontologies (e.g. LKIF, an EU project that I participated in).
In both areas — information extraction and ontologies — the medical field far outstrips the legal field. Why?
I think the differences are not so much those outlined above; one could argue that the medical and legal fields have had, at least historically, similar constraints, and that the medical field has simply overcome them. The most obvious difference is that research medicine has been, and continues to be, advanced by scientific and technological means. Other research fields (biology, chemistry, statistics, anatomy) have made relevant contributions. Moreover, the medical field has large research bodies that are well funded (e.g. The Wellcome Trust). Finally, the culture of medical research and the application of its findings is such that information is disseminated, criticised, and verified. Let us put these into four points:

  • Scientific and technological approach
  • Contributions from other fields
  • Research bodies
  • Culture of research

In these respects, the legal field is very different from the medical field. Science and technology have not, until very recently, been relevant to how the law is practised. While there have been some contributions from other fields (e.g. sociology or psychology), the impact is relatively low. There are research bodies, but they are not of the scale or influence of those in medicine. And the disposition of the legal community has been to hold information closely.
I believe that there is a single (though very complex) underlying reason for the difference: the object of study. In medicine, the objects of study are physical, whether in chemistry, biology, anatomy, etc.; these objects are, and have been, amenable to scientific study and technological manipulation. In contrast, in law, the object of study is non-physical. One might be tempted to say it is the law itself, but we can be more concrete and say it is the language in which the law is expressed, for at least language is something tangible and available for study.
Thus, the scientific study of language, Linguistics, is relevant. However, Linguistics as a scientific endeavour is relatively young (50 to 100 years, depending on one’s point of view). The technological means to study language can be dated to the advent of the digital computer, which could process language in terms of strings of characters. Widespread, advanced approaches to computational linguistics for information extraction are even more recent, some 10 to 20 years old. Very large corpora, and the motives to analyse them, arose with the internet. And not only must we understand the language of the law, but we must also understand the legal concepts as they are expressed in law. Here, the study of deontic reasoning, while advanced, is “only” some 20 years old and has found few applications (see my PhD thesis, Wyner 2008).
Language is at the root of the issue; it can help explain some of the key differences in the application of technology to the legal field. In my view, as the linguistic and logical analyses of the language of the law advance, so too will applications, research bodies, and results. However, these are somewhat early days and, in comparison to the medical field, there is much yet to be done.
Copyright © 2009 Adam Wyner