PhD Projects in AI and Law

Proposals

Below is a range of PhD projects in AI and Law. Other project proposals in the general AI and Law topic area are also welcome. I am interested in NLP (information extraction, semantic representation, controlled natural languages, classification, chatbots, dialogue/discourse), argumentation, various forms of legal reasoning (case-based reasoning, client consultations, legislation/regulations, court proceedings, contracts), and ontologies/knowledge graphs.

An Argument Chatbot

Chatbots are a popular area of development. In this project, the student will develop a chatbot for policy-making, legal consultations, or scientific debates. The chatbot should be capable of various dialogue types such as information-seeking, deliberation, and persuasion; in addition, the dialogue should be tied to patterns of argument and critical questions. The underlying techniques will be natural language processing (rule-based and machine learning), structured argumentation, and knowledge representation and reasoning. The project may be done in collaboration with IBM UK, working with IBM scientists and engineers.
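
As a minimal sketch of one underlying representation, an argumentation scheme can be paired with its critical questions, which the chatbot would pose during a dialogue. The Haskell below is illustrative only; the field names and the paraphrase of the Expert Opinion scheme are my assumptions, not a fixed design.

    -- A sketch of an argumentation scheme with critical questions.
    -- The field names and the paraphrased scheme are illustrative assumptions.
    data Scheme = Scheme
      { schemeName        :: String
      , premises          :: [String]
      , conclusion        :: String
      , criticalQuestions :: [String]
      } deriving Show

    -- Walton's Argument from Expert Opinion, roughly paraphrased.
    expertOpinion :: Scheme
    expertOpinion = Scheme
      { schemeName = "Argument from Expert Opinion"
      , premises   = [ "E is an expert in domain D"
                     , "E asserts that proposition A is true"
                     , "A is within domain D" ]
      , conclusion = "A may plausibly be taken to be true"
      , criticalQuestions =
          [ "Is E a genuine expert in D?"
          , "Did E really assert A?"
          , "Is A consistent with what other experts say?" ]
      }

    -- A chatbot in persuasion mode could pose the critical questions in turn.
    main :: IO ()
    main = mapM_ putStrLn (criticalQuestions expertOpinion)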

Argument Extraction and Reconstruction

The goal is to identify textual passages which indicate argument and rhetorical structure (premises, claim, continuation) or argumentation schemes (patterns of everyday reasoning such as Expert Witness, Practical Reasoning, Commitment, etc.). The student will review some background literature, analyse a selection of argumentation schemes, identify the particular elements to be extracted using an NLP tool, create the processing components, carry out a small evaluation exercise, and connect the NLP output to a computational argumentation tool. The particular corpus of text is to be determined.
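
As a minimal sketch of the rule-based end of this task, assuming an invented list of indicator phrases (a real lexicon would be derived from corpus analysis):

    import Data.Char (toLower)
    import Data.List (isInfixOf)

    -- Illustrative indicator phrases for premises and claims.
    premiseCues, claimCues :: [String]
    premiseCues = ["because", "since", "given that"]
    claimCues   = ["therefore", "thus", "it follows that"]

    -- Label a sentence by the first kind of cue it contains.
    label :: String -> String
    label s
      | any (`isInfixOf` lower) claimCues   = "CLAIM:   " ++ s
      | any (`isInfixOf` lower) premiseCues = "PREMISE: " ++ s
      | otherwise                           = "OTHER:   " ++ s
      where lower = map toLower s

    main :: IO ()
    main = mapM_ (putStrLn . label)
      [ "Since the witness is an expert, her testimony is credible."
      , "Therefore the defendant is liable." ]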

Textual Entailment

Textual entailment is about taking a sentence or passage and drawing inferences from it; for example, the sentence “Bill turned off the light” implies “The light was off”. Several NLP tools are available for textual entailment, and there are existing corpora of texts with which to train and evaluate entailment systems. In this project, the student will apply textual entailment tools to a corpus, evaluate them against a “gold standard”, then modify a tool to improve performance.
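
As a minimal sketch of the evaluation step, assuming invented gold-standard judgements (a real exercise would use an established entailment corpus):

    -- Compare entailment predictions against a gold standard.
    data Judgement = Entails | DoesNotEntail deriving (Eq, Show)

    accuracy :: [Judgement] -> [Judgement] -> Double
    accuracy gold predicted =
      fromIntegral (length (filter id (zipWith (==) gold predicted)))
        / fromIntegral (length gold)

    main :: IO ()
    main = print (accuracy [Entails, DoesNotEntail, Entails]
                           [Entails, Entails,       Entails])
    -- Two of the three predictions match the gold standard: 0.66...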

Contrast Identification

Debates express contrasting positions on a particular topic of interest. A key problem is to determine the semantic contrariness of the positions as expressed by statements within them. Such a task is relatively easy for people, but difficult to automate, since there are many linguistic ways to express contrast, some of which may be synonymous. Annotation of contrast would help support semi-automatic construction of arguments and counter-arguments from text. The student will review some background literature, analyse a selection of contrasting expressions, identify the particular elements to be extracted using an NLP tool, create the processing components, carry out a small evaluation exercise, and connect the NLP output to a computational argumentation tool.
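
As a minimal sketch of cue-based contrast detection, assuming a tiny invented lexicon of discourse cues and antonym pairs:

    import Data.Char (toLower)
    import Data.List (isInfixOf)

    -- Illustrative contrast cues and antonym pairs; a real lexicon would be
    -- far larger and derived from corpus analysis.
    contrastCues :: [String]
    contrastCues = ["however", "on the contrary", "whereas"]

    antonyms :: [(String, String)]
    antonyms = [("increase", "decrease"), ("legal", "illegal")]

    hasContrast :: String -> Bool
    hasContrast s =
        any (`isInfixOf` lower) contrastCues
          || any (\(a, b) -> a `isInfixOf` lower && b `isInfixOf` lower) antonyms
      where lower = map toLower s

    main :: IO ()
    main = print (hasContrast "Taxes should increase; however, spending must decrease.")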

Classification of legal texts

In legal texts such as legislation or case law, different segments of the text serve different purposes. For example, one portion may be a statement of facts, while another is a statement of a rule. The project specifies the portions and classifications of a corpus of legal texts, creates a gold standard, then applies machine learning techniques to classify the portions to a high level of accuracy. Another topic within this area is legal decision prediction, wherein legal decisions (cases) are classified in various ways.
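
As a minimal sketch of a keyword baseline against which a trained classifier could be compared (the cue lists are invented; the project itself calls for machine learning over a gold standard):

    import Data.Char (toLower)
    import Data.List (isInfixOf)

    -- Segment types for case-law text; a naive keyword baseline.
    data SegmentType = Fact | Rule | Unknown deriving Show

    classify :: String -> SegmentType
    classify s
      | any (`isInfixOf` lower) ["shall", "must", "is liable"]   = Rule
      | any (`isInfixOf` lower) ["testified", "on the night of"] = Fact
      | otherwise                                                = Unknown
      where lower = map toLower s

    main :: IO ()
    main = mapM_ (print . classify)
      [ "A person who commits battery is liable for damages."
      , "The witness testified that the car ran a red light." ]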

Bar Exam

In the US, to become a lawyer, one must take and pass a Bar Exam. The Bar Exam consists of 200 multiple-choice questions, covering an extensive range of legal topics. The task in this project is to classify the questions in the Bar Exam using machine learning and to design a system to pass the exam, using techniques from NLP, logic, and machine learning.

A Controlled Natural Language with Defeasibility

Controlled Natural Languages (CNLs) are standardised, formal subsets of a natural language (such as English) which are both human-readable and machine-processable. Several CNLs have been developed, such as IBM’s ITA Controlled English (CE) or Business Rules Language (BRL), OMG’s Semantics of Business Vocabulary and Business Rules, and several academic languages. Some CNLs support ‘strict’ reasoning for ontologies, terminologies, or Predicate Logic, which is sufficient in many contexts. However, defeasible reasoning is essential in other contexts where knowledge is inconsistent and partial, such as in political, legal, or scientific debates. The project explores the representation of and reasoning with defeasibility in a CNL, which could lead to a CNL with much wider applicability and impact. The project can be done in collaboration with IBM UK, working with IBM scientists and engineers. The project can be either a theoretical study or an implementation (or a mixture of both). The supervisor has an extensive background in CNLs and argumentation/defeasibility.
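
As a minimal sketch of defeasibility, assuming a naive rules-with-exceptions reading (this is not any particular CNL's semantics): a rule fires when its premises hold among the stated facts and no exception holds.

    -- Defeasible rules as premises, exceptions, and a conclusion.
    data Rule = Rule
      { rulePremises   :: [String]
      , ruleExceptions :: [String]
      , ruleConclusion :: String }

    applies :: [String] -> Rule -> Bool
    applies facts r = all (`elem` facts) (rulePremises r)
                   && not (any (`elem` facts) (ruleExceptions r))

    -- "A bird flies, unless it is a penguin", in CNL style.
    birdsFly :: Rule
    birdsFly = Rule ["Tweety is a bird"] ["Tweety is a penguin"] "Tweety flies"

    main :: IO ()
    main = do
      print (applies ["Tweety is a bird"] birdsFly)                        -- True
      print (applies ["Tweety is a bird", "Tweety is a penguin"] birdsFly) -- False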

Rule Extraction from Legislation or Case Law

Legal texts (legislation, regulations, and case law) provide the “operational legal rules” for businesses, organisations, and individuals. It is important to be able to identify and extract such rules, particularly for rulebook compliance or to transform rules in natural language into machine-readable, executable rules. The student will analyse a selection of regulations, identify the particular elements to be extracted using NLP, create the processing components, translate rules from natural language to executable rules, draw inferences, and evaluate the results.
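
As a minimal sketch of the translation step, assuming a single invented sentence pattern ("If C, then P."); real extraction would rely on parsing rather than string splitting:

    import Data.List (isPrefixOf, stripPrefix)

    -- Split a string at the first occurrence of a separator.
    splitOn :: String -> String -> Maybe (String, String)
    splitOn sep = go ""
      where
        go _ [] = Nothing
        go acc rest@(c:cs)
          | sep `isPrefixOf` rest = Just (reverse acc, drop (length sep) rest)
          | otherwise             = go (c:acc) cs

    -- Extract (condition, obligation) from one regulatory sentence form.
    extractRule :: String -> Maybe (String, String)
    extractRule s = stripPrefix "If " s >>= splitOn ", then "

    main :: IO ()
    main = print (extractRule "If the data is personal, then the firm must encrypt it.")
    -- Just ("the data is personal","the firm must encrypt it.")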

An Expert System to Support Reasoning in Juries

Jury trials are a fundamental aspect of the Common Law legal system in the UK and USA. In jury trials, jurors are members of the public who are required to reason about the facts of the case and about the legal rules to arrive at a decision (e.g. whether the defendant is guilty or not guilty). This is a difficult and important task for a person who is not schooled in the law. Fortunately, in some jurisdictions, there are standardised “catalogues” of jury instructions to guide the jurors in how to reason. In this project, the student analyses a selection of jury instructions and implements them as an interactive juror decision support tool.
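
As a minimal sketch of the decision-support idea, assuming an invented instruction with three elements that must all be satisfied:

    -- Walk a juror through the elements of an instruction.
    elements :: [String]
    elements =
      [ "Did the defendant engage in the conduct alleged?"
      , "Did the conduct cause the plaintiff's loss?"
      , "Was the loss reasonably foreseeable?" ]

    ask :: String -> IO Bool
    ask q = do
      putStrLn (q ++ " (y/n)")
      answer <- getLine
      return (answer == "y")

    main :: IO ()
    main = do
      answers <- mapM ask elements
      putStrLn (if and answers
                then "All elements satisfied: find for the plaintiff."
                else "At least one element fails: find for the defendant.")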

Legal Case Based Reasoning

Case-based reasoning uses decided past cases to resolve new problems. Legal case-based reasoning structures legal reasoning in the courts of the UK and the USA. The project will implement several existing formalisations of legal case-based reasoning.
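
As a minimal sketch in the spirit of factor-based formalisations such as HYPO and CATO (the case names, factors, and outcomes below are illustrative stand-ins): a problem is compared to precedents by the factors they share.

    import Data.List (intersect, maximumBy)
    import Data.Ord (comparing)

    type Factor = String
    data Case = Case { caseName :: String, factors :: [Factor], outcome :: String }

    -- Count the factors a precedent shares with the problem situation.
    similarity :: [Factor] -> Case -> Int
    similarity problem c = length (factors c `intersect` problem)

    caseBase :: [Case]
    caseBase =
      [ Case "Precedent A" ["security-measures", "disclosure-to-outsiders"] "plaintiff"
      , Case "Precedent B" ["reverse-engineerable"]                         "defendant" ]

    main :: IO ()
    main = putStrLn (caseName best ++ " suggests outcome: " ++ outcome best)
      where best = maximumBy (comparing (similarity ["security-measures"])) caseBase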

Logical Formalisations of the Law

The law can be formalised in a variety of ways, and there are tools and techniques to support the task. Such formalisations can be queried and inferences drawn. The project will examine existing tools, see what can be improved, and provide fragments of formalised law.
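
As a minimal sketch of drawing inferences from a formalised fragment, assuming invented Horn-style rules and naive forward chaining:

    type Fact = String
    type LegalRule = ([Fact], Fact)   -- premises and a conclusion

    rules :: [LegalRule]
    rules = [ (["owns(alice,car)", "insured(alice)"], "mayDrive(alice)") ]

    -- Repeatedly add any conclusion whose premises are all established.
    chain :: [Fact] -> [Fact]
    chain facts =
      let new = [ c | (ps, c) <- rules, all (`elem` facts) ps, c `notElem` facts ]
      in if null new then facts else chain (facts ++ new)

    main :: IO ()
    main = print (chain ["owns(alice,car)", "insured(alice)"])
    -- adds "mayDrive(alice)" to the established facts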

Legal Ontologies/Knowledge Graph

In an ontology/KG, domain knowledge about entities, their properties, and their relations is formally represented. Such representations also facilitate querying, extraction, linking, and inference. There are legal ontologies/KGs that represent the law, legal processes, and legal relationships. The project will examine existing legal ontologies, augment them, and build a richer ontological representation using existing tools.
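
As a minimal sketch of the representation, assuming an invented vocabulary of triples and an acyclic isA hierarchy:

    -- A knowledge graph as subject-predicate-object triples.
    type Triple = (String, String, String)

    kg :: [Triple]
    kg = [ ("Theft",   "isA",        "Offence")
         , ("Offence", "isA",        "LegalConcept")
         , ("Theft",   "punishedBy", "Imprisonment") ]

    -- Transitive query over isA; assumes the hierarchy has no cycles.
    isA :: String -> String -> Bool
    isA x y = (x, "isA", y) `elem` kg
           || or [ isA o y | (s, p, o) <- kg, s == x, p == "isA" ]

    main :: IO ()
    main = print (isA "Theft" "LegalConcept")  -- True, via Offence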

Abstract Contract Calculator in Haskell

Create a program in Haskell (a functional programming language) to execute ‘theoretical’ legal contracts, that is, contracts that have the form of an actual legal contract but not the content.
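
As a minimal sketch, in the spirit of composable contract combinators (cf. Peyton Jones et al.), where the constructors below are illustrative assumptions: a ‘theoretical’ contract is a term in a small datatype, and executing it lists the payments it gives rise to.

    data Party = PartyA | PartyB deriving Show

    data Contract
      = Zero                      -- no obligations
      | Pay Party Party Double    -- one party pays another an amount
      | Both Contract Contract    -- both sub-contracts are in force
      | IfHolds Bool Contract     -- a conditional obligation

    -- "Execute" a contract by listing the payments it gives rise to.
    execute :: Contract -> [(Party, Party, Double)]
    execute Zero          = []
    execute (Pay a b x)   = [(a, b, x)]
    execute (Both c d)    = execute c ++ execute d
    execute (IfHolds b c) = if b then execute c else []

    main :: IO ()
    main = mapM_ print (execute (Both (Pay PartyA PartyB 100)
                                      (IfHolds True (Pay PartyB PartyA 40))))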

Introduction to a Series of Posts on Legal Information Extraction with GATE

This post has notes on and links to several other posts about legal information annotation and extraction using the General Architecture for Text Engineering system (GATE). The information in the posts was presented at my tutorial at JURIX 2009, Rotterdam, The Netherlands; the slides are available here. See the GATE website or my slides for introductory material about NLP and text annotation. For particulars about NLP and legal resources, see the posts and files at the links below.
The Posts
The following posts discuss different aspects of legal information extraction using GATE (live links indicate live posts):

Prototypes
The samples presented in the posts are prototypes only. No doubt there are other ways to accomplish similar tasks, the material is not as streamlined or cleanly presented as it could be, and each section is but a very small fragment of a much larger problem. In addition, there are better ways to present the lists and rules “in one piece”; however, during development and for discussion, it seems more helpful to have elements separate. Nonetheless, as a proof of concept, the samples make their point.
If there are any problems, contact Adam Wyner at adam@wyner.info.
Files
The posts are intended to be self-contained and to work with GATE 5.0. The archive files include the .xgapp file, which is a saved application state, along with text/corpus, the lists, and JAPE rules needed to run the application. In addition, the archive files include any graph outputs as reference. As noted, one may need to ‘fiddle’ a bit with the gazetteer lists in the current version.
Graphics
Graphics in the posts can be viewed in a larger and clearer size by right clicking on the graphic and selecting View Image. The Back button on your browser will close the image and return you to the post.
License
The materials are released under the following license:
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0
If you want to commercially exploit the material, you must seek a separate license with me. That said, I look forward to further open development on these materials; see my post on Open Source Legal Information.

Article about RDFa and Open Government

This is a link to an article by Mark Birkbeck on RDFa and Open Government. It introduces the problems governments have in managing their information and serving it to the public, then outlines how RDFa, which introduces RDF into HTML documents, addresses these problems and provides for highly productive and readable web documents.
Copyright © 2009 Adam Wyner

ICAIL 2009 Workshops

Last week, I attended the 12th International Conference on Artificial Intelligence and Law in Barcelona, Spain. Tuesday-Thursday were given to the main conference while Monday and Friday were for workshops. This post makes a few remarks about the workshops.
On Monday, I attended two workshops. In the morning, I was at Legal Ontologies and Artificial Intelligence Techniques (LOAIT), while in the afternoon, I attended Modeling Legal Cases, where I presented a paper, An OWL Ontology for Legal Cases with an instantiation of Popov v. Hayashi. On Friday, I was at the morning workshop which I organized with Tom van Engers, Natural Language Engineering of Legal Argumentation (NaLELA), where I presented a paper by Tom and me in which we outline our approach to engineering argumentation.
At LOAIT, we heard reports about ongoing projects to provide legal taxonomies or ontologies for mediation, norms, business process specification, legislative markup, and acquisition of ontologies. Most of this work is still in the research phase, though some of it has been applied to samples in the domain of application.
At the modeling workshop, we heard a paper by Bex, Bench-Capon, and Atkinson about how to represent motives in an argumentation framework. Basically, the idea is to make motives explicit by representing them as nodes in an argumentation graph; as such, the motives can be attacked and reasoned with. Trevor Bench-Capon modeled dimensions in property law in an argumentation framework. This paper was particularly helpful to me in finally getting a grip on what dimensions are; in effect, they are finer-grained factors with an ordering over them. For example, possession ranges from an animal roaming free, through a chase being started, hot pursuit, and mortally wounding the animal, to actual bodily possession. In another paper, Trevor modeled a set of US Supreme Court cases, raising a series of important questions about how the reasoning of the Supreme Court could be modeled. Douglas Walton’s paper gave some samples of argumentation schemes. Henry Prakken presented an analysis in his argumentation framework of a case concerning disability assessment. Finally, Kevin Ashley gave an overview of some aspects of legal case-based reasoning which, he claims, ought to have some ontological representation. This paper is relevant to my research on ontologies, as Kevin pointed out a range of elements which may be considered for inclusion in an ontology. My main reservation is that there ought to be a clear distinction between the ontology (domain knowledge) and the rules that apply to the elements of the ontology.
More information about the NaLELA workshop can be found at the website for Natural Language Engineering of Argumentation.
There were three other workshops I did not have time to attend: the workshops on E-discovery/E-disclosure, on privacy and protection in web-based social networks, and on legal and negotiation decision support systems.

Legal Taxonomy

Introduction
In this post, I comment on Sherwin’s recent article Legal Taxonomy in the journal Legal Theory. It is a very lucid, thorough, and well-referenced discussion of the state of the art in taxonomies of legal rules. By considering how legal taxonomies organise legal rules, we better understand current conceptions of legal rules by legal professionals. My take-away message from the article is that the analysis of legal rules could benefit from some of the thinking in Linguistics and Computer Science, particularly in terms of how data is gathered and analysed.
Below, I briefly outline ideas concerning taxonomies and legal rules. Then, I present and comment on the points Sherwin brings to the fore.
Taxonomies
Taxonomy is the practice and science of classifying items in a hierarchical IS-A relationship, where the items can be almost anything. The IS-A relationship is also understood in terms of subtypes and supertypes. For example, a car is a subtype of vehicle, and a Toyota is a subtype of car; we can infer that a Toyota is a subtype of vehicle. Each subtype has more specific properties than its supertype. In some taxonomies, one item may be a subtype of several supertypes; for example, a car is both a subtype of vehicle and a subtype of objects made of metal. However, not all vehicles are made of metal, nor are all things made of metal vehicles, which indicates that these types are distinct. Taxonomies are more restricted than the related notion of ontologies, in which a range of relationships beyond the IS-A relationship may hold among the items, such as is owned by. In addition, ontologies generally introduce properties of the elements in a class, e.g. colour, engine type, etc. Classifications in scientific domains such as Biology or Linguistics are intensely debated and revised. We would expect this to be even more true in the legal domain, which rests on intellectual rather than empirical evidence and where the scientific method of the physical sciences does not apply.
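As a minimal sketch of this kind of inference, using the car/vehicle example above (and assuming the hierarchy has no cycles):

    -- Subtype links and transitive IS-A inference.
    isaLinks :: [(String, String)]
    isaLinks = [("Toyota", "car"), ("car", "vehicle"), ("car", "metal object")]

    isSubtypeOf :: String -> String -> Bool
    isSubtypeOf x y = (x, y) `elem` isaLinks
                   || or [ b `isSubtypeOf` y | (a, b) <- isaLinks, a == x ]

    main :: IO ()
    main = print ("Toyota" `isSubtypeOf` "vehicle")  -- True, via car
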
Legal Rules
First, let us be clear about what a legal rule is, following Professor David E. Sorkin’s example. A legal rule is a rule which determines whether some proposition holds (say, of an individual) contingent on other propositions (the premises). For example, the State of Illinois assault statute specifies: “A person commits an assault when, without lawful authority, he engages in conduct which places another in reasonable apprehension of receiving a battery.” (720 ILCS 5/12-1(a)). We can analyse this into the legal rule:

    A person commits assault if

      1. the person engages in conduct;
      2. the person lacks lawful authority for the conduct;
      3. the conduct places another in apprehension of receiving a battery; and
      4. the other person’s apprehension is reasonable.

Optimally, each of the premises in a rule should be simple and be answerable as true or false. In this example, where all four premises are true, the conclusion, that the person committed assault, is true.
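As a minimal sketch of this point, assuming each premise has already been resolved to a truth value (the type and field names are mine):

    -- The assault rule as a function over its four premises.
    data AssaultFacts = AssaultFacts
      { engagesInConduct       :: Bool  -- premise 1
      , lacksLawfulAuthority   :: Bool  -- premise 2
      , causesApprehension     :: Bool  -- premise 3
      , apprehensionReasonable :: Bool  -- premise 4
      }

    commitsAssault :: AssaultFacts -> Bool
    commitsAssault f = engagesInConduct f
                    && lacksLawfulAuthority f
                    && causesApprehension f
                    && apprehensionReasonable f

    main :: IO ()
    main = print (commitsAssault (AssaultFacts True True True True))  -- True
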
There are significant issues even with such simple examples since each of the premises of a legal rule may itself be subject to further dispute and consideration; the premises may be subjective (e.g. was the conduct intentional), admit degrees of truth (e.g. degree of emotional harm), or application of the rule may be subject to mitigating or aggravating circumstances. The determination of the final claim follows the resolution of these subsidiary disputes and considerations. In addition, some legal rules need not require all of the premises to be true, but allow a degree of counterbalancing evaluation of the terms.
The Sources of Legal Rules
Sherwin outlines the sources of the rules:

      Posited rules, which are legal rules as explicitly given by a legal authority such as a judge giving a legal decision.
      Attributed rules, which are legal rules that are drawn from a legal decision by a legal researcher rather than by a legal authority in a decision. The rule is implicit in the other aspects of the report of the case.
      Ideal rules, which are rules that are ‘ideal’ relative to some criteria of ideality, say morally or economically superior rules.

Purposes of Classification
In addition, we have the purposes or uses of making a classification of legal rules.

      Facilitating the discussion and use of law.
      Supporting the critical evaluation of law.
      Influencing legal decision-making.

For the first purpose, the rules are sorted into classes, which helps to understand and manage legal information. In Sherwin’s view, this is the most basic, formal, and least ambitious goal, yet it relies on having some taxonomic logic in the first place. For the second purpose, the rules are evaluated to determine whether they serve their intended purpose, as well as to identify gaps or inconsistencies. As Sherwin points out, the criteria of evaluation must then also be determined; however, this relates to the criteria that guide the taxonomy in the first place, a topic we touch on below. The final purpose is a normative one, where the classification identifies the normal circumstances under which a rule applies, thereby also clarifying those circumstances in which the rule does not apply. Sherwin points out that legal scholars vary in which purpose they find attractive and worth pursuing.
While I can appreciate that some legal scholars might not find the ‘formal’ classification of interest, I view it from a different perspective. First, any claim concerning the normative application of one rule instead of another rests entirely on the intuitive presumption that the rules are clearly different; this is a distinction that the first level can help to clarify. Similar points can be made for other relationships among rules. Second, focusing on the later stages does not help to say specifically why one rule means what it does and has the consequences intended; yet surely this is in virtue of the specific ‘content’ of the rule, which again is clarified by a thoroughgoing analysis at the first stage. Third, if there is to be any progress in applied artificial intelligence and law, it will require the analytical elements defined at the first stage. Fourth, as the study of Linguistics has shown, close scrutiny at the first stage can help to reveal the very issues and problems that are fundamental to all higher stages. Fifth, providing even a small, clear sample of legal arguments analysed along the lines of the first stage can give the community of legal scholars a common ‘pool’ of legal arguments to fruitfully consider at the later stages; along these lines, it is notable how few concrete, detailed examples Sherwin’s paper discusses. Not surprisingly, some of the issues Sherwin raises about the purposes of different ‘levels’ of analysis also appear in the linguistic literature. In my view, though the first stage may not be interesting to most legal professionals, there are very good reasons why it should be.
Criteria of Taxonomy
Several different criteria which guide the taxonomy of legal rules are discussed.

      Intuitive similarity: whether researchers claim that two rules are subtypes of one another.
      Evolutionary history: the legal rule is traced in the history of the law.
      Formal classification: the logical relations among categories of the law.
      Function based: a function from the problem to a set of solutions.
      Reason based: the higher-level reasons that explain or justify a rule.

Sherwin criticises judgements based on intuitive similarity on the grounds that taxonomers may be relying on false generalisations rather than their own intuitions, and that intuition can be arbitrary and without reason. This is also the sort of criticism levelled at a large segment of linguistic research, and it has been shown to be misleading. Of course, one must watch for false classifications and try to provide a justification for classifying one element in one class and not another. One way to do this, as in psycholinguistics, is to provide tests run over subjects. Another way is to refine the sorts of observations that lead to classifications. In general, all that we currently know about language, from dictionaries, to grammars, to inference rules, is based on linguistic intuitions. Some, such as the rules of propositional logic, have become so fixed that they now seem to exist independently of any linguistic basis.
The issue here is somewhat related to classification by formal logical relations. It is unclear what Sherwin thinks logical relations are and how they are applied. What we do have more clarity on are some of the criteria for such a formal taxonomy: accounting for all legal materials, a strict hierarchy, consistent interpretation of classes, and no overlap of categories. This is but one way to consider a formal hierarchy; indeed, there is a separate and very interesting question about what formal model of classification best suits a legal taxonomy. Yet, this issue is not explored in the article.
The function-based approach seems to involve meta-categories. For example, the rule above can be seen as a function from circumstances to a classification of a person as having committed an assault. However, this is not what appears to be intended in Sherwin’s discussion. Rather, there are meta-functional categories depending on higher-level problems and solutions. The examples given are Law as a Grievance-Remedial Instrument and Law as an Administrative-Regulatory Instrument. For me, this is not quite as clear as Sherwin makes it appear.
The reason-based approach organises rules according to an even higher-level property of the rule: its justification or explanation. Some of the examples are that a wrongful harm imposes an obligation for redress, that deterring breaches of promises facilitates exchange, or that public safety is promoted. In my view, these are what people in AI and Law (e.g. Professor Bench-Capon) would call values which are promoted by the legal rule. Sherwin discusses several different ways that reason-based classification is done: intended, attributed, and ideal rationales. In my view, the claimed differences are not all that clear or crucial to the classification. In some cases, the rationale of a legal rule is given by the adjudicator. Where this is not so, the rationale is implicit and must be interpreted, which is to give the intended rationale. In other cases, legal researchers examine a body of law and provide rationales, which are the attributed rationales. In this sense, the intended and attributed rationales are related (both interpreted) but achieved by different methods (study of one case versus study of a body of cases along with considerations about the overall purpose of the law). Finally, there are ideal rationales, which set out broad, ideal goals of the legal rule, goals which may or may not be ‘ideally’ achievable. In this, the difference between intended/attributed and ideal rationales is whether the rationale is analysed out of cases (bottom-up) or provided legislatively (top-down). In the end, the result is similar: legal rules are classified with respect to some rationale. The general problem with any such rationale is just how it is systematically given and itself justified so as to be consistent and not to yield conflicting interpretations of the same legal rule. Finally, Sherwin seems to think that there is some intrinsic conflict or tension between formal classification and reason-based classification. I disagree; rather, the difference lies in the properties and methods employed to make the classification, which are not inherently in conflict. A mixed approach will likely yield the most insights.
Copyright © 2009 Adam Wyner

AI and Law contacts in Boston, MA

On a recent visit to Boston, Massachusetts USA, I had the opportunity to visit the Berkman Center for Internet & Society. It so happened that Kevin Ashley of the University of Pittsburgh was visiting the center for a meeting about legal document assembly. Kevin is a well-known expert in AI and Law, specialising in case-based systems for legal reasoning that are geared towards helping law students learn to reason about the law. Kevin and I have met before and have a shared interest in ontologies and case-based reasoning. We discussed research trends in legal case-based reasoning, funding sources, and lines of collaborative research. I also found out that Richard Susskind, author of The End of Lawyers?, was giving a talk at the Berkman center the same day, so I attended that talk, which was a distillation of his recent book. As it turned out, Edwina Rissland of the University of Massachusetts at Amherst, another key figure in AI and Law (and Kevin’s former thesis advisor), was also at Susskind’s talk and also participating in the legal document assembly meeting. I’d not met her before, so it was a treat to have a brief chat. Finally, I met with Carole Hafner of Northeastern University, another central figure in legal case-based reasoning. Carole was particularly helpful in drawing my attention to some of the earlier key articles on these topics by her and by Edwina. I meet with Carole every time I’m in the Boston area to get her views on AI and Law. In all, a very sociable and informative series of discussions.
Copyright © 2009 Adam Wyner

Further Considerations on "The End of Lawyers"

In a previous post on Susskind’s The End of Lawyers, I briefly outlined some of the technologies Susskind discusses, then pointed out several fundamental technologies which he does not discuss in depth, but which I believe will have significant impact on the legal profession.
One topic which I did not discuss in that post was why the legal profession is so slow to adopt legal technology. Susskind points out several reasons:

      Billable hours — the legal profession makes money by the hour, which is a disincentive to make legal processes more efficient.
      Conservatism and status — the legal profession has a long and distinguished position which changes slowly.
      Government funding — while governments may recognise the value of legal technologies, investing in them is another matter (though see the recent e-Government awards in the previous post).
      Information availability — only recently have legal documents (legislation, cases, other government information) been made publicly and electronically available.

I think (and believe my colleagues in AI and Law would agree) that these are very significant contributing factors in the slow adoption of legal technologies by legal professionals, firms, and governments. But there are others, and I believe that by identifying them, we can make progress toward addressing the problems that they raise.
To help clarify the issues, we can compare and contrast the legal profession with another very ancient and prestigious profession: medicine. Doctors, medical organisations, and medical researchers have adopted and advanced technologies very rapidly and on a large scale, yet the technologies are, at their core, similar to those available to the legal profession. Technologically, then, there is little reason why the legal profession has not also adopted the technologies or more aggressively sought to adapt them.
While there are systems to support reasoning by doctors and the filing and retrieval of medical records, let me focus on two technologies which are equally available, fundamental, and influential for the legal and medical professions: information extraction and ontologies.
In the medical field, there are large corpora of textual information from which relevant information must be extracted. The corpora are, and have been for some time, publicly available. There are academic and industry groups that develop software systems to extract the information (e.g. the National Centre for Text Mining and Linguamatics, among others). Through links, one can find conferences, other groups, and government organisations; the interest is deep, widespread, and of high value. Moreover, medical ontologies are very far advanced, such as the Systematised Nomenclature of Medicine Clinical Terms and the Foundational Model of Anatomy, among others.
In the legal field, the corpora of textual information are only just beginning to be available. There has been some research on information extraction in the legal field, and there has been some work on legal ontologies (e.g. LKIF, an EU project that I participated in).
In both areas — information extraction and ontologies — the medical field far outstrips the legal field. Why?
I think the differences are not so much those outlined above; one could argue that the medical and legal fields have had, at least historically, similar constraints, and that the medical field has simply overcome them. The most apparent difference is that research medicine has been and continues to be advanced by scientific and technological means. Other research fields (biology, chemistry, statistics, anatomy) have made relevant contributions. Moreover, the medical field has large research bodies that are well funded (e.g. The Wellcome Trust). Finally, the culture of medical research and the application of its findings is such that information is disseminated, criticised, and verified. Let us put these into four points:

  • Scientific and technological approach
  • Contributions from other fields
  • Research bodies
  • Culture of research

In these respects, the legal field is very different from the medical field. Science and technology have not, until very recently, been relevant to how the law is practised. While there have been some contributions from other fields (e.g. sociology or psychology), the impact is relatively low. There are research bodies, but they are not of the scale or influence of those in medicine. And the disposition of the legal community has been to hold information closely.
I believe that there is a single (though very complex) underlying reason for the difference: the object of study. In medicine, the objects of study are physical, whether in chemistry, biology, anatomy, etc.; these objects are and have been amenable to scientific study and technological manipulation. In contrast, in law, the object of study is non-physical; one might be tempted to say it is the law itself, but we can be more concrete and say it is the language in which the law is expressed, for at least language is something tangible and available for study.
Thus, the scientific study of language, Linguistics, is relevant. However, Linguistics as a scientific endeavour is relatively young (50 to 100 years, depending on one’s point of view). The technological means to study language can be dated to the advent of the digital computer, which could process language as strings of characters. Widespread, advanced approaches to computational linguistics for information extraction are even more recent: 10 to 20 years. Very large corpora, and the motives to analyse them, arose with the internet. And not only must we understand the language of the law, but we must also understand the legal concepts as they are expressed in law. Here, the study of deontic reasoning, while advanced, is “only” some 20 years old and has found few applications (see my 2008 PhD thesis).
Language is the root of the issue; it can help explain some of the key differences in the application of technology to the legal field. In my view, as the linguistic and logical analyses of the language of the law advance, so too will applications, research bodies, and results. However, these are somewhat early days and, in comparison to the medical field, there is much yet to be done.
Copyright © 2009 Adam Wyner