PhD Projects in AI and Law


Below is a range of PhD projects in AI and Law. Other project proposals in the general AI and Law topic area are also welcome. I am interested in NLP (information extraction, semantic representation, controlled natural languages, classification, chatbots, dialogue/discourse), argumentation, various forms of legal reasoning (case-based reasoning, client consultations, legislation/regulations, court proceedings, contracts), and ontologies/knowledge graphs.

An Argument Chatbot

Chatbots are a popular area of development. In this project, you develop a chatbot for policy-making, legal consultations, or scientific debates. The chatbot should be capable of various dialogue types such as information-seeking, deliberation, and persuasion; in addition, the dialogue should be tied to patterns of argument and critical questions. The underlying techniques will be natural language processing (rule-based and machine learning), structured argumentation, and knowledge representation and reasoning. The project may be done in collaboration with IBM UK, working with IBM scientists and engineers.

Argument Extraction and Reconstruction

The goal is to identify textual passages which indicate argument and rhetorical structure (premises, claim, continuation) or argumentation schemes (patterns of everyday reasoning such as Expert Witness, Practical Reasoning, Commitment, etc). The student will review some background literature, analyse a selection of argumentation schemes, identify the particular elements to be extracted using an NLP tool, create the processing components, carry out a small evaluation exercise, and connect the NLP output to a computational argumentation tool. The particular corpus of text is to be determined.

Textual Entailment

Textual entailment is about taking a sentence or passage and drawing inferences from it; for example, the sentence “Bill turned off the light” implies “The light was off”. Several NLP tools for recognising textual entailment are available, and there are existing corpora of texts on which to train and evaluate them. In this project, the student will apply textual entailment tools to such a corpus, evaluate them against a “gold standard”, then modify a tool to improve performance.
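The evaluation step against a gold standard can be sketched as follows. This is only a minimal illustration in Python: the label names ("entailment", "no_entailment"), the function name evaluate, and the toy data are assumptions made for the example, not part of any particular entailment tool or corpus.

```python
def evaluate(gold, predicted, positive="entailment"):
    """Return (precision, recall, F1) for the positive label.

    gold and predicted are equal-length lists of labels, one per
    text/hypothesis pair. The label names are hypothetical.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            tp += 1          # tool and gold standard agree on entailment
        elif p == positive:
            fp += 1          # tool claims entailment, gold standard does not
        elif g == positive:
            fn += 1          # tool misses an entailment in the gold standard
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data: four pairs, invented for illustration only.
gold = ["entailment", "entailment", "no_entailment", "no_entailment"]
predicted = ["entailment", "no_entailment", "entailment", "no_entailment"]
print(evaluate(gold, predicted))
```

Comparing these scores before and after modifying a tool gives a simple measure of whether the modification improved performance.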

Contrast Identification

Debates express contrasting positions on a particular topic of interest. A key problem is to determine the semantic contrariness of the positions as expressed by statements within the positions. Such a task is relatively easy for people to do, but difficult for automated identification since there are many linguistic ways to express contrasts, some of which may be synonymous. Annotation of contrast would help support semi-automatic construction of arguments and counter-arguments from text. The student will review some background literature, analyse a selection of contrasting expressions, identify the particular elements to be extracted using an NLP tool, create the processing components, carry out a small evaluation exercise, and connect the NLP output to a computational argumentation tool.

Classification of Legal Texts

In legal texts such as legislation or case law, different segments of the text serve different purposes. For example, one portion may be a statement of facts, while another is a statement of a rule. The project specifies the portions and classifications of a corpus of legal texts, creates a gold standard, then applies machine learning techniques to classify the portions to a high level of accuracy. Another topic within this area is legal decision prediction, wherein legal decisions (cases) are classified in various ways.

Bar Exam

In the US, to become a lawyer, a Bar Exam must be taken and passed. The Bar Exam consists of 200 multiple choice questions, covering an extensive range of legal topics. The task in the project is to classify, using machine learning, the questions in the Bar Exam and to design a system to pass the Bar Exam, using techniques from NLP, logic, and machine learning.

A Controlled Natural Language with Defeasibility

Controlled Natural Languages (CNLs) are standardised, formal subsets of a natural language (such as English), which are both human readable and machine processable. Several CNLs have been developed, such as IBM’s ITA Controlled English (CE) or Business Rules Language (BRL), OMG’s Semantics of Business Vocabulary and Business Rules, and several academic languages. Some CNLs support ‘strict’ reasoning for ontologies, terminologies, or Predicate Logic, which is sufficient in many contexts. However, defeasible reasoning is essential in other contexts where there is inconsistent and partial knowledge, such as in political, legal, or scientific debates. The project explores the representation of and reasoning with defeasibility in a CNL, which could lead to a CNL that has much wider applicability and impact. The project can be done in collaboration with IBM UK, working with IBM scientists and engineers. The project can be either a theoretical study or an implementation (or a mixture of both). The supervisor has an extensive background in CNLs and argumentation/defeasibility.

Rule Extraction from Legislation or Case Law

Legal texts (legislation, regulations, and case law) provide the “operational legal rules” for businesses, organisations, and individuals. It is important to be able to identify and extract such rules, particularly for rulebook compliance or to transform rules in natural language into machine-readable, executable rules. The student will analyse a selection of regulations, identify the particular elements to be extracted using NLP, create the processing components, translate rules from natural language to executable rules, draw inferences, and evaluate the results.
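To give a flavour of what "executable rules" and "drawing inferences" might look like once rules have been extracted, here is a minimal forward-chaining sketch in Python. The rule format (a tuple of conditions plus a conclusion) and the example rules and fact names are invented for illustration; they are not taken from any actual regulation or rule language.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived.

    facts: an iterable of atomic facts (strings).
    rules: a list of (conditions, conclusion) pairs, where conditions
           is a tuple of facts that must all hold.
    Returns the set of all facts, given and derived.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in derived and set(conditions) <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rules, as they might come out of an extraction pipeline.
rules = [
    (("is_employer", "has_20_or_more_staff"), "must_appoint_safety_officer"),
    (("must_appoint_safety_officer",), "must_file_annual_report"),
]
facts = {"is_employer", "has_20_or_more_staff"}
result = forward_chain(facts, rules)
print(sorted(result))
```

Real legal rules typically have exceptions and priorities, so a strict rule engine like this is only a starting point.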

An Expert System to Support Reasoning in Juries

Jury trials are a fundamental aspect of the Common Law legal system in the UK and USA. In jury trials, jurors are members of the public who are required to reason about the facts of the case and about the legal rules to arrive at a decision (e.g. whether the defendant is guilty or not guilty). This is a difficult and important task for a person who is not schooled in the law. Fortunately, in some jurisdictions, there are standardised “catalogues” of jury instructions to guide the jurors in how to reason. In this project, the student analyses a selection of jury instructions and implements them as an interactive juror decision support tool.

Legal Case Based Reasoning

Case based reasoning uses previously decided cases to resolve new, undecided problems. Legal case based reasoning is the characteristic form of legal reasoning in courts in the UK and the USA. The project will implement several existing formalisations of legal case based reasoning.
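As one simple illustration of what such a formalisation can look like, the sketch below treats a case as two sets of factors (those favouring the plaintiff and those favouring the defendant) and checks whether a precedent forces the outcome of a new case a fortiori: the new case is at least as strong for the winning side and has no additional factors for the losing side. This is an assumed, simplified reading in the style of factor-based approaches, not necessarily one of the formalisations the project would implement; the class, function, and factor names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Case:
    pro_plaintiff: FrozenSet[str]
    pro_defendant: FrozenSet[str]
    outcome: Optional[str] = None  # "plaintiff", "defendant", or None if undecided


def forced_outcome(precedent: Case, new_case: Case) -> Optional[str]:
    """Return the outcome the precedent forces a fortiori, or None."""
    if precedent.outcome == "plaintiff":
        at_least_as_strong = precedent.pro_plaintiff <= new_case.pro_plaintiff
        no_new_weakness = new_case.pro_defendant <= precedent.pro_defendant
    elif precedent.outcome == "defendant":
        at_least_as_strong = precedent.pro_defendant <= new_case.pro_defendant
        no_new_weakness = new_case.pro_plaintiff <= precedent.pro_plaintiff
    else:
        return None  # an undecided case cannot serve as a precedent
    return precedent.outcome if at_least_as_strong and no_new_weakness else None


# Hypothetical factors p1, p2 (pro-plaintiff) and d1, d2, d3 (pro-defendant).
precedent = Case(frozenset({"p1"}), frozenset({"d1", "d2"}), "plaintiff")
stronger = Case(frozenset({"p1", "p2"}), frozenset({"d1"}))
different = Case(frozenset({"p1"}), frozenset({"d3"}))

print(forced_outcome(precedent, stronger))
print(forced_outcome(precedent, different))
```

In the second comparison the new case contains a pro-defendant factor that the precedent never weighed, so the precedent does not settle it and further argument is needed.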

Logical Formalisations of the Law

The law can be formalised in a variety of ways, and there are tools and techniques to support the task. Such formalisations can be queried and inferences drawn. The project will examine existing tools, see what can be improved, and provide fragments of formalised law.

Legal Ontologies/Knowledge Graph

In an ontology/KG, domain knowledge about entities, their properties, and their relations is formally represented. Such representations also facilitate querying, extraction, linking, and inference. There are legal ontologies/KGs that represent the law, legal processes, and legal relationships. The project will examine existing legal ontologies, augment them, and build a richer ontological representation using existing tools.

Abstract Contract Calculator in Haskell

Create a program in Haskell, which is a functional programming language, to execute ‘theoretical’ legal contracts, which are contracts that have the form of an actual legal contract, but not the content.

Seminar Presentation at Aberdeen Law School

I was recently invited to give a seminar at the Law School at the University of Aberdeen about AI and Law related research in general and text analytics for legal studies in particular. Though it was held at 16:00 on a Friday (!) it was well attended (thanks to all who came), and there was good discussion afterwards. I hope this is the start of a collaboration between me and my colleagues in the Law School.
The slides have some references and links that might be interesting. Click on the title link for the slides.
Textual Processing of Legal Cases
Adam Wyner
By Adam Wyner

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Research Associate on the IMPACT Project at University of Liverpool

As of September 13, 2010, I have been working at the University of Liverpool, Department of Computer Science with Katie Atkinson (the PI) and Trevor Bench-Capon on the IMPACT Project (previously having worked on the project at the University of Amsterdam at the Leibniz Center for Law and also at the University of Leeds at the Centre for Digital Citizenship). I previously worked with Katie and Trevor on the ESTRELLA Project.
IMPACT: Integrated Method for Policy Making Using Argument Modelling and Computer Assisted Text Analysis

The IMPACT Project is a European Framework 7 project (Grant Agreement No 247228) in the ICT for Governance and Policy Modeling theme (ICT-2009.7.3). The project runs from January 2010 to December 2013.
IMPACT will conduct original research to develop and integrate formal, computational models of policy and arguments about policy, to facilitate deliberations about policy at a conceptual, language-independent level. These models will be used to develop and evaluate innovative prototype tools for supporting open, inclusive and transparent deliberations about public policy. To support the analysis of policy proposals in an inclusive way which respects the interests of all stakeholders, research on tools for reconstructing arguments from data resources distributed throughout the Internet will be conducted. (from Atkinson’s website).

Looking forward to working on these topics!
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Session I of "Automated Content Analysis and the Law" Workshop

Today is session I of the NSF sponsored workshop on Automated Content Analysis and the Law. The theme of today’s meeting is the state of judicial/legal scholarship in order to:

  • Identify the theoretical and substantive puzzles in legal and judicial scholarship which might benefit from automated content analysis
  • Discuss the kinds of data/measures that are required to address these puzzles which automated content analysis could provide.

Further comments later in the day after the session.
–Adam Wyner
Copyright © 2009 Adam Wyner

ICAIL 2009 Workshops

Last week, I attended the 12th International Conference on Artificial Intelligence and Law in Barcelona, Spain. Tuesday-Thursday were given to the main conference while Monday and Friday were for workshops. This post makes a few remarks about the workshops.
On Monday, I attended two workshops. In the morning, I was at Legal Ontologies and Artificial Intelligence Techniques (LOAIT), while in the afternoon, I attended Modeling Legal Cases, where I presented a paper, An OWL Ontology for Legal Cases with an instantiation of Popov v. Hayashi. On Friday, I was at the morning workshop, Natural Language Engineering of Legal Argumentation (NaLELA), which I organized with Tom van Engers and where I presented a paper by Tom and me outlining our approach to engineering argumentation.
At LOAIT, we heard reports about ongoing projects to provide legal taxonomies or ontologies for mediation, norms, business process specification, legislative markup, and acquisition of ontologies. Most of this work is still in the research phase, though some of it has been applied to samples in the domain of application.
At the modeling workshop, we heard a paper by Bex, Bench-Capon, and Atkinson about how to represent motives in an argumentation framework. Basically the idea is to make the notion of motives something explicit by putting it as a node in an argumentation graph; as such, the motives can be attacked and reasoned with. Trevor Bench-Capon modeled dimensions in property law in an argumentation framework. This paper was particularly helpful to me to finally get a grip on what dimensions are; in effect, they are finer-grained factors with an ordering over them. For example, possession ranges from an animal roaming free, a chase being started, hot pursuit, mortally wounding the animal, to actual bodily possession. In another paper, Trevor modeled a set of US Supreme Court cases, raising a series of important questions about how the reasoning of the Supreme Court could be modeled. Douglas Walton’s paper gave some samples of argumentation schemes. Henry Prakken presented an analysis in his argumentation framework of a case concerning disability assessment. Finally Kevin Ashley gave an overview of some aspects of legal case based reasoning which, he claims, ought to have some ontological representation. This paper is relevant to my research on ontologies as Kevin pointed out a range of elements which may be considered for inclusion in an ontology. My main reservation is that there ought to be some clear distinction between the ontology (domain knowledge) and the rules that apply to the elements of the ontology.
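To make concrete the idea of a motive as a node that can be attacked, here is a minimal sketch in Python. It assumes a Dung-style abstract argumentation framework evaluated under grounded semantics, which may differ from the framework used in the paper; the argument names are invented for illustration.

```python
def grounded_extension(arguments, attacks):
    """Compute the grounded extension of an abstract argumentation framework.

    arguments: a set of argument names.
    attacks: a set of (attacker, target) pairs.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted = set()
    while True:
        # An argument is defended if every one of its attackers is itself
        # attacked by an argument accepted so far. Unattacked arguments
        # are defended trivially.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in accepted)
                   for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


# Hypothetical example: the motive is an explicit node that attacks the
# claim of innocence, and is in turn attacked by a challenge to it.
arguments = {"suspect_is_innocent", "suspect_had_motive", "motive_claim_unreliable"}
attacks = {
    ("suspect_had_motive", "suspect_is_innocent"),
    ("motive_claim_unreliable", "suspect_had_motive"),
}
print(sorted(grounded_extension(arguments, attacks)))
```

Because the motive is a node like any other, defeating it reinstates the argument it attacked.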
More information about the NaLELA workshop can be found at the website for Natural Language Engineering of Argumentation.
There were three other workshops I did not have time to attend: E-discovery/E-disclosure; privacy and protection in web-based social networks; and legal and negotiation decision support systems.

How To Shepardize in Law and AI

In common law systems such as in the US and the UK, cases which have been decided by judges (precedents) play a critical role in determinations of current, undecided cases. One of the critical reasoning principles in case based reasoning is stare decisis, which is a principle of legal conservatism — current decisions should adhere to or abide by past decisions unless specifically overturned. It is critical then to be able to identify not only what precedents bear on the current case, but also whether those precedents still represent good law, that is, legal decisions which have not been overturned. The legal researcher must search through the case base identifying those good precedents.
Shepard’s Citations is a compilation of court opinions and the relationships among the cases. To examine a current case in light of precedents is called Shepardization. An online tutorial can be found at:
How to Shepardize
An article from the journal Artificial Intelligence bearing on aspects of automated shepardization is:
Information extraction from case law and retrieval of prior cases
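The core check behind Shepardizing, whether a precedent is still good law, can be sketched over a small citation table. This is a toy illustration in Python: the case names, the treatment labels ("followed", "distinguished", "overruled", "reversed"), and the function names are assumptions made for the example and do not reflect the actual Shepard’s Citations data format.

```python
from collections import defaultdict

# Treatments that mean the cited case no longer represents good law
# (an assumed, simplified set).
NEGATIVE = {"overruled", "reversed"}


def build_index(citations):
    """Map each cited case to the list of (citing case, treatment) pairs."""
    index = defaultdict(list)
    for citing, cited, treatment in citations:
        index[cited].append((citing, treatment))
    return index


def is_good_law(case, index):
    """A case is good law here if no later case treats it negatively."""
    return not any(t in NEGATIVE for _, t in index.get(case, []))


# Hypothetical citation records: (citing case, cited case, treatment).
citations = [
    ("Case C", "Case A", "followed"),
    ("Case D", "Case A", "distinguished"),
    ("Case D", "Case B", "overruled"),
]
index = build_index(citations)
print(is_good_law("Case A", index))
print(is_good_law("Case B", index))
```

The hard part in practice is extracting the citing relationships and their treatments from the text of the opinions.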

The Taxpayer Assets Project

In the 1990s, there was a coordinated effort by a spectrum of individuals and organisations such as Ralph Nader of the Consumer’s Union and Prof. Carole Hafner of Northeastern University to gain free access to legal information. This was called the Taxpayer Assets Project (TAP) (also referred to as the JURIS system or The Crown Jewels). An initial story is:
Taxpayer Assets Project
Two documents by Prof. Hafner on TAP:
Letter to Reno
Competition for Legal Information
And a summary of how the Clinton administration did not support the development of JURIS:
Decision not to support JURIS
Articles about recent efforts along the same lines can be found at News Media links.