Papers Accepted to the JURIX 2011 Conference

My colleagues and I have had two papers (one long and one short) accepted for presentation at the 24th International Conference on Legal Knowledge and Information Systems (JURIX 2011). The papers are available at the links below.
On Rule Extraction from Regulations
Adam Wyner and Wim Peters
Abstract
Rules in regulations such as found in the US Federal Code of Regulations can be expressed using conditional and deontic rules. Identifying and extracting such rules from the language of the source material would be useful for automating rulebook management and translating into an executable logic. The paper presents a linguistically-oriented, rule-based approach, which is in contrast to a machine learning approach. It outlines use cases, discusses the source materials, reviews the methodology, then provides initial results and future steps.
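To give a flavour of what a rule-based (as opposed to machine learning) approach to spotting conditional and deontic language might look like, here is a minimal Python sketch. It is not the method of the paper: the marker lists, the function name classify_sentence, and the example sentences in the usage note are all assumptions made up for illustration, and a real system would rely on much richer linguistic analysis.

```python
import re

# Hypothetical marker lists, chosen only for illustration.
DEONTIC = re.compile(r"\b(must|shall|may|should)(\s+not)?\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\b(if|unless|when|where|provided that)\b", re.IGNORECASE)


def classify_sentence(sentence):
    """Return the deontic and conditional markers found in a sentence."""
    deontic = [m.group(0).lower() for m in DEONTIC.finditer(sentence)]
    conditional = [m.group(0).lower() for m in CONDITIONAL.finditer(sentence)]
    return {"deontic": deontic, "conditional": conditional}
```

For an invented sentence such as "If the sample is contaminated, the establishment must notify the agency.", the function reports "if" as a conditional marker and "must" as a deontic marker. Surface markers like these only flag candidate rules; deciding what the antecedent and consequent actually are is the harder part.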
Populating an Online Consultation Tool
Sarah Pulfrey-Taylor, Emily Henthorn, Katie Atkinson, Adam Wyner, and Trevor Bench-Capon
Abstract
The paper addresses the extraction, formalisation, and presentation of public policy arguments. Arguments are extracted from documents that comment on public policy proposals. Formalising the information from the arguments enables the construction of models and systematic analysis of the arguments. In addition, the arguments are represented in a form suitable for presentation in an online consultation tool. Thus, the forms in the consultation correlate with the formalisation and can be evaluated accordingly. The stages of the process are outlined with reference to a working example.
By Adam Wyner

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Draft Materials for LEX 2011

Draft post
At the links below, you can find the slides and hands-on materials on GATE for the LEX summer school on Managing Legal Resources in the Semantic Web.
GATE Legislative Rulebook
By Adam Wyner

TO BE UPDATED: Instructions for Online Collaborative Legal Case Annotation Task

TO BE UPDATED for the SPLeT 2012 task. The information here and in the links is out of date. The material is being updated for the task, so please return at a later date or email the authors. Thanks for your interest.
— Adam
Wim Peters and I ran a pilot experiment in online, collaborative annotation for legal case factors. The slides are below. Now that we know more about how to present such materials, we need to find a cooperative population of law students to scale up and deepen the work.
Annotating Legal Case Factors with GATE TeamWare
By Adam Wyner

LOAIT Workshop Paper on Legal Text Annotation

A paper I presented at the 4th Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT) is to appear in the journal Rivista Informatica e diritto, an Italian journal on AI and Law.
Towards Annotating and Extracting Textual Legal Case Elements
Adam Wyner
Abstract
The paper presents an outline of a method for semantic, conceptual search in legal case documents using the GATE tool.
By Adam Wyner

On ICAIL 2011 Discussion on Legal Corpus Development and Text Analytics

In this note, I point to various parts of a discussion on developing and analysing legal textual data raised at ICAIL 2011. Please feel free to add comments to this document (or to me in person, by email, on your blog and linked to this, etc), which I can then add to the post (I’m very happy to attribute contributions). The intention is to stimulate discussion on these matters to help the community of researchers move ahead on common interests.
Corpus Development
Unlike the situation several years ago, we now have accessible sources of large corpora of legal textual information. The World Legal Information Institutes (LIIs) provide free, independent, and non-profit access to worldwide law. For example, one can go to the US site and download cases such as United States v Grant [1961] USCA9 19; 286 F.2d 157 (19 January 1961); one can request zipped files or screen scrape cases. The LIIs have introduced standardised references and formats for cases, and they support boolean and regex searches.
From the contacts that I have had (e.g. in the US and UK), the LIIs would be very happy to collaborate with academic researchers in the analysis of their data and in keeping with their primary mission. In particular, developing tools that can be integrated and deployed with their platforms might be a way to go, thereby addressing significant platform and dissemination issues.
Another source of corpora is public.resource.org, which distributes a range of corpora covering legislation, codes, and cases.
Analysis and Annotation
There is a range of issues about information retrieval and extraction. Others can speak about IR, statistical, and machine learning approaches. What I know better is annotation, whether fully automatic, semi-automatic, or manual. Here we have issues about what to annotate and how. Some low level information is unproblematic (e.g. entities of a range of sorts, sections, and sentiment); higher level information (e.g. factors) might be more complex. I have some suggestions for annotations for low level information; a good starting point for factors is the set of CATO factors, though there is a general issue about how to extend factor identification to other domains (CATO factors are specific to intellectual property).
One general problem with analysis is that different researchers might use different tools in their work and just report the results. This means results are not interchangeable, which is particularly problematic with annotation work. If a common ‘framework’ tool is used and some consensus is developed about (at least) low level annotation types, then work can proceed more collaboratively, transparently, and reproducibly. One can develop a more forceful argument for researchers, public service bodies, and information providers to promote such an open development methodology; among the reasons are justification and traceability (see Wyner and Peters 2010 and David Lewis’s ICAIL 2011 keynote address on related points). General Architecture for Text Engineering (GATE) is an open framework for text processing modules.
There are ‘open’ systems for text annotation, such as Open Calais and the Open Up platform’s data enrichment service from The Stationery Office. However, there are intellectual property issues that need to be considered.
Another general issue is how to carry out manual annotation, for example to build gold standards, which are required for machine learning systems. There has been significant progress, for example, with TeamWare, which provides for curated, web-based annotation tools along with annotation analysis (e.g. inter-annotator agreement). For a short tutorial (for an experiment) on using TeamWare for annotation of some legal case factors, see Web-based Annotation Support for the Law. Wim Peters and I proposed to law school faculty to use this tool to support their student exercises for first and second year students since these exercises often require identifying and extracting information from cases. Wim and I think integrating annotation exercises into legal e-learning could both help to develop large annotated sets of data and to serve an important educational purpose. See our paper about some of these points and proposals.
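To make the idea of inter-annotator agreement concrete, here is a minimal Python sketch of Cohen's kappa, one common agreement measure for two annotators labelling the same items. This is only an illustration: I am not claiming it is the computation TeamWare performs, and the function name and the labels in the usage note are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, if one annotator labels four passages factor, factor, none, none and a second labels them factor, none, none, none, they agree on three of the four items (0.75) while chance agreement is 0.5, giving a kappa of 0.5. Low values on a pilot set are a useful signal that the annotation guidelines need tightening before scaling up.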
Research Questions
Large corpora can be formed and tools can be applied to them, but for fund raising, the community needs to develop a range of motivating research questions and use cases. Aside from questions pursued in the AI and Law community, we might consult further with public bodies (National Center for State Courts and similar), legal information service providers (Lexis-Nexis, ThomsonReuters, Practical Law Company), law societies, political scientists, etc. The kinds of answers we look for partially guide how we structure not only the corpora, but more so the annotations.
Funding Opportunities
One opportunity is Digging into Data and the Request for Proposals, but the due date is June 16 (I had been working on a proposal, but needed better research questions to hold local interest). Though the deadline is too soon to submit a proposal, it does demonstrate widespread interest among funding bodies in the development and analysis of large corpora in the humanities and social sciences. The other obvious funding sources are national (US, UK, French, etc) and international (EU and Digging into Data).
By Adam Wyner

General Architecture for Text Engineering Summer School 2011

I had the opportunity (thanks Katie Atkinson!) to attend the General Architecture for Text Engineering Summer School 2011. The GATE people have really developed this summer school very well. It was well attended (70 participants?) and well structured (three sections and various talks). GATE attracts a good, outgoing, helpful, and diverse group of people. A whole week of GATE and never a dull moment. Geeky, but true. And text analytics seems to be a growing area (at least according to the May 2011 issue of New Scientist, which lists it as one of seven “disruptive” technologies; I’ve always wanted to be bad).
As this was my second time at the GATE summer school, I sat in on the Advanced GATE session. All the slides and all the materials for hands-on exercises are available on the GATE Summer School Wiki. In my week, we covered the following:

  • Module 9: Ontologies and Semantic Annotation
    • Introduction to Ontologies
    • GATE Ontology Editor
    • GATE Ontology Annotation Tools for Entities and Relations
    • Automatic Semantic Annotation in GATE
    • Measuring Performance
    • Using the Large Knowledge Base gazetteer (LKB)
  • Module 10: Advanced GATE Applications
    • Customising ANNIE
    • Working with different languages
    • Complex applications
    • Conditional Processing
    • Section-by-section processing
  • Module 11: Machine Learning
    • Machine learning and evaluation concepts
    • Using ML in GATE
    • Engines and algorithms
    • Entity learning hands-on session
    • Relation extraction hands-on session
  • Module 12: Opinion Mining
    • Introduction to opinion mining and sentiment analysis
    • Using GATE tools to perform sentiment analysis
    • Machine learning for sentiment analysis hands-on session
    • Future directions for opinion mining
  • Module 13: Semantic Technology and Linked Open Data: Basics, Tools, and Applications
    • Linked Open Data: Introduction of key principles and some key tools (FactForge, LinkedLifeData)
    • Semantic Annotation with Linked Data
    • Semantic Search

By Adam Wyner

Talk at BILETA 2011

I’m giving a talk tomorrow, April 11 2011, at BILETA, the annual conference of the British & Irish Law, Education and Technology Association at Manchester Metropolitan University School of Law. My collaborators are Wim Peters (University of Sheffield) and Fiona Beveridge (University of Liverpool).
The abstract and slides are below:
Web-based Software Tools to Support Students’ Empirical Study of the Law
Adam Wyner (University of Liverpool, Computer Science), Wim Peters (University of Sheffield, Computer Science), and Fiona Beveridge (University of Liverpool, Law School)
The paper investigates and proposes tools to support students in empirically investigating legal cases using text analytic software. Web-based tools can be used to engage and leverage the collective skills and ambitions of law students to crowd-source the development of legal resource materials. Law school students must develop skills in close textual analysis of legal source material such as legal cases. To use source material such as case decisions to reason about how precedents apply in case-based reasoning, law students must learn to identify a range of elements in legal cases, for example, parties, jurisdiction, material facts, legislative and case citations, cause of action, ratio decidendi, and others. Moreover, students should be able to address complex queries to a case or a case base (a corpus of cases) in order to answer questions of particular legal interest; for example, about relationships between a judge, parties, cause of action, and ratio. Currently students either simply rely on their own analytic abilities to read a case or find answers to questions; legal search tools (e.g. Lexis-Nexis) provide search support, but are restricted to a limited number of coarse-grained parameters and cannot search for deep, particular semantic relationships in the text. To enable automated support of queries of the corpus, and so enable deep empirical research on cases, it is essential to have a corpus of legal cases which are annotated with machine readable (XML) tags that signal the semantic properties of passages of text. To create such a corpus requires a tool to annotate the text. Such a tool would reinforce students’ examination of the source document. The paper describes recent developments of tools using Semantic Web technologies, text analysis, and web-based annotation support.
With the text analysis software, General Architecture for Text Engineering (GATE), which is customised for legal applications, law students can annotate legal cases for a fine-grained range of legally relevant concepts and linguistic relations; they can also use GATE to write grammars and automatically annotate the text. Using GATE TeamWare, an online text annotation tool that automatically evaluates interannotator agreement, students can collaboratively analyse and agree on a gold standard corpus of legal cases. The corpus can be automatically indexed using Lucene, thereby allowing fast results to complex queries over any string or annotation used.
The slides of the talk are here
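As a rough illustration of why machine readable (XML) annotations make such queries possible, here is a minimal Python sketch that pulls the annotated spans out of a tagged case passage using the standard library. The sample text, the tag names (judge, party, causeOfAction), and the function name are all invented for the example; they are not GATE's actual output format, and a real corpus would be indexed (for instance with Lucene) instead of being parsed on every query.

```python
import xml.etree.ElementTree as ET

# Invented sample passage; the tag names are hypothetical, not GATE's schema.
SAMPLE = """<case>
  <judge>Judge Smith</judge> held that <party role="plaintiff">Acme Ltd</party>
  could recover from <party role="defendant">Bolt Inc</party> for
  <causeOfAction>breach of contract</causeOfAction>.
</case>"""


def find_annotations(xml_text, tag):
    """Return the text of every element with the given tag, in document order."""
    root = ET.fromstring(xml_text)
    return [(element.text or "").strip() for element in root.iter(tag)]
```

Calling find_annotations(SAMPLE, "party") returns the two party names in document order, and the same call with "causeOfAction" returns the cause of action. Once spans are tagged, a question such as "which parties appear in cases with this cause of action" becomes a structured lookup over annotations instead of a keyword search over raw text.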