text mining – Page 3 – Logic Language Law Computing

General Architecture for Text Engineering Summer School 2011

I had the opportunity (thanks Katie Atkinson!) to attend the General Architecture for Text Engineering Summer School 2011. The GATE people have really developed this summer school very well. It was well attended (70 participants?) and well structured (three sections and various talks). GATE attracts a good, outgoing, helpful, and diverse group of people. A whole week of GATE and never a dull moment. Geeky, but true. And text analytics seems to be a growing area (at least according to the May 2011 issue of New Scientist, which lists it as one of seven “disruptive” technologies; I’ve always wanted to be bad).
As this was my second time at the GATE summer school, I sat in on the Advanced GATE session. All the slides and all the materials for hands-on exercises are available on the GATE Summer School Wiki. In my week, we covered the following:

Module 9: Ontologies and Semantic Annotation
- Introduction to Ontologies
- GATE Ontology Editor
- GATE Ontology Annotation Tools for Entities and Relations
- Automatic Semantic Annotation in GATE
- Measuring Performance
- Using the Large Knowledge Base gazetteer (LKB)

Module 10: Advanced GATE Applications
- Customising ANNIE
- Working with different languages
- Complex applications
- Conditional Processing
- Section-by-section processing

Module 11: Machine Learning
- Machine learning and evaluation concepts
- Using ML in GATE
- Engines and algorithms
- Entity learning hands-on session
- Relation extraction hands-on session

Module 12: Opinion Mining
- Introduction to opinion mining and sentiment analysis
- Using GATE tools to perform sentiment analysis
- Machine learning for sentiment analysis hands-on session
- Future directions for opinion mining

Module 13: Semantic Technology and Linked Open Data: Basics, Tools, and Applications
- Linked Open Data: Introduction of key principles and some key tools (FactForge, LinkedLifeData)
- Semantic Annotation with Linked Data
- Semantic Search

By Adam Wyner

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

ICAIL 2011 Tutorial: Textual Information Extraction from Legal Resources Using GATE

Slides for ICAIL tutorial, Monday, June 6, 2011, University of Pittsburgh.
Textual Information Extraction from Legal Resources using GATE

Workshop Applying Human Language Technology to the Law

A workshop at
ICAIL 2011: The Thirteenth International Conference on Artificial Intelligence and Law

Applying Human Language Technology to the Law (AHLTL 2011)

June 10, 2011
University of Pittsburgh School of Law
Overview:
Over the last decade there have been dramatic improvements in the effectiveness and accuracy of Human Language Technology (HLT), accompanied by a significant expansion of the HLT community itself. Over the same period, there have been widespread developments in web-based distribution and processing of legal textual information, e.g. cases, legislation, citizen information sources, etc. More recently, a growing body of research and practice has addressed a range of topics common to both the HLT and Artificial Intelligence and Law communities, including automated legal reasoning and argumentation, semantic information retrieval, cross and multi-lingual information retrieval, document classification, logical representations of legal language, dialogue systems, legal drafting, legal knowledge discovery and extraction, linguistically based legal ontologies, among others. Central to these shared topics is use of HLT techniques and tools for automating knowledge extraction from legal texts and for processing legal language.
The workshop has several objectives. The first objective is to broaden the research base by introducing HLT researchers to the materials and problems of processing legal language. The second objective is to introduce AI and Law researchers to up-to-date theories, techniques, and tools from HLT, which can be applied to legal language. And the third objective is to deepen the existing research streams. Altogether, the interactions among the researchers are expected to advance research and applications and foster interdisciplinary collaboration within the legal domain.
Context:
Over the last two years, there have been several workshops and tutorials on or relating to processing legal texts and legal language, demonstrating a significant surge of interest. There have been two workshops on Semantic processing of legal texts (SPLeT) held in conjunction with LREC (2008 in Marrakech, Morocco; and 2010 in Malta). At ICAIL 2009, there were two workshops, LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts and NALEA ’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation. LOAIT ’09 focussed on Legal Knowledge Representation with particular emphasis on the issue of ontology acquisition from legal texts, while NALEA ’09 tackled issues related to legal argumentation. In 2009, the National Science Foundation sponsored a workshop Automated Content Analysis and the Law, which drew participants from computational linguistics and political science. Finally, at the Second Workshop on Controlled Natural Language (CNL 2010), there were several presentations related to legal language.
Intended Audience:
The intended audience would include both current members of the AI & law community who are interested in automated analysis of legal texts and corpora and, in addition, HLT researchers for whom analysis of legal texts would provide an opportunity for development and evaluation of HLT techniques. It is anticipated that participants would come from industry (e.g. The MITRE Corporation, Thomson/Reuters, Endeca, Lexis/Nexis, Oracle), the judiciary in the US and Europe, national organisations (e.g. the US National Institute of Standards and Technology, the US National Science Foundation, European Science Foundation, the UK Office of Public Sector Information), government security agencies, legal professionals, and academic HLT researchers.
Areas of Interest:
The workshop will focus on extraction of information from legal text, representations of legal language (ontologies and semantic translations), and dialogic aspects. While information extraction and retrieval are crucial areas, the workshop emphasises syntactic, semantic, and dialogic aspects of legal information processing.

Building legal resources: terminologies, ontologies, corpora.

Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.

Information retrieval and extraction from legal texts.

Semantic annotation of legal texts.

Multilingual aspects of legal text semantic processing.

Legal thesauri mapping.

Automatic Classification of legal documents.

Automated parsing and translation of natural language arguments into a logical formalism.

Linguistically-oriented XML mark up of legal arguments.

Computational theories of argumentation that are suitable to natural language.

Controlled language systems for law.

Name matching and alias detection.

Dialogue protocols and systems for legal discussion.

Workshop Schedule

9:00 Opening remarks

The Role of HLT in High-end Search and the Persistent Need for Advanced HLT Technologies

Lexical vs. Surface Features in Deceptive Language Analysis

Legal Thesauri Reuse. An Experiment with the U.S. Code of Federal Regulations

11:00 Break

Towards the intelligent processing of non-expert generated content: mapping web 2.0 data with ontologies in the domain of consumer mediation

Formal Models of Sentences in Dutch Law

Eunomos, a legal document management system based on legislative XML and ontologies (Position paper)

From Spelling Checkers to Robot Judges? Some Implications of Normativity in Language Technology and AI and Law

13:15 Lunch

Workshop Location
To be announced.
Author Guidelines:

ICAIL CFP

AHLTL 2011, an EasyChair site

Publication:

The papers from the workshop are available from here.

Webpage:

Applying Human Language Technology to the Law

Important Dates:

~~Paper submission deadline: DEADLINE FOR SUBMISSIONS EXTENDED TO APRIL 10 by 00:00 EST~~

~~Acceptance notification sent: 15 April 2011~~

Final version deadline: 23 May 2011

Workshop date: 10 June 2011

Contact Information:

Primary contact: Adam Wyner, adam@wyner.info

Secondary contact: Karl Branting, lbranting@mitre.org

Program Committee Co-Chairs:

Adam Wyner (University of Liverpool, UK)

Karl Branting (The MITRE Corporation, USA)

Program Committee:

Kevin Ashley (University of Pittsburgh, USA)

Johan Bos (University of Rome, Italy)

Sherri Condon (The MITRE Corporation, USA)

Jack Conrad (Thomson Reuters, USA)

Enrico Francesconi (ITTIG-CNR, Florence, Italy)

Ben Hachey (Macquarie University, Australia)

Alessandro Lenci (Università di Pisa, Italy)

Leonardo Lesmo (Università di Torino, Italy)

Emile de Maat (University of Amsterdam, Netherlands)

Thorne McCarty (Rutgers University, USA)

Marie-Francine Moens (Catholic University of Leuven, Belgium)

Simonetta Montemagni (ILC-CNR, Italy)

Raquel Mochales Palau (Catholic University of Leuven, Belgium)

Craig Pfeifer (The MITRE Corporation, USA)

Wim Peters (University of Sheffield, United Kingdom)

Paulo Quaresma (Universidade de Évora, Portugal)

Mike Rosner (University of Malta, Malta)

Tony Russell-Rose (Endeca, United Kingdom)

Erich Schweighofer (Universität Wien, Austria)

Rolf Schwitter (Macquarie University, Australia)

Manfred Stede (University of Potsdam, Germany)

Mihai Surdeanu (Stanford University, USA)

Daniela Tiscornia (ITTIG-CNR, Italy)

Radboud Winkels (University of Amsterdam, Netherlands)

Jonathan Zeleznikow (Victoria University, Australia)

Proceedings and Program for Workshop on Modelling Legal Cases and Legal Rules

in conjunction with JURIX 2010
December 15, 2010
Department of Computer Science, Ashton Building, Room 310
University of Liverpool, Liverpool, United Kingdom
Workshop Proceedings
Workshop Program
Session I

Visualising the argumentation structure of an expert witness report with Rationale (extended abstract)

Analyzing open source license compatibility issues with Carneades

Emotional argumentation schemes in legal cases (short position paper)

16:00-16:30 Tea
Session II

MOSONG, a fuzzy logic model of trade mark similarity

Visualising legal case-based reasoning argumentation schemes

Say “cheese”: natural kinds, deontic logic and European Court of Justice decision C-210/89

For general information, see JURIX 2010
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Legal Know-How Workshop Presentations

December 10, 2010, I gave a presentation at the International Society for Knowledge Organisation’s meeting on Legal Know-How. It was an interesting meeting, where I got the opportunity to present my work to members of the legal profession, hear what law firms are doing about knowledge management, and make some good new contacts.
The slides of all the talks, including mine, are available:
ISKO-UK Legal Know-How meeting
In a couple of weeks, ISKO will also add mp3s of the talks, so one can see the slides and hear the talks. Nice way to do things, as remarks and narration are almost more crucial than the slides themselves.

Call for Papers: JURIX 2010 Workshop on Modelling Legal Cases and Legal Rules

I am organising a workshop at JURIX 2010
Modelling Legal Cases and Legal Rules
As part of the Jurix 2010 conference in Liverpool, UK, we will hold a Workshop on Modelling Legal Cases and Legal Rules. This workshop is a follow-on from successful workshops at Jurix 2007 and ICAIL 2009.
Legal cases and legal rules in common law contexts have been modelled in a variety of ways over the course of research in AI and Law to support different styles of reasoning for a variety of problem-solving contexts, such as decision-making, information retrieval, teaching, etc. Particular legal topic areas and cases have received wide coverage in the AI and Law literature including wild animals (e.g. Pierson v. Post, Young v. Hitchens, and Keeble v. Hickeringill), intellectual property (e.g. Mason v. Jack Daniel Distillery), and evidence (e.g. the Rijkbloem case). As well, some legal rules have been widely discussed, such as legal argument schemes (e.g. Expert Testimony) or rules of evidence (see Walton 2002). However, other areas have been less well covered. For example, there appears to be less research on modelling legal cases in civil law contexts; investigation of taxonomies and ontologies of legal rules would support abstraction and formalisation (see Sherwin 2009); additional legal rules could be brought under the scope of investigation, such as those bearing on criminal assault or causes of action.
The aim of this workshop is to provide a forum in which researchers can present their research on modelling legal cases and legal rules.
Papers are solicited that model a particular legal case or a small set of legal rules. Authors are free to choose the case or set of legal rules and analyse them according to the authors’ preferred model of representation; any theoretical discussion should be grounded in or exemplified by the case or rules at hand. Papers should make clear the distinctive features of their approach and why these features are useful in modelling the chosen case or rules. The workshop is an opportunity for authors to demonstrate the benefits of their approach and for group discussions to identify useful overlapping features as well as aspects to be further explored and developed.
Format of papers and submission guidelines
Full papers should not be more than 10 pages long and should be submitted in PDF format. It is suggested that the conference style files are used for formatting (see IOS Press site). All papers should provide:

A summary of the case or legal rules.
An overview of the representation technique, or reference to a full description of it.
The representation itself.
Discussion of any significant features.

Short position papers are also welcome from those interested in the topic but who do not wish to present a fully represented case or elaborate discussion of a set of legal rules; the short position papers can outline ideas, sketch directions of research, summarise or reflect on previously published work that has addressed the topic. A short position paper should be not more than five pages, giving a clear impression of what would be presented.
All submissions should be emailed as a PDF attachment to the workshop organiser, Adam Wyner, at: adam@wyner.info.
Programme Committee (Preliminary)

Kevin Ashley, University of Pittsburgh, USA
Katie Atkinson, University of Liverpool, UK
Floris Bex, University of Dundee, UK
Trevor Bench-Capon, University of Liverpool, UK
Tom Gordon, Fraunhofer, FOKUS, Germany
Robert Richards, Seattle, Washington, USA
Giovanni Sartor, European University Institute, Italy
Burkhard Schafer, Edinburgh Law School, Scotland
Douglas Walton, University of Windsor, Canada

Organisation
Organiser of this workshop is Adam Wyner, University of Liverpool, UK. You can contact the workshop organiser by sending an email to adam@wyner.info
Dates
Paper submission: Friday, November 5, 2010
Acceptance Notification: Friday, November 12, 2010
Workshop Registration: Friday, November 19, 2010
December 15th, 2010 Jurix Workshops/Tutorials
December 16th-17th, 2010 Jurix 2010 Main Conference

Presentation at Legal Know-how Workshop, Nov. 10, 2010

I have been invited to make a presentation on Textual information extraction and ontologies for legal case-based reasoning at a Legal Know-how Workshop, which is an industry-oriented event organised by the International Society for Knowledge Organisation UK (ISKO-UK).
Date: 10 November 2010
Time: 13:30-19:00
Venue: University College London
Medical Sciences Building
A. V. Hill Lecture Theatre
Gower Street
London, WC1E 6BT
See the workshop website for registration fee (either free or under £25) and booking.
This will be a very interesting opportunity to hear from and talk with industry consultants and experts about the latest developments in legal knowledge management. My thanks to Stella Dextre Clarke of ISKO-UK for organising the event and inviting me to take part.

Programme

13:30	Registration
14:00	Welcome from ISKO-UK by Stella Dextre Clarke
14:05	Legal knowledge – the practitioner’s viewpoint
Melanie Farquharson, 3Kites Consulting
This session will focus on the practical situations in which lawyers look for knowledge in order to deliver legal services to their clients. It will identify some typical ‘use cases’ and consider ways in which knowledge can be delivered to the practitioner – even without them having to look for it.
14:35	Why lawyers need taxonomies – adventures in organising legal knowledge
Kathy Jacob & Lynley Barker, Pinsent Masons LLP; Graham Barbour & Mark Fea, LexisNexis
This presentation will cover the practical issues encountered by a law firm in its quest to improve findability of one of its key resources – knowledge and information. We will discuss our approach to building taxonomies, the tools and processes deployed and how we anticipate our taxonomy will be applied and consumed by lawyers and publishers. The LexisNexis part of the presentation will focus on the challenges of building and applying legal taxonomies to suit the breadth and depth of content they provide online. It will also examine ways in which taxonomies can be surfaced in the user interface and help to drive compelling functionality that improves the user’s search experience.
15:20	Taxonomy management at Clifford Chance
Mats Bergman, Clifford Chance
This talk will describe how taxonomy management works in practice at Clifford Chance. As an increasing number of core knowledge resources are making use of the same set of firm-wide taxonomies, the increased interdependencies necessitate the implementation of a controlled process for updating the taxonomies. A simple governance model will be presented. Some thoughts will follow on the evolution of taxonomy development within a larger organisation and the current challenge of using social tagging in conjunction with controlled vocabularies.
15:50	Refreshments (Lower Refectory)
16:20	Textual information extraction and ontologies for legal case-based reasoning
Adam Wyner, University of Liverpool
This talk gives a brief overview of current developments and prospects in two related areas of the legal semantic web for legal cases – textual information extraction and ontologies. Textual information extraction is a process of automatically annotating and extracting textual information from the legal case base (precedents), thereby identifying elements such as participants, the roles the participants play, the factors which were considered in arriving at a decision, and so on. The information is valuable not only for search (to find applicable precedents), but also to populate an ontology for legal case-based reasoning. An ontology is a formal representation of key aspects of the knowledge of legal professionals with which we can reason (e.g. given an assertion that something is a legal case, we can infer other properties) and with respect to which we can write rules (e.g. reasoning using case factors to arrive at a legal decision). Since it is expensive to manually populate an ontology (meaning to read cases and input the data into the ontology), we use textual information extraction to automatically populate the ontology. We conclude with an appeal for open source, collaborative development of legal knowledge systems among partners in academia, industry, and government.
17:00	Collaboration across boundaries
Gwenda Sippings & Gerard Bredenoord, Linklaters LLP
In this presentation, we will look at approaches to managing legal know-how in a major global law firm. We will describe several boundaries that we have to consider when organising our know-how, including boundaries between professionals, countries, internal and external resources and the well debated boundary between information and knowledge. We will also share some of the ways in which we are making our know-how available to the fee earners and other professionals in the firm, using social and technological solutions.
17:35	Reconciling the taxonomy needs of different users
Derek Sturdy, Tikit Knowledge Services
The last decade has seen the development of a substantial number of legal know-how and knowledge databases. It has also shown up a serious question on whether the metadata, and especially the taxonomies, that are applied to the various knowledge items, should be tailored to the particular needs of end-users, or whether, so to speak, "one size can fit all". In particular, this talk will discuss the overlapping, but discrete, needs of those using knowledge resources primarily for legal drafting and document production, and of those conducting legal research, and will address the relative value today (as opposed to in 2000) of the effort put into internal metadata creation for those two sorts of end-users.


Paper accepted at JURIX 2010

My colleague Wim Peters and I have had our paper
Lexical Semantics and Expert Legal Knowledge towards the Identification of Legal Case Factors
accepted for presentation at JURIX 2010. The list of accepted papers is here. The paper will appear in the proceedings, but it is available by clicking on the paper title above.
Abstract
Legal case factors are textually represented facts which are represented in reported legal case decisions. Precedent decisions contribute to the decision of a case under consideration. As textually represented facts, factors linguistically encode semantic properties and relationships among the entities which can be leveraged to identify and extract the legal case factors from decisions. We integrate legal and linguistic resources in a text analysis tool with which we annotate textual passages. Using annotations tailored to legal case factors, the legal researcher can rapidly zero in on textual spans which represent specific combinations of factors, participants, and semantic properties which bear on who played what role with respect to a factor. The research reports progress on the development of a tool.

Semantic Processing of Legal Texts Workshop

In this post you will find information on the Semantic Processing of Legal Texts workshop, held in conjunction with the Language Resources and Evaluation Conference. Below please find a link to the conference, information on the workshop, and a program for the conference.
LREC
Language Resources and Evaluation Conference, May 17-23, Malta.
LREC 2010 Workshop on
SEMANTIC PROCESSING OF LEGAL TEXTS (SPLeT-2010)
23 May 2010, Malta
Workshop Description
The legal domain represents a primary candidate for web-based information distribution, exchange and management, as testified by the numerous e-government, e-justice and e-democracy initiatives worldwide. The last few years have seen a growing body of research and practice in the field of Artificial Intelligence and Law which addresses a range of topics: automated legal reasoning and argumentation, semantic and cross-language legal information retrieval, document classification, legal drafting, legal knowledge discovery and extraction, as well as the construction of legal ontologies and their application to the law domain. In this context, it is of paramount importance to use Natural Language Processing techniques and tools that automate and facilitate the process of knowledge extraction from legal texts.
Over the last two years, a number of dedicated workshops and tutorials focusing on different aspects of semantic processing of legal texts have demonstrated the current interest in research on Artificial Intelligence and Law in combination with Language Resources (LR) and Human Language Technologies (HLT). The LREC 2008 Workshop on “Semantic processing of legal texts” was held in Marrakech, Morocco, on the 27th of May 2008. The JURIX 2008 Workshop on “the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation (NaLEA)” focused on recent advances in natural language engineering and legal argumentation. At ICAIL 2009 there were two workshops: “LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts” and “NALEA’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation”, the former focusing on Legal Knowledge Representation with particular emphasis on the issue of ontology acquisition from legal texts, the latter tackling issues related to legal argumentation and linguistic technologies.
To continue this momentum, a 3rd Workshop on “Semantic Processing of Legal Texts” is being organised at the LREC conference to bring to the attention of the broader LR/HLT community the specific technical challenges posed by the semantic processing of legal texts and also share with the community the motivations and objectives which make it of interest to researchers in legal informatics. The outcome of these interactions is expected to advance research and applications and foster interdisciplinary collaboration within the legal domain.
The main goals of the workshop are to provide an overview of the state-of-the-art in legal knowledge extraction and management, to explore new research and development directions and emerging trends, and to exchange information regarding legal LRs and HLTs and their applications.
Areas of Interest
The workshop will focus on the topics of the automatic extraction of information from legal texts and the structural organisation of the extracted knowledge. Particular emphasis will be given to the crucial role of language resources and human language technologies. Papers are invited on, but not limited to, the following topics:

Building legal resources: terminologies, ontologies, corpora

Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.

Information retrieval and extraction from legal texts

Semantic annotation of legal texts

Legal text processing

Multilingual aspects of legal text semantic processing

Legal thesauri mapping

Automatic Classification of legal documents

Logical analysis of legal language

Automated parsing and translation of natural language arguments into a logical formalism

Linguistically-oriented XML mark up of legal arguments

Dialogue protocols for argumentation

Legal argument ontology

Computational theories of argumentation that are suitable to natural language

Controlled language systems for law

Workshop Chairs

Enrico Francesconi (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)

Simonetta Montemagni (Istituto di Linguistica Computazionale of CNR, Pisa, Italy)

Wim Peters (Natural Language Processing Research Group, University of Sheffield, UK)

Adam Wyner (Department of Computer Science, University College London, UK)

Program Committee

Johan Bos (University of Rome, Italy)

Danièle Bourcier (Humboldt Universität, Berlin, Germany)

Thomas R. Bruce (Cornell Law School, Ithaca, NY, USA)

Pompeu Casanovas (Institut de Dret i Tecnologia, UAB, Barcelona, Spain)

Alessandro Lenci (Dipartimento di Linguistica, Università di Pisa, Pisa, Italy)

Leonardo Lesmo (Dipartimento di Informatica, Università di Torino, Torino, Italy)

Raquel Mochales Palau (Catholic University of Leuven, Belgium)

Paulo Quaresma (Universidade de Évora, Portugal)

Erich Schweighofer (Universität Wien, Rechtswissenschaftliche Fakultät, Wien, Austria)

Manfred Stede (University of Potsdam, Germany)

Daniela Tiscornia (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)

Tom van Engers (Leibniz Center for Law, University of Amsterdam, Netherlands)

Stephan Walter (Euroscript, Luxembourg S.a.r.l.)

Radboud Winkels (Leibniz Center for Law, University of Amsterdam, Netherlands)

Program

14:30-14:45 Welcome and introduction

14:45-15:10
A Description Language for Content Zones of German Court Decisions
Florian Kuhn

15:10-15:35
Controlling the language of statutes and regulations for semantic processing
Stefan Hoefler and Alexandra Bünzli

15:35-16:00
Named entity recognition in the legal domain for ontology population
Mírian Bruckschen, Caio Northfleet, Douglas da Silva, Paulo Bridi, Roger Granada, Renata Vieira, Prasad Rao and Tomas Sander

16:00-16:30
Coffee break

16:30-16:55
Legal Claim Identification: Information Extraction with Hierarchically Labeled Data
Mihai Surdeanu, Ramesh Nallapati and Christopher Manning

16:55-17:20
On the Extraction of Decisions and Contributions from Summaries of French Legal IT Contract Cases
Manuel Maarek

17:20-17:45
Towards Annotating and Extracting Textual Legal Case Factors
Adam Wyner and Wim Peters

17:45-18:10
Legal Rules Learning based on a Semantic Model for Legislation
Enrico Francesconi

Information Extraction of Legal Case Features with Lists and Rules

In this post, we show how legal case features can be annotated using lists and rules in GATE. By features, we mean a range of detailed information that may be relevant to searching for cases or extracting information, such as the parties, the other legal professionals involved (judges, lawyers, etc.), location, decision, case citation, legislation, and so on. In a forthcoming related post, we discuss how to use an ontology to annotate cases. We have some background discussion of case-based reasoning in Information Extraction of Legal Case Factors. (See introductory notes on this and related posts.)
Features of cases
Legal cases contain a wealth of detailed information such as:

Case citation.
Names of parties.
Roles of parties, meaning plaintiff or defendant.
Sort of court.
Names of judges.
Names of attorneys.
Roles of attorneys, meaning the side they represent.
Final decision.
Cases cited.
Relation of precedents to current case.
Case structural features such as sections.
Nature of the case, meaning using keywords to classify the case in terms of subject (e.g. criminal assault, intellectual property, ….)

With respect to these features, one would want to make a range of queries (using some appropriate query language).

In what cases has company X been a defendant?
In what cases has attorney Y worked for company X, where X was a defendant?
What are the final decisions for judge Z?
If the case concerns criminal assault, was a weapon used?
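
Queries like these presuppose that the extracted features are stored in some structured form. As a minimal sketch, assuming a hypothetical record layout (the `CaseRecord` fields, the helper functions, and the sample data are all invented for illustration and are not taken from our annotations), the four queries might be expressed in Python as follows:

```python
from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    """Hypothetical container for the features extracted from one case."""
    citation: str
    parties: dict    # party name -> role ("plaintiff" or "defendant")
    attorneys: dict  # attorney name -> name of the party represented
    judge: str
    decision: str
    keywords: set = field(default_factory=set)


def cases_with_defendant(cases, company):
    """In what cases has `company` been a defendant?"""
    return [c.citation for c in cases if c.parties.get(company) == "defendant"]


def cases_attorney_for_defendant(cases, attorney, company):
    """In what cases has `attorney` worked for `company` as a defendant?"""
    return [
        c.citation
        for c in cases
        if c.parties.get(company) == "defendant"
        and c.attorneys.get(attorney) == company
    ]


def decisions_by_judge(cases, judge):
    """What are the final decisions for `judge`?"""
    return [(c.citation, c.decision) for c in cases if c.judge == judge]


def weapon_used(case, weapon_terms):
    """If the case concerns criminal assault, was a weapon used?

    Returns None when the case is not tagged as criminal assault.
    """
    if "criminal assault" not in case.keywords:
        return None
    return bool(case.keywords & set(weapon_terms))


# Invented sample data, for illustration only.
CASES = [
    CaseRecord(
        citation="Case A",
        parties={"Company X": "defendant", "Person P": "plaintiff"},
        attorneys={"Attorney Y": "Company X"},
        judge="Judge Z",
        decision="reversed",
        keywords={"intellectual property"},
    ),
    CaseRecord(
        citation="Case B",
        parties={"Company X": "plaintiff", "Person Q": "defendant"},
        attorneys={"Attorney Y": "Company X"},
        judge="Judge Z",
        decision="denied",
        keywords={"criminal assault", "knife"},
    ),
]

print(cases_with_defendant(CASES, "Company X"))
print(decisions_by_judge(CASES, "Judge Z"))
```

A real system would more likely hold the features in a database or an ontology and use a dedicated query language; plain Python is used here only to make the shape of the queries concrete.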

We initially based our work on the Bransford-Koons Ph.D. thesis (2005), commenting on, adapting, and adding to it. Since the lists and rules are highly specific, we used the cases from California Criminal Courts that were used in that work.
Output
We have the following sample outputs from our lists and rules applied to People v. Coleman, 117 Cal App. 2d 565. In the first figure, we find the address, court district, citation, case name, counsels for each side, and the roles. There are aspects which need to be further cleaned up, but this gives a flavour of the annotations.
Case Features I
In the second figure, we focus on additional information such as structural sections (e.g. Opinion), the name of the judge, and terms having a bearing on criminal assault and weapons.
Case Features II
In the final figure, we identify the decision.
Case Features III
GATE
In the archive, we have the application, lists, JAPE rules, and graphics. The lists.def file in this archive is associated with the various other lists. The JAPE rules may have different names from what is found in the application and discussed below, but, so far as we understand, this should make no difference to the functionality.
Lists
The following gazetteer lists were used; they are contained in a master list labelled DSAGaz. We give samples and comment below.

lists.def. The gazetteer list which contains the lists below. When importing this along with the standard ANNIE list, this list is renamed in the application.
attack_words.lst. Actions that can be construed as attacks such as hit, hitting, throw, thrown, threw,….
intention.lst. Terms for intention such as intend, intends, intending,…, expect, expects,….
judgements.lst. Terms related to judgement such as granted, denied, reversed, overturned, remanded,….
judgeindicator.lst. The indicator J., which is problematic if it is part of an individual’s name.
criminal_assault.lst. Terms related to assault such as assault, violent injury, ability,…. It is unclear just how cohesive this set of terms is.
legal_appellate_districts.lst. A list of appellate districts such as Fifth Appellate District, Fifth Dist.,….
legal_casenames.lst. Terms that can be used to indicate case names such as v., In Re,….
legal_counselnames.lst. Terms for counselor titles such as Attorney General, Deputy Public Defender,….
legal_general.lst. Terms for footnotes or numbering sections such as fn., No.,….
legal_opinion_sections.lst. Terms for sections of legal opinion such as concurring, counsel, dissenting, opinion,….
legal_coa.lst. Terms for causes of action such as aggravated assault, assault, breaking and entering, burglary, robbery,….
legal_code_citations.lst. Code citation information such as Civ. Code, Penal Code,….
us_district_abb_01.lst. Abbreviations for legal districts such as Cal., P., Wis.,….
us_context_abb_01.lst. Abbreviations for participant roles such as App., Rptr,….
legal_citations.lst. Abbreviations for citations and related to districts such as Cal.2d, Cal.App. 3d,….
legal_parties.lst. Terms for legal roles such as amicus curiae, appellant, appellee, counsel, defendant, plaintiff, victim, witness,….
lower_courts.lst. Phrases for other courts such as Municipal Court of, Superior Court of,….
possible_weapons.lst. A list of items that could be weapons such as automobile, bat, belt,….
weapons.lst. A list of items that are weapons such as assault rifle, axe, club, fist, gun,….
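
As a rough illustration of how a definition file ties such lists together, here is a minimal Python sketch, not GATE itself. It assumes the usual GATE convention that each line of lists.def names a list file followed by a colon and a major type. The major types `attack_word` and `weapon` are hypothetical (the ones in the archive may differ), and the list contents are just the samples quoted above, held in memory rather than read from files.

```python
import re

# Hypothetical lists.def content: "<list file>:<major type>" per line.
LISTS_DEF = """\
attack_words.lst:attack_word
weapons.lst:weapon
"""

# Sample entries quoted in the post, standing in for the .lst files.
LIST_FILES = {
    "attack_words.lst": ["hit", "hitting", "throw", "thrown", "threw"],
    "weapons.lst": ["assault rifle", "axe", "club", "fist", "gun"],
}


def build_gazetteer(lists_def, list_files):
    """Map every list entry to the major type declared for its list."""
    gazetteer = {}
    for line in lists_def.splitlines():
        if not line.strip():
            continue
        file_name, major_type = line.split(":")[:2]
        for entry in list_files[file_name]:
            gazetteer[entry] = major_type
    return gazetteer


def lookup(text, gazetteer):
    """Return (start, end, string, major type) for each entry found in text.

    Matching is case-sensitive and on whole words; longer entries are tried
    first so that multi-word entries win over shorter ones.
    """
    phrases = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b")
    return [(m.start(), m.end(), m.group(), gazetteer[m.group()])
            for m in pattern.finditer(text)]
```

Because the matching in this sketch is case-sensitive, `lookup("Gun", ...)` finds nothing, which is one face of the capitalisation problem discussed below.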

Discussion of Lists
We used some of the lists directly from Bransford-Koons 2005, but they are clearly in need of reconstruction and extension. A general problem is that the lists are defined for US case law, and particularly for the California district courts. Thus, we cannot simply apply the lists to other jurisdictions, e.g. the United Kingdom; the lists and rules must be relativised to different contexts. More technically, terms in the lists have alternative orthographic (capital or lower case) or morphological forms, which would be better addressed using a Flexible Gazetteer.
In addition, it is unclear how one could bound the range of relevant terms appropriately and give them interpretations that are relevant to the context. In general, a lexicon or ontology could give us a better list of terms, but we must find some means to construe them as need be in the legal context. For example, we have a range of attack action terms such as hit, hitting, throw, thrown, threw,…; in some contexts, e.g. baseball, these actions need not be construed as attacks. Some means needs to be found to ascribe the appropriate interpretation in context.
A related issue is whether we must list all alternative forms of some terms (also taking spaces into consideration) or whether we can better write JAPE rules. This is relevant for the list of appellate districts, where we find both abbreviations and alternative elements of information, as in Fifth Appellate District, Fifth Appellate District Div 1, and Fifth Appellate District, Division 1. Along these lines, we would prefer a systematic means to relate abbreviations to the terms they abbreviate. In our view, more general solutions are better than specific ones which list information; lists ought to contain arbitrary information, while JAPE rules construct systematic information.
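
To illustrate the preference for systematic rules over enumerated lists, the appellate district variants mentioned above can be covered by a single pattern. The following is a minimal Python sketch of that idea, not a JAPE rule from the archive; the set of ordinals and the function name `normalise_district` are assumptions made for the example.

```python
import re

# Assumed set of ordinals; extend as needed for the jurisdiction.
ORDINALS = r"(?:First|Second|Third|Fourth|Fifth|Sixth)"

# One pattern for "Fifth Appellate District", "Fifth Dist.",
# "Fifth Appellate District Div 1", "Fifth Appellate District, Division 1".
DISTRICT = re.compile(
    rf"(?P<ordinal>{ORDINALS})\s+(?:Appellate\s+)?Dist(?:rict|\.)"
    rf"(?:,?\s+Div(?:ision|\.)?\s+(?P<division>\d+))?"
)


def normalise_district(text):
    """Return (ordinal, division or None) for the first district mention, else None."""
    match = DISTRICT.search(text)
    if match is None:
        return None
    return match.group("ordinal"), match.group("division")
```

With this, the abbreviated and the full forms are reduced to the same pair of values, so the relation between an abbreviation and the term it abbreviates is stated once in the pattern rather than entry by entry in a list.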
JAPE Rules
Given the lists, we have JAPE rules to annotate the relevant portions of text.

AppellantCounsel: annotates the appellant counsel.
RespondentCounsel: annotates the respondent counsel.
DSACounsellor: annotates counsels.
SectionsTerm: annotates sections relative to the list of section terms.
CaseRoleGeneral
DSACaseName2: annotates the case name.
DSACaseName: annotates the case name.
DSACaseCit: annotates the case citation.
CriminalAssault: annotates terms for criminal assault.
CauseOfAction: annotates causes of action.
AttackTerm: annotates attack terms.
AppellateDistrict: annotates districts of courts.
DecisionStatement: annotates a sentence as the decision statement.
JudgementTerm: annotates terms related to judgement.
JudgeName: annotates the names of judges.
JudgeInd: annotates the judge name indicator.
IntentTerm: annotates terms of intent.
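
To give a sense of what a rule such as JudgeName does with an indicator such as J., here is a rough Python approximation. It is not the JAPE rule from the archive: the pattern (a capitalised name, a comma, then J. or P. J.), the function name, and the sample strings are all hypothetical, chosen only to show the idea of using an indicator term as context for annotating a neighbouring name.

```python
import re

# Hypothetical pattern: a capitalised name followed by ", J." or ", P. J."
JUDGE_PATTERN = re.compile(r"\b(?P<name>[A-Z][A-Za-z'-]+),\s+(?:P\.\s*)?J\.")


def annotate_judges(text):
    """Return JudgeName annotations (with character offsets) found in text."""
    return [
        {
            "type": "JudgeName",
            "start": m.start("name"),
            "end": m.end("name"),
            "string": m.group("name"),
        }
        for m in JUDGE_PATTERN.finditer(text)
    ]
```

On an invented line such as "GRIFFIN, J. We concur: Barnard, P. J." this picks out GRIFFIN and Barnard. It also wrongly picks out Smith in "Smith, J. Robert", where J. is an initial, which is exactly the weakness of the indicator noted in the list descriptions.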

Discussion
Some of these rules annotate sentences, while others annotate entities with respect to some property. Some of the rules don’t work quite as well as we would wish and could stand further refinement; the rule for the roles of counsels is one example, where the solution we have is rather ad hoc. Nonetheless, as a first pass, the lists and rules give some indication of what is possible.
Order of application

Document Reset PR
RegexSentenceSplitter
ANNIE English Tokeniser
ANNIE POS Tagger
MorphologicalAnalyzer
DSAGaz
AnnieGaz
Flexible Gazetteer
NPChunker
ANNIE NE Transducer
IntentTerm
JudgeInd
JudgeName
JudgementTerm
DecisionStatement
Weapons
AppellateDistrict
AttackTerm
CauseOfAction
CriminalAssault
DSACaseCit
DSACaseName
DSACaseName2
DSACaseNameAZW
CaseRoleGeneral
SectionsTerm
DSACounsellor
RespondentCounsel
AppellantCounsel
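
The order matters because later resources consume the annotations produced by earlier ones: the rules rely on the tokens and gazetteer lookups already being in place. The following minimal Python sketch mimics this with three stand-in steps. The function names, the dictionary used as a document, and the small weapon term set are hypothetical and far simpler than the GATE processing resources listed above.

```python
import re

# Small stand-in for a gazetteer list (sample terms only).
WEAPON_TERMS = {"gun", "axe", "club"}


def tokenise(doc):
    """Stand-in for the tokeniser: adds word tokens to the document."""
    doc["tokens"] = re.findall(r"\w+", doc["text"])


def gazetteer_lookup(doc):
    """Stand-in for the gazetteer: needs tokens, adds lookups."""
    doc["lookups"] = [t for t in doc["tokens"] if t.lower() in WEAPON_TERMS]


def weapons_rule(doc):
    """Stand-in for a rule such as Weapons: needs lookups, adds annotations."""
    doc["annotations"] = [{"type": "Weapon", "string": t} for t in doc["lookups"]]


def run_pipeline(text, steps):
    """Apply the steps in order to a fresh document and return it."""
    doc = {"text": text}
    for step in steps:
        step(doc)
    return doc
```

Running `run_pipeline("He raised the axe.", [tokenise, gazetteer_lookup, weapons_rule])` yields one Weapon annotation, whereas putting `weapons_rule` first fails with a `KeyError`, since the lookups it depends on do not exist yet.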

Discussion
Despite the limitations, this gives some useful, preliminary results which can easily be built upon. Moreover, we know of no other public, open system of annotating case elements (or factors).
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0