On ICAIL 2011 Discussion on Legal Corpus Development and Text Analytics

In this note, I point to various parts of a discussion on developing and analysing legal textual data raised at ICAIL 2011. Please feel free to add comments to this document (or send them to me in person, by email, or via your blog with a link to this post), and I can then add them to the post (I’m very happy to attribute contributions). The intention is to stimulate discussion on these matters to help the community of researchers move ahead on common interests.
Corpus Development
Unlike the situation several years ago, we now have accessible sources of large corpora of legal textual information. The Legal Information Institutes (LIIs), such as the World Legal Information Institute, provide free, independent, non-profit access to worldwide law. For example, one can go to the US site and download cases such as United States v Grant [1961] USCA9 19; 286 F.2d 157 (19 January 1961); one can request zipped files or screen-scrape cases. The LIIs have introduced standardised references and formats for cases, and they support boolean and regular-expression searches.
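The standardised references make cases amenable to simple pattern-based processing. As a rough sketch, one can pull neutral citations out of scraped case text with a regular expression; the first citation below is from the example above, while the second citation and the pattern itself are illustrative inventions, not an LII specification:

```python
import re

# Sample case text, of the kind one might screen-scrape from an LII site.
# The second citation (Smith v Jones) is invented for illustration.
case_text = (
    "United States v Grant [1961] USCA9 19; 286 F.2d 157 (19 January 1961), "
    "citing Smith v Jones [1958] USCA9 102; 250 F.2d 12 (1 March 1958)."
)

# Assumed pattern for LII-style neutral citations: [year] COURT number.
neutral_citation = re.compile(r"\[(\d{4})\]\s+([A-Z]+\d*)\s+(\d+)")

citations = neutral_citation.findall(case_text)
for year, court, number in citations:
    print(year, court, number)
```

With standardised formats, a handful of such patterns suffices to index a downloaded corpus by citation.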
From the contacts I have had (e.g. in the US and UK), the LIIs would be very happy to collaborate with academic researchers on the analysis of their data, in keeping with their primary mission. In particular, developing tools that can be integrated with and deployed on their platforms might be a way forward, thereby addressing significant platform and dissemination issues.
Another source of corpora is public.resource.org, which distributes a range of corpora covering legislation, codes, and cases.
Analysis and Annotation
There is a range of issues about information retrieval and extraction. Others can speak to IR, statistical, and machine-learning approaches; what I know better is annotation, whether fully automatic, semi-automatic, or manual. Here we face issues about what to annotate and how. Some low-level information is unproblematic (e.g. entities of various sorts, sections, and sentiment); higher-level information (e.g. factors) might be more complex. I have some suggestions for annotating low-level information; a good starting point for factors are the CATO factors, though there is a general issue about how to extend factor identification to other domains (the CATO factors are specific to intellectual property).
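As a minimal sketch of what low-level annotation might look like in code, here is a rule-based standoff annotator in Python; the annotation types, patterns, and example sentence are hypothetical illustrations, not a proposed standard:

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    label: str   # annotation type, e.g. "CaseName" or "Section"
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends
    text: str    # the annotated span itself

# Hypothetical patterns for two low-level annotation types.
PATTERNS = {
    "CaseName": re.compile(r"\b[A-Z][a-z]+ v\.? [A-Z][a-z]+\b"),
    "Section": re.compile(r"\bsection \d+\b", re.IGNORECASE),
}

def annotate(text: str) -> list[Annotation]:
    """Return standoff annotations (label plus character offsets) for text."""
    anns = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            anns.append(Annotation(label, m.start(), m.end(), m.group()))
    return sorted(anns, key=lambda a: a.start)

sentence = "In Pierson v Post, the court applied section 12 of the statute."
for ann in annotate(sentence):
    print(ann.label, ann.text)
```

Standoff annotations keyed to character offsets, rather than inline markup, are roughly the representation that framework tools such as GATE pass between processing modules.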
One general problem with analysis is that different researchers might use different tools in their work and report only the results. This means the results are not interchangeable, which is particularly problematic for annotation work. If a common ‘framework’ tool is used and some consensus is developed about (at least) low-level annotation types, then work can proceed more collaboratively, transparently, and reproducibly. There are forceful arguments for researchers, public service bodies, and information providers to promote such an open development methodology, among them justification and traceability (see Wyner and Peters 2010 and David Lewis’s ICAIL 2011 keynote address on related points). The General Architecture for Text Engineering (GATE) is an open framework for text-processing modules.
There are ‘open’ systems for text annotation, such as Open Calais and the Open Up platform’s data enrichment service from The Stationery Office. However, there are intellectual property issues that need to be considered.
Another general issue is how to carry out manual annotation, for example to build the gold standards required for machine-learning systems. There has been significant progress here, for example with TeamWare, which provides curated, web-based annotation tools along with annotation analysis (e.g. inter-annotator agreement). For a short tutorial (for an experiment) on using TeamWare to annotate some legal case factors, see Web-based Annotation Support for the Law. Wim Peters and I have proposed that law school faculty use this tool to support exercises for first- and second-year students, since these exercises often require identifying and extracting information from cases. Wim and I think integrating annotation exercises into legal e-learning could both help develop large annotated data sets and serve an important educational purpose. See our paper about some of these points and proposals.
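Annotation analysis of the kind TeamWare supports includes inter-annotator agreement; a standard measure is Cohen's kappa, sketched below in Python over invented factor labels from two hypothetical annotators:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa for two annotators' labels over the same items."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    # Observed agreement: fraction of items where the two labels coincide.
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Expected chance agreement, from each annotator's label distribution.
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented factor labels assigned to ten case passages by two annotators.
a1 = ["F1", "F2", "F1", "F3", "F1", "F2", "F2", "F1", "F3", "F1"]
a2 = ["F1", "F2", "F1", "F1", "F1", "F2", "F3", "F1", "F3", "F1"]
print(round(cohens_kappa(a1, a2), 3))  # → 0.667
```

A kappa near 1 indicates strong agreement beyond chance; values near 0 suggest the annotation guidelines (e.g. for factor identification) need tightening before a gold standard is built.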
Research Questions
Large corpora can be formed and tools can be applied to them, but for fund raising the community needs to develop a range of motivating research questions and use cases. Aside from questions pursued in the AI and Law community, we might consult further with public bodies (the National Center for State Courts and similar), legal information service providers (LexisNexis, Thomson Reuters, Practical Law Company), law societies, political scientists, etc. The kinds of answers we look for partially guide how we structure not only the corpora, but more so the annotations.
Funding Opportunities
One opportunity is Digging into Data and its Request for Proposals, though the due date is June 16 (I had been working on a proposal, but needed better research questions to hold local interest). Though the deadline is too soon to submit a proposal, the call does demonstrate widespread interest among funding bodies in the development and analysis of large corpora in the humanities and social sciences. The other obvious funding sources are national (US, UK, French, etc) and international (EU and Digging into Data).
By Adam Wyner

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Workshop Applying Human Language Technology to the Law

A workshop at
ICAIL 2011: The Thirteenth International Conference on Artificial Intelligence and Law

Applying Human Language Technology to the Law (AHLTL 2011)

June 10, 2011
University of Pittsburgh School of Law
Overview:
Over the last decade there have been dramatic improvements in the effectiveness and accuracy of Human Language Technology (HLT), accompanied by a significant expansion of the HLT community itself. Over the same period, there have been widespread developments in web-based distribution and processing of legal textual information, e.g. cases, legislation, citizen information sources, etc. More recently, a growing body of research and practice has addressed a range of topics common to both the HLT and Artificial Intelligence and Law communities, including automated legal reasoning and argumentation, semantic information retrieval, cross and multi-lingual information retrieval, document classification, logical representations of legal language, dialogue systems, legal drafting, legal knowledge discovery and extraction, linguistically based legal ontologies, among others. Central to these shared topics is use of HLT techniques and tools for automating knowledge extraction from legal texts and for processing legal language.
The workshop has several objectives. The first objective is to broaden the research base by introducing HLT researchers to the materials and problems of processing legal language. The second objective is to introduce AI and Law researchers to up-to-date theories, techniques, and tools from HLT, which can be applied to legal language. The third objective is to deepen the existing research streams. Altogether, the interactions among the researchers are expected to advance research and applications and foster interdisciplinary collaboration within the legal domain.
Context:
Over the last two years, there have been several workshops and tutorials on or relating to processing legal texts and legal language, demonstrating a significant surge of interest. There have been two workshops on Semantic processing of legal texts (SPLeT) held in conjunction with LREC (2008 in Marrakech, Morocco; and 2010 in Malta). At ICAIL 2009, there were two workshops, LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts and NALEA ’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation. LOAIT ’09 focussed on Legal Knowledge Representation with particular emphasis on the issue of ontology acquisition from legal texts, while NALEA ’09 tackled issues related to legal argumentation. In 2009, the National Science Foundation sponsored a workshop Automated Content Analysis and the Law, which drew participants from computational linguistics and political science. Finally, at the Second Workshop on Controlled Natural Language (CNL 2010), there were several presentations related to legal language.
Intended Audience:
The intended audience would include both current members of the AI & law community who are interested in automated analysis of legal texts and corpora and, in addition, HLT researchers for whom analysis of legal texts would provide an opportunity for development and evaluation of HLT techniques. It is anticipated that participants would come from industry (e.g. The MITRE Corporation, Thomson Reuters, Endeca, LexisNexis, Oracle), the judiciary in the US and Europe, national organisations (e.g. the US National Institute of Standards and Technology, the US National Science Foundation, European Science Foundation, the UK Office of Public Sector Information), government security agencies, legal professionals, and academic HLT researchers.
Areas of Interest:
The workshop will focus on extraction of information from legal text, representations of legal language (ontologies and semantic translations), and dialogic aspects. While information extraction and retrieval are crucial areas, the workshop emphasises syntactic, semantic, and dialogic aspects of legal information processing.

    Building legal resources: terminologies, ontologies, corpora.
    Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.
    Information retrieval and extraction from legal texts.
    Semantic annotation of legal texts.
    Multilingual aspects of legal text semantic processing.
    Legal thesauri mapping.
    Automatic Classification of legal documents.
    Automated parsing and translation of natural language arguments into a logical formalism.
    Linguistically-oriented XML mark up of legal arguments.
    Computational theories of argumentation that are suitable to natural language.
    Controlled language systems for law.
    Name matching and alias detection.
    Dialogue protocols and systems for legal discussion.

Workshop Schedule

      9:00 Opening remarks
      9:15 Jack Conrad (invited speaker). The Role of HLT in High-end Search and the Persistent Need for Advanced HLT Technologies
      10:00 Tommaso Fornaciari and Massimo Poesio. Lexical vs. Surface Features in Deceptive Language Analysis
      10:30 Nuria Casellas, Joan-Josep Vallbé and Thomas Bruce. Legal Thesauri Reuse. An Experiment with the U.S. Code of Federal Regulations
      11:00 Break
      11:15 Meritxell Fernández-Barrera and Pompeu Casanovas. Towards the intelligent processing of non-expert generated content: mapping web 2.0 data with ontologies in the domain of consumer mediation
      11:45 Emile De Maat and Radboud Winkels. Formal Models of Sentences in Dutch Law
      12:15 Guido Boella, Llio Humphreys, Leon Van Der Torre and Piercarlo Rossi. Eunomos, a legal document management system based on legislative XML and ontologies (Position paper)
      12:45 Anna Ronkainen. From Spelling Checkers to Robot Judges? Some Implications of Normativity in Language Technology and AI and Law
      13:15 Lunch

Workshop Location
To be announced.
Author Guidelines:

    The workshop solicits full papers and position papers. Authors are welcome to submit tentative, incremental, and exploratory studies which examine HLT issues distinctive to the law and legal applications. Papers not accepted as full papers may be accepted as short research abstracts. Submissions will be evaluated by the program committee. For information on submission details (length, format, notion of position paper, etc) see the ICAIL 2011 conference information:
    ICAIL CFP
    Papers should be submitted electronically in PDF to the EasyChair site by the deadline (see important dates below):
    AHLTL 2011, an EasyChair site

Publication:

    Authors of selected papers will be invited to revise and submit them to a special edition of the AI and Law journal, edited by Adam Wyner and Karl Branting.
    The papers from the workshop are available from here.

Webpage:

    Applying Human Language Technology to the Law

Important Dates:

    Paper submission deadline: extended to 10 April 2011, 00:00 EST
    Acceptance notification sent: 15 April 2011
    Final version deadline: 23 May 2011
    Workshop date: 10 June 2011

Contact Information:

    Primary contact: Adam Wyner, adam@wyner.info
    Secondary contact: Karl Branting, lbranting@mitre.org

Program Committee Co-Chairs:

    Adam Wyner (University of Liverpool, UK)
    Karl Branting (The MITRE Corporation, USA)

Program Committee:

    Kevin Ashley (University of Pittsburgh, USA)
    Johan Bos (University of Rome, Italy)
    Sherri Condon (The MITRE Corporation, USA)
    Jack Conrad (Thomson Reuters, USA)
    Enrico Francesconi (ITTIG-CNR, Florence, Italy)
    Ben Hachey (Macquarie University, Australia)
    Alessandro Lenci (Università di Pisa, Italy)
    Leonardo Lesmo (Università di Torino, Italy)
    Emile de Maat (University of Amsterdam, Netherlands)
    Thorne McCarty (Rutgers University, USA)
    Marie-Francine Moens (Catholic University of Leuven, Belgium)
    Simonetta Montemagni (ILC-CNR, Italy)
    Raquel Mochales Palau (Catholic University of Leuven, Belgium)
    Craig Pfeifer (The MITRE Corporation, USA)
    Wim Peters (University of Sheffield, United Kingdom)
    Paulo Quaresma (Universidade de Évora, Portugal)
    Mike Rosner (University of Malta, Malta)
    Tony Russell-Rose (Endeca, United Kingdom)
    Erich Schweighofer (Universität Wien, Austria)
    Rolf Schwitter (Macquarie University, Australia)
    Manfred Stede (University of Potsdam, Germany)
    Mihai Surdeanu (Stanford University, USA)
    Daniela Tiscornia (ITTIG-CNR, Italy)
    Radboud Winkels (University of Amsterdam, Netherlands)
    Jonathan Zeleznikow (Victoria University, Australia)

Proceedings and Program for Workshop on Modelling Legal Cases and Legal Rules

in conjunction with JURIX 2010
December 15, 2010
Department of Computer Science, Ashton Building, Room 310
University of Liverpool, Liverpool, United Kingdom
Workshop Proceedings
Workshop Program
Session I

    14:30-14:35
    Welcome and Introductory remarks
    14:35-15:00
    Steven van Driel (Utrecht University) and Henry Prakken (Utrecht University and University of Groningen)
    Visualising the argumentation structure of an expert witness report with Rationale (extended abstract)
    15:00-15:25
    Thomas F. Gordon (Fraunhofer FOKUS)
    Analyzing open source license compatibility issues with Carneades
    15:25-15:40
    Martyn Lloyd-Kelly, Adam Wyner, and Katie Atkinson (University of Liverpool)
    Emotional argumentation schemes in legal cases (short position paper)
    15:40-16:00
    Short informal remarks

16:00-16:30 Tea
Session II

    16:30-16:55
    Anna Ronkainen (University of Helsinki)
    MOSONG, a fuzzy logic model of trade mark similarity
    16:55-17:20
    Adam Wyner and Trevor Bench-Capon (University of Liverpool)
    Visualising legal case-based reasoning argumentation schemes
    17:20-17:45
    Burkhard Schafer (University of Edinburgh)
    Say “cheese”: natural kinds, deontic logic and European Court of Justice decision C-210/89
    17:45-18:00
    Short informal remarks

For general information, see JURIX 2010
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Call for Papers: JURIX 2010 Workshop on Modelling Legal Cases and Legal Rules

I am organising a workshop at JURIX 2010
Modelling Legal Cases and Legal Rules
As part of the Jurix 2010 conference in Liverpool UK, we will hold a Workshop on Modelling Legal Cases and Legal Rules. This workshop is a follow-on to successful workshops at Jurix 2007 and ICAIL 2009.
Legal cases and legal rules in common law contexts have been modelled in a variety of ways over the course of research in AI and Law to support different styles of reasoning for a variety of problem-solving contexts, such as decision-making, information retrieval, teaching, etc. Particular legal topic areas and cases have received wide coverage in the AI and Law literature including wild animals (e.g. Pierson v. Post, Young v. Hitchens, and Keeble v. Hickeringill), intellectual property (e.g. Mason v. Jack Daniel Distillery), and evidence (e.g. the Rijkbloem case). As well, some legal rules have been widely discussed, such as legal argument schemes (e.g. Expert Testimony) or rules of evidence (see Walton 2002). However, other areas have been less well covered. For example, there appears to be less research on modelling legal cases in civil law contexts; investigation of taxonomies and ontologies of legal rules would support abstraction and formalisation (see Sherwin 2009); additional legal rules could be brought under the scope of investigation, such as those bearing on criminal assault or causes of action.
The aim of this workshop is to provide a forum in which researchers can present their research on modelling legal cases and legal rules.
Papers are solicited that model a particular legal case or a small set of legal rules. Authors are free to choose the case or set of legal rules and analyse them according to the authors’ preferred model of representation; any theoretical discussion should be grounded in or exemplified by the case or rules at hand. Papers should make clear what are the particular distinctive features of their approach and why these features are useful in modelling the chosen case or rules. The workshop is an opportunity for authors to demonstrate the benefits of their approach and for group discussions to identify useful overlapping features as well as aspects to be further explored and developed.
Format of papers and submission guidelines
Full papers should not be more than 10 pages long and should be submitted in PDF format. It is suggested that the conference style files are used for formatting (see IOS Press site). All papers should provide:

  • A summary of the case or legal rules.
  • An overview of the representation technique, or reference to a full description of it.
  • The representation itself.
  • Discussion of any significant features.

Short position papers are also welcome from those interested in the topic but who do not wish to present a fully represented case or elaborate discussion of a set of legal rules; the short position papers can outline ideas, sketch directions of research, summarise or reflect on previously published work that has addressed the topic. A short position paper should be not more than five pages, giving a clear impression of what would be presented.
All submissions should be emailed as a PDF attachment to the workshop organiser, Adam Wyner, at: adam@wyner.info.
Programme Committee (Preliminary)

  • Kevin Ashley, University of Pittsburgh, USA
  • Katie Atkinson, University of Liverpool, UK
  • Floris Bex, University of Dundee, UK
  • Trevor Bench-Capon, University of Liverpool, UK
  • Tom Gordon, Fraunhofer, FOKUS, Germany
  • Robert Richards, Seattle, Washington, USA
  • Giovanni Sartor, European University Institute, Italy
  • Burkhard Schafer, Edinburgh Law School, Scotland
  • Douglas Walton, University of Windsor, Canada

Organisation
Organiser of this workshop is Adam Wyner, University of Liverpool, UK. You can contact the workshop organiser by sending an email to adam@wyner.info
Dates
Paper submission: Friday, November 5, 2010
Accepted Notification: Friday, November 12, 2010
Workshop Registration: Friday, November 19, 2010
December 15th, 2010 Jurix Workshops/Tutorials
December 16th-17th, 2010 Jurix 2010 Main Conference
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Paper accepted at JURIX 2010

My colleague Wim Peters and I have had our paper
Lexical Semantics and Expert Legal Knowledge towards the Identification of Legal Case Factors
accepted for presentation at JURIX 2010. The list of accepted papers is here. The paper will appear in the proceedings, but it is available by clicking on the paper title above.
Abstract
Legal case factors are facts textually represented in reported legal case decisions. Precedent decisions contribute to the decision of a case under consideration. As textually represented facts, factors linguistically encode semantic properties and relationships among entities, which can be leveraged to identify and extract the legal case factors from decisions. We integrate legal and linguistic resources in a text analysis tool with which we annotate textual passages. Using annotations tailored to legal case factors, the legal researcher can rapidly zero in on textual spans which represent specific combinations of factors, participants, and semantic properties bearing on who played what role with respect to a factor. The paper reports progress on the development of the tool.
Shortlink to this page.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Computational Argumentation on the Web with Natural Language

Over the last four years, I have been working on topics related to computational argumentation on the web using natural language. Some of my publications and previous postings reflect these interests. Along with my colleague Tom van Engers, I prepared two research proposals on this topic, which are here presented as technical reports of our work. These reports are also relevant to the current IMPACT project, which addresses many of the same themes.
There is a short paper (five pages) which outlines the key ideas but has little in the way of discussion or background. There is a long paper (28 pages) which goes into the proposal in much more depth.
Comments and discussion on these documents are very welcome.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Legal Case Ontology OWL file and Case Graphic

In conjunction with the paper by Rinke Hoekstra and me (as previously noted on this blog), we are making the ontology and a graphic of Popov v. Hayashi available:
Legal Case Ontology v9
This is the OWL file. It was developed using Protege version 4, a knowledge acquisition and editing tool.
As we have not previously made this a publicly available ontology, consider it a beta release. Comments very welcome.
The graphic is the ontological representation of Popov v. Hayashi; it is a pdf file.
Ontological Graphic for Popov v. Hayashi
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

New Article on Legal Case Ontologies in Knowledge Engineering Review

Rinke Hoekstra and I have a paper which will appear in Knowledge Engineering Review.
A Legal Case OWL Ontology with an Instantiation of Popov v. Hayashi
Adam Wyner and Rinke Hoekstra
To appear in Knowledge Engineering Review
Abstract
The paper provides an OWL ontology for legal cases with an instantiation of the legal case Popov v. Hayashi. The ontology makes explicit the conceptual knowledge of the legal case domain, supports reasoning about the domain, and can be used to annotate the text of cases, which in turn can be used to populate the ontology. A populated ontology is a case base which can be used for information retrieval, information extraction, and case based reasoning. The ontology contains not only elements for indexing the case (e.g. the parties, jurisdiction, and date), but also elements used to reason to a decision, such as argument schemes and the components input to the schemes. We use the Protege ontology editor and knowledge acquisition system, current guidelines for ontology development, and tools for visual and linguistic presentation of the ontology.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Recent Paper Submissions

During my time at the Leibniz Center for Law working on the IMPACT project, my colleagues Tom van Engers and Kiavash Bahreini and I prepared and submitted three papers to conferences and workshops. The drafts of the papers are linked below along with the abstracts. Comments welcome.
A Framework for Enriched, Controlled On-line Discussion Forums for e-Government Policy-making
Adam Wyner and Tom van Engers
Submitted to eGOV 2010
Abstract
The paper motivates and proposes a framework for enriched on-line discussion forums for e-government policy-making, where pro and con statements for positions are structured, recorded, represented, and evaluated. The framework builds on current technologies for multi-threaded discussion lists by integrating modes, natural language processing, ontologies, and formal argumentation frameworks. With modes other than the standard reply “comment”, users specify the semantic relationship between a new statement and the previous statement; the result is an argument graph. Natural language processing with a controlled language constrains the domain of discourse, eliminates ambiguity and unclarity, allows a logical representation of statements, and facilitates information extraction; at the same time, the controlled language is highly expressive and natural. Ontologies represent the knowledge of the domain. Argumentation frameworks evaluate the argument graph and generate sets of consistent statements. The output of the system is a rich and articulated representation of a set of policy statements which supports queries, information extraction, and inference.
From Policy-making Statements to First-order Logic
Adam Wyner, Tom van Engers, and Kiavash Bahreini
Submitted to eGOVIS 2010
Abstract
Within a framework for enriched on-line discussion forums for e-government policy-making, pro and con statements for positions are input, structurally related, then logically represented and evaluated. The framework builds on current technologies for multi-threaded discussion, natural language processing, ontologies, and formal argumentation frameworks. This paper focuses on the natural language processing of statements in the framework. A small sample policy discussion is presented. We adopt and apply a controlled natural language (Attempto Controlled English) to constrain the domain of discourse, eliminate ambiguity and unclarity, allow a logical representation of statements which supports inference and consistency checking, and facilitate information extraction. Each of the policy statements is automatically translated into first-order logic. The result is a logical representation of the policy discussion which we can query, draw inferences from (given ground statements), test for consistency, and extract detailed information from.
Towards Web-based Mass Argumentation in Natural Language
Adam Wyner and Tom van Engers
Submitted to EKAW 2010
Abstract
Within the artificial intelligence community, argumentation has been studied for quite some years now. Despite progress, the field has not yet succeeded in creating support tools that members of the public could use to contribute their views to discussions of public policy. One important reason is that the input statements of participants in policy-making discussions are put forward in natural language, while translating the statements into the formal models used by argumentation scientists is cumbersome. These formal models can be used to automatically reason with, query, or transmit domain knowledge using web-based technologies. Making this knowledge explicit, formal, and expressed in a language which a machine can process is a labour-, time-, and knowledge-intensive task; making such translations requires expertise that most participants in policy-making debates do not have. In this paper we describe an approach with which we aim to contribute to a solution of this knowledge acquisition bottleneck. We propose a novel, integrated methodology and framework which adopts and adapts existing technologies. We use semantic wikis, which support mass, collaborative, distributed, dynamic knowledge acquisition. In particular, ACEWiki incorporates NLP tools, enabling linguistically competent users to enter their knowledge in natural language while yielding a logical form that is suitable for automated processing. In the paper we explain how we can extend ACEWiki and augment it with argumentation tools which elicit knowledge from users, make implicit information explicit, and generate subsets of consistent knowledge bases from inconsistent knowledge bases. To a set of consistent propositions, we can apply automated reasoners, allowing users to draw inferences and make queries. The methodology and framework take a fragmentary, incremental development approach to knowledge acquisition in complex domains.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Semantic Processing of Legal Texts Workshop

In this post you will find information on the Semantic Processing of Legal Texts workshop, held in conjunction with the Language Resources and Evaluation Conference. Below please find a link to the conference, information on the workshop, and the workshop program.
LREC
Language Resources and Evaluation Conference, May 17-23, Malta.
LREC 2010 Workshop on
SEMANTIC PROCESSING OF LEGAL TEXTS (SPLeT-2010)

23 May 2010, Malta
Workshop Description
The legal domain represents a primary candidate for web-based information distribution, exchange and management, as testified by the numerous e-government, e-justice and e-democracy initiatives worldwide. The last few years have seen a growing body of research and practice in the field of Artificial Intelligence and Law which addresses a range of topics: automated legal reasoning and argumentation, semantic and cross-language legal information retrieval, document classification, legal drafting, legal knowledge discovery and extraction, as well as the construction of legal ontologies and their application to the law domain. In this context, it is of paramount importance to use Natural Language Processing techniques and tools that automate and facilitate the process of knowledge extraction from legal texts.
Over the last two years, a number of dedicated workshops and tutorials specifically focusing on different aspects of semantic processing of legal texts have demonstrated the current interest in research on Artificial Intelligence and Law in combination with Language Resources (LR) and Human Language Technologies (HLT). The LREC 2008 Workshop on “Semantic processing of legal texts” was held in Marrakech, Morocco, on 27 May 2008. The JURIX 2008 Workshop on “the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation (NaLEA)” focused on recent advances in natural language engineering and legal argumentation. The ICAIL 2009 workshops were “LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts” and “NALEA ’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation”, the former focusing on Legal Knowledge Representation with particular emphasis on the issue of ontology acquisition from legal texts, the latter tackling issues related to legal argumentation and linguistic technologies.
To continue this momentum, a 3rd Workshop on “Semantic Processing of Legal Texts” is being organised at the LREC conference to bring to the attention of the broader LR/HLT community the specific technical challenges posed by the semantic processing of legal texts and also share with the community the motivations and objectives which make it of interest to researchers in legal informatics. The outcomes of these interactions are expected to advance research and applications and foster interdisciplinary collaboration within the legal domain.
The main goals of the workshop are to provide an overview of the state-of-the-art in legal knowledge extraction and management, to explore new research and development directions and emerging trends, and to exchange information regarding legal LRs and HLTs and their applications.
Areas of Interest
The workshop will focus on the automatic extraction of information from legal texts and the structural organisation of the extracted knowledge. Particular emphasis will be given to the crucial role of language resources and human language technologies. Papers are invited on, but not limited to, the following topics:

  • Building legal resources: terminologies, ontologies, corpora
  • Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.
  • Information retrieval and extraction from legal texts
  • Semantic annotation of legal texts
  • Legal text processing
  • Multilingual aspects of legal text semantic processing
  • Legal thesauri mapping
  • Automatic classification of legal documents
  • Logical analysis of legal language
  • Automated parsing and translation of natural language arguments into a logical formalism
  • Linguistically-oriented XML markup of legal arguments
  • Dialogue protocols for argumentation
  • Legal argument ontology
  • Computational theories of argumentation that are suitable to natural language
  • Controlled language systems for law
Workshop Chairs

  • Enrico Francesconi (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
  • Simonetta Montemagni (Istituto di Linguistica Computazionale of CNR, Pisa, Italy)
  • Wim Peters (Natural Language Processing Research Group, University of Sheffield, UK)
  • Adam Wyner (Department of Computer Science, University College London, UK)
Program Committee

  • Johan Bos (University of Rome, Italy)
  • Danièle Bourcier (Humboldt Universität, Berlin, Germany)
  • Thomas R. Bruce (Cornell Law School, Ithaca, NY, USA)
  • Pompeu Casanovas (Institut de Dret i Tecnologia, UAB, Barcelona, Spain)
  • Alessandro Lenci (Dipartimento di Linguistica, Università di Pisa, Pisa, Italy)
  • Leonardo Lesmo (Dipartimento di Informatica, Università di Torino, Torino, Italy)
  • Raquel Mochales Palau (Catholic University of Leuven, Belgium)
  • Paulo Quaresma (Universidade de Évora, Portugal)
  • Erich Schweighofer (Universität Wien, Rechtswissenschaftliche Fakultät, Wien, Austria)
  • Manfred Stede (University of Potsdam, Germany)
  • Daniela Tiscornia (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
  • Tom van Engers (Leibniz Center for Law, University of Amsterdam, Netherlands)
  • Stephan Walter (Euroscript, Luxembourg S.a.r.l.)
  • Radboud Winkels (Leibniz Center for Law, University of Amsterdam, Netherlands)
Program

  • 14:30-14:45 Welcome and introduction
  • 14:45-15:10
    A Description Language for Content Zones of German Court Decisions
    Florian Kuhn
  • 15:10-15:35
    Controlling the language of statutes and regulations for semantic processing
    Stefan Hoefler and Alexandra Bünzli
  • 15:35-16:00
    Named entity recognition in the legal domain for ontology population
    Mírian Bruckschen, Caio Northfleet, Douglas da Silva, Paulo Bridi, Roger Granada, Renata Vieira, Prasad Rao and Tomas Sander
  • 16:00-16:30
    Coffee break

  • 16:30-16:55
    Legal Claim Identification: Information Extraction with Hierarchically Labeled Data
    Mihai Surdeanu, Ramesh Nallapati and Christopher Manning
  • 16:55-17:20
    On the Extraction of Decisions and Contributions from Summaries of French Legal IT Contract Cases
    Manuel Maarek
  • 17:20-17:45
    Towards Annotating and Extracting Textual Legal Case Factors
    Adam Wyner and Wim Peters
  • 17:45-18:10
    Legal Rules Learning based on a Semantic Model for Legislation
    Enrico Francesconi