Legal Informatics Start-up from Stanford University

The Stanford Daily, an online newspaper covering news from Stanford University, reports the creation of Lex Machina, a spin-off start-up resulting from a collaboration between the Law School and the Department of Computer Science at Stanford. The company's focus is making intellectual property litigation more transparent; it covers patent infringement, copyright, trademark, antitrust, and certain trade-secret lawsuits. Both commercial and non-commercial services are offered.
This is an interesting development, particularly in terms of the collaboration between a law school and a department of computer science. I hope it is the first of many, and I look forward to learning more about the company and its system.

CFP: Workshop on Semantic Processing of Legal Texts

LREC 2010 Workshop on
SEMANTIC PROCESSING OF LEGAL TEXTS (SPLeT-2010)
CALL FOR PAPERS

23 May 2010, Malta
Workshop description
The legal domain represents a primary candidate for web-based information distribution, exchange and management, as testified by the numerous e-government, e-justice and e-democracy initiatives worldwide. The last few years have seen a growing body of research and practice in the field of Artificial Intelligence and Law which addresses a range of topics: automated legal reasoning and argumentation, semantic and cross-language legal information retrieval, document classification, legal drafting, legal knowledge discovery and extraction, as well as the construction of legal ontologies and their application to the law domain. In this context, it is of paramount importance to use Natural Language Processing techniques and tools that automate and facilitate the process of knowledge extraction from legal texts.
Over the last two years, a number of dedicated workshops and tutorials specifically focussing on different aspects of semantic processing of legal texts have demonstrated the current interest in research on Artificial Intelligence and Law in combination with Language Resources (LR) and Human Language Technologies (HLT). The LREC 2008 Workshop on “Semantic Processing of Legal Texts” was held in Marrakech, Morocco, on 27 May 2008. The JURIX 2008 Workshop on “the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation (NaLEA)” focussed on recent advances in natural language engineering and legal argumentation. The ICAIL 2009 workshops “LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts” and “NALEA’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation” followed, the former focussing on legal knowledge representation with particular emphasis on ontology acquisition from legal texts, the latter tackling issues related to legal argumentation and linguistic technologies.
To continue this momentum, a 3rd Workshop on “Semantic Processing of Legal Texts” is being organised at the Language Resources and Evaluation Conference to bring to the attention of the broader LR/HLT community the specific technical challenges posed by the semantic processing of legal texts, and to share with the community the motivations and objectives which make the domain of interest to researchers in legal informatics. The outcomes of these interactions are expected to advance research and applications and to foster interdisciplinary collaboration within the legal domain.
The main goals of the workshop are to provide an overview of the state-of-the-art in legal knowledge extraction and management, to explore new research and development directions and emerging trends, and to exchange information regarding legal LRs and HLTs and their applications.
Areas of Interest
The workshop will focus on the topics of the automatic extraction of information from legal texts and the structural organisation of the extracted knowledge. Particular emphasis will be given to the crucial role of language resources and human language technologies.
Papers are invited on, but not limited to, the following topics:

  • Building legal resources: terminologies, ontologies, corpora
  • Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.
  • Information retrieval and extraction from legal texts
  • Semantic annotation of legal texts
  • Legal text processing
  • Multilingual aspects of legal text semantic processing
  • Legal thesauri mapping
  • Automatic classification of legal documents
  • Logical analysis of legal language
  • Automated parsing and translation of natural language arguments into a logical formalism
  • Linguistically-oriented XML mark-up of legal arguments
  • Dialogue protocols for argumentation
  • Legal argument ontology
  • Computational theories of argumentation that are suitable to natural language
  • Controlled language systems for law

Submissions
Submissions are solicited from researchers working on all aspects of semantic processing of legal texts. Authors are invited to submit papers describing original completed work, work in progress, interesting problems, case studies or research trends related to one or more of the topics of interest listed above. The final version of the accepted papers will be published in the Workshop Proceedings.
Short or full papers can be submitted. Short papers are expected to present new ideas or new visions that may influence the direction of future research, yet they may be less mature than full papers. While an exhaustive evaluation of the proposed ideas is not necessary, insight into and in-depth understanding of the issues are expected. Full papers should be more fully developed and evaluated. Short papers will be reviewed in the same way as full papers by the Program Committee and will be published in the Workshop Proceedings.
Full paper submissions should not exceed 10 pages, short papers 6 pages; both should be typeset using a font size of 11 points. Style files will be made available by LREC for the camera-ready versions of accepted papers. Papers should be submitted electronically, no later than February 10, 2010. The only accepted format for submitted papers is Adobe PDF. Submission will be electronic using START paper submission software available at
SPLeT 2010 Workshop
Note that when submitting a paper through the START page, authors will be kindly asked to provide relevant information about the resources that have been used for the work described in their paper or that are the outcome of their research. In this way, authors will contribute to the LREC2010 Map, our new feature for LREC 2010. For further information on this initiative, please refer to
LREC2010 Map of Language Resources
Important Dates
Paper submission deadline: 10 February 2010
Acceptance notification sent: 5 March 2010
Final version deadline: 21 March 2010
Workshop date: 23 May 2010
Workshop Chairs

  • Enrico Francesconi (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
  • Simonetta Montemagni (Istituto di Linguistica Computazionale of CNR, Pisa, Italy)
  • Wim Peters (Natural Language Processing Research Group, University of Sheffield, UK)
  • Adam Wyner (Department of Computer Science, University College London, UK)

Address any queries regarding the workshop to: lrec10_legalWS@ilc.cnr.it
Program Committee

  • Johan Bos (University of Rome, Italy)
  • Danièle Bourcier (Humboldt Universität, Berlin, Germany)
  • Thomas R. Bruce (Cornell Law School, Ithaca, NY, USA)
  • Pompeu Casanovas (Institut de Dret i Tecnologia, UAB, Barcelona, Spain)
  • Alessandro Lenci (Dipartimento di Linguistica, Università di Pisa, Pisa, Italy)
  • Leonardo Lesmo (Dipartimento di Informatica, Università di Torino, Torino, Italy)
  • Raquel Mochales Palau (Catholic University of Leuven, Belgium)
  • Paulo Quaresma (Universidade de Évora, Portugal)
  • Erich Schweighofer (Universität Wien, Rechtswissenschaftliche Fakultät, Wien, Austria)
  • Manfred Stede (University of Potsdam, Germany)
  • Daniela Tiscornia (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
  • Tom van Engers (Leibniz Center for Law, University of Amsterdam, Netherlands)
  • Radboud Winkels (Leibniz Center for Law, University of Amsterdam, Netherlands)

Open Source Information Extraction: Data, Lists, Rules, and Development Environment

Open source software development and standards are widely discussed and practised, and they have led to a range of useful applications and services. GATE is one such example.
However, one quickly learns that open source can easily mean open only to a certain extent: GATE is open source, but the applications and additional functionalities developed on top of GATE often are not. On the one hand, this makes perfect sense, as the applications and functionalities represent added value, are labour intensive, and so on. On the other hand, the scientific community cannot verify, validate, or build on prior work unless the applications and functionalities are available. This can also hinder commercial development, since closed development impedes progress, dissemination, and a common framework from which everyone benefits. It also fails to recognise the fundamentally experimental aspect of information extraction. By contrast, the rapid growth of the natural sciences (biology, physics, chemistry) and the theoretical sciences (mathematics) could only have occurred in an open, transparent development environment.
I advocate open source information extraction, where an information extraction result can only be reported if it can be independently verified and built on by members of the scientific community. This means that the following must be made available concurrently with the report of the result:

  • Data and corpora
  • Lists (e.g. gazetteers)
  • Rules (e.g. JAPE rules)
  • Any additional processing components (e.g. components mapping extracted information to schemas, or XSLT transforms)
  • Development environment (e.g. GATE)

In other words, the results must be independently reproducible in full. The slogan is:

No publication without replicability.

This would:

  • Contribute to the research community and build on past developments.
  • Support teaching and learning.
  • Encourage interchange. The Semantic Web chokes on different formats.
  • Return academic research to the common (i.e. largely taxpayer-funded) good rather than leaving it owned by the researcher or university. If someone needs to keep their work private, they should work at a company.
  • Lead to distributed, collaborative research and results, reducing redundancy and increasing the scale and complexity of systems.

The knowledge bottleneck, particularly in relation to language, has not been and likely will not be solved by any one individual or research team. Open source information extraction will, I believe, make greater progress toward addressing it.
Obviously, money must be made somewhere. One source is public funding, including contributions from private organisations which see value in building public infrastructure. Another is to do what other open source software and public information projects do: make money “around” the free material by adding non-core goods, services, or advertising.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Research on Argumentation at the Leibniz Center for Law in Amsterdam

I have a three-month research position at the Leibniz Center for Law, University of Amsterdam, starting 1 February, working with Tom van Engers. This is part of the IMPACT project:

IMPACT is an international project, partially funded by the European Commission under the 7th framework programme. It will conduct original research to develop and integrate formal, computational models of policy and arguments about policy, to facilitate deliberations about policy at a conceptual, language-independent level. To support the analysis of policy proposals in an inclusive way which respects the interests of all stakeholders, research on tools for reconstructing arguments from data resources distributed throughout the Internet will be conducted. The key problem is translation from these sources in natural language to formal argumentation structures, which will be input for automatic reasoning.

My role will be to set up a Ph.D. research project concerning the key problem. This is based on an unsuccessful larger research proposal that I made with Tom. I’ll be organising the database, the literature, some of the software, and outlining the approach the student would take. I’ll make notes on the progress as it happens.
I’m looking forward to living in Amsterdam for a while and working with Tom and my other colleagues at the center: Joost Breuker, Rinke Hoekstra, and Emile de Maat. The Netherlands also has a very lively Department of Argumentation Theory. As an added bonus, my colleagues from Linguistics, Susan Rothstein and Fred Landman, are in Amsterdam on sabbatical. It will be a very interesting and fun period.