text mining – Page 2 – Logic Language Law Computing

Presentation at LEX Summer School 2012

I was a lecturer at the LEX Summer School 2012 in Ravenna, Italy on September 14, 2012.
San Vitale Mosaic, Ravenna, Italy

The school aims at providing knowledge of the most significant ICT standards emerging for legislation, judiciary, parliamentary and administrative documents. The course provides understanding of their impact in the different phases of the legislative and administrative process, awareness of the tools based on legal XML standards and of their constellations, and the ability to participate in the drafting and use of standard-compliant documents throughout the law-making process. In particular we would like to create consciousness in the stakeholders in the legal domain about the benefits and the possibilities provided by the correct usage of Semantic Web technologies such as XML standards, ontologies, natural language processing techniques applied to legal texts, legal knowledge modelling and reasoning tools.

The zipped file contains the slides and some exercise material.
The first lecture (Part 1) introduces the general topic, some samples of results, and a discussion about crowdsourcing annotations in legal cases. The second lecture (Part 2) discusses the parsing and semantic representation of a fragment of the British Nationality Act. The class materials are used for an in-class exercise about annotation.
Port of Classe mosaic
Shortlink to this page.
By Adam Wyner

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Papers at CMNA 2012 and AT 2012

I have recent papers at two conferences. One is at the 12th Workshop on Computational Models of Natural Argument (CMNA 2012), Montpellier, France. The second is at the 1st International Conference on Agreement Technologies (AT 2012), Dubrovnik, Croatia.
Questions, arguments, and natural language semantics
Adam Wyner
Abstract
Computational models of argumentation can be understood to bridge between human and automated reasoning. Argumentation schemes represent stereotypical, defeasible reasoning patterns. Critical questions are associated with argumentation schemes and are said to attack arguments. The paper highlights several issues with the current understanding of critical questions in argumentation. It provides a formal semantics for questions, an approach to instantiated argumentation schemes, and shows how the semantics of questions clarifies the issues. In this approach, questions do not attack schemes, though answers to questions might.
Bibtex
@INPROCEEDINGS{WynerCMNA2012,
author = {Adam Wyner},
title = {Questions, Arguments, and Natural Language Semantics},
booktitle = {Proceedings of the 12th Workshop on Computational Models of Natural Argument ({CMNA} 2012)},
year = {2012},
address = {Montpellier, France},
note = {To appear}}
Arguing from a Point of View
Adam Wyner and Jodi Schneider
Abstract
Evaluative statements, where some entity has a qualitative attribute, appear widespread in blogs, political discussions, and consumer websites. Such expressions can occur in argumentative settings, where they are the conclusion of an argument. Whether the argument holds depends on the premises that express a user’s point of view. Where different users disagree, arguments may arise. There are several ways to represent users, e.g. by values and other parameters. The paper proposes models and argumentation schemes for evaluative expressions, where the arguments and attacks between arguments are relative to a user’s model.
Bibtex
@INPROCEEDINGS{WynerSchneider2012AT,
author = {Adam Wyner and Jodi Schneider},
title = {Arguing from a Point of View},
booktitle = {Proceedings of the First International Conference on Agreement Technologies},
year = {2012},
address = {Dubrovnik, Croatia},
note = {To appear}}

Presentation on Argument Mining at the London Text Analytic Meetup

On July 13 at Fizzback HQ in London, I presented a talk at the London Text Analytic Meetup on Argument Mining. The slides are available at the link below.
Comments on Natural Language and Argumentation
Adam Wyner
Abstract
Opinion and sentiment mining of web-based content are widely done to find out the views of users about consumer goods or politics, but the techniques rely on accrual, do not identify justification, and do not provide structure to support reasoning. Argument mining provides an articulated view of web-based content, identifying justifications, counterpoints, and structure for reasoning.
Two other papers were presented at the meetup.
One by Francesca Toni and Lucas Carstens from Imperial College:
Sentiment Analysis is concerned with differentiating opinionated text from factual text and, in the case of opinionated text, determining its polarity. With this paper, we present A-SVM, a system that tackles the discrimination of opinionated text from non-opinionated text with the help of Support Vector Machines (SVM). In a two-step process, SVM classifications are improved via arguments, acquired by means of a user feedback mechanism. The system has been used to investigate the merits of approaching Sentiment Analysis in a multifaceted manner by comparing straightforward Machine Learning techniques with this multimodal system architecture. All evaluations were executed using a purpose-built corpus of annotated text and its classification performance was compared to that of SVM. The classification of a test set of approximately 12,000 words yielded an increase in classification precision of 5.6%.
Another paper by Francesca Toni and Valentinos Evripidou from Imperial College:
We describe a new argumentation method for analysing opinion exchanges between on-line users aiding them to draw informative, structured and meaningful information. Our method combines different factors, such as social support drawn from votes and attacking/supporting relations between opinions interpreted as abstract arguments. We show a prototype web application which puts into use this method to offer an intelligent business directory allowing users to engage in debate and aid them to extract the dominant, emerging public opinion.

Papers at COMMA 2012

At the 4th International Conference on Computational Models of Argumentation in Vienna, Austria, I have a short paper in the main conference and a paper in the demo session.
Semi-automated argumentative analysis of online product reviews
Adam Wyner, Jodi Schneider, Katie Atkinson, and Trevor Bench-Capon
Abstract
Argumentation is key to understanding and evaluating many texts. The arguments in the texts must be identified; using current tools, this requires substantial work from human analysts. With a rule-based tool for semi-automatic text analysis support, we facilitate argument identification. The tool highlights potential argumentative sections of a text according to terms indicative of arguments (e.g. suppose or therefore) and domain terminology (e.g. camera names and properties). The information can be used by an analyst to instantiate argumentation schemes and build arguments for and against a proposal. The resulting argumentation framework can then be passed to argument evaluation tools.
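As a rough illustration of the kind of rule-based highlighting the abstract describes, here is a minimal Python sketch. It is not the tool from the paper: the two argument terms come from the abstract, while the camera property terms, the sentence splitter, and the choice to flag only sentences containing both kinds of term are assumptions made for the example.

```python
import re

# Terms indicative of arguments (the two examples given in the abstract).
ARGUMENT_TERMS = ["suppose", "therefore"]
# Hypothetical domain terminology, standing in for camera names and properties.
DOMAIN_TERMS = ["lens", "battery", "zoom"]

def highlight(sentence):
    """Return the argument and domain terms found in one sentence."""
    found = {"argument": [], "domain": []}
    for label, terms in (("argument", ARGUMENT_TERMS), ("domain", DOMAIN_TERMS)):
        for term in terms:
            # Whole-word, case-insensitive match.
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
                found[label].append(term)
    return found

def candidate_sentences(text):
    """Yield sentences that contain both an argument term and a domain term."""
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        hits = highlight(sentence)
        if hits["argument"] and hits["domain"]:
            yield sentence, hits

# Invented review text, for illustration only.
review = "The zoom is weak. Therefore, I would not buy it for the lens alone. Shipping was fast."
for sentence, hits in candidate_sentences(review):
    print(sentence, hits)
```

An analyst would still read the flagged sentences and decide whether they instantiate an argumentation scheme; the sketch only narrows down where to look.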
Bibtex
@INPROCEEDINGS{WynerEtAlCOMMA2012a,
author = {Adam Wyner and Schneider, Jodi and Katie Atkinson and Trevor Bench-Capon},
title = {Semi-Automated Argumentative Analysis of Online Product Reviews},
booktitle = {Proceedings of the 4th International Conference on Computational
Models of Argument ({COMMA} 2012)},
year = {2012},
note = {To appear},
}
Critiquing justifications for action using a semantic model: Demonstration
Adam Wyner, Katie Atkinson, and Trevor Bench-Capon
Abstract
The paper is two pages with no abstract.
Bibtex
@INPROCEEDINGS{WynerABCDemoCOMMA2012,
author = {Adam Wyner and Atkinson, Katie and Trevor Bench-Capon},
title = {Critiquing Justifications for Action Using a Semantic Model: Demonstration},
booktitle = {Proceedings of the 4th International Conference on Computational Models of Argument ({COMMA} 2012)},
year = {2012},
pages = {1-2},
note = {To appear},
}

Paper at CMN 2012

At the Language Resources and Evaluation Conference (LREC 2012) in Istanbul, Turkey, I participated in the Computational Models of Narrative workshop.
Arguments as Narratives
Adam Wyner
Abstract
Aspects of narrative coherence are proposed as a means to investigate and identify arguments from text. Computational analysis of argumentation largely focuses on representations of arguments that are either abstract or are constructed from a logical (e.g. propositional or first order) knowledge base. Argumentation schemes have been advanced for stereotypical patterns of defeasible reasoning. While we have well-formedness conditions for arguments in a first order language, namely the patterns for inference, the conditions for argumentation schemes are an open question, and the identification of arguments ‘in the wild’ is problematic. We do not understand the ‘source’ of rules from which inference follows; formally, well-formed ‘arguments’ can be expressed even with random sentences; moreover, argument indicators are sparse, so cannot be relied upon to identify arguments. As automated extraction of arguments from text increasingly finds important applications, it is pressing to isolate and integrate indicators of argument. To specify argument well-formedness conditions and identify arguments from unstructured text, we suggest using aspects of narrative coherence.
Slides for Arguments as Narratives
Bibtex
@INPROCEEDINGS{WynerCMN2012,
author = {Adam Wyner},
title = {Arguments as Narratives},
booktitle = {Proceedings of the Third Workshop on Computational Models of Narrative ({CMN} 2012)},
year = {2012},
editor = {Mark Finlayson},
pages = {178-180},
}

Papers at the Workshop on Semantic Processing of Legal Texts (SPLeT 2012)

Two short papers appear in the proceedings of the LREC Workshop on SPLeT 2012 – Semantic Processing of Legal Texts. The papers are available at the links.
Problems and Prospects in the Automatic Semantic Analysis of Legal Texts – A Position Paper
Adam Wyner
Abstract
Legislation and regulations are expressed in natural language. Machine-readable forms of the texts may be represented as linked documents, semantically tagged text, or translation to a logic. The paper considers the latter form, which is key to testing consistency of laws, drawing inferences, and providing explanations relative to input. To translate laws to a machine-readable logic, sentences must be parsed and semantically translated. Manual translation is time and labour intensive, usually involving narrowly scoping the rules. While automated translation systems have made significant progress, problems remain. The paper outlines systems to automatically translate legislative clauses to a semantic representation, highlighting key problems and proposing some tasks to address them.
Semantic Annotations for Legal Text Processing using GATE Teamware
Adam Wyner and Wim Peters
Abstract
Large corpora of legal texts are increasingly available in the public domain. To make them amenable for automated text processing, various sorts of annotations must be added. We consider semantic annotations bearing on the content of the texts – legal rules, case factors, and case decision elements. Adding annotations and developing gold standard corpora (to verify rule-based or machine learning algorithms) is costly in terms of time, expertise, and money. To make the processes efficient, we propose several instances of GATE’s Teamware to support annotation tasks for legal rules, case factors, and case decision elements. We engage annotation volunteers (law school students and legal professionals). The reports on the tasks are to be presented at the workshop.

Crowdsourced Legal Case Annotation

A study in online, collaborative legal informatics
Adam Wyner, University of Aberdeen
Wim Peters, University of Sheffield
Daniel Katz, Michigan State University
— Introduction —
This is an academic research study on legal informatics (information processing of the law). The study uses an online, collaborative tool to crowdsource the annotation of legal cases. The task is similar to legal professionals’ annotation of cases. The result will be a public corpus of searchable, richly annotated legal cases that can be further processed, analysed, or queried for conceptual annotations.
Adam and Wim are computer scientists who are interested in language, law, and the Internet. Dan is an academic lawyer also interested in law and the Internet.
We are inviting people to participate in this collaborative task. This is a beta version of the exercise, and we welcome comments on how to improve it. Please read through this blog post, look at the video, and get in contact.
— Highlighting, Annotations, and Legal Case Briefs —
In reading, analysing, and preparing a summary of a legal case, law students and legal professionals annotate cases by highlighting and colour coding elements of the case to make for easy identification. Different elements are annotated: the holding, the parties, the facts, and so on. A sample image of annotations is:

Annotations for Case Citations, Legal Roles, Jurisdiction, Hearing Date

— Problem —
To analyse a legal case, legal professionals annotate the case into its constituent parts. The analysis is summarised in a case brief. However, the current approach is very limited:

Analysis is time-consuming and knowledge-intensive.

Case briefs may miss relevant information.

Case analyses and briefs are privately held.

Case analyses are in paper form, so not searchable over the Internet.

Current search tools are for text strings, not conceptual information. We want to search for concepts, such as the holdings by a particular judge with respect to causes of action against a particular defendant. With annotated legal cases, we can enable conceptual search.

There is no capacity to systematically compare, contrast, and evaluate the work by different annotators. Consequently, the annotation task itself is not used as an opportunity to gain greater expertise in case analysis.
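To make the contrast between string search and conceptual search concrete, here is a minimal Python sketch of a query over annotated cases. The `Annotation` record, the case identifiers, and the toy corpus are all invented for illustration; the labels Judge Name and Holding follow the annotation list given later in this post. This is not how Teamware stores or queries annotations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    case_id: str    # hypothetical case identifier
    ann_type: str   # e.g. "Indexes" or "Reasoning Outcomes"
    feature: str    # e.g. "Judge Name" or "Holding"
    text: str       # the annotated span of the case

def holdings_by_judge(annotations, judge):
    """Return (case_id, holding text) pairs for cases annotated with `judge`."""
    # First find the cases in which this judge is annotated.
    cases = {a.case_id for a in annotations
             if a.feature == "Judge Name" and a.text == judge}
    # Then collect the holdings annotated in those cases.
    return [(a.case_id, a.text) for a in annotations
            if a.feature == "Holding" and a.case_id in cases]

# Invented toy corpus: two cases, one judge and one holding each.
corpus = [
    Annotation("case-1", "Indexes", "Judge Name", "Judge A"),
    Annotation("case-1", "Reasoning Outcomes", "Holding",
               "A licence may be implied by conduct."),
    Annotation("case-2", "Indexes", "Judge Name", "Judge B"),
    Annotation("case-2", "Reasoning Outcomes", "Holding",
               "Notice must be given in writing."),
]

print(holdings_by_judge(corpus, "Judge A"))
```

A plain string search for "Judge A" would return every mention of the name; the query above returns only the holdings of cases where the name was annotated as the judge.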

— Solution: Crowdsource Annotation —
We use an online legal case annotation tool and share the results to support:

Online search in legal cases for case details and concepts.

Semantic web applications and information extraction.

Crowdsourcing of a legal case corpus.

Training and learning for legal case analysis.

The results of the study would be useful to:

Law school students learning case analysis.

Legal professionals in identifying relevant cases.

Researchers of legal informatics.

Law faculty in training students to analyse legal cases.

Broadly speaking, a corpus of analysed cases makes case law a public resource that democratises legal knowledge.
— Annotations: types and features —
To crowdsource conceptual annotations of legal cases, we use the General Architecture for Text Engineering (GATE) Teamware tool. Teamware is a web-based application that provides an annotator with a text to annotate and a list of annotations to use. The task is a web-based version of what legal analysts of cases already do.
We use familiar annotations for legal cases, divided (for ease of reference) into types and features. For example, we have a type Legal Roles and various features to select among, e.g. defendant. We are counting on you to have learned and used these annotations in the course of your legal study and practice.
You do not need to memorise the types and features as they will appear in the GATE Teamware tool. It may be handy to keep this webpage open so you can consult it or you could also print out the page.
The annotations we use are:
Argument For Party – arguments for a particular party, using the most general notion:

for Appellee, for Appellant, for Defendant, for Plaintiff.

Facts – legal and procedural facts:

Cause of Action – the specific legal theory upon which the plaintiff brings the suit.

Defenses raised by Defendant – the defendant’s defenses against the cause of action.

Legal Facts – the legally relevant facts of the case that are used in arguing the issues.

Remedy requested by Plaintiff – what the plaintiff asks the court to grant.

Indexes – various indicative information:

Case Citation – the citation of the particular case being annotated.

Court Address – the address of the court.

Hearing Date – the date of the hearing.

Judge Name – the names of the judges, annotated one at a time.

Jurisdiction – the legal jurisdiction of the case.

Issues – the issues before the court:

Procedural Issue – what the appellant claims that the lower court did wrong.

Substantive Issue – the point of law that is in dispute (legal facts have their own annotation).

Legal Roles – the role of the parties in the case:

Appellee, Appellee’s Lawyer, Appellant, Appellant’s Lawyer, Defendant, Defendant’s Lawyer, Plaintiff, Plaintiff’s Lawyer.

General – buyer/seller, employer/employee, landlord/tenant, etc.

Other – relevant information not covered by the other annotations.
Procedural History – the disposition of the case with respect to the lower court(s):

Appeal Information – who appealed and why they appealed.

Damages – the damages awarded by the lower court.

Lower Court Decision – the lower court’s decision.

Reasoning Outcomes – various parts of the legal decision:

Concurring Opinion.

Dicta – commentary about the judgement and holding, but not part of the rationale.

Dissenting Opinion.

Holding – the rule of law or legal principle that was applied in making the judgement. You can think about this as the new ground that the court is covering in this case. What legal rule(s) is the court developing or clarifying? The case can have more than one holding if there is more than one legal rule being considered. Note that a holding from a cited precedent is to be considered part of the rationale.

Judgement – Given the holding and the corresponding rationale for the holding, the judgement is the court’s final decision about the rights of the parties, the court’s response to a party’s request for relief, and bearing on prior decisions (e.g. affirmed, reversed, remanded, etc.).

Rationale – the court’s analysis of the issues and the reasons for the holding.

— Strategic Phases —
From previous experience and following discussions, we believe it is best if the annotations are grouped together and done in three phases. This allows the annotator to do simpler tasks first and to keep in mind a subset of the relevant annotations.

Phase I: Indexes and Legal Roles

Phase II: Procedural History and Reasoning Outcomes

Phase III: Facts and Issues

For the time being, we are not attending to annotations of Argument For Party and Other.
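For readers who want to process the annotations later, the types, features, and phases above can be written down as a small data structure. The following Python sketch is one possible encoding; the dictionary layout and the helper names are assumptions for illustration and are not part of GATE Teamware.

```python
# Annotation types mapped to their features, as listed in this post.
SCHEMA = {
    "Argument For Party": ["for Appellee", "for Appellant",
                           "for Defendant", "for Plaintiff"],
    "Facts": ["Cause of Action", "Defenses raised by Defendant",
              "Legal Facts", "Remedy requested by Plaintiff"],
    "Indexes": ["Case Citation", "Court Address", "Hearing Date",
                "Judge Name", "Jurisdiction"],
    "Issues": ["Procedural Issue", "Substantive Issue"],
    "Legal Roles": ["Appellee", "Appellee's Lawyer", "Appellant",
                    "Appellant's Lawyer", "Defendant", "Defendant's Lawyer",
                    "Plaintiff", "Plaintiff's Lawyer", "General"],
    "Other": [],
    "Procedural History": ["Appeal Information", "Damages",
                           "Lower Court Decision"],
    "Reasoning Outcomes": ["Concurring Opinion", "Dicta", "Dissenting Opinion",
                           "Holding", "Judgement", "Rationale"],
}

# The three strategic phases and the types each one covers.
PHASES = {
    "Phase I": ["Indexes", "Legal Roles"],
    "Phase II": ["Procedural History", "Reasoning Outcomes"],
    "Phase III": ["Facts", "Issues"],
}

def features_for_phase(phase):
    """List the (type, feature) pairs an annotator works with in one phase."""
    return [(t, f) for t in PHASES[phase] for f in SCHEMA[t]]

def unphased_types():
    """Types that are in the schema but not assigned to any phase."""
    phased = {t for types in PHASES.values() for t in types}
    return sorted(t for t in SCHEMA if t not in phased)

print(features_for_phase("Phase I"))
print(unphased_types())
```

Keeping the phases separate from the schema makes it easy to regroup the annotations later without touching the type and feature definitions.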
— Collaborate —
Take a look at the instructional video below. If you wish to collaborate on the task, send an email to Adam Wyner – adam@wyner.info
In the email, please include brief information for:

Your name

Your professional affiliation, e.g. institution, company, firm…

Your role where you work

Your background as a legal professional

This will help us know who we are collaborating with; from the pool of candidates, we will select participants for this early study.
You will be sent a user name and password so you can login to Teamware.
We respect your privacy. We are only interested in data in the aggregate and will not reveal any personal data to third parties.
— Next —
We have an instructional video that you can open in a new tab or window and that uses QuickTime. It lasts about 14 minutes. This will give you a good idea of what you will be doing. The presenter is Adam Wyner. You can see this here:

Or follow the link on YouTube — Crowdsourcing Legal Case Annotation Instructional Video. Please view in a large (ok definition) or full screen (grainy definition) mode, which may need to be reloaded in YouTube.
There are additional points about using the tool in the section below on questions, problems, and observations.
After reading this blog, viewing the instructional video, and receiving your username and password, you can login to begin annotating at — GATE Teamware
— Survey —
When you are done with your task, please answer the questions on the survey to give us feedback on your experience using the annotation tool. The survey is available below. You can scroll down and answer the questions. Don’t forget to hit the “Done” button to submit your responses, which will be very useful in helping us understand your experience and thoughts about using the tool:


— What Then? —
We analyse the annotations from several annotators, comparing and contrasting them (interannotator agreement). This will show us similarities and differences in the understanding of the annotations and cases. As well, the results will help us develop a Gold Standard Corpus of legal cases, which are annotations of cases that annotators agree on. A Gold Standard is essential for information extraction and the development of advanced processing. We will publicly report the analysis of the exercise and make the annotated cases publicly available for re-use.
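As an illustration of how agreement between two annotators can be quantified, the sketch below computes Cohen's kappa, one common agreement measure that corrects for chance. This post does not say which measure the study uses, so the choice of kappa is an assumption, as are the example labels.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Invented labels for four text spans from one case.
annotator_1 = ["Holding", "Rationale", "Rationale", "Dicta"]
annotator_2 = ["Holding", "Rationale", "Dicta", "Dicta"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

The spans on which the annotators agree are the natural candidates for a gold standard; the spans on which they disagree are where the annotation guidelines may need clarifying.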
Once we have a better sense of how this study goes, we plan to roll out a larger version with more cases. And this is only the start….
— Questions, Problems, and Observations —
Thanks to participants for letting us know about their problems and sending their observations.
How easy is it to learn to use the tool? Take a look at the video to get a sense of this. With a little bit of practice, it is rather straightforward.
What if I don’t agree with some of your annotations or features? Write a comment or send us an email, and we will consider your comment. Try to be as clear and specific as you can. We are not lawyers, and we are dealing with a global community with local variation, so it is likely there will be some disagreement and variation.
Can I get the results of my annotations? Our approach is to make individual contributions to the whole. So, you will be able to access annotated cases after the exercise. There will be further information on how to work with the material.
How many cases must I do? You can do one or you can do as many as we have (not many in the beta project).
How much time will it take? About as long as it would take you to do a similar highlighting and annotation task with paper and markers.
What if I have a problem with using the tool or if the tool is buggy? Be patient and try to work with the tool. Sometimes things go wrong. Write a comment or send us an email, and we will try to advise. Note – we are only consumers of GATE Teamware, so are not responsible for the system.
How thoroughly should I annotate the cases? The more cases that are annotated fully and accurately, the better. Apply the same diligence as you would to thoroughly and carefully analyse cases with pen and paper. As you will be the beneficiary of the work of others, so too should you work to benefit them.
Do we track good annotators and bad annotators? We are interested in data in the aggregate, and are only interested in interannotator agreement and disagreement. This information will help us better understand differences in how the cases are understood and annotated. But, we can see how much time each person takes with each annotation task and measure how they perform against other annotators or a gold standard. If we have bad annotators, we will see this in the results; we would contact the annotator and see how best to improve the situation. As we noted above, we are not sharing information with third parties.
I cannot login with the username and password. Please let me know if you have this problem, and I will look into it.
I can login, but I cannot get the Java Web Start file to start. This is a tough problem to address over the internet. Some people have no problem, but some do. Please let me know if you have this problem. Do check that you have followed the instructions (on the blog and in the movie).
I can login and start the annotation tool, but I cannot get the task. Please let me know, and I will look into it.
The text is too small and single spaced. At the moment, there is nothing we can do about this. We’ll try to keep this in mind for the future.
The highlighting tool is not easy to use. When I want to move from one annotated text to some new text, the tool doesn’t move to the new text. This is a bit of a problem with the tool, which is not entirely reliable in its functionality. Try to play around with this to see what works for you. One strategy that I have found improves performance is to annotate something. Then the annotation type appears in the upper right hand corner window among the list of annotations. When the problem occurs, it is sometimes a good idea to toggle the annotations in that upper right hand corner window off and on. This seems to clear the system a bit so that one can go on to the next annotation. Give this a try. If you have problems, please let me know.
I found it very challenging. It is important for us to know this so that we can gauge how much text and how many kinds of annotation to assign. We might reduce the number of annotations, breaking up the whole set into parts of the overall task.
Decision date is more important than hearing date, or at least should be provided in addition to hearing date. Probably this will be added to future iterations.
A participant, e.g. “Cone”, was originally a defendant, but was dismissed out before this appeal. I wonder if he should still be coded as “Defendant” or if he should be coded as an other role-holder. Good observation. I’ll have to consult with some lawyers further about this point.
There are sentences where the court introduced a fact and also appeared to reason using it. Is it right to code the whole sentence both as a legal fact and as a rationale? Yes, this is the way to handle this. Double annotations are always possible.
A similar problem occurred where the court offered a fact but also put a gloss on it as to its legal significance. Double annotations are always possible.
Some of the names of the categories were confusing or unclear. For example, using “Holding” for the name of the legal rule or principle was confusing (“Legal Rule” might be more intuitive). This is another point that we will need to consult further with other lawyers. There may also be some variation in terminology.
There is sometimes unclarity about role-players. A case involved a plaintiff, who was an appellee but also a cross-appellant, and a defendant who was thus an appellant and cross-appellee. These can be coded where one is plaintiff and appellee and the other defendant and appellant. But, they could have both been coded as appellee and appellant, given the existence of the cross appeal. Double (or more) annotating is fine.
Procedural History/Damages might be better framed as Procedural History/Remedies, as courts often provide injunctive relief or, as in this case, an accounting, as a remedy. This is another point that we will need to consult further with lawyers about terminology.
What if a case does not state any legal rules? Can implicit legal rules be annotated? For example, where novelty and non-obviousness are a sine qua non of a valid patent, one would not have known to mark some of the sentences as rationales. This isn’t a problem. If something is not in the case, then it is not annotated. We are not (yet) concerned with implicit information. But, if you know the implicit information, then annotate it.
How can I automatically search for and annotate the same string with the same annotation? In the instructional video, we wanted to keep the material short and to the point, so there are aspects of the annotation tool we did not cover. However, it is tedious to manually search for the same string and annotate it with the same annotation. Teamware’s Annotation Editor has a tool to support automatic search and annotation. To see how to do this, we have the video here:
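To show in plain terms what such search-and-annotate support does, here is a minimal Python sketch that finds every occurrence of a string and records the same annotation for each one as character offsets. It is not Teamware's implementation; the function name, the dictionary layout, and the sample text are assumptions for illustration.

```python
import re

def annotate_all(text, phrase, ann_type, feature):
    """Return one standoff annotation per occurrence of `phrase` in `text`."""
    return [
        {"start": m.start(), "end": m.end(), "type": ann_type, "feature": feature}
        # re.escape makes the phrase match literally, not as a pattern.
        for m in re.finditer(re.escape(phrase), text)
    ]

# Invented sample text.
text = "Smith sued Jones. Jones denied the claim. The court found for Smith."
spans = annotate_all(text, "Jones", "Legal Roles", "Defendant")
for span in spans:
    print(span, text[span["start"]:span["end"]])
```

Because the annotations are stored as offsets rather than written into the text, the same span can carry more than one annotation, which matches the advice above that double annotations are always possible.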

How should I annotate holdings which may appear as holdings in cited cases and as part of the procedural history, as holdings in the current case, or as part of the rationale in the current case? This is an interesting and subtle point for us, and we will have to have a full consultation with lawyers to decide. But, for the time being, there can be no harm in multiple annotations, which we can then look at and work with later.
— Paper —
If you are interested in some of the ideas behind this project, please see our paper:
Semantic Annotations for Legal Text Processing using GATE Teamware
The paper will appear in May 2012 in the Proceedings of the LREC Conference Workshop on Semantic Processing of Legal Texts, Istanbul, Turkey. The exercise here is a version of the exercise proposed in the paper.
A shortlink to this blog page is:
http://wyner.info/LanguageLogicLawSoftware/?p=1315
— Thanks for collaborating! —
— If you have any questions, please submit a comment! —
— Update Note —
Updated July 29, 2013 to reflect Dan Katz’s amended definitions for Holding. Updated in various ways July 12, 2013. The previous blog post of July 28, 2012 has been updated to note the participation of Dan Katz and his students of Michigan State University.
— Honour Roll —
For the very first study, we would like to thank the following individuals who gave of their time and intelligence to carry out their tasks.

First

Second


EXTENDED CFP – Workshop on Semantic Processing of Legal Texts (SPLeT 2012)

In conjunction with
Language Resources and Evaluation Conference 2012 (LREC 2012)
27 May, 2012
Istanbul, Turkey
REVISED SUBMISSION DEADLINE FOR WORKSHOP: 19 February 2012
Context
The legal domain represents a primary candidate for web-based information distribution, exchange and management, as testified by the numerous e-government, e-justice and e-democracy initiatives worldwide. The last few years have seen a growing body of research and practice in the field of Artificial Intelligence and Law which addresses a range of topics: automated legal reasoning and argumentation, semantic and cross-language legal information retrieval, document classification, legal drafting, legal knowledge discovery and extraction, as well as the construction of legal ontologies and their application to the law domain. In this context, it is of paramount importance to use Natural Language Processing techniques and tools that automate and facilitate the process of knowledge extraction from legal texts.
Since 2008, the SPLeT workshops have been a venue where researchers from the Computational Linguistics and Artificial Intelligence and Law communities meet, exchange information, compare perspectives, and share experiences and concerns on the topic of legal knowledge extraction and management, with particular emphasis on the semantic processing of legal texts. Within the Artificial Intelligence and Law community, there have also been a number of dedicated workshops and tutorials specifically focussing on different aspects of semantic processing of legal texts at conferences such as JURIX-2008, ICAIL-2009, ICAIL-2011, as well as in the International Summer School “Managing Legal Resources in the Semantic Web” (2007, 2008, 2009, 2010, 2011).
To continue this momentum and to advance research, a 4th Workshop on “Semantic Processing of Legal Texts” is being organized at the LREC-2012 conference. It aims to bring to the attention of the broader LR/HLT (Language Resources/Human Language Technology) community the specific technical challenges posed by the semantic processing of legal texts, and to share with that community the motivations and objectives which make it of interest to researchers in legal informatics. These interactions are expected to advance research and applications and to foster interdisciplinary collaboration within the legal domain.
New to this edition of the workshop are two sub-events (described below) to provide common and consistent task definitions, datasets, and evaluation for legal-IE systems along with a forum for the presentation of varying but focused efforts on their development.
The main goals of the workshop and associated events are to provide an overview of the state-of-the-art in legal knowledge extraction and management, to explore new research and development directions and emerging trends, and to exchange information regarding legal language resources and human language technologies and their applications.
Sub-events
Dependency Parsing
The first sub-event will be a shared task specifically focusing on dependency parsing of legal texts. Although this is not a domain-specific task, it creates the prerequisites for advanced IE applications operating on legal texts, which can benefit from reliable preprocessing tools. This year, our aim is to lay the groundwork for more advanced domain-specific tasks (e.g. event extraction) to be organized in future SPLeT editions. We strongly believe that this could be a way to attract the attention of the LR/HLT community to the specific challenges posed by the analysis of this type of text and to gain a clearer idea of the current state of the art. The languages dealt with will be Italian and English. A specific Call for Participation for the shared task is available on a dedicated page.
Semantic Annotation
The second sub-event will be an online, manual, collaborative, semantic annotation exercise, the results of which will be presented and discussed at the workshop. The goals of the exercise are: (1) to gain insight into and work towards the creation of a gold standard corpus of legal documents in a cohesive domain; and (2) to test the feasibility of the exercise and to get feedback on its annotation structure and workflow. The corpus to be annotated will be a selection of documents drawn from EU and US legislation, regulation, and case law in a particular domain (e.g. consumer or environmental protection). For this exercise, the language will be English. A specific Call for Participation for this annotation exercise is available on a dedicated page.
Areas of Interest
The workshop will focus on the topics of the automatic extraction of information from legal texts and the structural organisation of the extracted knowledge. Particular emphasis will be given to the crucial role of language resources and human language technologies.
Papers are invited on, but not limited to, the following topics:

Construction, extension, merging, customization of legal language resources, e.g. terminologies, thesauri, ontologies, corpora

Information retrieval and extraction from legal texts

Semantic annotation of legal text

Legal text processing

Multilingual aspects of legal text semantic processing

Legal thesauri mapping

Automatic classification of legal documents

Logical analysis of legal language

Automated parsing and translation of natural language legal arguments into a logical formalism

Dialogue protocols for legal information processing

Controlled language systems for law

LREC Conference Information (Accommodation, Travel, Registration)
Language Resources and Evaluation Conference 2012 (LREC 2012)
Workshop Schedule – TBA
Workshop Registration and Location – TBA
Webpage URLs

This page is http://wyner.info/LanguageLogicLawSoftware/?p=1233

An alternative workshop webpage

Important Dates:

REVISED Submission: 19 February 2012

Acceptance Notification: 12 March 2012

Final Version: 30 March 2012

Workshop date: 27 May 2012

Author Guidelines:
Submissions are solicited from researchers working on all aspects of semantic processing of legal texts. Authors are invited to submit papers describing original completed work, work in progress, interesting problems, case studies or research trends related to one or more of the topics of interest listed above. The final version of the accepted papers will be published in the Workshop Proceedings.
Short or full papers can be submitted. Short papers are expected to present new ideas or new visions that may influence the direction of future research, yet they may be less mature than full papers. While an exhaustive evaluation of the proposed ideas is not necessary, insight and in-depth understanding of the issues are expected. Full papers should be more fully developed and evaluated. Short papers will be reviewed the same way as full papers by the Program Committee and will be published in the Workshop Proceedings.
Full paper submissions should not exceed 10 pages, short papers 6 pages. See the style guidelines and files on the LREC site:
Authors’ Kit and Templates
Submit papers to:
Submission for the workshop uses the START submission system at:
https://www.softconf.com/lrec2012/LegalTexts2012/
Note that when submitting a paper through the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. For further information on this new initiative, please refer to:
http://www.lrec-conf.org/lrec2012/?LRE-Map-2012
Publication:
After the workshop a number of selected, revised, peer-reviewed articles will be published in a Special Issue on Semantic Processing of Legal Texts of the AI and Law Journal (Springer).
Contact Information:
Address any queries regarding the workshop to:
lrec_legalWS@ilc.cnr.it
Program Committee Co-Chairs:
Enrico Francesconi (National Research Council, Italy)
Simonetta Montemagni (National Research Council, Italy)
Wim Peters (University of Sheffield, UK)
Adam Wyner (University of Liverpool, UK)
Program Committee (Preliminary):
Kevin Ashley (University of Pittsburgh, USA)
Johan Bos (University of Rome, Italy)
Daniele Bourcier (Humboldt Universitat, Germany)
Pompeu Casanovas (Universitat Autonoma de Barcelona, Spain)
Jack Conrad (Thomson Reuters, USA)
Matthias Grabmair (University of Pittsburgh, USA)
Antonio Lazari (Scuola Superiore S.Anna, Italy)
Leonardo Lesmo (Universita di Torino, Italy)
Marie-Francine Moens (Katholieke Universiteit Leuven, Belgium)
Thorne McCarty (Rutgers University, USA)
Raquel Mochales Palau (Catholic University of Leuven, Belgium)
Paulo Quaresma (Universidade de Evora, Portugal)
Tony Russell-Rose (UXLabs, UK)
Erich Schweighofer (Universitat Wien, Austria)
Rolf Schwitter (Macquarie University, Australia)
Manfred Stede (University of Potsdam, Germany)
Daniela Tiscornia (National Research Council, Italy)
Tom van Engers (University of Amsterdam, Netherlands)
Giulia Venturi (Scuola Superiore S.Anna, Italy)
Vern R. Walker (Hofstra University, USA)
Radboud Winkels (University of Amsterdam, Netherlands)
By Adam Wyner


Papers Accepted to the JURIX 2011 Conference

My colleagues and I have had two papers (one long and one short) accepted for presentation at the 24th International Conference on Legal Knowledge and Information Systems (JURIX 2011). The papers are available at the links.
On Rule Extraction from Regulations
Adam Wyner and Wim Peters
Abstract
Rules in regulations such as those found in the US Code of Federal Regulations can be expressed using conditional and deontic rules. Identifying and extracting such rules from the language of the source material would be useful for automating rulebook management and translating into an executable logic. The paper presents a linguistically-oriented, rule-based approach, which is in contrast to a machine learning approach. It outlines use cases, discusses the source materials, reviews the methodology, then provides initial results and future steps.
Populating an Online Consultation Tool
Sarah Pulfrey-Taylor, Emily Henthorn, Katie Atkinson, Adam Wyner, and Trevor Bench-Capon
Abstract
The paper addresses the extraction, formalisation, and presentation of public policy arguments. Arguments are extracted from documents that comment on public policy proposals. Formalising the information from the arguments enables the construction of models and systematic analysis of the arguments. In addition, the arguments are represented in a form suitable for presentation in an online consultation tool. Thus, the forms in the consultation correlate with the formalisation and can be evaluated accordingly. The stages of the process are outlined with reference to a working example.
By Adam Wyner


Draft Materials for LEX 2011

Draft post
At the links below, you can find the slides and hands-on materials on GATE for the LEX summer school on Managing Legal Resources in the Semantic Web.
GATE Legislative Rulebook
By Adam Wyner
