Tutorial on "Textual Information Extraction from Legal Resources" at the 14th International Conference on Artificial Intelligence and Law, Rome, Italy


Legal resources such as legislation, public notices, case law, and other legally relevant documents are increasingly freely available on the internet. They are almost entirely presented as natural language text. Legal professionals, researchers, and students need to extract and represent information from such resources to support compliance monitoring, analyse cases for case-based reasoning, and extract information in the discovery phase of a trial (e-discovery), amongst a range of possible uses. To support such tasks, powerful text analytic tools are available. The tutorial presents an in-depth demonstration, with examples, of one toolkit, the General Architecture for Text Engineering (GATE), along with several briefer demonstrations of other tools.


Participants in the tutorial should come away with some theoretical sense of what textual information extraction is about. They will also see some practical examples of how to work with a corpus of materials, develop an information extraction system using GATE and the other tools, and share their results with the research community. Participants will be provided with information on where to find additional materials and learn more.

Intended Audience

The intended audience includes legal researchers, legal professionals, law school students, and political scientists who are new to text processing, as well as experienced AI and Law researchers who have used NLP but wish to get a quick overview of using GATE.

Covered Topics

  • Motivations to annotate, extract, and represent legal textual information.
  • Uses and domains of textual information extraction. Sample materials from legislation, case decisions, gazettes, e-discovery sources, among others.
  • Motivations to use an open source tool for open source development of textual information extraction tools and materials.
  • The relationship to the semantic web, linked documents, and data visualisation.
  • Linguistic/textual problems that must be addressed.
  • Alternative approaches (statistical, knowledge-light, machine learning) and a rationale for a particular bottom-up, knowledge-heavy approach in GATE.
  • Outline of natural language processing modules and tasks.
  • Introduction to GATE – loading and running simple applications, inspecting the results, refining the search results.
  • Development of fragments of a GATE system – lists, rules, and examination of results.
  • Discussion of more complex constructions and issues such as named entity recognition, structures of documents, and fact pattern identification, which is essential for case-based reasoning.
  • Introduction to ontologies.
  • Linking textual information extraction to ontologies.
  • Introduction to related tools and approaches: C&C/Boxer (parser and semantic interpreter), Attempto Controlled English, ScraperWiki, among others.
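
To give a flavour of the "lists, rules, and examination of results" topic above, here is a minimal sketch in plain Python. It does not use GATE or its API; the gazetteer entries, the verb list, the annotation labels, and the function name annotate are all hypothetical and chosen only for illustration. The idea it shows is the general knowledge-heavy pattern: first look up terms from hand-built lists, then apply a rule that combines those lookups with surrounding words.

```python
import re

# Hypothetical gazetteer: a hand-written list of court names (illustrative only).
COURT_GAZETTEER = ["Supreme Court", "Court of Appeal", "High Court"]

# Hypothetical trigger words used by the rule below.
HOLDING_VERBS = ["held", "ruled", "found"]


def annotate(text):
    """Return sorted (start, end, label, matched_text) tuples for the text."""
    annotations = []

    # Step 1: list lookup. Mark every occurrence of a gazetteer entry.
    for name in COURT_GAZETTEER:
        for match in re.finditer(re.escape(name), text):
            annotations.append((match.start(), match.end(), "Court", match.group()))

    # Step 2: rule. A court name directly followed by a holding verb
    # is marked as a cue that a holding is being reported.
    court_pattern = "|".join(re.escape(name) for name in COURT_GAZETTEER)
    verb_pattern = "|".join(HOLDING_VERBS)
    rule = re.compile(r"(?:%s)\s+(?:%s)\b" % (court_pattern, verb_pattern))
    for match in rule.finditer(text):
        annotations.append((match.start(), match.end(), "HoldingCue", match.group()))

    return sorted(annotations)


sample = "The Court of Appeal held that the notice was valid. The High Court disagreed."

# Step 3: examine the results.
for annotation in annotate(sample):
    print(annotation)
```

In a toolkit such as GATE, the lists and rules are maintained as separate resources and the results are inspected in a graphical interface; the sketch only mirrors that workflow in miniature.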

Date, Time, Location, and Logistics

Monday, June 10, afternoon session.
The tutorial was held at the Casa dell’Aviatore, viale dell’Università 20 in Rome, Italy.
Information about the conference is available at the website for the 14th International Conference on Artificial Intelligence and Law (ICAIL).


The slides from the presentation are available here:
Textual Information Extraction from Legal Resources

Further Information

Contact the lecturer.


Dr. Adam Wyner
Lecturer, Department of Computing Science, University of Aberdeen
Aberdeen, Scotland
azwyner at abdn dot ac dot uk
The lecturer has a PhD in Linguistics, a PhD in Computer Science, and a research background in computational linguistics. The lecturer has previously given tutorials on this topic at JURIX 2009 and ICAIL 2011, along with an invited talk at RuleML 2012, has published several conference papers on text analytics of legal resources using GATE and C&C/Boxer, and continues to work on text analysis of legal resources.
By Adam Wyner
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.