Discussion of GATE's Onto Root Gazetteer

In Instructions for GATE’s Onto Root Gazetteer, I provided the information needed to set up the Onto Root Gazetteer. In this post, I discuss aspects of the Onto Root Gazetteer that I found interesting or problematic.
For me, the documentation was not helpful, as too much technical information was provided (e.g. preprocessing the ontology) rather than just the steps to get it to run. Also, no walk-through example was clearly illustrated. I would still like (and will provide in the near future) a richer text (a nice paragraph) and a simpler ontology (a couple of classes, subclasses, object and data properties, and individuals) to illustrate fully just what is done.
Though I have it running, there are several questions (and partial answers or musings):

  • What is the annotation relative to the ontology good for?
  • What is the difference between gazetteers derived from ontologies and default gazetteers?
  • What are the selection criteria for annotating the tokens?
  • What is the relationship between the annotated text and the ontology?

Concerning the first point, presumably more annotations allow more processing capabilities. A (simple) example would be very helpful.
Concerning the second point, matters are more complex (to my mind). First, default gazetteers (or flexible gazetteers, for that matter) are flat lists (lists containing no sublists as parts) where the items in the list are annotated as per the properties of the list; for example, if we have a gazetteer for Organisation (call this the header of the list) which lists IBM, BBC, and Hackney Council (call these the items of the list), then every token of IBM, BBC, and Hackney Council found in the corpus will be annotated Organisation. If there is a token organisation in the corpus, it will not be annotated with Organisation; similarly, no token of IBM in the corpus is annotated IBM. The list, in effect, categorises IBM, BBC, and Hackney Council as of the type Organisation.
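To make the flat-list behaviour concrete, here is a minimal Python sketch. This is not GATE code; the list contents and the matching are simplified assumptions (in particular, multi-word items like Hackney Council would need multi-token matching, which is omitted here):

```python
# Sketch of how a default (flat) gazetteer annotates tokens.
# The header of the list is "Organisation"; the items are the entries.
# Matching is by listed item, so "IBM" gets the annotation Organisation,
# while the bare word "organisation" (not an item) gets no annotation.

organisation_list = {"IBM", "BBC", "Hackney Council"}

def annotate(tokens):
    """Return (token, annotation) pairs; None when no list item matches."""
    return [(t, "Organisation" if t in organisation_list else None)
            for t in tokens]

text = ["IBM", "hired", "an", "organisation", "near", "the", "BBC"]
print(annotate(text))
```

Note that no token is ever annotated with itself (IBM is annotated Organisation, not IBM), which is the point of contrast with ORG below.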
The Onto Root Gazetteer (ORG) works differently (I believe, but may be wrong), but these points are not made in the documentation. First, a gazetteer which is derived from an ontology preserves the subsumption hierarchy of the ontology, giving us a list of lists. Such a gazetteer is a taxonomy of terminology, which is not the same as an ontology (though frequently mistaken to be identical). Second, if a token in the text is found to (flexibly) match an item in the gazetteer, then the token is annotated with that item, meaning that if the string IBM is a token in our text and an item in the gazetteer, then the token is annotated IBM. In these respects, ORGs work differently from other gazetteers.
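The contrast can be sketched as follows. The tiny hierarchy and the lookup function are my own invention for illustration, not GATE's ontology or ORG's actual algorithm; the point is just that the subsumption hierarchy survives into the gazetteer, and that a match returns the matched item itself rather than a single list header:

```python
# Sketch of a gazetteer derived from an ontology: the subsumption
# hierarchy is preserved (a list of lists), and a matching token is
# annotated with the matched item, not merely a flat list header.

hierarchy = {
    "Organisation": {
        "Company": ["IBM"],
        "Broadcaster": ["BBC"],
    }
}

def lookup(token, tree, path=()):
    """Return the path from the root class down to the matched item."""
    for node, children in tree.items():
        if isinstance(children, dict):
            found = lookup(token, children, path + (node,))
            if found:
                return found
        elif token in children:
            return path + (node, token)
    return None

print(lookup("IBM", hierarchy))  # ('Organisation', 'Company', 'IBM')
```

Here the annotation carries the matched item (IBM) along with its position in the taxonomy, whereas the flat gazetteer above yields only the header (Organisation).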
The third question might be addressed in the richer documentation concerning ORG. It relates to observations concerning the results of the example application. Consider the following. The token “language resources” has the annotation:
URI=http://gate.ac.uk/ns/gate-ontology#LanguageResource, heuristic_level=0, majorType=, propertyURI=http://www.w3.org/2000/01/rdf-schema#label, type=class
The token “resources” has the annotation:
URI=http://gate.ac.uk/ns/gate-ontology#GATEResource, heuristic_level=0, majorType=, propertyURI=http://www.w3.org/2000/01/rdf-schema#label, type=class
And the token “parameters” has annotation:
URI=http://gate.ac.uk/ns/gate-ontology#ResourceParameter, heuristic_level=0, majorType=, propertyURI=http://www.w3.org/2000/01/rdf-schema#label, type=class
We see that the tokens in the text are annotated in relation to the ontology. Yet it is not clear why the token “resources” is not annotated with LanguageResource or ResourceParameter since these are components of the ORG as well. Likely there is some prioritising among the annotations that we need to learn.
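One plausible explanation, and I stress that this is my assumption rather than documented ORG behaviour, is a longest-match heuristic among overlapping candidate annotations:

```python
# Hypothetical longest-match selection among overlapping gazetteer hits.
# If both "language resources" (LanguageResource) and "resources"
# (GATEResource) match at overlapping positions, prefer the candidate
# covering the longer span. This is a guess at the priority rule, not
# the documented behaviour of the Onto Root Gazetteer.

candidates = [
    ("language resources", "LanguageResource"),
    ("resources", "GATEResource"),
]

def pick(cands):
    """Prefer the match with the longest surface span."""
    return max(cands, key=lambda c: len(c[0]))

print(pick(candidates))  # ('language resources', 'LanguageResource')
```

This would explain why “language resources” is annotated LanguageResource rather than GATEResource, though it does not by itself explain the choice for the bare token “resources”.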
Finally, concerning the last question, matters are somewhat unclear (to me), largely because the line between annotations, gazetteers, and ontologies is blurred; for me, the key unclarity centres on annotations in the text that match items in the gazetteer. Consider the issue from a different point of view. ORG was developed in the context of a project to support ontology development from text — find terms and relations which are candidates for the ontology, then (if one wants) use the terms and relations to build the ontology. For example, if one sees many occurrences of “organisation” in the text, then perhaps it would be introduced as a concept in the ontology. We have a many-one relation from the tokens to the ontology. This makes sense. Seen another way, we have a default gazetteer where every given token (e.g. IBM) in a text has the same annotation, giving the impression of a one-many relation. This also makes sense. Neither of these seems problematic to me, largely because I don’t really know much or presume much about the meaning of the annotation on the token: from the text, I abstract the concept; from the gazetteer, I label tokens as belonging to the same annotation class. In no case is a token “organisation” annotated with Organisation; even if it were, I couldn’t really object unless I said more about what I think the annotation means.
Contrast these points with what goes on with ORG (admittedly, this gets pretty philosophical, and in terms of day-to-day practice, it may not be relevant). First, it seems that one instance in the ontology is associated with multiple tokens in the text. Second, an instance or class in the ontology can be associated with a token that is intended to have some similar meaning — e.g. the individual IBM in the ontology is associated by annotation with every token of IBM in the text, and similarly for the classes. Neither of these makes sense to me in terms of what ontologies are intended to represent, which is a state of knowledge (the fixed concepts, object and data properties, and individuals) about a domain. On the first point, how can I be assured that the intended meaning of tokens is the same throughout the corpus? In one document, we might find IBM as the name of a non-existent company, in another as the name of an existing company, and in another as the name of a company that has gone bankrupt. Simply put, the string might remain the same, but the knowledge we have about it may vary. Ontologies (as they are currently represented) do not allow such dynamic interpretation. To ignore this point risks having annotations (and whatever might flow from the annotations) slip; for example, it would be wrong to find a relationship between IBM and owners where the company doesn’t exist. On the second point, conceptually it makes no sense to say that a token “organisation” is itself associated with the concept or instance ‘organisation’ in the ontology. Of course, in developing the ontology, going from the text to the ontology makes good sense, since one is abstracting from the text to the ontology. Yet, in that move, one makes something different — a concept over all the “ideas” drawn from the tokens.
So, I disagree emphatically with Peters and Maynard (from the NeOn article): “Texts are annotated with ontology classes, and the textual elements function as instances of these classes.” The textual element “organisation” or “IBM” is an instance of the concept organisation or the individual IBM? I think this is a category mistake.
In general, I find the relationship between the text, intermediate representations (gazetteers), and ontologies (higher-level representations of knowledge) rather interesting, but somewhat murky. As I said earlier, perhaps this is just philosophy. Depending on the domain of discussion, the corpus, and the way the annotations and ontologies are used, perhaps my intuition of lurking trouble will not be realised. Equally, there is likely something simple that I’m missing. If so, please enlighten me.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

London's DataStore Workshop

Today I attended a workshop organised by the Greater London Authority (GLA), which is the citywide government for London. The workshop was held at City Hall on the top floor where we had a splendid view over the Thames, of Tower Bridge, and the Tower of London.
The GLA is in the process of scoping a datastore for information about London. The objective is to begin to encourage development of “government 2.0” using open government data, along the lines of what has been done in San Francisco in the US (see an article on DataSF and a post by San Francisco Mayor Gavin Newsom). The principal idea is that by putting data of public interest into the public domain, the government can provide the basis for development of applications and services for the government, business community, and public. For example, using police data, one can generate crime maps.
At the GLA meeting, the objective was to meet with the developer community to get ideas and feedback on what and how the data should be released as well as how best to encourage applications in the near future.
Clearly, the GLA meeting is along the lines of what is happening elsewhere in the UK government (see Digital Engagement at the Cabinet Office, the Office of Public Sector Information, and The Stationery Office).
There were some 70 participants at the meeting, and we can look forward to further information coming from the organisers at the GLA. Some very useful suggestions were made about where to get further information, such as the Technology Strategy Board, which supports technology development in the UK.
Among the topics of discussion were:

  • What sort of data should be released and in what form? There were those who wanted it raw and those who wanted it structured. Likely it will be released in both forms.
  • How to get licensing for the data? There are a host of difficult issues here, as most of the data is owned or copyrighted by a range of organisations, each of whom wants to control the flow of information, profit from it, or has concerns about security/liability. Moreover, the information service providers that the government contracts to process the data may have some legal claim. Such providers may be required to make their data open.
  • How would the data be used? There were many suggestions about data reuse and mash up, mostly along the lines of existing applications such as mapping data to physical maps in order to get ideas about what is happening where in neighborhoods, transportation assistance, information access in a local area, and so on.
  • Who would develop the applications and how would development be funded? Clearly there is an issue about funding, but some of the ways around it are to leverage funding between academic, government, and business communities.
  • Who to consult about applications? A range of parties might be consulted about what they would find useful, from the person on the street, to members of service organisations (police, licensing, etc), to higher-level government organisations. Alternatively, the GLA or similar organisations might develop applications which they thought would be useful, then provide them to the public. Here, the focus would be on small, manageable, pilot projects to show proof of concept.

The discussion was very nicely organised and led — several large tables around a circle, several large monitors, several boards for writing, an MC who kept things moving along, and a good overall atmosphere. However, missing were a list of participants, contact information, and a short (three sentence) statement of interest; hopefully, all this will appear soon.
This is all very interesting and exciting, for things are just beginning to happen. However, I have some concerns and realise I have a somewhat different focus.

  • There were too few participants from government services and academia. This is also reflected in the gap between the technology and the data, since it is highly unclear who is developing what, for whom, and for what purpose. It is hard to get a handle on how data should be served without some sense of goals. Nonetheless, likely there will be another meet-up at which this more substantive discussion will happen.
  • There was only passing mention of ontologies, annotation, information extraction, and the semantic web. The absence of semantic web concepts suggests that “reasoning” and complex information management is not high on the agenda. This is consistent with the family of application ideas (graphs, maps, local information). While I was told that there were ontologies via OPSI, this is not what I understood from John Sheridan in my recent discussion, so I will be eager to see exactly what this is.
  • Similarly, there was some discussion about whether there should be or could be standards and schemas for the data. These are always compatible (make both available), but I see standards as essential for any communication across agencies and localities. There are drawbacks to standards development, but the issue will arise sooner or later in any case.
  • There is, as yet, no pitch. In other words, what is the incentive for anyone to make their data available or for organisations to otherwise cooperate with this endeavour? The only mention of incentive was government obligation, but this is perhaps the most heavy-handed way to make headway. Rather, there should be positive incentives. In addition, the eGovernment agenda should be pushed (e.g. transparency, support for government, participation, efficiency, cost reduction, consistency).
  • There was little discussion of exactly which technologies would be used though RDF/XML and REST were mentioned. These are generic and widespread; is the real hangup right now data access, or is there some technological issue? If I wanted to know what I should need to know to program and provide a simple service, what would I have to know and do?
  • Despite the widespread interest in government 2.0, there is little vertical/horizontal integration or communication among the interested parties. There is not, apparently, a coherent website or ‘state of the art’ article with links to the relevant data/functionalities/support organisations.
  • There was no over-arching conception of design or context for applications. Likely some sort of ‘apps’ or plugins framework will emerge so that, for example, a local council would build none of its own applications or services; these would instead be provided as plugins by independent providers, yet given a consistent style and structure.
  • Though there are claims that there have been consultations about government 2.0 with the various interested parties, there is no clear presentation of the results of those consultations. A ‘brain-storming’ site would be very useful.
  • It is unclear to me the extent to which the participants have the political/social context in mind. While we were hosted by the GLA and discussed GLA data, the opportunities, limitations, requirements, and objectives of government seem to have been entirely overlooked. For example, government is successful (not always, but often) at making and monitoring standards for the public good; as elsewhere, why not here? The requirements of government information provision are different from those of commercial provision, especially since the government provides goods and services that would not otherwise be profitable. The government does consult with the public and interested parties in making policy, but in some cases it is crucial that government lead and direct developments; the government is not simply another commercial provider of goods and services, driven by consumer interests; the government has a legislative role. Keeping this in mind may change the sorts of proposals that come out of open GLA data.
  • There were several discussions about why and how government data should be published. The main points ought to be developed, discussed further, and summarised. Yet, it ought also be pointed out that there is, in the UK, an abundance of information that the government holds about individuals; it is unclear how a ‘firewall’ to protect and promote civil liberties will be set up and maintained; privacy and rights are in fact rather weak in the UK. For example, the NHS is state funded and one might argue certain matters are in the public interest, so open information issues may arise here: will we have ‘disease’ mashups such as there are for broken lamp-posts, but in this case for drug addicts, HIV carriers, swine flu, etc?
  • I am particularly interested in legal reasoning, but this is not something on the agenda with respect to this data.

In any case, there is much of interest here and much to look forward to.
Cheers,
Adam Wyner
Copyright © 2009 Adam Wyner

New publication in AAAI Symposium

My colleagues and I have a paper forthcoming in the proceedings of the AAAI Fall Symposium (November 2009), The Uses of Computational Argumentation. Trevor will have the honours of making the presentation at the symposium. Below please find a link to the paper and an abstract.
Cheers,
Adam Wyner
Instantiating Knowledge Bases in Abstract Argumentation Frameworks
Adam Wyner
University College London
Trevor Bench-Capon and Paul Dunne
University of Liverpool
Abstract
Abstract Argumentation Frameworks (AFs) provide a fruitful basis for exploring issues of defeasible reasoning. Their power largely derives from the abstract nature of the arguments within the framework, where arguments are atomic nodes in an undifferentiated relation of attack. This abstraction conceals different conceptions of argument, and concrete instantiations encounter difficulties as a result of conflating these conceptions. We distinguish three distinct senses of the term. We provide an approach to instantiating AFs in which the nodes are restricted to literals and rules, encoding the underlying theory directly. Arguments, in each of the three senses, then emerge from this framework as distinctive structures of nodes and paths. Our framework retains the theoretical and computational benefits of an abstract AF, while keeping notions distinct which are conflated in other approaches to instantiation.

Text Mining Legal Resources with GATE — Study 1

This page reports the results of a first study of applying GATE to a legal resource. The focus of this study was to annotate a list of cases.
I used a web page from BAILII which contains a list of cases with the following information:

  • The first party in a case, e.g. Meade.
  • The second party in a case, e.g. Mason.
  • The citation date, e.g. [1999]
  • The court level in which the case was decided, e.g. England and Wales Court of Appeal.
  • The court within the level, e.g. Civil.
  • The citation number, e.g. 780.
  • The date of the decision, e.g. 12 February 1999.

A sample of entries from the page I worked with is:
McSpadden v Keen [1999] EWCA Civ 1515 (27 May 1999)
McTaggart, R v [1997] EWCA Crim 3050 (24th November, 1997)
McTaggart, R v [1997] EWCA Crim 3137 (2nd December, 1997)
McVeigh & Anor, R v [1998] EWCA Crim 784 (3rd March, 1998)
McWhirter & Anor, R (on the application of) v Secretary of State for Foreign and Commonwealth Affairs [2003] EWCA Civ 384 (05 March 2003)
M-D v D [2008] EWHC 1929 (Fam) (19 December 2008)
MD (Guinea) v Secretary of State for the Home Department [2009] EWCA Civ 733 (17 June 2009)
MD (Iran) v Secretary of State for the Home Department [2007] EWCA Civ 532 (27 April 2007)
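The regular entries above can be captured with a simple sketch (Python here rather than JAPE, and the field names are my own). The pattern handles only the common “First v Second [year] COURT Sub number (date)” form; the other variants, such as “McTaggart, R v …” or “[2008] EWHC 1929 (Fam)”, need extra rules:

```python
import re

# Sketch of parsing a BAILII-style case entry into its components:
# first party, second party, citation year, court level, court within
# the level, citation number, and decision date.
CASE = re.compile(
    r"^(?P<first>.+?)\s+v\.?\s+(?P<second>.+?)\s+"
    r"\[(?P<year>\d{4})\]\s+(?P<court>EW\w+)\s+(?P<subcourt>[A-Za-z]+)\s+"
    r"(?P<number>\d+)\s+\((?P<date>[^)]+)\)$"
)

m = CASE.match("McSpadden v Keen [1999] EWCA Civ 1515 (27 May 1999)")
print(m.groupdict())
```

The lazy quantifiers on the party names keep the split at the first “v”, and `v\.?` accepts both “v” and “v.”, one of the irregularities noted below.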
Below, we have a screenshot of the result of annotation in GATE. The parts of the annotation are colour coded as appear in the column on the right. In Firefox, one can right click on the image, then View Image in order to view a larger version, then click the back button on the browser to return to the post.
GATE annotations on a list of legal case information
There were a range of irregularities in the source which had to be accommodated:

  • v and v. for the versus relation.
  • Decision date formats.
  • Length of the names of the parties.
  • Different orders of court and court level.
  • Variations that arise as a consequence of using a page stripped of HTML annotations. The first name in the image is an artifact.

In this approach, I did not annotate the parties as plaintiff and defendant, as the case decisions themselves associate the parties with different roles in different court contexts; our approach is more general. In consideration of the variants among case citations, I opted to identify each piece of the citation, which will allow one to extract and reconstruct the citation in subsequent work.
While a small scale and relatively simple task, the result has one main strength — it gives us a list of parties to cases. It is difficult to automatically identify parties in general, but with this approach, we can extract those entities which have been involved in a case, then use that information for subsequent annotation tasks. Another strength is that we have isolated the components of the case citation, which can then be reconstructed as we wish.
The list of parties could be further refined by isolating last names, distinguishing among parties which appear in a list, differentiating persons from organisations, and filtering out additional information that appears. This is left for future work.
The Case Base List zip file contains the following files, which were used with GATE.

  • ew-cases-0133.html, which is the HTML file that lists the cases.
  • ew-cases-0133SHORT.xml, which is the XML file with the result of annotation. This file relates to the graphic above. The file is a short version of ew-cases-0133.html so that one can more easily see the results of the annotation. These appear as stand-off annotations. In the first part of the file, one can see the tokens of the file with numerical ranges (node numbers); later in the file, one can see indications of the annotations, making reference to the starting and ending numbers of each token.
  • GraphicListAnnotation.png, the graphic above.
  • CiteYear.jape, this annotates the citation year for use in the citation, as in [1998].
  • Courts_abbr.jape, this annotates the court level in terms of abbreviations, as in EWCA, which is the England and Wales Court of Appeal.
  • dateAWynerMods.jape, this annotates the decision date such as (23rd June, 2001) and (21 July 2000).
  • FirstParty.jape, this annotates the first party, which is that party to the left of versus.
  • SecondParty.jape, this annotates the second party, which is that party to the right of versus.
  • SubCourts_abbr.jape, this annotates the courts within a court level such as civil courts (Civ) and criminal courts (Crim).
  • Versus.jape, this annotates the versus divider.
  • england_wales_courts_hierarchy.lst, this is a list of courts in England and Wales.
  • england_wales_courts_hierarchy_abbr.lst, this is a list of abbreviations for the courts in England and Wales.
  • england_wales_courts_subclass.lst, this is a list of the divisions within a court level.
  • england_wales_courts_subclass_abbr.lst, this is a list of abbreviations of courts within a court level.
  • cite_year.lst, this is a list of years with square brackets as in [1999]. Perhaps a rule can be written for this, taking into account the brackets.
  • list.def, the ‘master list’ of lists for use in GATE.
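For readers unfamiliar with GATE’s gazetteer lists, the pieces fit together roughly as follows. The lines below are an illustrative reconstruction based on GATE’s documented conventions, not the actual contents of the zip: each line of the ‘master list’ list.def names a list file and its majorType (an optional minorType can follow after a further colon):

```
cite_year.lst:cite_year
england_wales_courts_hierarchy_abbr.lst:court_abbr
```

Each .lst file then contains one entry per line, so cite_year.lst would hold entries such as [1998] and [1999].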

The files are released under a Creative Commons Attribution-ShareAlike licence. The main objective of the contribution is to foster open, public, and collaborative development of text mining tools for legal resources.
Advice, suggestions, alternatives, and contributions along the lines of this work are very welcome.
Cheers,
Adam
Copyright © 2009 Adam Wyner

Meeting with John Sheridan on the Semantic Web and Public Administration

I met today with John Sheridan, Head of e-Services, Office of Public Sector Information, The National Archives, located at the Ministry of Justice, London, UK. Also at the meeting was John’s colleague Clare Allison. John and I had met at the ICAIL conference in Barcelona, where we briefly discussed our interests in applications of Semantic Web technologies to legal informatics in the public sector. Recently, John got back in contact to talk further about how we might develop projects in this area.
Perhaps most striking to me is that John made it clear that the government (at least his sector) is proactive, looking for research and development projects that make government data available and usable in a variety of ways. In addition, he wanted to develop a range of collaborations to better understand the opportunities the Semantic Web may offer.
As part of catching up with what is going on, I took a look around the web for relatively recent documents on related activities.

In our discussion, John gave me an overview of the current state of affairs in public access to legislation, in particular the legislative markup and API. The markup is intended to support publication, revision, and maintenance of legislation, among other possibilities. We also had some discussion about developing an ontology of government which would be linked to legislation.
Another interesting dimension is that John’s office is one of the few I know of which are actively engaged in developing a knowledge economy, partly encouraged by public administrative requirements and goals. Others in this area are the Dutch and the US (with xml.gov). All very promising, and discussions well worth following up on.
Copyright © 2009 Adam Wyner

Session I of "Automated Content Analysis and the Law" Workshop

Today is session I of the NSF sponsored workshop on Automated Content Analysis and the Law. The theme of today’s meeting is the state of judicial/legal scholarship in order to:

  • Identify the theoretical and substantive puzzles in legal and judicial scholarship which might benefit from automated content analysis
  • Discuss the kinds of data/measures that are required to address these puzzles which automated content analysis could provide.

Further comments later in the day after the session.
–Adam Wyner
Copyright © 2009 Adam Wyner

London GATE Users Group

At the recent GATE Summer School in Sheffield, there was some discussion among people from London to form an occasional, informal users group where GATE users based in London can arrange to meet to go over tutorials, develop tutorials, discuss how we work with GATE, help one another with problems, and generally have a bit of a blab over tea with others who have similar interests.
As the informal organiser of this informal group, I thought my blog (which touches on topics related to text analytics) might be an acceptable place to announce and maintain the group. If things really get going, then perhaps the group will hive off to its own site.
I would like to suggest Thursday, August 20 in the early evening (e.g. 19:00) as our first meeting time. Likely the meeting would be till 20:30. Place (somewhere in central London — Covent Garden/Leicester Square) to be announced. Please let me know if this time and vicinity suits you, as we are looking to have more than one person show up.
Likely people will bring laptops, but we’ll try to arrange a projector as well for public show and tell. If you have something you would like to discuss or show, that would be good, but we can always find something to do and discuss.
It is an open group, and if you would like to be kept informed of any upcoming meetings, please send an email to Adam Wyner (adam@wyner.info). Feel free also to join this blog as one way to keep in touch with this group.
The group currently has the following participants:

  • Dipti Garg (Fizzback)
  • Hercules Fisherman (Fizzback)
  • Adam Wyner (University College London)
  • Auhood Alfaries (Brunel University)
  • Helen Flatley (EqualMedia)
  • Gerhard Brey (King’s College London)
  • Daniel Elias (Hawk Ridge Capital Management)
  • Renato Souza (Universidade Federal de Minas Gerais)

We look forward to our first meeting and to hearing from other people who may be interested in working with GATE. Comments on this topic are very welcome.
Cheers!
Adam Wyner

Participating in One-Lex — Managing Legal Resources on the Semantic Web

Later this summer, I’ll be participating in the summer school Managing Legal Resources in the Semantic Web, September 7 to 12 in San Domenico di Fiesole (Florence, Italy). This program will focus on several aspects of legal document management:

  • Drafting methods, to improve the language and the structure of legislative texts
  • Legal XML standards, to improve the accessibility and interoperability of legal resources
  • Legal ontologies, to capture legal metadata and legal semantics
  • Formal representation of legal contents, to support legal reasoning and argumentation
  • Workflow models, to cope with the lifecycle of legal documentation

While I’m familiar with several of these areas, I’m using this opportunity to fill gaps in my knowledge of these key areas.

NSF sponsored workshop: Automated Content Analysis and the Law

I was invited to participate in an NSF-sponsored workshop, Automated Content Analysis and Law, August 3 and 4 at NSF HQ in Arlington, VA, organised by Georg Vanberg (UNC).
There are two sessions planned. The first session will focus on identifying the theoretical/substantive puzzles in legal and judicial scholarship that might benefit from automated content analysis as well as what data and measurements are required. For the second session, the focus is on the state of automated content analysis/natural language processing, exploring the extent to which current technology is relevant to providing results with respect to issues raised in the first session and what might be needed.
There is an interesting mix of people, with a strong emphasis on legal scholarship bearing on the US Supreme Court and opinion mining. I had an email exchange with Georg, the workshop organiser, about this, and we agree that attention ought to turn from the Supreme Court to lower levels of the legal system. I also suggested that participants consider some of the following points, which bear on the motives and objectives of these lines of research in terms of who is being served and how the data or conclusions would be used.
Questions for Discussion

  • What sorts of artifacts and technologies (if any) will emerge from the research?
  • How does the research relate to the Semantic Web?
  • What public service does the research provide or support?
  • How does this research relate to:
    • E-discovery
    • Textual legal case based reasoning
    • Legislative XML Markup
    • Other research communities e.g. ICAIL and JURIX

Participants

  • Scott Barclay (NSF) – Barclay@uamail.albany.edu
  • Cliff Carrubba (Emory) – ccarrub@emory.edu
  • Skyler Cranmer (UNC) – skylerc@email.unc.edu
  • Barry Friedman (NYU)- friedmab@juris.law.nyu.edu
  • Susan Haire (NSF) – shaire@nsf.gov
  • Lillian Lee (Cornell) – llee@cs.cornell.edu
  • Jimmy Lin (Maryland) – jimmylin@umd.edu
  • Stefanie Lindquist (Texas) – SLindquist@law.utexas.edu
  • Will Lowe (Nottingham) – will.lowe@nottingham.ac.uk
  • Andrew Martin (Wash U) – admartin@wustl.edu
  • Wendy Martinek (NSF) – wemartin@nsf.gov
  • Kevin McGuire (UNC) – kmcguire@unc.edu
  • Wayne McIntosh (Maryland) – wmcintosh@gvpt.umd.edu
  • Burt Monroe (Penn State) – blm24@psu.edu
  • Kevin Quinn (Harvard) – kevin_quinn@harvard.edu
  • Jonathan Slapin (Trinity College) – jonslapin@gmail.com
  • Jeff Staton (Emory) – jkstato@emory.edu
  • Georg Vanberg (UNC) – gvanberg@unc.edu
  • Adam Wyner (University College London) – adam@wyner.info

General Architecture for Text Engineering Summer School

Next week I’m attending a week long summer school on General Architecture for Text Engineering (GATE). GATE is an open-source and extensible toolkit for text mining, which has been used in a variety of areas. After having worked with people who had their “hands on” the tools, I decided it would better suit me to be able to work the material myself. I’ve been looking forward to this summer school for some time and am excited at the prospect of applying GATE tools to a DB of legal cases as well as developing an ontology.