Crowdsourced Legal Case Annotation

A study in online, collaborative legal informatics
Adam Wyner, University of Aberdeen
Wim Peters, University of Sheffield
Daniel Katz, Michigan State University
— Introduction —
This is an academic research study on legal informatics (information processing of the law). The study uses an online, collaborative tool to crowdsource the annotation of legal cases. The task is similar to legal professionals’ annotation of cases. The result will be a public corpus of searchable, richly annotated legal cases that can be further processed, analysed, or queried for conceptual annotations.
Adam and Wim are computer scientists who are interested in language, law, and the Internet. Dan is an academic lawyer also interested in law and the Internet.
We are inviting people to participate in this collaborative task. This is a beta version of the exercise, and we welcome comments on how to improve it. Please read through this blog post, look at the video, and get in contact.
— Highlighting, Annotations, and Legal Case Briefs —
In reading, analysing, and preparing a summary of a legal case, law students and legal professionals annotate cases by highlighting and colour coding elements of the case to make for easy identification. Different elements are annotated: the holding, the parties, the facts, and so on. A sample image of annotations is:

Annotations for Case Citations, Legal Roles, Jurisdiction, Hearing Date

— Problem —
To analyse a legal case, legal professionals annotate the case into its constituent parts. The analysis is summarised in a case brief. However, the current approach is very limited:

  • Analysis is time-consuming and knowledge-intensive.
  • Case briefs may miss relevant information.
  • Case analyses and briefs are privately held.
  • Case analyses are in paper form, so they are not searchable over the Internet.
  • Current search tools are for text strings, not conceptual information. We want to search for concepts, such as the holdings by a particular judge with respect to causes of action against a particular defendant. With annotated legal cases, we can enable conceptual search.
  • There is no capacity to systematically compare, contrast, and evaluate the work by different annotators. Consequently, the annotation task itself is not used as an opportunity to gain greater expertise in case analysis.
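A minimal sketch of what conceptual search over annotated cases could look like, assuming each annotation is stored as a record carrying its type, feature, and text. The `Annotation` record, the `search` and `holdings_by_judge` helpers, and the sample data are hypothetical illustrations; they are not part of GATE Teamware or of the study's corpus.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Annotation:
    case_id: str   # hypothetical identifier for the annotated case
    ann_type: str  # annotation type, e.g. "Reasoning Outcomes"
    feature: str   # annotation feature, e.g. "Holding"
    text: str      # the highlighted text


def search(annotations: Iterable[Annotation],
           ann_type: Optional[str] = None,
           feature: Optional[str] = None,
           contains: Optional[str] = None) -> List[Annotation]:
    """Return the annotations that match every criterion supplied."""
    results = []
    for a in annotations:
        if ann_type is not None and a.ann_type != ann_type:
            continue
        if feature is not None and a.feature != feature:
            continue
        if contains is not None and contains.lower() not in a.text.lower():
            continue
        results.append(a)
    return results


def holdings_by_judge(annotations: List[Annotation], judge: str) -> List[Annotation]:
    """Holdings from cases whose Judge Name annotation mentions `judge`."""
    cases = {a.case_id
             for a in search(annotations, feature="Judge Name", contains=judge)}
    return [a for a in search(annotations, feature="Holding")
            if a.case_id in cases]


# Invented sample data, for illustration only.
SAMPLE = [
    Annotation("case-1", "Indexes", "Judge Name", "Judge Smith"),
    Annotation("case-1", "Reasoning Outcomes", "Holding",
               "A lease renewal requires written notice."),
    Annotation("case-2", "Indexes", "Judge Name", "Judge Jones"),
    Annotation("case-2", "Reasoning Outcomes", "Holding",
               "An oral promise may be enforceable."),
]

for holding in holdings_by_judge(SAMPLE, "smith"):
    print(holding.case_id, "-", holding.text)
```

A plain string search for "Smith" would return every mention of the name; the query here uses the annotations to return only holdings from cases in which Smith is annotated as a judge.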
    — Solution: Crowdsource Annotation —
    We use an online legal case annotation tool and share the results to support:

  • Online search in legal cases for case details and concepts.
  • Semantic web applications and information extraction.
  • Crowdsourcing a legal case corpus.
  • Training and learning for legal case analysis.
    The results of the study would be useful to:

  • Law school students learning case analysis.
  • Legal professionals in identifying relevant cases.
  • Researchers of legal informatics.
  • Law faculty in training students to analyse legal cases.
    Broadly speaking, a corpus of analysed cases makes case law a public resource that democratises legal knowledge.
    — Annotations: types and features —
    To crowdsource conceptual annotations of legal cases, we use the General Architecture for Text Engineering (GATE) Teamware tool. Teamware is a web-based application that provides an annotator with a text to annotate and a list of annotations to use. The task is a web-based version of what legal analysts of cases already do.
    We use familiar annotations for legal cases, divided (for ease of reference) into types and features. For example, we have a type Legal Roles and various features to select among, e.g. defendant. We are counting on you to have learned and used these annotations in the course of your legal study and practice.
    You do not need to memorise the types and features, as they will appear in the GATE Teamware tool. It may be handy to keep this webpage open for reference, or to print it out.
    The annotations we use are:
    Argument For Party – arguments for a particular party, using the most general notion:

  • for Appellee, for Appellant, for Defendant, for Plaintiff.
    Facts – legal and procedural facts:

  • Cause of Action – the specific legal theory upon which the plaintiff brings the suit.
  • Defenses raised by Defendant – the defendant’s defenses against the cause of action.
  • Legal Facts – the legally relevant facts of the case that are used in arguing the issues.
  • Remedy requested by Plaintiff – what the plaintiff asks the court to grant.
    Indexes – various indicative information:

  • Case Citation – the citation of the particular case being annotated.
  • Court Address – the address of the court.
  • Hearing Date – the date of the hearing.
  • Judge Name – the names of the judges, each annotated separately.
  • Jurisdiction – the legal jurisdiction of the case.
    Issues – the issues before the court:

  • Procedural Issue – what the appellant claims that the lower court did wrong.
  • Substantive Issue – the point of law that is in dispute (legal facts have their own annotation).
    Legal Roles – the role of the parties in the case:

  • Appellee, Appellee’s Lawyer, Appellant, Appellant’s Lawyer, Defendant, Defendant’s Lawyer, Plaintiff, Plaintiff’s Lawyer.
  • General – buyer/seller, employer/employee, landlord/tenant, etc.
    Other – relevant information not covered by the other annotations.
    Procedural History – the disposition of the case with respect to the lower court(s):

  • Appeal Information – who appealed and why they appealed.
  • Damages – the damages awarded by the lower court.
  • Lower Court Decision – the lower court’s decision.
    Reasoning Outcomes – various parts of the legal decision:

  • Concurring Opinion.
  • Dicta – commentary about the judgement and holding, but not part of the rationale.
  • Dissenting Opinion.
  • Holding – the rule of law or legal principle that was applied in making the judgement. You can think about this as the new ground that the court is covering in this case. What legal rule(s) is the court developing or clarifying? The case can have more than one holding if there is more than one legal rule being considered. Note that a holding from a cited precedent is to be considered part of the rationale.
  • Judgement – given the holding and the corresponding rationale for the holding, the judgement is the court’s final decision about the rights of the parties, the court’s response to a party’s request for relief, and its bearing on prior decisions (e.g. affirmed, reversed, remanded, etc.).
  • Rationale – the court’s analysis of the issues and the reasons for the holding.
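The types and features listed above can be written down as a simple lookup table. This is only a sketch of the scheme as described in this post, assuming that Other is a type with no features of its own; the names `ANNOTATION_SCHEMA` and `is_valid` are hypothetical and do not reflect how GATE Teamware stores its configuration.

```python
from typing import Dict, List, Optional

# Annotation types mapped to their features, as listed in this post.
ANNOTATION_SCHEMA: Dict[str, List[str]] = {
    "Argument For Party": [
        "for Appellee", "for Appellant", "for Defendant", "for Plaintiff",
    ],
    "Facts": [
        "Cause of Action", "Defenses raised by Defendant",
        "Legal Facts", "Remedy requested by Plaintiff",
    ],
    "Indexes": [
        "Case Citation", "Court Address", "Hearing Date",
        "Judge Name", "Jurisdiction",
    ],
    "Issues": ["Procedural Issue", "Substantive Issue"],
    "Legal Roles": [
        "Appellee", "Appellee's Lawyer", "Appellant", "Appellant's Lawyer",
        "Defendant", "Defendant's Lawyer", "Plaintiff", "Plaintiff's Lawyer",
        "General",
    ],
    "Other": [],  # assumption: a type without features
    "Procedural History": [
        "Appeal Information", "Damages", "Lower Court Decision",
    ],
    "Reasoning Outcomes": [
        "Concurring Opinion", "Dicta", "Dissenting Opinion",
        "Holding", "Judgement", "Rationale",
    ],
}


def is_valid(ann_type: str, feature: Optional[str] = None) -> bool:
    """Check that a type exists and that the feature belongs to it.

    A feature may be omitted only for types that define no features.
    """
    if ann_type not in ANNOTATION_SCHEMA:
        return False
    features = ANNOTATION_SCHEMA[ann_type]
    if feature is None:
        return not features
    return feature in features


print(is_valid("Legal Roles", "Defendant"))
print(is_valid("Facts", "Holding"))
```

The first check succeeds because Defendant is a feature of Legal Roles; the second fails because Holding belongs to Reasoning Outcomes, not Facts.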
    — Strategic Phases —
    From previous experience and following discussions, we believe it is best if the annotations are grouped together and done in three phases. This allows the annotator to do simpler tasks first and to keep in mind a subset of the relevant annotations.

  • Phase I: Indexes and Legal Roles
  • Phase II: Procedural History and Reasoning Outcomes
  • Phase III: Facts and Issues
    For the time being, we are not attending to annotations of Argument For Party and Other.
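As a small illustration, the phase grouping can be expressed as an ordered mapping from phase to annotation types. The names `PHASES`, `DEFERRED`, and `phase_of` are hypothetical and serve only to restate the grouping above.

```python
from typing import Dict, List, Optional

# Phases in the order they are carried out, with the types handled in each.
PHASES: Dict[str, List[str]] = {
    "Phase I": ["Indexes", "Legal Roles"],
    "Phase II": ["Procedural History", "Reasoning Outcomes"],
    "Phase III": ["Facts", "Issues"],
}

# Types that are not being annotated for the time being.
DEFERRED: List[str] = ["Argument For Party", "Other"]


def phase_of(ann_type: str) -> Optional[str]:
    """Return the phase in which a type is annotated, or None if deferred or unknown."""
    for phase, types in PHASES.items():
        if ann_type in types:
            return phase
    return None


print(phase_of("Facts"))
print(phase_of("Other"))
```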
    — Collaborate —
    Take a look at the instructional video below. If you wish to collaborate on the task, send an email to Adam Wyner –
    In the email, please include brief information for:

  • Your name
  • Your professional affiliation, e.g. institution, company, firm…
  • Your role where you work
  • Your background as a legal professional
    This will help us know who we are collaborating with; from the pool of candidates, we will select participants for this early study.
    You will be sent a user name and password so you can login to Teamware.
    We respect your privacy. We are only interested in data in the aggregate and will not reveal any personal data to third parties.
    — Next —
    We have an instructional video that you can open in a new tab or window and that uses QuickTime. It lasts about 14 minutes. This will give you a good idea of what you will be doing. The presenter is Adam Wyner. You can see this here:

    Or follow the link on YouTube — Crowdsourcing Legal Case Annotation Instructional Video. Please view in a large (ok definition) or full screen (grainy definition) mode, which may need to be reloaded in YouTube.
    There are additional points about using the tool in the section below on questions, problems, and observations.
    After reading this blog, viewing the instructional video, and receiving your username and password, you can login to begin annotating at — GATE Teamware
    — Survey —
    When you are done with your task, please answer the questions on the survey to give us feedback on your experience using the annotation tool. The survey is available below. You can scroll down and answer the questions. Don’t forget to hit the “Done” button to submit your responses, which will be very useful in helping us understand your experience and thoughts about using the tool:

    — What Then? —
    We analyse the annotations from several annotators, comparing and contrasting them (interannotator agreement). This will show us similarities and differences in the understanding of the annotations and cases. The results will also help us develop a Gold Standard Corpus of legal cases, that is, a set of case annotations that annotators agree on. A Gold Standard is essential for information extraction and the development of advanced processing. We will publicly report the analysis of the exercise and make the annotated cases publicly available for re-use.
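To make the idea of interannotator agreement concrete, here is a minimal sketch. It assumes that two annotators each assign exactly one label to the same sequence of units (say, sentences), and it uses Cohen's kappa as the agreement measure. Both the unit of comparison and the choice of kappa are assumptions made for illustration; the post does not say which measure the study uses. The functions `cohen_kappa` and `gold_standard` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty units")
    n = len(labels_a)
    # Proportion of units on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def gold_standard(labelings: List[Sequence[str]]) -> Dict[int, str]:
    """Keep only the units that every annotator labelled identically."""
    return {
        index: labels[0]
        for index, labels in enumerate(zip(*labelings))
        if len(set(labels)) == 1
    }


# Invented labels for four sentences, for illustration only.
annotator_1 = ["Legal Facts", "Rationale", "Holding", "Rationale"]
annotator_2 = ["Legal Facts", "Rationale", "Rationale", "Rationale"]

print(round(cohen_kappa(annotator_1, annotator_2), 3))
print(gold_standard([annotator_1, annotator_2]))
```

In this invented example the annotators agree on three of the four sentences, so only those three would enter a gold standard built by strict agreement; the disputed sentence is the kind of case that would call for discussion.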
    Once we have a better sense of how this study goes, we plan to roll out a larger version with more cases. And this is only the start….
    — Questions, Problems, and Observations —
    Thanks to participants for letting us know about their problems and sending their observations.
    How easy is it to learn to use the tool? Take a look at the video to get a sense of this. With a little bit of practice, it is rather straightforward.
    What if I don’t agree with some of your annotations or features? Write a comment or send us an email, and we will consider your comment. Try to be as clear and specific as you can. We are not lawyers, and we are dealing with a global community with local variation, so it is likely there will be some disagreement and variation.
    Can I get the results of my annotations? Our approach is to make individual contributions to the whole. So, you will be able to access annotated cases after the exercise. There will be further information on how to work with the material.
    How many cases must I do? You can do one or you can do as many as we have (not many in the beta project).
    How much time will it take? About as long as it would take you to do a similar highlighting and annotation task with paper and markers.
    What if I have a problem with using the tool or if the tool is buggy? Be patient and try to work with the tool. Sometimes things go wrong. Write a comment or send us an email, and we will try to advise. Note – we are only consumers of GATE Teamware, so are not responsible for the system.
    How thoroughly should I annotate the cases? The more cases that are annotated fully and accurately, the better. Apply the same diligence as you would to thoroughly and carefully analyse cases with pen and paper. As you will be the beneficiary of the work of others, so too should you work to benefit them.
    Do we track good annotators and bad annotators? We are interested in data in the aggregate, and are only interested in interannotator agreement and disagreement. This information will help us better understand differences in how the cases are understood and annotated. But, we can see how much time each person takes with each annotation task and measure how they perform against other annotators or a gold standard. If we have bad annotators, we will see this in the results; we would contact the annotator and see how best to improve the situation. As we noted above, we are not sharing information with third parties.
    I cannot login with the username and password. Please let me know if you have this problem, and I will look into it.
    I can login, but I cannot get the Java Web Start file to start. This is a tough problem to address over the internet. Some people have no problem, but some do. Please let me know if you have this problem. Do check that you have followed the instructions (on the blog and in the video).
    I can login and start the annotation tool, but I cannot get the task. Please let me know, and I will look into it.
    The text is too small and single spaced. At the moment, there is nothing we can do about this. We’ll try to keep this in mind for the future.
    The highlighting tool is not easy to use. When I want to move from one annotated text to some new text, the tool doesn’t move to the new text. This is a bit of a problem with the tool, which is not entirely reliable in its functionality. Try to play around with this to see what works for you. One strategy that I have found improves performance is to annotate something. The annotation type then appears among the list of annotations in the upper right-hand corner window. When the problem occurs, it sometimes helps to toggle the annotations in that window off and on. This seems to clear the system a bit so that one can go on to the next annotation. Give this a try. If you have problems, please let me know.
    I found it very challenging. It is important for us to know this so that we can gauge how much text and how many kinds of annotations to present. We might reduce the number of annotations, breaking up the whole set into parts of the overall task.
    Decision date is more important than hearing date, or at least should be provided in addition to hearing date. Probably this will be added to future iterations.
    A participant, e.g. “Cone”, was originally a defendant, but was dismissed out before this appeal. I wonder if he should still be coded as “Defendant” or if he should be coded as an other role-holder. Good observation. I’ll have to consult with some lawyers further about this point.
    There are sentences where the court introduced a fact and also appeared to reason using it. Is it right to code the whole sentence both as a legal fact and as a rationale? Yes, this is the way to handle this. Double annotations are always possible.
    A similar problem occurred where the court offered a fact but also put a gloss on it as to its legal significance. Double annotations are always possible.
    Some of the names of the categories were confusing or unclear. For example, using “Holding” for the name of the legal rule or principle was confusing (“Legal Rule” might be more intuitive). This is another point that we will need to consult further with other lawyers. There may also be some variation in terminology.
    It is sometimes unclear who holds which role. A case involved a plaintiff, who was an appellee but also a cross-appellant, and a defendant who was thus an appellant and cross-appellee. These can be coded where one is plaintiff and appellee and the other defendant and appellant. But, they could have both been coded as appellee and appellant, given the existence of the cross appeal. Double (or more) annotating is fine.
    Procedural History/Damages might be better framed as Procedural History/Remedies, as courts often provide injunctive relief or, as in this case, an accounting, as a remedy. This is another point that we will need to consult further with lawyers about terminology.
    What if a case does not state any legal rules? Can implicit legal rules be annotated? For example, where novelty and non-obviousness are a sine qua non of a valid patent, one would not have known to mark some of the sentences as rationales. This isn’t a problem. If something is not in the case, then it is not annotated. We are not (yet) concerned with implicit information. But, if you know the implicit information, then annotate it.
    How can I automatically search for and annotate the same string with the same annotation? In the instructional video, we wanted to keep the material short and to the point, so there are aspects of the annotation tool we did not cover. However, it is tedious to manually search for the same string and annotate it with the same annotation. Teamware’s Annotation Editor has a tool to support automatic search and annotation. To see how to do this, we have the video here:

    How should I annotate holdings which may appear as holdings in cited cases and as part of the procedural history, as holdings in the current case, or as part of the rationale in the current case? This is an interesting and subtle point for us, and we will have to have a full consultation with lawyers to decide. But, for the time being, there can be no harm in multiple annotations, which we can then look at and work with later.
    — Paper —
    If you are interested in some of the ideas behind this project, please see our paper:
    Semantic Annotations for Legal Text Processing using GATE Teamware
    The paper will appear in May 2012 in the Proceedings of the LREC Conference Workshop on Semantic Processing of Legal Texts, Istanbul, Turkey. The exercise here is a version of the exercise proposed in the paper.
    A shortlink to this blog page is:
    — Thanks for collaborating! —
    — If you have any questions, please submit a comment! —

    — Update Note —
    Updated July 29, 2013 to reflect Dan Katz’s amended definitions for Holding. Updated in various ways July 12, 2013. The previous blog post of July 28, 2012 has been updated to note the participation of Dan Katz and his students of Michigan State University.
    — Honour Roll —
    For the very first study, we would like to thank the following individuals who gave of their time and intelligence to carry out their tasks.

  • First
  • Second
    By Adam Wyner

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.