I have a new article in Legal Technology:
Text-Mining Case Law
This article focuses on text-mining in the case base.
The Text Retrieval Conference (TREC) is an annual workshop on text retrieval from large text collections. It started in 1992 and is sponsored by the National Institute of Standards and Technology, an agency of the US Commerce Department. In 2006, a legal track was added to the conference, and there have been annual tracks for the last three years.
The stated goal of the legal track is to develop text search technology to help lawyers discover information in digital document corpora. Papers from the track are published as part of the proceedings of TREC.
In the legal track, researchers are set a variety of tasks and topics, from which they choose where to apply their search techniques. Let’s consider one, the Interactive Task, which was introduced in 2008 and continued in 2009, and for which the 2008 task guidelines and topics are available.
For 2008, the task is to search for documents that relate to a topic, which is a single 16-page class action complaint alleging that a tobacco company committed fraud. The search runs over a population of nearly 7 million documents from the Legacy Tobacco Documents Library, a collection of legal case documents involving US tobacco companies. There is a wide range of document genres. The task aims to model realistically the way that lawyers develop and refine their searches in the course of the discovery phase of litigation; that is, participants must retrieve a set of documents ‘relevant’ to what ought to be discovered concerning the topic. In the discovery phase, the parties to the suit request material (documents and evidence) concerning the case; e-discovery is the discovery phase involving electronic documents. The task is intended to be more ‘realistic’ in that it allows participants to consult an expert so as to better define the set of documents that are relevant to the topic. Here ‘relevant’ means that the participants recover the same set of documents (from the set of documents available) that a lead litigating attorney would select; hence the interaction with an expert who helps define relevance. The success of the participants’ searches is measured in terms of recall, precision, and a ‘summary measure of effectiveness’.
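To make the evaluation measures concrete, here is a minimal sketch in Python of how recall and precision could be computed for one participant's result set. It rests on several assumptions of mine that are not in the task guidelines quoted above: the function name `evaluate` and the document identifiers are hypothetical, the F1 score (the harmonic mean of precision and recall) is used only as a stand-in for the ‘summary measure of effectiveness’, and the full set of relevant documents is taken as known, whereas in a collection of millions of documents relevance would in practice have to be estimated from samples.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, f1) for one retrieved set.

    retrieved: ids of the documents a participant returned
    relevant:  ids of the documents the expert would select
    F1 is an assumed stand-in for the 'summary measure of effectiveness'.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually found

    # Precision: what fraction of the returned documents is relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents was returned?
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of the two, zero when both are zero.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: four documents returned, five actually relevant.
returned_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d2", "d3", "d5", "d6", "d7"}
p, r, f = evaluate(returned_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The two measures pull in different directions: returning more documents tends to raise recall and lower precision, which is why a single summary figure is useful for comparing participants.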
Discovery is a key phase of litigation, concerning the identification of information that is important to the litigators in arguing the case. However, we may consider whether it is central to legal argument itself; the evidence discovered is used in arguing the case as evidence for one claim or another, but it is unclear how distinct this is from any sort of argument where evidence is crucial. For example, in a scientific context, one might argue that a certain protein functions in a certain way to impede cancer growth, then search in the document space for supporting evidence. In other words, there are questions concerning how the task in the Legal Track bears on specifically legal reasoning such as case-based reasoning, factor analysis, precedent, and grounding decisions in the law. A task that addressed such reasoning would be rather different from the current one, and a very worthwhile addition to the TREC Legal Track.
The TREC Legal Track is very closely related to the DESI workshops on e-discovery/e-disclosure, which are organised by many of the people involved in the TREC Legal Track.
Copyright © 2009 Adam Wyner
At Wired News, there is a long post about opening up government data under the Obama administration. Among other topics, it mentions making legal information more accessible.
Open Up Government Data
The discussion here is almost entirely focused on gaining access to the data, not how information can be or will be extracted from the data. The website is in a wiki format, which means that various people can contribute to the development of the site.