Research on Argumentation at the Leibniz Center for Law in Amsterdam

I have a three-month research position at the Leibniz Center for Law, University of Amsterdam, starting February 1, working with Tom van Engers. This is part of the IMPACT project:

IMPACT is an international project, partially funded by the European Commission under the 7th framework programme. It will conduct original research to develop and integrate formal, computational models of policy and arguments about policy, to facilitate deliberations about policy at a conceptual, language-independent level. To support the analysis of policy proposals in an inclusive way which respects the interests of all stakeholders, research on tools for reconstructing arguments from data resources distributed throughout the Internet will be conducted. The key problem is translation from these sources in natural language to formal argumentation structures, which will be input for automatic reasoning.

My role will be to set up a Ph.D. research project concerning the key problem. This is based on an unsuccessful larger research proposal that I made with Tom. I’ll be organising the database, the literature, some of the software, and outlining the approach the student would take. I’ll make notes on the progress as it happens.
I’m looking forward to living for a while in Amsterdam and working with Tom and my other colleagues at the center: Joost Breuker, Rinke Hoekstra, and Emile de Maat. The Netherlands also has a very lively argumentation theory community. As an added bonus, my colleagues from Linguistics, Susan Rothstein and Fred Landman, are in Amsterdam on sabbatical. It will be a very interesting and fun period.

Natural Language Processing Techniques for Managing Legal Resources on the Semantic Web — Tutorial Slides

I gave a tutorial on natural language processing for legal resource management at the International Conference on Legal Knowledge and Information Systems (JURIX) 2009 in Rotterdam, The Netherlands. The slides are available below. Comments welcome.
The following people attended:

  • Andras Forhecz, Budapest University of Technology and Economics, Hungary
  • Ales Gola, Ministry of Interior of Czech Republic
  • Harold Hoffman, University Krems, Austria
  • Czeslaw Jedrzejek, Poznan University of Technology, Poland
  • Manuel Maarek, INRIA Grenoble Rhône-Alpes, France
  • Michael Sonntag, Johannes Kepler University Linz, Austria
  • Vit Stastny, Ministry of Interior of Czech Republic

I thank the participants for their comments and look forward to continuing the discussions which we started in the tutorial.
The slides (2.2 MB) are at the link below. They were originally prepared in OpenOffice Impress and then converted to PowerPoint.
Natural Language Processing Techniques for Managing Legal Resources on the Semantic Web
There is a bit more in the slides than was presented at the tutorial, additionally covering ontologies, parsers, and semantic interpreters.
In the coming weeks, I will make available more detailed instructions as well as gazetteers and JAPE rules, and I plan to keep adding text mining materials.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Annotating Rules in Legislation

Over the last couple of months, I have had discussions about text mining and annotating rules in legislation with several people (John Sheridan of The Office of Public Sector Information, Richard Goodwin of The Stationery Office, and John Cyriac of Compliance Track). While nothing concrete has yet resulted from these discussions, it is clearly a “hot topic”.
In the course of these discussions, I prepared a short outline of the issues and approaches, which I present below. Comments, suggestions, and collaborations are welcome.
Vision, context, and objectives
One of the main visions of artificial intelligence and law has been to develop a legislative processing tool. Such a tool has several related objectives:

      [1.] To guide the drafter to write well-formed legal rules in natural language.
      [2.] To automatically parse and semantically represent the rules.
      [3.] To automatically identify and annotate the rules so that they can be extracted from a corpus of legislation for web-based applications.
      [4.] To enable inference, modeling, and consistency testing with respect to the rules.
      [5.] To reason with respect to domain knowledge (an ontology).
      [6.] To serve the rules on the web so that users can use natural language to input information and receive determinations.

While no such tool exists, there has been steady progress in understanding the problems and developing working software solutions. In early work (see The British Nationality Act as a logic program (1986)), an act was manually translated into a program, allowing one to draw inferences given ground facts. Haley is a software and services company that provides a framework partially addressing [1.], [2.], [4.], and [6.] (see Policy Automation). Some research addresses aspects of [3.] (see LKIF-Core Ontology). Finally, there are XML annotation schemas for legislation (and related input support) such as The Crown XML Schema for Legislation and Akoma Ntoso, both of which require manual input. Despite these advances, much progress is yet to be made. In particular, no results fulfill [3.].
In consideration of [3.], the primary objective of this proposal is to use the General Architecture for Text Engineering (GATE) framework to automatically identify and annotate legislative rules in a corpus. The annotation should support web-based applications and be consistent with semantic web markup for rules, e.g. RuleML. A subsidiary objective is to define an authoring template which can be used within existing authoring applications to manually annotate legislative rules.
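To give a concrete flavour of what objective [3.] involves, the sketch below flags sentences containing deontic cue words and wraps them in a placeholder element. It is only an illustration, not the proposed GATE-based system (which would use gazetteers and JAPE grammars over a full linguistic analysis); the cue list, the element name, and the class name are mine and merely illustrative.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    // Illustrative only: shallow, cue-based spotting of candidate legislative rules.
    public class RuleSpotter {

        // Deontic cue words that often signal a normative rule (an assumed, illustrative list).
        private static final Pattern DEONTIC_CUE = Pattern.compile(
                "\\b(shall|must|may not|is liable|commits)\\b", Pattern.CASE_INSENSITIVE);

        // Very crude sentence split on full stops; a real pipeline would use a sentence splitter.
        static List<String> splitSentences(String text) {
            List<String> sentences = new ArrayList<>();
            for (String s : text.split("(?<=\\.)\\s+")) {
                if (!s.trim().isEmpty()) {
                    sentences.add(s.trim());
                }
            }
            return sentences;
        }

        // Wraps candidate rule sentences in a placeholder element (not official RuleML vocabulary).
        static String annotate(String text) {
            StringBuilder out = new StringBuilder();
            for (String s : splitSentences(text)) {
                if (DEONTIC_CUE.matcher(s).find()) {
                    out.append("<LegislativeRule>").append(s).append("</LegislativeRule>\n");
                } else {
                    out.append(s).append("\n");
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            String sample = "A person commits an assault when, without lawful authority, "
                    + "he engages in conduct which places another in reasonable apprehension "
                    + "of receiving a battery. This Act may be cited as the Criminal Code.";
            System.out.println(annotate(sample));
        }
    }

Running the sketch wraps the first sentence (which contains the cue “commits”) and leaves the second untouched; the real system would, of course, replace the regular expressions with gazetteer lookups and grammar rules over tokenised, parsed text.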
Benefits
Attaining these objectives would:

  • Support automated creation, maintenance, and distribution of rule books for compliance.
  • Contribute to the development of a legislative processing tool.
  • Make legislative rules accessible for web-based applications. For example, given other annotations, one could identify rules that apply with respect to particular individuals in an organisation along with relevant dates, locations, etc.
  • Enable further processing of the rules such as removing formatting, parsing the content of the rules, and representing them semantically.
  • Allow an inference engine to be applied over the formalised rule base.
  • Make legislation more transparent and communicable among interested parties such as government departments, EU governments, and citizenry.

Scope
To attain the objectives, we propose the following phases:

  • Create a relatively small sample corpus to scope the study.
  • Manually identify the forms of legislative rules within the corpus.
  • Develop or adapt an annotation scheme for rules.
  • Apply the analysis tools of GATE and annotate the rules.
  • Validate that GATE annotates the rules as intended.
  • Apply the annotation system to a larger corpus of documents.

For each phase, we would produce a summary of results, noting where difficulties were encountered and how they might be addressed.
Extending the work
The work can be extended in a variety of ways:

  • Apply the GATE rules to a larger corpus with more variety of rule forms.
  • Process the rules for semantic representation and inference.
  • Take into consideration defeasibility and exceptions.
  • Develop semantic web applications for the rules.

By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Instructions for GATE's Onto Root Gazetteer

In this post, I present User Manual notes for GATE’s Onto Root Gazetteer (ORG) and references to ORG. In Discussion of GATE’s Onto Root Gazetteer, I discuss aspects of Onto Root Gazetteer which I found interesting or problematic. These notes and discussion may be of use to those researchers in legal informatics who are interested in text mining and annotation for the semantic web.
Thanks to Diana Maynard, Danica Damljanovic, Phil Gooch, and the GATE User Manual for comments and materials which I have liberally used. Errors rest with me (and please tell me where they are so I can fix them!).
Purpose
Onto Root Gazetteer links text to an ontology by creating Lookup annotations which come from the ontology rather than a default gazetteer. The ontology is preprocessed to produce a flexible, dynamic gazetteer; that is, it is a gazetteer which takes into account alternative morphological forms and can be added to. An important advantage is that text can be annotated as an individual of the ontology, thus facilitating the population of the ontology.
Besides flexibility and dynamism, ORG has some advantages over other gazetteers:

  • It is more richly structured (think of it as a gazetteer containing other gazetteers).
  • It allows one to relate textual and ontological information by adding instances.
  • It gives one richer annotations that can be used for further processes.

In the following, we present step-by-step instructions for ‘rolling your own’, then show the results of the ‘prepackaged’ example that comes with the plugin.
Setup
Step 1. Add (if not already used) the Onto Root Gazetteer plugin to GATE following the usual plugin instructions.
Step 2. Add (if not already used) the Ontology Tools (OWLIM Ontology LR, OntoGazetteer, GATE Ontology Editor, OAT) plugin. ORG uses ontologies, so one must have these tools to load them as language resources.
Step 3. Create (or load) an ontology with OWLIM (see the instructions on ontologies). This ontology is the language resource that Onto Root Gazetteer will use; suppose it is called myOntology. It is important to note that OWLIM can only use OWL-Lite ontologies (see the documentation about this). Also, I succeeded in loading an ontology only from the resources folder of the Ontology_Tools plugin (rather than from another drive); I don’t know if this is significant.
Step 4. In GATE, create processing resources with default parameters:

  • Document Reset PR
  • RegEx Sentence Splitter (or the ANNIE Sentence Splitter, though that one is likely to run slower)
  • ANNIE English Tokeniser
  • ANNIE POS Tagger
  • GATE Morphological Analyser

Step 5. When all these PRs are loaded, create an Onto Root Gazetteer PR and set its initialisation parameters. The mandatory ones are as follows (though some are filled in by default):

  • Ontology: select previously created myOntology
  • Tokeniser: select previously created Tokeniser
  • POSTagger: select previously created POS Tagger
  • Morpher: select previously created Morpher.

Step 6. Create another PR which is a Flexible Gazetteer. In the initialisation parameters, it is mandatory to select the previously created OntoRootGazetteer for gazetteerInst. For the other parameter, inputFeatureNames, click on the button on the right and, when prompted with a window, add ‘Token.root’ in the provided text box, then click the Add button. Click OK, give a name to the new PR (optional), and then click OK.
Step 7. To create an application, right click on Applications, then New –> Pipeline (or Corpus Pipeline). Add the following PRs to the application in this order:

  • Document Reset PR
  • RegEx Sentence Splitter
  • ANNIE English Tokeniser
  • ANNIE POS Tagger
  • GATE Morphological Analyser
  • Flexible Gazetteer

Step 8. Run the application over the selected corpus.
Step 9. Inspect the results. Look at the Annotation Set with Lookup and also the Annotation List to see how the annotations appear.
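For those who prefer to script the setup rather than click through the GUI, here is a sketch of Steps 3 to 8 using GATE Embedded (GATE’s Java API). The plugin directory names, resource class names, and parameter name strings below are assumptions based on my recollection of the plugins’ creole.xml files, so please verify them against your GATE installation; the pipeline order, the Token.root feature, and the gazetteerInst/inputFeatureNames parameters are as described above.

    import gate.Corpus;
    import gate.Factory;
    import gate.FeatureMap;
    import gate.Gate;
    import gate.LanguageResource;
    import gate.ProcessingResource;
    import gate.creole.SerialAnalyserController;
    import java.io.File;
    import java.util.Arrays;

    // A sketch of Steps 3-8 in GATE Embedded; the resource class names and parameter
    // names in the strings are assumptions, so check them against your GATE version.
    public class OntoRootPipeline {
        public static void main(String[] args) throws Exception {
            Gate.init();

            // Register the plugins the pipeline needs (directory names may differ per version).
            File pluginsHome = Gate.getPluginsHome();
            for (String plugin : new String[] {"ANNIE", "Tools", "Ontology_Tools", "Ontology_Based_Gazetteer"}) {
                Gate.getCreoleRegister().registerDirectories(new File(pluginsHome, plugin).toURI().toURL());
            }

            // Step 3: load the ontology as a language resource (class and parameter names assumed).
            FeatureMap ontoParams = Factory.newFeatureMap();
            ontoParams.put("rdfXmlURL", new File("myOntology.owl").toURI().toURL());
            LanguageResource myOntology = (LanguageResource) Factory.createResource(
                    "gate.creole.ontology.owlim.OWLIMOntologyLR", ontoParams);

            // Step 4: the supporting PRs, all with default parameters.
            ProcessingResource reset = (ProcessingResource) Factory.createResource("gate.creole.annotdelete.AnnotationDeletePR");
            ProcessingResource splitter = (ProcessingResource) Factory.createResource("gate.creole.splitter.RegexSentenceSplitter");
            ProcessingResource tokeniser = (ProcessingResource) Factory.createResource("gate.creole.tokeniser.DefaultTokeniser");
            ProcessingResource tagger = (ProcessingResource) Factory.createResource("gate.creole.POSTagger");
            ProcessingResource morpher = (ProcessingResource) Factory.createResource("gate.creole.morph.Morph");

            // Step 5: the Onto Root Gazetteer PR (class name and parameter names assumed).
            FeatureMap orgParams = Factory.newFeatureMap();
            orgParams.put("ontology", myOntology);
            orgParams.put("tokeniser", tokeniser);
            orgParams.put("posTagger", tagger);
            orgParams.put("morpher", morpher);
            ProcessingResource ontoRootGaz = (ProcessingResource) Factory.createResource(
                    "gate.clone.ql.OntoRootGaz", orgParams);

            // Step 6: wrap it in a Flexible Gazetteer that matches on Token.root.
            FeatureMap flexParams = Factory.newFeatureMap();
            flexParams.put("gazetteerInst", ontoRootGaz);
            flexParams.put("inputFeatureNames", Arrays.asList("Token.root"));
            ProcessingResource flexGaz = (ProcessingResource) Factory.createResource(
                    "gate.creole.gazetteer.FlexibleGazetteer", flexParams);

            // Step 7: assemble the corpus pipeline in the order given above.
            SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource(
                    "gate.creole.SerialAnalyserController");
            for (ProcessingResource pr : new ProcessingResource[] {reset, splitter, tokeniser, tagger, morpher, flexGaz}) {
                pipeline.add(pr);
            }

            // Step 8: run over a small corpus; inspect the Lookup annotations afterwards (Step 9).
            Corpus corpus = Factory.newCorpus("sampleCorpus");
            corpus.add(Factory.newDocument("language resources and parameters"));
            pipeline.setCorpus(corpus);
            pipeline.execute();
        }
    }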
Small Example
The ORG plugin comes with a demo application which sets up not only all the PRs and LRs (the text, corpus, and ontology), but also the application itself, ready to run. This is the file exampleApp.xgapp, which is in the resources folder of the plugin (Ontology_Based_Gazetteer). To start it, launch GATE with a clean slate (no other PRs, LRs, or applications), right click on Applications, choose Restore application from file, and then load the file from the folder just given.
The ontology used for the illustration describes GATE itself, giving the classes, subclasses, and instances of the system. While the ontology is loaded along with the application, one can also find it here. The text is simple (and comes with the application): language resources and parameters.
FIGURE 1 (missing at the moment)
FIGURE 2 (missing at the moment)
One can see that the phrase “language resources” is annotated with respect to the class LanguageResource, “resources” is annotated with GATEResource, and “parameters” is annotated with ResourceParameter. We discuss this further below.
One further aspect is important and useful. Since the ontology tools have been loaded and a particular ontology has been used, one can not only see the ontology (open the OAT tab in the window with the text) but also annotate the text with respect to the ontology: highlight some text and a popup menu allows one to select how to annotate it. In this way, one can add instances (or classes) to the ontology.
Documentation
One can consult the following for further information about how the gazetteer is made, among other topics:

Discussion
See the related post Discussion of GATE’s Onto Root Gazetteer.
By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0

Meeting with John Sheridan on the Semantic Web and Public Administration

I met today with John Sheridan, Head of e-Services, Office of Public Sector Information, The National Archives, located at the Ministry of Justice, London, UK. Also at the meeting was John’s colleague Clare Allison. John and I had met at the ICAIL conference in Barcelona, where we briefly discussed our interests in applications of Semantic Web technologies to legal informatics in the public sector. Recently, John got back in contact to talk further about how we might develop projects in this area.
Perhaps most striking to me was that John made it clear that the government (at least his sector) is proactive, looking for research and development projects that make government data available and usable in a variety of ways. In addition, he wanted to develop a range of collaborations to better understand the opportunities the Semantic Web may offer.
As part of catching up with what is going on, I took a look around the web for relatively recent documents on related activities.

In our discussion, John gave me an overview of the current state of affairs in public access to legislation, in particular the legislative markup and API. The markup is intended to support publication, revision, and maintenance of legislation, among other possibilities. We also had some discussion about developing an ontology of government which would be linked to legislation.
Another interesting dimension is that John’s office is one of the few I know of which are actively engaged in developing a knowledge economy, encouraged in part by public administrative requirements and goals. Others in this area are the Dutch and the US (with xml.gov). All very promising, and the discussions are well worth following up on.
Copyright © 2009 Adam Wyner

Session I of "Automated Content Analysis and the Law" Workshop

Today is session I of the NSF-sponsored workshop on Automated Content Analysis and the Law. The theme of today’s session is the state of judicial and legal scholarship, with two aims:

  • Identify the theoretical and substantive puzzles in legal and judicial scholarship which might benefit from automated content analysis
  • Discuss the kinds of data and measures that are required to address these puzzles and that automated content analysis could provide.

Further comments later in the day after the session.
–Adam Wyner
Copyright © 2009 Adam Wyner

NSF-sponsored workshop: Automated Content Analysis and the Law

I was invited to participate in an NSF-sponsored workshop, Automated Content Analysis and Law, held August 3 and 4 at NSF headquarters in Arlington, VA, and organised by Georg Vanberg (UNC).
There are two sessions planned. The first session will focus on identifying the theoretical/substantive puzzles in legal and judicial scholarship that might benefit from automated content analysis as well as what data and measurements are required. For the second session, the focus is on the state of automated content analysis/natural language processing, exploring the extent to which current technology is relevant to providing results with respect to issues raised in the first session and what might be needed.
There is an interesting mix of people, with a strong emphasis on legal scholarship bearing on the US Supreme Court and on opinion mining. I had an email exchange with Georg, the workshop organiser, about this, and we agree that attention ought to turn from the Supreme Court to lower levels of the legal system. I also suggested that participants consider some of the following points, which bear on the motives and objectives of these lines of research in terms of who is being served and how the data or conclusions would be used.
Questions for Discussion

  • What sorts of artifacts and technologies (if any) will emerge from the research?
  • How does the research relate to the Semantic Web?
  • What public service does the research provide or support?
  • How does this research relate to:
    • E-discovery
    • Textual legal case based reasoning
    • Legislative XML Markup
    • Other research communities e.g. ICAIL and JURIX

Participants

  • Scott Barclay (NSF) – Barclay@uamail.albany.edu
  • Cliff Carrubba (Emory) – ccarrub@emory.edu
  • Skyler Cranmer (UNC) – skylerc@email.unc.edu
  • Barry Friedman (NYU)- friedmab@juris.law.nyu.edu
  • Susan Haire (NSF) – shaire@nsf.gov
  • Lillian Lee (Cornell) – llee@cs.cornell.edu
  • Jimmy Lin (Maryland) – jimmylin@umd.edu
  • Stefanie Lindquist (Texas) – SLindquist@law.utexas.edu
  • Will Lowe (Nottingham) – will.lowe@nottingham.ac.uk
  • Andrew Martin (Wash U) – admartin@wustl.edu
  • Wendy Martinek (NSF) – wemartin@nsf.gov
  • Kevin McGuire (UNC) – kmcguire@unc.edu
  • Wayne McIntosh (Maryland) – wmcintosh@gvpt.umd.edu
  • Burt Monroe (Penn State) – blm24@psu.edu
  • Kevin Quinn (Harvard) – kevin_quinn@harvard.edu
  • Jonathan Slapin (Trinity College) – jonslapin@gmail.com
  • Jeff Staton (Emory) – jkstato@emory.edu
  • Georg Vanberg (UNC) – gvanberg@unc.edu
  • Adam Wyner (University College London) – adam@wyner.info

Legal Taxonomy

Introduction
In this post, I comment on Sherwin’s recent article Legal Taxonomy in the journal Legal Theory. It is a very lucid, thorough, and well-referenced discussion of the state of the art in taxonomies of legal rules. By considering how legal taxonomies organise legal rules, we gain a better understanding of how legal professionals currently conceive of legal rules. My take-away message from the article is that the analysis of legal rules could benefit from some of the thinking in Linguistics and Computer Science, particularly in terms of how data is gathered and analysed.
Below, I briefly outline ideas concerning taxonomies and legal rules. Then, I present and comment on the points Sherwin brings to the fore.
Taxonomies
Taxonomy is the practice and science of classifying items in a hierarchical IS-A relationship, where the items can be almost anything. The IS-A relationship is also understood in terms of subtypes and supertypes. For example, a car is a subtype of vehicle, and a Toyota is a subtype of car; we can infer that a Toyota is a subtype of vehicle. Each subtype has more specific properties than its supertype. In some taxonomies, one item may be a subtype of several supertypes; for example, a car is both a subtype of vehicle and a subtype of objects made of metal. However, not all vehicles are made of metal, nor are all things made of metal vehicles, which indicates that these types are distinct. Taxonomies are narrower than the related notion of ontologies, in which a range of relationships beyond the IS-A relationship may hold among the items, such as is owned by. In addition, ontologies generally introduce properties of elements in a class, e.g. colour, engine type, etc. Classifications in scientific domains such as Biology or Linguistics are intensely debated and revised. We would expect this to be even more true in the legal domain, which rests on intellectual rather than empirical evidence, unlike the physical sciences, and where the scientific method is not applied.
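To make the IS-A relation and its transitivity concrete, here is a small sketch; the class and method names are mine and purely illustrative.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // A toy taxonomy: each item maps to its (possibly several) direct supertypes.
    public class Taxonomy {
        private final Map<String, Set<String>> supertypes = new HashMap<>();

        public void addIsA(String subtype, String supertype) {
            supertypes.computeIfAbsent(subtype, k -> new HashSet<>()).add(supertype);
        }

        // IS-A is transitive: if Toyota IS-A car and car IS-A vehicle, then Toyota IS-A vehicle.
        public boolean isA(String subtype, String supertype) {
            if (subtype.equals(supertype)) {
                return true;
            }
            for (String parent : supertypes.getOrDefault(subtype, Collections.emptySet())) {
                if (isA(parent, supertype)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            Taxonomy t = new Taxonomy();
            t.addIsA("Toyota", "car");
            t.addIsA("car", "vehicle");
            t.addIsA("car", "object made of metal"); // an item may have several supertypes
            System.out.println(t.isA("Toyota", "vehicle"));               // true, by transitivity
            System.out.println(t.isA("vehicle", "object made of metal")); // false: the types are distinct
        }
    }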
Legal Rules
First, let us be clear about what a legal rule is, using an example following Professor David E. Sorkin. A legal rule is a rule which determines whether some proposition holds (say, of an individual) contingent on other propositions (the premises). For example, the State of Illinois assault statute specifies: “A person commits an assault when, without lawful authority, he engages in conduct which places another in reasonable apprehension of receiving a battery.” (720 ILCS 5/12-1(a)). We can analyse this into the legal rule:

    A person commits assault if

      1. the person engages in conduct;
      2. the person lacks lawful authority for the conduct;
      3. the conduct places another in apprehension of receiving a battery; and
      4. the other person’s apprehension is reasonable.

Optimally, each of the premises in a rule should be simple and be answerable as true or false. In this example, where all four premises are true, the conclusion, that the person committed assault, is true.
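Read this way, the rule is simply a conjunction of premises. A minimal sketch, with method and parameter names that are mine and only illustrative:

    // The assault rule above as a function from the four premises to the conclusion.
    public class AssaultRule {

        // The conclusion holds exactly when all four premises hold.
        static boolean commitsAssault(boolean engagesInConduct,
                                      boolean lacksLawfulAuthority,
                                      boolean conductCausesApprehensionOfBattery,
                                      boolean apprehensionIsReasonable) {
            return engagesInConduct
                    && lacksLawfulAuthority
                    && conductCausesApprehensionOfBattery
                    && apprehensionIsReasonable;
        }

        public static void main(String[] args) {
            // All four premises true: the conclusion (assault) follows.
            System.out.println(commitsAssault(true, true, true, true));  // true
            // With lawful authority (premise 2 false), there is no assault.
            System.out.println(commitsAssault(true, false, true, true)); // false
        }
    }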
There are significant issues even with such simple examples since each of the premises of a legal rule may itself be subject to further dispute and consideration; the premises may be subjective (e.g. was the conduct intentional), admit degrees of truth (e.g. degree of emotional harm), or application of the rule may be subject to mitigating or aggravating circumstances. The determination of the final claim follows the resolution of these subsidiary disputes and considerations. In addition, some legal rules need not require all of the premises to be true, but allow a degree of counterbalancing evaluation of the terms.
The Sources of Legal Rules
Sherwin outlines the sources of the rules:

      Posited rules, which are legal rules as explicitly given by a legal authority such as a judge giving a legal decision.
      Attributed rules, which are legal rules that are drawn from a legal decision by a legal researcher rather than by a legal authority in a decision. The rule is implicit in the other aspects of the report of the case.
      Ideal rules, which are rules that are ‘ideal’ relative to some criteria of ideality, say morally or economically superior rules.

Purposes of Classification
In addition, we have the purposes or uses of making a classification of legal rules.

      Facilitating the discussion and use of law.
      Supporting the critical evaluation of law
      Influencing legal decision-making

For the first purpose, the rules are sorted into classes, which helps us to understand and manage legal information. In Sherwin’s view, this is the most basic, formal, and least ambitious goal, yet it relies on having some taxonomic logic in the first place. For the second purpose, the rules are evaluated to determine whether they serve their intended purpose, as well as to identify gaps or inconsistencies. As Sherwin points out, the criteria of evaluation must then also be determined; however, this relates back to the criteria which guide the taxonomy in the first place, a topic we touch on below. The final purpose is a normative one, where the classification identifies the normal circumstances under which a rule applies, thereby also clarifying those circumstances in which the rule does not apply. Sherwin points out that legal scholars vary in which purpose they find attractive and worth pursuing.
While I can appreciate that some legal scholars might not find the ‘formal’ classification of interest, I view it from a different perspective. First, any claim concerning the normative application of one rule instead of another rests entirely on the intuitive presumption that the rules are clearly different, a distinction that the first level can help to clarify. Similar points can be made for other relationships among rules. Second, focusing on the later stages does not help to say specifically why one rule means what it does and has the consequences it is intended to have; surely this is in virtue of the specific ‘content’ of the rule, which again is clarified by a thoroughgoing analysis at the first stage. Third, if there is to be any progress in applied artificial intelligence and law, it will require the analytical elements defined at the first stage. Fourth, as the study of Linguistics has shown, close scrutiny at the first stage can help to reveal the very issues and problems that are fundamental to all higher stages. Fifth, providing even a small, clear sample of legal arguments analysed along the lines of the first stage would give the community of legal scholars a common ‘pool’ of legal arguments to consider fruitfully at the later stages; along these lines, it is notable how few concrete, detailed examples Sherwin’s paper discusses. Not surprisingly, some of the issues Sherwin raises about the purposes of different ‘levels’ of analysis also appear in the linguistic literature. In my view, though the first stage may not be interesting to most legal professionals, there are very good reasons why it should be.
Criteria of Taxonomy
Several different criteria which guide the taxonomy of legal rules are discussed.

      Intuitive similarity: whether researchers claim that two rules are subtypes of one another.
      Evolutionary history: the legal rule is traced in the history of the law.
      Formal classification: the logical relations among categories of the law.
      Function based: a function from the problem to a set of solutions.
      Reason based: the higher-level reasons that explain or justify a rule.

Sherwin criticises judgements based on intuitive similarity on the grounds that the taxonomers may be relying on false generalisations rather than their own intuitions, and that intuition can be arbitrary and without reason. This is also the sort of criticism levelled at a large segment of linguistic research, and it has been shown to be misleading. Of course, one must watch for false classifications and try to provide a justification for classifying one element in one class and not another. One way to do this, as in psycholinguistics, is to run tests over subjects. Another way is to refine the sorts of observations that lead to classifications. In general, all that we currently know about language, from dictionaries to grammars to inference rules, is based on linguistic intuitions. Some of these, such as the rules of propositional logic, have become so fixed that they now seem to exist independent of any linguistic basis.
The issue here is somewhat related to classification by formal logical relations. It is unclear what Sherwin thinks logical relations are and how they are applied. What we do have more clarity on are some of the criteria for such a formal taxonomy: accounting for all legal materials, a strict hierarchy, consistent interpretation of classes, and no overlap of categories. This is but one way to consider a formal hierarchy; indeed, there is a separate and very interesting question about what formal model of classification best suits a legal taxonomy. Yet, this issue is not explored in the article.
The function-based approach seems to operate with meta-categories. For example, the rule above can be seen as a function from circumstances to a classification of a person as having committed an assault. However, this does not appear to be what is intended in Sherwin’s discussion. Rather, there are meta-functional categories depending on higher-level problems and solutions; the examples given are Law as a Grievance-Remedial Instrument and Law as an Administrative-Regulatory Instrument. For me, this is not quite as clear as Sherwin makes it appear.
The reason-based approach organises rules according to an even higher-level property of the rule: its justification or explanation. Examples include that a wrongful harm imposes an obligation of redress, that deterring breaches of promise facilitates exchange, or that a rule promotes public safety. In my view, these are what people in AI and Law (e.g. Professor Bench-Capon) would call values, which are promoted by the legal rule. Sherwin discusses several different ways that reason-based classification is done: intended, attributed, and ideal rationales. In my view, the claimed differences are not all that clear or crucial to the classification. In some cases, the rationale of a legal rule is given by the adjudicator. However, where this is not so, the rationale is implicit and must be interpreted, which yields the intended rationale. In other cases, legal researchers examine a body of law and provide rationales, which are the attributed rationales. In this sense, the intended and attributed rationales are related (both are interpreted) but achieved by different methods (study of one case versus study of a body of cases and considerations about the overall purpose of the law). Finally, there are ideal rationales, which set out broad, ideal goals of the legal rule, goals which may or may not be ‘ideally’ achievable. Here the difference between intended/attributed and ideal rationales is whether the rationale is analysed out of cases (bottom-up) or provided legislatively (top-down). In the end, the result is similar: legal rules are classified with respect to some rationale. The general problem with any such rationale is how it is systematically given and itself justified so as to be consistent and not to yield conflicting interpretations of the same legal rule. Finally, Sherwin seems to think that there is some intrinsic conflict or tension between formal classification and reason-based classification. I do not agree. Rather, the difference lies in the properties and methods employed to make the classification, which are not inherently in conflict. Likely, a mixed approach will yield the most insights.
Copyright © 2009 Adam Wyner

Further Considerations on "The End of Lawyers"

In a previous post on Susskind’s The End of Lawyers, I briefly outlined some of the technologies Susskind discusses, then pointed out several fundamental technologies which he does not discuss in depth, but which I believe will have significant impact on the legal profession.
One topic which I did not discuss in that post was why the legal profession is so slow to adopt legal technology. Susskind points out several reasons:

      Billable hours — the legal profession makes money by the hour, which is a disincentive to make legal processes more efficient.
      Conservatism and status — the legal profession has a long and distinguished position which changes slowly.
      Government funding — while governments may recognise the value of legal technologies, investing in them is another matter (though see the recent e-Government awards in the previous post).
      Information availability — only recently have legal documents (legislation, cases, other government information) been made publicly and electronically available.

I think (and believe my colleagues in AI and Law would agree) that these are very significant contributing factors in the slow adoption of legal technologies by legal professionals, firms, and governments. But there are others, and I believe that by identifying them, we can make progress in addressing the problems that they raise.
To help clarify the issues, we can compare and contrast the legal profession with another very ancient and prestigious profession: medicine. Doctors, medical organisations, and medical researchers have adopted and advanced technologies very rapidly and on a large scale, yet those technologies are, at their core, similar to the ones available to the legal profession. Technologically, then, there is little reason why the legal profession has not also adopted these technologies or sought more aggressively to adapt them.
While there are systems to support reasoning by doctors as well as medical record filing and retrieval, let me focus on two technologies which are equally available, fundamental, and influential to both the legal and medical professions: information extraction and ontologies.
In the medical field, there are large corpora of textual information from which relevant information must be extracted. The corpora are, and have been for some time, publicly available. There are academic and industry groups that have developed, and continue to develop, software systems to extract the information (e.g. the National Centre for Text Mining and Linguamatics, among others). Through links, one can find conferences, other groups, and government organisations; the interest is deep, widespread, and of high value. Moreover, medical ontologies are very far advanced, such as the Systematised Nomenclature of Medicine Clinical Terms and the Foundational Model of Anatomy, among others.
In the legal field, corpora of textual information are only just beginning to become available. There has been some research on information extraction in the legal field. There has been some work on legal ontologies (e.g. LKIF, which came out of an EU project that I participated in).
In both areas — information extraction and ontologies — the medical field far outstrips the legal field. Why?
I think the differences are not so much those outlined above; one could argue that medical and legal fields have had, at least historically, similar constraints — the medical field has just overcome them. The most obvious apparent difference is that research medicine has been and continues to be advanced with scientific and technological means. Other research fields — biology, chemistry, statistics, anatomy — made relevant contributions. Moreover, the medical field has large research bodies that are well-funded (e.g. The Wellcome Trust). Finally, the culture of medical research and application of findings is such that information is disseminated, criticised, and verified. Let us put these into four points:

  • Scientific and technological approach
  • Contributions from other fields
  • Research bodies
  • Culture of research

In these respects, the legal field is very different from the medical field. Science and technology have not, until very recently, been relevant to how the law is practised. While there have been some contributions from other fields (e.g. sociology or psychology), the impact is relatively low. There are research bodies, but they are not of the scale or influence of those in medicine. And the disposition of the legal community has been to hold information closely.
I believe that there is a single (though very complex) underlying reason for the difference: the object of study. In medicine, the objects of study are physical, whether in chemistry, biology, anatomy, etc.; these objects are and have been amenable to scientific study and technological manipulation. In contrast, in law, the object of study is non-physical; one might be tempted to say it is the law itself, but we can be more concrete and say it is the language in which the law is expressed, for at least language is something tangible and available for study.
Thus, the scientific study of language, Linguistics, is relevant. However, Linguistics as a scientific endeavour is relatively young (50 to 100 years, depending on one’s point of view). The technological means to study language can be dated to the advent of the digital computer, which could process language in terms of strings of characters. Widespread, advanced approaches to computational linguistics for information extraction are even more recent, perhaps 10 to 20 years. Very large corpora, and the motives to analyse them, arose with the internet. And not only must we understand the language of the law, but we must also understand the legal concepts as they are expressed in law. Here, the study of deontic reasoning, while advanced, is “only” some 20 years old and has found few applications (see my 2008 PhD thesis).
Language is the root of the issue; it can help explain some of the key differences in the application of technology to the legal field. In our view, as the linguistic and logical analyses of the language of the law advance, so too will applications, research bodies, and results. However, it is still early days, and, in comparison to the medical field, there is much yet to be done.
Copyright © 2009 Adam Wyner

Susskind's "The End of Lawyers" is Part of the Story

Introduction
In this post, I briefly outline Richard Susskind’s background and elements from The End of Lawyers, and then turn to issues that Susskind is aware of but does not discuss in depth. These are issues which I believe are fundamental to how technology will impact legal practice, such as the semantic web, textual information extraction, ontologies, and open source databases of legal documents.
Background
Susskind specialises in how information and communication technology (ICT) is used by lawyers and public administrators. His website is:
www.susskind.com
Besides the important and general interest of his line of work, its prominence in the community of practicing legal professionals gives us a good indication of the sorts of technologies that community is and is not aware of.
Richard Susskind has been writing about ICT since the publication of his PhD thesis Expert Systems in Law (1987, Oxford University Press). He is among the early researchers in Artificial Intelligence and Law. His subsequent books, The Future of Law and Transforming the Law, developed themes about the relation of ICT to the legal profession, focusing on the ways ICT would change the practice of law and the interactions among lawyers, government administrators, and the public. In addition to the books, Susskind consults widely, is an editor of the International Journal of Law and Information Technology, and is a law columnist for The Times. He is uniquely well informed about the technologies that are available and how the legal community regards and uses them. This makes it all the more interesting to draw attention to what he does not discuss in depth.
His recent book The End of Lawyers has garnered a very significant amount of attention, and online excerpts along with comments can be found at:
The End of Lawyers
Legal Technology Tools
In this book, he develops and elaborates his main themes. He points out a range of technologies, briefly outlined below, which will contribute to changing the legal profession. As there is already substantial information online about his proposals, I will not repeat them here in depth, except to say that by and large I agree with many of the overt points he makes about the applicability of technology to the legal profession, as well as about why the legal profession has been and remains slow to take up ICT solutions.
Among the key technologies Susskind outlines, we find:

      Automated document assembly — structuring blocks of legal documents.
      Connectivity — email, fax, cell phones, facebook, twitter, blogs.
      Electronic legal marketplace — legal services advertised, rated, and traded.
      E-learning — lawyers and members of the public having the opportunity to learn about the law online.
      Online legal guidance — rather than face-to-face with individual lawyers, a chance to read, learn about the law, have questions addressed at different levels of formality.
      Legal open-sourcing — user generated content, free and unrestricted legal information (e.g. BAILII), legal wikis.
      Closed legal communities — collectives of lawyers, justices, or government officials exchange information.
      Workflow and project management — using software and services to monitor and support the work of legal professionals. This includes case-management and electronic filing.
      Embedded legal knowledge — legal information and knowledge is built into everyday systems and interactions, making it more readily transparent or preventing non-compliance.
      E-disclosure — finding and processing documents and information relevant to the disclosure phase of a case.
      Online dispute resolution — systems to mediate and support the resolution of disputes.
      Courtroom annotation — transcribing and noting courtroom proceedings manually and automatically.
      Improving access to law — giving citizens more information and advice.

Engineering and Managing Legal Knowledge
In the course of the book, he says that the engineering and management of legal knowledge is central to these technologies, where:

      Legal knowledge management (p. 155) — the systematic organization, standardization, preservation, and exploitation of the collective knowledge of a firm. It is intended to maximize the firm’s return on the combined experience of its lawyers over time.
      Legal knowledge engineer (p. 272): someone who carries out basic analysis, decomposition, standardization, and representation of legal knowledge in computer systems.

However, little is said about how the engineering and management is to be done other than that some of the technologies outlined above contribute to them.
What is said consists largely of brief references to, or outlines of, additional issues such as the semantic web (p. 68), wikis (but not semantic wikis), online dispute resolution (but little on current developments), and open source legal information (e.g. BAILII, but not WorldLII).
More to the point, there is no discussion of research on key technologies such as:

      Legal ontologies by which legal knowledge is formalised, acquired, processed, and managed.
      XML which underlies the semantic web
      Web-based inference systems
      Textual information extraction which is essential to make use of open source legal information
      Rule-based systems, such as those provided by Oracle (previously known as Softlaw, RuleBurst, and Haley), which are prominently used by UK tax authorities
      E-government services which go beyond providing information and submission of forms but also allow some interaction such as Parmenides and DEMO-net

These are all topics of central relevance to our blog and to the AI and Law community, which organises around the International Conference on AI and Law (ICAIL) and JURIX.
We agree by and large with Susskind. However, there is much more that would be highly relevant and valuable to draw to the attention of the legal community. Moreover, it would be very valuable to the AI and Law community were his prominent and respected voice in legal and governmental circles heard advocating further for research such as that in AI and Law.
Copyright © 2009 Adam Wyner