Friday, April 07, 2017

What is law? - Part 7


Last time we ended with the question: “Given a corpus of law at time T, how can we determine what it all means?”

There is a real risk of disappearing down a philosophical rabbit hole about how meaning is encoded in the corpus of law. Now, I really like that particular rabbit hole, but I propose that we not go down it here. This whole area is best perused, in my experience, with comfy chairs, time to kill and a libation or two (semiotics, epistemology and mereotopology, anyone?).

Instead, we will simply state that, because the corpus of law is mostly written human language, it inherits some fascinating and deep issues to do with how written text establishes shared meaning, and move on. For our purposes, we will imagine an infinitely patient person with infinite stamina, armed with a normal adult's grasp of English, who is going to read the corpus and explain it back to us, so that we computer people can turn it into something else inside a computer system. The goal of that “something else” is to capture the meaning but be easier to work with inside a computer than a big collection of “unstructured” documents.

This little conceptual trick of employing a fantastic human to read the current corpus and explain it all back to us allows us to split the problem of meaning into two parts. The first part relates to how we could read it in its current form and extract its meaning. The second part relates to how we would encode the extracted meaning in something other than a big collection of unstructured documents. Exploring this second question will, I believe, help us tease out the issues in determining meaning in the corpus of law in general, without getting bogged down in trying to get machines to understand the current format (lots and lots of unstructured documents!) right off the bat.

I hope that makes sense? Basically, we are going to skip over how we would parse it all out of its current myriad document-form into a human brain and instead look at how we would extract it from said brain and store it again – but into something more useful than a big collection of documents. Assuming we can find a representation that is good enough, the reading of the current corpus should be a one-off exercise because, as the corpus of law gets updated, we would update our bright shiny new digital representation of the corpus and never have to re-process all the documents ever again.

So what options do we have for this digital knowledge representation? Surely there is something better than just unstructured document text? Text, after all, is what you get if you use computers as typewriters. Computers do also give us search, which is a wonderful addition to typesetting, but understanding is a very different thing again. In order to have machines understand the corpus of law, we need a way to represent the knowledge present in the law, not just what words are present (search) or how the words look on the page (formatting).

This is the point where some of you are likely hoping/expecting that I am about to suggest some wonderful combination of XML and Lisp or some such that will fit the bill as a legal corpus knowledge representation alternative to documents... It would be great if that were possible but, in my opinion, the textual/document-centric nature of a significant part of the legal corpus is unavoidable, for reasons I will hopefully explain. Note that I said “significant part”. There are absolutely components of the corpus that do not have to be documents. In fact, some of the corpus has already transitioned out of documents but, if anything, this has actually increased the complexities of interpretation, of establishing meaning, not reduced them. I will hopefully explain that too :-)

I think the best way of explaining why I think some form of electronic documents is as good as we can hope for, for large parts of the legal corpus, is to look at the things that are not actually part of the corpus of documents at all, but are key to how law actually works. It turns out that these things cannot be put into a computer at all, in my opinion.

What are these mystical things? There are two of them. The first I call the Closed World of Knowledge (CWoK) and the second I call the Unbounded Opinion Requirement (UOR) of law.

We will look at CWoK and UOR in Part 8.


1 comment:

Saurabh Shekhar Verma said...

Very nice post, I think there is a tiny typo at 5th paragraph 2nd line 2nd word, "that" - "than".