Thursday, May 04, 2017

What is law? - part 11

Gliding gracefully over all the challenges alluded to earlier with respect to extracting the text-level meaning out of the corpus of Law at time T, we now turn to how it is actually interpreted and used by practitioners. To do that, we will continue with our useful invention: an infinitely patient person who has somehow found all of the primary corpus, read it all from the master sources and internalized it, and who can now answer our questions about it and feed it back to us on demand.

The first order of business is deciding where to start reading. There are two immediate issues here. Firstly, the corpus is not chronologically accretive. That is, there is no "start date" to the corpus we can work from, even if, in terms of historical events, a foundation date for a state can be identified. The reasons for this have already been discussed. Laws get modified. Laws get repealed. Caselaw gets added. Caselaw gets overruled. New laws get added. I think of it like a vast stormy ocean, constantly ebbing and flowing, constantly gaining new content (rainfall, rivers) and constantly losing content (evaporation), in an endless cycle. It has no "start point" per se.

In the absence of an obvious start point, some of you may be thinking "the index", which brings us to the second issue. There is no index! There is no master taxonomy that classifies everything into a nice tidy hierarchy. There are some excellent indexes/taxonomies in the secondary corpus produced by legal publishers, but not in the primary corpus.

Why so? Well, if you remember back to the Unbounded Opinion Requirement mentioned previously, creating an index/taxonomy necessarily means creating an opinion on the "about-ness" of a text in the corpus. This is something the corpus of law stays quite vague about, on purpose, in order to leave room for interpretation of the circumstances and facts of any individual legal question. Just because a law was originally passed to deal with electricity usage in phone lines does not mean it cannot apply to computer hacking. Just because a law was passed relating to manufacturing processes does not mean it has no relevance to ripening bananas. (Both examples are based on real-world situations I have come across, by the way.)

So, we have a vast, constantly changing, constantly growing corpus, so big that it is literally impossible for any human to read it all, regardless of the size of your legal team, and there are no finding aids in the primary corpus to help us navigate our way through it...

...Well, actually, there is one, and it is an incredibly powerful finding aid. The corpus of legal materials is woven together by an amazingly intricate web of citations. Laws invariably cite other laws. Regulations cite laws. Regulations cite regulations. Caselaw cites laws, regulations and other caselaw, creating a layer that computer people would call a network graph[1]. Understanding the network graph is key to understanding how practitioners navigate the corpus of law. They don't go page-by-page, or date-by-date; they go citation-by-citation.
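To make the idea concrete for computing readers, here is a minimal Python sketch, assuming a toy corpus in which each document is simply mapped to the documents it cites. The document names (`Case A`, `Act 1`, `Regulation R1` and so on) and the helper `follow_citations` are hypothetical, invented for illustration; real citation data would first have to be extracted from the primary corpus, which is a hard problem in its own right. The walk is breadth-first, so documents are reached in order of how many citation hops they are from the entry point.

```python
from collections import deque

# Hypothetical toy corpus: each document maps to the documents it cites.
# Laws cite laws, regulations cite laws, and cases cite cases, regulations and laws.
CITES: dict[str, list[str]] = {
    "Case C": ["Case B", "Regulation R1"],
    "Case B": ["Case A", "Act 1"],
    "Case A": ["Act 1"],
    "Regulation R1": ["Act 1", "Act 2"],
    "Act 2": ["Act 1"],
    "Act 1": [],
}


def follow_citations(start: str, cites: dict[str, list[str]], max_hops: int) -> dict[str, int]:
    """Walk outward from an entry point, citation by citation (breadth-first).

    Returns every document reached within max_hops, mapped to its hop distance.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        if hops[doc] == max_hops:
            continue  # do not expand beyond the hop limit
        for cited in cites.get(doc, []):
            if cited not in hops:  # in BFS, the first visit is the shortest path
                hops[cited] = hops[doc] + 1
                queue.append(cited)
    return hops


print(follow_citations("Case C", CITES, max_hops=2))
```

Starting from `Case C`, two hops are enough to reach both Acts, even though neither is cited by `Case C` directly.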

The usefulness of this citation network in law cannot be overstated. It helps practitioners find related materials, acting as a human-generated recommender algorithm. The citation network not only establishes relatedness, it also establishes meaning, especially in the caselaw corpus. We talked earlier about the open-textured nature of the legal corpus. It is not big on black-and-white definitions of things. Everything related to meaning is fluid on purpose. The closest thing in law to true meaning is arguably established in the caselaw. In a sense, the caselaw is the only source of information on meaning that really matters, because at the end of the day it does not matter what you or I or anyone else might think a part of the corpus means. What really matters is what the courts say it means. Caselaw is the place you go to find that out.

"But," I hear you say, "graphs do not necessarily have a start point either!" True. But this is where one of the real skills of a lawyer manifests itself. Legal reasoning is, for the most part (in UK/US-style common law systems), reasoning by analogy. For any given case, a lawyer takes the facts and the desired outcome and seeks to draw an analogy with a previously adjudicated case, so that if the analogy holds up, the desired outcome follows by virtue of the overarching desire of the legal ecosystem to remain consistent with previous decisions. There is perhaps no other field where formulating the right question is as important as it is in law.

Having constructed an analogy, a lawyer can identify initial entry points into the corpus of law, and from there the citation network works its magic, routing you through the bottomless seas of content to the most relevant material. "Most relevant" here is often signaled by a large number of in-bound citations. For example, in caselaw, if your analogy brings you to case X, and case X has been cited by many other cases with the outcome you are looking to achieve, and case X is still good law (has not been overruled), then case X is a good one to cite in your legal argument.
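The relevance heuristic described above can be sketched in a few lines of Python. This is a deliberately crude model under stated assumptions: each citing case is tagged with a single outcome label, and we already know which cases are no longer good law. All case names, the outcome labels and the function `rank_candidates` are hypothetical. In practice, neither an outcome nor good-law status is anywhere near this easy to pin down.

```python
from collections import Counter

# Hypothetical citing cases: (name, outcome, cases it cites).
CITING_CASES = [
    ("Case D", "claimant wins", ["Case X", "Case Y"]),
    ("Case E", "claimant wins", ["Case X"]),
    ("Case F", "defendant wins", ["Case Y", "Case Z"]),
    ("Case G", "claimant wins", ["Case X", "Case Z"]),
    ("Case H", "claimant wins", ["Case Z"]),
]
# Hypothetical: cases known to be no longer good law (e.g. overruled).
NOT_GOOD_LAW = {"Case Z"}


def rank_candidates(citing_cases, desired_outcome, not_good_law):
    """Rank cited cases by in-bound citations from cases with the desired outcome."""
    counts = Counter()
    for _name, outcome, cited_cases in citing_cases:
        if outcome != desired_outcome:
            continue  # only count support from cases that reached our outcome
        for case in cited_cases:
            if case not in not_good_law:  # skip cases that are no longer good law
                counts[case] += 1
    return counts.most_common()  # list of (case, count), highest count first


print(rank_candidates(CITING_CASES, "claimant wins", NOT_GOOD_LAW))
# → [('Case X', 3), ('Case Y', 1)]
```

`Case Z` is cited often, but it drops out entirely because it is no longer good law, which mirrors the caveat in the paragraph above.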

If this leveraging of the citation network's link topology reminds you of Google's original PageRank algorithm, then you are on the right track. Lawyers, perhaps to the surprise of computer science and math folk, have been leveraging the properties of scale-free network graphs[2] for centuries[3].
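For readers who want to see the connection, here is a minimal sketch of the textbook PageRank power iteration, run over a small, hypothetical citation graph. The damping factor of 0.85 is the value commonly used in descriptions of PageRank, and documents that cite nothing have their rank spread evenly across all documents, which is one common convention. This illustrates the general idea only; it is not a claim about how any legal research tool actually ranks materials.

```python
# Hypothetical toy citation graph: document -> documents it cites.
CITES = {
    "Case C": ["Case B", "Regulation R1"],
    "Case B": ["Case A", "Act 1"],
    "Case A": ["Act 1"],
    "Regulation R1": ["Act 1", "Act 2"],
    "Act 2": ["Act 1"],
    "Act 1": [],
}


def pagerank(cites: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration; the scores sum to 1."""
    docs = list(cites)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Every document gets a small baseline share (the "random jump").
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for doc, cited in cites.items():
            if cited:
                # Pass this document's rank on, split evenly across what it cites.
                share = damping * rank[doc] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A document that cites nothing spreads its rank over all documents.
                share = damping * rank[doc] / n
                for target in docs:
                    new_rank[target] += share
        rank = new_rank
    return rank


for doc, score in sorted(pagerank(CITES).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{doc:<15}{score:.3f}")
```

Unlike a raw in-bound count, a citation from a highly cited document is worth more than one from an obscure document, which is the intuition the comparison with PageRank is getting at.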

I said "legal argument" above, and this is another critical point in understanding what law actually is and how it works. The corpus of law is not a place you go to find black-and-white answers to black-and-white questions. Rather, it is a place you go with an analogy you have formed, in order to find arguments for and against your desired outcome from that analogy. It is a form of rhetoric. A form of debate. It is not the formulaic application of crisp rules that generate crisp answers.

In short, it is not mathematics in the sense that many computer science folks might initially assume when they hear talk of "rules" and "decisions" and so on. However, it arguably is mathematics in some other ways. Leveraging the citation graph is a very mathematical thing. Weighing up the pros and cons of legal argument strategies often exhibits properties familiar from optimization problems and game theory.

It is in these latter senses of "mathematical" that most of the recent surge in interest in computational law has arisen. In particular, machine learning and neural network-centric approaches to artificial intelligence are re-igniting interest in computational law after the overall disappointing results of the Eighties. Back then, rule-centric approaches prevailed, and although there have been some notable successes in areas such as income tax calculation, rules-based approaches have, in my opinion, largely run out of steam.

The citation network, and in particular how it changes over time, is, in my opinion, the key to unlocking computational law. I do not think it is stretching things to state that the citation network is the underlying DNA that holds the world of law together. Rather than seeking to replace this DNA, in all its magnificent power and complexity, with nice tidy Lego bricks of conditional logic and data objects, we need to embrace it. Of course, it has its flaws. Nothing is perfect. But it is the way it is, for the most part, for good reasons. We will make progress in computational law faster if more computing folk understand the world of law for what it is, as opposed to what they might initially think it is at a high level, or perhaps wish it to be.
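Because the citation network changes over time, any computational treatment has to ask what the network looked like at a particular time T. A minimal sketch, assuming each citation carries the date of the citing document and that we know the date from which a document stopped being good law: count in-bound citations as of a given date, ignoring citations that did not yet exist and documents that were no longer good law on that date. The data and the function `inbound_as_of` are hypothetical, and reducing good-law status to a single date is a simplifying assumption.

```python
from collections import Counter
from datetime import date

# Hypothetical dated citations: (citing document, cited document, date of the citing document).
CITATIONS = [
    ("Case B", "Case A", date(1990, 3, 1)),
    ("Case C", "Case A", date(2001, 6, 15)),
    ("Case D", "Case C", date(2010, 1, 20)),
    ("Case E", "Case A", date(2011, 9, 9)),
]
# Hypothetical: date from which a document stopped being good law.
NO_LONGER_GOOD_LAW_FROM = {"Case A": date(2012, 5, 1)}


def inbound_as_of(citations, no_longer_good_law_from, as_of):
    """Count in-bound citations to each document as the network stood on `as_of`."""
    counts = Counter()
    for _citing, cited, issued in citations:
        if issued > as_of:
            continue  # this citation did not exist yet
        ended = no_longer_good_law_from.get(cited)
        if ended is not None and ended <= as_of:
            continue  # the cited document was no longer good law on this date
        counts[cited] += 1
    return counts


for year in (2005, 2012, 2020):
    print(year, dict(inbound_as_of(CITATIONS, NO_LONGER_GOOD_LAW_FROM, date(year, 1, 1))))
```

In this toy data, `Case A` accumulates citations over the years and then drops out entirely once it stops being good law, even though the citations to it are still there in the corpus.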

I hope this series of blog posts has helped, in some small way, to show what law really is. At least, from my perspective, which of course is just one person's opinion. As we have seen in this series of posts on law, "opinion" is as good as it gets in law. Again, finally, this is not a bug. It is a feature... In my opinion :-)


Saurabh Shekhar Verma said...

I really enjoyed reading this series of blog posts, found it very informative. Highly appreciated.

Sean McGrath said...

Thanks. I have a lot more I would like to add related to Deep Learning, Smart Contracts, Blockchain and NLP, but I will need to take a little while to better formulate my thoughts on these into a series of posts. The rough plan is to close the loop back to John Searle's Chinese Room Argument in a hopefully useful way.

Saurabh Shekhar Verma said...

I am very much looking forward to it. There are so many implicit references to machine learning (for my biased brain, almost all of them). In fact, these posts could be used to explain why deep learning techniques perform better than shallower machine learning techniques. In deep learning we let the algorithm make its own representation of the data and then a chain of opinions, which should be aligned to the end result, which may also be an opinion (a final opinion that a human can also make sense of). Like a big virtual box, or a big virtual box containing several other virtual boxes inside.

Though I thought earlier it was the last blog post of this series. Good to hear that there are more.