Thursday, March 23, 2017

What is law? - Part 3

The corpus of law - the stuff we all, in principle, have access to and all need to comply with - is not, unfortunately, a nice tidy bundle of materials managed by a single entity. Moreover, the nature of the bundle itself differs between jurisdictions. Ireland is quite different from Idaho. Scotland is quite different from the Seychelles. Jersey is quite different from Japan, and so on.

I will focus here on US and UK (Westminster)-style legal corpora to keep the diversity manageable. Even then, there are many differences in practice and terminology all the way up and down the line, from street ordinances to central government to international treaties and everything in between. I will use some common terminology, but bear in mind that actual terminology and practice in your particular part of the world will very likely differ in various ways - hopefully not in ways that invalidate the conceptual model we are seeking to establish.

In general, at the level of countries/states, there are three main sources of law that make up the legal corpus. These are the judiciary, the government agencies and the legislature/parliament.

Let us start with the Legislature/Parliament. This is the source of new laws and amendments to the law in the form of Acts. These start out as draft documents that go through a consideration and voting process before they become actual law. In the USA, it is common for these Acts to be consolidated into a "compendium", typically referred to as "The Statutes" or "The Code". The Statutes are typically organized according to some thematic breakdown into separate "titles", e.g. Company Law, Environmental Law and so on. In the UK, the government itself does not produce thematic compendia.

Instead, the Acts are a cumulative corpus. So, to understand, for example, criminal law, it may be necessary to look at many different Acts, going back perhaps centuries, to get the full picture of the "Act" actually in force. In UK-style systems, areas of law may get consolidated periodically through the creation of so-called "consolidations"/"re-statements". These essentially take an existing set of Acts that are in force, repeal them all and replace them with a single text that is a summation of the individual Acts it repeals.[1]

It is common for third party publishers to step in and help practitioners of particular areas of law by doing unofficial consolidations to make the job of finding the law in a jurisdiction easier.
Depending on how volatile the area of law is in terms of change, the publisher might produce an update every month, every quarter, every year, etc. In the USA, most states do the consolidation in-house in the legislature when they produce The Statutes. As with the publishers, this corpus is updated according to a cycle, typically every year or two.

So here we get to our first interesting complication with respect to being able to access the law emanating from Legislatures/Parliaments that is in force at any time T. It is very likely that no existing compendium produced by the government itself is fully up to date with respect to time T. There are a number of distinct reasons for this.

Firstly, for Parliaments that do not produce compendia, there may not be an available consolidation/re-statement at time T. Therefore, it is necessary to find the set of Acts that were in force at time T, which then need to be read together to understand what the law was at time T.

Secondly, for Legislatures that produce compendia in the form of Statutes, these typically lag behind the Acts by anything from months to years. Typically, when a Legislature is "in session", busily working on new Acts, it is not working on consolidating them as they pass into law. Instead, they are accumulated into a publication, typically called the Session Laws, and the consolidation process happens after the session has ended. This is an area where third party publishers typically add value, because they do consolidate "on the fly" and this is something that is very useful to many practitioners.

Thirdly, the concept of "in force" is quite tricky in practice. An Act may become law as soon as it passes through a signing process but the law itself may not take effect until some other event has happened. Typically there is some form of official government publication - register/gazette - and laws come into force when they appear in the register/gazette. Through a device called a "line item veto" it may be that a law comes into force but some parts of it are essentially elided. Trickier still is the concept of conditional legislation which comes into force, if, for example the cost of a barrel of oil hits some threshold value.

Even if it is possible to arrive at the text in force as it stood at time T, the nature of the text itself has a large role to play in its direct usefulness for practitioners. The clearest example of this is what are known as amendatory acts. An amendatory act, rather than replacing a textual unit with a replacement textual unit, expresses the required changes in terms of amendatory instructions, e.g. "After the first occurrence of the word 'dog', insert 'cat or '". Again, this is an area where third party publishers often step in.
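As a thought experiment, the mechanics of applying such an instruction can be sketched in a few lines of code. Everything here - the instruction shape, the function name, the sample text - is hypothetical; real consolidation tooling has to cope with far messier instructions than this:

```python
import re

def apply_insert_after(text: str, target: str, insertion: str) -> str:
    """Apply an amendatory instruction of the form:
    'After the first occurrence of the word X, insert Y'."""
    # Match the target as a whole word; only the first occurrence is amended.
    pattern = re.compile(r"\b" + re.escape(target) + r"\b")
    match = pattern.search(text)
    if match is None:
        # In real consolidation work, a failed match is an editorial
        # problem to be flagged, not silently ignored.
        raise ValueError(f"target {target!r} not found")
    return text[:match.end()] + " " + insertion + text[match.end():]

section = "No person shall keep a dog without a licence."
amended = apply_insert_after(section, "dog", "or cat")
# amended == "No person shall keep a dog or cat without a licence."
```

Even this toy version hints at why consolidation is editorial work: the instruction only makes sense relative to a particular version of the target text, which is exactly the point-in-time problem discussed above.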

This brings us to a very important point about law that needs to be emphasised and it is this: what the text of the law says at any time T and what the text of the law means at time T are two totally different things, on a number of levels. At a purely text management level, there is often a big difference between what the law says and what it means, because the journey towards true meaning can only start once the editorial aspects of amendment consolidation have taken place - and this might not be a function that the government performs at all. Even if it is, it may lag behind the creation of new Acts in a way that impacts its usefulness to practitioners as a definitive reference of the laws in force at any time T.

Once we get past the text management level of 'meaning' in the corpus, we are still only part of the way towards "the law", because the text needs to be read/parsed in order to find which parts of the text are in force and which are not at any time T. A simple example of this is a so-called "sunset clause", in which the consolidated text of an area of law as it was at time T may contain a statement which repeals part of the law - potentially somewhere else entirely in the corpus of law! - at some time later than time T.

Are we having fun yet? Complex, isn't it? I will just add a few more layers to it and then we will take a step back, I promise...

Having arrived - by whatever means - at the text of the law as it stood at time T, it might not be the case that the text has definitive status as "law", even if it is produced by the government itself. A good example of this is the United States Code[2]. In the world of law, there is the concept of "prima facie evidence of the law", which is distinct from "the law", because the corpus that is the US Code has not itself passed through Congress as a corpus.

A similar nuance comes up in US State Legislatures where the Journals - essentially the meeting minutes of the formal chambers - may be considered by the judiciary as the one true source of new and amended laws. In this way of thinking, even Statutes produced by Legislatures are, in a sense, secondary sources.

Two more wrinkles and then I will stop. I promise. Stay with me here...

The first is that the corpus of Acts in force is not necessarily self-consistent. Over the course of hundreds of years and thousands upon thousands of amendments, errors can creep in, such that a statement in Act A which is "the law" might contradict another statement in Act B which is also "the law". This is another point where IT people tend to wince! Paradoxes, the law of the excluded middle[3], the entire glorious edifice of boolean logic, is dependent on the absence of logical contradictions, and yet they can and do happen in law.

When this happens, jurisdictions do not SEGFAULT or go into endless loops or refuse to boot up in the morning. Rather, the legal system exhibits an interesting property that might be referred to as autonomic resolution[4]. Conflicting texts can co-exist in law - perhaps as separate Acts that contradict each other, or as unconsolidated statute sitting alongside consolidated statute. The entity that then deals with the conflict is, typically, the judiciary, where that most ineffable of concepts - "human judgement" - resolves it.

Peter Suber[4] has argued that such contradictions cannot be fully eradicated from law. In his book The Paradox of Self-Amendment[5], he uses an argument reminiscent of Gödel's Incompleteness Theorem[6] to show that any system that can amend itself needs to be able to break out of the contradictions/dead ends it might get itself into through the process of amendment.

In a memorable piece of prose[7], he puts it this way:

"One may regret the lapse of law from abstract logic, appreciate the equitable flexibility it affords, take satisfaction in the pretensions it punctures, or decry the dangers it makes possible."

The second, and final, wrinkle I will add for now is the concept of retroactive provisions[8]. These beauties have the effect of changing the way the law as it stood at time T needs to be interpreted at some future time T+1. If your head hurts, you are not alone. It is a tough one to grasp. Basically, a full understanding of the law at some historical time point T1 is dependent not just on the corpus as it was at that time point T1 but also on the corpus as it was at some future point T2. This is because the law at T2 may contain retroactive changes to how the law at time T1 needs to be interpreted.

By now you will have noticed that I keep saying "the law at time T". Hopefully, given the discussion so far, you are beginning to get a feel for why the concept of time is so important. Time, the passage of time, its impact on the corpus of law, references to time in the law...all of this is, in my opinion, inextricably woven into the way law works. That is why I believe any computational model of law must have the concept of time as a first-class member of the model, in order to accurately reflect what law really is.

Not convinced about the primary importance of time in the conceptual model of law? Consider this: every single litigation, every single dispute that arrives in a court of law, needs to be able to look backwards to what the law was at the time of the litigation event. The law as it is today is not the point of departure in a court case. It is the law as it was at the date or dates relevant to the case. The nature of court cases is that this can be many years after the events themselves.
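To make "time as a first-class member of the model" concrete, here is a toy sketch in which each provision carries its version history with validity intervals, and every lookup is a point-in-time query. All the names, texts and dates here are invented for illustration; a real model would also have to handle the retroactivity wrinkle described above:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Version:
    text: str
    in_force_from: date
    in_force_to: Optional[date]  # None means still in force

class Provision:
    """A single legal provision with its full version history."""
    def __init__(self):
        self.versions: list[Version] = []

    def add_version(self, text: str, start: date, end: Optional[date] = None):
        self.versions.append(Version(text, start, end))

    def as_at(self, t: date) -> Optional[str]:
        # Point-in-time lookup: the text in force on date t, if any.
        for v in self.versions:
            if v.in_force_from <= t and (v.in_force_to is None or t < v.in_force_to):
                return v.text
        return None

s1 = Provision()
s1.add_version("Dogs must be licensed.", date(1990, 1, 1), date(2005, 6, 1))
s1.add_version("Dogs and cats must be licensed.", date(2005, 6, 1))

assert s1.as_at(date(2000, 1, 1)) == "Dogs must be licensed."
assert s1.as_at(date(2010, 1, 1)) == "Dogs and cats must be licensed."
assert s1.as_at(date(1980, 1, 1)) is None  # not yet enacted
```

Notice that "what does this provision say?" is not even a well-formed query in this model without a date attached - which is exactly the property a court case needs.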

Next up: regulations/statutory instruments, which come from the executive branch, i.e. government agencies.


Wednesday, March 22, 2017

What is law? - Part 2

Previously: What is law? - Part 1.

The virtual legal reasoning box we are imagining will clearly need to either contain the data it needs, or be able to reach outside of the box and access whatever data it needs for its legal analysis. In other words, we can imagine the box having the ability to pro-actively reach out and grab legal data from the outside world when it needs it. And/or we can also imagine the box directly storing data so that it does not need to reach out and get it.

This brings us to the first little conceptual maneuver we are going to make in order to make reasoning about this whole thing a bit easier. Namely, we are going to treat all legal data that ends up inside the box for the legal analysis as having arrived there from somewhere else. In other words, we don't have to split our thinking into stored-versus-retrieved legal data. All data leveraged by the legal reasoning box is, ultimately, retrieved from somewhere else. It may be that for convenience, some of the retrieved data is also stored inside the box, but that is really just an optimization - a form of data caching - that we are not going to concern ourselves with at an architectural level, as it does not impact the conceptual model.

A nice side effect of this all-data-is-external conceptualization is that it mirrors how the real world of legal decision making in a democracy is supposed to work. That is, the law itself does not have any private data component. The law itself is a corpus of materials available (more on this availability point later!) to all those who must obey the law. Ignorance of the law is no defense.[1]

The law is a body of knowledge that is "out there" and we all, in principle, have access to the laws we must obey. When a human being is working on a legal analysis, they do so by getting the law from "out there" into their brains for consideration. In other words, the human brain acts as a cache for legal materials during the analysis process. If the brain forgets, the material can be refreshed and nothing is lost. If my brain and your brain are both reaching out to find the law at time T, we both - in principle - are looking at exactly the same corpus of knowledge.

I am reminded of John Adams's statement that government should be "a government of laws, not of men."[2] I might have a notion of what is legal and you might have a different notion of what is legal but, because the law is "out there" - external to both of us - we can both be satisfied that we are looking at the same corpus of law. We may well interpret it differently, but that is another matter, to which we will return later.

I am also reminded of Ronald Dworkin's Law as Integrity[3], which conceptualizes law as a corpus that is shared by, and interpreted for, the community that creates it. Again, the word "interpretation" comes up, but that is another day's work. One thing at a time...

So what actually lives purely inside the box if the law itself does not? Well, I conceptualize it as the legal analysis apparatus itself, as opposed to any materials consumed by that apparatus. Why do I think of this as being inside and not outside the box? Primarily because it reflects how the real world of law actually works. A key point, indeed a feature, of the world of law is that it is not based on one analysis box. It is, in fact, lots and lots of boxes. One for each lawyer and each judge and each court in a jurisdiction...

Legal systems are structured so that these analysis boxes can be chained together in an escalation chain (e.g. district courts, appeal courts, supreme courts etc.) The decision issued by one box can be appealed to a higher box in the decision-making hierarchy. Two boxes at the same level in the hierarchy might look at the facts of a case and arrive at diametrically opposing opinions. Two judges in the same court, looking at the same case might also come to diametrically different opinions of the same set of facts presented to the court.

This is the point at which most IT people start to furrow their brows, because it goes against the grain of most other computational systems that they work on. The law is not a set of predicate calculus rules that can be combined in a classical conditional logic system. There are very few black and white predicate functions in law. This is not a bug. It is a feature. This is not a lack of logic either. Rather, it is a different type of logic, known as non-monotonic logic[4]. Just as valid and just as useful and necessary as the boolean logic IT people are more familiar with.
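A deliberately crude sketch of the difference: in a non-monotonic system, adding a new fact can withdraw a conclusion that was previously valid, which classical boolean deduction never does. The rules below are invented purely for illustration:

```python
def may_drive(person: dict) -> bool:
    """Toy defeasible rule: a conclusion holds by default
    but can be defeated by more specific exceptions."""
    # Default rule: people aged 17 or over may drive.
    conclusion = person.get("age", 0) >= 17
    # Exceptions defeat the default. Crucially, *adding* knowledge
    # can retract a conclusion - the hallmark of non-monotonicity.
    if person.get("disqualified"):
        conclusion = False
    if person.get("medically_unfit"):
        conclusion = False
    return conclusion

assert may_drive({"age": 30}) is True
# The same person, once we learn one more fact:
assert may_drive({"age": 30, "disqualified": True}) is False
```

In classical logic, a valid conclusion stays valid no matter what further premises you add; in law, as in the sketch above, new information routinely defeats old conclusions.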

We will be returning to this later on. For now, suffice it to say that the logic of the analysis process is considered to be inside the box because it is private in the same way that a human brain is private. I might analyse legal data and arrive at a tentative conclusion and then write down my reasoning for others to see but the explanation may or may not reflect what my brain actually did. Moreover, nobody knows if it actually reflects what my brain actually did. Including me. That's brains for you!

So the analysis logic is inside the box and to a degree hidden from view in the same way that humans cannot look inside brains to see what is actually going on. The law itself is outside the box and not hidden from view. It is a corpus of knowledge that is "out there". The analysis process itself is always tentative in its conclusions. The outcomes of courts are called "opinions" for a reason - not "answers".

The world of law not only tolerates but is actively architected to allow differing interpretations of the same corpus of law. Society deals with the non-determinism through an escalation process (appeals), majority voting (many judges, same case, one judge one vote, majority prevails) and a repeals process (what is valid law today might not be valid law tomorrow).

Again, this is not a bug. It is a feature. It is a feature because, as you may have noticed, the world is a messy place, full of ambiguity and change and shifting views. Human behavior is a messy thing. Justice/morality/the common good...these are complex concepts. The legal systems of the world have evolved in order to try to deal with the messy parts. Any computer system that gets involved in this has to engage with the reality that a lot of what looks like messiness in the world of law - especially to most computer programmers - is there for good reasons. These are not "bugs" to be fixed by getting rid of the English and replacing it with computer code.

Having said that I was going to focus on the data side first, I appear to have drifted off to the algorithm side somewhat. Oh well, best laid plans...

Coming back to the data side now, if all the data required for the legal reasoner is outside the box, then what is it? Where do we get it? Can we actually get at all of it?

We will pick this up in Part 3.


Wednesday, March 15, 2017

What is law? - Part 1

Just about seven years ago now, I embarked on a series of blog posts concerning the nature of legislatures/parliaments. Back then, my goal was to outline a conceptual model of what goes on inside a legislature/parliament in order to inform the architecture of computer systems to support their operation.

The goal of this series of posts is to outline a conceptual model of what law actually is and how it works when it gets outside of the legislatures/parliaments and is used in the world at large.

I think now is a good time to do this because there is growing interest in automation "downstream" of the legislative bodies. One example is GRC - Governance, Risk & Compliance - and all the issues that surround taking legislation/rules/regulations/guidance and instantiating it inside computer systems. Another example is Smart Contracts - turning legal language into executable computer code. Another example is chatbots such as DoNotPay, which encode/interpret legal material in a "consultation" mode with the aid of Artificial Intelligence and Natural Language Processing. Another example is TurboTax and programs like it, which have become de facto sources of interpretation of legal language in the tax field.

There are numerous other fascinating areas where automation is having a big impact in the world of law. Everything from predicting litigation costs to automating discovery to automating contract assembly. I propose to skip over these for now, and just concentrate on a single question which is this:
      If a virtual "box" existed that could be asked questions about legality of an action X, at some time T, what would need to be inside that box in order for it to reflect the real world equivalent of asking a human authority the same question?
If this thought experiment reminds you of John Searle's Chinese Room Argument then good:-) We are going to go inside that box. We are taking with us Niklaus Wirth's famous aphorism that Algorithms + Data Structures = Programs. We will need a mix of computation (algorithmics) and data structures, but let us start with the data sources because it is the easier of the two.

What data (and thus data structures) do we need to have inside the box? That is the subject of the next post in this series.

What is law? - Part 2.

Monday, February 27, 2017

Custom semantics inside HTML containers

This article of mine from 2006 (I had to dig it out of the Wayback Machine!), Master Foo's Taxation Theory of Microformats, came back to mind today when I read this piece: Beyond XML: Making Books with HTML. It is gratifying to see this pattern start to take hold, i.e. leveraging an existing author/edit toolchain rather than building a new one. We do this all the time in Propylon, leveraging off-the-shelf toolsets supporting flexible XML document models (XHTML, .docx, .odt) but encoding the semantics and the business rules we need in QA/QC pipelines. Admittedly, we are mostly dealing with complex, messy document types like legislation, professional guidance, policies, contracts etc. but then again, if your data set is not messy, you might be better off using a relational database to model your data and using the relational model to drive your author/edit sub-system in the classic record/field-oriented style.

Monday, February 20, 2017

Paper Comp Sci Classics

Being a programmer/systems architect/whatever brings with it a big reading load just to stay current. It used to be the case that this, for me, involved consuming lots of physical books and periodicals. Nowadays, less so, because there is so much good stuff online. The glory days of paper-based publications are never coming back, so I think it's worth taking a moment to give a shout out to some of the classics.

My top three comp sci books, the ones I will never throw out are:
- The C Programming Language by Kernighan and Ritchie
- Structure and Interpretation of Computer Programs, Abelson and Sussman
- Godel, Escher, Bach, Hofstadter

Sadly, I did dump a lot of classic magazines:-/ Byte, Dr Dobbs, PCW....

Your turn:-)

Friday, January 27, 2017

ChatOps, DevOps, Pipes and Chomsky

ChatOps is an area I am watching closely, not because I have a core focus on DevOps per se, but because Conversational User Interfaces is a very interesting area to me and ChatOps is part of that.

Developers - as a gene pool - have a habit of developing very interesting tools and techniques for doing things that save time down in the "plumbing". Deep down the stack where no end-user ever treads.

Some of these tools and techniques stay there forever. Others bubble up and become important parts of the end-user-facing feature sets of applications and/or important parts of the application architecture, one level beneath the surface.

Unix is full of programs, patterns etc. that followed this path. This is from Doug McIlroy in *1964*:

"We should have some ways of coupling programs like garden hose--screw in another segment when it becomes necessary to massage data in another way."

That became the Unix concept of a bounded buffer "pipe" and the now legendary "|" command line operator.

For a long time, the Unix concept of pipes stayed beneath the surface. Today, it is finding its way into front ends (graphics editing pipelines, audio pipelines) and into applications architectures (think Google/Amazon/Microsoft cloud-hosted pipelines.)
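The pipe idea itself is easy to mimic in ordinary code, which is partly why it travels so well up the stack. Here is a toy Python sketch of chaining stream-processing stages, roughly analogous to 'cat file | grep error | wc -l' (all names here are made up for illustration):

```python
def lines(text):
    # Source stage: emit lines one at a time, like `cat`.
    yield from text.splitlines()

def grep(stream, needle):
    # Filter stage: pass through matching lines only, like `grep`.
    return (line for line in stream if needle in line)

def count(stream):
    # Sink stage: consume the stream and report, like `wc -l`.
    return sum(1 for _ in stream)

text = "error: disk full\ninfo: ok\nerror: timeout\n"
n = count(grep(lines(text), "error"))
# n == 2
```

As with Unix pipes, each stage knows nothing about its neighbours beyond the stream flowing between them, and the stages are lazy: nothing is processed until the sink pulls on the chain.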

Something similar may happen with Conversational User Interfaces. Some tough nuts might end up being cracked down in the plumbing layers by DevOps people, for their own internal use, and then bubble up....

The one that springs to mind is that we will need to get to the point where hooking in new sources/sinks into ChatBots doesn't involve breaking out the programming tools and the API documentation. The CUI paradigm itself might prove to be part of the solution to the integration problem.

For example, what if the "zeroconf" behaviour for any given component was that you were guaranteed to be able to chat to it - not with a fully fledged set of application-specific dialog commands, but with a basis set of dialog components from which a richer dialog could be bootstrapped?

Unix bootstrapped a phenomenal amount of integration power from the beautifully simple concept of standard streams for input, output and error. A built-in linguistic layer on top of that, for chatting about how to chat, is an interesting idea. Meta chat. Talks-about-talks. That sort of thing.

Dang, just as Chomsky's universal grammar seems to be gathering dissenters...:-)

Wednesday, December 21, 2016

The new Cobol, the new Bash

Musing, as I do periodically, on what the Next Big Thing in programming will be, I landed on a new (to me) thought.

One of the original design goals of Cobol was English-like nontechnical readability. As access to NLP and AI continues to improve, I suspect we will see a fresh interest in "executable pseudo-code" approaches to programming languages.

In parallel with this, I think we will see a lot of interest in leveraging NLP/AI from chat-bot CUI's in programming command line environments such as the venerable bash shell.

It is a short step from there I think, to a read-eval-print loop for an English-like programming environment that is both the programming language and the operating system shell.