Featured Post

LinkedIn

These days, I mostly post my tech musings on LinkedIn: https://www.linkedin.com/in/seanmcgrath/

Sunday, December 26, 2010

Palfrey on open legislation

"This new legal information architecture must be grounded in a reconceptualization of the public sector’s role and draw in private parties, such as Google, Amazon, Westlaw, and LexisNexis, as key intermediaries to legal information." A new legal information environment for the future.

Monday, December 13, 2010

MicroXML

Yay! I hope this initiative flies. It is long overdue and continues that long and largely-successful pattern of evolving standards by *taking stuff out*.
Remember Antoine de Saint-Exupéry, who said (in Wind, Sand and Stars):
    "A designer knows s/he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."

Tuesday, December 07, 2010

Towards mashups based on timelines

Dan Jellinek writes about a very important point Open Data 'Must Add Context'. Context is absolutely king when it comes to interpreting public records which is why KLISS works the way it works.

Lawrence Lessig has also written about the problem of context in his Against Transparency.

We will soon, I hope, get past the problem of data access. OGD, data.gov, law.gov, legislation.co.uk etc. will see to that.

Then we can move onto addressing the context problem. To do that, we will need to address what is, to my mind, the key missing piece of the Web today: the time dimension of information.

It does not have to be complicated. I would suggest we start with some simple "social contracts" for URIs that contain temporal information. Tim Berners-Lee's Cool URIs don't change has been around for many years now and contains what is to my mind the key idea: encoding dates in URIs. e.g. this URI signals the time dimension in its elements: http://www.w3.org/1998/12/01/chairs.

The notion of URIs having structure has been a wee bit controversial (see Axioms) but I think it's a fine idea :-) Jon Udell is worth reading on this point too.

So, where could a few simple agreements about temporal URI patterns get us?

In two words *timeline mashups*. Today, the majority of mashups are essentially data joins using location as the join point. Imagine a Web in which we can create similar dynamic data expositions but based on time lines. That is the prize we will win if we can get agreements on encoding the temporal dimension of information.
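To make that concrete, here is a minimal sketch (in Python; the feed URIs and the /YYYY/MM/DD/ convention are invented for illustration) of the kind of timeline join that becomes trivial once independent sources agree to encode dates in their URIs:

    import re
    from collections import defaultdict

    # Hypothetical URIs that follow a simple /YYYY/MM/DD/ social contract.
    BILL_EVENTS = [
        "http://legis.example.gov/2010/12/07/hb2001-introduced",
        "http://legis.example.gov/2010/12/09/hb2001-committee-referral",
    ]
    NEWS_ITEMS = [
        "http://news.example.com/2010/12/07/budget-debate-opens",
        "http://news.example.com/2010/12/09/committee-hearings-announced",
    ]

    DATE_IN_URI = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")

    def date_of(uri):
        """Extract the YYYY-MM-DD encoded in the URI path, or None."""
        m = DATE_IN_URI.search(uri)
        return "-".join(m.groups()) if m else None

    def timeline_join(*sources):
        """Join any number of URI lists on their temporal dimension."""
        timeline = defaultdict(list)
        for source in sources:
            for uri in source:
                day = date_of(uri)
                if day:
                    timeline[day].append(uri)
        return dict(sorted(timeline.items()))

    for day, uris in timeline_join(BILL_EVENTS, NEWS_ITEMS).items():
        print(day, uris)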

Imagine a world in which we can automatically generate beautiful information displays like this, or this, that mash up data from many disparate, independent sources.

Would it be worth the effort? In my opinion, absolutely! It would be a great place to start, yielding huge value for a relatively small effort.

Higher up the effort scale but still very worthwhile would be mechanisms for querying w.r.t. time e.g. Memento and Temporal RDF.

How wonderful would it be if we could then create temporal mashups of temporal mashups? How wonderful would it be if we could create a temporal dimension *on top* of a geo-spatial dimension to create spatial-and-temporal mashups?

As Don Heiman, CITO for the Kansas State Legislature and the visionary behind KLISS, likes to say: "be still my heart"...

Monday, November 01, 2010

KLISS Slides

We presented KLISS at last week's Cutter Consortium event in Boston. The slides are online here. The slides cover eDemocracy vision, business strategy, governance etc. as well as the tech parts.

More to come in the weeks and months ahead. Exciting times!

Monday, October 11, 2010

KLISS: Author/edit sub-systems in legislative environments

Last time in this KLISS series, I talked some more about the KLISS workflow model. The time has come (finally!) to talk about how that workflow model incorporates author/edit on the client side i.e. the creation of, or update of, legislative artifacts such as bills, statute sections, chamber events, meeting minutes etc. Earlier in this series, I explained in reasonable detail why the author/edit subsystems cannot be simple little text editors and also why they cannot be classic heavy XML editors, so I won't go over that ground again here; I'll just cut to the chase about how KLISS actually does it.

Units of information and micro-documents

To my mind, the most important part of modeling a document-centric world such as legislatures for author/edit is deciding where the boundaries lie between information objects. After all, at some level, some information object needs to be edited. We will need all the classic CRUD functions for these objects so we need to pick them carefully.

When I look at a corpus of legal information I see a fractal in which the concept of "document" exhibits classic self-similarity. Is a journal a document? How about a title of statute? Or a bill? How about the Uniform Commercial Code? Is that a document?

Pretty much any document in a legislature can reasonably be thought of as an aggregation of smaller documents. A journal is an iteration of smaller chamber event documents. A title of statute is an iteration of statute sections. A volume of session laws is an iteration of acts...and so on.

This creates an interesting example of a so-called banana problem: how do you know when to stop decomposing a document into smaller pieces?

My rule of thumb is to stop decomposing when the information objects created by the decomposition cease to be very useful in stand-alone form. Sections of statute are useful stand-alone. The second half of a vote record less so. Bills are useful standalone. The enacting clause less so.

The good news is that when you do this information decomposition, the information objects that require direct author/edit support get smaller and less numerous. They get smaller because you do not need an editor for titles of statute. A title is what you get after you aggregate lots of smaller documents together. Don't edit the aggregate. Edit the atoms. They get less numerous because the decomposition exposes many shared information objects. For example, a bill amendment may be a document used in chamber but it also appears in the journal. Referring a bill to a committee will result in a para in the journal but will also result in an entry in the bill status application...and so on.

In KLISS we generally edit units of information – not aggregates. We have a component that knows how to join together any number of atoms to create aggregates. Moreover, aggregates can be posted into the KLISS time machine where they become atoms, subject to further aggregation. A good example would be a chamber event document that gets aggregated into a journal but the resultant journals are themselves aggregated into a session publication known as the permanent journal.

Semantics and Micro-formats


KLISS makes extensive use of ODF for units of information in the asset repository. We encode metadata as property-value pairs inside the ODF container. We also leverage paragraph and character style names for encoding "block" and "inline" semantics. As discussed previously, line and page numbers are often critically important to the workflows and we embed these inside the ODF markup too.
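As a rough illustration of how little machinery is needed to get at such metadata, here is a sketch that pulls the user-defined property-value pairs out of an ODT container using nothing more than a zip library and an XML parser (the property names shown in the comment are invented, not the actual KLISS ones):

    import zipfile
    import xml.etree.ElementTree as ET

    META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"

    def user_defined_properties(odt_path):
        """Return the user-defined property/value pairs stored in an ODT's meta.xml."""
        with zipfile.ZipFile(odt_path) as odt:
            root = ET.fromstring(odt.read("meta.xml"))
        props = {}
        for prop in root.iter("{%s}user-defined" % META_NS):
            props[prop.get("{%s}name" % META_NS)] = prop.text
        return props

    # Example usage (file name and property names are invented):
    # user_defined_properties("hb2001_draft.odt")
    # -> {'bill-number': 'HB 2001', 'chamber': 'House'}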

The thin client author/edit approach


Some of our units of information are sufficiently small and sufficiently metadata-oriented that we can "author" them using Web-based forms. In other words, the asset will be stored in the time machine as an ODT document but the user's author/edit experience of it will be a web form. We make extensive use of Django for these.

This is particularly true on the committee/chamber activity side of a legislature where there are a discrete number of events that make up 80% of the asset traffic and the user interface can be made point-and-click with little typing.

The thick client author/edit approach


Some of our units of information are classic word-processor candidates. i.e. a 1500 page bill, a 25 page statute section consisting of a single landscape table with running headers, a 6 level deep TOC with tab leaders and negative first line indents...For these we use a thick client application created using Netbeans RCP which embeds OpenOffice. We make extensive use of the UNO API to automate OpenOffice activities. The RCP container also handles identity management, role based access control and acts as a launchpad for mini-applications – created in Java and/or Jython – that further extend our automation and customization capabilities on the client side.

RESTian time-machine interface


Although we tend to speak of two clients for author/edit in KLISS – the thick client and the thin client – in truth, the set of clients is open-ended as all interaction with the KLISS time machine is via the same RESTian interface. In fact, the KLISS server side does not know what was used to author/edit any document. This gives us an important degree of separation between the author/edit subsystem and the rest of the system. History has shown that the most volatile part of any software application is the part facing the user. We need to know that we can evolve and create brand new author/edit environments without impacting the rest of the KLISS ecosystem.
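Purely to illustrate the shape of that uniform interface, here is a sketch of what a client might look like. The host name, paths, the "rev" parameter and the "X-Revision" header are all invented for illustration; the point is simply that every client, thick or thin, talks to the time machine the same way:

    import requests  # any HTTP client will do; the point is the uniform interface

    BASE = "https://kliss.example.gov/repo"  # invented host and paths

    def put_asset(path, odt_bytes):
        """Store a new revision of an asset; the server, not the client, owns history."""
        r = requests.put(
            BASE + "/assets/" + path,
            data=odt_bytes,
            headers={"Content-Type": "application/vnd.oasis.opendocument.text"},
        )
        r.raise_for_status()
        return r.headers.get("X-Revision")  # hypothetical revision header

    def get_asset(path, at_revision=None):
        """Fetch an asset, optionally as it stood at a given point in time."""
        params = {"rev": at_revision} if at_revision else {}
        r = requests.get(BASE + "/assets/" + path, params=params)
        r.raise_for_status()
        return r.content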

Why ODT?


ODT was chosen as it presented the best trade-off when all the competing requirements of an author/edit subsystem for legislatures were analyzed. A discussion of the issues that were on the table when this selection was made is listed here. To that list I would also add that ODT is, to my knowledge, free of IP encumbrances. Also, the interplay between content and presentation is so important in this domain that it is vital to have free and unfettered access to the rendering algorithms in order to feel fully in possession of the semantics of the documents. I'm not saying that a large corpus of C++ is readily understandable at a deep level but I take great comfort in knowing that I can, in principle, know everything there is to know about how my system has rendered the law by inspecting the rendering algorithms in OpenOffice.

Next up, long term preservation and authentication of legal materials in KLISS.

Thursday, October 07, 2010

ActiveMQ guru required

Looking for an ActiveMQ guru to work a 6-month contract out of Lawrence, KS. Possibility of a full-time post after that for the right person.

Thursday, September 30, 2010

Cutter Consortium Summit, Boston

I will be attending the Cutter Consortium Summit in Boston next month. If you are attending, or in the area, and would like to meet up, let me know.

Thursday, September 16, 2010

Pssst...there is no such thing as an authentic/original/official/master electronic legal text

I know of no aspect of legal informatics that is plagued with more terminology problems than the question of authentic/original/official/master versions of legal material such as bills and statute and caselaw. In this attempt at addressing some of the confusion, I run the risk of adding more confusion but here goes...

1 - The sending fallacy
What would it mean for the Library of Congress to send their Gutenberg Bible to me? Well, they would put it in a box and ship it to me. Afterwards, I would have that instance of the Gutenberg Bible and they would not have it. The total number of instances of the Gutenberg Bible in the world would remain the same. The instance count chez McGrath would increment and the instance count chez LOC would decrement.

If they were to electronically send it to me, there would be no "sending" going on at all. Instead, a large series of replications would happen - from storage medium to RAM to Network Buffers to Routers...culminating in the persistent storage of a brand new thing in the world, namely, my local replica of the bit-stream that the Library of Congress sent (replicated) to me. The instance count chez McGrath would increment and the instance count chez LOC would remain unchanged. I would have mine but they would still have theirs.

Sadly, the word "send" is used when we really mean "replicate" and this is the source of untold confusion as it leads us to map physical-world concepts onto electronic-world concepts where there is an imperfect fit...Have you ever sent an e-mail? I mean really "sent" an e-mail? Nope.

2 - The signing fallacy
An example of that imperfect fit is the concept of "signing". What would it mean for the Library of Congress to sign their physical copy of the Gutenberg Bible? They could put ink on a page or maybe imprint a page with an official embossing seal or some such. The nature of physical media makes it relatively easy to make the signing tamper-evident and hard to counterfeit.

What would it mean for the Library of Congress to sign their electronic replica of the Gutenberg Bible with PKI and replicate it (see point 1 above) to me? Well, it's really very, very different from a physical signing.

It is just more bits. Every replica contains a completely perfect replica of the original "signature". There is no "original" to compare it to. The best you can do is check for "sameness" and check the origin of the replica, but doing these checks rapidly becomes a complex web of hashes and certificates and revocations and trusted third parties and...lots of stuff that is not required for physical-world signatures.

3 - The semantics fallacy
What does it mean for me to render a page of my replica of the Gutenberg Bible on my computer screen? Am I guaranteed to be seeing the "same" thing you see when you do something similar? Does it matter if the file is a TIFF or a Microsoft Word file? Does it matter what operating system I am using or what my current printer is or my screen resolution? Do any of these differences amount to anything when it comes to the true meaning of the page?

The unfortunate fact - as discussed earlier as part of the KLISS series - is that the semantics of information is sometimes a complex mix of the bits themselves and the rendering created from those bits by software.

Sometimes - for sure - the different renderings have no impact on meaning but it is fiendishly difficult to find consensus on where the dividing line is. Moreover, the signing fallacy (see above) adds to the problem by insisting that a document that passes the signing checks is "the same" as the replica it was replicated from. No account is taken of the fact that a perfect replica at the bit-stream level may look completely different to me, depending on what software I use to render it and the operating context of the rendering operation.

Semantics in digital information is a complex function of the data bits, the algorithms used to process the bits, and the operating context in which the algorithms act on the bits. Consequently, the question "are these replicas 'the same'?" is not simple to answer...

4 - The either/or fallacy
...When someone asks me, as they sometimes do - and I quote - "How do I know that you sent me the original, authentic document?", I answer that it all depends on what you mean by the words "sent", "original", "authentic" and "document" :-)

Part of the problem is that fake/real, same/different are very binary terms. In the physical world, this is not a huge problem. What are the chances that the Gutenberg Bible in the Library of Congress is a fake? I would argue that it is non-zero but extremely small. The same goes for every dollar note, every passport, every driver's licence on the planet.

In the physical world, we can reduce the residual risk of fakes very effectively. In the electronic world, it is much, much harder. How do I know that the replica of the Gutenberg Bible on my computer is not a fake? When you consider points 1, 2 and 3 above I think you will see that it is not an easy question to answer...

What to do?

...It all looks quite complicated! Is there a sane way through this? Well, there had better be because, at least in the legal world, we seem to be heading rapidly into a situation where electronic texts of various forms are considered authentic/original/official/masters etc.

I personally believe that there are effective, pragmatic and inexpensive approaches that will work well, but we need to get out from under the terrible weight of unsuitable and downright misleading terminology we have foisted upon ourselves by stretching real world analogies way past their breaking points.

If I had my way, "hashing" and "signing" would be utterly distinct. The term "non-repudiation" would be banned from all discourse. I would love to see all the technology around PKI re-factored to completely separate encryption concerns from counterfeit detection concerns. The two currently share some of the same tools/techniques, and the amount of confusion that causes is striking. I have lost count of the number of times I have encountered encryption as a proposed solution for counterfeit detection.
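To illustrate just how little is needed for the "sameness" half of the problem, here is a sketch using nothing but a hash - no keys, no certificates, no encryption (file names are illustrative):

    import hashlib

    def fingerprint(path, algorithm="sha256"):
        """Compute a digest of a file's exact bit-stream."""
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # Two replicas are bit-for-bit "the same" if and only if their digests match:
    #   fingerprint("statutes_replica_a.pdf") == fingerprint("statutes_replica_b.pdf")
    # Note what this does NOT tell you: who produced the bits, or how any given
    # piece of software will *render* them.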

As time permits over the next while, I will be blogging more about this area and putting forward some proposed approaches for use in electronic legal publishing. I will also be talking about approaches that are applicable to machine readable data such as XML as well as frozen renderings such as PDF - a distinction that is very important in the context of the data.gov/law.gov movements.

I expect pushback because I will be suggesting that we need to re-think the role of PKI and digital signatures and get past the dubious assertion that this stuff is necessarily complicated and expensive.

I truly believe that neither of these is true but it will take more time than I currently have to explain what I have in mind. Soon, hopefully...

Friday, September 10, 2010

Sustainable data.gov initiatives in 4 easy steps

(Note: this post is largely directed at government agencies, and at businesses working with government agencies, on data.gov projects.)

At the recent Gov 2.0 Summit Ellen Miller expressed the concern that the data transparency initiative of the Obama administration has stalled. Wandering the halls at the conference, I heard some assenting voices. Concerns that there is more style than substance. Concerns about the number of data sets, the accuracy of the data, the freshness of the data and so on.

Having said that, I heard significantly more positives than negatives about the entire data.gov project. The enthusiasm was often palpable over the two day event. The vibe I got from most folks there was that this is a journey, not a destination. Openness and transparency are the result of an on-going process, not a one-off act.

These folks know that you cannot simply put up a data dump in some machine readable format and call your openness project "done". At least at the level of the CIOs and CTOs, my belief is that there is a widespread appreciation that there is more to it than that. It is just not that simple. It will take time. It will take work, but a good start is half the work and that, in my opinion, is what we have right now: a good start.

I have been involved in a number of open data initiatives over the years in a variety of countries. I have seen everything from runaway successes to abject failures and everything in between. In this post, I would like to focus on 4 areas that my experiences lead me to believe are critical to convert a good start into a great success story in open data projects.

1 - Put some of your own business processes downstream of your own data feeds


The father of lateral thinking, Edward de Bono, was once asked to advise on how best to ensure a factory did not pollute a river. De Bono's solution was brilliantly simple. Ensure that the factory takes its clean water *downstream* from the discharge point. This simple device put the factory owners on the receiving end of whatever they were outputting into the river. The application of this concept to data.gov projects is very simple. To ensure that your organization remains focused on the quality of the data it is pushing out, make sure that your internal systems consume it.

That simple feedback loop will likely have a very positive impact on data quality and on data timeliness.

2 - Break out of the paper-oriented publishing mindset


For much of the lifetime of most government agencies, paper has been the primary means of data dissemination. Producing a paper publication is expensive. Fixing a mistake after 100,000 copies have been printed is very expensive. Distribution is time consuming and expensive...

This has resulted – quite understandably – in a deeply ingrained "get it right first time" publishing mentality. The unavoidable by-product of that mindset is latency. You check, you double check, then you check again...all the while the information itself is sliding further and further from freshness. Data that is absolutely perfect - but 6 months too late to be useful - just doesn't cut it in the Internet age of instantaneous publishing.

I am not for a minute suggesting that the solution is to push out bad data. I am however suggesting that the perfect is the enemy of the good here. Publish your data as soon as it is in reasonable shape. Timestamp your data into "builds" so that your customers know what date/time they are looking at with respect to data quality. Leave the previous builds online so that your customers can find out for themselves what has changed from release to release. Boldly announce on your website that the data is subject to ongoing improvement and correction. Create a quality statement. When errors are found – by you or by your consumers – they can be fixed with very little cost and fixed very quickly. This is what makes the electronic medium utterly different from the paper medium. Actively welcome data fixes. Perhaps provide bug bounties in the same way that Don Knuth does for his books. Harness Linus Torvalds' maxim that "given enough eyeballs all bugs are shallow" to shake bugs out of your data. If you have implemented point 1 above and you are downstream of your own data feeds, you will benefit too!
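As a sketch of the "builds" idea (the directory layout and field names are illustrative, not a prescription), publishing a timestamped build with a checksum manifest can be as simple as:

    import hashlib
    import json
    import shutil
    from datetime import datetime, timezone
    from pathlib import Path

    def publish_build(source_dir, publish_root):
        """Copy a data set into a timestamped build directory with a checksum manifest."""
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
        build_dir = Path(publish_root) / ("build-" + stamp)
        shutil.copytree(source_dir, build_dir)
        manifest = {
            "build": stamp,
            "quality_statement": "Data is subject to ongoing correction; previous builds remain online.",
            "files": {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
                      for p in build_dir.iterdir() if p.is_file()},
        }
        (build_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
        return build_dir

    # Each run leaves the older builds untouched, so consumers can diff releases
    # and always know exactly which build they are looking at.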

3 - Make sure you understand the value exchange


Whenever A is doing something that will benefit B but A is taking on the costs, there must be a value exchange for the arrangement to be sustainable. Value exchange comes in many forms:
- An entity A may provide data because it has been mandated to do so. The value exchange here is that the powers-that-be will smile upon entity A.
- An entity A may provide data out of a sense of civic duty. The value exchange here is that A actively wants to do it and receives gratification – internally or from peers - from the activity.
- An entity A may provide data because entity B will return the favor.
- And so on.

One of the great challenges of the public sector all over the world is that inter-agency data exchanges tend to put costs and benefits into different silos of money. If agency A has data that agency B wants, why would agency A spend resources/money doing something that will benefit B? The private sector often has similar value exchange problems that get addressed through internal cross-billing. i.e. entity A sees value in providing data to B because B will "pay" for it, in the internal economy.

If that sort of cross-billing is not practical – and in many public sector environments, it is not – there are a number of alternatives. One is reciprocal point-to-point value exchange. i.e. A does work to provide data to B, but in return B does work to provide data that A wants. Another – and more powerful model in my opinion – is a data pool model. Instead of creating bi-lateral data exchange agreements, all agencies contribute to a "pool" of data on a sort of "give a penny, take a penny" basis. i.e. feel free to take data but be prepared to be asked by the other members of the pool to provide data too.

In scenarios where citizens or the private sector are the consumers of data, the value exchange is more complex to compute as it involves less tangible concepts like customer satisfaction. Having said that, the Web is a wonderful medium for forming feedback loops. Unlike in the paper world, agencies can cheaply and easily get good intelligence about their data from, for example, an electronic "thumbs up/down" voting system.

The bottom line is that value exchanges come in all shapes and sizes but, in order to be sustainable, a data.gov project must know what its value exchange is.

4 - Understand the Government to Citizen dividend that comes from good agency-to-agency data exchange


In the last point, I have purposely emphasized the agency-to-agency side of data.gov projects. Some may find that odd. Surely the emphasis of data.gov projects should be openness and transparency and service to citizens?

I could not agree more, but I believe that the best way to service citizens and businesses alike, is to make sure that agency-to-agency data exchange functions effectively too.

Think of it this way: how many forms have you filled in with information you previously provided to some other agency? We all know we need an answer for the "we have a form for that" phenomenon, but I believe the right answer is oftentimes not "we have an app for that" but rather "there is no app, and no form, for that data because it is no longer necessary for you to send it to us at all".

Remember: The best Government form is the form that disappears in a puff of logic caused by good agency-to-agency data integration.

In summary


1 - Put yourself downstream of your own data.gov initiative
2 - Break out of the paper-oriented "it must be perfect" mindset
3 - Make sure you understand the value exchanges. If you cannot identify one that makes sense, the initiative will most likely flounder at some point and probably sooner than you imagine
4 – When government agencies get their agency-to-agency data exchange house in order, better government-to-citizen and government-to-business data exchange is the result.

Thursday, September 09, 2010

The Web of Data and the Strata conference

I'm a grizzled document-oriented guy but I'm not blind to the amazing potential of numerical data on the Web. I do not think it is an exaggeration to say that in years to come, data volumes on the web-o-data will be, in order of size: multi-media data, then numerical data, then text. Text will bring up the rear - a distant third behind numerical data, which in turn will be some distance behind multimedia data.

That is what I think the volume graph will look like, but in terms of business value I suspect a very different ordering will emerge: numerical data, then text data, then multi-media data.

In blunt, simple terms, there is serious money in numbers and number crunching. As more and more numerical data becomes available on the web and is joined by telemetry systems (e.g. smart-grid) generating vast new stores of numerical data we are going to see an explosion of new applications. I had the good fortune to be involved in the early days of Timetric and they have now been joined by a slew of companies working on innovative new applications in this space.

At the Gov 2.0 conference that has just ended, I had the opportunity to talk to my fellow survivor of the gestation of XML, Edd Dumbill of O'Reilly who is involved in the Strata conference. Edd really gets it and I look forward to seeing what he pulls together for the Strata conference. Exciting times.

Wednesday, September 08, 2010

The Semantic Web is not a data format

Surfing around today, I get the impression that some folk believe that the Semantic Web is a data format question. It isn't in my opinion. It is an inference algorithm question. Data is just fuel to the engine. If we get sufficient value-add through the inference algorithms - the engines - the data format questions will fall like so many skittles. Deciding on a data format is, compared to the problem of creating useful inference engines, trivial.
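To make the distinction concrete, here is a toy forward-chaining example of the kind the Eighties AI community would recognize. The triples and the rule are invented; the point is that the value-add lives in the engine, not in the format the facts arrive in:

    # Facts are plain (subject, predicate, object) triples -- the "fuel".
    facts = {
        ("HB2001", "amends", "Statute 12-101"),
        ("Statute 12-101", "partOf", "Chapter 12"),
    }

    # One rule: if X amends Y and Y is part of Z, infer that X affects Z.
    def affects_rule(known):
        inferred = set()
        for (x, p1, y) in known:
            if p1 != "amends":
                continue
            for (y2, p2, z) in known:
                if p2 == "partOf" and y2 == y:
                    inferred.add((x, "affects", z))
        return inferred

    # Forward chaining: keep applying rules until no new facts appear.
    def forward_chain(known, rules):
        known = set(known)
        while True:
            new = set().union(*(rule(known) for rule in rules)) - known
            if not new:
                return known
            known |= new

    print(forward_chain(facts, [affects_rule]))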

Of course, to create an environment where clever inference algorithms can be incubated, you need a web of data but that is the petri dish for this grand experiment - not the experiment itself.

When I characterize the effort as an "experiment" I mean that it is not yet clear (at least to me) if the Semantic Web will usher in a new class of algorithms that provide significantly better inference value-add over the algorithmic approaches of the weak/strong AI community of the eighties. E.g. Forward chaining, Backward chaining, Fuzzy logic, Bayesian inference, Blackboard algorithms, Neural nets, probabilistic automata etc.

If it does, then great! The Semantic Web will be a new thing in the world of computer science. If it doesn't, the absolute *worst* that can happen is that we end up with a great big Web of machine readable data because of all the data format debates :-)

Even if the algorithms end up staying much as they were in the Eighties, we will see more interesting outputs when they are applied today because of the richness and the volume of data becoming available on the Web. However, that does not constitute a new leap forward in computer science. This, in my opinion, is the sticking point for many who are dubious about the brouhaha surrounding the Semantic Web.

I've never met anybody who thinks a web of machine readable data is a bad idea. I have met people who think the web-o-data *is* the semantic web. I have also met people who think that the semantic web is all about the inference performed over the data.

Of course, there are many who characterize the Semantic Web differently out there and one of the great sources of debate at the moment is that people find themselves passing each other at 30,000 feet because they do not have a shared conceptual model of what critical terms like "web of data", "semantics", RDF, SPARQL, deductive/inductive logic etc. mean.

Part of the problem no doubt is that many approaches to machine readable semantics involve the creation of declarative syntaxes for use in inference engines. These data formats are really "config files" for inference engines as opposed to discrete facts (such as RDF triples) to be processed by inference engines. Ontologies are a classic example.

My personal opinion: if the Semantic Web proponents were to stand up and say "Hey, there was all this amazing computer science done in the Eighties but there was never a rich enough set of machine readable facts for it to flourish...Let's give it another go!", I'd be shouting from the rooftops in support.

However, I tend not to hear that. Perhaps it's the circles I move in? Most of what I hear is "The Semantic Web is a brand new thing on this earth. Come join the party!"

The CompSci major in me has trouble with that characterization. It's not universal but it does seem quite pervasive.

Yes, it is ironic that the stumbling block for the semantic web is establishing the semantics of "semantics" :-)

Yes, I derive too much pleasure from that. It goes with the territory.

Monday, September 06, 2010

History in the context of its creation

Tim O'Reilly's Twitter feed pointed me at this great piece on historiography.

I just love the 12 volume set of the evolution of a single Wikipedia entry. In KLISS we take a very historiographic approach to eDemocracy.

The primary difference between the way we do it and the Wikipedia model is that we record each change - each delta - as a delta against the entire repository of content, not just against the record modified. Put another way, we don't version documents. We version repositories of documents.

In legislative systems, this is very important because of the dense inter-linkages between chunks of content. To fully preserve history in the context of its creation, you need to make sure that all references are "point-in-time" too. I.e. if you jump back into the history of some asset X and it references asset Y, you need to be able to follow the link to Y and see what it looked like *at the same point in history*.
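Here is a toy model of that idea - version the repository, not the document - with invented paths, just to show how point-in-time reference following falls out naturally:

    class Repository:
        """Toy model: every commit snapshots the *whole* repository, not one document."""

        def __init__(self):
            self.revisions = []  # list of {path: content} snapshots
            self.head = {}

        def commit(self, changes):
            """Apply a set of path->content changes as a single transaction."""
            self.head = dict(self.head, **changes)
            self.revisions.append(dict(self.head))
            return len(self.revisions) - 1  # revision number

        def read(self, path, rev):
            """Read any path as it stood at revision `rev` -- point-in-time access."""
            return self.revisions[rev].get(path)

    repo = Repository()
    r1 = repo.commit({"statute/12-101": "original text", "bill/HB2001": "references 12-101"})
    r2 = repo.commit({"statute/12-101": "amended text"})
    # Following the bill's reference *as of r1* shows the statute as it then was:
    print(repo.read("statute/12-101", r1))  # -> "original text"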

Obviously, this is only practical for repositories with plausible ACID semantics. I.e. each modification is a transaction. It would be great if the universe was structured in a way that allowed transaction boundaries for the Web as a whole but of course, that is not the way the Universe is at all :-) - And I'm not for a minute suggesting we should even try!

Having said that, versioning repositories is a darned sight more useful than versioning documents in many problem domains: certainly law and thus eDemocracy, which is my primary interest, i.e. facilitating access, transparency, legislative intent, e-Discovery, forensic accounting and the like.

The fact that supporting these functions entails a fantastic historical record - a record that future historians will likely make great use of - is, um, a happy accident of the history we are currently writing.

Wednesday, September 01, 2010

what does law.gov mean to you?

Herein is my response to the question what does law.gov mean to you?

I am an IT architect and a builder of legislative systems more so than a direct legal publisher. Having said that, I have worked with most of the world's legal publishing entities at some time or other over the last twenty years. My current focus is creating legislative systems for legislatures - mostly in the U.S.A. - and the content our systems produce is then published by the legislatures themselves and also by third-party publishers.

I am a technologist first and foremost. I recently started blogging about the KLISS eDemocracy system here in Kansas in the hope that the technical details I am blogging will help other technologists to understand the legislative domain better and thus help create a more informed tech community around one of the most important aspects of any democracy.

I agree with pretty much everything Ed Walters said about the AOL Moment that is currently happening in the legal publishing industry. I also agree with pretty much everything Carl Malamud says about the desirability of free, unfettered access to authenticated, machine readable primary legal materials in the context of the law.gov initiative.

For me however, the most interesting vista that law.gov opens up is the potential for the most significant event in the evolution of democracy since the funeral oration of Pericles 2400 years ago. For the first time in human history, we now have all the technological pieces we need to bring participation in the democratic process to levels not seen since ancient Greece when everyone could literally congregate in the same place. To quote Don Heiman, CITO for the Kansas State Legislature:

    "Anything, including law making, you do in the presence of government you can do electronically without regards to wall or clocks provided it is easy to use and free to citizens."

There are no longer any technical reasons why we cannot publish the public activities of a legislature in real-time, or have statute databases codified on the fly, or provide direct visibility of what the impact of a proposed modification to the law would look like before it gets voted on. No technical reason why we cannot allow citizens to not only observe, but also participate in the making of law *as it is being made* - not just see the results ex post facto.

It is a lot of work for sure but it is only work at this point. No new technology breakthroughs are required. What needs to happen next (and there are signs it is happening) is for the world of law and the world of software development to both come to the realization that they are both in the same business from content management and publishing perspectives. I really believe that law is source code in the sense that the disciplines and techniques that have been perfected in the software development world have a tremendous amount to offer those who manage corpora of legal texts.

I look forward to the day when we speak of, for example, "release 7.8a (Rev 456422) of the consolidated statutes of Tumbolia (MD5 checksum: d03730288a7f0278e36afc82f220ddab)".

I look forward to the day when we can jump into a time machine and look at Rev 674245 of the 2011 Legislative Biennium Corpus for Tumbolia in order to better understand the legislative intent of an amendatory bill.

I look forward to the day when we can look at the laws of Tumbolia, as they were at noon Wed, 20 Jan 2010 in order to present attorneys and the courts with a complete view of what the law said at the time some contested action took place.

I look forward to the day when we can detail edit-by-edit how the consolidated statutes of Tumbolia came to be what they are by starting with the Constitution of Tumbolia from 1899 and rolling forward changes to its statute from its session laws, step-by-step with all the rigor of an accounting audit trail of transaction ledgers.

I hope that the law.gov initiative heads in that direction. The http://legislation.gov.uk website clearly points the way for what is possible. Speaking as a technologist, we techies stand ready, willing and able to make this happen. Is the political will there to make it happen? Is the disruption of the status quo too much too soon for such a staid and contemplative field as law and law-making? I can answer neither of these questions but I sincerely hope the answers are "yes" and "no" respectively.

The biggest threat to any democracy is a disinterested electorate. In years to come, I hope law.gov will be seen as the catalyst that re-invigorated an entire generation to engage with the democratic process. A process that too many currently feel is beyond their realm of influence. We can change that now. For our sakes and the sakes of future generations, I hope we do.

Monday, August 30, 2010

It's all about the back end

David Eaves : Creating effective open government portals. Amen to that.

Here is the thing...most http://data.[whatever] websites are only as good as their ability to serve up fresh content. That oftentimes means that re-thinking back-end processes is required. Otherwise a one-off data dump happens to get things rolling but then...

Nothing kills a web-o-data project so ruthlessly as information latency.

Machine readable content - even more so than human readable content - must be current.

Monday, August 23, 2010

Normal people, normal spreadsheets and RDF

In a post about Gridworks, Jeni says:

"Like a lot of spreadsheets created by normal people, who want to create something readable by human beings rather than computers, it has some extra lines at the top to explain what the spreadsheet contains..."

There is a terribly, terribly common pattern here and it has always surprised me that spreadsheet developers have never made row 1 and col 1 "special" for exactly this reason. I've lost count of the number of spreadsheets I've seen that have labels in row 1, labels in col 1 and data in the intersection cells.

Subject, predicate, object, anyone? :-) Where do all the triples go?
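For what it's worth, here is the pattern reduced to a sketch: row 1 supplies the predicates, column 1 supplies the subjects, and the intersection cells supply the objects (the example sheet in the comment is invented):

    import csv

    def spreadsheet_to_triples(path):
        """Row 1 holds predicates, column 1 holds subjects, cells hold objects."""
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        predicates = rows[0][1:]  # row 1 labels, skipping the corner cell
        triples = []
        for row in rows[1:]:
            subject, values = row[0], row[1:]
            for predicate, obj in zip(predicates, values):
                if obj:
                    triples.append((subject, predicate, obj))
        return triples

    # e.g. a sheet whose first row is ",population,area" and whose first column
    # lists regions yields triples like ("Kansas", "population", "2853118").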

Monday, August 16, 2010

More on the KLISS workflow model

Last time in this KLISS series I introduced the KLISS approach to workflow and (hopefully) explained why workflow in legislative environments can get very complex indeed. I mentioned that the complexity can be tamed by zooming in on the fundamental features that all legislative workflows share. This post will concentrate on fleshing that assertion out some more.

Somebody once said that a business document, such as a form, is a workflow snapshotted at a point in time. I really like that idea but I do not think that a document alone can serve as a snapshot of the workflow in all but the simplest of cases. To do that, in my opinion, you need an extra item: a set of pigeon holes.

The pigeon holes I am talking about are not just storage shelves with some sort of alphabetic or thematic sorting system. I am talking about the kind of pigeon holes that have labels on them that indicate what state the documents in each hole are in. Some classic states for documents to be in (in a legislative environment) include:

- Awaiting introduction in the Senate
- Pending engrossment into the Statute
- Bills currently being processed in the Agriculture committee
- etc.

The power of the incredibly simple, time-honored pigeon hole system is too often overlooked in our database-centric digital world. The electronic equivalent of these pigeon holes is, of course, nothing more complex than the concept of a file-system folder. In truth, the electronic pigeon hole is generally more powerful than its physical analog because in the electronic world, folders can trivially contain other folders to any required depth. Moreover, electronic folders can have any required capacity.

Sadly, I have rather a lot of personal experience of how this simple-yet-powerful concept of recursive, expandable folders can be "pooh-poohed" by folks who think that data cannot possibly be considered "managed" unless it is loaded into a database or otherwise constrained in terms of shape and volume. Oftentimes, said folks use the words "database" and "relational database" interchangeably. For such folks, the data model for a "record" is the center of the universe. Insofar as that record has workflow, the workflow is an attribute of the record – not a "place" where the record lives... This record-centric world view is oftentimes the beginning of a slippery slope in legislative informatics where designers find themselves tied up in knots trying to:

  • create enough state variables – fields – in the tables to capture all possible workflow states
  • capture all the business rules for workflow transitions in machine readable form
  • shred the legislative content into pieces (often-times with XML) to fit into the non-recursive, tabular slots provided by relational databases
  • re-assemble the shredded pieces to re-constitute working documents for publication


I do not subscribe to this record-centric model. It works incredibly well when record structures are simple, workflows are finite and record inter-dependencies are few. That is not the world of legislative informatics. Legislative content is messy, hierarchical, time-oriented and often densely interlinked. Relational databases are just not a good fit either for the raw data or for the workflows that work on that raw data. Having said all that, I hasten to point out that ye-olde recursive folder structure on its own is not a perfect fit either. There are two main missing pieces.

Firstly, as I've said before, legislative informatics is all about how content changes over time and the audit trail that allows the passage through time to be accessed on demand. Out-of-the-box recursive file systems do not provide this today. (Aside: those with long memories may remember Digital Equipment Corporation's VMS operating system. It was the last mainstream operating system to transparently version files at the operating system level.)

Secondly, legislative informatics is heavily event-oriented. i.e. when an event happens, entire sets of subsequent events are kicked off, each of which is likely to create more events which may in turn create more events... Out-of-the-box recursive file systems do not provide this easily today. i.e. a way of triggering processing based on file-system transaction events. (Yes, you can do it at a very low level with device driver shenanigans and signals but it's not for the faint of heart.)

To address these two short-comings of a classic folder structure for use as a workflow substrate, the KLISS model added two extra dimensions.

  • Imagine a system of recursive pigeon holes that starts empty and then remembers all Create/Read/Update/Delete/Lock operations of pigeon holes and of the documents that flow through them
  • Imagine a system of recursive pigeon holes in which each hole carries a complete history of everything that has ever passed through it (including other pigeon holes)
  • Imagine a system of recursive pigeon holes in which each hole can trigger any required data processing at the point where new content arrives into it.


The first two items above are provided by the time machine that I have previously talked about. The last one is what we call the Active Folder Framework in KLISS. The best way to explain it is perhaps by analogy with a workflow system realized with a good old-fashioned set of physical pigeon holes. Consider this example:
    A new bill is introduced in the House. The requested bill draft is acquired from the sponsor (or perhaps legislative council) and placed in the "introduced" pigeon-hole. This event kicks off the creation of an agenda item where the initial fate of the introduced bill will be discussed. That agenda item is lodged in the "pending agenda items" pigeon hole. Later, when the order of business gets to it, items from the "introduced" pigeon hole are taken out and considered. They may go back into that pigeon hole or be moved to pigeon holes specific to particular committees.

KLISS - and more generally the Legislative Enterprise Architecture that underlies it - operates like that. Workflow items - documents - are moved around named folders. Every move is audit-trailed in the time machine. Every time something is changed, events are fired so that down-stream processes can update their internal views of what the pigeon-holes represent. In KLISS all the workflow folders are "active" in the sense that they are not just passive place-holders for work artifacts. Putting something into a folder triggers an event. Taking something out triggers an event, etc. Moreover, the event processors have access to the pigeon-hole structure so that they can create new work artifacts and move them around...thus triggering more events. The event processors can even trigger the creation of new folders and new event processors!
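Stripped of the time machine persistence, the identity management and everything else that makes it real, the active folder idea can be caricatured in a few lines (the folder and document names are invented):

    from collections import deque

    audit_trail = []       # the "time machine" would persist this
    event_queue = deque()  # consumed asynchronously by folder processors

    def move(doc, folder):
        """Move a work item into a named pigeon hole, record it, fire an event."""
        audit_trail.append(("moved", doc, folder))
        event_queue.append((doc, folder))

    # An "active" folder: arrival of a bill in 'introduced' creates an agenda item.
    def introduced_processor(doc, folder):
        if folder == "introduced":
            move("agenda-item-for-" + doc, "pending-agenda-items")

    def pump():
        """Drain the queue; processors may themselves enqueue further events."""
        while event_queue:
            doc, folder = event_queue.popleft()
            introduced_processor(doc, folder)

    move("HB2001", "introduced")
    pump()
    print(audit_trail)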

The combination of (a) recursive named folders, (b) time machine audit trail and (c) event propagation covers a tremendous amount of ground. These are the three "pillars" on top of which most of KLISS is built. Internally in Propylon we call them the Three Pillars of Zen, or TPOZ for short.

At a business level, there are some very attractive upshots to this model.

  • The abstraction that the end-users interact with is a very familiar one. Files in folders...All the time machine and event propagation machinery is transparent to end-users.
  • Ad-hoc workflows can be very easily accommodated without custom programming. Just create some folders and shunt work through them. The audit-trail will continue to be rigorous and the event-propagation will continue to function even for workflows created on the fly by staff operating under pressure (i.e. the House has just suspended the rules and is now about to do X...)
  • Automation can be added incrementally. i.e. if workflow step X is currently manual, the entire workflow can be put in place now and manual steps can be automated over time. The system as a whole operates on the basis that all active folder processing is asynchronous in nature. i.e. we assume that there is a non-deterministic delay for each workflow action. The net result of automating any given folder in KLISS is simply that its associated workflow steps get faster over time. Nothing else in the system changes.
  • Workflows have autonomic characteristics. For example, an interface to a voting board may malfunction because of a network error. The result would be that an active folder (an automated workflow step) ceases to be active. No problem, simply revert to the manual processing of the electronic voting documents i.e. fill in the vote forms to create new vote items. Remember : the complete audit trail and event machinery is still working away under the hood. Everything else in the system will continue to function unaffected by the point-failure of one component.

Perhaps the most subtle aspect of the workflow model to grasp is the asynchronous nature of it all. I wrote earlier about naming things with rigid designators in KLISS and that is critical to workflow processing as is the consistency model. Each active folder processor works to its own concept of time, always referring to content in the system via point-in-time URLs that lock down – snapshot – the entire repository as it was at that moment in time. Events that happen in the repository are queued up for consumption by active folder processors. If a processor is slow or goes offline for an upgrade, no problem, the event messages are queued up to be processed whenever the active folder comes back on line.

In summary, KLISS models workflows by extending the familiar pigeon hole abstraction with temporal and event-oriented dimensions. In terms of formalisms in systems theory, it is perhaps closest to Petri nets in which the "tokens" moving between states are information-carrying objects such as digital bill jackets or votes or explanatory memoranda.

So far, pretty much everything I have discussed in this KLISS series has been server-side focused. The next few posts will be client side focused. Next up: author/edit sub-systems in legislative environments.

New office in Lawrence, Kansas

Well, today I did the paperwork for our new office in Lawrence, Kansas. We move in at the start of September. Looking forward to further establishing relationships with various KU schools: Engineering, Law etc.

It's all about the smudges

There is a profound issue underlying this article on Documentation capturing from a legal perspective.

Unless we find ways of preserving work-in-progress in our digital world, we will be the first major civilization to leave behind no traces of the great intellectual works it produced. No pentimenti for the visual arts. No Scribbledehobbles for the literary arts.

This is not just a tweedy humanities issue. Fastidious recording of how written works come to say what they say needs to be a central concern of democracy. Without it, there is no transparency. Democracy and the rule of law cannot work without transparency. A corpus of law is a bit like a humongous novel but unlike literary novels, it never gets finished. It is always a work in progress.

Friday, August 13, 2010

Lefty Day

Today is lefty day. Thanks to James Tauber for reminding me.

Personally, I don't mind the right-oriented college desks or the right-oriented scissors or the right-oriented tin opener. What really irks me is not being able to walk into a music shop and pick up the guitars or the banjos or the mandolins...

Thursday, August 12, 2010

Strong math needed always? I don't think so

For some reason I watched this video today entitled "A Day in the Life - Computer Software Engineer". Towards the end it says that it is important to have "a strong grasp of mathematics".

I remember hearing that back in 1982 and it very nearly scared me away from getting involved in computing. It's not that I'm particularly bad at math but I certainly would not consider myself "strong" in it.

Sometimes I wonder how many young people who would be very competent developers get scared off by this sort of talk? I have been lucky enough to work with some very, very good developers over the years and "strong math" has not been a common thread amongst them.

U.S. Senate Rules/Floor Procedures

http://www.senatefloor.us/ is a nice high level view of the U.S. Senate Rules/Floor Procedures.

Monday, August 09, 2010

A good example of a legislative workflow constraint

This is a good example of a legislative workflow constraint.

Many legislative systems split bills into two buckets: metadata and data. Metadata fields for things like long title and bill number are commonplace. So too is the concept of the data itself being an opaque "payload" as far as metadata-driven workflow checks are concerned.

The difference between the two is oftentimes a side-effect of the data/document duality. In order to leverage scalar types for indexing/sorting, duplicates of data in the text of the bill itself are created.

As soon as data is duplicated like this, consistency becomes an issue. In an effort to deal with this, some try to fully leverage data normalization by shredding bills into chunks in an RDB. That approach fixes one problem - consistency - but introduces another: you now have to worry about re-assembling a bill from the chunks, often preserving line/page number fidelity. Not easy!

The answer, in my opinion, is to preserve the sanctity of the document and make sure that any metadata extraction from the document is purely an optimization for workflow engine purposes and is never treated as "normative".
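A sketch of what I mean (the extraction rules here are invented and far cruder than anything a real system would use): the document remains the single source of truth, and the extracted metadata is a disposable index that can be rebuilt from the documents at any time:

    import re

    def extract_metadata(bill_text):
        """Derive index fields from the bill text itself; the text stays normative."""
        number = re.search(r"\b(HB|SB)\s*\d+\b", bill_text)
        first_line = bill_text.splitlines()[0].strip() if bill_text else ""
        return {
            "bill_number": number.group(0) if number else None,
            "long_title": first_line,
        }

    sample = "AN ACT concerning taxation; amending K.S.A. 12-101.\nHB 2001\n..."
    # The index built from such extractions can be thrown away and rebuilt from
    # the documents at any time; workflow checks may use it for speed, but it is
    # never treated as the source of truth.
    print(extract_metadata(sample))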

Friday, August 06, 2010

Gov 2.0 Summit

I will be attending the Gov 2.0 Summit in Washington D.C. in September. If you are going, or are in that neck of the woods and want to meet up, let me know.

Show your support for law.gov

This morning, having read this, I felt it was time to create some spot on the Web where citizens and businesses alike can show their support for the law.gov initiative.

I'm not exactly a dab hand at this sort of thing...but please sign here if you are in favor of the law.gov principles espoused here.

Thursday, August 05, 2010

Typography matters

Via Slaw comes this gem:
"...neither the Canadian tax system nor, indeed, the Canadian economy, ought to be held hostage to a type-setter’s selection, at any given time, of what is considered a pleasing and useful type-face for a dollar sign."

Monday, August 02, 2010

The KLISS workflow model

My last post in this KLISS series marked a half way point in this whirlwind tour of the KLISS architecture. Before proceeding, I'd like to summarize the main discussion points so far and link back to earlier posts for ease of navigation.
This agnosticism about data organization is a critical component of the KLISS workflow model which is the primary topic of this post. If you have been reading along in the series, it will come as no surprise that legislatures/parliaments pose many challenges when it comes to applying "off the shelf" standard models for databases or content management systems or video publishing systems. Central to the complexity, in my opinion, is that legislatures/parliaments resist analysis and decomposition. By that I mean that the standard methods of systems analysis and design pre-suppose that the analysis phase can result in the crisp expression of the "business rules" or the "workflows" that govern any particular domain. In the standard model, one does not begin implementing business rules/workflows until one understands what they are. This sounds abundantly sensible doesn't it?

Not so in legislatures/parliaments. I'm sure most readers will find that statement surprising. How hard can the rules be? In Civics class they talk about bills being introduced and debated and modified and voted on and passed into law. Seems like a pretty straight-forward workflow to me? Why not just walk around, ask everyone what they do in support of that workflow and then write it all down. Voila! One set of "as-is" business rules!

I would like to split the reasons why workflow is not that simple in legislatures/parliaments into the following seven areas (in no particular order), each of which is discussed below:

  • 1. The workflow rules are cherished
  • 2. The workflow rules can have non-rule lineage
  • 3. The workflow rules are malleable
  • 4. Every workflow rule has an exception which is itself included in the rules
  • 5. The workflow rules are instruments of differentiation
  • 6. The workflow rules include the ability to suspend all rules
  • 7. The workflow rules are an instrument in the art of politics

1. The workflow rules are cherished


Legislatures/Parliaments are generally grand old institutions. It goes with the territory that there is a lot of tradition and a lot of history behind how they operate. Unfortunately, the distinction between operating "rules" and operating "traditions" is easily lost in the mists of time.

Here is a pot-pourri of rules from my own experiences to give you a flavor of what I am talking about:
  • The Finance Bill is always Bill number 7.
  • Chairmanship of committee X lies with the Senate in odd-numbered years.
  • Amendments are worded differently if the Bill is at report stage.
  • Votes on finance bills can only be considered after the 90th session day.
  • Only one bill may be named in a motion to introduce.
  • Bills cannot be directly referred to sub-committees.
  • A sub-committee can continue to exist after its parent committee has been dissolved.


During analysis phases I have come across many rules such as these. It has been my experience that rules in legislatures/parliaments have only one thing truly in common. Namely, that they are cherished equally by staff and Members alike. Ask the dreaded question "why?" of any of the rules above and you are likely to get a reaction of the "because!" variety. Digging deeper, a variety of outcomes are possible:

  • It may be that nobody can remember why the rule is the way it is but all agree that it is the way it is and probably cannot be changed just to help an IT project.
  • It may be that there is a statutory reason (but the statute can change at any time).
  • It may be that there is an explicit chamber rule (but the rules can change at any time).
  • It may be that there is a precedent that explains the rule.

Practical upshot: Digging deeper into any of these rationales may result in "bottoming out" the rule, but it may also result in another layer of detail that itself needs further bottoming out. That bottoming-out process may or may not end.

2. The workflow rules can have non-rule lineage


In some cases, it is possible to bottom out the "why" of a rule only to find that it is something of an accident of history. A common example is rules that exist in order to work around problems in document processing over the years. Over time, the rationale for the rule gets lost and the rule ends up having the same status as other rules in the minds of the actors concerned, i.e. the rule becomes cherished. Some examples from my own experience:

  • Rule: The markup code "@XYZ" is used to trigger double-spaced printing of bill drafts. (This, upon investigation, turned out to be a bug in a computer system from the Seventies. The system had no "@XYZ" code, but somebody discovered that using it had the useful side-effect of double-spacing the bill drafts. The rule was subsequently added into the bill drafting guide, right alongside "real" rules for statute citation and quorum rules for conference committees.)
  • Rule: All internal cross-references must take the form ABC. (This, upon investigation, turned out to be a rule created to work around a WordPerfect macro bug which would miss certain cross-references in the reports it was generating unless they were constructed just so.)
  • Rule: The House is limited to 20 committees. (This turned out to be because of a fixed-size lookup table for committee names in a mainframe application, which set a maximum committee count of 20 on the House side and 20 on the Senate side.)

Practical upshot: Accidental workflow rules are still rules.

3. The workflow rules are malleable


The rule-sets of legislatures/parliaments invariably include the rule(s) that govern changing of the rules themselves. From an IT perspective, the presence of a rule that allows the rules to be changed has profound implications. It means – in one fell swoop – that any attempt at codifying the rules directly into a programming language is doomed to fail. Anybody who approaches a legislature/parliament attempting to map business rules to business logic to computer code, in the classic model, is going to get into trouble.

Practical upshot: the rules are not written in stone so they cannot be coded in stone either. In the IT vernacular, the "rules engine" for expressing and executing the rules of a legislature/parliament needs to be Turing complete. No approach based on, say, mapping finite state machine state transitions to middle-tier function calls is going to cut the mustard.

It could be argued that the rules might not change very often. One might point, for example, to the 43 standing rules of the U.S. Senate. Whilst it is true that the rules of the U.S. Senate do not change very often, there is a vast (and I do mean vast) set of precedents which are, in effect, rules. Precedents result whenever something happens on the floor that requires adjudication by the Parliamentarian. The amount of precedent grows over time. Every sitting day is an opportunity for new precedent (new rules) to be formed.
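To make the practical upshot concrete, here is a minimal sketch (all names are hypothetical and this is not KLISS code) of what a Turing complete "rules engine" looks like in practice: each rule is ordinary code evaluated against the chamber state, so a new rule, or a new precedent, is just another callable rather than a new entry in some fixed table of permitted state transitions.

    # Sketch only: every rule is a callable over (state, action).
    def finance_votes_after_day_90(state, action):
        if action["kind"] == "vote" and action.get("is_finance_bill"):
            return state["session_day"] > 90
        return True

    def one_bill_per_introduction_motion(state, action):
        if action["kind"] == "motion_to_introduce":
            return len(action["bills"]) == 1
        return True

    ACTIVE_RULES = [finance_votes_after_day_90,
                    one_bill_per_introduction_motion]

    def action_permitted(state, action):
        # See point 6 below: the rules include the ability to suspend all rules.
        if state.get("rules_suspended"):
            return True
        return all(rule(state, action) for rule in ACTIVE_RULES)

A new precedent handed down from the floor simply becomes another entry in ACTIVE_RULES; nothing else needs to be re-enumerated.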

4. Every workflow rule has an exception


Remember how, in math class, your teacher exploded your brain by telling you that between every two numbers on the number line lies another number? Business rules in legislatures/parliaments are very similar in my opinion. There may be a rule for what happens in situation X and a rule for what happens in situation Y but when Z occurs (a combination of situations X and Y) a new rule is born. This process can be repeated ad infinitum.

Practical upshot: The rules of a legislature/parliament are like a fractal. Between each pair of rules lies the potential for a literally infinite family of sub-rules. In this sense, the rules are somewhat like the corpus of law itself. Primary law becomes denser and denser (more fractal-like) as more and more refinement is added to the rules. Similarly, caselaw interprets statute and, once created, the caselaw itself becomes law (in the common law tradition at any rate). Then new caselaw can come along clarifying interpretation for cases that fall between existing caselaw and existing statute. This process can be repeated ad infinitum.

5. The workflow rules are instruments of differentiation


Speaking of fractals...organizational boundaries are fractal-like too. Consider a new legislature/parliament formed out of an existing one. Say, the New England States or the African countries that emerged from French/English/Dutch colonies... They may start as replicas of their parent institution but soon the differences start to appear. The new institutions differentiate themselves from their parents by injecting differences in how they operate. Over time, the injection of difference percolates through the new institution. The chambers inject differences so that House procedure differs from Senate procedure. Appropriations committees do things differently than redistricting committees. The Senate creates a new form of Resolution, not used in the House. The House changes the way it numbers its rules so that they are visibly different from the Senate rules...etc. etc.

Practical upshot: Rules serve to differentiate as well as control the behavior of institutions. The human organizational need for differentiation essentially guarantees that rules will change even if there is no specific technical reason for the change.

6. The workflow rules include the ability to suspend all rules


This one needs little explanation. In an earlier post I mentioned Nomic. Suffice it to say that building computer systems that are capable of interpreting and checking and executing a set of rules is made significantly more complex if the rules can be disabled at any point.

7. The workflow rules are an instrument in the art of politics


This one also needs little explanation. Politics is, in my opinion, best understood in terms of game theory. Equilibria become more complex when the rules can be modified in a way that impacts the outcome matrix.

Conclusion


I hope the above has convinced you that getting to the bottom of the business rules of a legislature/parliament is no easy matter. It is not a question of effort. Doubling the number of resources trying to bottom out the rules will not help you. Doubling the time available to do it will not help you.

The workflows of legislatures/parliaments are, in my opinion, complex in the formal, mathematical sense of complex. That is to say, the feedback loop created by the self-referential nature of the rules combines with the other factors listed above to create a fascinating phenomenon in which homeostasis (i.e. stable parliamentary process and procedure) arises as an emergent property of the system as a whole.

The key, in my opinion, is to not let the surface complexity overwhelm you. After all, an emergent property is, by definition, one that arises out of a multiplicity of relatively simple operations that include a feedback loop. The complexity can be tamed! However, taming it requires looking at the right level of abstraction. A full-frontal assault based on decomposing the rules is about as likely to yield actionable understanding as decomposing the ants in an ant hill.

Over the years I have seen many fall into the trap of attempting to bottom out all the rules in a legislature/parliament. I tried it myself on more than one occasion. It is seductive. It would be great if it were possible, but it is not possible in my opinion.

What to do? The critical thing is to change the focus of the hunt. The hunt is not to find individual rules. The hunt is to find the relatively simple operations from which all rules are made.

That is what KLISS is based on. The application substrate it sits on implements a workflow model built around this emergent view of workflows.

That is where we will go next.

Friday, July 30, 2010

Law and eDemocracy : watch that space...

The bad old days, when simply publishing PDF copies of the fundamental texts that underlie democracy could be called "eDemocracy" or "eLegislation" or "transparency" or "participation", are thankfully numbered.

For a great pointer in the direction of what the future can hold when a democracy steps boldly into the electronic world, see http://www.legislation.gov.uk/.

To paraphrase Churchill, we are not at the end of the bad old days. Not even the beginning of the end. But we have definitely seen the end of the beginning :-)

This is going to be a fascinating few years for anyone who lives in a democracy - not just those involved in IT. Churchill also said that Democracy was the worst form of Government ever invented - apart from all the others.

I would love to be able to invite him to come back to Earth in a few years' time and give us his opinion because I believe it's about to get better again.

In my opinion, the Internet is about to have a more profound impact on the practice of democracy than television, the telephone and the printing press combined.

Friday, July 09, 2010

On vacation...

I'm taking a 2 week, "e-cold turkey" break. More posts on KLISS when I return...

Tuesday, July 06, 2010

The end of print for law?

Bob Berring muses on the future of print for law and references the Book of Kells and Newgrange...

In the magnificent Long Room in my Alma Mater, Trinity College Dublin, the Book of Kells is on display and shockingly legible. By that I mean that it is a lot more legible than the text in the WordStar files on the CP/M-based 8 inch floppies in my basement. Even if I could read them (which I can't) they wouldn't be "real" in the sense that the real files were on other floppies that were used to create replicas. In the digital world, no document is ever "real" in the way that the Book of Kells is real. Everything is a best-efforts replica of something which is itself a replica...all the way down to what you saw on the screen at the moment of content creation, intermediated by an operating system, then a software application, then a display device driver...This is deeply worrying stuff if you are trying to write down content for the ages: be it sacred texts or legal texts. I spend a goodly amount of my time these days thinking about this in the context of law, law.gov, data.gov and of course, the KLISS project.

It is fitting, I think, to ponder this stuff and how it relates to law in the Irish countryside, because the Irish played an instrumental role in the creation of copyright law many, many moons ago. Cooldrumman, the location of the battle that followed that early copyright dispute, is close to my house in Sligo, Ireland.

Saturday, July 03, 2010

KLISS, law and eDemocracy

I am roughly halfway through my high-level description of KLISS and the Legislative Enterprise Architecture that underpins it. It is the eve of the 4th of July Independence Day celebrations as I write this. It seems like an appropriate moment to step back from the detail a little and look at the bigger picture.

As a specialist in legal informatics, I cannot help but think of this historic time in terms of America's foundational documents, without which the great enterprise known as "democracy and the rule of law" would simply not be possible. Missing my homeland of Ireland as I do from time to time, sitting in my home in Lawrence, Kansas, I cannot help but be drawn to the involvement of some generally forgotten Irish people in the events of 1776.

The Dunlap broadsides, the first printed copies of the Declaration of Independence, were produced by an Irishman, John Dunlap, in 1776. Of the eight foreign-born signatories of the Declaration, three were Irish: James Smith, George Taylor and Matthew Thornton.

I cannot help but marvel at the fact that 27 of the original 200 or so copies still exist. So too, of course, does the *real* declaration in the form of the engrossed parchment prepared by Timothy Matlack. It was itself copied from the drafts produced by the founding fathers on (probably) hemp paper of some description.

If you have been following along in this KLISS series you will probably be sensing where I am going with this. The drafts, the engrossed version, the promulgated copies...establishing the relationships between these artifacts is critical to establishing the laws/regulations of the land. It is critical because there can be - and there often is - ambiguity and room for disagreement as to what the law actually means. Law is a very complicated business after all. As a society, we can find ways to deal with that complexity as long as there is no ambiguity as to what the law actually says in terms of the text of the language itself. Once we have that, at least we are all arguing (or zealously advocating) different takes on the same thing. If we start arguing for different takes on different things, chaos reigns.

In the case of the declaration, thankfully, we are in good shape. The Dunlap broadsides are unambiguously copies, not the original. The hemp drafts of Thomas Jefferson are "just" drafts (fantastically important for historical research but not the real thing from a legal perspective). The real thing is the engrossed parchment prepared by Timothy Matlack, and signed by each of the founding fathers. That is why, for example, debates about the accuracy of the Jefferson Memorial can be resolved. The placement of commas can be compared with the for-reference original: the parchment. As for whether or not Jefferson intended "inalienable" rather than "unalienable", the intent is something we can and should be able to argue over in a civil society as long as we can look at the engrossed version and see one or the other unambiguously present.

The ancient Romans seemed to understand the importance of non-ambiguity of legal text well. Although they had early forms of paper, knew how to write on animal skin and knew how to make clay tablets, they chose to "engross" their foundational legal text, the Twelve Tables, by engraving them on ivory: something that would withstand fire better than paper, is harder to tamper with than a clay tablet, and is smudge-resistant...

Removing ambiguities as to the for-reference original text of law is vital for another reason. Law, although it is not expressed mathematically or interpreted via formal logic, is very much based on mathematical concepts: induction, deduction, the law of the excluded middle, contravalence etc. In particular, it shares with mathematics the concept of axioms: foundational, self-evident truths from which further truths can be derived and against which assertions of truth can be tested.

Historical documents show that both Jefferson and Adams were familiar with Euclid's axioms, as was Abraham Lincoln. The Euclidean overtones in phrases like "We hold these truths to be self-evident" (Declaration of Independence) and "...dedicated to the proposition that all men are created equal." (Gettysburg Address) are striking indeed.

It is very easy to arrive at bad results in mathematics if your starting assumptions (the axioms) are wrong. So too in Law. Law builds on itself just as mathematics builds on itself. It is accretive. Thanks to legal principles like stare decisis, interpretation of the law is itself accretive because caselaw builds on caselaw... Any ambiguity that creeps into the vast self-supporting edifice of law is bad for the rule of law. (I hold that to be self-evident :-)

Looking back at the history of law and the history of democracy, I think we have reached an inflection point. Something *big* is about to happen I suspect. I am not sure what shape it will take but here are the drivers as I see them:
  • The volume of law, including all the material used in adjudicating on and practicing law, is growing exponentially.
  • In practice, because of the sheer volume (and some other reasons) the copies of law used in the practice of law and cited in court are often "owned" by commercial third parties who amass all the material into private repositories.
  • Even if the text of legal materials is not owned/claimed by a commercial entity, the citation mechanisms can be. E.g. page numbers of case law publications or consolidations/re-statements of specific areas of law.

Now into this world, over the last two decades or so, comes the Internet and the Web in particular. It has so much to offer the world of law (and the world of democracy) that tensions between the "old world" and the new are mounting fast.

A quiet revolution is taking shape. Citizens are now armed with their knowledge of instantaneous publishing via blogging or Google Docs or Facebook. They are armed with knowledge of instantaneous search via Google or Bing. They are armed with knowledge of instantaneous revision with revision history via Wikipedia. They are armed with knowledge of hyperlinks for instantaneous follow-up of citations. They expect video to be instantly available on YouTube or blip.tv...When these citizens look at how laws/regulations are made today and how formal meetings are conducted today and how content that should be free (i.e. the laws/regulations of the land) is either hidden behind paywalls or only available in hard copy or buried deep inside large PDFs or 2 weeks out of date...

Something has got to give. Especially if you tell these citizens that they must abide by all these laws/regulations. Also, because they live in a participative democracy, they can get involved in shaping those laws and are entitled to free and unfettered access to the process of making law... The gulf between the feature-set of the Web world for this sort of activity (i.e. participation and publishing) and the existing "feature-set" of the status quo for law/regulation-making is striking.

It seems to me that the world of law is somewhat like the worlds of news or music or of TV. For many years they fought against the Internet but have now finally started to embrace it. The Internet is an amazing force. So far the number of areas of human endeavor that have resisted its advances successfully stands at 0 and counting. I believe that the world of law/regulation-making is next up for a significant, world changing transition to the Web. It certainly is not as sexy as the world of music or sports news or TV shows but in a democracy, I cannot think of any one thing that is more important. I cannot think of anything that should be more free than the law and the ability to participate in its creation.

Although I am overwhelmingly positive in my outlook on what the Web will do for law and for democracy, there are some negatives. My primary concern is in the area of reference copies of the law. That concern, I hope, is evident from my opening remarks in this post. The reference copy of law is no longer etched onto ivory or engrossed onto animal skin. The sheer volume of law makes that impossible anyway. In recent decades, acid-free paper, non-fugitive inks and master copies kept in safes in the offices of Secretaries of State have substituted for them.

Nowadays, many law-producing entities such as legislatures/parliaments, agencies and courts are moving away from having heads of state sign or initial vellum sheets towards treating electronic legal artifacts as authentic. This, quite frankly, scares me, as I believe I know enough about technology to know all the possible ways in which digital data can be compromised between producer and consumer and can degrade over time. (I talked about some of them earlier in this KLISS series.)

The folks who are making this transition to digital are well-intentioned and are seeking to take advantage of the Web to better serve their citizens. I'm all for that obviously. However, I do worry that the language of information technology creates incorrect assumptions in the minds of those not versed in the details of how digital machines actually work. A digital signature is really nothing like a real signature. An e-mail really is not like snail-mail at all because nothing ever gets sent. Everything is a copy, with all the issues that copies bring...The word "authentic" is so much more slippery in a digital world.

Having sounded that note of caution, let me end by saying I truly believe we live in profound times from the perspective of democracy. The Web can - and will - fundamentally change how we think about participative democracy and the process of making laws and regulations. We now have all the individual pieces of technology (I have mentioned most of them already in this KLISS series) we need. No new breakthrough algorithms or devices are required. We just need to assemble everything coherently. It is now a matter of design - not a matter of research.

We are on a fascinating road to a different world, and we will get there via some disruptive technologies and disruptive memes. Not everyone will be best pleased but if the history of the internet tells us anything it is that resistance, once all the stars are aligned, is futile. Better to be part of it rather than fight against it. Better to help shape it and drive it forward than simply react to it.

In KLISS, I have been lucky enough to contribute to an initiative that strives to fully embrace technology for the betterment of democracy and the transparent making of law that it depends on.

I look forward to doing my bit going forward to ensure that the compelling vision of KLISS is realized and sharing the design and our experiences with anybody who is interested in it.

Next up: The KLISS workflow model

Thursday, July 01, 2010

The Point-in-time issue. A stock exchange example

In a recent post, I talked about the importance of temporal decoupling and point-in-time stamping of data in our increasingly lightning-fast-yet-fundamentally-asynchronous world...

In that context, this post about the recent stock market flash crash is interesting.

Tuesday, June 29, 2010

Data models, data organization and why the search for the "correct" model is doomed

I have received some e-mails about my assertion that there is no such thing as the "correct" way to model anything in a computer system. I.e. there is no "pure" model whose correctness status derives from anything other than mere engineering concerns such as fitness-for-purpose.

My argument boils down to this:

- to model anything in software you need a human

- that human needs to carve up reality in some way in order to create a model. I.e. name things, classify things, link things to other things, distinguish causes and effects, distinguish entities from actions, declare some aspects of reality "unimportant", create a model boundary etc.

- no two humans carve up reality in exactly the same way as we are all unique creatures whose view of the world is influenced by our language, culture, experiences etc.

- therefore, no two models are likely to be exactly the same

- even if they appeared to be the same, there is no way to be sure because human language is lossy. I.e. there is no way to be sure that the model I have in my head is what I have communicated through language. As Wittgenstein said, some things cannot be said - they can only be shown. In Zen terms, our words are just fingers pointing at the moon.

The best book I have read on this subject - highly recommended - is Bill Kent's Data and Reality.

Kent looks at the world from a relational database perspective. A couple of articles from my scribenatorial past might be of interest... They look at the world from a (surprise!) XML perspective.

Next up: KLISS, Law and eDemocracy.

Saturday, June 26, 2010

KLISS: Organizing legislative material in legislatures/parliaments

Last time in this KLISS series, I talked about the event model in KLISS. I also talked about how it works in concert with the "time machine" model to achieve information consistency in all the "views" of legislative information required for a functioning legislature/parliament. For example, a bill statute view, a journal view, a calendar view, an amendment list view, a committee view etc...

I am using the word "view" here in a somewhat unusual way so I would like today to explain what I mean by it. Doing that will help set the scene for an explanation of how legislative/parliamentary assets are organized in the KLISS repository and how metadata-based search/retrieval over the repository works.

It goes without saying (but I need to say it in order to communicate that it need not be said (ain't language wonderful?)), that legislatures/parliaments produce and consume vast amounts of information, mostly in document form. What is the purpose of the documents? What are they for really? In my view, they serve as snapshot containers for the fundamental business process of legislatures/parliaments, which is the making of law. In other words, a document in a legislature is a business process, snapshotted, frozen at a point in time.

By now, if you have been reading along in this KLISS series, you will know that it is very much a document-centric architecture. The documents themselves, in all their presentation-entangled, semi-structured glory, are treated as the primary content. We create folders, and folders inside folders. We create documents with headings and headings inside headings and we put these into folders. We then blur the distinction between folder navigation (inter-document) and heading "outline" navigation (intra-document) so that the whole corpus can be conceptualized as a single hierarchical information store. The entire state of a legislature/parliament is, in KLISS, *itself* a document (albeit a very large one!). Simply put, KLISS does not care about the distinction between a folder and a heading. They are both simply hierarchical container constructs.

In KLISS a "view" is simply a time-based snapshot generated from the enormous document that is the repository, seen at a point in time, in some required format. So, a PDF of a bill is such a snapshot view. So too is the HTML page of a committee report, a journal, a corpus of promulgated law etc. HTML, PDF, CSV: they are all the same in the KLISS information model. They are just views, taken at a point in time, out of the corpus as a whole.
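Here is a rough sketch of those two ideas together: folders and headings as one container construct, and a view as a point-in-time rendering of some subtree of the one big document. The names, path syntax and rendering stub are invented for illustration; this is not the KLISS API.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Container:
        # A folder OR a heading: KLISS does not care which.
        name: str
        children: list = field(default_factory=list)
        text: str = ""

    @dataclass
    class Repository:
        # (timestamp, root container) pairs recorded by the "time machine".
        snapshots: list = field(default_factory=list)

        def as_of(self, when: datetime) -> Container:
            # The most recent root recorded at or before `when`.
            return [root for (t, root) in self.snapshots if t <= when][-1]

    def view(repo: Repository, path: str, as_of: datetime, fmt: str = "html") -> str:
        """A 'view': the subtree at `path`, as it stood at `as_of`, in format `fmt`."""
        node = repo.as_of(as_of)
        for name in path.strip("/").split("/"):
            node = next(c for c in node.children if c.name == name)
        return f"[{fmt} rendering of {node.name}]"   # stand-in for real PDF/HTML/CSV generation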

Earlier in this series I talked about how the web blurs the distinction between naming something to pick it out and performing a query to pick it out. KLISS takes advantage of that blurring in the creation of views. So much so that a consumer of a KLISS URI cannot tell if the resource being picked out is "really there" or the result of running a query against the repository.
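A toy illustration of that blurring (the URIs are invented for the example): the consumer calls the same resolver either way and cannot tell whether stored bytes were read or a query was run at request time.

    # Some URIs name stored documents, others name stored queries.
    STORED_DOCUMENTS = {
        "/house/bills/hb2001": "<bill text...>",
    }
    STORED_QUERIES = {
        "/house/bills/all": lambda: sorted(STORED_DOCUMENTS),
    }

    def resolve(uri):
        if uri in STORED_DOCUMENTS:
            return STORED_DOCUMENTS[uri]      # "really there"
        return STORED_QUERIES[uri]()          # computed on demand; the caller cannot tell the difference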

The hierarchical information model in KLISS has been strongly influenced by Herbert Simon and his essay The Architecture of Complexity. The view/query model is a sort of mashup of ideas from Bertrand Russell (proper nouns as query expressions) and Saul Kripke (rigid designators) combined with the Web Architecture of Sir Tim Berners-Lee.

The most trivial views over the KLISS repository are those that correspond to real bytes-on-the-disk documents. Bills are generally like that. So too are votes. So too are sections of statute. Another level of views are those generated dynamically by assembling documents into larger documents. Volumes of statute are like that. Journals are like that. Once assembled, these documents often go back into the repository as real bytes-on-the-disk documents. This creates a permanent record of the result of the assembly process but it also allows the assemblies to be, themselves, part of further assemblies. Permanent journals are like that. Final calendars are like that. Chronologies of statutes are like that.

Yet another level of views are those generated from the KLISS meta-data model...In KLISS, any document in the system can have any number of property/value pairs associated with it. When transactions are stored in the repository, these property/value pairs are loaded into a relational database behind the scenes. This relational database is used by the query subsystem to provide fast, ordered views over the repository. The sort of queries enabled are things like:

  • Give me all the bill amendments tabled between dates X and Y
  • Give me all the sponsors for all bills referred to the Agriculture committee last session
  • Give me all bills with the word "consolidation" in their long titles
  • How many enrolled bills have we so far this session?
  • etc.

At this point I need to point out that although we use a relational database as the meta-data indexer/query engine in KLISS, we do not use it relationally. This is by design. At this core level of the persistence model, we are not modeling relationships *between* documents. Other levels provide that function (we will get to them later on). Effectively what we do is utilize a star schema in which (URI + revision number) is the key used to join together all the metadata key/value pairs. The tabular structure of the meta-data fields is achieved via a meta-modeling trick in which the syntax of the field name indicates what table, what field and what field type should be used for the associated value. In the future, we expect that we will gravitate away from relational back-ends into the non-relational stores that are thankfully, finally, beginning to become commonplace.
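The following sketch shows the general shape of the idea rather than the actual KLISS schema: (URI, revision) is the hub that every property/value row hangs off, and a naming convention on the field (here an invented "date_" prefix) decides which typed side table the value lands in.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE doc       (uri TEXT, rev INTEGER, PRIMARY KEY (uri, rev));
    CREATE TABLE meta_str  (uri TEXT, rev INTEGER, field TEXT, value TEXT);
    CREATE TABLE meta_date (uri TEXT, rev INTEGER, field TEXT, value TEXT);  -- ISO 8601 dates
    """)

    def index_document(uri, rev, metadata):
        db.execute("INSERT INTO doc VALUES (?, ?)", (uri, rev))
        for name, value in metadata.items():
            # The field-name convention picks the typed table to use.
            table = "meta_date" if name.startswith("date_") else "meta_str"
            db.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", (uri, rev, name, value))

    index_document("/house/bills/hb2001", 3,
                   {"str_committee": "Agriculture", "date_introduced": "2010-02-01"})

    # e.g. "everything referred to the Agriculture committee"
    rows = db.execute("""SELECT d.uri, d.rev FROM doc d
                         JOIN meta_str m ON m.uri = d.uri AND m.rev = d.rev
                         WHERE m.field = 'str_committee' AND m.value = 'Agriculture'""").fetchall()

Queries like the ones listed above then become straightforward joins over the typed metadata tables, always keyed by (URI, revision).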

It is important to note that in KLISS, the meta-data database is not a normative source of information. The master copy of all data is, at all times, in the documents themselves. The metadata is stored in the documents themselves (the topic of an upcoming post). The database is constructed from the documents in order to serve search and retrieval needs. That is all. In fact, the database can be blown away and simply re-created by replaying the transactions from the KLISS time machine. I sometimes explain it by saying we use a database in the same way that a music collection application might use a database. Its purpose is to facilitate rapid slicing/dicing/viewing via meta-data.
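Continuing the same (hypothetical) sketch, the disposable nature of the database falls out naturally: because the index is a pure function of the time machine's transaction log, it can be dropped and rebuilt by replaying that log. The log format shown is invented for the example.

    def rebuild_index(transaction_log):
        # Throw the derived index away...
        db.executescript("DELETE FROM doc; DELETE FROM meta_str; DELETE FROM meta_date;")
        # ...and re-derive it by replaying the time machine, oldest transaction first.
        for transaction in transaction_log:
            for uri, rev, metadata in transaction:
                index_document(uri, rev, metadata)

    rebuild_index([
        [("/house/bills/hb2001", 1, {"date_introduced": "2010-02-01"})],
        [("/house/bills/hb2001", 2, {"str_committee": "Agriculture"})],
    ])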

This brings me to the most important point about how information is organized in KLISS. Let's step all the way back for a moment. Why do we humans organize stuff at all? We organize in order to find it again. In other words, organization is not the point of organization. Retrieval is the point of organization. Organization is something we do now, in anticipation of facilitating retrieval in the future. For most of human history, this has meant creating an organizational structure and packing stuff physically into that structure. Shoe closets, cities, pockets, airplanes, filing cabinets, Filofaxes, bookshelves, Dewey decimal classification...

As David Weinberger explains in his book "Everything is Miscellaneous", there is no need for a single organizational structure for electronic information. A digital book does not need exactly one shelf on one wall, classified under one dominant heading. It can be on many shelves, on many walls, under many headings, in many ontologies, all at the same time. In fact, it can be exploded into pieces, mashed up with other books and represented in any order, in any format, anywhere and at any time. Not only is this possible thanks to IT, it cannot be stopped. All known attempts since the dawn of IT (and there have been numerous) have failed to put the organization genie back in the bottle...

Having said that, the tyranny of the dominant decomposition appears, per Herbert Simon, to be woven into the fabric of the universe. In order to store information, even electronically, we must *pick* at least some organizational structure to get us started. At the very least, things need to have names right? Ok. What form will those names take? Ten minutes into that train of thought and you have a decomposition on your hands. So what decomposition will we pick for our legislative/parliamentary materials? Do committees contain bills or do bills contain committees? Is a joint committee part of the house data model or part of the senate data model or both? Are bill drafts stored with the sponsor or with the drafter? Are committee reports part of the committee that created them or part of the bills they modify? etc. etc...One hour later, you are in a mereotopology-induced coma. You keep searching for the perfect decomposition. If you are in luck, you conclude that there is no such thing as the perfect decomposition and you get on with your life. If you are unlucky, you get drafted into a committee that has to decide on the correct decomposition.

Fact of life: If there are N people in a group tasked with deciding an information model, there are exactly N mutually incompatible models vying for dominance and each of the N participants is convinced that the other N-1 models are less correct than their own. Legislatures/parliaments provide an excellent example of this phenomenon. Fill a room with drafting attorneys, bill status clerks, journal clerks, committee secretaries and fiscal analysts, ask each of them to whiteboard their model of, for example, bills, and you will get as many models as there are people in the room.

That is why, in KLISS, by design, the information model (how it carves up into documents versus folders, paragraphs versus meta-data fields, queries versus bytes-on-the-disk) does not really matter. Just pick one! There are many, many models that can work. Given a set of models that will work, there is generally no compelling reason to pick any particular one. In legislatures/parliaments, as in many other content-centric applications, the word "correct" needs a pragmatic definition. In KLISS, we consider an information model to be "correct" if it supports the efficient, secure production of the required outputs with the required speed of production. That is essentially it. Everything else is secondary and much of it is just mereotopology.

Two more quick things before I wrap up for today. You may be thinking, "how can a single folder structure hope to meet the divergent needs of all the different stakeholders who likely have different models in their heads for how the information should be structured?" The way KLISS does it is that we create synthetic folder structures, known as "virtual views", over the physical folder structure. That allows us to create the illusion, on a role-by-role basis, that each group's preferred structure is the one the system uses :-)

As well as helping to create familiar folder structures on a role-by-role basis, virtual views also allow us to implement role-based access control. Every role in the system uses a virtual view. Moreover, all event notifications use the virtual views and all attempted access to assets in the repository is filtered through the user's virtual view - that includes all search results.
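A toy sketch of the virtual view idea (the roles, paths and mapping are invented for illustration, not taken from KLISS): each role sees its own synthetic structure mapped onto the single physical structure, and anything outside that mapping is simply invisible to the role.

    PHYSICAL_REPOSITORY = {
        "/house/2010/bills/hb2001/draft-7": "<bill draft...>",
        "/house/2010/session/day-45/journal": "<journal...>",
    }

    VIRTUAL_VIEWS = {
        "drafter":       {"/drafts/hb2001": "/house/2010/bills/hb2001/draft-7"},
        "journal_clerk": {"/journal/today": "/house/2010/session/day-45/journal"},
    }

    def read(role, virtual_path):
        mapping = VIRTUAL_VIEWS[role]
        if virtual_path not in mapping:
            # Not in your virtual view means not visible to you at all.
            raise PermissionError(virtual_path)
        return PHYSICAL_REPOSITORY[mapping[virtual_path]]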

To sum up...KLISS uses a virtualized hierarchical information model combined with property/value pairs arranged in a star-schema fashion. Properties are indexed for fast retrieval and are based on scalar data types that we leverage for query operators, e.g. date expression evaluation, comparisons of money amounts etc. The metadata model is revision-based and the repository transaction semantics guarantee that the metadata view is up to date with respect to the time machine view at all times. All event notifications use the virtual view names for assets.

You may be wondering, "is it possible to have a document with no content other than metadata?" The answer is "yes". That is exactly how we reify non-document concepts like committees, members, roles etc. into document form for storage in the time machine. Yes, in KLISS, *everything* is a document :-)
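For illustration (the field names are invented, following the same prefix convention as the earlier sketch), a committee reified as a metadata-only document might look like this:

    agriculture_committee = {
        "uri": "/senate/committees/agriculture",
        "content": "",                              # no body text at all
        "metadata": {
            "str_type": "committee",
            "str_chamber": "senate",
            "str_chair": "/senate/members/some-member",
            "date_created": "2009-01-12",
        },
    }

It gets revisions, metadata indexing and time machine semantics exactly like a bill or a journal does.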

Next up: Data models, data organization and why the search for the "correct" model is doomed.