Featured Post

Linkedin

 These days, I mostly post my tech musings on Linkedin.  https://www.linkedin.com/in/seanmcgrath/

Friday, January 15, 2016

The biggest IT changes in the last 5 years : dynamically typed data

Back in 2010 I believe it was, I started writing bits-and-pieces about NoSQL and I remember a significant amount of push-back from  RDB folks at the time.

Since then, I think it is fair to say that there has been a lot more activity in tools/techniques for unstructured/semi-structured/streamed data than for relational data.

The word "unstructured" is a harsh word:-) Even if the only thing you know about a file is that it contains bytes, you *have* a data structure. Your data structure is the trivial one called "a stream of bytes". For some operations, such as data schlepping, that is all you need to know. For other operations, you will need a lot more.

The fact that different conceptualizations of the data are applicable for different operations is an old, old idea but one that is, I think, becoming more and more useful in the distributed, mashup-centric data world we increasingly live in. Jackson Structured Programming for example, is based on the idea that for any given operation you can conceptualize the input data as a "structure" using formal grammar-type concepts. Two different tasks on exactly the same data may have utterly different grammars and that is fine.

The XML world has, over time, developed the same type of idea. Some very seasoned practitioners of my acquaintance make a point of never explicitly tying an XML data set to a single schema. Preferring to create schemas as part of the data processing itself. Schemas that model the data in  a way that suites the task at hand.

I think there is an important pattern in there. I call it Dynamically Typed Data. Maybe there is an existing phrase for it that I don't know:-)

"Yes, but", I hear the voices say, "surely it is important to have a *base* schema that describes the data 'at rest'?"

I oscillate on that one:-) In the same way that I oscillate on the idea that static type checking is just one more unit test on dynamically typed code.

More thinking required.


No comments: