I asked Shane Schick of ComputerWorld why there seems to be so little interest in the Universal Business Language (UBL) in Canada, whereas in Europe it is being adopted and rolled out quite readily.

Shane replied, “I suspect part of the problem is people are still getting their heads around all the other XML variants out there. We tried to address that earlier this year with our “Dictionary of Markup languages” (http://www.itworldcanada.com/a/search/552e0ba5-8f9a-4362-a6eb-de8ad1aea521.html) but maybe we’ll look at this one in more detail. With OASIS behind it, you never know.”

I thought about this reply for some time and decided that perhaps there is a basic misunderstanding about XML-based technologies. With dozens of XML vocabularies, and more being created all the time, the reader may think that the world of XML is extremely complex. The reality is that it is actually a very simple idea. The XML recommendation found at http://www.w3.org/TR/xml describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of Standard Generalized Markup Language (SGML) [ISO8879], which has been with us since 1986. Yes, we are talking about a 20+ year old technology! By construction, all XML documents are conforming SGML documents.

Contrary to what many think, XML does not predefine the “tags” that are used for an application. Rather, the application defines its own tag set and constructs it according to the XML recommendation. Thus, SGML and its XML subset are metalanguages. For example, the Hyper Text Markup Language (HTML) is an SGML application designed to describe to browser applications how a document on the web is to be displayed on the viewer’s screen. XHTML is merely HTML recast based on the XML recommendation and uses the same basic tag sets.

A minimum requirement for any XML document in any problem domain is that it be “well-formed”. That is, the tags in the document must be properly nested. Hence “…<x>…</x> <y>…</y>…” is well-formed, but “…<x>…<y>…</x>…</y>…” is not. In addition, the application designer can formally define their tag set in the form of a Document Type Definition (DTD) or Schema, and have their application test the XML instance for “validity” against this formal definition with an off-the-shelf parser.
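
The nesting rule above can be tried out with any off-the-shelf parser. Here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser; the helper name is_well_formed and the two sample strings are my own illustrations. Note that this parser only checks well-formedness. Testing validity against a DTD or Schema needs a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for overlapping tags, unclosed tags, and similar errors.
        return False


# Hypothetical samples mirroring the <x>/<y> patterns in the text.
nested = "<doc><x>one</x><y>two</y></doc>"
overlapping = "<doc><x>one<y>two</x></y></doc>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```

The same function works unchanged on a document from any XML vocabulary, which is the point: the check depends on XML itself, not on the tag set.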

Why would anyone want to use XML to design a markup language for their application? First, it is non-proprietary. Once you have designed your language, any application that can read an XML document can directly read and process yours. This makes XML very useful for data interchange. Second, it offers re-usability. The pieces of an application that test for well-formedness or validity are standard off-the-shelf components these days. Third, it lets you capture your data once, then automatically transform it into all the outputs you may require, eliminating any re-purposing and re-keying of data for different purposes. Keep this in mind when we talk about UBL below.

Transforming data is such a ubiquitous requirement in the XML world that there is a dedicated language for it: XSLT, the transform language for XML. This XML application allows you to specify the transform of an XML document belonging to a specific class of documents into anything else, in a declarative way (i.e. you declare what you want done rather than specifying procedurally how it is done).

Now here is what I think is a very key point.  You specify this transformation process by authoring an XML document.  This you can do using any XML authoring tool.  There is nothing special about this being an “XSLT document”.  You give the XSLT application both the original XML document using the tag set you designed,  and another XML document that uses the tag set defined for the XSLT application.  Since this transformation specification is in fact merely an XML document, you can test it for well-formedness and validity with the same code you use to handle the XML document.  Right from the start you know quite a bit about XSLT.
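
To illustrate that an XSLT specification really is just an XML document, here is a small Python sketch that reads a stylesheet with the same generic standard-library parser you would use for any other XML. The tiny stylesheet and the "order" and "customer" element names are made up for illustration. Python's standard library has no XSLT processor, so this sketch only parses the stylesheet; it does not run the transform.

```python
import xml.etree.ElementTree as ET

# The namespace that identifies XSLT elements.
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# A hypothetical stylesheet that would turn an <order> into a small HTML page.
stylesheet = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <html><body><h1><xsl:value-of select="customer"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>"""

# Parsing succeeds only if the stylesheet is well-formed XML.
root = ET.fromstring(stylesheet)

# Collect the elements that belong to the XSLT tag set, in document order.
prefix = "{" + XSLT_NS + "}"
xslt_tags = [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]

print(xslt_tags)  # ['stylesheet', 'template', 'value-of']
```

The html, body and h1 elements are skipped because they are not in the XSLT namespace; they are the output the stylesheet declares, mixed in with the XSLT instructions.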

To transform your XML document into multiple output streams, all you need is a set of XSLT documents, one for each output you require. A special case is where you want to transform your XML document into a paginated form. Another XML application called XSL-FO provides you with all the “type-setting” capabilities you may require for designing your pages. Again, the XSL-FO specification is just another XML document and can be processed as an XML document. An XSLT document transforms your XML document into a second XML document tagged with XSL-FO tags. This FO document declares how your content is to be “poured” onto a physical page. Any application (or even a physical printer) that understands the standard XSL-FO document can now output paper or other formats such as standard PDF.

So a person looking at an XSLT or an XSL-FO document already knows quite a bit about these languages, since they are all just XML documents.

Now of course you do have to learn the specific tags for XSLT and XSL-FO, as they provide the required functionality for that application. However, it does not mean you have to learn the tagging for all XML applications – only those applications that are relevant to your problem domain.

Having said that, there is activity going on to reduce the number of variants. Pre-competitive consortia get together to pre-define the XML language for a specific application domain, say banking, legislation or mathematics. Coming to agreement on how the document instances for a particular class of documents are to be tagged reduces the amount of effort required to exchange data between all the players in the future.

Think of what might have happened if everyone building a web page did so with their own proprietary software, so that the data for that web page was stored in some proprietary format and special software was required to look at the page. Ouch. No – the genius of the web is that it was decided to use the SGML recommendation and, in advance, build a common set of tags (HTML) that allowed everyone to define what their web page looked like in a common way. Browsers could then be created, many of them, all relying on the fact that all pages will be tagged in a standard way. How the browser processes that page is up to the application, and browsers can be as innovative as they like.

Another case in point is the Universal Business Language (UBL) which is an effort to predefine, in XML, all standard business documents world wide.  Hence things like a purchase order or invoice can be predefined so that a corporate entity that adopted the standard definition for these could exchange them effortlessly between all companies that support the common definition.

If you have any experience with XML, then, when you look at any UBL document, it looks familiar.  It is XML.  It conforms to the XML recommendation.  It can be processed with XML applications you already have.  You can transform it to anything you need using XSLT and print out any part of it using XSL-FO.  So the technology is being built in layers, and each new layer does not require you to throw out the previous layers and start all over again, or even learn a pile of new concepts for basic processing.
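
As a rough illustration of that layering, the sketch below reads a heavily simplified UBL-style invoice with nothing more than Python's generic standard-library XML parser. The namespace URIs and the element names (ID, IssueDate, LegalMonetaryTotal, PayableAmount) follow the UBL 2.x invoice vocabulary, but the fragment is not a complete, schema-valid invoice, and the invoice number, date and amount are made up.

```python
import xml.etree.ElementTree as ET

# UBL 2.x namespaces; the prefixes are local shorthand chosen for this sketch.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Simplified, hypothetical invoice fragment (many required elements omitted).
invoice_xml = """<Invoice
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
  <cbc:ID>INV-001</cbc:ID>
  <cbc:IssueDate>2009-06-01</cbc:IssueDate>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">125.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>"""

root = ET.fromstring(invoice_xml)

# Ordinary namespace-aware lookups; nothing here is specific to UBL.
invoice_id = root.findtext("cbc:ID", namespaces=NS)
issue_date = root.findtext("cbc:IssueDate", namespaces=NS)
amount = root.find("cac:LegalMonetaryTotal/cbc:PayableAmount", NS)

print(invoice_id, issue_date, amount.text, amount.get("currencyID"))
# INV-001 2009-06-01 125.00 EUR
```

No UBL-specific library is involved. The only UBL knowledge in the code is the set of tag names and namespaces, which is exactly the part you have to learn for your own problem domain.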

Now, back to Shane’s point: what about all the XML variants? This is only a reflection of the number of individual problem domains that exist in the world. All these variants are layered on top of base XML. You only need to know the variant that is a solution to your particular business problem. If your problem does not involve vector graphics, you need not learn SVG. If your problem does not involve mathematical notation, then you need not learn MathML. However, if your problem involves exchange of business documents with government or other businesses, either locally or world wide, then yes, you need to take a look at UBL.

UBL is interesting in that it had to be designed from the start to be useable world wide, in any language, supporting the laws and processes of your country, and handling all variants of data.  In Canada and other countries for instance, ‘state’ and ‘zip’ have no relevance, but “province” and “postal code” do.  A small company cannot afford to implement the entire UBL infrastructure, so it must be possible to implement only a small subset of the functionality, yet be able to scale up to global proportions if necessary.   Cost to implement has to be scaled to the size of company using it.

UBL is making headway,  particularly in Europe. Why?  Consider the lowly invoice.

Every business wants to get paid, and the standard way of doing this is to create an invoice. The majority of businesses generate this invoice as a piece of paper, put it in an envelope, add postage, and send it by mail, which involves trucks and airplanes to transport it to the business being invoiced. There they receive and open the mail, re-key all the information, then check the invoice to see if it should be paid and, if so, print a cheque, which is put in an envelope and sent back to the issuer. The issuer then opens it, re-keys the information, and processes the cheque, and the money finally gets transferred from one account to another. It has been estimated that this single transaction costs about 12 to 13 euros. If we could automate this exchange, businesses generating thousands of invoices could save a significant amount. This amount is so significant that the Danish government even passed a law that said they would process an invoice only if it was submitted electronically in UBL format. That caused a lot of sudden interest in UBL in Denmark.

UBL is now entering its second and third wave in Europe. The objective of the Pan-European Public eProcurement On-Line (PEPPOL) project is to set up a pan-European pilot solution that, conjointly with existing national solutions, facilitates EU-wide interoperable public eProcurement. Government purchases in the European Union account for around 16% of GDP, so there are significant savings to be had.

A third wave is starting. XML was specifically designed to be exchangeable on the web. UBL is based on XML technologies and hence can be exchanged on the web. UBL is designed to be sub-setted, and scalable up to global size, handling business documents world wide.

Social applications such as Facebook, Twitter and others have demonstrated that the web technology can handle exchange of “social data” between millions of people worldwide.  The convergence of technologies like UBL and social networking software will be a transformative technology that will change how we exchange business data in the future… ah but that is a story for a future blog entry.

…Hugh