Wednesday, 29 August 2007

Example (10): Import again

How can we parametrise our protocol descriptions with respect to document types? In other words, how can we import an opaque document type so that we can instantiate it later?

("opaque" may not be a good word, what I mean is its content is not visible now.)

This is the problem of late binding. Consider the following protocol:
import Invoice, Order;

protocol BuyerSeller {
    participant Buyer, Seller;
    channel chSeller @ Seller, chBuyer @ Buyer;

    chSeller.order(Order) from Buyer to Seller;
    choice @ Seller {
        chBuyer.invoice(Invoice) from Seller to Buyer;
    } or {
        chBuyer.outOfStock(void) from Seller to Buyer;
    }
}
The first line imports two document types. One may as well consider them as coming from a default name space (say, the current directory): varying the content of a file named "Order" then varies the definition of the protocol. In other words, this protocol is parametric over the content of two files in the current directory.

Alternatively we can be slightly more sophisticated: we can let an enclosing environment (say a model description which uses this protocol) set the default name space. Then one can bind Invoice and Order to arbitrary XML namespace-based names (which includes the use of URLs). Here again we are using late binding, but through the explicit use of an enclosing context.

This is more organised and general, and in fact subsumes the first approach. But what is an appropriate precedence in the way we bind these names? For example:
  • if we have a binding from the enclosing context then it should take precedence;
  • if not then we shall look at the default name space;
  • if there is no default name space, or if the name is not bound there either, then we go to the current directory (well, that is the default then) and try to find a file of the same name (perhaps with some suffix: there are too many suffixes these days so I am wary of introducing another, but for example sdd, for scribble data definition, seems not to be used yet).
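
The precedence above can be sketched in a few lines. This is only a sketch in Python (not part of Scribble), and everything in it is an assumption made for illustration: the function name, the representation of bindings as dictionaries, the "file:" prefix, and the "sdd" suffix which is merely the candidate floated above.

```python
from typing import Mapping, Optional, Set


def resolve_designated_name(
    name: str,
    context_bindings: Mapping[str, str],
    default_namespace: Optional[Mapping[str, str]],
    files_in_current_dir: Set[str],
    suffix: str = ".sdd",  # hypothetical suffix, not a settled choice
) -> str:
    """Return the target a designated name is bound to (hypothetical helper)."""
    # 1. A binding from the enclosing context takes precedence.
    if name in context_bindings:
        return context_bindings[name]
    # 2. Otherwise look at the default name space, if there is one.
    if default_namespace is not None and name in default_namespace:
        return default_namespace[name]
    # 3. Otherwise fall back to a file of the same name in the current
    #    directory, with or without the suffix.
    for candidate in (name, name + suffix):
        if candidate in files_in_current_dir:
            return "file:" + candidate
    raise LookupError("no binding found for " + name)
```

With a context binding for Order only, Order resolves through the context while Invoice falls through to the default name space or, failing that, to a file.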
We also want the import clause to be able to describe direct bindings to all basic data types such as arrays and vectors and all that; and we wish to be able to import Java classes, UML class models and XML schemas. We wish to have packages, we wish to have hierarchical directories (folders), and we wish to refer to somewhere on the web via a URL as the location of a schema file (such as an xsd file). Once types are imported, we wish to be able to refer to them not only by their canonical (qualified) names but also by local names or even aliases.

We may call the name used in an import clause the designated name.

One thing I should remember: document types are also used as (part of) selectors, and selectors are a key component of finite state automata (as transition labels), hence essential for ensuring "non-mix-up" in the face of asynchrony, parallelism and all that. So we need to have a good way to:

  1. At editing/validating/compile time: check whether two seemingly different document types (given by local names or aliases) are really distinct or not: so if there is a binding in the above sense we need to go back to the original.
  2. At compile time: give distinct identifiers to distinct document types (this is possible largely because selectors are meaningful only within a conversation at least in principle).
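
A minimal sketch of these two steps, in Python and under my own assumptions: bindings are a dictionary from a name to the name it is bound to, a name with no binding counts as canonical, and identifiers are small integers handed out per conversation. The function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def canonical_name(name: str, bindings: Mapping[str, str]) -> str:
    """Follow bindings back to the original (canonical) name."""
    seen = set()
    while name in bindings:
        if name in seen:
            raise ValueError("cyclic binding for " + name)
        seen.add(name)
        name = bindings[name]
    return name


def assign_identifiers(
    names: Iterable[str], bindings: Mapping[str, str]
) -> Dict[str, int]:
    """Give equal identifiers exactly to names with the same canonical name."""
    ids_by_canonical: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for name in names:
        canon = canonical_name(name, bindings)
        # A new canonical name gets the next unused integer.
        result[name] = ids_by_canonical.setdefault(canon, len(ids_by_canonical))
    return result
```

Two local names are then "really distinct" exactly when they receive different identifiers.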

The identifiers in 2 are used for execution. And as Greg Morrisett said, it is often a very good idea to carry types at runtime even if you have "compiled them away" so we shall carry canonical/qualified names at runtime as much as possible.

As to 1, when there is no binding present we of course treat the two (Invoice and Order) as distinct types. So one safe way is to impose this distinction whenever we have a binding: if two distinct opaque names are bound then the targets should also be distinct documents.
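
The "safe way" amounts to requiring that the binding be injective. A small Python sketch of that check, assuming bindings are given as a dictionary from opaque names to targets (the function name and the example URNs are made up):

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def find_clashing_bindings(bindings: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return each target that two or more distinct opaque names are bound to."""
    names_by_target: Dict[str, List[str]] = defaultdict(list)
    for name, target in bindings.items():
        names_by_target[target].append(name)
    # Keep only the targets shared by more than one opaque name.
    return {
        target: sorted(names)
        for target, names in names_by_target.items()
        if len(names) > 1
    }
```

An empty result means the bindings respect the distinction; anything else lists the opaque names that would collapse into one document type.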

Alternatively we can treat a canonical name together with an opaque name as one when we think about the signature. One thing that looks like a bit of an issue with this approach is that it does not allow textual replacement of an opaque name with the target of its binding.

On the other hand it has a merit of allowing the following procedure:

(1) We check consistency of a protocol --- even that of model/program descriptions conforming to it --- statically and compile it.

(2) We add bindings by enclosing environments (programs etc.) perhaps at the time of module composition (lazy binding).

(3) At runtime we want (because of (1)!) everything to work all right.

Now suppose at (2) somebody insists that he wishes to use the same document format for Invoice and Order (well unlikely but anyway we can assume such a situation): then the validation in (1) becomes meaningless.

If we take the "two-in-one" approach, this issue goes away.

Or perhaps we should take the "two-in-two" approach: that is, we take the designated name for the signature used in selectors, and for data formats we use the real format. We distinguish these two.
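
To make the "two-in-two" idea concrete, here is a rough Python sketch. The class, its field names and the shape of a selector (an operation name paired with a designated name) are all my assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DocumentType:
    designated_name: str  # the name in the import clause, used in selectors
    data_format: str      # the real format, bound later by the environment


def selector(operation: str, doc: DocumentType) -> Tuple[str, str]:
    """Build a selector signature from the designated name only."""
    return (operation, doc.designated_name)


# Somebody binds Invoice and Order to the same (hypothetical) format.
invoice = DocumentType("Invoice", "urn:example#CommonDoc")
order = DocumentType("Order", "urn:example#CommonDoc")
```

Here `selector("send", invoice)` and `selector("send", order)` still differ even though both documents share one data format, so the static check in (1) is not invalidated by the late binding in (2).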

So there are quite a few things around here, and that's quite interesting. Gary and I are at least understanding the whole domain, so I am sure we will find at least one good design soon.