Saturday, 30 June 2007

Before technical discussions: a terminology

Before going further, I realised that it may be better for us to share basic terminology. Our purpose is practical: we wish to exchange concepts clearly, with the following two practical ends: understanding each other, either in agreement or in disagreement; and positioning scribble in the global picture of MDA artifacts, in particular in the UML/MOF infrastructure.

There are three basic terms we treat: model, metalevel hierarchy, and model transformation.

(1) Model


In the UML/MOF literature, we often say "models". If you happen to know about mathematical logic, you will soon find out that "models" in the UML/MOF sense are different from those in logic. For example, a standard notion of a model in first-order logic (FOL) means an interpretation of a syntactic formula in the concrete universe of sets and relations (predicate symbols are interpreted as relations).

So, for example, a formula Eq(1+1, 2) (which we usually write as 1+1=2) is interpreted as the equality relation (interpreting the predicate Eq) between an interpretation of 1+1 and that of 2: constant symbols 1 and 2 are interpreted as the obvious natural numbers and + as an addition function.

Practically, when you do such a modelling, you first decide what is the target of interpretation (which we often call universe of discourse, or UoD) and then consider an appropriate map from each syntactic construct to an element/function/relation in UoD.
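To make the logical notion concrete, here is a minimal sketch in Python of interpreting the formula Eq(1+1, 2) in a chosen UoD. The class and function names (Const, Apply, Atom, Interpretation, holds) are my own illustrative choices, not taken from any logic library, and the sketch covers only atomic formulas without variables or quantifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


# Syntax: a term is a constant symbol or a function symbol applied to terms.
@dataclass(frozen=True)
class Const:
    symbol: str


@dataclass(frozen=True)
class Apply:
    function: str
    args: Tuple["Term", ...]


Term = Union[Const, Apply]


# Syntax: an atomic formula is a predicate symbol applied to terms.
@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[Term, ...]


# An interpretation maps each symbol to something in the UoD (here: integers).
@dataclass
class Interpretation:
    constants: Dict[str, int]
    functions: Dict[str, Callable[..., int]]
    relations: Dict[str, Callable[..., bool]]


def eval_term(term: Term, interp: Interpretation) -> int:
    """Map a syntactic term to an element of the UoD."""
    if isinstance(term, Const):
        return interp.constants[term.symbol]
    values = [eval_term(arg, interp) for arg in term.args]
    return interp.functions[term.function](*values)


def holds(atom: Atom, interp: Interpretation) -> bool:
    """Check whether the atomic formula is true under the interpretation."""
    values = [eval_term(arg, interp) for arg in atom.args]
    return interp.relations[atom.predicate](*values)


# The formula Eq(1+1, 2) as pure syntax.
formula = Atom("Eq", (Apply("+", (Const("1"), Const("1"))), Const("2")))

# The obvious interpretation: "+" is addition, "Eq" is equality.
naturals = Interpretation(
    constants={"1": 1, "2": 2},
    functions={"+": lambda a, b: a + b},
    relations={"Eq": lambda a, b: a == b},
)

# A deliberately odd interpretation of the same syntax: "+" read as product.
product_reading = Interpretation(
    constants={"1": 1, "2": 2},
    functions={"+": lambda a, b: a * b},
    relations={"Eq": lambda a, b: a == b},
)
```

With these definitions, holds(formula, naturals) is true while holds(formula, product_reading) is false: the syntax is the same, and only the map into the UoD differs.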

I wrote a bit about this since the idea of UoD and its relationship to syntax is often useful when thinking about UML and related models. But at this point we just note the following: this logical notion of models is not the notion of models in UML/MOF. There, a model means:
A description of a piece of software abstracting away some of its features in which we are not interested for our present purpose.
The idea is that for many kinds of business communication, many kinds of decision making, and in general through many stages of software development, we do not wish to go into all the details. Or from a different viewpoint when you wish to design software, either a huge information system or a small application, you start from some vague ideas which may not be fully specified but will nevertheless tell us what it is about. So we need a model --- and we need to scribble a model for interaction.

From now on when we simply say "model" it is used with this UML sense, not in the logical sense. But we shall also keep the logical idea of models in mind.

(2) Metalevel Hierarchy


The next is a review of the standard metalevel hierarchy in UML/MOF. People interpret this in varied ways, but let's be practical, from the viewpoint of how we use it. In essence:
Fix a level of your modelling: then its meta-model gives you a set of rules how you can describe (scribble with pencil, draw a picture with CAD, type up using a linear syntax, ....) your model in that level.
Well, this is not the whole story in UML/MOF: we usually assume a stronger relationship between entities in two levels, in that each node/association in a model often has its meta-node/association in the meta-level, which gives us a very good way to do transformation and also a good way to bootstrap (once you understand UML, you can understand the format of MOF). So our definition is a bit loose: and perhaps this is not too bad, given that we need a bit of OCL to define well-formedness of UML models anyway (as this document says: the main point is that the definition of UML demands not only syntax but also static semantics, or well-formedness rules).

Now onto the hierarchy, given below in the concrete case of UML.
  • M3: this is the meta-metamodel level, giving rules for specifying M2. Concretely this is MOF, which offers a grammar and constraints to define the specification rules for UML.

  • M2: this is the metamodel level, giving rules for specifying M1. Concretely this is the UML itself, including its syntax and semantics, offering specification rules for UML models which are at M1 (M2 is often called UML metamodel).

  • M1: this is the model level, which is our target: we wish to have a model of software, or a description of a model of software. Concrete UML models reside in this level, for your pet application or a large corporate system.

  • M0: this is the instance level w.r.t. M1, in which realisations of models in M1 at specific space-time reside. If we have a model in M1, it may be realised electronically in memory as a linear syntax, or graphically, or perhaps you will write it down by pencil: these instances relate to a UML model residing in their metalevel, M1.
Our interpretation of M0 may not be agreed upon by all: people often say that this level is about "execution". Let's not worry too much about this point, however, since what really matters is done at M1 and above. Here a crux (for practical convenience!) is to put UML models in their concrete syntax in M0, while keeping those in abstract syntax in M1. This fits well in practice when we wish to treat extra information, such as the geometric information that UML models in graphical syntax carry (such as the position of nodes), in addition to what their representation in abstract syntax says: you do not want an upside-down diagram even if its abstract syntax is the same, do you? Yet all in all what essentially matters is M1, which is transformed by the rules stipulated w.r.t. metamodels at M2.

So we are taking M1 to be described in abstract syntax: this makes our discussions simple and precise.
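The reading of the hierarchy taken here can be sketched in a few lines of Python. This is only an illustration under heavy simplification: MetaClass, ModelElement, Rendering and conforms are hypothetical names of mine, and the "metamodel" is reduced to a list of allowed attribute names, which is far weaker than what MOF and the UML metamodel actually stipulate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetaClass:
    """An M2 element: a rule saying how M1 elements may be described."""
    name: str
    allowed_attributes: List[str]


@dataclass
class ModelElement:
    """An M1 element in abstract syntax, pointing at its meta-element."""
    meta: MetaClass
    attributes: Dict[str, str]


@dataclass
class Rendering:
    """An M0 realisation: the same M1 element plus concrete layout data."""
    element: ModelElement
    x: int
    y: int


def conforms(element: ModelElement) -> bool:
    """Well-formedness in this toy setting: only allowed attributes are used."""
    return set(element.attributes) <= set(element.meta.allowed_attributes)


# M2: a drastically cut-down stand-in for the UML metaclass "Class".
uml_class = MetaClass("Class", ["name", "isAbstract"])

# M1: two candidate model elements, one well-formed and one not.
account = ModelElement(uml_class, {"name": "Account"})
malformed = ModelElement(uml_class, {"colour": "red"})

# M0: two drawings of the same abstract element at different positions.
drawn_here = Rendering(account, x=10, y=20)
drawn_there = Rendering(account, x=200, y=40)
```

The two renderings differ as M0 entities while sharing one M1 element, which is the practical reason for keeping geometric information out of M1.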

Observe that the same meta-level hierarchy can be considered for languages other than UML, which is precisely why we have M3: in particular, we can consider each programming language, say Java, as making up such a hierarchy.

(This view is not so strange since any programming language is also a modelling language, except that a PL usually demands that we specify all details except with respect to some features such as abstract classes, interfaces and such. And some languages are even equipped with an assertion method, or DBC as is seemingly more popular in the MDA community. This viewpoint becomes important when we consider model transformation in the next item.)

Finally: from the viewpoint of the logical notion of models mentioned earlier, we can consider each level in the hierarchy as either containing syntactic entities or as being a UoD: one of the useful formal discussions of relationship among four levels in the latter's viewpoint can be found in the work written by Bezivin and Gerbe in 2001.

(3) Model Transformation

Model transformation is at the heart of MDA. We consider only a specific case:
  • Assumption: We have two metalevel hierarchies which share the identical M3 (say MOF), called source and target respectively (the source hierarchy and the target hierarchy can be identical, since we may wish to do, for example, UML to UML transformation).

  • Transformation: This is specified at M2 once and for all, so that it maps each M1-model in the source to an M1-model in the target. This in effect also induces a transformation at the level of M0, perhaps with a minor elaboration. The transformation function stipulated at M2-level is helped by the two meta-models sharing M3, but can be highly non-trivial.
So we can have transformation from UML to Java, UML to XML, etc. as long as their metamodels are specified in MOF. Having the same M3 gives a common foundation to related metamodels.
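As a rough illustration of a transformation that is written once against metamodel elements and then applies to every M1-model, consider the following Python sketch. Everything in it is an assumption made for illustration: the tiny "UML-like" and "Java-like" metamodels (UmlClass, UmlAttribute, JavaClass, JavaField), the type table TYPE_MAP and the rule uml_to_java are mine, and real MOF-based transformations are far richer.

```python
from dataclasses import dataclass
from typing import List


# Source metamodel (toy UML): classes owning typed attributes.
@dataclass
class UmlAttribute:
    name: str
    type: str


@dataclass
class UmlClass:
    name: str
    attributes: List[UmlAttribute]


# Target metamodel (toy Java): classes owning typed fields.
@dataclass
class JavaField:
    name: str
    java_type: str


@dataclass
class JavaClass:
    name: str
    fields: List[JavaField]


# Hypothetical mapping of primitive type names; unknown names pass through.
TYPE_MAP = {"Integer": "int", "String": "String", "Boolean": "boolean"}


def uml_to_java(cls: UmlClass) -> JavaClass:
    """The rule, stated over metamodel elements: class to class, attribute to field."""
    fields = [JavaField(a.name, TYPE_MAP.get(a.type, a.type)) for a in cls.attributes]
    return JavaClass(cls.name, fields)


def render(java: JavaClass) -> str:
    """One possible concrete syntax for the target model."""
    lines = [f"class {java.name} {{"]
    lines += [f"    private {f.java_type} {f.name};" for f in java.fields]
    lines.append("}")
    return "\n".join(lines)


# An M1-model in the source hierarchy, and its image in the target hierarchy.
account = UmlClass("Account", [UmlAttribute("balance", "Integer"),
                               UmlAttribute("owner", "String")])
java_account = uml_to_java(account)
```

The rule uml_to_java never mentions Account: it is stated over the metamodel, so it applies to any model built from these elements, and render then gives the induced step down to concrete syntax.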

In practice what is important for such a transformation is to be meaningful (or useful): it should preserve certain structure and meaning. This notion of meaningfulness can however be discussed only for each concrete case (even if UML is fully formalised there is no way to have general criteria for meaningfulness).

In practical MOF-oriented transformations, we also need to be concerned with how features and classifiers relate to each other: we shall discuss such aspects as needed. Also it is the M2-level mapping (hence operating on M1-level entities) that we care about: and we are most concerned with the mapping from high-level modelling towards implementation.

* * *

Now we are ready to go back to the questions we asked in my previous post (link). We shall start by clarifying our first criterion: why it matters to describe operational behaviour in interactions in concurrency, unlike in the modelling of sequential programs.

Friday, 29 June 2007

today just a couple of lines...

It is wise that current MDA tools (in particular UML) are focussing on static structures of a model rather than dynamic ones, for two reasons:
  • Static structures are important for design, and once you reach good ones they tend to persist, with incremental refinement, over software's life.

  • And that is practically the only aspect which can be manipulated in an abstract model without over-specification.
So one of the main remaining questions is how we can consistently integrate modelling of dynamic, behavioural aspects into the existing framework: after all one of the unique aspects of software architecture, in comparison with physical one, is that it has a dynamic part. Another question, pertaining to our present context proper, is how we can translate this wisdom into concurrency and distribution.

In a subsequent post we elaborate this point in the context of our language design.

Thursday, 28 June 2007

Frankel's book: and some theses

OK I have just finished reading Frankel's book --- the latter part somewhat in haste. The first half is a sweeping illustration of MDA's backgrounds and vision; the second and later parts offer a good introduction to basic and somewhat refined aspects of what MDA based on UML/MOF methodologies is about, with a focus on MOF-based transformation.

The book is written in a casual style, clearly intended to be an easy introduction to the subject, so the author's intention is not so different from other "how-to" books in computing technologies. And yet this book has something different: the author is a person with a clear view and understanding of layered reality of computing and technologies. He is modest but he sees clearly and widely. Such writers are rare.

As a book, what is best about it may be, in addition to the author's vivid understanding of computing reality and its environments, the fact that the author keeps the potential --- the invisible but clearly directed potential, or vision, of MDA --- in his mind throughout his writing, especially in its initial part (Part One and Two). And I think that, all the more because of that, the author may be somewhat frustrated with the present state-of-the-art understanding of MDA architecture, apart from (naturally) being unsatisfied with the concrete realisation of what we have understood well so far. This is particularly visible when he discusses the stages where we wish to add more and more constraints to models in order to make them come closer to executable specifications, or programs, beyond static structure. And this tone of honesty is also a good point of this book.

From the viewpoint of our concern, there is a major, but inevitable, omission among the topics treated in this book --- a discussion of concurrency and interaction. Why inevitable? Because these are the very elements for which software methodologies are not yet well-understood, lacking a conceptual basis.

But concurrency and interaction are a sheer necessity when we design a large information system, building which is quite like building a skyscraper. Frankel said (Chapter 1, Page 27):
....Yet virtually all attempts to build high-rises succeed. Now imagine that we constructed such buildings without detailed architectural blueprints and engineering specifications. What would the success rate be?
And then continues with a hilarious paragraph:
Michi Hennig ... pointed out that the software book racks [in the bookshop] are filled with titles such as Teach Yourself C++ in 14 Easy Lessons, Java for Morons, CORBA For Dummies, Complete Idiot's Guide to Win32, and so on and so forth. He challenges us to imagine how odd it would seem to come upon books with titles such as Brain Surgery in 14 Easy Lessons, Air Traffic Control for Morons, Bridge Design for Dummies, or Complete Idiot's Guide to Contract Law.
And he observes that, in spite of the need to produce high-quality software, it is still hard for the "culture of rigour" to take hold in computing: as one of the reasons, he blames constant changes ("volatility") in technology. Frankel then introduces UML and associated ideas and how they can be used, with many vivid examples, in model development, whose underlying tone may be summarised in his thesis that, in MDA, we focus on models as "development artifact" instead of models as "design artifact". I also appreciate very much his emphasis on the need for using assertions, or Design-By-Contract, as extensively as possible throughout the development stages (it looks as if the current DBC technology is somewhat limited: we shall discuss this point in our later posts).

We said that it is essential for us to consider concurrency and interaction when we architect a huge software system. If so, how can we develop software if we are without models which specify essential structures of concurrency and interaction? But what does this term "structures" mean in this context? What aspects of concurrency and interaction should we model? And using which methods?

To answer these questions, I wish to put forward several theses, some of which are:

(1) In sequential programs, a model wishes to hide interaction since it is often just implementation. In concurrent programs, interaction is the target of modelling: it is not part of implementation.

(2) A state machine cannot by itself cleanly model practical interaction for concurrent processes.

(3) Sequential or concurrent, abstract behaviour is the target of modelling: abstract behaviour is not part of implementation if our viewpoint is in that level. And this abstract behaviour consists of at least two parts, "structural" part and "dynamic" part, the former articulating invariant part of the whole behaviour (such as signature) while the latter representing variable part of the whole behaviour.

There are other basic themes which we shall treat, such as those concerning infrastructural elements in concurrency: Frankel does not try to be pure in his book, referring to and admitting the importance of many ephemeral technological elements while not losing his central vision. That freedom gives this book a real charm. So we should discuss the environment whenever necessary.

Wednesday, 27 June 2007

Reading Frankel's Book

I am now reading Frankel's book, whose title, author and link are:

Model-Driven Architecture (David S. Frankel) (link)

This looks like --- and is intended to be --- just one of those "how-to" books: but it is in fact surprisingly readable and (albeit rough) contains substantial discussions.

I will write on this book tomorrow.

Tuesday, 26 June 2007

not much time today...

(1) Frankel has not arrived yet

(2) UML semantics is organised following ML: syntax, static semantics and dynamic semantics. However the last "dynamic" part is mostly absent.

(3) UML contains action semantics which allows us to model dynamics. It's worth noting that the primitives for defining actions include instance creation, such as:
This system will generate one "red" instance of this class, then another instance but with colour "green", then do something (which we leave vague) and later create a "blue" instance and combine all together and put them into a big bag.
It is worth noting that a class in an OOPL can have code in the shape of so-called "static methods" (in Java jargon): so a class does have code in it; it is not purely declarative even if we forget the code for its instances. What does this tell us? It tells us that, all the more because of this, we should be careful in distinguishing the structural part (cleanly declarable) and the dynamic part (partly declarable by e.g. assertions in DBC but mostly hard to treat by declarative methods) in our modelling.
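The point about code living in the class itself can be seen in a small sketch. It is written in Python rather than Java, using @staticmethod as the nearest analogue of a Java static method; the class Ball and the method make_bag are hypothetical names, and the "do something" step is left as vague here as it is in the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ball:
    # Structural part: cleanly declarable, much like an attribute in a class diagram.
    colour: str

    @staticmethod
    def make_bag() -> List["Ball"]:
        # Dynamic part: code owned by the class itself, not by any instance.
        red = Ball("red")
        green = Ball("green")
        # ... the unspecified "do something" step would go here ...
        blue = Ball("blue")
        # Combine all instances and put them into one bag (a list, in this sketch).
        return [red, green, blue]


bag = Ball.make_bag()
```

The declaration of colour could be read off the class without running anything, whereas what make_bag does can only be described by its code or, partially, by assertions about its result.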

Monday, 25 June 2007

Waiting for Frankel's book

The book has not arrived yet: I have been waiting for it for some time.

For the time being let's see what the following document says about semantics of UML:

http://etna.int-evry.fr/COURS/UML/semantics/index.html
http://www.ii.uib.no/~rolfwr/thesisdoc/main.html

UML aims to help object-oriented articulation of software behaviour: as such it intends to be more declarative than operational (I wonder what has become of its "action semantics" component) and focusses on a lot of static aspects of OOPL-oriented modelling. This stems from two distinct natures in OOPL: its "message passing" (MP) side and its "knowledge representation" (KR) side. The MP side has a long tradition since Lisp (functions) and Algol/Pascal/C (procedures), while the latter was originally an AI concept. We remember that the former aspect has been purified by researchers in the form of CSP, actors and CCS, and later in the form of the pi-calculus.

The latter, the KR side, is in fact dynamic in nature in Simula and more recent executable OOPLs: but the declarative nature --- and usability as such --- of such notions as class hierarchy, attributes, instances, etc. was hard to hide from people's eyes, and some people thought of making the best of these ideas --- which do have an operational side too, but it can be hidden if you like --- as a way to organise the modelling process and the resulting artifacts, somewhat keeping the dynamic side of OOPL hidden.

I will continue on this topic in the next post tomorrow.

Sunday, 24 June 2007

holiday

I decided to make (generally) Sundays holidays for this blog. It's good to have holidays.

One short note however: in my posts so far and from now on, I am using the terms "models" and "specifications" interchangeably --- both are used informally to denote descriptions of behaviour which pick up some specific interesting aspects of software which you are making (interesting for you!).

So models do not have to be formal, they can even be a rough pictorial sketch of structures of systems you are going to make, even just a stroke of a red line which is something meaningful for you. Of course one problem of such an informal sketch is that when you look at it again later you may have forgotten what it meant.

So if such description --- may it be called a specification or a model --- is written in a clearly understandable way, and even in an unambiguous way, then it is useful, you can read it again to remember what you were thinking, other designers can read it and understand your intention, get hints how it can get updated and evolve.

That's why it is good to have a language, graphical or textual or otherwise, for neatly and clearly describing your non-executable ideas.

(The term "specification" has a connotation of formalism, verification, etc. .. which are surely some of the possible things we can do for non-executable descriptions: but they are far from all we wish to do with models and specifications, for example clarity is a virtue in both formal and informal descriptions.)

Saturday, 23 June 2007

a short reflection on models and programs.

Today I may not be able to continue the discussions on the previous post (link). Instead I wish to present a short reflection on the meaning of "description of behaviour".

Modelling or programming, we describe how software behaves. While what matters is how it works, that is, what effects it has on people, on machines, on other programs and processes, one basic significance of models/software development lies in that, apart from their ultimate execution, apart from the use of a model for its translation into a program, it has an independent status as a description. This is so in many levels.

For example Matthew Rawlings told me how this is so in real corporate architecting: a high-level documentation, or a model, is used not only as a top-level specification from which executable programs are constructed through refinement: high-level description has its own function, is used when architects think about the general principles of the design of corporate information systems, when corporate business strategy is discussed among management.
Such descriptions will be updated and augmented independently from its lower or higher layers (even though basic consistency needs be naturally maintained), since they offer the representation of general ideas of computing behaviour in question.

The situation is the same in executable programs. Executable programs do not exist solely for execution: the central point in programs is that, before execution, it has the shape of program texts, to be read and perused and edited by others: their details will be re-examined, designs will be appreciated and evaluated, the texts will be re-formatted for readability, refined in the use of algorithms, models extracted and program analysis applied, and of course they will be used for debugging and as a result will be repeatedly re-rewritten, as if it were a piece of a poem, a piece of a short story, or a section in a long-winding novel.

Software models/specifications and programs present a quintessential form of hypertext. Their nature is more functional than literary texts, to be sure, but still they are nothing but texts, and this is how they allow engineers and analysts and managers to communicate and collaborate, in order to design, build and think about software systems which do function. Without this material basis, it is almost impossible for people to discuss design, effects, side-effects and future prospects of computing systems.

Of course not all information is in the texts: without implicit knowledge, communication never works (as in any communication activity). But the texts have a central place in all our work on systems design and development.

This special nature of models and programs is one of the reasons why it is worth treating the design of description languages for them seriously, why I believe that it may deserve a scientific basis. On this last point --- what I mean by "scientific basis" --- I will be able to discuss again in later posts: but it is worth stressing that, at each level of description hierarchy (which may be looked at either from the viewpoint of chief software architects or from the viewpoint of proud programmers), the result of description, be it a C-program or a UML model, plays a fundamental role in practice, over and over again and through a long period of time (in fact the identity of a program/model resembles the identity of a poem which is revised again and again). There are many things we can say from this observation, such as authorship of programs and models, but we leave these discussions to later occasions.

Friday, 22 June 2007

Programming or Modelling? Part 1

Before even starting to discuss the central elements of our scribble design, one key point needs be answered. When we scribble interactions, is it for programming interactions or for specifying (or modelling) interactions? In fact what is the difference between "programming" and "specification/modelling"?

When you write a sequence diagram in UML, certainly you are not programming: it may as well be part of your programming activity, but what you create --- the sequence diagram --- is different from programs.

Programs describe (or prescribe) behaviour, the behaviour which you wish to realise when programs are executed. So if it is a sorting program, then you specify the behaviour by which it receives a list and returns the same list except its elements are ordered. Or if it is a word processor then it realises the behaviour by which characters are displayed and edited and printed following your typing on the keyboard. The same with for example video games. So programs realise behaviour.

Well, we can be more precise by taking into consideration how a big piece of software is made up in reality. A behaviour is often made up of sub-behaviours which are not so interesting from the viewpoint of human usage but are essential to realise the overall behaviour. For example the only interest in the sorting program is that when we input a list of (say) integers it outputs, after some calculation, the list of integers which is the result of sorting the original integers. But for realising this behaviour it should contain sub-behaviours to extract an integer from a given list, compare two integers, reorder and create a new list, for instance. So these algorithmic (and other) sub-behaviours are also part of the essential "behaviour" we wish to realise.

The same thing can be said about communicating programs. Consider an application whose key elements are communications among endpoints (such as business protocols). At each endpoint of that application, one or more processes are running, and each such process may consist of many threads. While what we are most keenly interested in would be the top-level interactions among these endpoints, these threads may be communicating with each other to calculate the value to be communicated: for example each of these threads may be running on a different core and they may together be calculating an FFT (fast Fourier transform). In such a calculation, a thread may be engaged in even 1 million communications per second: and the accumulation of these communication sub-behaviours realises "the" overall communication among the endpoints, that is, the business protocol we are interested in.

Well this has been a detour: the main point is that programming is for the description of behaviour to be realised when that program is run. In your description computation is encapsulated, which is jumping and kicking if it is unfolded. What fun.

Specification, on the other hand, is a very different activity. First we should admit that specification is also about behaviours: but this time we describe behaviour by writing down their properties. We write properties --- such as: if this program is run under such and such conditions it will terminate and give a good result. For example

{x=n} P {x=n!}

If P is a program, this specification says that if we start with n as the content of x, then after P runs we get the factorial of n as the content of x. We can extract the property part of this specification:

{x=n}, {x=n!}

so this pair gives us a specification of a program. It does not specify in detail how P (a program) is realised, but it does specify a slice of its properties which may be interesting to users. We may also call specifications models, since models are abstractions of behaviour, and properties are nothing but abstractions of the program behaviour to be realised.
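The pair {x=n}, {x=n!} can be tried out as a pair of checks around a program. The following Python sketch is only an illustration: the names factorial_loop, factorial_recursive and satisfies_spec are mine, n is assumed to be a non-negative integer, and checking a few values of n is testing, not a proof that the triple holds.

```python
import math
from typing import Callable


def factorial_loop(x: int) -> int:
    """One possible P: an iterative program."""
    result = 1
    while x > 1:
        result *= x
        x -= 1
    return result


def factorial_recursive(x: int) -> int:
    """Another possible P: the specification does not choose between them."""
    return 1 if x <= 1 else x * factorial_recursive(x - 1)


def satisfies_spec(program: Callable[[int], int], n: int) -> bool:
    """Check {x=n} P {x=n!} for one concrete, non-negative n."""
    x = n                 # precondition {x = n}: x starts with content n
    x = program(x)        # run P, which updates the content of x
    return x == math.factorial(n)   # postcondition {x = n!}


# Both programs meet the same specification on the values tried.
both_ok = all(
    satisfies_spec(p, n)
    for p in (factorial_loop, factorial_recursive)
    for n in range(10)
)
```

Two quite different programs satisfy the same pair, which is what it means for the specification to describe a slice of properties rather than how P is realised.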

And just as we need a good language to do sound, consistent and scalable programming, we need a good specification/modelling language to do sound, consistent and scalable modelling/specifications. The question is: how does a language for programming relate to a language for specification? And which kind of languages do we need for our present purpose? The first one is a general question, but is a basis to answer the second question which reflects our own present needs.

On these questions, let's think on in the next post (since it is getting long and I am getting hungry).

Thursday, 21 June 2007

Starting...

This is a diary for the design of scribble, a language for describing application-level interactions among distributed agents. A major part of this language will be designed over this summer --- the summer of 2007.

The immediate purpose of this language is to describe business protocols, in particular financial protocols, especially those in the public domain. Once described we wish to implement that protocol and execute it. Financial protocols can get complex and can incorporate any kinds of high-level communication actions among participants. So this language is also for all these application-level protocols.

You may write a simple script just as you write an Ajax-based JavaScript program which uses other sites and services, combines them and does something nice with them. Those other sites and services will also be doing the same for yet other sites and services.

Or you are an EU official and should publish the description of a protocol, together with a message format, so that huge/large/small businesses can use them to maintain interoperability of their applications and machines in the Euro area. Your protocol description should be usable for checking whether an endpoint application really conforms to that protocol, given an appropriate tool. Your description should be understandable, and should be presentable in multiple ways. And in future, as the needs and technologies change, we should be able to adapt and extend the description.

All in all it needs be a general-purpose language for describing application-level protocols. We wish to easily scribble interactions --- and execute them.

The language is co-designed with Gary Brown with inputs from our other dear colleagues whose names you will encounter as we proceed (surely you can help, Marco and Ray!). It will be based on a preceding language with the same name. But we shall re-examine the key design elements from their foundations.

This blog records the design evolution of this language. My objectives are:

(1) It will reflect design ideas real-time, with as immediate responses to any new ideas, contemplations, inputs from various sources, .. as possible.

(2) The corollary of (1) is we shall not mind updating the design decisions even if they have been recorded. A summary will be posted in some intervals.

(3) It will record the design philosophy, different design alternatives and their trade-offs: writing these down is a central objective of having this blog.

Discussions on concrete design elements begin in the next post.

Let's have fun together...