Microsoft DotNet

Tue Mar 20 00:50:23 CST 2001

>   The presentation is on Wednesday.  Any feedback you could give me
> before then is appreciated.  

Here are some typos I noticed:

Typo on the slide about CLR:
"...the familiar *drag-and-drog* design time environment of VB..."

"to re-written" -> "or re-written" (?) on slide about CLR - continued:
"Besides changing a check-box, no code has to be changed to re-written
to support the separate platforms."

Typo:  "allready" and "build-in" on the "Advantages/Disadvantages"
slide.

Here are some real thoughts:

This paragraph caught my eye:
<QUOTE>
XML is just a universal, ASCII database definition lanaguge. It gives
meaning (indexes, records, fields, etc) to text through the use of
tags. XML is very good at "object orienting" data; it can be used to
give a complete description of a service, book, vehicle,etc.
</QUOTE>

(**Warning:  You're about to be subjected to a stump speech.** : )  )

I've seen a lot of similar statements in much of what's been written
about XML.  Many authors suggest that the fruits of XML are better than
their predecessors (e.g.- EDI standards like X12 and EDIFACT or even a
"flat" file with corresponding "file layout" metadata that supplies
field labels/descriptions).  Some even claim that XML-based stuff is
better because XML documents provide meaning/semantics.  In truth, XML
only supplies labels/syntax/grammar - just like its predecessors.  

(But XML-based standards are an improvement over previous standards in
that XML is the agreed-upon metalanguage for defining the standard
transaction sets, documents, messages, etc.  The preceding standards
never had an agreed-upon metalanguage (as far as I know), but they
certainly could have.)

But I'm not just saying all this to be picky or critical; I have a
suggestion...  : )

To really get beyond the labels/syntax of XML documents/messages and
into "meaning", I think we need supporting conceptual data models.  A
conceptual model can be represented as an entity relationship diagram,
a UML class diagram, etc.  The model explains the domain of discourse -
i.e.- the thingies we're talkin' about, how the thingies are related, 
their functional dependencies, etc.  It also provides a framework within
which to begin to understand the meaning of all those labels and
collections of fields found in XML documents.

Now it may sound like I'm just getting all abstract, theoretical, and
stuff, so I'll give an example to bring it down to the concrete:

Say you work for a health plan like Priority Health and you've got an
XML DTD that defines a standard structure for a message for
transmitting information about members (i.e.- the customers who are
covered by insurance).  Say you want to use it to communicate member
info from system X (at company A) to system Y (at company B).  Now
let's say that the inventor of that DTD for communicating member info
defined a "Member ID" field (sort of the identifier) together with a
bunch of name, address, phone number, etc. fields.

Now the programmer writing the code in system Y to do something with
these XML-based member info messages looks at the corresponding
structures in system Y and finds, say, a MEMBER table.  Or, wait, no,
better yet, he finds some kind of API for adding members - it might
even be based on Microsoft's dotNet.  This API expects to receive a
bunch of member info including a member ID, name, address, phone
number, etc.  From there the system takes care of internalizing the
member info.   
So the programmer hooks up systems X and Y by feeding the XML-based
member info messages from system X into system Y using this API.

Or, heck, better yet, let's say that the creators of system X and system
Y "agreed on" using this XML standard for member info messages.  So we
can just assume that system Y knows how to consume these messages being
produced by system X.

Somebody gets these two systems talking to each other.  System Y
consumes member info messages from System X, just as advertised. 
Things are working.  Everything seems cool.

Things run perfectly for a few days or weeks.  Then one day, someone
runs a report out of system Y and notices all kinds of duplicate
records for members.  The person sees, for example, records like this:

373997744-10  Meulenberg   Alana   1475 Capricious Dr.   Genocide    MI
366081234-10  Meulenberg   Alana   1475 Capricious Dr.   Genocide    MI

What happened?

Some programmers have to "look under the hoods" of systems X and Y to
see what might be the matter.  They find that System X has a "member"
table.  And System Y has a "member" table too.  The tables even have
similar columns.  So what's the problem?

Well, after a deeper investigation, it turns out there is a semantic
mismatch between system X and system Y (and maybe even between company
A and company B !).

System X considers a "member" to be a person covered under a particular
contract or policy.  Every time a person gets covered under a different
contract, a new "member" record is stored in the MEMBER table.  So the
member ID uniquely identifies the combination of a human with some
coverage contract.

On the other hand, System Y considers a "member" to be an individual
human being.  In system Y, when a person gets covered under a new
contract/policy, their member ID is simply associated with another
contract ID.  In other words, system Y has a member entity, a contract
entity, and an associative entity for relating members to contracts
(and vice versa).  A member can be covered under many contracts and a
contract can cover many members.
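To make the mismatch a little more concrete, here's a rough Python sketch of the two models.  Everything in it is hypothetical: the class names, the contract IDs (C-1, C-2), and especially the assumption that System Y's add-member API simply trusts the incoming member ID as a person identifier.  It just shows how two perfectly valid System X messages can turn into two "people" in System Y:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class XMemberMessage:
    """One member-info message from System X: a person under ONE contract."""
    member_id: str    # in X, identifies a (person, contract) pair, not a person
    contract_id: str  # hypothetical field, for illustration only
    last_name: str
    first_name: str
    address: str


@dataclass
class SystemY:
    """System Y: a member is a human; coverage links members to contracts."""
    members: dict = field(default_factory=dict)  # member_id -> (last, first, address)
    coverage: set = field(default_factory=set)   # {(member_id, contract_id)}

    def add_member(self, msg: XMemberMessage) -> None:
        # Assumption: Y treats member_id as a person ID, so a new ID = a new person.
        self.members.setdefault(msg.member_id, (msg.last_name, msg.first_name, msg.address))
        self.coverage.add((msg.member_id, msg.contract_id))

    def suspected_duplicates(self) -> list:
        """Group member IDs that share the same name and address."""
        by_person = defaultdict(list)
        for member_id, person in self.members.items():
            by_person[person].append(member_id)
        return [sorted(ids) for ids in by_person.values() if len(ids) > 1]


# Alana is covered under two contracts, so System X issues two member IDs.
messages = [
    XMemberMessage("373997744-10", "C-1", "Meulenberg", "Alana", "1475 Capricious Dr."),
    XMemberMessage("366081234-10", "C-2", "Meulenberg", "Alana", "1475 Capricious Dr."),
]

y = SystemY()
for msg in messages:
    y.add_member(msg)

print(len(y.members))            # 2 "people" in Y, though only one human
print(y.suspected_duplicates())  # [['366081234-10', '373997744-10']]
```

Nothing in the messages themselves is malformed; the duplicates come entirely from what each system takes a "member ID" to mean, which is exactly the sort of thing a shared conceptual model would pin down.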

This stuff is kinda painful to describe with words.  It's much better
explained visually with ER diagrams or UML class diagrams.

Hopefully this illustrates the point that XML doesn't really address
the problem of semantics.
(And this was just a simple example of semantic mismatch.)

(Matt, you also described XML as a "database definition language".
Interestingly, the way one would typically define a DTD doesn't describe
what's going on in system Y because system Y has a many-to-many
relationship between members and contracts.  Typically, a DTD could
call for members to be nested within contracts (sort of a 1-to-many
relationship from contracts to members).  Or it could call for
contracts to be nested within members (a 1-to-many relationship from
members to contracts).  The truth of the matter -- that there's a
symmetrical many-to-many relationship between members and contracts --
would be absent from a typical DTD.)
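Here's a rough sketch of that nesting problem, using Python's xml.etree.ElementTree rather than an actual DTD.  The element names, attribute names, and IDs (M1, M2, C-1, C-2) are made up for illustration.  Whichever hierarchy you pick, one side of the many-to-many relationship gets repeated, and the relationship itself is only implicit in the nesting:

```python
import xml.etree.ElementTree as ET

# Hypothetical coverage data: a symmetrical many-to-many relationship.
coverage = {("M1", "C-1"), ("M2", "C-1"), ("M1", "C-2")}


def nest_members_in_contracts(pairs):
    """Hierarchy choice A: <contract> elements containing <member> elements."""
    root = ET.Element("contracts")
    for contract_id in sorted({c for _, c in pairs}):
        contract = ET.SubElement(root, "contract", id=contract_id)
        for member_id in sorted(m for m, c in pairs if c == contract_id):
            ET.SubElement(contract, "member", id=member_id)
    return root


def nest_contracts_in_members(pairs):
    """Hierarchy choice B: <member> elements containing <contract> elements."""
    root = ET.Element("members")
    for member_id in sorted({m for m, _ in pairs}):
        member = ET.SubElement(root, "member", id=member_id)
        for contract_id in sorted(c for m, c in pairs if m == member_id):
            ET.SubElement(member, "contract", id=contract_id)
    return root


a = nest_members_in_contracts(coverage)
b = nest_contracts_in_members(coverage)

# In A, member M1 shows up once per contract; in B, contract C-1 once per member.
print([m.get("id") for m in a.iter("member")])    # ['M1', 'M2', 'M1']
print([c.get("id") for c in b.iter("contract")])  # ['C-1', 'C-2', 'C-1']

# Both hierarchies carry the same pairs, but only implicitly, via nesting.
pairs_a = {(m.get("id"), c.get("id")) for c in a.iter("contract") for m in c.iter("member")}
pairs_b = {(m.get("id"), c.get("id")) for m in b.iter("member") for c in m.iter("contract")}
print(pairs_a == pairs_b == coverage)  # True
```

Neither document says outright that members and contracts are related many-to-many; a reader has to infer it from the repeated IDs.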

Finally, just to support my claim that I'm not spouting off
dope-induced ideas, the HL7 standards body has a similar notion.  They
don't just define HL7 messages ("HL7:  Now with purple horseshoes and
fortified with 100% of the recommended daily allowance of XML!"); they
also define a Reference Information Model (i.e.- the "RIM").  Their RIM
provides a framework within which to begin to understand the meaning of
all those labels and collections of fields found in HL7 messages.  You
can read about it here:
http://www.hl7.org/Library/standards.cfm

+Joel
