[sf-perl] [sfpug] Practical theory: graphs

Rich Morin rdm at cfcl.com
Thu Dec 15 08:56:08 PST 2005

[Somehow, I missed the original message, so I'm working from DF's response.
 Hope I didn't miss anything crucial.  -r]

QW> I'm looking for a practical, lay book on graph theory, especially
QW> one that describes how to represent graphs in RDBMSs.
QW> ...
QW> I specifically don't want formal proofs of mathematical properties
QW> of graphs.  What I want is something more emic: What can you
QW> represent with x kind of graph?  What are its properties?  What are
QW> the algorithms for traversing (or otherwise processing it)?  In
QW> short, what is it good for?

Before I get into the real question, was "emic" a typo or are you using
Pike's neologism in some arcane manner?

>As Pike defines it, the emic perspective focuses on the intrinsic cultural
>distinctions that are meaningful to the members of a given society (e.g.,
>whether the natural world is distinguished from the supernatural realm in
>the worldview of the culture) in the same way that phonemic analysis
>focuses on the intrinsic phonological distinctions that are meaningful to
>speakers of a given language (e.g., whether the phones /b/ and /v/ make a
>contrast in meaning in a minimal pair in the language).  The native
>members of a culture are the sole judges of the validity of an emic
>description, just as the native speakers of a language are the sole judges
>of the accuracy of a phonemic identification.


I've been looking into graph representations for a while, concentrating on
the use of graphs to represent concepts, metadata, etc.  As you may know,
there are all sorts of graph-based representations for this sort of thing,
including Conceptual Graphs, RDF, Topic Maps, ...  The variations are both
structural and semantic, but assorted efforts are being made at solving the
interchange problem, etc.

At the risk of being yelled at by DF, I'll note that an easy and flexible
way to store a graph in a DBMS is as an edge list, stored in a single table:

  source  |  target  |  edge_type

This technique is used by RDF, the Protege folks
etc.  Aside from flexibility (adding new structure is REAL easy), it has the
advantage that all edges are bi-directional.  Finally, it's fairly fast to
answers to questions such as "find all of the nodes that are connected by an
edge of type E".  In some other representations, this would require walking
entire graph!

However, it is not without problems.  As DF would point out, it discards
of the benefits of the RDBMS, making the application responsible for
consistency, etc.  Also, the apparent simplicity of each edge must be
against the number of edges that are needed to do anything useful.  Finally,
scaling (e.g., performance on large graphs) can be a real issue.

I've recently become quite interested in Object-Role Modeling (ORM), which
offers a way to map conceptual descriptions into RDBMS tables.  It looks
it combines nice semantics with industrial-strength DBMS design.

Basically, the ORM approach starts with "examples", such as:

  Quinn travels by bicycle.
  David travels by car.

It then formalizes these into "facts", adding entity type information, etc.

  The Person named "Quinn" Travels_by Vehicle "bicycle".
  The Person named "David" Travels_by Vehicle "car".

These facts are used to define "instance diagrams", which are generalized
"conceptual schema diagrams".  These diagrams have nodes for entities (e.g.,
Person, Vehicle) and N-ary relationships:

  "... travels_by ..."
  "... traveled_by ... to ... on ..."

Relationship nodes are subdivided into "roles", allowing the diagram to show
which entity plays which role(s) in each relationship.

After assorted constraints (etc) have been added, the diagram can be turned
into a set of table definitions.  Each relationship turns into a table; each
role turns into a column:

  Travels_by:   Person   Vehicle
                ------   -------
                Quinn    bicycle
                David    car

The definitive book on the subject is

  Information Modeling and Relational Databases:
  From Conceptual Analysis to Logical Design

  Terry Halpin
  Morgan Kaufmann, ISBN 1-55860-672-6

Terry Halpin also maintains a web site at http://www.orm.net.  Although no
Open Source tools currently exist for ORM, this is in the process of being
addressed.  Let me know if you want more details...


P.S.  Note that "ORM" is a dangerously overloaded acronym.  A Google search
      for ORM will bring up all sorts of information on Operational Risk
      Management, Object-Relational Mapping, etc!
email: rdm at cfcl.com; phone: +1 650-873-7841
http://www.cfcl.com        - Canta Forda Computer Laboratory
http://www.cfcl.com/Meta   - The FreeBSD Browser, Meta Project, etc.

More information about the SanFrancisco-pm mailing list