[Melbourne-pm] XML Namespace

Scott Penrose scottp at dd.com.au
Mon Oct 4 20:50:32 CDT 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dudes

I want to take a block of XML that I receive (e.g. XML downloaded from 
some URL with LWP) and embed it in my own block of XML.

There are two issues I have to try and solve.

1. The XML I get may be invalid (happens all the time to us!)
2. The XML I get may not be in a specific namespace.

Part 1 is fairly easy to deal with: put it through any XML parser and 
make sure it is well-formed XML. This means I can then pass it on to my 
XSLT and let that decide whether it is a valid type and deal with it.
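
As a rough illustration of part 1 (sketched in Python's standard library 
rather than Perl, purely as an assumption for the example; the function 
name parse_or_none is made up here), a well-formedness check can be as 
small as catching the parser's error:

```python
import xml.etree.ElementTree as ET


def parse_or_none(xml_text):
    """Return the root element if xml_text is well-formed XML, else None.

    Hypothetical helper for illustration: it only checks well-formedness,
    not validity against any schema or DTD.
    """
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects anything that is not well-formed.
        return None


if __name__ == "__main__":
    print(parse_or_none("<a><b/></a>") is not None)  # well-formed
    print(parse_or_none("<a><b></a>") is not None)   # mismatched tag
```

Anything that comes back as None can be dropped before it ever reaches 
the XSLT stage.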

My problem is part 2. Let's say I download some RSS or ATOM data. Now I 
want that to go through particular parts of my XSLT to convert it to 
HTML. My current XSLT processes all sorts of things, so to prevent 
overlap I want to make sure all the data is in the namespace atom:

Example - RAW Input from LWP (taken from 
http://www-106.ibm.com/developerworks/xml/library/x-think24.html)

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
   <title>Welcome to Stanza Web</title>
   <author>
     <name>Uche Ogbuji</name>
   </author>
   <link rel="alternate" type="text/html"
         href="http://stanzaweb.art/2004-06-01/welcome"/>
   <modified>2004-06-01T10:11:12Z</modified>
   <content type="application/xhtml+xml" xml:lang="en">
     <div class="article" xmlns="http://www.w3.org/1999/xhtml">
       <p>Welcome to
         <a href="http://stanzaweb.art/">Stanza Web</a>.
         Come back often to keep track of the best in modern poetry.
       </p>
       <p>This site is powered by
          <a href="http://atomenabled.org">Atom</a>
       </p>
     </div>
   </content>
</entry>


Now let's say that I do something special with "<a/>" in my code, 
turning them into other things, which would not necessarily be 
appropriate for ATOM. Therefore I want to identify this as all atom. 
Note that the <?xml ...?> declaration would also be removed, as this 
would become a fragment in a larger document.

<atom:entry xmlns:atom="http://purl.org/atom/ns#">
   <atom:title>Welcome to Stanza Web</atom:title>
   <atom:author>
     <atom:name>Uche Ogbuji</atom:name>
   </atom:author>
   <atom:link rel="alternate" type="text/html"
         href="http://stanzaweb.art/2004-06-01/welcome"/>
   <atom:modified>2004-06-01T10:11:12Z</atom:modified>
   <atom:content type="application/xhtml+xml" xml:lang="en">
     <atom:div class="article" xmlns="http://www.w3.org/1999/xhtml">
       <atom:p>Welcome to
         <atom:a href="http://stanzaweb.art/">Stanza Web</atom:a>.
         Come back often to keep track of the best in modern poetry.
       </atom:p>
       <atom:p>This site is powered by
          <atom:a href="http://atomenabled.org">Atom</atom:a>
       </atom:p>
     </atom:div>
   </atom:content>
</atom:entry>

Now in my XSLT (and other stages of the pipeline) I can guarantee that 
no one will modify the content inside until it gets to the atom: 
specific code.

So - one suggestion has been to use regular expressions. This 
technically works (strip all namespaces and add your own), but it is 
not ideal and does not cover the validation part.

Any ideas on how this can be done "really" simply? (By simple I mean 
short code and reasonably fast, although the latter is not as 
important.)
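
One possible shape for this, again as a hedged sketch in Python's 
standard library rather than Perl (the function name force_namespace and 
its parameters are invented for the example): parse the input, which 
covers the well-formedness check, then rewrite every element name into 
the Atom namespace and serialise without the XML declaration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"


def force_namespace(xml_text, ns=ATOM_NS, prefix="atom"):
    """Parse xml_text and move every element into namespace `ns`.

    Hypothetical helper: raises ET.ParseError if the input is not
    well-formed. Attributes are left as they are; only element names
    are rewritten.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # defensive: skip anything that is not an element
        # ElementTree stores qualified names as "{uri}local".
        local = el.tag.rsplit("}", 1)[-1]
        el.tag = "{%s}%s" % (ns, local)
    # Ask the serialiser to use our prefix instead of ns0, ns1, ...
    ET.register_namespace(prefix, ns)
    # encoding="unicode" returns a str with no <?xml ...?> declaration,
    # so the result can be embedded as a fragment.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<entry xmlns="http://purl.org/atom/ns#">'
        '<div xmlns="http://www.w3.org/1999/xhtml">'
        '<a href="http://atomenabled.org">Atom</a></div></entry>'
    )
    print(force_namespace(sample))
```

Note that this puts the embedded XHTML elements into the Atom namespace 
too, which matches the output above but throws away the information that 
they were XHTML. The same renaming could presumably be done as a small 
XSLT step instead.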

Scott
- -- 
* - *  http://www.osdc.com.au - Open Source Developers Conference * - *
Scott Penrose
VP in charge of Pancakes
http://linux.dd.com.au/
scottp at dd.com.au

Disclaimer: If you receive this email in error - please eat it 
immediately to prevent it from falling into the wrong hands.

Please do not send me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFBYf3rDCFCcmAm26YRAt8qAJ488IebqcYD0X4ImjVHBSIpTbmg5QCgru+j
7R17GC50+xVwoAa4ReBhoLw=
=5E/Q
-----END PGP SIGNATURE-----