[Chicago-talk] Building a hierarchy
briank at kappacs.com
Wed Mar 16 18:31:07 PDT 2011
I would try very hard to find an alternative source data format (perhaps
with level-dependent indentation?) if at all possible before spending a
lot of time trying to parse this one because it's ambiguous.
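To illustrate why an indented format is easier, here is a minimal Python sketch of a stack-based reader. It assumes a hypothetical layout (not the format in the original data): two spaces of indentation per level, and lines of the form "description - code", with heading lines carrying no code. The name parse_indented is made up for this example. The stack means the number of levels never has to be known in advance.

```python
def parse_indented(lines, indent="  "):
    """Return (code, desc, parent_code) rows from indentation-nested lines."""
    rows = []
    stack = []  # codes of the current chain of ancestors, outermost first
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        body = raw.lstrip(" ")
        depth = (len(raw) - len(body)) // len(indent)
        # Split "desc - code" on the last " - "; heading lines have no code.
        desc, sep, code = body.rstrip().rpartition(" - ")
        if not sep:
            desc, code = body.rstrip(), None
        del stack[depth:]  # moving up or sideways: drop deeper levels
        parent_code = stack[-1] if stack else None
        rows.append((code, desc, parent_code))
        stack.append(code)
    return rows


# Hypothetical re-indented excerpt of the data quoted below.
sample = [
    "Allopathic & Osteopathic Physicians",
    "  Allergy & Immunology - 207K00000X",
    "    Allergy - 207KA0200X",
    "    Clinical & Laboratory Immunology - 207KI0005X",
    "  Anesthesiology - 207L00000X",
    "    Addiction Medicine - 207LA0401X",
]
rows = parse_indented(sample)
```

In this sketch parent_code is the code of the immediate parent, so children of a heading line that has no code get None; a real loader would probably assign such headings a surrogate key.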
Consider an item labeled "i." that follows an item labeled "h.". It could
be the next lower-case alpha item or the first lower-case roman item.
The label on the following line may settle it ("ii." means roman) or may
not ("V." is consistent with either reading).
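The ambiguity can be made concrete with a small Python sketch that lists every numbering style a label could belong to. The function name label_kinds and the style names are invented for this illustration, and it assumes alpha labels are single letters and roman numerals are written in the usual subtractive form.

```python
import re

# Lower-case roman numerals in standard form; the lookahead rejects "".
ROMAN = re.compile(
    r"^(?=[ivxlcdm])m*(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$"
)


def label_kinds(label):
    """Return the set of numbering styles `label` (without the dot) fits."""
    kinds = set()
    if label.isdigit():
        kinds.add("arabic")
        return kinds
    case = "upper" if label.isupper() else "lower"
    if len(label) == 1 and label.isalpha():
        kinds.add(case + " alpha")
    if ROMAN.match(label.lower()):
        kinds.add(case + " roman")
    return kinds


for label in ["h", "i", "ii", "V", "1"]:
    print(label, sorted(label_kinds(label)))
```

Any label for which this returns more than one style (the single letters i, v, x, l, c, d and m, in either case) can only be resolved from context, and sometimes not even then.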
On 2011-03-16 11:31, Jay Strauss wrote:
> I need to build a hierarchy out of some data to load into an RDBMS.
> The data looks like below. I need to convert it to more like:
> code, desc, parent_code
> (where code is like "193200000X")
> I'm struggling.
> I think I could do this in a rigid manner by saying I have 4 indexes
> or levels:
> upper case roman
> lower case alpha
> lower case roman
> arabic numerals
> and keeping track of where I am and of the parent one level above.
> I'd like to do it flexibly, without having to know how many levels in
> advance (I get similarly structured data with # of levels and info
> from time to time).
> But I don't know:
> 1) what's the best structure for this (I'm thinking an array of arrays)
> 2) how to traverse the array without knowing my indexes, i.e. go one
> level up, go one level down
> Can anyone suggest ways to skin this cat?
> I.Individual or Groups (of Individuals)
> i.Multi-Specialty - 193200000X
> ii.Single Specialty - 193400000X
> b.Allopathic & Osteopathic Physicians
> i.Allergy & Immunology - 207K00000X
> 1.Allergy - 207KA0200X
> 2.Clinical & Laboratory Immunology - 207KI0005X
> ii.Anesthesiology - 207L00000X
> 1.Addiction Medicine - 207LA0401X
> 2.Critical Care Medicine - 207LC0200X
> 3.Hospice and Palliative Medicine - 207LH0002X
> 4.Pain Medicine - 207LP2900X
> 5.Pediatric Anesthesiology - 207LP3000X
> iii.Clinical Pharmacology - 208U00000X
> iv.Colon & Rectal Surgery - 208C00000X
> v.Dermatology - 207N00000X
> 1.Clinical & Laboratory Dermatological Immunology - 207NI0002X
> 2.Dermatopathology - 207ND0900X
> 3.MOHS-Micrographic Surgery - 207ND0101X
> 4.Pediatric Dermatology - 207NP0225X
> 5.Procedural Dermatology - 207NS0135X
> vi.Electrodiagnostic Medicine - 204R00000X
> vii.Emergency Medicine - 207P00000X
Brian Katzung, Kappa Computer Solutions, LLC
Leveraging UNIX, GNU/Linux, open source, and custom
software solutions for business and beyond
Phone: 877.367.8837 x1 http://www.kappacs.com