[Omaha.pm] lines2perl: partition.pl

Jay Hannah jay at jays.net
Sat Apr 29 14:09:08 PDT 2006


Paul Johnson wrote:
> On Fri, Apr 28, 2006 at 06:47:27PM -0500, Jay Hannah wrote:
>> Any chance you've got some magic up your sleeve which can prune my GEDCOM 
>> to only those people related to me? Some magical concoction of toolsets?  
>> Thanks!
> 
> I don't think there's anything built in (he says without looking), but
> there are ancestors and descendents methods which could be used to
> create a closure in a fairly inefficient way.  Or parents, children and
> siblings methods which could probably do the job more efficiently.  Or
> I'll bet there's another useful LifeLines script out there somewhere you
> could translate :)

Oooo...

ftp://ftp.cac.psu.edu/pub/genealogy/lines/reports/INDEX.html

partition
   Jim Eggert
   Version 8, March 31, 1995
   Requires: LifeLines 2.3.3 or higher
   This program partitions individuals in a database into disjoint partitions. Each partition is composed of people related by one or more multiples of the following relations: parent, sibling, child, spouse. There is no known relationship between people in different partitions. The partitions are written to the report in overview form, full form, or in GEDCOM form, with the partitions delimited by a long line. You will have to edit the GEDCOM output to divide it up into its constituent files to be able to import the GEDCOM back into any application.
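
Put another way, a partition as described there is a connected component of the graph whose edges are the parent, sibling, child and spouse links. Here is a minimal Perl sketch of that idea. It is not necessarily how Jim Eggert's script or its lines2perl translation works: the %links hash and the partition_of / all_partitions names are made up for illustration, and a real run would have to build %links from the GEDCOM first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data (not from a real GEDCOM): each person's xref maps
# to the xrefs of their parents, siblings, children and spouses.
my %links = (
    I1 => [qw(I2 I3)],
    I2 => [qw(I1 I3)],
    I3 => [qw(I1 I2 I4)],
    I4 => [qw(I3)],
    I5 => [qw(I6)],
    I6 => [qw(I5)],
    I7 => [],
);

# Breadth-first walk from one person; returns everyone reachable from them,
# sorted by xref.
sub partition_of {
    my ($start, $links) = @_;
    my %seen  = ($start => 1);
    my @queue = ($start);
    while (@queue) {
        my $person = shift @queue;
        for my $relative (@{ $links->{$person} || [] }) {
            next if $seen{$relative}++;    # already queued or visited
            push @queue, $relative;
        }
    }
    my @members = sort keys %seen;
    return @members;
}

# All partitions: start a new walk from each person not yet placed.
sub all_partitions {
    my ($links) = @_;
    my (%placed, @partitions);
    for my $person (sort keys %$links) {
        next if $placed{$person};
        my @members = partition_of($person, $links);
        @placed{@members} = (1) x @members;
        push @partitions, \@members;
    }
    return @partitions;
}

my @mine = partition_of('I1', \%links);
print "@mine\n";

my @all = all_partitions(\%links);
print scalar(@all), " partitions\n";
```

Each person is queued at most once and each link is looked at once, so a walk like this is linear in the number of people plus links.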

Sounds perfect. So I used lines2perl to create a Perl script and went for it -- all partitions for my entire GEDCOM. 8 HOURS later it still wasn't done. Amazing. My GEDCOM is only 1.5 MB (4000ish people). How can it possibly go for 8 hours?

So I restarted, trying to get just MY relatives:

$ time ./partition.pl -gedcom_file jay.ged
reading.................................................................
Enter a person for just one partition, nothing for all partitions: I0313
Enter 0 for overview, 1 for full, 2 for GEDCOM report: 2
Enter filename for GEDCOM partition: new.ged

1: 1 5 17 3    167225824
          I225   Shirl                         24 Jul 2006
55 63 69 125 125 174 178 184 5    167225824
          I2     Helen FORD                     1 Jan 1912    1 Jan 1998
464 464 478 793 793 843 845 1062 7    153290844
          I513   William A. SETON               9 Jan 1861    9 Jul 1866
1519 1519 1595 1598 1804 1809 1809 

That's burned 6.5 CPU HOURS so far. Wow. Still using < 50MB of RAM.

$ ps -ef | grep perl
jhannah  10280 10014 99 09:22 pts/1    06:31:15 /usr/bin/perl -w ./partition.pl -gedcom_file jay.ged
$ ps l
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000 10280 10014  25   0  49576 47812 -      R+   pts/1    391:17 /usr/bin/perl -w ./partition.pl -gedcom_file jay.ged

Even top is amazed by this process:

top - 16:06:19 up 5 days, 18:19,  1 user,  load average: 1.00, 1.00, 1.00
Tasks:  77 total,   3 running,  74 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.7% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.3% si
Mem:    256968k total,   244480k used,    12488k free,    41020k buffers
Swap:   262136k total,     2192k used,   259944k free,    84036k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                        
10280 jhannah   25   0 49576  46m 1812 R 99.8 18.6 402:37.17 partition.pl           

:)

This project has forced me to finally learn to use GNU screen. Way cool. I love discovering awesome tools that stopped being developed in 1994. :)

I wonder if the native LifeLines script would be fast?

j



