APM: Some progress on efficient filehandle reading, but threads too big

Brian Michalk michalk at awpi.com
Wed Feb 25 09:53:57 CST 2004


I managed to convert my code to threaded reads with very few code changes,
and was able to delete entire subroutines dedicated to ensuring data
integrity with the older select method.
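Here is a minimal sketch of what threaded reading can look like. It assumes one reader thread per input and a shared Thread::Queue that the main thread drains. The helper name `reader` and the sample inputs are hypothetical: the in-memory strings stand in for real devices or sockets, and a perl built with ithreads is assumed. Each thread does plain blocking reads, so there is no select() loop to maintain.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Shared queue: every reader thread pushes lines here, the main thread pulls.
my $queue = Thread::Queue->new;

# Hypothetical reader: opens its own input (an in-memory string standing in
# for a real device or socket), does plain blocking reads, and enqueues
# "tag:line". An undef tells the main thread this reader hit end of input.
sub reader {
    my ($tag, $data) = @_;
    open my $fh, '<', \$data or die "open: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $queue->enqueue("$tag:$line");
    }
    close $fh;
    $queue->enqueue(undef);
    return;
}

# Illustrative inputs only.
my %inputs = (a => "one\ntwo\n", b => "three\n");

# One thread per input; each blocks independently, so no select() loop.
my @readers;
for my $tag (sort keys %inputs) {
    push @readers, threads->create(\&reader, $tag, $inputs{$tag});
}

# Drain the queue until every reader has sent its end-of-input marker.
my @got;
my $still_open = scalar @readers;
while ($still_open) {
    my $item = $queue->dequeue;    # blocks until some reader has data
    if (defined $item) {
        push @got, $item;
    }
    else {
        $still_open--;             # one reader finished
    }
}
$_->join for @readers;
print "$_\n" for sort @got;
```

Each handle is opened inside its own thread rather than passed in already open. That choice sidesteps questions about how a particular handle type gets cloned into a new ithread.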

My problem now is that ithreads are HOGS.

For those of you who do not know, the built-in threads in Perl 5.8.x are
called ithreads, short for interpreter threads.  The threading mechanism
is not COW (copy-on-write), which means that for every thread spawned,
everything is copied to the new thread and execution continues from there.
I'm talking about the interpreter, (un)shared variables, subroutines,
everything.  This is extremely heavy on memory.  Forking a process, on the
other hand, uses the operating system's COW mechanism, and Linux does much
better with forked code.  The interpreter remains shared between all forked
processes, so you can load as many modules as you want without multiplying
the memory hit by the number of processes.
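For comparison, here is a minimal sketch of the fork-based approach. It assumes each input can be handled by its own child process, which sends lines back to the parent over a pipe. The inputs and variable names are illustrative. Each child starts out sharing the parent's memory copy-on-write instead of receiving a full copy.

```perl
use strict;
use warnings;

# Illustrative inputs; a real filter would open its device or socket in the child.
my %inputs = (a => "one\ntwo\n", b => "three\n");

my (%pipe_for, @pids);
for my $tag (sort keys %inputs) {
    pipe(my $from_child, my $to_parent) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: its pages are shared with the parent copy-on-write.
        close $from_child;
        open my $in, '<', \$inputs{$tag} or die "open: $!";
        while (my $line = <$in>) {
            print {$to_parent} "$tag:$line";
        }
        close $to_parent or die "close: $!";
        exit 0;
    }
    close $to_parent;                  # parent keeps only the read end
    $pipe_for{$tag} = $from_child;
    push @pids, $pid;
}

# Drain each child's pipe to EOF, then reap the children.
my @got;
for my $tag (sort keys %pipe_for) {
    my $fh = $pipe_for{$tag};
    while (my $line = <$fh>) {
        chomp $line;
        push @got, $line;
    }
    close $fh;
}
waitpid($_, 0) for @pids;
print "$_\n" for @got;
```

The parent reads the pipes one after another, which is fine for a batch demo. A long-running filter that needs lines as they arrive would have to multiplex the pipes again, for example with select.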

Now, there is supposedly a way to get lighter threads.  I've looked into
thread pools, but have not implemented them yet.  So far I've only used
'top' to see what all these threads are doing, and it seems like my threads
are way too heavy.  On my 64MB machines, I can reliably spawn about 8
threads before the program terminates.  There is no swap available, as there
is no read/write hard disk.
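Here is a minimal sketch of a fixed-size thread pool, assuming the work can be expressed as items on a Thread::Queue. The pool size, the squaring "work", and the commented-out module name are placeholders. The idea behind the design is to create the workers early, before heavy modules or large data are loaded, because each ithread copies whatever the parent holds at the moment it is created.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Illustrative pool size; tune for the memory available.
my $POOL_SIZE = 4;
my $jobs      = Thread::Queue->new;
my $results   = Thread::Queue->new;

# Spawn the workers first, while the interpreter is still small: an ithread
# copies whatever the parent holds at the moment it is created.
my @workers;
for (1 .. $POOL_SIZE) {
    push @workers, threads->create(sub {
        # Each worker loops until it dequeues an undef stop marker.
        while (defined(my $job = $jobs->dequeue)) {
            $results->enqueue($job * $job);    # stand-in for real work
        }
        return;
    });
}

# Only after the pool exists would heavy modules or large data be loaded,
# e.g. a hypothetical: require Some::Big::Module;

$jobs->enqueue($_) for 1 .. 10;        # hand out work
$jobs->enqueue(undef) for @workers;    # one stop marker per worker
$_->join for @workers;

# Collect results without blocking; all workers have already finished.
my @squares;
while (defined(my $r = $results->dequeue_nb)) {
    push @squares, $r;
}
my $total = 0;
$total += $_ for @squares;
print "processed ", scalar(@squares), " jobs, sum $total\n";
```

Running it should print `processed 10 jobs, sum 385`. The number of threads stays fixed at the pool size no matter how many jobs are queued.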

Looking at CPAN, I see that Elizabeth Mattijsen has done a lot with threads.
Her examples with Benchmark::Thread::Size give the following output:
   #   (ref)        bare        full        vars         our      unique
   0    2172          +0          +0          +0          +0          +0
   1    2624 ± 4      +4 ± 4      +4 ± 4     +27          +4 ± 4     +27
   2    3004 ± 4      +2 ± 6      +2 ± 6     +33 ± 4      +8         +36 ± 6
   5    4126 ± 6      -2 ± 6      -3 ± 8     +29 ± 4     +10 ± 2     +27 ± 4
  10    5984 ± 8      -1 ± 8      +0 ± 4      +0 ± 6     +17 ± 4     +43 ± 6
  20    9694 ± 4     +15 ± 4     +15 ± 2     +13 ± 6     +32 ± 6     +58 ± 6
  50   20832 ± 4     +51 ±10     +50 ± 8     +50 ± 8     +68 ±12     +96 ± 6
 100   39392 ± 8    +106 ±10    +156 ±12    +108 ±10    +131 ±10    +155 ±12

 ==== bare ========================================================
 $VERSION = '0.01';

 ==== full ========================================================
 $main::VERSION = '0.01';

 ==== vars ========================================================
 use vars qw($VERSION);
 $VERSION = '0.01';

 ==== our =========================================================
 our $VERSION = '0.01';

 ==== unique ======================================================
 our $VERSION : unique = '0.01';

 ==================================================================

I'm not sure how $VERSION plays into memory management, so that part
confuses me, but the table shows 20 threads consuming less than 10MB in total
(9694 in the reference column).  That would be great performance for me.
From 'top', I'm getting the following:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3815 root      16   0 28172  27m 1844 S  5.8 47.3   0:17.07 filter_dmi.pl
 3817 root       8   0 28172  27m 1844 S  0.0 47.3   0:00.00 filter_dmi.pl
 3818 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.01 filter_dmi.pl
 3819 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.07 filter_dmi.pl
 3821 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.00 filter_dmi.pl
 3822 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.07 filter_dmi.pl
 3823 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.00 filter_dmi.pl
 3824 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.00 filter_dmi.pl
 3825 root       9   0 28172  27m 1844 S  0.0 47.3   0:00.08 filter_dmi.pl

In case you are wondering why each thread shows up with its own PID,
perl -V has this little snippet:
ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN

Why do my ten threads eat 27 megs, when Elizabeth doesn't get there until
after 50 threads?
