[Chicago-talk] Speaking of threading

Steven Lembark lembark at wrkhors.com
Tue Dec 9 13:29:48 CST 2003



-- Greg Fast <gdf at speakeasy.net>

> On Mon, 08 Dec 2003 08:35:24 -0600, Steven Lembark <lembark at wrkhors.com>
> wrote:
>> No one ships with much of anything threaded by default, aside
>> from perhaps POSIX & linux threads in libc.
>
> You mean no one ships with threaded Perls by default?

Not on *NIX that I've seen.

> Frankly, having worked a lot with standard Unix fork()-based
> multiprocessing and with more "modern" threaded environments (Java,
> Python, Ruby, etc), the advantages of threading over fork() are pretty
> self-evident.

Yeah: overwritten variables, semaphore and mutex lock waits :-)

Threads can be useful, but forks are actually better for work where
the child process may have to die independently of the parent, and
threaded exec's are a real pain.
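
A rough sketch of what I mean by children dying independently,
using only fork and wait from core Perl. The run_jobs() name
and the three sample jobs are made up for illustration; the
point is that the parent survives and sees how each child ended.

```perl
use strict;
use warnings;
use POSIX ();

# Fork one child per job. A child that dies takes nothing else with it.
sub run_jobs {
    my @jobs = @_;
    my %pid2job;

    for my $i (0 .. $#jobs) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: run the job, then leave via _exit so that none of
            # the parent's END blocks or destructors run in the child.
            my $ok = eval { $jobs[$i]->(); 1 };
            POSIX::_exit($ok ? 0 : 1);
        }
        $pid2job{$pid} = $i;
    }

    # Parent: reap every child and record its raw wait status.
    my @status;
    while ((my $pid = wait) > 0) {
        my $i = delete $pid2job{$pid};
        next unless defined $i;    # not one of ours
        $status[$i] = $?;
    }
    return @status;
}

my @status = run_jobs(
    sub { 1 },                           # finishes normally
    sub { kill 'KILL', $$; sleep 5 },    # killed outright
    sub { die "oops\n" },                # fails with an exception
);

printf "job %d: exit %d, signal %d\n",
    $_, $status[$_] >> 8, $status[$_] & 127
    for 0 .. $#status;
```

The killed child shows up as a signal number in the low bits
of its status and the other two jobs are unaffected.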

> As an example, one of the common patterns in Java programs is
> the thread/task pool, which is pretty trivial to implement in an
> environment with lightweight threads.  Doing it with fork() is pretty
> painful (one of the reasons the Subversion folks chose to use Apache
> as their engine, rather than trying to write their own pre-forking
> server from scratch...  a C example, but the techniques and issues for
> forking with Perl are basically the same).

Until your pool is in a cluster, which doesn't support any
notion of a distributed thread even if it can handle a virtual
fork across nodes.

> So I've always been a little confused that ithreads haven't been
> pushed more than they have.

They can be a bear to manage properly at low levels, and
easily lead to reduced performance if you are not careful
to manage synchronization points. The main problem for
most cases is lack of private data (which perl-5.8 solves
by privatizing lexicals). Closures provide one good way to
deal with this by dispatching threads to deal with non-
overlapping portions of data but you still get into issues
with returning data into the common pool.
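
A minimal sketch of the closure approach, assuming a perl built
with ithreads (the code checks and dies otherwise). sum_chunks()
is a made-up name. Each closure holds its own slice of the data,
and results come back through join() rather than through a
shared variable, which sidesteps the locking.

```perl
use strict;
use warnings;
use Config;

# Split @data into at most $nchunk slices and sum each slice in its
# own thread. Returns the partial sums in slice order.
sub sum_chunks {
    my ($nchunk, @data) = @_;

    $Config{useithreads}
        or die "this perl was built without ithreads\n";
    require threads;

    my $size = int((@data + $nchunk - 1) / $nchunk);
    my @workers;

    for my $i (0 .. $nchunk - 1) {
        my $lo = $i * $size;
        last if $lo > $#data;
        my $hi = $lo + $size - 1;
        $hi = $#data if $hi > $#data;

        # The closure sees only this slice; no two threads overlap.
        my @slice = @data[$lo .. $hi];
        push @workers, threads->create(sub {
            my $sum = 0;
            $sum += $_ for @slice;
            return $sum;
        });
    }

    # join() blocks until the thread is done and hands back its result.
    return map { $_->join } @workers;
}

if ($Config{useithreads}) {
    my @partial = sum_chunks(3, 1 .. 12);
    print "partial sums: @partial\n";
}
```

Folding the partial sums into one total still happens in the
parent thread; that is the "returning data into the common
pool" step, and here it is serialized by the joins.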

Threads can be a Real Nice Thing (tm) but are like any other
technology: used incorrectly they can cause pain.

> [2] By "portably", I mean "given two ports of language X, the same
> code will run under both".  fork() is not "portable" (though it's
> *mostly* portable, these days).

Neither are threads, depending on your threading model
and what's supported on the configuration (e.g., cluster).
I'm doing more work on Beowulf clusters lately, which
leaves message passing as the best way to move data. Given
a decent high-level message handling library (e.g., PVM)
it's no more work to fork-and-wait than split off threads.
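
To give a feel for fork-and-wait with message passing, here is
a sketch that uses a plain pipe as a stand-in for a real
message library. It is not PVM code; fork_and_wait() and the
line-per-value message format are assumptions made up for the
example.

```perl
use strict;
use warnings;
use POSIX ();

# Run $work->(@args) in a child process and return the values it
# sends back, one per line, over a pipe.
sub fork_and_wait {
    my ($work, @args) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: send the results as a message, then leave.
        close $reader;
        my $ok = eval {
            print {$writer} "$_\n" for $work->(@args);
            close $writer;    # flushes the buffered message
        };
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: read until the child closes its end, then reap it.
    close $writer;
    chomp(my @reply = <$reader>);
    close $reader;
    waitpid $pid, 0;
    die "child $pid failed: status $?\n" if $?;
    return @reply;
}

my @reply = fork_and_wait(sub { map { $_ * $_ } @_ }, 1 .. 4);
print "@reply\n";
```

From the caller's side this is one call that returns a list,
which is about what joining a thread looks like.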


--
Steven Lembark                               2930 W. Palmer
Workhorse Computing                       Chicago, IL 60647
                                            +1 888 359 3508


