[Kc] Threads/fork/Event-based programming oh my (anyevent/coro)

David Nicol davidnicol at gmail.com
Mon Nov 5 10:17:54 PST 2012


An effective way to chunk and parallelize work, using the OS instead of
the language, is to write work units into a directory using DirDB and fork
many workers (or launch them independently) that consume and delete the
work units. Every worker gets its own process.

You get more control over locking if you use SQLite for the IPC, or you
can keep everything in memory instead of on disk by using anonymous pipes
and select.
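
To make the pipes-and-select variant concrete, here is a minimal sketch using
only core modules. It is one possible arrangement, not the only one: the names
do_work_unit and hand_out are made up for the illustration, the "work" is just
squaring a number, and the manager keeps exactly one task in flight per worker
so that a line-at-a-time read after select is safe.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Hypothetical stand-in for real work: square a number.
sub do_work_unit { my ($n) = @_; return $n * $n }

my @work   = (1 .. 6);
my $select = IO::Select->new;
my (%task_for, %results);   # %task_for maps a result handle to its task handle

# Send the next task to the worker behind $res_r, or close its task pipe
# when nothing is left so that the worker sees EOF and exits.
sub hand_out {
    my ($res_r) = @_;
    my $task_w = $task_for{$res_r};
    if (@work) { print {$task_w} shift(@work), "\n" }
    else       { close $task_w }
}

for (1 .. 3) {
    pipe(my $task_r, my $task_w) or die "pipe: $!";
    pipe(my $res_r,  my $res_w)  or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Worker: close the manager's ends, including those inherited from
        # earlier workers, otherwise those workers would never see EOF.
        close $_ for $task_w, $res_r, values %task_for, $select->handles;
        $res_w->autoflush(1);
        while (my $n = <$task_r>) {
            chomp $n;
            print {$res_w} "$n ", do_work_unit($n), "\n";
        }
        exit 0;
    }
    # Manager: keep only its own ends of the two pipes.
    close $task_r;
    close $res_w;
    $task_w->autoflush(1);
    $task_for{$res_r} = $task_w;
    $select->add($res_r);
    hand_out($res_r);           # prime the worker with its first task
}

# Collect results as they arrive and keep each worker busy.
while ($select->count) {
    for my $res_r ($select->can_read) {
        my $line = <$res_r>;
        if (!defined $line) {   # worker has exited
            $select->remove($res_r);
            close $res_r;
            next;
        }
        chomp $line;
        my ($n, $answer) = split ' ', $line;
        $results{$n} = $answer;
        hand_out($res_r);
    }
}
1 while wait != -1;             # reap the workers

print "$_ => $results{$_}\n" for sort { $a <=> $b } keys %results;
```

Nothing touches the disk here, but the queue also dies with the manager, which
is the trade-off against the directory-based approach.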

For the DirDB approach, the manager node's code looks something like

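    use DirDB;
    tie %Q, DirDB => 'QueueDir';
    while(<>){
         # Build the key once so both writes hit the same work unit.
         # Assumes "$$wu" was meant as the PID followed by "wu".
         my $key = $$ . 'wu' . ++$counter;
         %{$Q{$key}} = ExpressLineAsWorkUnitPairs($_);
         $Q{$key}{READY} = 1;  # set last, so workers skip half-written units
    };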

The worker code looks something like


    use DirDB;
    tie %Q, DirDB => 'QueueDir';
    fork;fork;fork;  # now you've got 8 workers
    for(;;){
        @WUs = keys %Q;
        for (@WUs){
            $Q{$_}{READY} or next;
            # this will succeed once
            mkdir "QueueDir/$_/GOTIT", 0777 or next;
            DoWorkUnit(%{$Q{$_}});
            delete $Q{$_};
        };
        sleep (2+rand 5);
    }

Run the manager on new input as it appears, and the workers will consume it.
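
The worker loop relies on mkdir either creating GOTIT or failing because it
already exists, so only one worker can claim a given unit. The sketch below
shows just that claim step, without DirDB, under some assumptions: it uses a
File::Temp scratch directory instead of QueueDir, the helper name claim is made
up, and writing an "owner" file stands in for DoWorkUnit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical layout: one directory per work unit, as in the DirDB queue.
my $queue = tempdir(CLEANUP => 1);
my @units = map { "wu$_" } 1 .. 5;
for my $unit (@units) {
    mkdir "$queue/$unit" or die "mkdir $queue/$unit failed: $!";
}

# Try to claim a unit. mkdir fails if GOTIT already exists, so at most
# one process gets a true value back for any given unit.
sub claim {
    my ($unit) = @_;
    return mkdir "$queue/$unit/GOTIT";
}

my @pids;
for my $worker (1 .. 4) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        for my $unit (@units) {
            next unless claim($unit);
            # Record which worker won; this stands in for the real work.
            open my $fh, '>', "$queue/$unit/owner" or die "open: $!";
            print {$fh} "$worker\n";
            close $fh;
        }
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;

# Count the units that ended up claimed.
my $claimed = grep { -d "$queue/$_/GOTIT" } @units;
print "claimed $claimed of ", scalar(@units), " units\n";
```

Because the check and the creation happen in a single system call, no separate
lock file or flock is needed for the claim itself.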