<br>
<div>An effective way to chunk and parallelize work, using the OS instead of the language, is to write work units into a directory using DirDB, then fork many workers (or launch them independently) that consume and delete the work units. Every worker gets its own process.</div>
<div><br></div><div>You can have more control over locking if you use SQLite for the IPC, or you can keep everything in memory instead of on disk by using anonymous pipes and select.</div><div><br></div><div>The manager node's code looks something like this:</div>
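<div><br></div><div>&nbsp;&nbsp;use DirDB;</div><div>&nbsp;&nbsp;tie %Q, DirDB => 'QueueDir';</div><div>&nbsp;&nbsp;while(&lt;&gt;){</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$key = $$ . "wu" . ++$counter;&nbsp;&nbsp;# PID prefix; bump the counter once per work unit</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;%{$Q{$key}} = ExpressLineAsWorkUnitPairs($_);</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$Q{$key}{READY} = 1;&nbsp;&nbsp;# set last, so workers skip half-written units</div>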
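<div>&nbsp;&nbsp;};</div><div><br></div><div>The worker code looks something like this:</div><div><br></div><div>&nbsp;&nbsp;use DirDB;</div><div>&nbsp;&nbsp;tie %Q, DirDB => 'QueueDir';</div>
<div>&nbsp;&nbsp;fork;fork;fork;&nbsp;&nbsp;# each fork doubles the process count: now you've got 8 workers</div><div>&nbsp;&nbsp;for(;;){</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@WUs = keys %Q;</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (@WUs){</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$Q{$_}{READY} or next;&nbsp;&nbsp;# manager is still writing this one</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mkdir "QueueDir/$_/GOTIT", 0777 or next; # mkdir is atomic, so this succeeds for exactly one worker</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;DoWorkUnit(%{$Q{$_}});</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;delete $Q{$_};</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;};</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sleep (2+rand 5);&nbsp;&nbsp;# random back-off so the workers don't all rescan at once</div><div>&nbsp;&nbsp;}</div><div><br></div><div>Run the manager on new input as it appears, and the workers will consume it.</div>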
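<div><br></div><div>If you want to watch the mkdir claim trick work without installing DirDB, here is a rough, self-contained sketch using only core modules. The directory layout (queue, done, trash), the helper names claim_unit and run_workers, and the marker files standing in for real work are all made up for illustration; this is not how DirDB lays things out internally. The sketch also moves a finished unit out of the queue with rename before deleting it, which is my own precaution so that no other worker can re-claim a half-deleted unit.</div>

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(remove_tree);
use POSIX ();

# Hypothetical helper: mkdir either creates GOTIT (claim won) or fails
# because it already exists or the unit is gone (claim lost).
sub claim_unit {
    my ($unit_dir) = @_;
    return mkdir "$unit_dir/GOTIT";
}

# Fork $workers processes that race to claim every unit under $root/queue.
sub run_workers {
    my ($root, $workers) = @_;
    my @pids;
    for (1 .. $workers) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid) { push @pids, $pid; next }

        # Child: snapshot the queue, then try to claim each unit.
        opendir(my $dh, "$root/queue") or POSIX::_exit(1);
        my @units = grep { !/^\./ } readdir $dh;
        closedir $dh;
        for my $unit (@units) {
            claim_unit("$root/queue/$unit") or next;   # another worker has it
            # Stand-in for the real work: one marker file per processed unit.
            open(my $fh, '>', "$root/done/$unit.$$") or POSIX::_exit(1);
            close $fh;
            # Move the unit out of the queue in one step, then delete it.
            rename "$root/queue/$unit", "$root/trash/$unit" or POSIX::_exit(1);
            remove_tree("$root/trash/$unit");
        }
        POSIX::_exit(0);   # skip END blocks so only the parent cleans up
    }
    waitpid($_, 0) for @pids;
}

my $root = tempdir(CLEANUP => 1);                 # stand-in for QueueDir
mkdir "$root/$_" or die "mkdir $_: $!" for qw(queue done trash);
mkdir "$root/queue/wu$_" or die "mkdir wu$_: $!" for 1 .. 20;

run_workers($root, 4);

opendir(my $dh, "$root/done") or die "opendir: $!";
my @markers = grep { !/^\./ } readdir $dh;
closedir $dh;
print scalar(@markers), " units processed\n";
```

<div>Each marker file is named after the unit and the PID of the worker that claimed it, so a unit that got processed twice would show up as two markers for the same unit.</div>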
<div><br></div>
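<div>For the in-memory variant mentioned earlier (anonymous pipes and select), something along these lines should work. It is only a sketch under assumptions: run_pool and do_work_unit are hypothetical names, the "work" is squaring a number, and the one-line-in, one-line-out protocol is my own choice to keep buffered reads safe alongside select. Each worker gets a pair of pipes; the manager hands out one unit at a time and uses IO::Select to see which worker has answered and is ready for more.</div>

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use POSIX ();

# Hypothetical stand-in for the real per-unit work.
sub do_work_unit { my ($n) = @_; return $n * $n }

sub run_pool {
    my ($workers, @units) = @_;
    my $sel = IO::Select->new;
    my (%to_child, %result);

    for (1 .. $workers) {
        pipe(my $work_r, my $work_w) or die "pipe: $!";
        pipe(my $res_r,  my $res_w)  or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Worker: drop inherited handles, then read units until EOF.
            close $_ for $work_w, $res_r, values %to_child;
            $res_w->autoflush(1);
            while (my $unit = <$work_r>) {
                chomp $unit;
                print {$res_w} "$unit ", do_work_unit($unit), "\n";
            }
            POSIX::_exit(0);
        }
        close $_ for $work_r, $res_w;
        $work_w->autoflush(1);
        $to_child{fileno $res_r} = $work_w;   # map result pipe to work pipe
        $sel->add($res_r);
    }

    # Send the next unit to a worker, or close its pipe when none are left.
    my $feed = sub {
        my ($res_r) = @_;
        my $work_w = $to_child{fileno $res_r};
        if (@units) { print {$work_w} shift(@units), "\n" }
        else        { close $work_w }         # EOF tells the worker to exit
    };
    $feed->($_) for $sel->handles;            # prime every worker

    while ($sel->count) {
        for my $res_r ($sel->can_read) {
            my $line = <$res_r>;
            if (defined $line) {
                my ($unit, $answer) = split ' ', $line;
                $result{$unit} = $answer;
                $feed->($res_r);              # this worker is idle again
            }
            else {                            # worker exited and closed its end
                $sel->remove($res_r);
                close $res_r;
            }
        }
    }
    1 while wait() != -1;                     # reap the workers
    return \%result;
}

my $squares = run_pool(3, 1 .. 10);
print "$_ => $squares->{$_}\n" for sort { $a <=> $b } keys %$squares;
```

<div>Nothing touches the disk here, but the trade-off is that the workers have to be children of the manager; you lose the ability to launch them independently the way the directory queue allows.</div>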