[sf-perl] threads, threads::shared, and multiple locked variables

Thu Jul 9 17:08:18 PDT 2020

On 2020-07-09 14:40, yary wrote:

> On Wed, Jul 8, 2020 at 7:45 PM David Christensen wrote:

>> I have been experimenting with concurrent programming with Perl, 
>> threads, and threads::shared.  I have successfully used a shared 
>> variable, lock(), cond_wait(), and cond_broadcast() to ensure 
>> thread-safe access to one shared resource.
>> 
>> 
>> I would now like to ensure thread-safe access to multiple shared 
>> resources.  Specifically, if a thread desires access to any one of 
>> several shared resources, how can it block, wake up when one
>> resource is available (and locked), and determine which resource to
>> access?

> How about posting code with "# what goes here?" comments?
> 
> I'm a little fuzzy on what the question is. It sounds a little like
> the old-school "select" call, which takes a bit vector representing
> file handles, pauses, and then fills in 3 other bit vectors saying
> which handles are ready to read, ready to write, or have an
> exception. Is that the question: using "threads" mechanisms to
> signal which of several things are ready?
> 
> perldoc -f select shows what I remember as similar:
>
>     my ($nfound, $timeleft) =
>         select(my $rout = $rin, my $wout = $win, my $eout = $ein, $timeout);

Yes, that's the idea -- except that I am working with shared variables 
[1], rather than file handles.

> ...my go-to in these situations is to `use MCE;` and figure out
> which of its many idioms applies...

MCE [2] appears to provide a framework for creating multiple identical 
workers and partitioning work among them (?).

My question is more related to first principles of concurrent programming.

I am attempting to create a general-purpose Perl library for flow-based
programming (FBP) [3], whereby multiple independent "processes" send
data "items" to each other via communication "connections" and work
together as an overall program.

FBP programs are easily represented by directed graphs -- "processes"
are nodes (boxes) and "connections" are edges (arrows).  For example,
here is an FBP program that reads a file (src), compresses the
contents, tees the compressed stream into two streams, writes the first
duplicate stream to a file (dst.gz), checksums the second duplicate
stream, and writes the checksum to a file (dst.gz.md5):

                         +-----+ -> write(dst.gz)
read(src) -> compress -> | tee |
                         +-----+ -> checksum -> write(dst.gz.md5)

I use 'threads' to create and manage one thread for each process, and I
use 'threads::shared' to create and manage one shared array for each
connection.  These modules work for simple programs like the one above,
whose processes have a single input and one or more outputs, but, in
general, FBP programs include processes with multiple inputs:

... proc1 -> +------+
             | func | ...
... proc2 -> +------+

So, I need a mechanism for a thread to go to sleep on multiple shared
variables, wake up when any one of those variables is signaled, and
then know which variable woke it up.
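One common first-principles answer, offered here as a minimal sketch
under assumptions rather than a tested design: instead of sleeping on
several shared variables, give each multi-input process a single shared
lock/condition variable that guards all of its input arrays.  Producers
lock it, push, and call cond_signal(); the reader calls cond_wait() and,
on wakeup, scans its inputs to find which one has data.  The names
$signal, @inputs, send_item(), and receive_any() are hypothetical;
lock(), cond_wait(), cond_signal(), and shared_clone() come from
threads::shared.  It requires a perl built with ithreads.

```perl
#!/usr/bin/perl
# Sketch: one lock/condition variable guarding all inputs of one process.
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical names: $signal is the process's single lock/condition variable;
# @inputs holds one shared queue (array ref) per input connection.
my $signal :shared = 0;
my @inputs :shared = map { shared_clone([]) } 0 .. 1;

# Producer side: append to input $i under the common lock, then wake the reader.
sub send_item {
    my ($i, $item) = @_;
    lock($signal);
    push @{ $inputs[$i] }, $item;
    cond_signal($signal);
}

# Consumer side: block until any input has data; return (input index, item).
sub receive_any {
    lock($signal);
    while (1) {
        for my $i (0 .. $#inputs) {
            return ($i, shift @{ $inputs[$i] }) if @{ $inputs[$i] };
        }
        cond_wait($signal);    # releases $signal while asleep, re-locks on wakeup
    }
}

# Usage: two producer threads, each ending its stream with undef.
my @producers = map {
    my $i = $_;
    threads->create(sub {
        send_item($i, "in${i}:$_") for 1 .. 3;
        send_item($i, undef);
    });
} 0 .. 1;

my $open = @inputs;    # number of input streams still open
my @got;
while ($open) {
    my ($i, $item) = receive_any();
    if (defined $item) { push @got, $item } else { $open-- }
}
$_->join for @producers;
print join(',', sort @got), "\n";
```

Because the reader rescans after every wakeup, spurious wakeups are
harmless, and since each process has exactly one reader, cond_signal()
is enough (cond_broadcast() would be needed if several threads waited on
the same variable).  The scan always starts at input 0, so a busy
low-numbered input could starve the others; a rotating start index would
avoid that.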

I am starting to believe that my question is really a feature request 
for threads::shared (and/or Perl?).

I am open to alternative approaches, but the goal is a general-purpose 
FBP library.
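As an alternative approach, a minimal sketch assuming Thread::Queue
(which ships with threads) is acceptable: give each process one inbox
queue and have every producer enqueue a [port, item] pair, so a single
blocking dequeue() both wakes the reader and tells it which input the
item arrived on.  The inbox, port numbers, and message layout are
hypothetical.

```perl
#!/usr/bin/perl
# Sketch: one Thread::Queue inbox per process; each message is tagged with its input port.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $inbox = Thread::Queue->new;    # hypothetical inbox for one multi-input process

# Two producer threads standing in for upstream processes on ports 0 and 1.
my @producers = map {
    my $port = $_;
    threads->create(sub {
        $inbox->enqueue([$port, "p${port}:$_"]) for 1 .. 3;
        $inbox->enqueue([$port, undef]);    # end-of-stream marker for this port
    });
} 0 .. 1;

my $open = 2;    # ports still open
my %by_port;
while ($open) {
    my $msg = $inbox->dequeue;              # blocks until any port has a message
    my ($port, $item) = @$msg;
    if (defined $item) { push @{ $by_port{$port} }, $item } else { $open-- }
}
$_->join for @producers;
print "$_: @{ $by_port{$_} }\n" for sort keys %by_port;
```

Items from any one port stay in order, because the queue is FIFO and
each producer enqueues sequentially.  The trade-off is that all inputs
share one buffer, so a per-connection capacity limit (which FBP usually
relies on for back-pressure) would have to be enforced separately.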

(As a work-around, I am implementing a polling solution with a 
configurable sleep delay.)
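A minimal sketch of such a polling work-around, assuming each connection
is a shared array with its own lock (the names @queues, send_item(), and
poll_any() are hypothetical).  The reader checks every input under that
input's lock and, if all are empty, sleeps for a configurable delay with
Time::HiRes::sleep() before scanning again.

```perl
#!/usr/bin/perl
# Sketch of a polling multi-input receive (requires a perl built with ithreads).
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes ();

# Hypothetical: one shared array per input connection, each locked independently.
my @queues :shared = map { shared_clone([]) } 0 .. 1;

# Producer side: lock only the target connection and append.
sub send_item {
    my ($i, $item) = @_;
    my $q = $queues[$i];
    lock(@$q);
    push @$q, $item;
}

# Consumer side: scan every input; if all are empty, sleep $delay seconds and rescan.
sub poll_any {
    my ($delay) = @_;
    while (1) {
        for my $i (0 .. $#queues) {
            my $q = $queues[$i];
            lock(@$q);                           # released at the end of this iteration
            return ($i, shift @$q) if @$q;
        }
        Time::HiRes::sleep($delay);              # configurable polling delay
    }
}

# Usage: two producers, each ending its stream with undef.
my @producers = map {
    my $i = $_;
    threads->create(sub {
        send_item($i, "in${i}:$_") for 1 .. 3;
        send_item($i, undef);
    });
} 0 .. 1;

my $open = @queues;    # number of input streams still open
my @got;
while ($open) {
    my ($i, $item) = poll_any(0.01);
    if (defined $item) { push @got, $item } else { $open-- }
}
$_->join for @producers;
print join(',', sort @got), "\n";
```

The cost is up to one delay of added latency per item plus periodic
wakeups while idle; a shorter delay trades CPU time for latency.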

(Perhaps I could build a work-around using file handles and 'select', 
but file handles are scarce resources and a large FBP could consume many.)

David

[1] https://perldoc.perl.org/threads/shared.html

[2] https://metacpan.org/pod/distribution/MCE/lib/MCE.pod

[3] https://en.wikipedia.org/wiki/Flow-based_programming