[pgh-pm] Question on fork

Tom Moertel tom at moertel.com
Wed Apr 7 16:53:15 CDT 2004


Peter Williams wrote:
> Greetings...
> I have just joined this list, and wanted first to introduce myself.

Hi, Peter!  Welcome to PghPm.

> I hate to dive straight in with a question [...]

That's what the list is for, so feel free to dive in.

> So... does anyone know if a script that forks needs to be called 
> with GET?  And why is this the case?

Short answer:  You can use the POST method to submit forms to CGI
programs that fork, but you must be careful to avoid buffering and
timing issues.

Long answer: There is no restriction against using the POST method if
your CGI program forks (and I'm assuming we're talking CGIs here).
However, fork splits a process into two more-or-less identical processes
(the parent and the child).  In general, that means the two processes
share the same open file descriptors (handles to open files, pipes, and
so on), while each gets its own duplicate copy of any buffers, such as
those that Perl creates to handle I/O more efficiently.
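
To make the duplicated-buffer point concrete, here is a minimal sketch
(not from the original question; the sub name copies_after_fork and the
temporary file are made up for illustration).  It prints one line to a
buffered handle, forks, and counts how many copies reach the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;
use POSIX ();

# Print one line to a buffered handle, fork, and report how many
# copies of that line end up in the file.
sub copies_after_fork {
    my ($flush_before_fork) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);

    print {$fh} "hello\n";               # held in Perl's buffer for now
    $fh->flush if $flush_before_fork;    # empty the buffer before forking

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $fh;                       # child writes out its copy of the buffer
        POSIX::_exit(0);                 # leave without running END blocks
    }
    waitpid($pid, 0);
    close $fh;                           # parent writes out its own copy

    open my $in, '<', $path or die "cannot reopen $path: $!";
    my @lines = <$in>;
    close $in;
    return scalar @lines;
}

print "without flush: ", copies_after_fork(0), "\n";
print "with flush:    ", copies_after_fork(1), "\n";
```

Without the flush, both processes hold the unwritten line and each
writes it out, so the file ends up with two copies; flushing before the
fork leaves just one.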

Now, here's where the GET vs. POST stuff comes in.  When you submit a
form via GET, the form's values are communicated to the CGI program via
the environment variable QUERY_STRING.  This doesn't cause any
difficulties when forking because both parent and child have a complete
copy of the QUERY_STRING.
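
A small sketch of the GET case, assuming a made-up query string (a real
web server would set QUERY_STRING for you; child_query_string is a
hypothetical helper).  The child reports what it sees in its own
environment back to the parent over a pipe.

```perl
use strict;
use warnings;
use POSIX ();

# Fork and return the QUERY_STRING as seen from inside the child.
sub child_query_string {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        print {$writer} $ENV{QUERY_STRING} // '';   # child's own copy of the environment
        close $writer;
        POSIX::_exit(0);
    }
    close $writer;
    my $seen = do { local $/; <$reader> };          # slurp everything the child sent
    close $reader;
    waitpid($pid, 0);
    return $seen;
}

# Made-up value for illustration; under CGI the web server sets this.
$ENV{QUERY_STRING} = 'name=Peter&lang=perl';
print "parent sees: $ENV{QUERY_STRING}\n";
print "child sees:  ", child_query_string(), "\n";
```

Both lines should show the same string, since the environment is copied
into the child rather than consumed by whoever reads it first.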

But the POST method is different.  When you submit a form via POST, the
form's values are communicated to the CGI program via the program's
standard input file descriptor.  Your web server pipes the form's data
into this descriptor and your CGI program reads it out.  Thus, when you
fork, both parent and child will *share* this descriptor.  If one reads
a line from it, that line will be forever gone; the other will not be
able to read it later.  Similarly, if the parent process reads from the
descriptor before forking, Perl may already have pulled more data into
its buffer than the parent has consumed, and the child will inherit a
copy of this buffer.  So some data might be read twice -- once in the
parent and once in the child.
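
Here is a sketch of the read-twice case.  It is only an illustration: a
small temporary file stands in for standard input (a real POST arrives
on a pipe), and next_line_in_both is a made-up name.  The parent reads
one line, forks, and then both processes ask for "the next line".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Read the first line in the parent, fork, then let parent and child
# each read the next line from the same handle.
sub next_line_in_both {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "first\nsecond\nthird\n";
    close $fh;

    open my $in, '<', $path or die "cannot open $path: $!";  # stand-in for STDIN
    my $first = <$in>;           # Perl pulls a whole chunk into its buffer here

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        my $line = <$in>;        # served from the child's copy of the buffer
        print {$writer} $line;
        close $writer;
        POSIX::_exit(0);
    }
    close $writer;
    my $parent_line = <$in>;     # served from the parent's copy of the buffer
    my $child_line  = <$reader>;
    close $reader;
    waitpid($pid, 0);
    chomp($parent_line, $child_line);
    return ($parent_line, $child_line);
}

my ($parent_line, $child_line) = next_line_in_both();
print "parent read: $parent_line\n";
print "child read:  $child_line\n";
```

With input this small, the whole file is already sitting in the buffer
at fork time, so both processes read "second".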

When mixing the POST method with forking, therefore, you must be careful
to know who is going to read what, and when.  One simple policy that
eliminates most of the concern is this:  The parent always reads the
full input into variables *before* forking.  Because Perl's CGI module
reads and parses the complete input when you create a new CGI object,
you can enforce this policy simply by creating your CGI object before
forking.
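
The read-first policy can be sketched without any modules.  This is an
illustration under assumptions, not what CGI.pm does internally: the
helper names are made up, and an in-memory handle with made-up form
data stands in for STDIN.  In a real CGI you would just call CGI->new
before fork and let it do the reading.

```perl
use strict;
use warnings;
use POSIX ();

# Read exactly $length bytes of request body from $fh.
sub read_post_body {
    my ($fh, $length) = @_;
    my $body = '';
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read failed: $!" unless defined $got;
        last if $got == 0;               # input ended early
    }
    return $body;
}

# Fork and let the child report what it finds in the already-read variable.
sub child_view_of {
    my ($body) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        print {$writer} $body;           # plain variable, no shared handle involved
        close $writer;
        POSIX::_exit(0);
    }
    close $writer;
    my $seen = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $seen;
}

# Simulated request; in a CGI you would pass STDIN and $ENV{CONTENT_LENGTH}.
my $posted = 'name=Peter&lang=perl';
open my $fake_stdin, '<', \$posted or die "cannot open in-memory handle: $!";
my $body = read_post_body($fake_stdin, length $posted);
print "child sees: ", child_view_of($body), "\n";
```

Because the input is fully read into $body before the fork, parent and
child each hold the complete form data and neither needs to touch the
shared descriptor again.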

There is a similar concern with output -- knowing who will write what,
and when -- but switching from GET to POST won't change that.

Cheers,
Tom




