scottp at dd.com.au
Wed Sep 13 22:42:35 PDT 2006
BTW. Repeated at the bottom - but I think this is an excellent
discussion, bringing up some real good questions, that we don't
really have the answers to - yet...
On 14/09/2006, at 15:16, Andrew Speer wrote:
> I understand that copy on write helps, but as I think you or Scott
> to - if each one of those processes manipulates a large amount of
> data, or
> dynamically 'requires' other modules on the fly then "real" memory
> can get
> used fairly quickly.
True of any service.
> You are right about this happening with any app/infrastructure, but
> Apache mod_perl a process may allocate 100MB to generate some dynamic
True of FastCGI too. That is its point, it loads and keeps loaded
What you are talking about though is really aggressive garbage
collection. One of the nice advantages of a CGI script, without any
other Fast/mod_perl etc code - is that at the end, it is gone -
completely garbage collected :-)
Every other solution requires that the code does the right thing.
But the downsides usually far outweigh the upside. Basically, that
aggressive garbage collection has to be worked around, so people
start by adding Cache::*, then using FastCGI, then adding in
mod_proxy, etc. What you end up with is having to do the right
thing with your data anyway. Thus, with any significant
application, you end up going down one of two paths:
* Slower execution time by aggressive garbage collection
* Do the right thing or suffer the memory leak consequences :-) but
gain the performance.
> content in one request, then be asked to serve up a tiny static
> CSS file
> on the next one - eventually all processes pad out to 100MB, and the
That doesn't matter. Because it is already running, already forked,
the speed should be as fast as a normal request.
> server struggles to even services requests for static content (unless
> MaxRequestsPerChild is reached or other memory management
> techniques are
Not in my experience. Although if all your application's threads/forks
(depending on which apache you use) each do massive memory
allocations then you may run out of memory - that of course slows
things down because of swap.
Assuming you don't hit swap - there is no speed difference.
Assuming you do hit swap - then using some other technique of keeping
your code running all the time (e.g. a daemon) will suffer the same
problem. After all, it does not matter WHICH process is taking up
all the memory - one small apache process won't help if its pages are
in swap, and swap is only touched for the pages you actually access.
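
To put rough numbers on the "do you hit swap" question, here is a small back-of-the-envelope sketch in Python. The 100MB per process comes from the discussion above; the 4GB box and the 512MB reserved for the OS are made-up figures, and the sum ignores copy-on-write sharing, so treat it as a worst case rather than a measurement.

```python
def fits_in_ram(workers, mb_per_worker, ram_mb, reserved_mb=512):
    """Worst-case check: does a pool of equally sized workers fit in RAM?

    reserved_mb is an assumed allowance for the OS and everything else.
    Copy-on-write sharing between forked workers is deliberately ignored.
    """
    total_mb = workers * mb_per_worker
    return total_mb, total_mb <= ram_mb - reserved_mb


# Hypothetical 4GB box, every worker padded out to 100MB
print(fits_in_ram(20, 100, 4096))  # (2000, True)
print(fits_in_ram(80, 100, 4096))  # (8000, False)
```

The point is only that the total matters, not which process holds it: once the sum passes physical memory, something ends up in swap.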
> It seems to be "better" (or at least more managable) to say e.g. "OK,
> 10-20 processes will be dedicated to dynamic content, and can grow to
> 100MB each, and 40-80 processes will handle static content, and will
> remain about the same size". Correct me if I am wrong, but you
> cannot seem
> to do this with Apache/mod_perl ? Looks like you can with Apache/
> or lighttpd/FastCGI.
My experience is that it is either a poorly configured apache server
or problems in code (and this of course could be 3rd party CPAN code)
or borderline memory hitting swap that causes these speed issues.
Yes, you can hit the symptom - by being more aggressive with memory,
or limiting apache threads with and without perl - but I would rather
work on the cause.
That said of course, if I could not fix the cause I would almost
definitely do one of the following:
* Move to straight, execute-each-time CGI - this is useful for things
that take up masses of memory but are run rarely - it would be a
terrible choice for AJAX (for example).
* Move the code to a standalone daemon - either single-threaded
non-blocking, or other limited choices - and proxy to it (btw, I mean
proxy by any protocol you like - not just HTTP Proxy, although that
is an option).
* Use a scheduler. One bit of code I have generates huge (10s of MB)
DEM and other terrain and topology files from about 10GB of raw
material and can take anywhere between 1 and 30 minutes - this is
scheduled in the background, and the web server just refreshes
occasionally to check the status, and the user can exit and get a
mail if they like. I have also implemented queuing in this, so that
if I get too many, they just have to wait (currently limited to 2
concurrent processes due to the massive memory usage to process the
AJAX, these have to be on the same domain and port - I see that it is
important to have a single HTTP request the user sees for resources
as well - the two solutions above
>>> There are ways and means around this problem, but they all seem a
>>> kludgy - front-end proxy servers, multiple Apache servers (Apache
>>> mod_perl for dynamic content, straight Apache for static content).
>> ...er, or you could tune your Apache process correctly. That way you
>> fixed the problem rather than trying to work around it.
> Tuning an Apache/mod_perl app to run in a corporate environment with a
> known or predictable load is fairly straightforward. Tuning the
> same app
> to service the most possible connections on a single box with
> unpredictable load (e.g. public facing) would seem to almost demand
> HTTP processes - one for static content, and one for dynamic. I am
> open to
> suggestions as to how you could "tune" an Apache process to handle
> such a
Not convinced. I don't have what I would like to call hard data
though. I do have lots of data, but I would like to prove this. We
can do so. We can write some simple mod_perl modules and some static
content and call it with a number of ways for performance testing:
* No mod_perl
* mod_perl hit regularly (e.g. 1 in 10) but no known memory leaks or
* mod_perl hit regularly but with a large memory usage, reused, but
done in each process (assuming of course we are using forks not
threads - or we could share it - another good reason to use mod_perl
- you actually save memory by sharing, which is hard to do, except
with fairly ordinary shared memory, instead of just threads).
* mod_perl hit regularly but with memory leaks - keep an array/hash
of the objects appended to, thus each request to mod_perl increases
memory until the process exits.
The above set of 4 tests would settle the main question - does the
mod_perl code impact on the static content? My (not so conclusive)
tests so far suggest it does not.
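
As a toy illustration of the request mix and the deliberately leaky case in the test list above, here is a small Python model. It only simulates one process; it does not measure Apache or mod_perl, and the 100KB leaked per dynamic hit is an arbitrary figure picked for the example.

```python
def request_mix(total, dynamic_every=10):
    """Yield 'dynamic' for every Nth request and 'static' otherwise."""
    for n in range(1, total + 1):
        yield "dynamic" if n % dynamic_every == 0 else "static"


class LeakyWorker:
    """Toy model of one server process whose dynamic handler leaks."""

    def __init__(self, leak_kb=100):
        self.leak_kb = leak_kb  # assumed size leaked per dynamic hit
        self.kept = []          # objects appended to and never released

    def handle(self, kind):
        # Static requests allocate nothing that outlives the request
        if kind == "dynamic":
            self.kept.append(bytearray(self.leak_kb * 1024))

    def leaked_kb(self):
        return sum(len(chunk) for chunk in self.kept) // 1024


worker = LeakyWorker()
for kind in request_mix(1000):
    worker.handle(kind)
print(worker.leaked_kb())  # 100 dynamic hits * 100KB = 10000
```

In a real run the interesting number is not the leak itself but whether the static request timings change as the process grows, which is what the four tests would compare.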
>> Apache also supports FastCGI -- with the original FastCGI module
>> and the
>> newer (and more free) FCGI module. Both of these give the same
>> as lighttpd and FastCGI, plus the advantages that mod_perl and other
>> Apache modules provide.
> Thanks for that - FCGI was the module I meant to refer to. I will
> have to
> try out Apache/FastCGI - I have not had a chance yet.
> Anyway, none of the above was meant to bash mod_perl
Not at all - this is VERY constructive. Maybe we can get a bit of
mod_perl2 movement in Australia - it is a little slow, mostly because
of what Skud said - that most developers don't need to know.
> - I think it is a
> great piece of software and has let me do things quickly in
> Apache that
> otherwise I would never have been able to do. I was just discussing
> of the limitations it seems to have, and the alternatives that are
Very excellent discussion.