[Melbourne-pm] Mod_perl2

Scott Penrose scottp at dd.com.au
Wed Sep 13 22:42:35 PDT 2006


BTW. Repeated at the bottom - but I think this is an excellent
discussion, bringing up some really good questions that we don't
really have the answers to - yet...

On 14/09/2006, at 15:16, Andrew Speer wrote:
> I understand that copy on write helps, but as I think you or Scott  
> alluded
> to - if each one of those processes manipulates a large amount of  
> data, or
> dynamically 'requires' other modules on the fly then "real" memory
> can get
> used fairly quickly.

True of any service.

> You are right about this happening with any app/infrastructure, but  
> with
> Apache mod_perl a process may allocate 100MB to generate some dynamic

True of FastCGI too. That is its point: it loads the code and keeps
it loaded.

What you are talking about though is really aggressive garbage  
collection. One of the nice advantages of a CGI script, without any  
other Fast/mod_perl etc code - is that at the end, it is gone -  
completely garbage collected :-)
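
For contrast, here is a minimal FCGI loop (a sketch only, using the
standard FCGI module) - everything loaded before and kept inside the
while loop stays resident between requests, which is exactly where the
speed comes from and exactly why clean-up becomes your problem rather
than the kernel's:

    use strict;
    use warnings;
    use FCGI;

    my $request = FCGI::Request();

    # Accept() blocks until the web server hands us the next request.
    # The process - and anything it has accumulated - lives on between
    # requests, unlike a plain CGI which exits and is swept away.
    while ($request->Accept() >= 0) {
        print "Content-type: text/plain\r\n\r\n";
        print "hello from a persistent worker\n";
    }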

Every other solution requires that the code does the right thing.

But the downsides usually far outweigh the upside. Basically that
aggressive garbage collection has to be worked around, so people
start by adding Cache::* (a small example of which is sketched after
the list below), then using FastCGI, then adding in mod_proxy -
etc... - what you end up with is having to do the right thing with
your data anyway - so with any significant application you end up
going down one of two paths:

* Slower execution time due to aggressive garbage collection
* Do the right thing or suffer the memory leak consequences :-) but  
gain the performance.
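
As a concrete (and purely hypothetical) example of that Cache::* step,
using Cache::FileCache - the cache namespace and the expensive_lookup()
call are placeholders for whatever the real application computes:

    use strict;
    use warnings;
    use Cache::FileCache;

    # Keep expensive results on disk between requests and processes,
    # rather than recomputing (and re-allocating) them every time.
    my $cache = Cache::FileCache->new({
        namespace          => 'demo',   # hypothetical namespace
        default_expires_in => 600,      # seconds
    });

    sub fetch {
        my ($key) = @_;
        my $value = $cache->get($key);
        unless (defined $value) {
            $value = expensive_lookup($key);   # placeholder for real work
            $cache->set($key, $value);
        }
        return $value;
    }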

> content  in one request, then be asked to serve up a tiny static  
> CSS file
> on the next one - eventually all processes pad out to 100MB, and the

That doesn't matter. Because it is already running, already forked,  
the speed should be as fast as a normal request.

> server struggles to even service requests for static content (unless
> MaxRequestsPerChild is reached or other memory management  
> techniques are
> used).

Not in my experience. Although if all your application's threads/forks
(depending on which Apache you use) each do massive memory
allocations then you may run out of memory - that of course slows
things down because of swap.

Assuming you don't hit swap - there is no speed difference.
Assuming you do hit swap - then using some other technique of keeping
your code running all the time (e.g. a daemon) will suffer the same
problem - after all, it does not matter WHICH process is taking up
all the memory - keeping one Apache process small won't help if the
memory is sitting in swap - swap is only accessed for pages you touch.

> It seems to be "better" (or at least more manageable) to say e.g. "OK,
> 10-20 processes will be dedicated to dynamic content, and can grow to
> 100MB each, and 40-80 processes will handle static content, and will
> remain about the same size". Correct me if I am wrong, but you cannot
> seem to do this with Apache/mod_perl? Looks like you can with
> Apache/FastCGI or lighttpd/FastCGI.

My experience is that it is either a poorly configured Apache server,
problems in code (and this of course could be 3rd party CPAN code), or
borderline memory hitting swap that causes these speed issues.

Yes you can treat the symptom - by being more aggressive with memory,
or by limiting Apache threads with and without perl - but I would
rather work on the cause.
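
(For completeness, if you did want to treat that symptom under
mod_perl, Apache2::SizeLimit is the usual tool - a rough startup.pl
sketch, where the 100MB figure is purely illustrative:)

    # startup.pl - pulled in via "PerlRequire /path/to/startup.pl"
    use strict;
    use warnings;
    use Apache2::SizeLimit;

    # Recycle a child once it grows past ~100MB (value is in KB and is
    # purely illustrative; newer versions also offer a
    # set_max_process_size() class method). Also needs, in httpd.conf:
    #   PerlCleanupHandler Apache2::SizeLimit
    $Apache2::SizeLimit::MAX_PROCESS_SIZE = 100_000;

    1;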

That said of course, if I could not fix the cause I would almost  
definitely do one of the following:

* Move to straight, execute-each-time CGI - this is useful for things
that take up masses of memory but are run rarely - it would be a
terrible choice for AJAX (for example).
* Move the code to a stand alone daemon - either single threaded non  
blocking, or other limited choices - and proxy to it (btw, I mean  
proxy by any protocol you like - not just HTTP Proxy, although that  
is an option).
* Use a scheduler. One bit of code I have generates huge (10s of MB)  
DEM and other terrain and topology files from about 10GB of raw  
material and can take anywhere between 1 and 30 minutes - this is  
scheduled in the background, and the web server just refreshes  
occasionally to check the status, and the user can exit and get a  
mail if they like. I have also implemented queuing in this, so that
if I get too many, they just have to wait (currently limited to 2
concurrent processes due to the massive memory usage to process the
files) - a rough sketch of that queuing follows below.
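
To make that last queuing point concrete, a rough sketch using
Parallel::ForkManager - the job list and generate_terrain() are
placeholders, and two is just the limit I happen to use:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    # Allow at most two expensive jobs at once; the rest simply wait
    # their turn rather than eating all the memory at the same time.
    my $pm = Parallel::ForkManager->new(2);

    my @pending_jobs = load_pending_jobs();   # placeholder

    for my $job (@pending_jobs) {
        $pm->start and next;       # parent: move on to queue the next job
        generate_terrain($job);    # child: the slow, memory-hungry work
        $pm->finish;
    }
    $pm->wait_all_children;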

More and more applications are serving Javascript that then calls
AJAX, and those calls have to go back to the same domain and port - so
it is also important that the user only sees a single HTTP endpoint
for all resources - the daemon-behind-a-proxy and scheduler approaches
above both preserve that, since the browser still only ever talks to
the one front-end server.

>>
>>> There are ways and means around this problem, but they all seem a  
>>> bit
>>> kludgy - front-end proxy servers, multiple Apache servers (Apache  
>>> with
>>> mod_perl for dynamic content, straight Apache for static content).
>>
>> ...er, or you could tune your Apache process correctly.  That way you
>> fixed the problem rather than trying to work around it.
>>
>
> Tuning an Apache/mod_perl app to run in a corporate environment with a
> known or predictable load is fairly straightforward. Tuning the  
> same app
> to service the most possible connections on a single box with
> unpredictable load (e.g. public facing) would seem to almost demand  
> two
> HTTP processes - one for static content, and one for dynamic. I am  
> open to
> suggestions as to how you could "tune" an Apache process to handle  
> such a
> situation.

Not convinced. I don't have what I would like to call hard data,
though - I have lots of data, but I would like to prove this properly.
We can do so: write some simple mod_perl modules and some static
content, and hit them in a number of ways for performance testing:

* No mod_perl
* mod_perl hit regularly (e.g. 1 request in 10) but with no known
memory leaks or large allocations
* mod_perl hit regularly but with a large memory usage, reused, but
done in each process (assuming of course we are using forks not
threads - or we could share it - another good reason to use mod_perl:
you actually save memory by sharing, which is otherwise hard to do
except with fairly ordinary shared memory or threads).
* mod_perl hit regularly but with memory leaks - keep an array/hash  
of the objects appended to, thus each request to mod_perl increases  
memory until the process exits.

The above set of 4 tests would answer the main question conclusively -
does the mod_perl code impact the serving of static content? My own,
not so conclusive, tests suggest it does not.
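
A rough sketch of the mod_perl2 side of such a test - the package name
and sizes are made up, and the deliberate leak is switched on with a
leak=1 query parameter so the same handler can cover tests 2 to 4:

    package My::BenchHandler;   # hypothetical name

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO ();
    use Apache2::Const -compile => qw(OK);

    my @leak;   # grows for the life of the child when leaking

    sub handler {
        my $r = shift;

        # Allocate ~10MB per request; freed on return unless we "leak" it.
        my $big = 'x' x (10 * 1024 * 1024);
        push @leak, $big if ($r->args || '') =~ /\bleak=1\b/;

        $r->content_type('text/plain');
        $r->print("bench ok\n");
        return Apache2::Const::OK;
    }

    1;

Map it in with a <Location> block and a PerlResponseHandler, then
hammer the static files with something like ab while hitting the
handler at different rates, and watch whether the static numbers move.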

>> Apache also supports FastCGI -- with the original FastCGI module  
>> and the
>> newer (and more free) FCGI module.  Both of these give the same  
>> benefits
>> as lighttpd and FastCGI, plus the advantages that mod_perl and other
>> Apache modules provide.
>>
>
> Thanks for that - FCGI was the module I meant to refer to. I will  
> have to
> try out Apache/FastCGI - I have not had a chance yet.
>
> Anyway, none of the above was meant to bash mod_perl

Not at all - this is VERY constructive. Maybe we can get a bit of  
mod_perl2 movement in Australia - it is a little slow, mostly because  
of what Skud said - that most developers don't need to know.

> - I think it is a
> great piece of software and has let me doing things quickly in  
> Apache that
> otherwise I would never have been able to do. I was just discussing  
> some
> of the limitations it seems to have, and the alternatives that are  
> around
> ..

Very excellent discussion.

Scooter


