[sf-perl] Server downtime reporting and recovery
woof at danlo.com
Tue Feb 24 18:30:13 PST 2009
The best test for this is one that is done externally: for example, another website
checks your website. :)
Some sites will even collect metrics, such as how long your page takes to load, and graph
them over a period of time. Some sites will also do it from different parts of
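A minimal sketch of such an external check, written in Python here (the original poster's script is Perl, but the idea carries over). It assumes the service answers plain HTTP at some URL, and `check_url` is a hypothetical helper name. It should run on a different machine from the one being monitored. The key point for an app that hangs rather than crashes is the timeout: a server that accepts the connection and then never answers is reported as down, and the elapsed time is the number you would record and graph.

```python
import http.client
import time
import urllib.request


def check_url(url, timeout=10.0):
    """Fetch url once and return (ok, seconds_elapsed).

    ok is False on connection errors, HTTP error statuses and timeouts,
    so a server that accepts the connection but then hangs counts as down.
    Note: the timeout applies to each blocking socket operation, not to
    the request as a whole.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # time the full response body, not just the headers
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        ok = False
    return ok, time.monotonic() - start
```

Calling this on a schedule and appending the timestamp, the result and the elapsed time to a file gives the raw data for a load-time graph.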
Tuesday, February 24, 2009, 5:13:57 PM, you wrote:
> Hi all,
> I was curious about how those of you who work with web apps deal with
> minimizing downtime when a particular service dies for whatever
> reason. I'm not a sysadmin by training; rather, it is a responsibility
> that no one else seemed willing to take. Right now I have a perl
> script that runs as a cron job every five minutes, checking the status
> of the various services on the server and restarting and reporting if
> anything is amiss.
> I've been told that my production schedule needs to be pushed forward
> and five minutes of downtime will soon be unacceptable. Since I've
> got a .NET app running in mono (which has not been kind to me) I need
> to catch problems as quickly as possible and restart the service.
> Most frequently the mono app will just hang indefinitely, not crash
> outright. With the new schedule I don't have time to fix (read:
> replace) the problematic app before I go live.
> So my question, what do you folks recommend as far as checking the
> status of services more frequently than every 5 minutes? Would you
> recommend sticking with perl, or is there some FOSS that would
> better serve my purposes? In my research, I've found programs like
> Nagios, but don't know much about them. I'd prefer not to add too much
> in the way of overhead, but I also don't want to reinvent the wheel.
> Sorry if this is a little off topic.
> SanFrancisco-pm mailing list
> SanFrancisco-pm at pm.org
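Since cron cannot run a job more often than once a minute, one option for the local side is a small long-running watchdog loop. A minimal sketch in Python, under these assumptions: `check` is any function you supply that probes the service with its own timeout (so a hung mono process fails the check instead of blocking the loop), and the restart command is hypothetical; substitute whatever actually restarts your service. Requiring two consecutive failures before restarting avoids bouncing the app on a single slow response.

```python
import logging
import subprocess
import time

log = logging.getLogger("watchdog")


def watch(check, restart, interval=15.0, max_failures=2, rounds=None,
          sleep=time.sleep):
    """Poll check() every `interval` seconds and call restart() after
    `max_failures` consecutive failures.

    check() returns True when the service is healthy; an exception raised
    by check() is treated as a failure. rounds=None polls forever, an
    integer limits the number of polls. Returns the number of restarts.
    """
    failures = 0
    restarts = 0
    polls = 0
    while rounds is None or polls < rounds:
        polls += 1
        try:
            healthy = bool(check())
        except Exception:
            log.exception("health check raised")
            healthy = False
        if healthy:
            failures = 0
        else:
            failures += 1
            log.warning("health check failed (%d in a row)", failures)
            if failures >= max_failures:
                log.error("restarting service")
                restart()
                restarts += 1
                failures = 0  # give the restarted service a fresh start
        sleep(interval)
    return restarts


def restart_via_command(cmd):
    """Return a restart() callable that runs cmd (a list of arguments,
    e.g. a hypothetical ["/etc/init.d/myapp", "restart"]).
    Raises subprocess.CalledProcessError if the command fails."""
    def restart():
        subprocess.run(cmd, check=True, timeout=60)
    return restart
```

For example, `watch(my_check, restart_via_command(["/etc/init.d/myapp", "restart"]), interval=15)` would poll every 15 seconds; `my_check` and the init script path are placeholders. Call `logging.basicConfig()` (or attach a mail handler) so that the warnings are actually reported somewhere.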
Daniel mailto:woof at danlo.com