[sf-perl] Server downtime reporting and recovery

frosty biztos at mac.com
Wed Feb 25 01:36:43 PST 2009


Check out Pingdom.  I have not used them but I've looked into their service and it looks to me like they have a good reputation and reasonable prices.

http://pingdom.com/

I would also recommend keeping your Perl script as redundant monitoring, and maybe even having an external service like Pingdom check that the script itself is still running.
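
For what it's worth, here is a minimal sketch of the kind of check I mean. It is in Python purely for illustration, and the same logic should carry over to your Perl script. The function names, the URL, and the restart command are all made up, so substitute your own. The one detail that matters for an app that hangs rather than crashes is the timeout: a request that never comes back has to count as a failure.

```python
import http.client
import subprocess
import time
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True only if url answers with an HTTP 2xx status.

    Each socket operation is bounded by `timeout` seconds, so a service
    that accepts the connection and then hangs is reported as down.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and HTTP error statuses
        # (urllib raises those as exceptions) all count as "down".
        return False


def check_and_restart(url, restart_cmd, timeout=5.0):
    """Probe once; run restart_cmd (a hypothetical argv list) on failure."""
    if is_healthy(url, timeout):
        return True
    subprocess.run(restart_cmd, check=False)
    return False


def watch(url, restart_cmd, interval=30, timeout=5.0):
    """Loop forever, probing every `interval` seconds instead of via cron."""
    while True:
        check_and_restart(url, restart_cmd, timeout)
        time.sleep(interval)
```

Calling something like watch("http://localhost:8080/", ["/etc/init.d/myapp", "restart"]) (both values are placeholders) would probe every 30 seconds. Running it as a long-lived loop rather than from cron is what gets you below the five-minute mark, and it is also why you want an outside service watching the watcher.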

The point being that if you have to take on the thankless job of sysadmin, you probably also want to hire a robot to wake you up in the middle of the night if things go bad.

-- f.
 
On Tuesday, February 24, 2009, at 06:30PM, "Daniel Lo" <woof at danlo.com> wrote:
>Hello Matt,
>
>The best test for this is one that is done externally.  For example another website
>checks your website. :)
>
>Some sites will even collect metrics, such as how long a page takes to load, and
>graph them over a period of time.  Some sites will also do it from different
>parts of the world/country.
>
>-daniel
>
>
>
>Tuesday, February 24, 2009, 5:13:57 PM, you wrote:
>
>> Hi all,
>
>> I was curious about how those of you who work with web apps deal with
>> minimizing downtime when a particular service dies for whatever
>> reason.  I'm not a sysadmin by training, rather it is a responsibility
>> that no one else seemed willing to take.  Right now I have a perl
>> script that runs as a cron job every five minutes, checking the status
>> of the various services on the server and restarting and reporting if
>> anything is amiss.
>
>> I've been told that my production schedule needs to be pushed forward
>> and five minutes of downtime will soon be unacceptable.  Since I've
>> got a .NET app running in mono (which has not been kind to me) I need
>> to catch problems as quickly as possible and restart the service.
>> Most frequently the mono app will just hang indefinitely, not crash
>> outright.  With the new schedule I don't have time to fix (read:
>> replace) the problematic app before I go live.
>
>> So my question, what do you folks recommend as far as checking the
>> status of services more frequently than every 5 minutes?  Would you
>> recommend sticking with perl, or is there some FOSS that would
>> better serve my purposes?  In my research, I've found programs like
>> Nagios, but don't know much about them. I'd prefer not to add too much
>> in the way of overhead, but I also don't want to reinvent the wheel.
>
>> Sorry if this is a little off topic.
>
>> Thanks,
>
>> Matt
>> _______________________________________________
>> SanFrancisco-pm mailing list
>> SanFrancisco-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/sanfrancisco-pm
>
>
>
>-- 
>Best regards,
> Daniel                            mailto:woof at danlo.com
>

