From jimbo.green at gmail.com Wed Jul 2 08:45:13 2014
From: jimbo.green at gmail.com (James Green)
Date: Wed, 2 Jul 2014 16:45:13 +0100
Subject: [Nottingham-pm] Nottingham.pm July social, plus, progress on technical meetings!
Message-ID:

Hi all

First, the important part: July's social will be a return to the Canalhouse, from 7pm onwards on a date to be confirmed. I've set up another Doodle to help pick a date, at http://doodle.com/zcqcx9myy4h7i6g7 -- I'll close it late next week and announce a definite date.

Secondly: we now seem to have a suitable venue (actually, two options, but one involves more cost and hassle) for technical meetings! Hopefully we also still have some willing speakers - although more volunteers are appreciated, especially since, if this goes well, I intend to make it a semi-regular event :-)

For those who don't know, the general format of these events is between 2 and 5 (ish) presentations, of anywhere between 10 and 60 (ish) minutes in length - hopefully totalling around 90 minutes to 2 hours. There's then usually an exodus to a nearby (or in our case, probably, downstairs) pub to chat about the interesting things we've just heard about.

I've spoken to a few of you individually already, but if anybody would like to speak, on a topic even tangentially related to Perl, please let me know -- either by email, or by tracking me down on irc.perl.org (where I'm usually 'jkg', and where the group channel is #nottingham.pm)

(As an aside, a common response to the above is "I'm not good enough / don't know anything interesting enough to talk about." -- this is usually not the case! While talks on complex, advanced topics are obviously welcome, we're hoping to attract Perl users of all levels ... which means we need talks of all levels, as well. If you're not sure of your talent for speaking to an audience... it's possible a small, local event like this is a good opportunity to try it out -- I'm told it's quite addictive once you start :-) )

All the best,
James

From jimbo.green at gmail.com Mon Jul 7 12:07:05 2014
From: jimbo.green at gmail.com (James Green)
Date: Mon, 7 Jul 2014 20:07:05 +0100
Subject: [Nottingham-pm] Nottingham.pm July social
Message-ID:

On 2 July 2014 16:45, James Green wrote:
> First, the important part: July's social will be a return to the
> Canalhouse, from 7pm onwards on a date to be confirmed. I've set up
> another Doodle to help pick a date, at
> http://doodle.com/zcqcx9myy4h7i6g7 -- I'll close it late next week and
> announce a definite date.

Hi all

It's now early next week; I'm going to keep the Doodle open for a few more days, so get your selections in within the next 48 hours to avoid disappointment :-)

Cheers,
James

From jimbo.green at gmail.com Thu Jul 10 03:02:45 2014
From: jimbo.green at gmail.com (James Green)
Date: Thu, 10 Jul 2014 11:02:45 +0100
Subject: [Nottingham-pm] Nottingham.pm July Social - confirmed date
Message-ID:

Hi all

There were enough responses on the Doodle to pick a clear winner for the next social meeting date. So I'm very pleased to be able to announce it :-)

WHEN: Thursday, July 31st, from 7pm
WHERE: Canalhouse, Canal St, Nottingham NG1 7EH
MAP: http://thecanalhouse.co.uk/contact.html

The Canalhouse has an excellent beer garden by the canalside, so unless the weather turns I imagine we'll be out there, at least until it gets dark!

We may have Adam Reynolds from Host Europe Group joining us this month; Host Europe (and Heart, and 123-reg) are significant Perl users in Nottingham, and he wants to introduce himself.
I've suggested that he bribe us with beer, and he seemed amenable to this.

I'll also be popping across to FM&C next door to clear up some details ahead of the first technical meeting -- which is looking more and more like happening in early September.

Hopefully see you there!

All the best,
James

From auto at jwdt.co.uk Thu Jul 10 09:28:37 2014
From: auto at jwdt.co.uk (John (AUTO))
Date: Thu, 10 Jul 2014 17:28:37 +0100
Subject: [Nottingham-pm] Nottingham.pm July Social - confirmed date
In-Reply-To:
References:
Message-ID: <53BEBF35.7030906@jwdt.co.uk>

Hi James,

I must apologise for my lack of being around lately. Been rather busy with non-programming things and such.

This one I shall, however, attend with certainty! :D

JT

On 10/07/14 11:02, James Green wrote:
> Hi all
>
> There were enough responses on the Doodle to pick a clear winner for
> the next social meeting date. So I'm very pleased to be able to
> announce it :-)
>
> WHEN: Thursday, July 31st, from 7pm
> WHERE: Canalhouse, Canal St, Nottingham NG1 7EH
> MAP: http://thecanalhouse.co.uk/contact.html
>
> The Canalhouse has an excellent beer garden by the canalside, so
> unless the weather turns I imagine we'll be out there, at least until
> it gets dark!
>
> We may have Adam Reynolds from Host Europe Group joining us this
> month; Host Europe (and Heart, and 123-reg) are significant Perl users
> in Nottingham, and he wants to introduce himself. I've suggested that
> he bribe us with beer, and he seemed amenable to this.
>
> I'll also be popping across to FM&C next door to clear up some details
> ahead of the first technical meeting -- which is looking more and more
> like happening in early September.
>
> Hopefully see you there!
>
> All the best,
> James

From jkg at earth.li Thu Jul 10 10:01:56 2014
From: jkg at earth.li (James Green)
Date: Thu, 10 Jul 2014 18:01:56 +0100
Subject: [Nottingham-pm] Nottingham.pm July Social - confirmed date
In-Reply-To: <53BEBF35.7030906@jwdt.co.uk>
References: <53BEBF35.7030906@jwdt.co.uk>
Message-ID:

Hi John

On 10 July 2014 17:28, John (AUTO) wrote:
> Hi James,
>
> I must apologise for my lack of being around lately. Been rather busy with
> non-programming things and such.

I think you've only missed one, since the last one you were at :-)

> This one I shall, however, attend with certainty! :D

I look forward to it!

Cheers,
James

From jimbo.green at gmail.com Tue Jul 29 10:26:06 2014
From: jimbo.green at gmail.com (James Green)
Date: Tue, 29 Jul 2014 18:26:06 +0100
Subject: [Nottingham-pm] Reminder: Nottingham.pm social at the Canalhouse, Thursday July 31st
Message-ID:

Hey all

Apparently this is only ~48 hours away. Hopefully see you there!

I'll be a little late, while I pop into the FM&C next door to sort out some technical meeting details... but I should be there for 7.15 or so :)

Cheers,
James

From jimbo.green at gmail.com Tue Jul 29 10:35:55 2014
From: jimbo.green at gmail.com (James Green)
Date: Tue, 29 Jul 2014 18:35:55 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
Message-ID:

Hey folks,

Rather than my usual meeting-arranging blather on this list ... I have an actual Perl-related question! OK, it's not very Perl related.
Following a bunch of recent conversations about the future of search.cpan.org, and the fact it was seemingly down all the time, I've started gathering stats on when both it, and metacpan.org, are unreachable.

Unfortunately I'm getting a lot of what I suspect are false positives. I'm using LWP::UserAgent to get() a specific search page from each site, timing out after 30s, and if it hasn't loaded, considering it "down" until the next check. This process runs every 2 minutes, from cron. Quite often a site will fail to load just once, then be back up the next time -- which is as likely to be a transient routing problem at my end as an issue at theirs.

Does anyone have experience monitoring the availability of websites, or exciting ideas for better approaches to this data?

The (only slightly icky) current collection code is at https://github.com/jkg/cpan-uptime/ and you can see the way I currently present it at http://cpan-uptime.dillig.af/, although I'm trying not to publicise that too widely until there's more data, and I have a better handle on presenting it...

All ideas welcome, on-list, off-list, or over beers on Thursday :-)

Cheers,
James
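
As a reference point, a minimal sketch of the kind of probe James describes above -- one GET per site with a 30-second timeout, run from cron, anything other than a success counted as "down" -- might look like this. The URLs, user-agent string and log format here are illustrative assumptions, not taken from the cpan-uptime code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use POSIX qw(strftime);

    # Illustrative probe: one GET per site, 30s timeout, intended to run from cron.
    my %sites = (
        metacpan   => 'https://metacpan.org/search?q=Moose',
        cpansearch => 'http://search.cpan.org/search?query=Moose&mode=all',
    );

    my $ua = LWP::UserAgent->new( timeout => 30, agent => 'uptime-probe/0.1' );

    for my $name ( sort keys %sites ) {
        my $started  = time;
        my $response = $ua->get( $sites{$name} );
        my $elapsed  = time - $started;
        my $status   = $response->is_success ? 'up' : 'down';

        # One line per check; a real version would record this in a database.
        printf "%s %s %s %ds (%s)\n",
            strftime( '%Y-%m-%dT%H:%M:%SZ', gmtime ),
            $name, $status, $elapsed, $response->status_line;
    }
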
From duncanfyfe at domenlas.com Tue Jul 29 11:45:54 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Tue, 29 Jul 2014 19:45:54 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To:
References:
Message-ID: <53D7EBE2.8020507@domenlas.com>

On 29/07/14 18:35, James Green wrote:
> Hey folks,
>
> Rather than my usual meeting-arranging blather on this list ... I have
> an actual Perl-related question! OK, it's not very Perl related.
>
> Following a bunch of recent conversations about the future of
> search.cpan.org, and the fact it was seemingly down all the time, I've
> started gathering stats on when both it, and metacpan.org, are
> unreachable.
>
> Unfortunately I'm getting a lot of what I suspect are false positives.
> I'm using LWP::UserAgent to get() a specific search page from each
> site, timing out after 30s, and if it hasn't loaded, considering it
> "down" until the next check. This process runs every 2 minutes, from
> cron. Quite often a site will fail to load just once, then be back up
> the next time -- which is as likely to be a transient routing problem
> at my end as an issue at theirs.
>
> Does anyone have experience monitoring the availability of websites,
> or exciting ideas for better approaches to this data?
>

Quick check, details below, but for starters it looks like there might be a reverse DNS problem with metacpan.org. I'll have a more detailed look later.

Have fun,
Duncan

=== DETAILS ===

nslookup search.cpan.org
Server:   194.168.4.100
Address:  194.168.4.100#53

Non-authoritative answer:
search.cpan.org canonical name = cpansearch.perl.org.
Name:    cpansearch.perl.org
Address: 207.171.7.59
Name:    cpansearch.perl.org
Address: 207.171.7.49

wget http://search.cpan.org/search?query=Moose&mode=all
OK

wget http://207.171.7.59/search?query=Moose&mode=all
OK

wget http://207.171.7.49/search?query=Moose&mode=all
OK

nslookup metacpan.org
Server:   194.168.4.100
Address:  194.168.4.100#53

Non-authoritative answer:
Name:    metacpan.org
Address: 23.235.37.143
Name:    metacpan.org
Address: 23.235.33.143

wget https://metacpan.org/search?q=Moose
OK

wget --no-check-certificate https://23.235.37.143/search?q=Moose
--2014-07-29 19:04:46--  https://23.235.37.143/search?q=Moose
Connecting to 23.235.37.143:443... connected.
The certificate's owner does not match hostname ‘23.235.37.143’
HTTP request sent, awaiting response... 500 Domain Not Found
2014-07-29 19:04:46 ERROR 500: Domain Not Found.

wget --no-check-certificate https://23.235.33.143/search?q=Moose
--2014-07-29 19:05:15--  https://23.235.33.143/search?q=Moose
Connecting to 23.235.33.143:443... connected.
The certificate's owner does not match hostname ‘23.235.33.143’
HTTP request sent, awaiting response... 500 Domain Not Found
2014-07-29 19:05:15 ERROR 500: Domain Not Found.

nslookup 23.235.37.143
Server:   194.168.4.100
Address:  194.168.4.100#53

** server can't find 143.37.235.23.in-addr.arpa: NXDOMAIN

nslookup 23.235.33.143
Server:   194.168.4.100
Address:  194.168.4.100#53

** server can't find 143.33.235.23.in-addr.arpa: NXDOMAIN

ping -q -c 10 23.235.37.143
PING 23.235.37.143 (23.235.37.143) 56(84) bytes of data.

--- 23.235.37.143 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9014ms
rtt min/avg/max/mdev = 25.587/27.657/37.575/3.352 ms

ping -q -c 10 23.235.33.143
PING 23.235.33.143 (23.235.33.143) 56(84) bytes of data.

--- 23.235.33.143 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 29.846/32.689/43.296/3.717 ms

From auto at jwdt.co.uk Tue Jul 29 12:07:06 2014
From: auto at jwdt.co.uk (auto at jwdt.co.uk)
Date: Tue, 29 Jul 2014 20:07:06 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
Message-ID:

I've done it in bash before, a while ago, using wget. Though it was more a "download this page every minute and take some data from it" script, there was a bit of error reporting in there too. If I can find it I might be able to bring something to the table. :-)

Sent from my HTC

From jim.a.driscoll at gmail.com Tue Jul 29 12:33:18 2014
From: jim.a.driscoll at gmail.com (Jim Driscoll)
Date: Tue, 29 Jul 2014 20:33:18 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D7EBE2.8020507@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com>
Message-ID:

> On 29 Jul 2014, at 19:45, Duncan Fyfe wrote:
>
>> On 29/07/14 18:35, James Green wrote:
>> Hey folks,
>>
>> Rather than my usual meeting-arranging blather on this list ... I have
>> an actual Perl-related question! OK, it's not very Perl related.
>>
>> Following a bunch of recent conversations about the future of
>> search.cpan.org, and the fact it was seemingly down all the time, I've
>> started gathering stats on when both it, and metacpan.org, are
>> unreachable.
>>
>> Unfortunately I'm getting a lot of what I suspect are false positives.
>> I'm using LWP::UserAgent to get() a specific search page from each
>> site, timing out after 30s, and if it hasn't loaded, considering it
>> "down" until the next check. This process runs every 2 minutes, from
>> cron. Quite often a site will fail to load just once, then be back up
>> the next time -- which is as likely to be a transient routing problem
>> at my end as an issue at theirs.
>>
>> Does anyone have experience monitoring the availability of websites,
>> or exciting ideas for better approaches to this data?
>
> Quick check, details below, but for starters it looks like there might
> be a reverse DNS problem with metacpan.org. I'll have a more detailed
> look later.

Just a misconfiguration on one of the servers, unlikely to be anything to do with reverse DNS at all, and that would certainly explain why it breaks "sometimes".

For monitoring purposes there should be a connected IP address and port as properties of HTTP::Response (peeraddr/peerport maybe?), so you should log those on success or failure to identify if there is a bad server. You can persuade LWP::UserAgent to connect to a specified IP address via its proxy functionality, I think, so looping over all IP addresses it maps to on each test cycle would be viable and useful.

On the subject of eliminating local connectivity problems, just ensure that there is also a control (just some unrelated site or sites) which you're also monitoring at the same time - if the control is also down then it's probably your connection.

Jim

From duncanfyfe at domenlas.com Tue Jul 29 16:30:22 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 00:30:22 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To:
References: <53D7EBE2.8020507@domenlas.com>
Message-ID: <53D82E8E.3010707@domenlas.com>

On 29/07/14 20:33, Jim Driscoll wrote:
>
>> On 29 Jul 2014, at 19:45, Duncan Fyfe wrote:
>>
>>> On 29/07/14 18:35, James Green wrote:
>>> Hey folks,
>>>
>>> Rather than my usual meeting-arranging blather on this list ... I have
>>> an actual Perl-related question! OK, it's not very Perl related.
>>>
>>> Following a bunch of recent conversations about the future of
>>> search.cpan.org, and the fact it was seemingly down all the time, I've
>>> started gathering stats on when both it, and metacpan.org, are
>>> unreachable.
>>>
>>> Unfortunately I'm getting a lot of what I suspect are false positives.
>>> I'm using LWP::UserAgent to get() a specific search page from each
>>> site, timing out after 30s, and if it hasn't loaded, considering it
>>> "down" until the next check. This process runs every 2 minutes, from
>>> cron. Quite often a site will fail to load just once, then be back up
>>> the next time -- which is as likely to be a transient routing problem
>>> at my end as an issue at theirs.
>>>
>>> Does anyone have experience monitoring the availability of websites,
>>> or exciting ideas for better approaches to this data?
>>
>> Quick check, details below, but for starters it looks like there might
>> be a reverse DNS problem with metacpan.org. I'll have a more detailed
>> look later.
>
> Just a misconfiguration on one of the servers, unlikely to be anything
> to do with reverse DNS at all,

James' problem may not be due to a reverse DNS lookup problem but it still stands that reverse DNS lookups on the metacpan.org IP addresses fail (see the nslookup output in my first reply).

James - can you confirm it is the accessibility of search.cpan.org itself that people are concerned about, i.e. can you check whether it is CPAN they have a problem with or a CPAN mirror? I've had problems before with the CPAN mirrors "automagically" chosen by cpan configuration.

Back to your test script. How frequent are the failures? Or, put another way, how many times would you expect to have to run it before you saw a failure?

Have fun,
Duncan
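
A rough illustration of Jim's two suggestions above -- recording which server actually answered, and checking a control site in the same run -- is sketched below. It assumes LWP's usual habit of recording the connected endpoint in the Client-Peer response header (not guaranteed in every LWP version); the URLs and labels are again examples only:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new( timeout => 30 );

    # The control is any site expected to be up; if it fails too, the
    # problem is probably local connectivity rather than the target.
    my @checks = (
        [ target  => 'https://metacpan.org/search?q=Moose' ],
        [ control => 'http://www.google.com/' ],
    );

    for my $check (@checks) {
        my ( $label, $url ) = @$check;
        my $response = $ua->get($url);

        # Client-Peer (ip:port), when present, identifies which backend
        # answered -- useful when one hostname maps to several addresses.
        my $peer = $response->header('Client-Peer') || 'unknown';

        printf "%-7s %-4s peer=%s %s\n",
            $label,
            ( $response->is_success ? 'up' : 'DOWN' ),
            $peer, $response->status_line;
    }
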
From duncanfyfe at domenlas.com Wed Jul 30 07:11:50 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 15:11:50 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D82E8E.3010707@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com>
Message-ID: <53D8FD26.4060500@domenlas.com>

On 30/07/14 00:30, Duncan Fyfe wrote:
>
> Back to your test script. How frequent are the failures? Or, put another
> way, how many times would you expect to have to run it before you saw
> a failure?
>

Ok. I've modified the script (git patch attached, do as you will with it):
a) for fun;
b) to get it running on my server (so I don't have to install Config::YAML);
c) to dump more HTTP information in the event of a failure;
d) to add https://www.google.com as a "really ought to be working" control.

It has run every 2 minutes since last night (> 360 tries) and I've not seen any failures. I'll get it up and running on another server (different physical locations and network paths) and see what happens there.

Have fun,
Duncan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Replaced-dependence-on-Config-YAML-with-FindBin-my-s.patch
Type: text/x-patch
Size: 6801 bytes
Desc: not available
URL:

From duncanfyfe at domenlas.com Wed Jul 30 08:56:28 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 16:56:28 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D8FD26.4060500@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com>
Message-ID: <53D915AC.9070301@domenlas.com>

On 30/07/14 15:11, Duncan Fyfe wrote:
> On 30/07/14 00:30, Duncan Fyfe wrote:
>>
>> Back to your test script. How frequent are the failures? Or, put another
>> way, how many times would you expect to have to run it before you saw
>> a failure?
>>
> Ok. I've modified the script (git patch attached, do as you will with it):
> a) for fun;
> b) to get it running on my server (so I don't have to install
> Config::YAML);
> c) to dump more HTTP information in the event of a failure;
> d) to add https://www.google.com as a "really ought to be working" control.
>
> It has run every 2 minutes since last night (> 360 tries) and I've not
> seen any failures. I'll get it up and running on another server
> (different physical locations and network paths) and see what happens
> there.
>

Couple more patches as promised. The first just removes an unnecessary dependency. The second adds new tables (results_2 and dumps_2) which have an added hostid column so we can merge results. It also adds a quick bash script, with the necessary SQL, to copy data from the results to results_2 table.

Have fun,
Duncan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Remove-unnecessary-dependence-on-Data-Dumper.patch
Type: text/x-patch
Size: 518 bytes
Desc: not available
URL:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Deprecate-results-and-dumps-tables.patch
Type: text/x-patch
Size: 6066 bytes
Desc: not available
URL:
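
For anyone not reading the patches, the general shape of the copy Duncan describes -- tagging each existing row with a hostid as it moves into results_2 -- might look something like the sketch below. Only the results/results_2 table names and the hostid column come from the thread; the SQLite backend and the checked_at/site/status columns are invented purely for illustration:

    #!/usr/bin/perl
    # Hypothetical sketch of the results -> results_2 copy; column names other
    # than hostid, and the SQLite backend, are assumptions for illustration.
    use strict;
    use warnings;
    use DBI;

    my ( $dbfile, $hostid ) = @ARGV;
    die "usage: $0 dbfile hostid\n" unless $dbfile && $hostid;

    my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", '', '',
        { RaiseError => 1, AutoCommit => 0 } );

    # Tag every legacy row with the host it was collected on, so rows from
    # several monitoring machines can later live in one merged table.
    $dbh->do(
        'INSERT INTO results_2 (hostid, checked_at, site, status)
         SELECT ?, checked_at, site, status FROM results',
        undef, $hostid,
    );

    $dbh->commit;
    $dbh->disconnect;
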
From jkg at earth.li Wed Jul 30 14:47:15 2014
From: jkg at earth.li (James Green)
Date: Wed, 30 Jul 2014 22:47:15 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D915AC.9070301@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com>
Message-ID:

Hi all, thanks for the responses!

On 30 July 2014 16:56, Duncan Fyfe wrote:
> On 30/07/14 15:11, Duncan Fyfe wrote:
>> On 30/07/14 00:30, Duncan Fyfe wrote:
>>> Back to your test script. How frequent are the failures? Or, put another
>>> way, how many times would you expect to have to run it before you saw
>>> a failure?

Interestingly there haven't been any failures at all in the last 24 hours -- looking back further, I'm averaging 3-4 per day (of 720 attempts) for MetaCPAN and closer to 1 for CPANsearch, almost never at the same time as one another (although if whichever I check first - which I seem to recall is randomised - times out, the other test starts up to 30 seconds later, so any local issue could have cleared up).

That makes me wonder if I should do the checks asynchronously and kick them all off as close together as possible. Hmm.

>> Ok. I've modified the script (git patch attached, do as you will with it):

[some elided]

>> c) to dump more HTTP information in the event of a failure;

I considered that, but I decided I didn't want data I'd have to manually process. I guess you're right though -- I should capture it somewhere, at least, for later reference...

> Couple more patches as promised. The first just removes an unnecessary
> dependency.

Whoops. Leaving Data::Dumper lying around in production is a bad habit of mine...

> The second adds new tables (results_2 and dumps_2) which
> have an added hostid column so we can merge results. It also
> adds a quick bash script, with the necessary SQL, to copy data from
> the results to results_2 table.

I'll take a closer look at this (probably at the weekend) but it sounds like a good idea -- more data will hide any minor anomalies, and testing from more locations will rule out any local issues. I had been trying to think of a (fair, sane) way to "smooth" the data by just dropping things that were clearly not real problems -- perhaps re-testing more frequently after a failure and ignoring it if the recovery was quick -- but this is probably a more sensible approach.

Thanks again,
James
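
On the asynchronous idea James raises above: one way to kick all the checks off at (nearly) the same moment is to fork one child per site, for example with Parallel::ForkManager. This is a sketch of the approach rather than anything from the cpan-uptime code, and Parallel::ForkManager would be a new dependency; a real version would have each child record its own result rather than printing:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Parallel::ForkManager;    # assumed extra dependency

    my %sites = (
        metacpan   => 'https://metacpan.org/search?q=Moose',
        cpansearch => 'http://search.cpan.org/search?query=Moose&mode=all',
    );

    my $pm = Parallel::ForkManager->new( scalar keys %sites );

    for my $name ( keys %sites ) {
        $pm->start and next;    # parent moves straight on; child runs the check

        my $ua       = LWP::UserAgent->new( timeout => 30 );
        my $response = $ua->get( $sites{$name} );

        # Children don't share memory with the parent, so each one would
        # write its own result row; printing stands in for that here.
        printf "%s %s (%s)\n", $name,
            ( $response->is_success ? 'up' : 'DOWN' ),
            $response->status_line;

        $pm->finish;
    }

    $pm->wait_all_children;
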
From duncanfyfe at domenlas.com Wed Jul 30 15:39:10 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 23:39:10 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To:
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com>
Message-ID: <53D9740E.5070404@domenlas.com>

>
> Interestingly there haven't been any failures at all in the last 24
> hours -- looking back further, I'm averaging 3-4 per day (of 720
> attempts) for MetaCPAN and closer to 1 for CPANsearch, almost never at
> the same time as one another (although if whichever I check first -
> which I seem to recall is randomised - times out, the other test
> starts up to 30 seconds later, so any local issue could have cleared
> up).
>
> That makes me wonder if I should do the checks asynchronously and kick
> them all off as close together as possible. Hmm.

Makes me wonder if metacpan has a low maximum number of concurrent connections and your failed connections just happen to hit a busy period (such as when lots of mirrors are resynching). A concurrency problem might be down to limits on the webserver or backend DB.

>
>>> Ok. I've modified the script (git patch attached, do as you will with it):
>
> [some elided]
>
>>> c) to dump more HTTP information in the event of a failure;
>
> I considered that, but I decided I didn't want data I'd have to
> manually process. I guess you're right though -- I should capture it
> somewhere, at least, for later reference...

I just want more data from at least one failure. After a few failures the dump code can be disabled.

>
>> Couple more patches as promised. The first just removes an unnecessary
>> dependency.
>
> Whoops. Leaving Data::Dumper lying around in production is a bad habit
> of mine...

It was me this time. I have to admit though, I have used Data::Dumper as part of the Logging and Exception handling in production code[1] and would not be afraid to do so again. It is a really powerful tool for debugging subtle problems, but like any powerful tool there are places you can safely use it and there are times you should have known better.

[1] https://github.com/DuncanFyfe/application-toolkit-perl Msg.pm and Exception.pm classes.

>
>> The second adds new tables (results_2 and dumps_2) which
>> have an added hostid column so we can merge results. It also
>> adds a quick bash script, with the necessary SQL, to copy data from
>> the results to results_2 table.
>
> I'll take a closer look at this (probably at the weekend) but it
> sounds like a good idea -- more data will hide any minor anomalies,
> and testing from more locations will rule out any local issues. I had
> been trying to think of a (fair, sane) way to "smooth" the data by
> just dropping things that were clearly not real problems -- perhaps
> re-testing more frequently after a failure and ignoring it if the
> recovery was quick -- but this is probably a more sensible approach.

I wouldn't invest too much more time in the script. We can easily filter the data as necessary into appropriate subsets. The most interesting data will be that from multiple machines close to a failure on one machine. The script as is should reveal symptoms, e.g. whether the observed failures are specific to a machine or to particular times (e.g. coinciding with multiple mirrors synching), but I fear it will be difficult to get a definitive cause without access to the webserver log files.

Have fun,
Duncan

From duncanfyfe at domenlas.com Thu Jul 31 09:12:52 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Thu, 31 Jul 2014 17:12:52 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D9740E.5070404@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com> <53D9740E.5070404@domenlas.com>
Message-ID: <53DA6B04.8000301@domenlas.com>

Final (honest!) patch attached. It just adds a quick script to merge rows from the results_2 and dumps_2 tables of different databases into one.

See you later.

Have fun,
Duncan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-Add-perl-utility-to-merge-results-databases.patch
Type: text/x-patch
Size: 1605 bytes
Desc: not available
URL:

From jimbo.green at gmail.com Thu Jul 31 09:16:31 2014
From: jimbo.green at gmail.com (James Green)
Date: Thu, 31 Jul 2014 17:16:31 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53DA6B04.8000301@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com> <53D9740E.5070404@domenlas.com> <53DA6B04.8000301@domenlas.com>
Message-ID:

Hey Duncan

On 31 July 2014 17:12, Duncan Fyfe wrote:
>
> Final (honest!) patch attached.
> It just adds a quick script to merge rows from the results_2 and dumps_2
> tables of different databases into one.

Thanks again -- I'll get these merged and apply some of my own prejudices regarding naming etc, and get the new version live, including merging in the existing data, at the weekend with any luck.

Really appreciate it -- got to be worth a beer tonight, if you can still make it :-)

Cheers,
James