From jimbo.green at gmail.com Wed Jul 2 08:45:13 2014
From: jimbo.green at gmail.com (James Green)
Date: Wed, 2 Jul 2014 16:45:13 +0100
Subject: [Nottingham-pm] Nottingham.pm July social, plus, progress on technical meetings!
Message-ID:

Hi all

First, the important part: July's social will be a return to the Canalhouse, from 7pm onwards on a date to be confirmed. I've set up another Doodle to help pick a date, at http://doodle.com/zcqcx9myy4h7i6g7 -- I'll close it late next week and announce a definite date.

Secondly: we now seem to have a suitable venue (actually, two options, but one involves more cost and hassle) for technical meetings! Hopefully we also still have some willing speakers - although more volunteers are appreciated, especially since, if this goes well, I intend to make it a semi-regular event :-)

For those who don't know, the general format of these events is between 2 and 5 (ish) presentations, of anywhere between 10 and 60 (ish) minutes in length - hopefully totalling around 90 minutes to 2 hours. There's then usually an exodus to a nearby (or in our case, probably, downstairs) pub to chat about the interesting things we've just heard about.

I've spoken to a few of you individually already, but if anybody would like to speak, on a topic even tangentially related to Perl, please let me know -- either by email, or by tracking me down on irc.perl.org (where I'm usually 'jkg', and where the group channel is #nottingham.pm)

(As an aside, a common response to the above is "I'm not good enough / don't know anything interesting enough to talk about." -- this is usually not the case! While talks on complex, advanced topics are obviously welcome, we're hoping to attract Perl users of all levels ... which means we need talks of all levels, as well. If you're not sure of your talent for speaking to an audience... it's possible a small, local event like this is a good opportunity to try it out -- I'm told it's quite addictive once you start :-) )

All the best,
James

From jimbo.green at gmail.com Mon Jul 7 12:07:05 2014
From: jimbo.green at gmail.com (James Green)
Date: Mon, 7 Jul 2014 20:07:05 +0100
Subject: [Nottingham-pm] Nottingham.pm July social
Message-ID:

On 2 July 2014 16:45, James Green wrote:
> First, the important part: July's social will be a return to the
> Canalhouse, from 7pm onwards on a date to be confirmed. I've set up
> another Doodle to help pick a date, at
> http://doodle.com/zcqcx9myy4h7i6g7 -- I'll close it late next week and
> announce a definite date.

Hi all

It's now early next week; I'm going to keep the Doodle open for a few more days, so get your selections in within the next 48 hours to avoid disappointment :-)

Cheers,
James

From jimbo.green at gmail.com Thu Jul 10 03:02:45 2014
From: jimbo.green at gmail.com (James Green)
Date: Thu, 10 Jul 2014 11:02:45 +0100
Subject: [Nottingham-pm] Nottingham.pm July Social - confirmed date
Message-ID:

Hi all

There were enough responses on the Doodle to pick a clear winner for the next social meeting date. So I'm very pleased to be able to announce it :-)

WHEN: Thursday, July 31st, from 7pm
WHERE: Canalhouse, Canal St, Nottingham NG1 7EH
MAP: http://thecanalhouse.co.uk/contact.html

The Canalhouse has an excellent beer garden by the canalside, so unless the weather turns I imagine we'll be out there, at least until it gets dark!

We may have Adam Reynolds from Host Europe Group joining us this month; Host Europe (and Heart, and 123-reg) are significant Perl users in Nottingham, and he wants to introduce himself.
I've suggested that he bribe us with beer, and he seemed amenable to this.

I'll also be popping across to FM&C next door to clear up some details ahead of the first technical meeting -- which is looking more and more like happening in early September.

Hopefully see you there!

All the best,
James

From auto at jwdt.co.uk Thu Jul 10 09:28:37 2014
From: auto at jwdt.co.uk (John (AUTO))
Date: Thu, 10 Jul 2014 17:28:37 +0100
Subject: [Nottingham-pm] Nottingham.pm July Social - confirmed date
In-Reply-To:
References:
Message-ID: <53BEBF35.7030906@jwdt.co.uk>

Hi James,

I must apologise for my lack of being around lately. Been rather busy with non-programming things and such.

This one I shall, however, attend with certainty! :D

JT

On 10/07/14 11:02, James Green wrote:
> Hi all
>
> There were enough responses on the Doodle to pick a clear winner for
> the next social meeting date. So I'm very pleased to be able to
> announce it :-)
>
> WHEN: Thursday, July 31st, from 7pm
> WHERE: Canalhouse, Canal St, Nottingham NG1 7EH
> MAP: http://thecanalhouse.co.uk/contact.html
>
> The Canalhouse has an excellent beer garden by the canalside, so
> unless the weather turns I imagine we'll be out there, at least until
> it gets dark!
>
> We may have Adam Reynolds from Host Europe Group joining us this
> month; Host Europe (and Heart, and 123-reg) are significant Perl users
> in Nottingham, and he wants to introduce himself. I've suggested that
> he bribe us with beer, and he seemed amenable to this.
>
> I'll also be popping across to FM&C next door to clear up some details
> ahead of the first technical meeting -- which is looking more and more
> like happening in early September.
>
> Hopefully see you there!
>
> All the best,
> James

From jkg at earth.li Thu Jul 10 10:01:56 2014
From: jkg at earth.li (James Green)
Date: Thu, 10 Jul 2014 18:01:56 +0100
Subject: [Nottingham-pm] Nottingham.pm July Social - confirmed date
In-Reply-To: <53BEBF35.7030906@jwdt.co.uk>
References: <53BEBF35.7030906@jwdt.co.uk>
Message-ID:

Hi John

On 10 July 2014 17:28, John (AUTO) wrote:
> Hi James,
>
> I must apologise for my lack of being around lately. Been rather busy with
> non-programming things and such.

I think you've only missed one, since the last one you were at :-)

> This one I shall, however, attend with certainty! :D

I look forward to it!

Cheers,
James

From jimbo.green at gmail.com Tue Jul 29 10:26:06 2014
From: jimbo.green at gmail.com (James Green)
Date: Tue, 29 Jul 2014 18:26:06 +0100
Subject: [Nottingham-pm] Reminder: Nottingham.pm social at the Canalhouse, Thursday July 31st
Message-ID:

Hey all

Apparently this is only ~48 hours away. Hopefully see you there!

I'll be a little late, while I pop into the FM&C next door to sort out some technical meeting details... but I should be there for 7.15 or so :)

Cheers,
James

From jimbo.green at gmail.com Tue Jul 29 10:35:55 2014
From: jimbo.green at gmail.com (James Green)
Date: Tue, 29 Jul 2014 18:35:55 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
Message-ID:

Hey folks,

Rather than my usual meeting-arranging blather on this list ... I have an actual Perl-related question! OK, it's not very Perl related.
Following a bunch of recent conversations about the future of search.cpan.org, and the fact it was seemingly down all the time, I've started gathering stats on when both it, and metacpan.org, are unreachable.

Unfortunately I'm getting a lot of what I suspect are false positives. I'm using LWP::UserAgent to get() a specific search page from each site, timing out after 30s, and if it hasn't loaded, considering it "down" until the next check. This process runs every 2 minutes, from cron. Quite often a site will fail to load just once, then be back up the next time -- which is as likely to be a transient routing problem at my end as an issue at theirs.

Does anyone have experience monitoring the availability of websites, or exciting ideas for better approaches to this data?

The (only slightly icky) current collection code is at https://github.com/jkg/cpan-uptime/ and you can see the way I currently present it at http://cpan-uptime.dillig.af/, although I'm trying not to publicise that too widely until there's more data, and I have a better handle on presenting it...

All ideas welcome, on-list, off-list, or over beers on Thursday :-)

Cheers,
James
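
As a reference point, a minimal sketch of the kind of probe James describes above -- one GET per site with a 30-second timeout, run from cron, anything other than a success counted as "down" -- might look like this. The URLs, user-agent string and log format here are illustrative assumptions, not taken from the cpan-uptime code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use POSIX qw(strftime);

    # Illustrative probe: one GET per site, 30s timeout, intended to run from cron.
    my %sites = (
        metacpan   => 'https://metacpan.org/search?q=Moose',
        cpansearch => 'http://search.cpan.org/search?query=Moose&mode=all',
    );

    my $ua = LWP::UserAgent->new( timeout => 30, agent => 'uptime-probe/0.1' );

    for my $name ( sort keys %sites ) {
        my $started  = time;
        my $response = $ua->get( $sites{$name} );
        my $elapsed  = time - $started;
        my $status   = $response->is_success ? 'up' : 'down';

        # One line per check; a real version would record this in a database.
        printf "%s %s %s %ds (%s)\n",
            strftime( '%Y-%m-%dT%H:%M:%SZ', gmtime ),
            $name, $status, $elapsed, $response->status_line;
    }
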
From duncanfyfe at domenlas.com Tue Jul 29 11:45:54 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Tue, 29 Jul 2014 19:45:54 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To:
References:
Message-ID: <53D7EBE2.8020507@domenlas.com>

On 29/07/14 18:35, James Green wrote:
> Hey folks,
>
> Rather than my usual meeting-arranging blather on this list ... I have
> an actual Perl-related question! OK, it's not very Perl related.
>
> Following a bunch of recent conversations about the future of
> search.cpan.org, and the fact it was seemingly down all the time, I've
> started gathering stats on when both it, and metacpan.org, are
> unreachable.
>
> Unfortunately I'm getting a lot of what I suspect are false positives.
> I'm using LWP::UserAgent to get() a specific search page from each
> site, timing out after 30s, and if it hasn't loaded, considering it
> "down" until the next check. This process runs every 2 minutes, from
> cron. Quite often a site will fail to load just once, then be back up
> the next time -- which is as likely to be a transient routing problem
> at my end as an issue at theirs.
>
> Does anyone have experience monitoring the availability of websites,
> or exciting ideas for better approaches to this data?
>

Quick check, details below, but for starters it looks like there might be a reverse DNS problem with metacpan.org. I'll have a more detailed look later.

Have fun,
Duncan

=== DETAILS ===

nslookup search.cpan.org
Server:   194.168.4.100
Address:  194.168.4.100#53

Non-authoritative answer:
search.cpan.org canonical name = cpansearch.perl.org.
Name:    cpansearch.perl.org
Address: 207.171.7.59
Name:    cpansearch.perl.org
Address: 207.171.7.49

wget http://search.cpan.org/search?query=Moose&mode=all
OK

wget http://207.171.7.59/search?query=Moose&mode=all
OK

wget http://207.171.7.49/search?query=Moose&mode=all
OK

nslookup metacpan.org
Server:   194.168.4.100
Address:  194.168.4.100#53

Non-authoritative answer:
Name:    metacpan.org
Address: 23.235.37.143
Name:    metacpan.org
Address: 23.235.33.143

wget https://metacpan.org/search?q=Moose
OK

wget --no-check-certificate https://23.235.37.143/search?q=Moose
--2014-07-29 19:04:46--  https://23.235.37.143/search?q=Moose
Connecting to 23.235.37.143:443... connected.
The certificate's owner does not match hostname ‘23.235.37.143’
HTTP request sent, awaiting response... 500 Domain Not Found
2014-07-29 19:04:46 ERROR 500: Domain Not Found.

wget --no-check-certificate https://23.235.33.143/search?q=Moose
--2014-07-29 19:05:15--  https://23.235.33.143/search?q=Moose
Connecting to 23.235.33.143:443... connected.
The certificate's owner does not match hostname ‘23.235.33.143’
HTTP request sent, awaiting response... 500 Domain Not Found
2014-07-29 19:05:15 ERROR 500: Domain Not Found.

nslookup 23.235.37.143
Server:   194.168.4.100
Address:  194.168.4.100#53

** server can't find 143.37.235.23.in-addr.arpa: NXDOMAIN

nslookup 23.235.33.143
Server:   194.168.4.100
Address:  194.168.4.100#53

** server can't find 143.33.235.23.in-addr.arpa: NXDOMAIN

ping -q -c 10 23.235.37.143
PING 23.235.37.143 (23.235.37.143) 56(84) bytes of data.

--- 23.235.37.143 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9014ms
rtt min/avg/max/mdev = 25.587/27.657/37.575/3.352 ms

ping -q -c 10 23.235.33.143
PING 23.235.33.143 (23.235.33.143) 56(84) bytes of data.

--- 23.235.33.143 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 29.846/32.689/43.296/3.717 ms

From auto at jwdt.co.uk Tue Jul 29 12:07:06 2014
From: auto at jwdt.co.uk (auto at jwdt.co.uk)
Date: Tue, 29 Jul 2014 20:07:06 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
Message-ID:

I've done it in bash before, a while ago, using wget. Though it was more a "download this page every minute and take some data from it" script, there was a bit of error reporting in there too. If I can find it I might be able to bring something to the table. :-)

Sent from my HTC

From jim.a.driscoll at gmail.com Tue Jul 29 12:33:18 2014
From: jim.a.driscoll at gmail.com (Jim Driscoll)
Date: Tue, 29 Jul 2014 20:33:18 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D7EBE2.8020507@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com>
Message-ID:

> On 29 Jul 2014, at 19:45, Duncan Fyfe wrote:
>
>> On 29/07/14 18:35, James Green wrote:
>> Hey folks,
>>
>> Rather than my usual meeting-arranging blather on this list ... I have
>> an actual Perl-related question! OK, it's not very Perl related.
>>
>> Following a bunch of recent conversations about the future of
>> search.cpan.org, and the fact it was seemingly down all the time, I've
>> started gathering stats on when both it, and metacpan.org, are
>> unreachable.
>>
>> Unfortunately I'm getting a lot of what I suspect are false positives.
>> I'm using LWP::UserAgent to get() a specific search page from each
>> site, timing out after 30s, and if it hasn't loaded, considering it
>> "down" until the next check. This process runs every 2 minutes, from
>> cron. Quite often a site will fail to load just once, then be back up
>> the next time -- which is as likely to be a transient routing problem
>> at my end as an issue at theirs.
>>
>> Does anyone have experience monitoring the availability of websites,
>> or exciting ideas for better approaches to this data?
>
> Quick check, details below, but for starters it looks like there might
> be a reverse DNS problem with metacpan.org. I'll have a more detailed
> look later.

Just a misconfiguration on one of the servers, unlikely to be anything to do with reverse DNS at all, and that would certainly explain why it breaks "sometimes".

For monitoring purposes there should be a connected IP address and port as properties of HTTP::Response (peeraddr/peerport maybe?), so you should log those on success or failure to identify if there is a bad server. You can persuade LWP::UserAgent to connect to a specified IP address via its proxy functionality, I think, so looping over all IP addresses it maps to on each test cycle would be viable and useful.

On the subject of eliminating local connectivity problems, just ensure that there is also a control (just some unrelated site or sites) which you're also monitoring at the same time - if the control is also down then it's probably your connection.

Jim

From duncanfyfe at domenlas.com Tue Jul 29 16:30:22 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 00:30:22 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To:
References: <53D7EBE2.8020507@domenlas.com>
Message-ID: <53D82E8E.3010707@domenlas.com>

On 29/07/14 20:33, Jim Driscoll wrote:
>
>> On 29 Jul 2014, at 19:45, Duncan Fyfe wrote:
>>
>>> On 29/07/14 18:35, James Green wrote:
>>> Hey folks,
>>>
>>> Rather than my usual meeting-arranging blather on this list ... I have
>>> an actual Perl-related question! OK, it's not very Perl related.
>>>
>>> Following a bunch of recent conversations about the future of
>>> search.cpan.org, and the fact it was seemingly down all the time, I've
>>> started gathering stats on when both it, and metacpan.org, are
>>> unreachable.
>>>
>>> Unfortunately I'm getting a lot of what I suspect are false positives.
>>> I'm using LWP::UserAgent to get() a specific search page from each
>>> site, timing out after 30s, and if it hasn't loaded, considering it
>>> "down" until the next check. This process runs every 2 minutes, from
>>> cron. Quite often a site will fail to load just once, then be back up
>>> the next time -- which is as likely to be a transient routing problem
>>> at my end as an issue at theirs.
>>>
>>> Does anyone have experience monitoring the availability of websites,
>>> or exciting ideas for better approaches to this data?
>>
>> Quick check, details below, but for starters it looks like there might
>> be a reverse DNS problem with metacpan.org. I'll have a more detailed
>> look later.
>
> Just a misconfiguration on one of the servers, unlikely to be anything
> to do with reverse DNS at all,

James' problem may not be due to a reverse DNS lookup problem but it still stands that reverse DNS lookups on the metacpan.org IP addresses fail (see the nslookup output in my first reply).

James - can you confirm it is the accessibility of search.cpan.org itself that people are concerned about, i.e. can you check whether it is CPAN they have a problem with or a CPAN mirror? I've had problems before with the CPAN mirrors "automagically" chosen by cpan configuration.

Back to your test script. How frequent are the failures? Or, put another way, how many times would you expect to have to run it before you saw a failure?

Have fun,
Duncan
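
A rough illustration of Jim's two suggestions above -- recording which server actually answered, and checking a control site in the same run -- is sketched below. It assumes LWP's usual habit of recording the connected endpoint in the Client-Peer response header (not guaranteed in every LWP version); the URLs and labels are again examples only:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new( timeout => 30 );

    # The control is any site expected to be up; if it fails too, the
    # problem is probably local connectivity rather than the target.
    my @checks = (
        [ target  => 'https://metacpan.org/search?q=Moose' ],
        [ control => 'http://www.google.com/' ],
    );

    for my $check (@checks) {
        my ( $label, $url ) = @$check;
        my $response = $ua->get($url);

        # Client-Peer (ip:port), when present, identifies which backend
        # answered -- useful when one hostname maps to several addresses.
        my $peer = $response->header('Client-Peer') || 'unknown';

        printf "%-7s %-4s peer=%s %s\n",
            $label,
            ( $response->is_success ? 'up' : 'DOWN' ),
            $peer, $response->status_line;
    }
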
From duncanfyfe at domenlas.com Wed Jul 30 07:11:50 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 15:11:50 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D82E8E.3010707@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com>
Message-ID: <53D8FD26.4060500@domenlas.com>

On 30/07/14 00:30, Duncan Fyfe wrote:
>
> Back to your test script. How frequent are the failures? Or, put another
> way, how many times would you expect to have to run it before you saw
> a failure?
>

Ok. I've modified the script (git patch attached, do as you will with it):
a) for fun;
b) to get it running on my server (so I don't have to install Config::YAML);
c) to dump more HTTP information in the event of a failure;
d) to add https://www.google.com as a "really ought to be working" control.

It has run every 2 minutes since last night (> 360 tries) and I've not seen any failures. I'll get it up and running on another server (different physical locations and network paths) and see what happens there.

Have fun,
Duncan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Replaced-dependence-on-Config-YAML-with-FindBin-my-s.patch
Type: text/x-patch
Size: 6801 bytes
Desc: not available
URL:

From duncanfyfe at domenlas.com Wed Jul 30 08:56:28 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 16:56:28 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D8FD26.4060500@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com>
Message-ID: <53D915AC.9070301@domenlas.com>

On 30/07/14 15:11, Duncan Fyfe wrote:
> On 30/07/14 00:30, Duncan Fyfe wrote:
>>
>> Back to your test script. How frequent are the failures? Or, put another
>> way, how many times would you expect to have to run it before you saw
>> a failure?
>>
> Ok. I've modified the script (git patch attached, do as you will with it):
> a) for fun;
> b) to get it running on my server (so I don't have to install
> Config::YAML);
> c) to dump more HTTP information in the event of a failure;
> d) to add https://www.google.com as a "really ought to be working" control.
>
> It has run every 2 minutes since last night (> 360 tries) and I've not
> seen any failures. I'll get it up and running on another server
> (different physical locations and network paths) and see what happens
> there.
>

Couple more patches as promised. The first just removes an unnecessary dependency. The second adds new tables (results_2 and dumps_2) which have an added hostid column so we can merge results. It also adds a quick bash script, with the necessary SQL, to copy data from the results to results_2 table.

Have fun,
Duncan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Remove-unnecessary-dependence-on-Data-Dumper.patch
Type: text/x-patch
Size: 518 bytes
Desc: not available
URL:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Deprecate-results-and-dumps-tables.patch
Type: text/x-patch
Size: 6066 bytes
Desc: not available
URL:
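
For anyone not reading the patches, the general shape of the copy Duncan describes -- tagging each existing row with a hostid as it moves into results_2 -- might look something like the sketch below. Only the results/results_2 table names and the hostid column come from the thread; the SQLite backend and the checked_at/site/status columns are invented purely for illustration:

    #!/usr/bin/perl
    # Hypothetical sketch of the results -> results_2 copy; column names other
    # than hostid, and the SQLite backend, are assumptions for illustration.
    use strict;
    use warnings;
    use DBI;

    my ( $dbfile, $hostid ) = @ARGV;
    die "usage: $0 dbfile hostid\n" unless $dbfile && $hostid;

    my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", '', '',
        { RaiseError => 1, AutoCommit => 0 } );

    # Tag every legacy row with the host it was collected on, so rows from
    # several monitoring machines can later live in one merged table.
    $dbh->do(
        'INSERT INTO results_2 (hostid, checked_at, site, status)
         SELECT ?, checked_at, site, status FROM results',
        undef, $hostid,
    );

    $dbh->commit;
    $dbh->disconnect;
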
From jkg at earth.li Wed Jul 30 14:47:15 2014
From: jkg at earth.li (James Green)
Date: Wed, 30 Jul 2014 22:47:15 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D915AC.9070301@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com>
Message-ID:

Hi all, thanks for the responses!

On 30 July 2014 16:56, Duncan Fyfe wrote:
> On 30/07/14 15:11, Duncan Fyfe wrote:
>> On 30/07/14 00:30, Duncan Fyfe wrote:
>>> Back to your test script. How frequent are the failures? Or, put another
>>> way, how many times would you expect to have to run it before you saw
>>> a failure?

Interestingly there haven't been any failures at all in the last 24 hours -- looking back further, I'm averaging 3-4 per day (of 720 attempts) for MetaCPAN and closer to 1 for CPANsearch, almost never at the same time as one another (although if whichever I check first - which I seem to recall is randomised - times out, the other test starts up to 30 seconds later, so any local issue could have cleared up).

That makes me wonder if I should do the checks asynchronously and kick them all off as close together as possible. Hmm.

>> Ok. I've modified the script (git patch attached, do as you will with it):

[some elided]

>> c) to dump more HTTP information in the event of a failure;

I considered that, but I decided I didn't want data I'd have to manually process. I guess you're right though -- I should capture it somewhere, at least, for later reference...

> Couple more patches as promised. The first just removes an unnecessary
> dependency.

Whoops. Leaving Data::Dumper lying around in production is a bad habit of mine...

> The second adds new tables (results_2 and dumps_2) which
> have an added hostid column so we can merge results. It also
> adds a quick bash script, with the necessary SQL, to copy data from
> the results to results_2 table.

I'll take a closer look at this (probably at the weekend) but it sounds like a good idea -- more data will hide any minor anomalies, and testing from more locations will rule out any local issues. I had been trying to think of a (fair, sane) way to "smooth" the data by just dropping things that were clearly not real problems -- perhaps re-testing more frequently after a failure and ignoring it if the recovery was quick -- but this is probably a more sensible approach.

Thanks again,
James
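
On the asynchronous idea James raises above: one way to kick all the checks off at (nearly) the same moment is to fork one child per site, for example with Parallel::ForkManager. This is a sketch of the approach rather than anything from the cpan-uptime code, and Parallel::ForkManager would be a new dependency; a real version would have each child record its own result rather than printing:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Parallel::ForkManager;    # assumed extra dependency

    my %sites = (
        metacpan   => 'https://metacpan.org/search?q=Moose',
        cpansearch => 'http://search.cpan.org/search?query=Moose&mode=all',
    );

    my $pm = Parallel::ForkManager->new( scalar keys %sites );

    for my $name ( keys %sites ) {
        $pm->start and next;    # parent moves straight on; child runs the check

        my $ua       = LWP::UserAgent->new( timeout => 30 );
        my $response = $ua->get( $sites{$name} );

        # Children don't share memory with the parent, so each one would
        # write its own result row; printing stands in for that here.
        printf "%s %s (%s)\n", $name,
            ( $response->is_success ? 'up' : 'DOWN' ),
            $response->status_line;

        $pm->finish;
    }

    $pm->wait_all_children;
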
From duncanfyfe at domenlas.com Wed Jul 30 15:39:10 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Wed, 30 Jul 2014 23:39:10 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To:
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com>
Message-ID: <53D9740E.5070404@domenlas.com>

>
> Interestingly there haven't been any failures at all in the last 24
> hours -- looking back further, I'm averaging 3-4 per day (of 720
> attempts) for MetaCPAN and closer to 1 for CPANsearch, almost never at
> the same time as one another (although if whichever I check first -
> which I seem to recall is randomised - times out, the other test
> starts up to 30 seconds later, so any local issue could have cleared
> up).
>
> That makes me wonder if I should do the checks asynchronously and kick
> them all off as close together as possible. Hmm.

Makes me wonder if metacpan has a low maximum number of concurrent connections and your failed connections just happen to hit a busy period (such as when lots of mirrors are resynching). A concurrency problem might be down to limits on the webserver or backend DB.

>
>>> Ok. I've modified the script (git patch attached, do as you will with it):
>
> [some elided]
>
>>> c) to dump more HTTP information in the event of a failure;
>
> I considered that, but I decided I didn't want data I'd have to
> manually process. I guess you're right though -- I should capture it
> somewhere, at least, for later reference...

I just want more data from at least one failure. After a few failures the dump code can be disabled.

>
>> Couple more patches as promised. The first just removes an unnecessary
>> dependency.
>
> Whoops. Leaving Data::Dumper lying around in production is a bad habit
> of mine...

It was me this time. I have to admit though, I have used Data::Dumper as part of the Logging and Exception handling in production code[1] and would not be afraid to do so again. It is a really powerful tool for debugging subtle problems, but like any powerful tool there are places you can safely use it and there are times you should have known better.

[1] https://github.com/DuncanFyfe/application-toolkit-perl Msg.pm and Exception.pm classes.

>
>> The second adds new tables (results_2 and dumps_2) which
>> have an added hostid column so we can merge results. It also
>> adds a quick bash script, with the necessary SQL, to copy data from
>> the results to results_2 table.
>
> I'll take a closer look at this (probably at the weekend) but it
> sounds like a good idea -- more data will hide any minor anomalies,
> and testing from more locations will rule out any local issues. I had
> been trying to think of a (fair, sane) way to "smooth" the data by
> just dropping things that were clearly not real problems -- perhaps
> re-testing more frequently after a failure and ignoring it if the
> recovery was quick -- but this is probably a more sensible approach.

I wouldn't invest too much more time in the script. We can easily filter the data as necessary into appropriate subsets. The most interesting data will be that from multiple machines close to a failure on one machine. The script as is should reveal symptoms, e.g. whether the observed failures are specific to a machine or to particular times (e.g. coinciding with multiple mirrors synching), but I fear it will be difficult to get a definitive cause without access to the webserver log files.

Have fun,
Duncan

From duncanfyfe at domenlas.com Thu Jul 31 09:12:52 2014
From: duncanfyfe at domenlas.com (Duncan Fyfe)
Date: Thu, 31 Jul 2014 17:12:52 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53D9740E.5070404@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com> <53D9740E.5070404@domenlas.com>
Message-ID: <53DA6B04.8000301@domenlas.com>

Final (honest!) patch attached. It just adds a quick script to merge rows from the results_2 and dumps_2 tables of different databases into one.

See you later.

Have fun,
Duncan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-Add-perl-utility-to-merge-results-databases.patch
Type: text/x-patch
Size: 1605 bytes
Desc: not available
URL:

From jimbo.green at gmail.com Thu Jul 31 09:16:31 2014
From: jimbo.green at gmail.com (James Green)
Date: Thu, 31 Jul 2014 17:16:31 +0100
Subject: [Nottingham-pm] Monitoring website uptimes
In-Reply-To: <53DA6B04.8000301@domenlas.com>
References: <53D7EBE2.8020507@domenlas.com> <53D82E8E.3010707@domenlas.com> <53D8FD26.4060500@domenlas.com> <53D915AC.9070301@domenlas.com> <53D9740E.5070404@domenlas.com> <53DA6B04.8000301@domenlas.com>
Message-ID:

Hey Duncan

On 31 July 2014 17:12, Duncan Fyfe wrote:
>
> Final (honest!) patch attached.
> It just adds a quick script to merge rows from the results_2 and dumps_2
> tables of different databases into one.

Thanks again -- I'll get these merged and apply some of my own prejudices regarding naming etc, and get the new version live, including merging in the existing data, at the weekend with any luck.

Really appreciate it -- got to be worth a beer tonight, if you can still make it :-)

Cheers,
James