[pm-h] Marriage License Data
Mike Flannigan
mikeflan at att.net
Wed Jan 17 15:40:47 PST 2018
Thanks so much for looking into that, Robert.
I was trying to download this information by doing a
use LWP::Simple;
my $retcode1 = getstore( $second, "$dir/$first" );
on links like this:
https://web.archive.org/web/20131005142948/http://freepages.genealogy.rootsweb.ancestry.com:80/~caulleyfamilyinfo/MissouriMarriages/Franklin18451864BookBConsolidatedIndex.txt
Which gives me a text file similar to the attached HTM file.
That file has a bunch of HTML in it that produces the data in a
text scroll if you open it in a browser. I am embarrassed to
say that all my efforts to obtain the data straight away were
unsuccessful. I expected
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0"); # pretend we are a very capable browser
or
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
to work, but they download the same HTML file
as the attached one.
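One thing that may be worth trying (a sketch, not something confirmed against these particular links): the Wayback Machine accepts an "id_" modifier placed right after the timestamp in the URL, which asks it to return the archived bytes as originally captured instead of the rewritten page with the archive's own HTML around it. Changing the User-Agent would not affect that. The helper names wayback_raw_url and fetch_raw below are made up for illustration, and fetching https links with LWP assumes LWP::Protocol::https is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): insert the "id_" modifier after
# the Wayback Machine timestamp so the archive returns the original content.
sub wayback_raw_url {
    my ($url) = @_;
    # Only rewrite when the timestamp is followed directly by "/",
    # i.e. when no modifier such as "id_" is already present.
    $url =~ s{(/web/\d{4,14})(?=/)}{${1}id_};
    return $url;
}

# Hypothetical wrapper: download one archived file and return the HTTP
# status code reported by getstore(). Assumes network access.
sub fetch_raw {
    my ($url, $file) = @_;
    require LWP::Simple;    # loaded lazily so the helper above works without it
    return LWP::Simple::getstore( wayback_raw_url($url), $file );
}
```

With that in place, the original call would become something like fetch_raw( $second, "$dir/$first" ), and a return code of 200 would indicate the file was stored. Whether the stored file is then the plain text index is an assumption that needs checking against the real links.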
I'll probably figure this out someday, but for the moment I am
trying to limit the time I spend on this embarrassing situation :-)
But I just can't help myself. I am still working on it a little.
This is not a lot of data, so I can certainly get it, but I am
more interested in fixing the problem I have obtaining the data
than actually getting the data. I don't really care too much
about the data, but others do.
Mike
On 1/17/2018 1:21 PM, Robert Stone wrote:
> (resending this without images due to mailing list size limit)
>
> Greetings,
>
> tl;dr - While the form likely connected to a database/datastore and
> there is no way to retrieve that, the wayback machine archived a lot
> (but not all) of the data in another format.
>
> *The Bad News*
>
> So for funsies I took a look at this form and the HTML for it. Turns
> out that the information entered is POST'ed back to the server at
> yearlastwild.asp to handle the request. Just to be absolutely
> certain, I went ahead and submitted a request monitoring the network
> traffic and confirmed the POST request. That ASP script was likely
> connecting to some sort of database to retrieve and then format the
> data for presentation.
>
> Just to be SUPER certain there wasn't a whole huge blob of javascript
> representing the dataset (which would be incredibly unlikely, but you
> never know...), I checked the size of each request: the largest is
> 27.7 KB, and it's for a font.
>
> *The Good News*
>
> Well, then, let's see if the marriage data is presented in any other
> format on the site, like a big huge list. Crazier things have happened...
>
> https://web.archive.org/web/20030208012802/http://vienici.com:80/abmomarr.html
>
>
> If we scroll down we can see Washington County and if we select the
> He- we can see the same entry for Henry S:
>
> https://web.archive.org/web/20030219131906/http://vienici.com:80/moabs/xmarrwash/xhe-j.html
>
> Which actually matches the data from yearlastwild.asp (although, only
> the name and date are contained here and not the description).
>
> So it seems for Washington County there is some data and possibly more
> from the Washington County GenWeb. I do see for other counties there is
> much more data, such as Franklin County:
> https://web.archive.org/web/20030407195843/http://www.vienici.com:80/mofran/vB/p201225.html
>
> With some work and a whole bunch of parsing you could recreate a good
> chunk! Of course, I'd probably hunt high and low to see if someone
> else had this dataset I could use (or buy) but nice to know at least
> parts of it live on.
>
> Hopefully you find the above helpful.
>
> Best Regards,
> Robert Stone
>
> On Tue, Jan 16, 2018 at 8:54 PM, Mike Flannigan <mikeflan at att.net> wrote:
>
>
> This is an archive of a website that went dead in 2011:
> https://web.archive.org/web/20090609191130/http://www.vienici.com:80/moabs/lookups.html
>
> The 3rd search box (link) takes you to:
> https://web.archive.org/web/20090306211924/http://www.vienici.com:80/moabs/yearlastwild.asp
>
> The search does not work on that page, for obvious reasons. I have
> looked at the page source and decided the search was run by
> javascript, but I could be wrong about that. If you are snowed in and
> have some time to devote to this, what I want to know is what format
> the marriage license data was in on this guy's server. I don't think
> that can be told from the page source, but I thought I would ask you
> guys. Perhaps you would need the ASP file to tell that? It was not a
> huge amount of data, so it could have been in almost any format.
>
> The reason I am asking is because we are trying to find that data
> 6 years after the guy died.
>
> I'm pretty sure he had an account at the Wayback Machine, and he may
> have stored the data there, in addition to other places.
>
>
> Mike
> _______________________________________________
> Houston mailing list
> Houston at pm.org
> http://mail.pm.org/mailman/listinfo/houston
> Website: http://houston.pm.org/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/houston/attachments/20180117/9d016d99/attachment.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/houston/attachments/20180117/9d016d99/attachment.htm>