[pm-h] Marriage License Data

Mike Flannigan mikeflan at att.net
Wed Jan 17 15:40:47 PST 2018


Thanks so much for looking into that Robert.

I was trying to download this information with LWP::Simple:
use LWP::Simple;
my $retcode1 = getstore( $second, "$dir/$first" );
on links like this:
https://web.archive.org/web/20131005142948/http://freepages.genealogy.rootsweb.ancestry.com:80/~caulleyfamilyinfo/MissouriMarriages/Franklin18451864BookBConsolidatedIndex.txt

That gives me a text file similar to the attached HTM file.
That file has a bunch of HTML in it that produces the data in a
text scroll if you open it in a browser.  I am embarrassed to
say that all my efforts to obtain the data straight away were
unsuccessful.  I expected
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0"); # pretend we are a very capable browser

or

use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);

to work, but they download the same HTML file
as the attached one.

I'll probably figure this out someday, but for the moment I am
trying to limit the time I spend on this embarrassing situation :-)
But I just can't help myself.  I am still working on it a little.

This is not a lot of data, so I can certainly get it, but I am
more interested in fixing the problem I have obtaining the data
than actually getting the data.  I don't really care too much
about the data, but others do.
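
One untested idea for the download problem: the Wayback Machine returns
the original archived bytes, without its own page wrapper and link
rewriting, when "id_" is appended to the 14-digit timestamp in the
capture URL.  Whether that wrapper is what produces the HTML in the
attached file is an assumption, not something confirmed against this
capture.  In the sketch below, raw_capture_url is a hypothetical helper
name, and the script expects the capture URL and an output file name as
its two arguments.

```perl
use strict;
use warnings;

# Hypothetical helper: insert the "id_" modifier after the 14-digit
# capture timestamp so the Wayback Machine serves the archived content
# as it was originally captured.  URLs that already carry a modifier,
# or that have no timestamp, are returned unchanged.
sub raw_capture_url {
    my ($url) = @_;
    $url =~ s{(/web/\d{14})(?=/)}{${1}id_};
    return $url;
}

# Only fetch when called with a URL and an output file name.
if (@ARGV == 2) {
    require LWP::Simple;    # loaded lazily; only needed for the download
    my ($url, $outfile) = @ARGV;
    my $status = LWP::Simple::getstore(raw_capture_url($url), $outfile);
    die "Download failed with HTTP status $status\n"
        unless LWP::Simple::is_success($status);
}
```

Saved under a name such as fetch_raw.pl (also hypothetical), it would be
run as: perl fetch_raw.pl <capture URL> <output file>.  Fetching https
links with LWP also requires LWP::Protocol::https to be installed.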



Mike



On 1/17/2018 1:21 PM, Robert Stone wrote:
> (resending this without images due to mailing list size limit)
>
> Greetings,
>
> tl;dr - While the form likely connected to a database/datastore and 
> there is no way to retrieve that, the wayback machine archived a lot 
> (but not all) of the data in another format.
>
> *The Bad News*
>
> So for funsies I took a look at this form and the HTML for it.  Turns 
> out that the information entered is POST'ed back to the server at 
> yearlastwild.asp to handle the request.  Just to be absolutely 
> certain, I went ahead and submitted a request monitoring the network 
> traffic and confirmed the POST request. That ASP script was likely 
> connecting to some sort of database to retrieve and then format the 
> data for presentation.
>
> Just to be SUPER certain there wasn't a whole huge blob of javascript 
> representing the dataset (which would be incredibly unlikely, but you 
> never know...), I also checked the sizes of the requests: the largest 
> is 27.7 KB, and it's for a font.
>
> *The Good News*
> Well, then, let's see if the marriage data is presented in any other 
> format on the site, like a big huge list.  Crazier things have happened...
>
> https://web.archive.org/web/20030208012802/http://vienici.com:80/abmomarr.html 
>
>
> If we scroll down we can see Washington County and if we select the 
> He- we can see the same entry for Henry S:
>
> https://web.archive.org/web/20030219131906/http://vienici.com:80/moabs/xmarrwash/xhe-j.html 
>
> Which actually matches the data from yearlastwild.asp (although, only 
> the name and date are contained here and not the description).
>
> So it seems for Washington County there is some data and possibly more 
> from the Washington County GenWeb.  I do see for other counties there is 
> much more data, such as Franklin County:
> https://web.archive.org/web/20030407195843/http://www.vienici.com:80/mofran/vB/p201225.html 
>
> With some work and a whole bunch of parsing you could recreate a good 
> chunk!  Of course, I'd probably hunt high and low to see if someone 
> else had this dataset I could use (or buy) but nice to know at least 
> parts of it live on.
>
> Hopefully you find the above helpful.
>
> Best Regards,
> Robert Stone
>
> On Tue, Jan 16, 2018 at 8:54 PM, Mike Flannigan <mikeflan at att.net> wrote:
>
>
>     This is an archive of a website that went dead in 2011:
>     https://web.archive.org/web/20090609191130/http://www.vienici.com:80/moabs/lookups.html
>
>     The 3rd search box (link) takes you to:
>     https://web.archive.org/web/20090306211924/http://www.vienici.com:80/moabs/yearlastwild.asp
>
>     The search does not work on that page, for obvious reasons.  I have
>     looked at the page source and decided the search was run by
>     javascript, but I could be wrong about that.  If you are snowed in
>     and have some time to devote to this, what I want to know is what
>     format the marriage license data was in on this guy's server.  I
>     don't think that can be told from the page source, but I thought I
>     would ask you guys.  Perhaps you would need the ASP file to tell
>     that??  It was not a huge amount of data, so it could have been in
>     almost any format.
>
>     The reason I am asking is because we are trying to find that data
>     6 years after the guy died.
>
>     I'm pretty sure he had an account at the Wayback Machine, and he
>     may have stored the data there, in addition to other places.
>
>
>     Mike
>     _______________________________________________
>     Houston mailing list
>     Houston at pm.org <mailto:Houston at pm.org>
>     http://mail.pm.org/mailman/listinfo/houston
>     <http://mail.pm.org/mailman/listinfo/houston>
>     Website: http://houston.pm.org/
>
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/houston/attachments/20180117/9d016d99/attachment.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/houston/attachments/20180117/9d016d99/attachment.htm>

