Flash video downloader in perl

Tom Hukins tom at eborcom.com
Tue Jan 27 01:26:00 PST 2009


On Mon, Jan 26, 2009 at 11:14:55PM +0000, Andy Selby wrote:
> 
> I'm in the process of writing a program that downloads flash video
> from sites that, unlike youtube, doesn't cache the file at
> /tmp/Flash*, instead I'm having to visually search (because there are
> no line breaks) /tmp/plugtmp*/plugin-PlaylistInfoService-1.asmx to
> find the download link.

Normally, I'd say "that sounds like a job for WWW::Mechanize", but I
don't think it makes life especially easy for getting at SWF files.

However, you might find LWP::Simple (or perhaps Mech itself) useful
for fetching the HTML rather than grubbing around in /tmp and making
assumptions about where the browser happens to cache the HTML.

I'd use HTML::Tree for parsing the HTML because it has a lovely
interface and makes code more readable and robust than parsing HTML
with regular expressions.
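
As a rough sketch of the HTML::Tree approach: this assumes the page
exposes the video as an ordinary <a href="..."> link ending in .flv,
which may well not be true of the playlist file you're looking at.
find_flv_links() is a name I've made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the href of every <a> element whose
# link ends in ".flv".  Assumes the video URL appears as a plain link.
sub find_flv_links {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down() in list context returns every matching element.
    my @urls = map { $_->attr('href') }
        $tree->look_down( _tag => 'a', href => qr/\.flv\z/ );

    $tree->delete;    # free the parse tree once we're done with it
    return @urls;
}
```

You'd pass it whatever LWP::Simple::get() returns for the page, after
checking that get() didn't return undef.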

> while(<>)   # This will be replaced by a command to read
>             # /tmp/plugtmp*/plugin-PlaylistInfoService-1.asmx

I'd use LWP::Simple::get() here.

> while (/\"\&gt\;(http\:\/\/flash\.vx\.roo\.com\/streamingVX\/\d*\/\d*\/(\d*|\d*\-\d*\-\d*)\_\w*.flv)\&lt/g)

A few comments on this regex:

If you find yourself regularly escaping the '/' character, use a
different delimiter.  m{http://} looks cleaner than /http:\/\//.

Break your regex into multiple lines using the /x modifier.  You don't
write lots of Perl expressions on one line - you break them up into
multiple lines and comment them.  The /x modifier lets you do the same
with regular expressions:
http://www.perl.com/pub/a/2004/01/16/regexps.html
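
Applying both suggestions to the regex quoted above gives something
like the following sketch. I've also escaped the dot before "flv" and
made the inner group non-capturing, neither of which the original did,
and extract_flv_urls() is just an illustrative name.

```perl
use strict;
use warnings;

# The same pattern with {} delimiters and /x, so it can be laid out
# and commented.  Avoid sigils and braces in the comments: they are
# still seen by Perl's delimiter matching and interpolation.
my $flv_url = qr{
    "&gt;                         # end of the escaped opening tag
    (                             # capture 1: the whole video URL
        http://flash\.vx\.roo\.com/streamingVX/
        \d* / \d* /               # two numeric path segments
        (?: \d* | \d*-\d*-\d* )   # plain or dash-separated number
        _ \w* \.flv               # underscore, name and extension
    )
    &lt;                          # start of the escaped closing tag
}x;

# Return every video URL found in the given text.
sub extract_flv_urls {
    my ($text) = @_;
    my @urls;
    while ( $text =~ /$flv_url/g ) {
        push @urls, $1;
    }
    return @urls;
}
```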

> print "$1\n";

At this point you could use LWP::Simple::get() to fetch the SWF file,
assuming that's all you need.  If the SWF itself tries to fetch other
files, take a look at Firefox's Firebug plugin and use its 'Net' tab
to see what other files get downloaded.
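
If you want the file written straight to disk rather than held in
memory, LWP::Simple's getstore() is an alternative to get(). Here's a
minimal sketch; save_video() is a hypothetical wrapper of my own, not
something the module provides.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: download $url into the local file $file and
# die if the server (or the filesystem) reports a failure.
sub save_video {
    my ($url, $file) = @_;

    # getstore() returns the HTTP status code of the response.
    my $status = getstore($url, $file);
    die "Fetching $url failed with status $status\n"
        unless is_success($status);

    return $file;
}
```

Calling save_video($1, 'video.flv') inside your loop would then
replace the print statement; the file name there is only an example.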

Finally, I notice you haven't indented your blocks:
if ($x) {
if ($y) {
print "x and y are true\n";
}
}

You may not have done this in your real code, but if you have,
indenting makes your code far easier to read:
if ($x) {
    if ($y) {
        print "x and y are true\n";
    }
}

If you find your code has lots of indentation and moves too far to the
right of the screen, that's a good sign you need to use subroutines.
I'd consider using subroutines even in a short program like this as
they help make each part of the program self-contained and
self-explanatory.

For example, the main shell of the program might look like:
sub main {
    my ($url, $file) = @_;
    my $html = get($url);
    my $swf_url = find_swf_url($html);
    getstore($swf_url, $file);
}
main(@ARGV);

LWP::Simple defines get() and getstore().  getstore() needs the name
of a local file to save to as its second argument, so main() takes
one too.  But the point is, you've made the high level
code readable.  Using subroutines makes it easier to write unit tests
for your code, should you decide to do that, but that's probably
another story for another day.  There's no harm in making life easier
for your future self, though, if it takes minimal effort.
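
To give a flavour of the unit testing point, here's a small sketch
using Test::More. The body of find_swf_url() is a placeholder I've
invented so the example runs, and the example.com URL is made up;
your real version would contain your own matching logic.

```perl
use strict;
use warnings;
use Test::More;

# Placeholder implementation (an assumption, not your real code):
# return the first http URL ending in ".flv", or undef if none.
sub find_swf_url {
    my ($html) = @_;
    my ($url) = $html =~ m{(http://[^\s"'<>&]+\.flv)};
    return $url;
}

# Because the subroutine takes a string and returns a string, it can
# be tested without touching the network or the browser's cache.
is( find_swf_url('<a href="http://example.com/v/clip.flv">clip</a>'),
    'http://example.com/v/clip.flv',
    'finds the video URL in a link' );

is( find_swf_url('<p>no video here</p>'),
    undef,
    'returns undef when nothing matches' );

done_testing();
```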

I hope you find that useful.  Let us know if you get stuck.

And, of course, I'll see you and everyone else in the pub this
evening.

Tom

