[Nh-pm] npr2mp3
Kevin D. Clark
kevin_d_clark at access-4-free.com
Tue Feb 22 21:30:52 PST 2005
Version 0.2 of my npr2mp3 script is attached.
Enjoy!
--kevin
--
GnuPG ID: B280F24E And the madness of the crowd
alumni.unh.edu!kdc Is an epileptic fit
-- Tom Waits
-------------- next part --------------
#!/usr/bin/perl -w
# Author: Kevin D. Clark (alumni.unh.edu!kdc)
# Copyright 2005 Kevin D. Clark
# This program makes generates content (for example, mp3 files) from
# content sites (like NPR) which allow access to their content but not in
# a format that is always convenient for their audience.
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# (version 2) as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
###########################################################################
# Usage: npr2mp3 <showname>
# where <showname> is the name of the show that you want
#
# See %url_info for a list of shows.
# ("npr5sum" is a good place to start)
# What you need to run this program:
#
# A Linux box, Perl, a properly setup soundcard, RealPlayer,
# sox, vsound, notlame
###########################################################################
# "An explanation of our rejection of respondents' unprecedented attempt
# to impose copyright liability upon the distributors of copying
# equipment requires a quite detailed recitation of the findings of the
# District Court. In summary, those findings reveal that the average
# member of the public uses a VTR principally to record a program he
# cannot view as it is being televised and then to watch it once at a
# later time. This practice, known as "time-shifting," enlarges the
# television viewing audience. For that reason, a significant amount of
# television programming may be used in this manner without objection
# from the owners of the copyrights on the programs. For the same
# reason, even the two respondents in this case, who do assert
# objections to time-shifting in this litigation, were unable to prove
# that the practice has impaired the commercial value of their
# copyrights or has created any likelihood of future harm. Given these
# findings, there is no basis in the Copyright Act upon which
# respondents can hold petitioners liable for distributing VTR's to the
# general public. The Court of Appeals' holding that respondents are
# entitled to enjoin the distribution of VTR's, to collect royalties on
# the sale of such equipment, or to obtain other relief, if affirmed,
# would enlarge the scope of respondents' statutory monopolies to
# encompass control over an article of commerce that is not the subject
# of copyright protection. Such an expansion of the copyright privilege
# is beyond the limits of the grants authorized by Congress."
#
# Supreme Court Justice John Paul Stevens, writing for the majority,
# SONY CORP. v. UNIVERSAL CITY STUDIOS, INC., 464 U.S. 417 (1984)
# Digital files cannot be made uncopyable, any more than water
# can be made not wet.
# --Bruce Schneier
# Kevin's comment: if you use this script and haven't donated generously
# to your local NPR station, bad karma is coming your way.
###########################################################################
# Version history:
#
# 0.1 - 11-feb-2005
# Initial version.
#
# 0.2 - 22-feb-2005
# Changed to no longer depend on C program built on on the fly; this
# program now uses "vsound" to capture audio. Thanks to Travis for
# telling me about vsound! An added benefit of using vsound is that
# you can encode content without it being directed to your speakers
# (so, you can listen to one program while you encode another).
# I am told that invoking vsound with "--dspout" turns this on/off.
# Another benefit of using vsound is that this eliminates the
# weirdness seen on platforms like Fedora Core 2 (very slow audio).
#
#
#
#
#
###########################################################################
# TODO:
# The code could always be cleaned up a bit more.
#
# Add documentation.
#
# Add option to make RealAudio silent, just save to the raw file
# and generate the mp3 file. This would be useful if you want to listen
# to something else while you're making an mp3. (FIXED IN 0.2)
#
# Invoke sox with the earwax option. (needs some work with vsound)
#
# Investigate weirdness on FC2 (FIXED IN 0.2)
#
# Make sampling rate flexible (FIXED)
#
# Make this work on other interesting platforms, like FreeBSD and MacOSX.
#
# Investigate what it would take to run this on a system without
# a soundcard. (for Travis)
#
# Investigate what it would take to run this on a system
# without X. (for Travis)
#
# Make the whole program even more flexible so that if today's show isn't
# available when you ask for it, yesterday's is retrieved instead.
#
# Dump the MP3 files to a more logical or configurable location.
#
# This might be getting to the size where we could make some of this
# more OO, as well as split out the audio-saving functionality.
#
# Put a failsafe in the code that catches the situation where close()
# isn't called -- at that point the code that encodes the sampling speed
# in the filename isn't called, which means that a stream that we might have
# been downloading for a while could get nuked.
#
# Perhaps we could enode the mp3's at a lower sample rate, to make the
# resulting files smaller?
#
# Come up with a general way to pass arguments to vsound and notlame.
use strict;
use LWP::UserAgent;
use HTML::TokeParser;
use POSIX (qw/dup2/);
use Getopt::Std;
use Date::Calc qw(Today Day_of_Week Add_Delta_Days Date_to_Time
Day_of_Week_to_Text Date_to_Text Localtime);
# Q: How do I install the Date::Calc module on my computer?
# A: One way would be to, as root, type "perl -MCPAN -e shell" and then
# at the prompt type "install Data::Calc".
###########################################################################
#
# GLOBAL DATA STRUCTURES
#
my $DEBUG = 0;
my $show = "npr5sum"; # default show we are interested in
# information about various websites
my $day_of_week_re = qr(monday|tuesday|wednesday|thursday|friday|saturday|sunday)i;
my %url_info = (
cartalktest => {
# Where is the web page that contains the RealAudio link?
# This can either be a sting that contains the URL or a ref to a
# function that returns a string that contains the URL.
url => "http://cartalk.com/Radio/Show/online.html",
# what RealAudio link are we looking for?
# This is a regular expression.
ralinkre => qr(^\s*Segment\s+1\s*:\s*$)i, # matches "Segments 1"
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
# ralinktrans => \&some_code_ref,
},
cartalk => {
url => "http://cartalk.com/Radio/Show/online.html",
ralinkre => qr(Segments\s+1\s*-\s*\d+)i, # matches "Segments 1 - 10"
},
waitwait => {
url => "http://www.npr.org/programs/waitwait/",
ralinkre => qr(Listen to the show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
atc => { # all things considered
url => "http://www.npr.org/programs/atc/",
ralinkre => qr(Listen to ${day_of_week_re}'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
morning => { # morning edition
url => "http://www.npr.org/programs/morning/",
ralinkre => qr(Listen to ${day_of_week_re}\'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
npr5sum => { # Handy 5 minute summary of news
url => "http://www.npr.org/",
ralinkre => qr(Hourly Newscast)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
fresh_air => {
url => "http://freshair.npr.org/",
ralinkre => qr(Listen to ${day_of_week_re}\'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
wesat => { # weekend edition saturday
url => "http://www.npr.org/programs/wesat/",
ralinkre => qr(Listen to ${day_of_week_re}\'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
wesun => { # weekend edition sunday
url => "http://www.npr.org/programs/wesat/",
ralinkre => qr(Listen to ${day_of_week_re}\'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
totn => { # talk of the nation
url => "http://www.npr.org/programs/totn/",
ralinkre => qr(Listen to ${day_of_week_re}\'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
day => {
url => "http://www.npr.org/programs/day/",
ralinkre => qr(Listen to ${day_of_week_re}\'s show)i,
# Is there some function that we want to call to transmogrify
# the RealAudio link? If so, list one here.
ralinktrans => \&npr_js_link_trans,
},
marketplace => {
url => "http://www.marketplace.org/",
ralinkre => qr(Listen to P.M. show)i,
},
herenow => { # here and now
url =>
sub {
# we need a url like this:
# http://www.here-now.org/shows/2005/02/20050203.asp
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
$year += 1900;
sprintf("http://www.here-now.org/shows/%d/%.2d/%d%.2d%.2d.asp",
$year, $mon+1, $year, $mon+1, $mday);
},
ralinkre => qr(Listen to the show)i,
},
prairie => {
url =>
sub {
# we want a url like this
# http://prairiehome.publicradio.org/programs/2005/02/05/
#
# Let's just assume that we're always looking to listen to last
# Saturday's show. If you are running this script on a Saturday
# night and expecting to get today's show, you really need to
# get a life.
my $saturday_dow = 6; # 6 = Saturday
my @today = Today();
my $current_dow = Day_of_Week(@today);
my $delta = (($current_dow == 7) ? 1 : ($current_dow +1));
my @prev_saturday = (Add_Delta_Days(@today, (-1 * $delta)),0,0,0);
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime(Date_to_Time(@prev_saturday));
sprintf("http://prairiehome.publicradio.org/programs/%d/%.2d/%.2d/",
$year + 1900, $mon+1, $mday+1);
},
ralinkre => qr(Listen to the whole show)i,
ralinktrans =>
sub {
# this needs to be fixed, because technically what I am doing
# here isn't correct.
my ($relurl) = @_;
my $result = "http://prairiehome.publicradio.org" . $relurl;
if ($DEBUG) {
print "Transforming link from: $relurl\n";
print " to: $result\n";
}
$result;
},
},
);
my %opts = (
# kindof like perl/sed's -n flag
# if -n is specified, then audio won't be sent to the dsp device.
n => 0,
);
my $tempdir="/tmp/npr2mp3.$$";
my $ratmpfile = "$tempdir/ratmp";
###########################################################################
###########################################################################
###########################################################################
# given a $html document, this routine extracts the $nth link
# identified with $linkre
sub extract_link($$$) {
my($html, $linkre, $nth) = @_;
my $p = HTML::TokeParser->new(\$html);
while (my $token = $p->get_tag("a")) {
my $url = $token->[1]{href} || "-";
my $text = $p->get_trimmed_text("/a");
if ($text =~ /$url_info{$show}{ralinkre}/ && (--$nth <= 0)) {
return $url;
}
}
return undef;
}
###########################################################################
sub get_show_url {
my ($u) = @_;
my $result = undef;
if (! ref($u)) {
$result = $u;
}
elsif (ref($u) eq "CODE") {
$result = &$u;
}
# print "get_show_url returns :$result:\n";
$result;
}
###########################################################################
sub kill_kill_kill(@) {
# One of these days...
# -- Pink Floyd
foreach my $pid (@_) {
kill(0, $pid) && kill(&POSIX::SIGHUP, $pid) && sleep(1);
kill(0, $pid) && kill(&POSIX::TERM, $pid) && sleep(1);
kill(0, $pid) && kill(9, $pid);
}
}
##########################################################################
sub start_ra() {
# kill off any RealPlayer application that is already running
#
# this is kindof gross
system("kill `ps -elf | egrep '[r]ealplay' | awk '{print \$4}'` >/dev/null 2>&1");
start_in_background("vsound -f $tempdir/$show.wav realplay --quit $ratmpfile");
}
##########################################################################
sub start_in_background($) {
my ($cmd) = @_;
my ($pid, $fd);
my @cmd = split(/[ \t\n]/, $cmd); # no point in using $IFS
if (!defined($pid = fork())) {
die "cannot fork: $!";
} elsif ($pid == 0) {
# child
$fd = POSIX::open("/dev/null", &POSIX::O_RDONLY)
|| die "Can't open stdin /dev/null: $!\n";
dup2($fd, 0);
$fd = POSIX::open("/dev/null", &POSIX::O_WRONLY | &POSIX::O_CREAT)
|| die "Can't open stdout /dev/null: $!\n";
dup2 $fd, 1;
# leave stderr alone in case Something Bad happens
exec(@cmd);
die "can't exec '$cmd': $!";
} else {
# parent
}
return $pid;
}
###########################################################################
sub npr_js_link_trans($) {
my ($nprjslink) = @_;
# we want to transform something like this:
# javascript:getStaticMedia('/waitwait/20050108_waitwait','RM,WM')"
# into:
# http://www.npr.org/dmg/dmg.php?mediaURL=/waitwait/20050108_waitwait&mediaType=RM
# also, change:
# javascript:getMedia('ATC','13-Jan-2005','all','WM,RM');
# getMedia(prgCode, showDate, segNum, mediaPreference)
# to
# javascript:getMedia('ATC','13-Jan-2005','all','WM,RM');
$nprjslink =~ s{.*getStaticMedia\('(.*?)'.*}
{http://www.npr.org/dmg/dmg.php?mediaURL=$1&mediaType=RM}x;
$nprjslink =~ s{.*getMedia\('(.*?)'\s*,\s* # "prgCode"
'(.*?)'\s*,\s* # "showDate"
'(.*?)'\s*,\s* # "all"
.*}
{http://www.npr.org/dmg/dmg.php?prgCode=$1&showDate=$2&segNum=&mediaPref=RM&getUnderwriting=1}x;
# /dmg/dmg.php?prgCode=ATC&showDate=13-Jan-2005&segNum=&mediaPref=RM&getUnderwriting=1
# javascript:getMedia('ATC','13-Jan-2005','all','WM,RM');
# prgCode, showDate, segNum, mediaPreference
# "http://www.npr.org/dmg/dmg.php?prgCode=" + prgCode + "&showDate=" + showDate + "&segNum=" + segNum + "&mediaPref=RM", "", "")
# goNewURL("http://www.npr.org/dmg/dmg.php?prgCode=" + prgCode + "&showDate=" + showDate + "&segNum=" + segNum + "&mediaPref=RM", "", "");
# http://www.npr.org/dmg/dmg.php?mediaURL=$1&mediaType=RM}x;
# also, change:
#
# javascript:getNewsCast();
#
# to:
#
# http://www.npr.org/dmg/dmg.php?mediaURL=http://www.npr.org/dmg/dmg.php?getNewsCast=true&mediaType=RM
$nprjslink =~ s{.*getNewsCast\(\).*}
{http://www.npr.org/dmg/dmg.php?getNewsCast=true&NPRMediaPref=RM};
$nprjslink;
}
#
# MAIN ROUTINE
#
getopts('n', \%opts);
$show = shift || "cartalk";
die "Unknown show: $show\n" if (! defined($url_info{$show}));
print "We're getting the audio for this show: $show\n";
my $show_url = get_show_url($url_info{$show}{url});
my $ua = LWP::UserAgent->new;
my $ret = $ua->get($show_url);
mkdir("$tempdir") || die "Unable to mkdir $tempdir: $!\n";
#
# Get the show's main HTML page
#
die "Unable to get $show url: ".$show_url.
": ".$ret->status_line."\n"
if (! $ret->is_success);
my $ralink = extract_link($ret->content, $url_info{$show}{ralink}, 1);
{
no warnings;
die "Unable to find link for $show\n" if ($ralink eq undef);
}
print "Link is '$ralink'\n" if ($DEBUG);
# possibly transform the link
if (defined($url_info{$show}{ralinktrans})) {
$ralink = &{$url_info{$show}{ralinktrans}}($ralink);
print "Link is changed to '$ralink'\n" if ($DEBUG);
}
#
# Get the RealAudio thingie
#
my $rareq = HTTP::Request->new('GET', $ralink);
$ret = $ua->request($rareq, $ratmpfile);
die "Unable to get RealAudio url: ".$ralink.
": ".$ret->status_line."\n"
if (! $ret->is_success);
#
# Run RealAudio application, capturing audio in the background
#
my $rapid = start_ra();
print "Running realplay....\n";
waitpid($rapid, 0);
print "...done\n";
# gross -- I need to research this more
print "Converting streams to mp3 with notlame'...\n";
my $notlamecmd = "notlame $tempdir/$show.wav $tempdir/$show.mp3";
system("echo $notlamecmd");
system( $notlamecmd);
# || die "Problem running notlame!\n";
if (!$DEBUG) {
system("rm -f $tempdir/*.wav $tempdir/ratmp");
}
print "Done. Final file is $tempdir/$show.mp3\n";
More information about the Nh-pm
mailing list