SPUG: Reading a whole file into a scalar.

Adam Monsen haircut at gmail.com
Wed Jul 20 22:29:59 PDT 2005


Not sure if what you mentioned is "best practice".

Here's the obligatory one-liner:

$ perl -0ne 'print "MATCH: $1\n" if m/(line.*that)/s' poo.html
MATCH: line
that

The -0 switch (documented in perldoc perlrun) specifies the input
record separator. I didn't give it any digits, so the input record
separator is the null character. Since there aren't any null
characters in the file, the whole thing is sucked into $_. (If the
file might contain null characters, -0777 is the value perlrun gives
for slurping whole files.) -n and -e are also documented in perldoc
perlrun.
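
Inside a script, the same effect comes from undefining the input
record separator $/ before reading. Here is a minimal sketch of that
idiom; the helper name slurp_file and the temporary file are made up
for illustration, and the sample text is the one from the original
post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole file as a single scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                 # undef separator: one read returns everything
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Write the sample from the original post to a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<file>\nthis is a line\nthat is a line\n</file>\n";
close $tmp_fh;

my $contents = slurp_file($tmp_name);
# \s+ matches the newline between 'line' and 'that'.
print "MATCH: $1\n" if $contents =~ m/(line\s+that)/;
```

The "local" keeps the change to $/ confined to the helper, so other
reads in the program still get one line at a time.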

But... I'm really an IO::All fan.

#!/usr/bin/perl -w
use strict;
use IO::All;
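# In scalar context, slurp returns the whole file as one string.
my $contents = io('poo.html')->slurp;
# /s lets . match newlines, so the match can span lines.
if ($contents =~ m/(line.*that)/s) {
  print "MATCH! ... $1\n";
}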

I'm assuming that poo.html contains the XML-like markup example in
your original post. 'poo.html' could be changed to a URL
(http://example.com/poo.html) and IO::All would just do the right
thing. Hm, well, let's just try it.

#!/usr/bin/perl -w
use strict;
use IO::All;
my $contents = io('http://rafb.net/paste/results/GHgbSG94.txt')->slurp;
if ($contents =~ m/(line.*that)/s) {
  print "MATCH! ... $1\n";
}

You'll need IO::All::LWP installed for that one to work.

When doing multiline matches, the only alternative to slurping in the
file that I can think of is to make a mini state machine: when 'line'
is found, switch states until 'that' is found. Good luck matching
across hash or array elements.
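
For comparison, here is a rough sketch of that mini state machine. It
is only one way to do it: the helper name match_across_lines is
hypothetical, and unlike the greedy regex above it stops at the first
'that' after the first 'line'. It still buffers every line between
the two markers, so it only saves memory when they are close
together.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one line at a time and return the text from
# the first 'line' through the next 'that', or undef if there is none.
sub match_across_lines {
    my ($fh) = @_;
    my $capturing = 0;    # state: 0 = looking for 'line', 1 = looking for 'that'
    my $buffer    = '';
    while (my $line = <$fh>) {
        if (!$capturing) {
            my $start = index($line, 'line');
            next if $start < 0;
            $buffer    = substr($line, $start);   # keep text from 'line' onward
            $capturing = 1;
        }
        else {
            $buffer .= $line;
        }
        # Search for 'that' only after the opening 'line' (4 characters).
        my $end = index($buffer, 'that', 4);
        return substr($buffer, 0, $end + 4) if $end >= 0;
    }
    return;
}

# Demo using an in-memory filehandle holding the sample from the post.
my $sample = "<file>\nthis is a line\nthat is a line\n</file>\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $match = match_across_lines($fh);
close $fh;
print "MATCH: $match\n" if defined $match;
```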

Other notes:
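* You're right about the 's'. The 's' modifier to the regular
expression makes the . match any character, including newlines
(without it, . matches anything except a newline). The docs (perldoc
perlop) describe 's' as treating the string as a single line, and in
this case it means we can do a multiline match. The 'm' modifier you
mentioned does something different: it lets ^ and $ match at the
start and end of each line inside the string.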
* check out http://perlmonks.org ... I learned a ton from this site.
Plus, it's fun. If you want to search it, use the crawler-friendly
version: in Google, search for "site:perlmonks.thepen.com keywords"
(replacing "keywords" with your search terms, of course). Some Perl
heavyweights frequent this site.
* are you parsing XML/HTML/etc. markup? If so, there are a ton of
modules to make your life easier.
* always use warnings, always use strict.
* here's an article on IO::All ...
http://www.perl.com/pub/a/2004/03/12/ioall.html
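
To make the note about the 's' modifier concrete, here is a small
comparison run against the two lines from the sample file. The
variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "this is a line\nthat is a line\n";

# Without /s, '.' will not cross the newline, so this fails.
my $without_s = $text =~ /line.*that/  ? 1 : 0;
# With /s, '.' also matches "\n", so the match spans both lines.
my $with_s    = $text =~ /line.*that/s ? 1 : 0;
# \s matches "\n" on its own, with no modifier needed.
my $with_ws   = $text =~ /line\s+that/ ? 1 : 0;
# /m only affects the anchors: ^ now matches at the start of each line.
my @line_starts = $text =~ /^(\w+)/mg;

printf "no modifier: %s\n", $without_s ? 'match' : 'no match';
printf "/s:          %s\n", $with_s    ? 'match' : 'no match';
printf "\\s+:         %s\n", $with_ws   ? 'match' : 'no match';
print  "/m line starts: @line_starts\n";
```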

hope this helps,
-Adam

On 7/20/05, Duane Blanchard <dblanchard at gmail.com> wrote:
> I just found this, is this the best practice?
> 
> while ( <COLOURS> )
> {
>   $myfile = $myfile . $_;
> }
> 
> Duane
> 
> On 7/20/05, Duane Blanchard <dblanchard at gmail.com> wrote:
> > Hi gang,
> >
> > I'm too tired to think straight and too tired to keep looking on the
> > 'Net. I want to match things like 'line\s+that' in the example file
> > below.
> >
> > <file>
> > this is a line
> > that is a line
> > </file>
> >
> > What has worn me out today is not realizing that I'll never match
> > across lines of a file if I only read one line at a time. So, I either
> > need a clever way to match across elements of an array or hash table,
> > or (more likely) to read the whole file into a scalar. As I recall,
> > I'll use the 'm' flag to hand the RE more than one line, and '\s'
> > should handle '\n'.
> >
> > Someone, please give a little pointer. Thanks,
> >
> > D
> > --
> > Duane Blanchard
> > 206.934.5873
> >
> > There are 10 kinds of people in the world;
> > those who know binary and those who don't.
> > _____________________________________________________________
> > Seattle Perl Users Group Mailing List
> >     POST TO: spug-list at pm.org
> > SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
> >    MEETINGS: 3rd Tuesdays, Location: Amazon.com Pac-Med
> >    WEB PAGE: http://seattleperl.org/
> >


-- 
Adam Monsen
http://adammonsen.com/

