what are the binary characters of ASCII?

Steve Lane sml at zfx.com
Thu Sep 13 20:15:03 CDT 2001


> > -----Original Message-----
> > From: Steve Lane [mailto:sml at zfx.com]
> > Sent: Thursday, September 13, 2001 1:27 PM
> > To: knoxville-pm-list at pm.org
> > Subject: what are the binary characters of ASCII?
> >
> >
> > from `perldoc perlfunc`:
> >
> >     The `-T' and `-B' switches work as follows.  The
> >     first block or so of the file is examined for odd
> >     characters such as strange control codes or
> >     characters with the high bit set.  If too many
> >     strange characters (>30%) are found, it's a `-B'
> >     file, otherwise it's a `-T' file.
> >
> > i'd like to know what is the set of ascii characters
> > that perl considers "binary".  i haven't been able
> > to find anything online with a quick search.  this
> > is my current best guess:
> >
> > print if /[\000-\010\016-\031\200-\377]/;
> >
> > can anyone give a better set of "binary"
> > ASCII characters?
> >
> > the reason i want to know this is: i'm parsing some
> > large logfiles, and a small fraction of the lines in
> > the files have "binary" characters, and i want to
> > reject those lines as junk.  so i'm looking for
> > a suitable regex that will match such lines, which
> > means i need a character class that matches all
> > "binary" characters and no "text" characters,
> > while understanding whether a character is "binary"
> > may be in the eye of the beholder.
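
A minimal sketch of the line filter described above, under one possible definition of "binary": any control character other than tab, LF, FF and CR, plus DEL and any byte with the high bit set. That definition and the name has_binary_chars are assumptions made for illustration, not Perl's own rule.

```perl
use strict;
use warnings;

# Assumption: "binary" means a C0 control character other than
# tab (\x09), LF (\x0A), FF (\x0C) or CR (\x0D), or DEL (\x7F),
# or any byte with the high bit set (\x80-\xFF).
# has_binary_chars is a hypothetical helper name.
sub has_binary_chars {
    my ($line) = @_;
    return $line =~ /[\x00-\x08\x0B\x0E-\x1F\x7F-\xFF]/ ? 1 : 0;
}
```

In a log-parsing loop this could be used as `next if has_binary_chars($_);` to skip junk lines. Widen or narrow the character class to suit what counts as junk in the logs at hand.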


"Martin, David" wrote:
> it would be helpful to know if any of the things that are making perl think
> this is a binary file are in the high part of the ASCII character set.  I
> had a telephone usage log to parse that was presumed binary because of ASCII
> NUL (value '\0') characters at the end of the lines.  (I discovered this by
> dumping the first page or so of it with hexdump) If this is the only thing
> going on, you might be able to fix the files on the fly or pre-emptively by
> trimming out the NULs.

i'm not trying to fix the file.  i just want to
know what Perl's "binary" character set is defined
to be, so i can replicate the behaviour of the -B
test on a string, rather than having to write the
string out to a temporary file and then using -B on
the temporary file.
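
for reference, here is a rough sketch of what a string version of the test could look like. it follows the perlfunc description quoted above (examine the first block, count odd characters, call it binary above 30%). the rest is assumption: the 512-byte block size, the exact set of "odd" bytes (control codes other than backspace, tab, LF, FF, CR and ESC, plus high-bit bytes), treating any NUL as binary, and treating an empty string as text. the name looks_binary is made up. this is not a copy of what perl does internally.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate the documented -B heuristic on a string.
sub looks_binary {
    my ($str, $block_size) = @_;
    $block_size ||= 512;               # assumption: "first block or so"

    my $block = substr($str, 0, $block_size);
    my $len   = length $block;
    return 0 if $len == 0;             # assumption: empty string is text

    return 1 if $block =~ /\0/;        # assumption: any NUL means binary

    # Count "odd" bytes. Assumed set: control codes except \b \t \n \f \r
    # and ESC, plus every byte with the high bit set.
    my $odd = ($block =~ tr/\x01-\x07\x0B\x0E-\x1A\x1C-\x1F\x80-\xFF//);

    return $odd / $len > 0.30 ? 1 : 0; # threshold from perlfunc (>30%)
}
```

used per line it would be something like `next if looks_binary($line);`. note the 30% rule means a long line with only a few stray bytes still counts as text, so a plain character-class match is the stricter filter.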
--
Steve Lane <sml at zfx.com>
http://knoxville.pm.org/



