what are the binary characters of ASCII?

Martin, David MARTINDT at eetcorp.com
Thu Sep 13 12:59:21 CDT 2001


It would be helpful to know whether any of the characters that are making Perl
think this is a binary file are in the high-bit range (byte values 128-255,
above ASCII proper).  I had a telephone usage log to parse that was presumed
binary because of ASCII NUL (value '\0') characters at the end of the lines.
(I discovered this by dumping the first page or so of it with hexdump.)  If
NULs are the only thing going on, you might be able to fix the files on the
fly, or pre-emptively, by trimming out the NULs.
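
Here is a rough sketch of both ideas, in case it helps: tally which kinds of
suspicious bytes show up in a chunk of data, and strip NULs from a line.  The
sub names (tally_odd_bytes, strip_nuls) and the sample line are made up for
illustration, and the "control" range is my own assumption, not what Perl's
-B test uses internally.  It differs from the guess quoted below in one way
worth noting: \031 is octal (decimal 25), so a class ending there leaves out
\032-\037; the range here runs through \037 and also counts DEL (\177).

```perl
use strict;
use warnings;

# Count the suspicious bytes in a chunk of data, split into three groups,
# to see which kind is most likely tripping the -B test.
# The "control" range is an assumption: every control code except
# tab, LF, VT, FF and CR (\011-\015), plus DEL.
sub tally_odd_bytes {
    my ($data) = @_;
    my %count;
    $count{nul}     = ($data =~ tr/\0//);                      # NUL bytes
    $count{control} = ($data =~ tr/\001-\010\016-\037\177//);  # other control codes
    $count{high}    = ($data =~ tr/\200-\377//);               # high bit set
    return \%count;
}

# Return a copy of the line with every NUL removed.
sub strip_nuls {
    my ($line) = @_;
    $line =~ tr/\0//d;
    return $line;
}

# Made-up sample line with two trailing NULs before the newline.
my $sample = "555-0100 00:03:12\0\0\n";
my $tally  = tally_odd_bytes($sample);
printf "nul=%d control=%d high=%d\n", @{$tally}{qw(nul control high)};
print strip_nuls($sample);
```

If the tally shows only NULs, running each line through strip_nuls before
parsing may be all that is needed; if high-bit bytes show up too, the lines
probably need a closer look.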

David Martin
EET Corp
martindt at eetcorp.com
865.671.7800
www.eetcorp.com

-- To the tune of "Yellow Submarine"
"In the town where I was born
lived a man who wrote in C
And he told us of his life
in the land of subroutines..."


> -----Original Message-----
> From: Steve Lane [mailto:sml at zfx.com]
> Sent: Thursday, September 13, 2001 1:27 PM
> To: knoxville-pm-list at pm.org
> Subject: what are the binary characters of ASCII?
> 
> 
> from `perldoc perlfunc`:
> 
>     The `-T' and `-B' switches work as follows.  The
>     first block or so of the file is examined for odd
>     characters such as strange control codes or
>     characters with the high bit set.  If too many
>     strange characters (>30%) are found, it's a `-B'
>     file, otherwise it's a `-T' file.
> 
> i'd like to know what is the set of ascii characters
> that perl considers "binary".  i haven't been able
> to find anything online with a quick search.  this
> is my current best guess:
> 
> print if /[\000-\010\016-\031\200-\377]/;
> 
> can anyone give a better set of "binary"
> ASCII characters?
> 
> the reason i want to know this is: i'm parsing some
> large logfiles, and a small fraction of the lines in
> the files have "binary" characters, and i want to
> reject those lines as junk.  so i'm looking for
> a suitable regex that will match such lines, which
> means i need a character class that matches all
> "binary" characters and no "text" characters,
> while understanding whether a character is "binary"
> may be in the eye of the beholder.
> --
> Steve Lane <sml at zfx.com>
> http://knoxville.pm.org/
> 