what are the binary characters of ASCII?

Steve Lane sml at zfx.com
Thu Sep 13 12:26:31 CDT 2001


from `perldoc perlfunc`:

    The `-T' and `-B' switches work as follows.  The
    first block or so of the file is examined for odd
    characters such as strange control codes or
    characters with the high bit set.  If too many
    strange characters (>30%) are found, it's a `-B'
    file, otherwise it's a `-T' file.
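
the 30% rule quoted above can be sketched roughly as follows. this is only an illustration, not perl's real implementation: `looks_binary` is a hypothetical helper, and the set of "odd" bytes it counts (control bytes other than \b \t \n \f \r and ESC, plus DEL and anything with the high bit set) is an assumption rather than something taken from the perl source.

```perl
use strict;
use warnings;

# hypothetical helper, not perl's real implementation: treat a block as
# "binary" when more than 30% of its bytes look odd.
# ASSUMPTION: "odd" = control bytes other than \b \t \n \f \r and ESC,
# plus DEL and any byte with the high bit set.
sub looks_binary {
    my ($block) = @_;
    return 0 if length($block) == 0;    # an empty block counts as text here
    # tr/// with an empty replacement list and no /d only counts matches
    my $odd = ($block =~ tr/\000-\007\013\016-\032\034-\037\177-\377//);
    return $odd / length($block) > 0.30 ? 1 : 0;
}

print looks_binary("plain text line\n")          ? "binary\n" : "text\n";
print looks_binary("\000\001\002\003 junk \377") ? "binary\n" : "text\n";
```

note that this is a per-block ratio test, so a block with only a few odd bytes still counts as text; it says nothing about individual lines.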

i'd like to know exactly which characters perl
considers "binary".  i haven't been able to find
anything online with a quick search.  this is my
current best guess:

print if /[\000-\010\016-\037\200-\377]/;

can anyone give a better set of "binary"
ASCII characters?

the reason i want to know: i'm parsing some large
logfiles, a small fraction of whose lines contain
"binary" characters, and i want to reject those
lines as junk.  so i'm looking for a regex that
matches such lines, which means i need a character
class that matches all "binary" characters and no
"text" characters, while accepting that whether a
character is "binary" may be in the eye of the
beholder.
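
a minimal sketch of that kind of line filter, under stated assumptions: the lines are read as raw bytes (no encoding layer), and "binary" means any control byte except tab, LF and CR, plus DEL and every byte above 0x7F. the names `$binary` and `clean_lines` are made up for illustration, and the class is one possible choice, not perl's own definition.

```perl
use strict;
use warnings;

# ASSUMPTION: "binary" = any control byte except tab, LF and CR, plus DEL
# and every byte above 0x7F.  adjust the class to taste.
my $binary = qr/[\000-\010\013\014\016-\037\177-\377]/;

# hypothetical helper: keep only the lines that contain no "binary" bytes
sub clean_lines {
    my (@lines) = @_;
    return grep { !/$binary/ } @lines;
}

my @kept = clean_lines("GET /index.html 200\n", "bad\001line\n", "caf\351\n");
print @kept;
```

the same class works as a one-liner filter, e.g. `perl -ne 'print unless /[\000-\010\013\014\016-\037\177-\377]/' logfile`.  if the logs legitimately contain latin-1 or utf-8 text, the `\177-\377` part would have to be dropped or narrowed.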
--
Steve Lane <sml at zfx.com>
http://knoxville.pm.org/



