[pgh-pm] system() not displaying scalar text

Benjamin R. Haskell pm at benizi.com
Tue Sep 4 13:29:09 PDT 2007


On Tue, 4 Sep 2007, Matthew T. Engel wrote:

> I think the problem is that I am trying to use the system function to echo
> text to the terminal.  Something like:

First off, in your example, there's no need to 'cat' the files into your
script. You can just pass them as arguments, since you're using '<>',
which reads from the files named in '@ARGV' (and falls back to STDIN only
when '@ARGV' is empty).
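
To illustrate how '<>' picks up filenames from '@ARGV', here is a minimal
sketch. The two temporary files and their contents are made up for the
example, and '@ARGV' is set by hand to simulate running
"./above_script.pl file1 file2":

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small sample files (hypothetical content, for illustration).
my @names;
for my $text ("first file\n", "second file\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close failed: $!";
    push @names, $name;
}

# Simulate the command line: ./above_script.pl file1 file2
@ARGV = @names;

# '<>' opens each file named in @ARGV in turn and reads it line by line.
my @lines;
while (my $line = <>) {
    chomp $line;
    push @lines, $line;
}

print "$_\n" for @lines;
```

So "./above_script.pl ascii_text_file" should behave the same as piping the
file in through 'cat'.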


Secondly, why are you using:

system("echo $_");

instead of:

print "$_\n"; # or print $_, $/; # or print; # with the '-l' switch ;)


If all you really want to do is echo data to the terminal, that's what you 
should do.
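
As a rough sketch of the 'print' approach: 'echo_lines' below is a
hypothetical helper (not something from your script), and in-memory
filehandles stand in for the real input and the terminal so the result can
be inspected. The sample line with shell metacharacters is there to show
that 'print' never goes through a shell, so nothing gets expanded:

```perl
use strict;
use warnings;

# Hypothetical helper: copy each line from $in to $out with 'print'.
sub echo_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        print {$out} "$line\n";
    }
    return;
}

# In-memory filehandles stand in for the real input and the terminal.
my $input  = "plain text\n" . 'a * b; $HOME' . "\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
echo_lines($in, $out);
close $out or die "cannot close output: $!";

# The copy is byte-for-byte identical to the input.
print $output;
```

In the real script the loop would simply read from '<>' and print to
STDOUT.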

> #!/usr/bin/perl
>
> while(<>)
> {
>        chomp;
>        system("echo $_");
> }
>
> If I run the script via
>
> $ cat ascii_text_file | ./above_script.pl
>
> everything works fine: it essentially cats the input file. However, if I
> run
>
> $ cat unicode_text_file | ./above_script.pl
>
> I get blank lines where the echoed data should be.
>
> I think the second file is Unicode because running "od -c" on it shows \0
> in front of all the characters, and if I open the same file in vi it shows
> ^@ before every character.
>
> I would like to be able to use Unicode and ASCII text files
> interchangeably. Please advise. Thank you very much in advance.

If you're worried about being able to use both "ASCII" (as in
"7-bit ASCII") files and Unicode (as in "UTF-8" or "UTF-16") files,
you shouldn't have any problems if you just 'print' the data to the
terminal: 'print' passes the bytes through unchanged, much as 'cat' does.

If you're worried about interoperability between "ASCII" (as in
"ISO-8859-*" [*=1,2,etc.]) and Unicode, then you'll *have* to do something
more complicated. In order for perl to deal with ISO-8859-* data that uses
anything outside of 7-bit ASCII, you must let it know how to interpret the
byte stream. (or, my recommendation: *convert it*.)
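
A small sketch of what "letting perl know" can look like, using the core
Encode module. The sample string and the choice of ISO-8859-1 are
assumptions for the example; your files may use a different ISO-8859
variant:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "cafe" with an acute accent on the e, as ISO-8859-1 bytes:
# the accented letter is the single byte 0xE9.
my $latin1_bytes = "caf\xE9";

# Tell perl how to interpret the bytes, turning them into characters.
my $text = decode('ISO-8859-1', $latin1_bytes);

# Convert the characters to UTF-8 bytes for output.
my $utf8_bytes = encode('UTF-8', $text);

printf "%d Latin-1 bytes, %d characters, %d UTF-8 bytes\n",
    length($latin1_bytes), length($text), length($utf8_bytes);
```

The accented character takes one byte in ISO-8859-1 but two in UTF-8, which
is why the byte stream can't just be passed along and reinterpreted.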

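From your description, it sounds like your unicode_text_file is in UTF-16
(or UCS-2, its fixed-width predecessor), where each character is normally
two bytes. That would also explain the blank lines: arguments handed to an
external command are NUL-terminated strings, so 'echo' most likely sees
nothing past the first \0. Usually, to distinguish UTF-16-LE from -BE
(little-/big-endian), there's a BOM (byte-order mark), U+FEFF, at the start
of the file. I think 'iconv', a very useful program for converting
character sets, will handle that.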
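
If you'd rather do the conversion inside perl, here is a minimal sketch
with the core Encode module. The sample bytes are invented for the example,
and it assumes the data starts with a BOM (Encode's plain 'UTF-16' decoder
needs one to pick the byte order). On the command line, something like
"iconv -f UTF-16 -t UTF-8" should do roughly the same job:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample bytes: a big-endian BOM (0xFE 0xFF) followed by "hi\n" in UTF-16.
# Note the \0 in front of every character, as in the 'od -c' output.
my $utf16_bytes = "\xFE\xFF\x00h\x00i\x00\n";

# decode() uses the BOM to pick the byte order, then drops it.
my $text = decode('UTF-16', $utf16_bytes);

# Re-encode as UTF-8, which contains no \0 bytes for this text.
my $utf8_bytes = encode('UTF-8', $text);

print $utf8_bytes;
```

Once the data is UTF-8, text in the ASCII range is byte-for-byte identical
to plain ASCII, so both kinds of file can go through the same script.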

Another useful resource: try a 'Super Search' on PerlMonks for 'Unicode'
and 'UTF-8' or 'UTF-16'.

Best,
Ben

