[Kc] converting ms-dos file names to href equivalents

Joe Oppegaard joe at radiojoe.org
Sun Dec 14 07:20:26 CST 2003


On Sat, 13 Dec 2003, Tom Miller wrote:

<-snip->
> The code I am showing is adapted from the code that converts
> things like tabs into their HTML equivalents in a Perl script called
> txt2html, which a guy who speaks French wrote.  I am aware the
> replacement string is wildly wrong.  So here is my first approximation
> of the code:
>
> $TXT =~ s/........\.[zip|ZIP]/<a href="........\.zip">........\.zip</a>/g;
>
> Since I don't want to match against any white space, how about this?
>
> $TXT =~ s/[\d|\w](8)\.[zip|ZIP]/<a href="........\.zip">........\.zip</a>/g;
>
> According to the book I am mumbling around in:  \d is the range of
> numbers, \w is all alphabetic characters, | lets you put two groups
> together so: [\d|\w] should allow all legal ms-dos file name
> characters.  [\d|\w](8) is supposed to find 8 legal ms-dos file
> characters in a row.  [\d|\w](8)\.[zip|ZIP] should find any legal
> ms-dos file name that ends in zip or ZIP?
>
> What (if anything) am I doing wrong on the search string?
>

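The way you are using the character classes (the [ ] brackets) is wrong.
A character class matches exactly one character: any one of the
characters listed inside the brackets. (You can also use ranges; [a-z]
matches any single character from a to z.) So [zip|ZIP] actually matches
a single character that is an upper- or lower-case z, i or p, or a pipe.
Inside brackets the | is just a literal character, not "or"; to match
the whole word zip or ZIP you need a group such as (zip|ZIP). The same
goes for [\d|\w]: it matches a digit, a word character or a pipe, and
since \w already includes the digits (and the underscore), the \d adds
nothing.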
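A quick way to see the difference is to try both forms on a few strings. This is only a sketch; the sub names class_match and alt_match are made up for the example.

```perl
use strict;
use warnings;

# [zip|ZIP] is ONE character: z, i, p, Z, I, P or a literal '|'.
sub class_match { return $_[0] =~ /^[zip|ZIP]$/ ? 1 : 0 }

# (?:zip|ZIP) is alternation: the whole word "zip" or the whole word "ZIP".
sub alt_match { return $_[0] =~ /^(?:zip|ZIP)$/ ? 1 : 0 }

for my $s ('z', '|', 'zip', 'ZIP', 'Zip') {
    printf "%-4s class=%d alternation=%d\n", $s, class_match($s), alt_match($s);
}
```

Note that 'Zip' matches neither form; the /i flag is the simpler way to ignore case.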

Also, parentheses are used for capturing (remembering) the text that
matched, so [\d|\w](8) means one such character followed by a literal 8.
You meant to use curly braces around the 8. A number in curly braces
after a regex element says how many times that element must match: {8}
means exactly 8 times. To match 1 to 8 times you need a comma, as in
{1,8}; see below.
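Here is a small sketch comparing the three forms. The sub names are made up, and \w stands in for "a legal character".

```perl
use strict;
use warnings;

# (8) is a capture group around a literal "8"; {8} and {1,8} are quantifiers.
sub literal_eight { return $_[0] =~ /^\w(8)$/   ? 1 : 0 }  # one \w char, then "8"
sub exactly_eight { return $_[0] =~ /^\w{8}$/   ? 1 : 0 }  # exactly 8 \w chars
sub one_to_eight  { return $_[0] =~ /^\w{1,8}$/ ? 1 : 0 }  # 1 to 8 \w chars

for my $s ('a8', 'readme', 'abcdefgh', 'toolongname') {
    printf "%-12s (8)=%d {8}=%d {1,8}=%d\n",
        $s, literal_eight($s), exactly_eight($s), one_to_eight($s);
}
```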

You probably meant:
    /^              # Start of the string
      \w            # A word character (letter, digit or underscore)
      {1,8}         # 1 to 8 times
      \.            # A literal dot
      zip           # The extension
    $/ix            # End-of-string anchor; /i makes the match
                    # case-insensitive, /x allows these comments
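If you want to use that pattern from a script, one way (a sketch, with a made-up sub name is_dos_zip_name) is to compile it once with qr// and test names against it. \w is only an approximation of what DOS allows: it rejects characters such as - and ~ that DOS accepts in file names.

```perl
use strict;
use warnings;

# The pattern from above, compiled once with qr//.
# Assumption: \w (letters, digits, underscore) approximates the legal DOS characters.
my $dos_zip = qr/
    ^          # start of the string
    \w{1,8}    # 1 to 8 word characters
    \.         # a literal dot
    zip        # the extension
    $          # end of the string
/ix;           # i: ignore case, x: allow whitespace and comments

sub is_dos_zip_name { return $_[0] =~ $dos_zip ? 1 : 0 }

for my $name ('README.ZIP', 'game1.zip', 'longfilename.zip', 'notes.txt') {
    printf "%-17s %s\n", $name, is_dos_zip_name($name) ? 'yes' : 'no';
}
```

Because of the ^ and $ anchors this only works when the string is just the file name, for example one line of a file listing.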

> Once the search string is right, I want to move onto the harder
> question of how do I get this thing to replace the file name with an
> href to that file name.
>
> Once I get past these questions, I have questions about trying to add
> file information (e.g. size, date/time created) to this conversion.
> But right now, I want to struggle with this level of the code.
>

From your question I didn't really understand what you wanted the output
to be. I assume one big HTML file with links to all the files.

I think something like this is what you're looking for. It takes the
list of file names produced by find, loops through them, and prints a
link for each one that looks like an 8.3 DOS name:

[joe at chatnoir joe]$ find /Users/joe | perl -e 'print "<html>\n"; while(<>){next unless m#^(.*/\w{1,8}\.\w{1,3})$#; print qq#<a href="$1">$1</a>\n#;} print "</html>\n"'
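The one-liner above builds a page from find's output. Your original question also asked how to turn names that sit inside a block of text (your $TXT) into links; a substitution with a capture group does that, because $1 in the replacement is whatever the parentheses matched. This is a sketch: the sub name link_zip_names is made up, and \w is again only an approximation of legal DOS characters. The ^ and $ anchors are replaced by \b, since the name is now in the middle of a line, and braces are used as delimiters so the / in </a> doesn't end the replacement early.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every 8.3-style .zip name found in $text in a link.
sub link_zip_names {
    my ($text) = @_;
    $text =~ s{
        \b              # word boundary before the name
        (\w{1,8}\.zip)  # capture: 1-8 word chars, a dot, then zip
        \b              # word boundary after the extension
    }{<a href="$1">$1</a>}gix;
    return $text;
}

print link_zip_names("Get GAME1.ZIP or tools.zip, not longfilename.zip.\n");
```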

As for adding file information (size, date/time), check out the
documentation for `stat':
    $ perldoc -f stat
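A minimal sketch of using stat for this, assuming you want the size and the last-modification time next to each link. On Unix-like systems stat has no creation time: element 10 (ctime) is the inode change time, so element 9 (mtime) is usually the more useful date. The helper name link_with_info is made up; POSIX's strftime is used only to format the date.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: a link plus size and modification time for one file.
# Returns undef if the file can't be stat'ed.
sub link_with_info {
    my ($path) = @_;
    my @st = stat($path) or return;        # stat returns an empty list on failure
    my ($size, $mtime) = @st[7, 9];        # element 7 = size, 9 = mtime
    my $when = strftime('%Y-%m-%d %H:%M', localtime($mtime));
    return qq{<a href="$path">$path</a> ($size bytes, modified $when)};
}
```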

        -Joe Oppegaard


