Phoenix.pm: Re: SNOBOL

Fri Apr 23 13:29:11 CDT 2004

On  0, Scott Walters <scott at illogics.org> wrote:
> Sure =)
> 
> A simple loop...
> 
>         LOOPS = 10000
>         &STLIMIT = LOOPS * 2 + 10
>         A = 0
> LOOP    A = A + 1
>         LT(A,LOOPS)             :S(LOOP)
> END

My memory is fuzzy... but here goes.

Labels start in column 0. There are really only two columns and the other one isn't
fixed. This is typical of assembler but assemblers tend to have 3 columns ;)
SNOBOL does do subroutines and automatic (stack) variables but I don't think
any of these examples do that. Branching is popular in SNOBOL, though. Each
construct can succeed or fail. LOOPS = 10000 is a simple assignment. Like Perl
package variables ("globals"), variables spring into existence by their mere
mention. So we don't have to declare LOOPS first. &STLIMIT is one of many
special variables and caps how many statements the program may execute, for
the purpose of catching runaway loops. Each pass through the loop runs two
statements, hence LOOPS * 2 + 10. A is set to 0 too.

> LOOP    A = A + 1

This just increments A but without the benefit of the C ++ operator. LOOP is a
label. Labels are branch targets.

>         LT(A,LOOPS)             :S(LOOP)

I said that everything can succeed or fail. Well, this is the primary way of
doing conditionals. Pattern matches are conditionals. If this succeeds
(A is less than LOOPS) then we branch back to the LOOP label. :S(TARGET)
branches on success, :F(TARGET) on fail, and :(TARGET) unconditionally.
You can have both a :S and :F target on the same line - that line will always
branch, it'll just branch to a different location depending ;) Each kind
of statement defines succeed/fail differently. This is kind of assemblyish too.

In Perl...

       $loops = 10000;
       $a = 0;
loop:  $a++;
       goto loop if $a < $loops;
       exit;

> Word counting program...
> 
> *   WORDS.SNO -- word counting program
> *
> *       Sample program from Chapter 6 of the Tutorial
> *
> *   A word is defined to be a contiguous run of letters,
> *   digits, apostrophe and hyphen.  This definition of
> *   legal letters in a word can be altered for specialized
> *   text.
> *
> *   If the file to be counted is TEXT.IN, run this program
> *   by typing:
> *       B>SNOBOL4 WORDS /I=TEXT
> *
>         &TRIM   =  1
>         UCASE   = "ABCDEFGHIJLKMNOPQRSTUVWXYZ"
>         LCASE   = "abcdefghijlkmnopqrstuvwxyz"
>         WORD    =  "'-"  '0123456789' UCASE LCASE
>         BP = BREAK(WORD)
>         SP = SPAN(WORD)
>         WPAT    =  BREAK(WORD) SPAN(WORD)
> 
> NEXTL   LINE    =  INPUT                        :F(DONE)
> *       OUTPUT  = '>' LINE
> NEXTW
> *       LINE WPAT =                             :F(NEXTL)
>         LINE BP =
>         LINE SP =                               :F(NEXTL)
> *       OUTPUT  = '>>' LINE
>         N       =  N + 1                        :(NEXTW)
> 
> DONE    OUTPUT =  +N ' words'
> END

         $ucase = 'ABCDEFGHIJLKMNOPQRSTUVWXYZ';
         $lcase = 'abcdefghijlkmnopqrstuvwxyz';
         $word  = join '', "'-", '0123456789', $ucase, $lcase;
         $word  = join '|', split //, $word;  # SNOBOL does essentially this automatically
         $word  = qr/$word/;                  # ditto
         $bp    = qr/.*?(?=$word)/;           # yuck =P match non-greedy until we would match $word
         $sp    = qr/$word+/;
         $wpat  = qr/$bp$sp/;
nextl:   $line  = <STDIN>                     or goto done;
#        print '>', $line;
nextw:   
#        $line =~ s/^$wpat//                  or goto nextl;
         $line =~ s/^$bp//;
         $line =~ s/^$sp//                    or goto nextl;
#        print '>>', $line;
         $n++;                                goto nextw;

done:    printf "%d words\n", $n;             # like +N: prints 0 if nothing was counted
         exit;

The default thing to do in SNOBOL is pattern match. The default thing
to do on the right side of an assignment is concatenate.

>         WORD    =  "'-"  '0123456789' UCASE LCASE

This concats the 4 strings together. Like Perl, SNOBOL can quote with
either "" or '', but SNOBOL doesn't do variable interpolation. Not having
to type the . out, this really doesn't matter though ;)

>         BP = BREAK(WORD)
>         SP = SPAN(WORD)

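BREAK and SPAN are opposites. Both take a string and treat it as a set of
characters. SPAN returns a pattern that matches for as long as the subject
keeps supplying characters from that set (at least one). It's positive.
BREAK is negative. It matches everything up to, but not including, the first
character from its set, and it fails if no such character turns up.
That is, it matches everything but its argument.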
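
A rough Perl rendering of the two, as I understand them. span_re and break_re
are names I made up for this sketch, and the character classes are my guess at
the closest regex equivalent rather than anything SNOBOL defines:

```perl
use strict;
use warnings;

# span_re: one or more characters, all taken from $set (like SPAN).
sub span_re {
    my ($set) = @_;
    my $chars = quotemeta $set;          # escape ' and - for the character class
    return qr/[$chars]+/;
}

# break_re: zero or more characters NOT in $set, and the next character
# must be one from $set (like BREAK, which fails if it never finds one).
sub break_re {
    my ($set) = @_;
    my $chars = quotemeta $set;
    return qr/[^$chars]*(?=[$chars])/;
}

my $word = "'-0123456789" . join('', 'A' .. 'Z', 'a' .. 'z');
my ($bp, $sp) = (break_re($word), span_re($word));

my $line = "  hello, world";
$line =~ s/^$bp//;                       # strip everything before the first word
my ($first) = $line =~ /^($sp)/;         # grab the run of word characters
print "$first\n";
```

Building one character class is a bit tidier than joining every character
with |, but the two behave the same here.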

>         LINE BP =

Assigning to a pattern match... well, it's a lot more mnemonic than what Perl 
wants you to do ;)

>         N       =  N + 1                        :(NEXTW)

This is an unconditional goto. 

Note that input and output happen by reading from and writing to
special variables.

Also note how SNOBOL makes building up patterns easy. User defined
code can be used as part of a pattern too, and variable assignments
can be done in the middle of a pattern with the $ operator. FENCE
prevents a pattern match from backing up past a point (if it made it
this far, it must either finish or else fail). POS constrains the match
to be on a certain column. | takes two patterns and returns a pattern that
will match either. ARB matches any run of characters, shortest first.
ARBNO matches an arbitrary number of repeats of any other pattern, sort
of like * in regex. FAIL matches nothing ;)  FAIL is extremely useful for
forcing the pattern matcher to try all possibilities, which is useful for
finding a best match rather than a first match. Some logic wired into the
match could be saving the value that matched at some portion of it. This
would be used to find the longest word in a string, for example. TAB
will match everything up to a column, RTAB everything up to a column
measured relative to the right.
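
Perl can play the same FAIL trick, for what it's worth. This is a sketch of
the longest-word idea using Perl's (*FAIL) verb and an embedded code block.
longest_word and $best are names I picked for the example, and it only looks
at lowercase letters to keep it short:

```perl
use strict;
use warnings;

our $best;   # package variable so the code block inside the regex can see it

# longest_word: force the matcher through every candidate with (*FAIL),
# remembering the longest capture seen along the way.
sub longest_word {
    my ($text) = @_;
    local $best = '';
    $text =~ /([a-z]+)(?{ $best = $1 if length($1) > length($best) })(*FAIL)/;
    return $best;
}

print longest_word('the quick brownish fox'), "\n";   # prints "brownish"
```

The match itself always fails. All the useful work happens as a side effect
each time the matcher reaches the code block, which is the same shape as
hanging an assignment on a SNOBOL pattern and ending it with FAIL.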

SNOBOL has an EVAL to evaluate more SNOBOL source code. It does
complex data structures, such as hashes of multidim arrays of patterns.
It can store function pointers (one of these days I'd like to hack
on closures...)

I'm half way buttering SNOBOL up and half way just drawing parallels
to Perl here. I don't mean to say that it is better because it has
certain features Perl has. That would be ignorant ;) But the simple
syntax and user extensible, powerful pattern matching is *fun*.

> 
> 
> And some sample pattern matching... pattern matching is 
> SNOBOL's "thing"...
> 
> RSI_2   SS   '<' ARB . NM '>::='  =
>         RSENT_TBL<NM>  =  '|'  SS
>         IDENT(S,'END')                          :S(RSENTENCE_END)
>         SS  =  S                                :(RSI_1)

This was taken out of context... I just wanted some meaty pattern
matching action, so this Perl won't do anything useful.

rsi_2:   $ss =~ s/<(.*?)(?{ $nm = $1 })>::=//;
         $rsent_tbl{$nm} = '|' . $ss;         # RSENT_TBL<NM> looks like a table, so a hash
         goto rsentence_end if $s eq 'END';
         $ss = $s;                            goto rsi_1;

> 
> Sorry this isn't very coherent... I'll put together something 
> better later ;)

And this would probably be that ;)

The Perl versions are literal translations. They could be made shorter,
especially since SNOBOL is restrictive about what constitutes a line.

The word matching thing could be shortened in SNOBOL quite a bit, probably
down to 3 or 4 lines, but I'm pretty sure that Perl could do it in one.
Still, SNOBOL *is* expressive and pretty clean, at least in some ways ;)
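
Something like this is what I have in mind for the short Perl version. It is
a sketch, not a tested replacement for WORDS.SNO: count_words is a made-up
name, and the character class is my reading of the WORD definition (letters,
digits, apostrophe and hyphen):

```perl
use strict;
use warnings;

# count_words: count maximal runs of word characters in a string.
sub count_words {
    my ($text) = @_;
    # "() = ..." puts the match in list context; assigning that to a
    # scalar yields the number of matches.
    my $n = () = $text =~ /['\-0-9A-Za-z]+/g;
    return $n;
}

print count_words("Don't stop-me now"), " words\n";   # prints "3 words"
```

The counting really is one line. To run it over a file you would feed it
each line of input and add up the results.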

-scott

> 
> -scott

> 
> 
> On  0, Eden Li <Eden.Li at asu.edu> wrote:
> > 
> > I've never seen SNOBOL code.  Can you grace the list with a few lines?
> > 
> > On Wed, 21 Apr 2004, Scott Walters wrote:
> > > SNOBOL would be dead if Philip Budne hadn't rewritten the 360 bits
> > > in portable C 7 years back, but now, fools like me run it ;)
> >