[Neworleans-pm] Fwd: Solutions and Discussion for Perl Quiz of the Week #19 (Expert Edition)

Wed Jul 21 08:50:36 CDT 2004

=====
http://www.brettsbsd.net/~estrabd

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

----- Original message -----
From: "Jerrad Pierce" <belg4mit at MIT.EDU>, belg4mit at MIT.EDU
To: perl-qotw at plover.com
Date: Tue, 20 Jul 2004 17:42:34 -0400
Subject: Solutions and Discussion for Perl Quiz of the Week #19 (Expert
Edition)

Sample solutions and discussion
Perl Expert Quiz of The Week #19 (20040712)

        banner(1) and cursive(1) are old UNIX utilities that take
        ASCII input (0-127) and render the input in a hard-coded font
        like the examples below. Later, FIGlet was created to use a
        plain text format font file to allow a wider variety of output
        styles.

          % banner japh

                 #    #    ######  #     #
                 #   # #   #     # #     #
                 #  #   #  #     # #     #
                 # #     # ######  #######
           #     # ####### #       #     #
           #     # #     # #       #     #
            #####  #     # #       #     #

           % cursive japh

                           /
               o __.  _   /_
              /_(_/|_/_)_/ /_
             /      /
           -'      '

        Write a program to do "optical" character recognition of
        ASCII-Art text.  Given an input string provided on STDIN print
        the text represented by the ASCII-art to STDOUT.

        There are three milestones for this puzzle.

        1) Process any output produced by banner(1) and print the
           result to STDOUT.  Your program should obviously print
           "JAPH" or "japh" at your discretion, if given the output
           from banner above; but not "jAph".

           If you do not have banner(1) or your banner prints
           vertically instead of horizontally use the banner font for
           FIGlet.

           Both the font and FIGlet itself are available at
             http://figlet.org/
           OR
             ftp://pthbb.org/mirror-mirror/figlet/

           Instead, you could also use one of the many web interfaces
           for generating FIGlet output such as
           http://wfiglet.handalak.com/

        For the next two milestones you need to download some FIGlet
        fonts available at 
          ftp://ftp.figlet.org/pub/figlet/fonts/contributed.tar.gz
        OR
          ftp://pthbb.org/mirror-mirror/figlet/fonts/contributed.tar.gz

        The FIGlet font file format is rather intuitive but described
        at length in the FIGlet man page widely available online
        including
          http://www.redstone.army.mil/documents/figlet-2.1.1.man.html

        A tarball with a special issue of the Text::FIGlet module is
        available at

             ftp://pthbb.org/pub/misc/qotw.tgz
        It includes perl versions bof figlet and banner, and the
        figlet man page

        2) Process any output produced by FIGlet with the basic and
           drpepper fonts.  Your output should be as before.

           In addition to the text to process from STDIN you must
           accept an argument -f=<font> specifying the path to the
           font the input was created with.

           % figlet -f basic -A basic
              d88b  .d8b.  d8888b. db   db
              `8P' d8' `8b 88  `8D 88   88
               88  88ooo88 88oodD' 88ooo88
               88  88~~~88 88~~~   88~~~88
           db. 88  88   88 88      88   88
           Y8888P  YP   YP 88      YP   YP

           % !! | yourprogram.pl -f=figlib/basic.flf
           JAPH
           % figlet -f drpepper -A japh
             _            _
            <_> ___  ___ | |_
            | |<_> || . \| . |
            | |<___||  _/|_|_|
           <__'     |_|
           % !! | yourprogram.pl -f=figlib/drpepper.flf
           japh

        3) Process any output produced by FIGlet with the argument
           -m-1 and a font from the library listed above. You should
           accept an argument -d=<dir> which specifies the location of
           the font library. Programatically determine the font used
           to render the provided input and process accordingly. Your
           output should be as before with the name of the font chosen
           printed to STDERR before the results are printed to STDOUT.

                                               /
                   #                         #/
                  ###                        ##
                   #                         ##
                                             ##
                 ###      /###       /###    ##  /##
                  ###    / ###  /   / ###  / ## / ###
                   ##   /   ###/   /   ###/  ##/   ###
                   /   ##    ##   ##    ##   ##     ##
                  /    ##    ##   ##    ##   ##     ##
                 ###   ##    ##   ##    ##   ##     ##
                  ###  ##    ##   ##    ##   ##     ##
                   ### ##    /#   ##    ##   ##     ##
                    ### ####/ ##  #######    ##     ##
                     ##  ###   ## ######      ##    ##
                     ##           ##                /
                     /            ##               /
                    /             ##              /
                   /               ##            /

        PS> For the interminably curious to see why the condition of
            -m-1 is imposed compare the output of the font slscript
            with and wihtout it.

----------------------------------------------------------------

This goal of this weeks quiz was to write a program to perform "OCR" on
ASCII-Art and determine what text the image represents. For details see
the original lengthy problem statement at
http://perl.plover.com/qotw/e/019

The quiz had three milestones, and the four particpants' entries were as
follows

Milestone 1 (M1):
   Jurgen Pletinckx         
   http://perl.plover.com/~alias/list.cgi?1:mss:1892
   Patrick LeBoutillier     
   http://perl.plover.com/~alias/list.cgi?1:mss:1888
   Jerrad Pierce     http://perl.plover.com/~alias/list.cgi?1:mss:1878

Milestone 2 (M2):
   Jurgen Pletinckx          Same as above
   Jerrad Pierce     Same as above

Milestone 3 (M3):
   Jurgen Pletinckx          Same as above
   Jerrad Pierce     http://perl.plover.com/~alias/list.cgi?1:mss:1891
   Ronald J Kimball         
   http://perl.plover.com/~alias/list.cgi?1:mss:1889

I expected most people to use the same program for each milestone, or
at least M1 and M2, it seemed easier to develop a script and enhance
it to comply with subsequent requirements.

Not to get ahead of myself, but I think it's interesting, and speaks
to the nature of the challenge that all of the submissions require
perfect input in order to work. My own submission for M3 fails to
recognize "hello" in fraktur for the mismatch of a single "pixel".

All of the programs could be described on some level as "brute force",
though some more than others. Jurgen and Ronald's solutions for M3
process the input with all available fonts, and they have the false
positives of rot13 and term as a result.

There were four approaches to the guts of character matching, substr,
shift, regexp and serialization. Jurgen used substr for each row of
input to extract the section of input corresponding to a figlet
character for comparison.  Patrick used a combination of substr and
serialized figlet characters (a one line representation for easy
storage and retrieval in a lookup table on disk).  Ronald showed his
expert knowledge by using /\G//g against a string of serialized input
that had been effectively rotated the text 90 degrees
clockwise. Jerrad had a similarly interesting strategy of rotating the
input with Text::Orientation 90 degrees clockwise in order to swap
columns for rows; making it easy to shift lines off the resulting
array as they are used in a match.

Jerrad's program also included several optimizations including a
character frequency table and short-circuiting once a sole potential
match remained.  Jurgen's program had an internal requirement that
characters be tested widest first and others used ASCII order. Jerrad
and Jurgen's programs both performed many more tests to accquire a
match due to the placement of tests for equality inside several levels
of loops.

I (Jerrad) had experimented with something more "liberal" that would
have alleviated this and made the engine less finicky about input,
"low-resolution" matching. I checked the 0th, 3rd, and 6th columns for
equality. It worked rather well however h, m and n were often
indistinguishable. That could have been fixed with a progressive scan
scheme but at the expense of readily avoiding an infinite loop for
fonts such as basic where lowercase and uppercase have the same
representation.

Jurgen and Ronald's M3 submissions were the only ones that worked out
of the box. All of the font-matching engines excluded fonts whose
height was not a multiple of the input height; fonts 4 and 8 line high
allowed for 16 line high input, but not 6, 9 or 10 line high
fonts. Jurgen and Ronald both then proceeded to process the text
against the remaining fonts. Jerrad adapted Patrick's fingerprint
cache on disk, eliminating fonts that did not include all of the
characters used in the input; helas this high pay-off optimization is
another example of requiring perfect input.

[ Thanks to Jerrad Pierce for running the QOTW this week.  New quiz
  tomorrow. -MJD ]