[Neworleans-pm] Fwd: Solutions and Discussion for Perl Quiz of the
Week #19 (Expert Edition)
Brett D. Estrade
estrabd at yahoo.com
Wed Jul 21 08:50:36 CDT 2004
=====
http://www.brettsbsd.net/~estrabd
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
----- Original message -----
From: "Jerrad Pierce" <belg4mit at MIT.EDU>, belg4mit at MIT.EDU
To: perl-qotw at plover.com
Date: Tue, 20 Jul 2004 17:42:34 -0400
Subject: Solutions and Discussion for Perl Quiz of the Week #19 (Expert
Edition)
Sample solutions and discussion
Perl Expert Quiz of The Week #19 (20040712)
banner(1) and cursive(1) are old UNIX utilities that take
ASCII input (0-127) and render the input in a hard-coded font
like the examples below. Later, FIGlet was created to use a
plain text format font file to allow a wider variety of output
styles.
% banner japh
# # ###### # #
# # # # # # #
# # # # # # #
# # # ###### #######
# # ####### # # #
# # # # # # #
##### # # # # #
% cursive japh
/
o __. _ /_
/_(_/|_/_)_/ /_
/ /
-' '
Write a program to do "optical" character recognition of
ASCII-Art text. Given an input string provided on STDIN print
the text represented by the ASCII-art to STDOUT.
There are three milestones for this puzzle.
1) Process any output produced by banner(1) and print the
result to STDOUT. Your program should obviously print
"JAPH" or "japh" at your discretion, if given the output
from banner above; but not "jAph".
If you do not have banner(1) or your banner prints
vertically instead of horizontally use the banner font for
FIGlet.
Both the font and FIGlet itself are available at
http://figlet.org/
OR
ftp://pthbb.org/mirror-mirror/figlet/
Instead, you could also use one of the many web interfaces
for generating FIGlet output such as
http://wfiglet.handalak.com/
For the next two milestones you need to download some FIGlet
fonts available at
ftp://ftp.figlet.org/pub/figlet/fonts/contributed.tar.gz
OR
ftp://pthbb.org/mirror-mirror/figlet/fonts/contributed.tar.gz
The FIGlet font file format is rather intuitive but described
at length in the FIGlet man page widely available online
including
http://www.redstone.army.mil/documents/figlet-2.1.1.man.html
A tarball with a special issue of the Text::FIGlet module is
available at
ftp://pthbb.org/pub/misc/qotw.tgz
It includes perl versions bof figlet and banner, and the
figlet man page
2) Process any output produced by FIGlet with the basic and
drpepper fonts. Your output should be as before.
In addition to the text to process from STDIN you must
accept an argument -f=<font> specifying the path to the
font the input was created with.
% figlet -f basic -A basic
d88b .d8b. d8888b. db db
`8P' d8' `8b 88 `8D 88 88
88 88ooo88 88oodD' 88ooo88
88 88~~~88 88~~~ 88~~~88
db. 88 88 88 88 88 88
Y8888P YP YP 88 YP YP
% !! | yourprogram.pl -f=figlib/basic.flf
JAPH
% figlet -f drpepper -A japh
_ _
<_> ___ ___ | |_
| |<_> || . \| . |
| |<___|| _/|_|_|
<__' |_|
% !! | yourprogram.pl -f=figlib/drpepper.flf
japh
3) Process any output produced by FIGlet with the argument
-m-1 and a font from the library listed above. You should
accept an argument -d=<dir> which specifies the location of
the font library. Programatically determine the font used
to render the provided input and process accordingly. Your
output should be as before with the name of the font chosen
printed to STDERR before the results are printed to STDOUT.
/
# #/
### ##
# ##
##
### /### /### ## /##
### / ### / / ### / ## / ###
## / ###/ / ###/ ##/ ###
/ ## ## ## ## ## ##
/ ## ## ## ## ## ##
### ## ## ## ## ## ##
### ## ## ## ## ## ##
### ## /# ## ## ## ##
### ####/ ## ####### ## ##
## ### ## ###### ## ##
## ## /
/ ## /
/ ## /
/ ## /
PS> For the interminably curious to see why the condition of
-m-1 is imposed compare the output of the font slscript
with and wihtout it.
----------------------------------------------------------------
This goal of this weeks quiz was to write a program to perform "OCR" on
ASCII-Art and determine what text the image represents. For details see
the original lengthy problem statement at
http://perl.plover.com/qotw/e/019
The quiz had three milestones, and the four particpants' entries were as
follows
Milestone 1 (M1):
Jurgen Pletinckx
http://perl.plover.com/~alias/list.cgi?1:mss:1892
Patrick LeBoutillier
http://perl.plover.com/~alias/list.cgi?1:mss:1888
Jerrad Pierce http://perl.plover.com/~alias/list.cgi?1:mss:1878
Milestone 2 (M2):
Jurgen Pletinckx Same as above
Jerrad Pierce Same as above
Milestone 3 (M3):
Jurgen Pletinckx Same as above
Jerrad Pierce http://perl.plover.com/~alias/list.cgi?1:mss:1891
Ronald J Kimball
http://perl.plover.com/~alias/list.cgi?1:mss:1889
I expected most people to use the same program for each milestone, or
at least M1 and M2, it seemed easier to develop a script and enhance
it to comply with subsequent requirements.
Not to get ahead of myself, but I think it's interesting, and speaks
to the nature of the challenge that all of the submissions require
perfect input in order to work. My own submission for M3 fails to
recognize "hello" in fraktur for the mismatch of a single "pixel".
All of the programs could be described on some level as "brute force",
though some more than others. Jurgen and Ronald's solutions for M3
process the input with all available fonts, and they have the false
positives of rot13 and term as a result.
There were four approaches to the guts of character matching, substr,
shift, regexp and serialization. Jurgen used substr for each row of
input to extract the section of input corresponding to a figlet
character for comparison. Patrick used a combination of substr and
serialized figlet characters (a one line representation for easy
storage and retrieval in a lookup table on disk). Ronald showed his
expert knowledge by using /\G//g against a string of serialized input
that had been effectively rotated the text 90 degrees
clockwise. Jerrad had a similarly interesting strategy of rotating the
input with Text::Orientation 90 degrees clockwise in order to swap
columns for rows; making it easy to shift lines off the resulting
array as they are used in a match.
Jerrad's program also included several optimizations including a
character frequency table and short-circuiting once a sole potential
match remained. Jurgen's program had an internal requirement that
characters be tested widest first and others used ASCII order. Jerrad
and Jurgen's programs both performed many more tests to accquire a
match due to the placement of tests for equality inside several levels
of loops.
I (Jerrad) had experimented with something more "liberal" that would
have alleviated this and made the engine less finicky about input,
"low-resolution" matching. I checked the 0th, 3rd, and 6th columns for
equality. It worked rather well however h, m and n were often
indistinguishable. That could have been fixed with a progressive scan
scheme but at the expense of readily avoiding an infinite loop for
fonts such as basic where lowercase and uppercase have the same
representation.
Jurgen and Ronald's M3 submissions were the only ones that worked out
of the box. All of the font-matching engines excluded fonts whose
height was not a multiple of the input height; fonts 4 and 8 line high
allowed for 16 line high input, but not 6, 9 or 10 line high
fonts. Jurgen and Ronald both then proceeded to process the text
against the remaining fonts. Jerrad adapted Patrick's fingerprint
cache on disk, eliminating fonts that did not include all of the
characters used in the input; helas this high pay-off optimization is
another example of requiring perfect input.
[ Thanks to Jerrad Pierce for running the QOTW this week. New quiz
tomorrow. -MJD ]
More information about the NewOrleans-pm
mailing list