[Purdue-pm] Perl Challenge 96.2: Edit Distance

Dave Jacoby jacoby.david at gmail.com
Mon Jan 18 17:58:47 PST 2021


I'm writing because I do not currently understand my next step. I may blog
this. This is interesting because for most of the PWC challenges, there may
be clever here or there, but mostly it's a simple matter of programming.

The Weekly Challenge - 096 (perlweeklychallenge.org)
<https://perlweeklychallenge.org/blog/perl-weekly-challenge-096/>


*You are given two strings $S1 and $S2.*



*Write a script to find out the minimum operations required to convert $S1
into $S2.The operations can be insert, remove or replace a character.Please
check out Wikipedia page (*Edit distance - Wikipedia
<https://en.wikipedia.org/wiki/Edit_distance>*) for more information.*

The first example in the challenge and the wikipedia page is "kitten" and
"sitting". There are three changes necessary to turn one to the other,
shown by capitals.

    kitten
    Sitten
    sittIn
    sittinG

As mentioned in the Wikipedia article, we can get to the count of steps
using the Levenshtein Distance. I know about that because I saw it in
perlbrew internals: This is how it knows to
suggest "exec" when you accidentally type "perlbrew esec", but I hadn't
looked into HOW it works inside. So, I can say "there are three operations"
but I can't, within the program, say we have to insert a G, replace K with
S, and replace E with I.

So I finally read to understand the LD algorithm, rather than restate it
from Wikipedia and say that's that. To demonstrate, we'll start comparing
"test" with "test".  In the below examples, the first row and the first
column are preset, counting up from zero.

     t e s t
  [0,1,2,3,4]
t [1,0,1,2,3]
e [2,1,0,1,2]
s [3,2,1,0,1]
t [4,3,2,1,0]

If nothing is different in position [i,j], it simply copies the value of
position i-1, j-i, and that transfers down,so that position [-1,-1], the
last column of the last row, is 0, so there's no difference.

     t e s t
  [0,1,2,3,4]
t [1,0,1,2,3]
e [2,1,0,1,2]
x [3,2,1,1,2]
t [4,3,2,2,1]

Here we see that [3,3] is 1. It got that way because there is a difference,
and took the minimum value of [2,2], [2,3], and [3,2], then added one.

     l i g h t s
  [0,1,2,3,4,5,6]
s [1,1,2,3,4,5,5]
l [2,1,2,3,4,5,6]
i [3,2,1,2,3,4,5]
g [4,3,2,1,2,3,4]
h [5,4,3,2,1,2,3]
t [6,5,4,3,2,1,2]

This compares "slight" with "lights", which, given the add, remove or
change rules, give us two. This would be removing the S from the beginning
and adding S to the end. 2 changes, as [-1,-1] indicates.

I'm *thinking* that the way forward is to travel up and to the left with
the minimum of those three values, making no change if the minimum equals
the current and

This Looks Like A Job For *RECURSION!* , but I believe most challenges look
like that, until I think harder. Anyone else have thoughts about a better
way to solve this?
-- 
Dave Jacoby
jacoby.david at gmail.com

“There is nothing obvious”
    — Theo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/purdue-pm/attachments/20210118/b8b87cea/attachment.html>


More information about the Purdue-pm mailing list