[San-Diego-pm] accents

Tkil tkil-sdpm at scrye.com
Thu Oct 28 14:18:26 CDT 2004

>>>>> "Tkil" == Tkil <tkil at scrye.com> writes:

Tkil> For more generic cases, what you want to find is something that
Tkil> will "canonicalize" the Unicode into one of two base forms (but
Tkil> preferably "C" form, which uses precomposed characters whenever
Tkil> possible).  Fortunately, there is a standard Unicode::Normalize
Tkil> module to do this for you.  First, I have to justify
Tkil> it:...............
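A minimal Perl sketch of why normalization matters, assuming the standard Unicode::Normalize module. The same word can be stored with a precomposed "é" (U+00E9) or as "e" followed by U+0301 COMBINING ACUTE ACCENT. The two strings only compare equal after both are normalized to the same form. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Two spellings of "cafe" with an acute accent: one uses the
# precomposed U+00E9, the other "e" plus U+0301 (combining acute).
my $composed   = "caf\x{e9}";
my $decomposed = "cafe\x{301}";

# As raw strings they differ, so eq and m// see no match.
printf "raw equal: %s\n", ($composed eq $decomposed) ? "yes" : "no";

# Normalized to the same form, they compare equal.
printf "NFC equal: %s\n", (NFC($composed) eq NFC($decomposed)) ? "yes" : "no";
printf "NFD equal: %s\n", (NFD($composed) eq NFD($decomposed)) ? "yes" : "no";

# NFC packs the accent into one code point; NFD splits it into
# a base letter plus a combining mark, so the lengths differ.
printf "NFC length %d, NFD length %d\n",
    length(NFC($composed)), length(NFD($composed));
```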

>>>>> "Joel" == Joel Fentin <joel at fentin.com> writes:

Joel> From that point in your email onward, I stopped understanding.

I suspect that you actually stopped reading, or stopped trying to understand.

I was trying to explain *why* it was a problem in the first place.
The fact that your mail arrived butchered was a great example of why
it's a problem.  But if you don't understand encodings, then you're
going to lose.

Joel> One thing that seemed clear is that it didn't reek of fell-swoop.  I
Joel> didn't see anything cookbook-ish that I could build upon.  Thank
Joel> you anyhow.

I was trying to explain the problem; you wanted an instant solution,
which is not what I was providing.  Put another way: I was trying to
teach you how to fish.  You were looking for a fish handout.

Joel> My goal is similar to that of a search engine. Take a word or a
Joel> phrase and check it against a longer hunk of text. Yes, there is
Joel> a match, or no, there isn't. The i modifier to m// takes care of
Joel> case. And it seems Convert::Translit takes care of accents.
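For comparison, here is one way to get that yes/no answer without Convert::Translit. It is a swapped-in technique, not what Joel used, and only a sketch: decompose with NFD, delete the nonspacing combining marks (\p{Mn}), lower-case what is left, then test with index(). The helper names search_key and loose_contains are hypothetical. This only handles accents that decompose into a base letter plus a mark. It does nothing for the German sharp s discussed next.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: build a loose "search key" by decomposing
# (NFD), dropping nonspacing combining marks, and lower-casing.
sub search_key {
    my ($s) = @_;
    my $k = NFD($s);
    $k =~ s/\p{Mn}+//g;    # remove accents left as separate marks
    return lc $k;
}

# Hypothetical helper: true if $needle occurs anywhere in $haystack,
# ignoring case and (decomposable) accents.
sub loose_contains {
    my ($needle, $haystack) = @_;
    return index(search_key($haystack), search_key($needle)) >= 0;
}

print loose_contains("CAFE", "Un caf\x{e9}, por favor") ? "match\n" : "no match\n";
print loose_contains("Ma\x{f1}ana", "hasta manana")     ? "match\n" : "no match\n";
```

Using index() instead of a pattern avoids having to escape regex metacharacters in the search phrase; m/\Q$needle\E/ on the two keys would do the same job.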

Glad that your current problem is solved.  Consider the following
situations, though:

1. In Spanish, "ll" and "ch" are sometimes treated as "one character"
   (e.g. for collating purposes).

2. In German, there is a single lower-case character (eszett, "ß", the
   one that looks like a beta)... but in capital letters, it's written
   "SS".  What searches should work here?
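The ß/SS case can be illustrated with a small Perl sketch. It assumes Perl 5.16 or later, whose fc builtin applies Unicode full case folding; fc did not exist when this thread was written, so treat it as a later illustration. The example words are illustrative only.

```perl
use v5.16;       # enables strict plus the 'fc', 'say' and 'unicode_strings' features
use warnings;

# "Strasse" spelled with U+00DF (LATIN SMALL LETTER SHARP S),
# and its conventional all-capitals spelling.
my $lower = "Stra\x{df}e";
my $upper = "STRASSE";

# lc() maps one character to one character: the sharp s stays
# as it is while "SS" becomes "ss", so the spellings still differ.
say "lc equal: ", (lc($lower) eq lc($upper) ? "yes" : "no");

# fc() uses full case folding, where the sharp s folds to "ss",
# so both spellings reduce to "strasse".
say "fc equal: ", (fc($lower) eq fc($upper) ? "yes" : "no");
```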

And your comment of "there is a match or there isn't" is itself vague.
You have to specify more carefully what makes a match and what
doesn't.  You might know -- but we don't, so we're left to guess.
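To show why "match" needs pinning down, here is a small sketch with two hypothetical definitions of a case-insensitive match that disagree on the same text. The function names are illustrative; \Q...\E quotes any regex metacharacters in the search phrase.

```perl
use strict;
use warnings;

# Hypothetical definition 1: the phrase may appear anywhere,
# even inside a longer word.
sub substring_match {
    my ($phrase, $text) = @_;
    return $text =~ /\Q$phrase\E/i ? 1 : 0;
}

# Hypothetical definition 2: the phrase must stand as a whole word
# (\b assumes the phrase starts and ends with word characters).
sub word_match {
    my ($phrase, $text) = @_;
    return $text =~ /\b\Q$phrase\E\b/i ? 1 : 0;
}

my $text = "Please concatenate the files.";
printf "substring: %d, whole word: %d\n",
    substring_match("cat", $text), word_match("cat", $text);
```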

I guess I'm just venting some frustration that you are asking for a
solution, but seem uninterested in learning about the basics that
would help you form your own solution.

