tkil-sdpm at scrye.com
Thu Oct 28 14:18:26 CDT 2004
>>>>> "Tkil" == Tkil <tkil at scrye.com> writes:
Tkil> For more generic cases, what you want to find is something that
Tkil> will "canonicalize" the Unicode into one of two base forms (but
Tkil> preferably "D" form, which uses combining marks whenever
Tkil> possible). Fortunately, there is a standard Unicode::Normalize
Tkil> module to do this for you. First, I have to justify
>>>>> "Joel" == Joel Fentin <joel at fentin.com> writes:
Joel> From that point in your email onward, I stopped understanding.
I suspect that you actually stopped reading, or stopped trying to
understand.
I was trying to explain *why* it was a problem in the first place.
The fact that your mail arrived butchered was a great example of why
it's a problem. But if you don't understand encodings, then you're
going to lose.
Joel> One thing seemed clear is that it didn't reek of fell-swoop. I
Joel> didn't see anything cookbook-ish that I could build upon. Thank
Joel> you anyhow.
I was trying to explain the problem; you wanted an instant solution,
which is not what I was providing. Put another way: I was trying to
teach you how to fish. You were looking for a fish handout.
Joel> My goal is similar to that of a search engine. Take a word or a
Joel> phrase and check it against a longer hunk of text. Yes there is
Joel> a match or no there isn't. The i modifier to m// takes care of
Joel> case. And it seems Convert::Translit takes care of accents.
Glad that your current problem is solved. Consider the following
cases, though:
1. In Spanish, "ll" and "ch" are sometimes treated as "one character"
(e.g. for collating purposes).
2. In German, there is a single lower-case character (ess-zet, the one
that looks like a beta)... but in capital letters, it's written
"SS". What searches should work here?
And your comment of "there is a match or there isn't" is itself vague.
You have to more carefully specify what makes a match and what
doesn't. You might know -- but we don't, so we're left to guess.
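
One way to pin down "what makes a match" is to write the policy as code.
The following is only a sketch of one possible policy, not a
recommendation: decompose with NFD from Unicode::Normalize, drop the
combining marks, then apply full case folding. NFD is the decomposed form,
the one that separates base letters from their accents. The helper names
match_key and loosely_contains are hypothetical, and fc assumes Perl 5.16
or later.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: reduce a string to the key that this policy compares.
sub match_key {
    my ($text) = @_;
    my $key = NFD($text);     # split accented letters into base + combining marks
    $key =~ s/\p{Mn}+//g;     # drop the combining (nonspacing) marks
    return fc($key);          # full case folding: ess-zet and "SS" both become "ss"
}

# Hypothetical helper: 1 if $needle occurs in $haystack under this policy, else 0.
sub loosely_contains {
    my ($haystack, $needle) = @_;
    return index(match_key($haystack), match_key($needle)) >= 0 ? 1 : 0;
}

print loosely_contains("Caf\x{E9} au lait", "CAFE"), "\n";   # 1: accent and case ignored
print loosely_contains("Stra\x{DF}e", "STRASSE"), "\n";      # 1: ess-zet folds to "ss"
print loosely_contains("llama", "lama"), "\n";               # 1: "ll" is treated as two letters
```

The last line is the interesting one: this policy happily finds "lama"
inside "llama", which is wrong if "ll" is supposed to count as a single
letter. The code cannot decide that for you; it can only make the choice
visible.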
I guess I'm just venting some frustration that you are asking for a
solution, but seem uninterested in learning about the basics that
would help you form your own solution.