[tpm] ucfirst() and unicode

Stuart Watt stuart at morungos.com
Tue Apr 6 12:33:51 PDT 2010


Digimer wrote:
> From reading perldoc perlunicode, I was able to figure out why 
> ucfirst() wasn't doing anything; The data I am altering is coming from 
> a UTF8-encoded database. I also see the example of creating UTF8 
> compatible ToUpper(), ToLower(), etc.
>
>   There isn't an example of a compatible ucfirst() alternative, and as 
> I read it, I'd need to create a custom function listing the 
> source->destination unicodes to convert... This seems tedious so, 
> given that laziness is the source of all code, I am guessing someone 
> has come up with another way. Failing that, is there such a function 
> already?
>
>   My CPAN search for 'ucfirst unicode' failed (though it's always 
> possible that there is a PEBCAK).
>
> tl;dr - need a ucfirst() variant that works with Unicode strings.

I think some of this is locale-specific, which is why it isn't obvious: 
what actually happens can vary from locale to locale. For example, 
é can be uppercased to either E or É, depending on which region you are in. See 
http://search.cpan.org/~dapm/perl-5.10.1/pod/perllocale.pod#Category_LC_CTYPE:_Character_Types 
for details on how LC_CTYPE affects case mapping.

Just putting "use locale;" in your script might be a good place to start.
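
Whether "use locale;" helps depends on which locale the system is set to, so 
it may be worth checking one other thing as well. The following is only a 
rough sketch, and it rests on an assumption not confirmed in this thread: 
that the database hands back raw UTF-8 bytes which Perl has not decoded into 
a character string. If so, decoding with the core Encode module before 
calling ucfirst() may be enough. The helper name ucfirst_utf8 is made up for 
illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original thread): takes raw UTF-8
# bytes and returns UTF-8 bytes with the first character uppercased.
sub ucfirst_utf8 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);      # bytes -> Perl character string
    return encode('UTF-8', ucfirst $chars);   # uppercase first char, back to bytes
}

# Assumed input: the UTF-8 bytes for "école", as an undecoded byte string.
my $raw = "\xC3\xA9cole";

# On the undecoded bytes, ucfirst() only looks at the first byte (0xC3),
# so the string comes back unchanged.
my $unchanged = ucfirst $raw;

# After decoding, ucfirst() sees the character é and uppercases it to É.
my $out = ucfirst_utf8($raw);

print $out, "\n";
```

If the database driver already returns decoded strings, the decode step 
would be wrong, so check what the driver actually gives you first.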

All the best
Stuart
