[tpm] ucfirst() and unicode

Wed Apr 7 12:12:31 PDT 2010

On 10-04-06 03:33 PM, Stuart Watt wrote:
> Digimer wrote:
>> From reading perldoc perlunicode, I was able to figure out why
>> ucfirst() wasn't doing anything: the data I am altering is coming from
>> a UTF8-encoded database. I also see the example of creating UTF8
>> compatible ToUpper(), ToLower(), etc.
>>
>> There isn't an example of a compatible ucfirst() alternative, and as I
>> read it, I'd need to create a custom function listing the
>> source->destination unicodes to convert... This seems tedious so,
>> given that laziness is the source of all code, I am guessing someone
>> has come up with another way. Failing that, is there such a function
>> already?
>>
>> My CPAN search for 'ucfirst unicode' failed (though it's always
>> possible that there is a PEBCAK).
>>
>> tl;dr - need a ucfirst() variant that works with Unicode strings.
> I think some of this is locale-specific, which is why it isn't obvious.
> i.e., what actually happens can vary from locale to locale. For example,
> é can be uppercased to E or É depending on which region you are in. See
> http://search.cpan.org/~dapm/perl-5.10.1/pod/perllocale.pod#Category_LC_CTYPE:_Character_Types
> for some stuff.
>
> Just putting "use locale;" in your script might be a good place to start.
>
> All the best
> Stuart
>

This got me going in the right direction, thank you!

I had already been using 'use locale', but while looking into it I saw
that it changes how perl interprets \l, \u, \L and \U. From
that, I was able to create this little function that seems to work:

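# Lowercase every character of $word, then uppercase the first one.
# Assumes 'use locale' is in effect in the calling script, since that
# is what influences how \l and \u map characters here.
sub uppercase_first_letter
{
	my ($word)=@_;

	# Lowercase the word one character at a time.
	my $new="";
	foreach my $char (split//, $word)
	{
		$char=~s/^(\w)/\l$1/;
		$new.=$char;
	}
	# Uppercase the leading character.
	$new=~s/^(\w)/\u$1/;

	return($new);
}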
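
For comparison, here is a minimal sketch of another way to get the same
effect without depending on the locale. It assumes the database hands back
raw UTF-8 bytes and that 'use locale' is not in scope where the helper is
defined. The name ucfirst_utf8 is hypothetical, not something from CPAN.
The idea is to decode the bytes into Perl characters with the core Encode
module first, so that the built-in lc() and ucfirst() see whole characters
rather than individual bytes:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 encoded bytes (as they might arrive
# from the database) and returns UTF-8 encoded bytes with the first
# character uppercased and the rest lowercased.
sub ucfirst_utf8
{
	my ($bytes)=@_;

	# Decode the bytes into Perl characters.
	my $chars=decode('UTF-8', $bytes);

	# lc() and ucfirst() use Unicode case mappings on decoded strings.
	my $result=ucfirst(lc($chars));

	# Encode back to UTF-8 bytes for output or storage.
	return(encode('UTF-8', $result));
}

# "\xc3\xa9COLE" is the UTF-8 byte sequence for "éCOLE".
my $out=ucfirst_utf8("\xc3\xa9COLE");	# bytes for "École"
```

Like the function above, this lowercases the rest of the word as well. If
only the first letter should change, dropping the lc() call would be the
obvious adjustment.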

-- 
Digimer
E-Mail:         linux at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org