[tpm] Irritation problem - regex French character set
Liam R E Quin
liam at holoweb.net
Tue Apr 10 18:52:41 PDT 2012
On Tue, 2012-04-10 at 20:10 -0400, Chris Jones wrote:
> Having successfully untainted one file while
> reading it in, I am now faced with untainting a
> file containing two languages, English and French.
>
> File - tagnames2.dat
> key             English value          French value
> p1a_help1       Getting Help           Obtenir de l'aide
> p2a_type        Building Type:         Type de bâtiment:
> p3a_error_less  must be no less than   ne peut pas être inférieur à
>
> As well, this file contains some math like symbols: >, =, <, ~
>
> My initial regex is:
> if( $tagLine =~
> /([\w]+)\t([-\w\/.]+)\t([-\w\/.]+)$/) # key and two values the same format
> {
> my $tag = $1;
> my $phraseE = $2;
> my $phraseF = $3;
> my $tmpref = {
> english => "$phraseE",
> francais => "$phraseF" };
> $tags{ $tag } = $tmpref;
> $count++;
> }
It sounds like you might want this instead:
    if ($tagLine =~ m{^([^\t]+)\t([^\t]+)\t(.+)$}) {
        $tags{$1} = {
            english  => $2,
            francais => $3,
        };
        ++$count;
    } else {
        # maybe log an error here? be careful not to
        # show the untrusted data in an error message that
        # goes to the user, though!
    }
since you want to match based on tabs, not on what's between them.
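
Here is a minimal, self-contained sketch of the same idea that you can run on its own. The helper name parse_tag_line and the in-line sample data are made up for illustration, and the third field is matched with [^\t]+ rather than .+ so that a line with a stray fourth tab is rejected; that is a stricter variation, not a requirement.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below contain literal accented characters

# Hypothetical helper: split one line into (key, english, french) by tabs.
# Returns an empty list unless the line has exactly three non-empty fields.
sub parse_tag_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending, if any
    if ($line =~ m{^([^\t]+)\t([^\t]+)\t([^\t]+)$}) {
        return ($1, $2, $3);
    }
    return;
}

my %tags;
my $count = 0;

# Stand-in for the lines read from the data file.
my @sample = (
    "p2a_type\tBuilding Type:\tType de bâtiment:\n",
    "p3a_error_less\tmust be no less than\tne peut pas être inférieur à\n",
    "broken line with no tabs\n",
);

for my $line (@sample) {
    my @fields = parse_tag_line($line);
    next unless @fields;     # skip (or log) malformed lines
    my ($key, $english, $french) = @fields;
    $tags{$key} = { english => $english, francais => $french };
    ++$count;
}

print "$count tags loaded\n";
```

Because the pattern only cares about tabs, accented letters and symbols such as >, =, < and ~ pass through without being listed in a character class. If the file is UTF-8 (an assumption; your message doesn't say), opening it with '<:encoding(UTF-8)' gives you decoded characters rather than raw bytes.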
Liam
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/