From joel at fentin.com Sat Oct 9 00:02:16 2004
From: joel at fentin.com (Joel Fentin)
Date: Sat Oct 9 00:02:20 2004
Subject: [San-Diego-pm] ssi problem
Message-ID: <416770D8.6070208@fentin.com>

If I put the address of a file in the browser that has a line like this:

It displays what is in the main file as well as what is in top.html.
=============================
But if I have perl code that looks like this:

print "Content-type: text/html\n\n";
foreach(@Buf) #all lines in file
{
    print $_; #display a line
}

Shows up in the "view source" but the contents of top.html do not.

Any cure?

-- 
Joel Fentin tel: 760-749-8863 FAX: 760-749-8864
Contact me: http://fentin.com/me/ContactMe.html
Biz: http://fentin.com
Personal: http://fentin.com/me/

From rkleeman at energoncube.net Mon Oct 11 13:17:41 2004
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Mon Oct 11 13:17:49 2004
Subject: [San-Diego-pm] Meeting Next Week
Message-ID: <20041011181741.GC21275@energoncube.net>

Just a friendly heads up folks, we're going to have a meeting next
Tuesday, October 19, 7PM at Callahan's in Mira Mesa.

From menolly at mib.org Wed Oct 13 19:00:38 2004
From: menolly at mib.org (Menolly)
Date: Wed Oct 13 19:00:49 2004
Subject: [San-Diego-pm] Job Opportunity
Message-ID:

POSITION TITLE: Software Engineer Consultant
REPORTS TO: Sr. Development Mgr
LOCATION: San Diego
OPENING DATE: October 13, 2004
SALARY GRADE: Negotiable
Duration: Through December 2004

Summary:
This individual will be responsible for the design, development, and
maintenance of existing Internet applications written with mod_perl and
MySQL. This individual will also be responsible for integrating these
applications with additional Internet applications written with ASP, VB,
C++, and SQL Server 2000. The candidate will analyze software
requirements to determine feasibility of design to provide accurate
development estimates.
The candidate will be familiar with n-tier architecture and have
experience developing Internet based client/server applications under
the two different environments. The candidate will create technical
specification documents in support of development tasks and assist in
the creation of project plans.

Responsibilities:
* Full technical knowledge of all phases of application development.
* Experience developing client/server web applications
* Experience with highly available, scalable, distributed systems
  utilizing n-tier architecture
* Knowledge of and hands-on experience with Perl, mod_perl, Linux,
  Apache, and MySQL
* Knowledge of and hands-on experience with Windows, C/C++, and MS SQL
* Knowledge of browser limitations & capabilities, as well as knowledge
  of browser/server interaction
* Must have good communication skills and work well in a team
  environment

Education/Experience:
Qualified candidates will have a Bachelors degree, preferably in
Computer Science (BSCS) or Engineering. Five years of application
development and two years of experience designing and implementing
client/server applications using mod_perl and MySQL.

Qualified candidates please send resumes to tduran@plato.com.

From joel at fentin.com Thu Oct 14 10:26:08 2004
From: joel at fentin.com (Joel Fentin)
Date: Thu Oct 14 12:14:23 2004
Subject: [San-Diego-pm] Displaying tags instead of executing them
Message-ID: <416E9A90.6020609@fentin.com>

I am working on a project which examines a number of files and displays
a line from each file in a table in the browser. If the line from the
file has an html tag, it gets executed rather than displayed.

The only way around it that I have discovered is to replace with
?tag?. Is there something better?
From christopher.hahn at peregrine.com Thu Oct 14 12:19:43 2004
From: christopher.hahn at peregrine.com (Christopher Hahn)
Date: Thu Oct 14 12:22:07 2004
Subject: [San-Diego-pm] Displaying tags instead of executing them
Message-ID:

Joel,

Pardon, but what exactly do you mean by "executed" below?

(I guess that you do not want to stuff the html as-is into your table,
but you could....)

Good Luck,
chahn

-----Original Message-----
From: Joel Fentin [mailto:joel@fentin.com]
Sent: Thursday, October 14, 2004 8:26 AM
To: San Diego Perl Mongers
Subject: [San-Diego-pm] Displaying tags instead of executing them

I am working on a project which examines a number of files and displays
a line from each file in a table in the browser. If the line from the
file has an html tag, it gets executed rather than displayed.

The only way around it that I have discovered is to replace with .
Is there something better?

From chris_radcliff at mac.com Thu Oct 14 12:33:15 2004
From: chris_radcliff at mac.com (Chris Radcliff)
Date: Thu Oct 14 12:33:27 2004
Subject: [San-Diego-pm] Displaying tags instead of executing them
In-Reply-To: <416E9A90.6020609@fentin.com>
References: <416E9A90.6020609@fentin.com>
Message-ID: <210D018E-1E07-11D9-B232-00039301A6E2@mac.com>

The simplest (if not most correct) thing to do is replace all
occurrences of < with &lt;, like so:

    $line =~ s/</&lt;/g;

That will make all the HTML tags inactive but still render them correctly.

> I am working on a project which examines a number of files and
> displays a line from each file in a table in the browser.
If the line
> from the file has an html tag, it gets executed rather than displayed.
>
> The only way around it that I have discovered is to replace with
> ?tag?. Is there something better?

From rkleeman at energoncube.net Thu Oct 14 12:35:39 2004
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Thu Oct 14 12:35:50 2004
Subject: [San-Diego-pm] Displaying tags instead of executing them
In-Reply-To: <416E9A90.6020609@fentin.com>
References: <416E9A90.6020609@fentin.com>
Message-ID: <20041014173539.GA12973@energoncube.net>

If I understand what you want to do, then I think you want to look at
CGI::escapeHTML().

On Thu, Oct 14, 2004 at 08:26:08AM -0700, Joel Fentin wrote:
> I am working on a project which examines a number of files and displays
> a line from each file in a table in the browser. If the line from the
> file has an html tag, it gets executed rather than displayed.
>
> The only way around it that I have discovered is to replace with
> ?tag?. Is there something better?

From joel at fentin.com Thu Oct 14 14:53:48 2004
From: joel at fentin.com (Joel Fentin)
Date: Thu Oct 14 19:22:20 2004
Subject: [San-Diego-pm] Displaying tags instead of executing them
In-Reply-To: <210D018E-1E07-11D9-B232-00039301A6E2@mac.com>
References: <416E9A90.6020609@fentin.com> <210D018E-1E07-11D9-B232-00039301A6E2@mac.com>
Message-ID: <416ED94C.8040503@fentin.com>

Chris Radcliff wrote:
> The simplest (if not most correct) thing to do is replace all
> occurrences of < with &lt;, like so:
>
> $line =~ s/</&lt;/g;
>
> That will make all the HTML tags inactive but still render them correctly.

Thank you Chris. That and > did the trick.
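A minimal sketch of the escaping discussed in this thread, assuming the goal is to show a line of text literally in the browser. The helper name escape_html is made up for illustration; CGI::escapeHTML(), suggested earlier in the thread, covers the same ground. The ampersand is replaced first so that the entities produced by the later substitutions are not escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: turn HTML metacharacters into entities so a
# browser displays them instead of interpreting them as markup.
sub escape_html {
    my ($line) = @_;
    $line =~ s/&/&amp;/g;     # must come first
    $line =~ s/</&lt;/g;
    $line =~ s/>/&gt;/g;
    $line =~ s/"/&quot;/g;
    return $line;
}

print escape_html('<b>bold</b> & "quoted"'), "\n";
```

Applying the helper to each line before printing it into the table cell should leave any tags visible as text.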
From joel at fentin.com Sun Oct 17 01:10:06 2004
From: joel at fentin.com (Joel Fentin)
Date: Sun Oct 17 01:10:08 2004
Subject: [San-Diego-pm] LWP::Simple
Message-ID: <41720CBE.3060808@fentin.com>

I can't get LWP::Simple to do anything. I've tried variations of the
following. There is no error message, but $Buff is always empty.

What don't I understand?

#!/perl/bin/perl -w
use strict;
BEGIN{use CGI::Carp qw(carpout fatalsToBrowser);carpout(\*STDOUT);$|=1;}
use LWP::Simple;
my $Buff;
$Buff = get("http://127.0.0.1/cgi-bin/PacoWeb/SiteSearch.htm"); #personal server
#$Buff = get("http://fentin.com.index.html"); #internet
#die "Couldn't get it!" unless defined $Buff;
die "could not get page: $LWP::Simple::error" if not defined $Buff;
die $Buff;

============================
PERSONAL SERVER ERROR LOG:
[Sat Oct 16 23:02:43 2004] [error] [client 127.0.0.1] C:/Apache2/cgi-bin/PacoWeb/SiteSearch.htm is not executable; ensure interpreted scripts have "#!" first line
[Sat Oct 16 23:02:43 2004] [error] [client 127.0.0.1] (9)Bad file descriptor: don't know how to spawn child process: C:/Apache2/cgi-bin/PacoWeb/SiteSearch.htm

From david_roe at mac.com Sun Oct 17 14:36:02 2004
From: david_roe at mac.com (Dave Roe)
Date: Sun Oct 17 14:35:15 2004
Subject: [San-Diego-pm] LWP::Simple
In-Reply-To: <41720CBE.3060808@fentin.com>
References: <41720CBE.3060808@fentin.com>
Message-ID:

On Oct 16, 2004, at 11:10 PM, Joel Fentin wrote:

> $Buff = get("http://127.0.0.1/cgi-bin/PacoWeb/SiteSearch.htm");
> #personal server

try putting your .htm file outside of /cgi-bin?

> #$Buff = get("http://fentin.com.index.html"); #internet

try fentin.com/index.html?
/dave

From joel at fentin.com Sun Oct 17 16:47:10 2004
From: joel at fentin.com (Joel Fentin)
Date: Sun Oct 17 16:47:11 2004
Subject: [San-Diego-pm] LWP::Simple
In-Reply-To:
References: <41720CBE.3060808@fentin.com>
Message-ID: <4172E85E.9080403@fentin.com>

Dave Roe wrote:
> try putting your .htm file outside of /cgi-bin?
> try fentin.com/index.html?

Thank you. Both suggestions worked. I had spent so much time
experimenting and searching google.

From rkleeman at energoncube.net Mon Oct 18 18:55:02 2004
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Mon Oct 18 18:55:12 2004
Subject: [San-Diego-pm] Meeting Tuesday Night
Message-ID: <20041018235502.GB23814@energoncube.net>

Hey Folks,

Just a reminder, there is a meeting tomorrow (Tuesday) night. 7PM at
Callahan's in Mira Mesa. If you're planning on coming, drop me a line
and I'll make sure there is enough seating for the group.

From chahn at peregrine.com Wed Oct 20 14:29:44 2004
From: chahn at peregrine.com (Christopher Hahn)
Date: Wed Oct 20 14:29:53 2004
Subject: [San-Diego-pm] book suggestion
Message-ID: <4176BCA8.4060202@peregrine.com>

Hello all,

It was nice to meet with you all last night.

I am afraid that I cannot recall the name of the member who wanted to
know about industrial strength perl apps, but I saw reference to this
O'Reilly book and thought that it probably describes useful techniques.

http://www.oreilly.com/catalog/hpmysql/index.html

Yes, it was perl you were asking about, but it seemed that you intended
to use MySQL and so I thought to comment.

Just a thought.

Take care,
Christopher

From merlyn at stonehenge.com Wed Oct 20 14:57:58 2004
From: merlyn at stonehenge.com (Randal L.
Schwartz)
Date: Wed Oct 20 14:58:16 2004
Subject: [San-Diego-pm] book suggestion
In-Reply-To: <4176BCA8.4060202@peregrine.com>
References: <4176BCA8.4060202@peregrine.com>
Message-ID: <863c0940xl.fsf@blue.stonehenge.com>

>>>>> "Christopher" == Christopher Hahn writes:

Christopher> Yes, it was perl you were asking about, but it seemed that you
Christopher> intended to use
Christopher> MySQL and so I thought to comment.

Slashdot's been all over that. The real name for
"High performance MySQL" is "PostgreSQL". :)

MySQL was perfect in its time, but its time has passed. The continuum
is now:

    flat files - SQLite - PostgreSQL - Oracle

with enough overlap in each to make the pain painless.

The only reason to use MySQL today is "legacy".... either in brainspace
or existing apps that haven't yet been modernized.

If .info and .org run on PostgreSQL (and *not* MySQL), that's good
enough for me.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

From allengil at sdf.lonestar.org Wed Oct 20 15:38:39 2004
From: allengil at sdf.lonestar.org (Allen Gilson)
Date: Wed Oct 20 15:39:07 2004
Subject: [San-Diego-pm] book suggestion
In-Reply-To: <863c0940xl.fsf@blue.stonehenge.com>
References: <4176BCA8.4060202@peregrine.com> <863c0940xl.fsf@blue.stonehenge.com>
Message-ID:

Thanx Chris & Randal.

I'm the one starting an open source knowledge management system from
the meeting last night. I will look into converting over to SQLite or
PostgreSQL.

Allen Gilson

On Wed, 20 Oct 2004, Randal L. Schwartz wrote:

> Date: 20 Oct 2004 12:57:58 -0700
> From: Randal L.
Schwartz
> To: Christopher Hahn
> Cc: Perl Mongers
> Subject: Re: [San-Diego-pm] book suggestion
>
>>>>>> "Christopher" == Christopher Hahn writes:
>
> Christopher> Yes, it was perl you were asking about, but it seemed that you
> Christopher> intended to use
> Christopher> MySQL and so I thought to comment.
>
> Slashdot's been all over that. The real name for
> "High performance MySQL" is "PostgreSQL". :)
>
> MySQL was perfect in its time, but its time has passed. The continuum
> is now:
>
> flat files - SQLite - PostgreSQL - Oracle
>
> with enough overlap in each to make the pain painless.
>
> The only reason to use MySQL today is "legacy".... either in brainspace
> or existing apps that haven't yet been modernized.
>
> If .info and .org run on PostgreSQL (and *not* MySQL), that's good
> enough for me.

+*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*+
Allen Gilson
allengil@freeshell.org
+*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*+

From allengil at sdf.lonestar.org Wed Oct 20 16:06:18 2004
From: allengil at sdf.lonestar.org (Allen Gilson)
Date: Wed Oct 20 16:06:39 2004
Subject: [San-Diego-pm] Perl Book Data
Message-ID:

Joel,

About your concern expressed at last night's meeting on the popularity
of Perl. The research I did on sales of Perl books while working on my
BSIT showed Perl to be very popular relative to sales of books on VB
and C#. I sent the same question to San Diego Technical Books (now
That Technical Book Store) today, and below is their response.

I hope this helps with your concerns.
Hello Allen,

SDTB had to declare bankruptcy on 9/23. We have started this new
business, but all our sales data for the old store was left behind
along with our inventory.

The top ten Perl books we sell are as follows:

Learning Perl
Programming Perl
Perl Cookbook, 2nd Edition
Learning Perl Objects
Mastering Perl/TK
Advanced Perl Programming
Perl Pocket Reference
Perl By Example
Learning Perl on Win32 Systems
Programming Perl DBI

Learning Perl is our bestseller. I can tell you that Perl is most
popular at Qualcomm. In the last three months we sold 29 copies of
Programming Perl, 18 copies of Learning Perl and 15 copies of the Perl
Cookbook to Qualcomm alone. Those three titles are in the ten
bestseller list for Qualcomm Employees. They have not purchased any
Visual Basic books or C# books in that same timeframe.

I hope what little information I was able to provide will be helpful.

Thanks,
Suzi

From merlyn at stonehenge.com Wed Oct 20 16:22:17 2004
From: merlyn at stonehenge.com (Randal L. Schwartz)
Date: Wed Oct 20 16:22:27 2004
Subject: [San-Diego-pm] Perl Book Data
In-Reply-To:
References:
Message-ID: <86fz492igm.fsf@blue.stonehenge.com>

>>>>> "Allen" == Allen Gilson writes:

Allen> About your concern expressed at last night's meeting on the popularity
Allen> of Perl.

Another ad-hoc data point... go look at search.cpan.org/recent

There are more "recent uploads" daily *right now* than there ever were
even in the height of the dot-com bubble.

In other words, more stuff (and I mean *cool* stuff) is happening with
Perl today than ever before.
From joel at fentin.com Wed Oct 20 19:39:45 2004
From: joel at fentin.com (Joel Fentin)
Date: Wed Oct 20 20:02:55 2004
Subject: [San-Diego-pm] Perl Book Data
In-Reply-To:
References:
Message-ID: <41770551.3000803@fentin.com>

Allen Gilson wrote:
> Joel,
>
> About your concern expressed at last night's meeting on the popularity
> of Perl.

We can't get a quorum at the San Diego Perlmonger meetings. Dozens of
people go to the local VB meetings.

The Perl Journal can't stay afloat. Not so with the VB organs.

On a more personal level I am hearing my clients lamenting the fact
that I am not into PHP and won't get into it. They insist PHP's star is
rising while Perl has stalled.

Those are some of the reasons I raise the issue. I have no opinion
about this - and - I have no opinion about that.
========================
> Learning Perl is our bestseller. I can tell you that Perl is most popular
> at Qualcomm....

This came as a surprise to me. The VB group held its meetings at
Qualcomm/Ericsson for years. I presumed them to be a M$ house.

From emileaben at yahoo.com Sat Oct 23 18:41:51 2004
From: emileaben at yahoo.com (Emile Aben)
Date: Sat Oct 23 18:41:59 2004
Subject: [San-Diego-pm] Web App Developer/Sys Admin position available
Message-ID: <20041023234151.20562.qmail@web60807.mail.yahoo.com>

The company I work for has Web App Developer positions available that
involve a lot of perl programming. Mail me your resume if you're
interested.

=======
SIPphone, a free SIP-based VoIP service provider from the founder of
Lindows and MP3.com, is looking for two talented engineers with the
ambition and skills to join our fast growing and aggressive startup.
Qualified candidates will have all or most of the following skills and
experience:

* demonstrated programming experience with scripting languages such as
  perl, mod_perl, python and programming techniques such as OOD/OOP
* demonstrated systems admin experience with Linux, MySQL and Apache
* demonstrated knowledge of TCP/IP
* demonstrated experience with web technologies (HTML, XML)
* strong troubleshooting and analytical problem solving skills
* knowledge of SIP, C/C++, Java are very helpful

Previous startup experience preferred.

Job location is San Diego
it's NOT ok to contact this poster with services or other commercial
interests
Compensation: $50k-$100k
Principals only. Recruiters, please don't contact this job poster.
Please, no phone calls about this job!
Please do not contact job poster about other services, products or
commercial interests.
Reposting this message elsewhere is NOT OK.

From joel at fentin.com Tue Oct 26 16:15:07 2004
From: joel at fentin.com (Joel Fentin)
Date: Tue Oct 26 17:17:37 2004
Subject: [San-Diego-pm] accents
Message-ID: <417EBE5B.2060001@fentin.com>

I need to see if what the Spanish language operator enters is contained
in a long hunk of text. Something like this:
if($X =~ /$Y/){[Do something]}

The operator might enter josé, JOSÉ, or jose. He might enter niño, NIÑO,
or nino.

An i modifier to m// will take care of case. Is there any fell swoop way
of taking care of accents?
From dgwilson1 at cox.net Tue Oct 26 21:32:17 2004
From: dgwilson1 at cox.net (Douglas Wilson)
Date: Tue Oct 26 21:31:30 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <417EBE5B.2060001@fentin.com>
References: <417EBE5B.2060001@fentin.com>
Message-ID: <417F08B1.7020908@cox.net>

Joel Fentin wrote:
> I need to see if what the Spanish language operator enters is contained
> in a long hunk of text. Something like this:
> if($X =~ /$Y/){[Do something]}
>
> The operator might enter josé, JOSÉ, or jose. He might enter niño, NIÑO,
> or nino.
>
> An i modifier to m// will take care of case. Is there any fell swoop way
> of taking care of accents?

Those characters all have the high order bit on, so here is a crude
way:

my $str = "josé, JOSÉ, or jose. He might enter niño, NIÑO";

my @funny_chars = $str =~ /([^\x00-\x7F])/g;

print "@funny_chars\n";

Cheers,
Doug

From joel at fentin.com Tue Oct 26 23:02:26 2004
From: joel at fentin.com (Joel Fentin)
Date: Tue Oct 26 23:05:14 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <417F08B1.7020908@cox.net>
References: <417EBE5B.2060001@fentin.com> <417F08B1.7020908@cox.net>
Message-ID: <417F1DD2.3000405@fentin.com>

Douglas Wilson wrote:
>
>
> Joel Fentin wrote:
>
>> I need to see if what the Spanish language operator enters is
>> contained in a long hunk of text. Something like this:
>> if($X =~ /$Y/){[Do something]}
>>
>> The operator might enter josé, JOSÉ, or jose. He might enter niño,
>> NIÑO, or nino.
>>
>> An i modifier to m// will take care of case. Is there any fell swoop
>> way of taking care of accents?
>
>
> Those characters all have the high order bit on, so here is a crude
> way:
> my $str = "josé, JOSÉ, or jose.
He might enter niño, NIÑO";
>
> my @funny_chars = $str =~ /([^\x00-\x7F])/g;
>
> print "@funny_chars\n";

Doug,

Although what you say is interesting, it gets me no closer to my
solution. Knocking the high order bit from ñ does not turn it into n.
The one is a binary 11110001 and the other is 01101110.

In order to do a language neutral look into $Y to see if $X is within, I
will probably have to knock the accents off all such characters in both
$X & $Y. What I am hoping is that Perl has a fell-swoop method for this.

I looked in my books, and on the Internet, but much of what I saw I
couldn't understand (Unicode) and none of it seemed on target.

From dgwilson1 at cox.net Wed Oct 27 01:11:30 2004
From: dgwilson1 at cox.net (Douglas Wilson)
Date: Wed Oct 27 01:10:42 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <417F1DD2.3000405@fentin.com>
References: <417EBE5B.2060001@fentin.com> <417F08B1.7020908@cox.net> <417F1DD2.3000405@fentin.com>
Message-ID: <417F3C12.5030507@cox.net>

Joel Fentin wrote:
>
> In order to do a language neutral look into $Y to see if $X is within, I
> will probably have to knock the accents off all such characters in both
> $X & $Y. What I am hoping is that Perl has a fell-swoop method for this.

I misunderstood the question. I thought you just wanted to see IF there
were accented, etc. characters in the string. If you want to convert the
characters to 7-bit characters, I think this'll work:

use strict;
use warnings;

use Convert::Translit;

my $str = "josé, JOSÉ, or jose. He might enter niño, NIÑO";

my $t = Convert::Translit->new('Latin2', 'ascii');
print $t->transliterate($str),"\n";

___OUTPUT___
jose, JOSE, or jose.
He might enter nino, NINO

-Doug

From tkil-sdpm at scrye.com Wed Oct 27 01:20:05 2004
From: tkil-sdpm at scrye.com (Tkil)
Date: Wed Oct 27 01:20:13 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <417EBE5B.2060001@fentin.com> (Joel Fentin's message of "Tue, 26 Oct 2004 14:15:07 -0700")
References: <417EBE5B.2060001@fentin.com>
Message-ID:

>>>>> "Joel" == Joel Fentin writes:

Joel> I need to see if what the Spanish language operator enters is
Joel> contained in a long hunk of text. Something like this:
Joel> if($X =~ /$Y/){[Do something]}

Joel> The operator might enter josé, JOSÉ, or jose. He might enter niño,
Joel> NIÑO, or nino.

Joel> An i modifier to m// will take care of case. Is there any fell
Joel> swoop way of taking care of accents?

If it's a character you know might have an accent, you can use \X
(which matches any base character plus possible combining characters).

For more generic cases, what you want to find is something that will
"canonicalize" the unicode into one of two base forms (but preferably
"D" form, which uses combining marks whenever possible). Fortunately,
there is a standard Unicode::Normalize module to do this for you.
First, I have to justify it:

So, you have "manana". You might store it that way in your match
variable, but the actual entry data might be any one of:

  Pure ASCII, no tilde:

    6D 61 6E 61 6E 61

  ISO-8859-1. Note that IE under windows often lies; it might claim
  it's sending ISO-8859-1, but it's really sending CP1252. In this
  case, note that the mapping of "LATIN SMALL LETTER N WITH TILDE" is
  to code point 0xF1:

    6D 61 F1 61 6E 61

  Since Unicode adopted U+0080 through U+00FF from ISO-8859-1, it is
  entirely reasonable to represent that 0xF1 by the UTF-8 expansion
  of C3B1:

    6D 61 C3B1 61 6E 61

  Finally, this string can also be represented with seven unicode code
  points: 'm', 'a', 'n', COMBINING TILDE, 'a', 'n', 'a':

    6D 61 6E CC83 61 6E 61

So these are a sampling of the ways that you might get incoming data.
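A small sketch that prints the ISO-8859-1 and UTF-8 byte sequences for the word discussed above, using the core Encode and Unicode::Normalize modules. It is an illustration rather than part of the original message, and the hex_of helper is a made-up name.

```perl
use strict;
use warnings;
use Encode qw( encode );
use Unicode::Normalize qw( NFC NFD );

# Hypothetical helper: space-separated upper-case hex of a byte string.
sub hex_of {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# U+00F1 is LATIN SMALL LETTER N WITH TILDE.
my $word = "ma\x{F1}ana";

# One byte for the accented letter.
print 'ISO-8859-1:  ', hex_of( encode( 'iso-8859-1', $word ) ), "\n";

# Precomposed form: the accented letter becomes the two bytes C3 B1.
print 'UTF-8 (NFC): ', hex_of( encode( 'UTF-8', NFC($word) ) ), "\n";

# Decomposed form: plain 'n' followed by COMBINING TILDE (CC 83).
print 'UTF-8 (NFD): ', hex_of( encode( 'UTF-8', NFD($word) ) ), "\n";
```

Seeing the three byte strings side by side makes it plain why a byte-for-byte comparison of the "same" word can fail.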
The question is, what do you want to match it against?

You can use "\X" somewhat like this:

  if ( $input =~ /ma\Xana/ ) { ... }

More info in "perldoc perlre".

If you want to be a bit cleverer, take a look at Unicode::Normalize.
Something like this should give you the "base characters":

  use Encode qw( decode );
  use Unicode::Normalize qw( NFD reorder );

  # take a raw byte stream and interpret it as though it were in
  # ISO-8859-1.
  my $raw = decode "iso-8859-1", "ma\xf1ana";

  # normalize it in fully decomposed form ("Normalized Form D")
  # see: http://www.unicode.org/reports/tr15/
  my $norm = reorder NFD $raw;

  # remove any characters that aren't ascii:
  $norm =~ tr/\x00-\x7f//cd;

Given the above examples, you can look at the result like so:

| $ perl -MEncode=decode \
| >   -MUnicode::Normalize=NFD,reorder -lwe '
| >   $m = reorder NFD decode "iso-8859-1", "ma\xf1ana";
| >   $m =~ tr/\x00-\x7f//cd;
| >   print uc unpack "H*", $m'
| 6D616E616E61

I have no idea how well stuff like this works when you start talking
about non-roman alphabets (kanji, katakana, arabic, etc), though.

t.

p.s. Please be warned that this is an area of perl that I'm still just
     poking around the fringes of -- the above might explode in your
     face...

From joel at fentin.com Wed Oct 27 12:04:21 2004
From: joel at fentin.com (Joel Fentin)
Date: Wed Oct 27 12:17:28 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <417F3C12.5030507@cox.net>
References: <417EBE5B.2060001@fentin.com> <417F08B1.7020908@cox.net> <417F1DD2.3000405@fentin.com> <417F3C12.5030507@cox.net>
Message-ID: <417FD515.8060204@fentin.com>

Douglas Wilson wrote:
> use Convert::Translit;
>
> my $str = "josé, JOSÉ, or jose. He might enter niño, NIÑO";
>
> my $t = Convert::Translit->new('Latin2', 'ascii');
> print $t->transliterate($str),"\n";

Doug,

This is the closest to fell-swoop I have found yet. Thank you.

I PPMed and downloaded Convert-Translit. Your code worked on the first try.
I hope and presume the host involved is willing to install
Convert-Translit also.

From joel at fentin.com Wed Oct 27 12:14:38 2004
From: joel at fentin.com (Joel Fentin)
Date: Wed Oct 27 12:17:34 2004
Subject: [San-Diego-pm] accents
In-Reply-To:
References: <417EBE5B.2060001@fentin.com>
Message-ID: <417FD77E.5060502@fentin.com>

Tkil wrote:
> For more generic cases, what you want to find is something that will
> "canonicalize" the unicode into one of two base forms (but preferably
> "D" form, which uses combining marks whenever possible). Fortunately,
> there is a standard Unicode::Normalize module to do this for you.
> First, I have to justify it:...............

Tony,

From that point in your email onward, I stopped understanding. One
thing seemed clear is that it didn't reek of fell-swoop. I didn't see
anything cookbook-ish that I could build upon. Thank you anyhow.

My goal is similar to that of a search engine. Take a word or a phrase
and check it against a longer hunk of text. Yes there is a match or no
there isn't. The i modifier to m// takes care of case. And it seems
Convert::Translit takes care of accents.
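For the search-engine style match described above, one possible approach that needs only core modules is to fold both the query and the text to unaccented lower case and then do a plain substring test. This is a sketch under assumptions, not something anyone in the thread ran: it assumes the input arrives as ISO-8859-1 bytes, and the names fold_text and contains_folded are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( decode );
use Unicode::Normalize qw( NFD );

# Hypothetical helper: decode Latin-1 bytes, split accented letters into
# base letter + combining mark, drop the marks, and lower-case the rest.
sub fold_text {
    my ($bytes) = @_;
    my $text = NFD( decode( 'iso-8859-1', $bytes ) );
    $text =~ s/\p{Mn}//g;    # \p{Mn} matches combining (non-spacing) marks
    return lc $text;
}

# Hypothetical helper: true if the folded needle occurs in the folded
# haystack. index() is used so that regex metacharacters typed by the
# operator are treated as ordinary characters.
sub contains_folded {
    my ( $haystack, $needle ) = @_;
    return index( fold_text($haystack), fold_text($needle) ) >= 0;
}

my $text = "El ni\xF1o de Jos\xE9 lleg\xF3 ayer.";
for my $query ( "NI\xD1O", 'nino', "jos\xE9", 'pedro' ) {
    printf "%-6s %s\n", fold_text($query),
        contains_folded( $text, $query ) ? 'match' : 'no match';
}
```

Folding both sides the same way means it does not matter whether the operator or the stored text carries the accents.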
From tkil-sdpm at scrye.com Thu Oct 28 14:18:26 2004
From: tkil-sdpm at scrye.com (Tkil)
Date: Thu Oct 28 14:18:35 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <417FD77E.5060502@fentin.com>
References: <417EBE5B.2060001@fentin.com> <417FD77E.5060502@fentin.com>
Message-ID: <16769.17922.779013.368331@brand.scrye.com>

>>>>> "Tkil" == Tkil writes:

Tkil> For more generic cases, what you want to find is something that
Tkil> will "canonicalize" the unicode into one of two base forms (but
Tkil> preferably "D" form, which uses combining marks whenever
Tkil> possible). Fortunately, there is a standard Unicode::Normalize
Tkil> module to do this for you. First, I have to justify
Tkil> it:...............

>>>>> "Joel" == Joel Fentin writes:

Joel> From that point in your email onward, I stopped understanding.

I suspect that you actually stopped reading, or stopped trying to
understand.

I was trying to explain *why* it was a problem in the first place. The
fact that your mail arrived butchered was a great example of why it's
a problem. But if you don't understand encodings, then you're going
to lose.

Joel> One thing seemed clear is that it didn't reek of fell-swoop. I
Joel> didn't see anything cookbook-ish that I could build upon. Thank
Joel> you anyhow.

I was trying to explain the problem; you wanted an instant solution,
which is not what I was providing.

Put another way: I was trying to teach you how to fish. You were
looking for a fish handout.

Joel> My goal is similar to that of a search engine. Take a word or a
Joel> phrase and check it against a longer hunk of text. Yes there is
Joel> a match or no there isn't. The i modifier to m// takes care of
Joel> case. And it seems Convert::Translit takes care of accents.

Glad that your current problem is solved.

Consider the following situations, though:

1.
In Spanish, "ll" and "ch" are sometimes treated as "one character"
   (e.g. for collating purposes).

2. In German, there is a single lower-case character (ess-zet, the one
   that looks like a beta)... but in capital letters, it's written
   "SS".

What searches should work here?

And your comment of "there is a match or there isn't" is itself vague.
You have to more carefully specify what makes a match and what
doesn't. You might know -- but we don't, so we're left to guess.

I guess I'm just venting some frustration that you are asking for a
solution, but seem uninterested in learning about the basics that
would help you form your own solution.

t.

From chris_radcliff at mac.com Thu Oct 28 14:53:30 2004
From: chris_radcliff at mac.com (Chris Radcliff)
Date: Thu Oct 28 14:53:40 2004
Subject: [San-Diego-pm] accents
In-Reply-To: <16769.17922.779013.368331@brand.scrye.com>
References: <417EBE5B.2060001@fentin.com> <417FD77E.5060502@fentin.com> <16769.17922.779013.368331@brand.scrye.com>
Message-ID: <0AF4E39C-291B-11D9-8B31-00039301A6E2@mac.com>

Hi everyone,

Just my $0.02: This list prides itself on having a good attitude toward
questions; we're all champions of the idea that anyone can ask anything
and receive a civil answer. Specifically, we avoid the kind of
dismissive or hurtful comment that one would find when asking a newbie
or "obvious" question on other lists.

It's time to extend that courtesy to answers, as well. Tkil took the
time and effort to lay out a detailed, thoughtful approach to Joel's
problem, and it was dismissed rather rudely. If it was me, I'd think
twice before answering any plea in the future, and that's probably not
where we want this group to head.

Cheers,
~chris

>>>>>> "Tkil" == Tkil writes:
>
> Tkil> For more generic cases, what you want to find is something that
> Tkil> will "canonicalize" the unicode into one of two base forms (but
> Tkil> preferably "D" form, which uses combining marks whenever
> Tkil> possible).
Fortunately, there is a standard Unicode::Normalize > Tkil> module to do this for you. First, I have to justify > Tkil> it:............... > >>>>>> "Joel" == Joel Fentin writes: > > Joel> From that point in your email onward, I stopped understanding. From joel at fentin.com Thu Oct 28 16:19:34 2004 From: joel at fentin.com (Joel Fentin) Date: Thu Oct 28 16:19:50 2004 Subject: [San-Diego-pm] accents In-Reply-To: <0AF4E39C-291B-11D9-8B31-00039301A6E2@mac.com> References: <417EBE5B.2060001@fentin.com> <417FD77E.5060502@fentin.com> <16769.17922.779013.368331@brand.scrye.com> <0AF4E39C-291B-11D9-8B31-00039301A6E2@mac.com> Message-ID: <41816266.8030903@fentin.com> Chris Radcliff wrote: > Hi everyone, > > Just my $0.02: This list prides itself on having a good attitude toward > questions; we're all champions of the idea that anyone can ask anything > and receive a civil answer. Specifically, we avoid the kind of > dismissive or hurtful comment that one would find when asking a newbie > or "obvious" question on other lists. > > It's time to extend that courtesy to answers, as well. Tkil took the > time and effort to lay out a detailed, thoughtful approach to Joel's > problem, and it was dismissed rather rudely. If it was me, I'd think > twice before answering any plea in the future, and that's probably not > where we want this group to head. > > Cheers, > ~chris I don't intend rudeness and I certainly don't intend to do anything to discourage people from answering my questions. I am very thankful for the answers I get. ================================== Tkil wrote: >>>>>>"Joel" == Joel Fentin writes: > > > Joel> From that point in your email onward, I stopped understanding. > > I suspect that you actually stopped reading, or stopped trying to > understand. To some extent, that is true. It seemed overwhelming. ================================== ITEMS: 1. I do not ask a question of this list unless I have been struggling with an issue at least two hours. Usually much more. 
I try experiments with my code, I look in my books, and on the web.

2. When I am stuck, there are several other questions that arise:
   A. Do I invent my way out of the problem, or find out if there is
      already a mainstream way of solving this?
   B. How much time do I spend educating myself on a specialty just to
      see if I can even use that specialty?
   C. Do I stop all forward motion on my project and in essence learn a
      new specialty?
   D. If I stop my project, and learn the new specialty, will I ever
      use what I learn again? If not, I won't REALLY learn it. Not
      without daily doses of it.

3. I am constantly learning new things. Sometimes my projects corner me
into learning them. Sometimes I pick an area of interest to me. And
sometimes I just glaze over.

4. We all have different ways of learning and of keeping our interest.
Mine tends to start with the application example and to learn the
underlying rules from it. Others do this in reverse.

==================================
My issue revolved around 12 conversions:
a = á   A = Á
e = é   E = É
i = í   I = Í
o = ó   O = Ó
u = ú   U = Ú
n = ñ   N = Ñ

1. Since I am not dealing with Portuguese or German, umlauts and left
leaning accents are not my issue.

2. The ideal solution would have been a modifier that does to accents
what the i modifier does to case. It doesn't seem to exist.

3. One solution someone sent me was to loop through a hash 12 (or 6)
times making substitutions. As simple as that sounds, I was quite slow
to grasp the concept.

4. Another suggested solution was very fell-swoop if the host will
install a Perl module. (They don't always.)

5. And one "solution" involved learning about ISO-8859-1, CP1252,
Unicode, Unicode::Normalize qw( NFD reorder ), and more. I confess, I
felt that if I went down that road, I would be a long time returning.
And perhaps returning with very little. It seemed overwhelming.

=================================
Much of what I am saying is that my own personal psychology comes into
play here.

--
Joel Fentin tel: 760-749-8863 FAX: 760-749-8864
Email me: http://fentin.com/me/ContactMe.html
Biz Website: http://fentin.com
Personal Website: http://fentin.com/me
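
For readers following the thread, here is a minimal sketch of two of the approaches mentioned above: the lookup-table substitution (item 3) and the Unicode::Normalize route (item 5). It is not code from any of the posters. The helper names strip_accents_table, strip_accents_nfd and matches_ignoring_accents are hypothetical, and the sketch assumes the text has already been decoded into Perl character strings (for example with Encode::decode) rather than being raw ISO-8859-1 or CP1252 bytes.

```perl
use strict;
use warnings;
use utf8;                        # this source contains literal accented characters
use Unicode::Normalize qw(NFD);  # ships with Perl 5.8 and later

# Approach 1 (lookup table): covers only the 12 conversions listed above.
my %plain = (
    'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ñ' => 'n',
    'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ñ' => 'N',
);

# Hypothetical helper: replace each listed accented letter in a single pass.
sub strip_accents_table {
    my ($text) = @_;
    $text =~ s/([áéíóúñÁÉÍÓÚÑ])/$plain{$1}/g;
    return $text;
}

# Approach 2 (Unicode::Normalize): decompose each accented letter into a
# base letter plus combining mark(s), then delete the combining marks.
sub strip_accents_nfd {
    my ($text) = @_;
    my $decomposed = NFD($text);   # "é" becomes "e" followed by U+0301
    $decomposed =~ s/\p{Mn}//g;    # drop non-spacing (combining) marks
    return $decomposed;
}

# Hypothetical helper: true if $phrase occurs in $text, ignoring case and accents.
sub matches_ignoring_accents {
    my ($phrase, $text) = @_;
    my $needle   = lc(strip_accents_nfd($phrase));
    my $haystack = lc(strip_accents_nfd($text));
    return index($haystack, $needle) >= 0;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_accents_table("El niño comió jamón"), "\n";   # El nino comio jamon
print strip_accents_nfd("El niño comió jamón"), "\n";     # El nino comio jamon
print matches_ignoring_accents("nino", "El NIÑO comió") ? "match\n" : "no match\n";
```

The table version needs no module beyond core Perl but handles only the characters it lists. The NFD version also strips umlauts and grave accents, which may or may not be wanted. Neither addresses the "ll"/"ch" collation or ess-zet cases raised earlier in the thread.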