<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/3.18.3">

</HEAD>

<BODY>

On Tue, 2008-10-28 at 01:54 -0400, Donnie Cameron wrote:<BR>

<BLOCKQUOTE TYPE=CITE>

    David,<BR>

    <BR>

    The split function is not going to make things any faster. In fact, short of resorting to another language, I can't think of a faster way of doing it than the one you suggested. Even if you were to split on something like a quote followed by a space (/&quot; /) and then reattach the quote to the end of each resulting element (work that is far simpler than regex matching), the process would end up slower than regex matching, because the regex matching happens in compiled C code inside Perl's regex engine, while the simpler reattachment work happens in interpreted Perl. I'm also convinced that even if you used the index function, your Perl code would still be slower than the regex-based solution you described.<BR>
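    <BR>
    A minimal sketch of the two approaches, assuming a hypothetical record format of double-quoted fields separated by single spaces (the actual file layout isn't shown in this thread). Both produce the same fields here; the split version needs an extra Perl-level loop to put the quotes back, which is the work Donnie describes.<BR>

```perl
use strict;
use warnings;

# Hypothetical record: double-quoted fields separated by single spaces.
# (An assumption; the real file layout was not shown in the thread.)
my $line = q{"abc" "def ghi" "jkl"};

# Regex approach: m//g in list context returns every whole match.
my @by_regex = $line =~ /"[^"]*"/g;

# Split approach: split on a quote followed by a space, then put the
# quote back on every element except the last, which still has its own.
my @by_split = split /" /, $line;
$_ .= '"' for @by_split[0 .. $#by_split - 1];

print join('|', @by_regex), "\n";    # "abc"|"def ghi"|"jkl"
print join('|', @by_split), "\n";    # "abc"|"def ghi"|"jkl"
```

    Note that the split version breaks if a field itself contains a quote followed by a space, and the regex version assumes fields never contain an embedded quote.<BR>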

    <BR>

    In the past, I have tried a number of tricks to beat simple regex matching for this type of work, and I've seldom managed to. (When I write &quot;this type of work&quot;, I am of course excluding regular Apache-like log files and other files that are designed to be easy and fast to parse. I'm talking about less thoughtfully designed file formats, such as the one you described.) <BR>

    <BR>

    You could roll your own C extension, but that would be hard to justify: the hardware needed to run the slower, more general Perl regex costs less than your time. <BR>

    <BR>

    I don't know how you timed the split function, but I suspect it was much faster because its regex was much simpler. If you try split with a more complicated regex, I'm sure you'll find that it isn't so fast anymore. <BR>
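    <BR>
    One way to time the alternatives is cmpthese from Perl's core Benchmark module. This is a sketch on a made-up 20-field line; the sub names are hypothetical, and no timings are claimed here, since they depend on the machine, the Perl version and the real data. The third variant splits with a lookbehind/lookahead pattern, as an example of split with a more complicated regex.<BR>

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test line: 20 quoted fields ("field 1" "field 2" ...).
# The real data was not shown, so timings here say nothing about it.
my $line = join ' ', map { qq{"field $_"} } 1 .. 20;

# Pull out every quoted field with a single m//g.
sub by_regex {
    my @f = $_[0] =~ /"[^"]*"/g;
    return @f;
}

# Split on quote-space, then reattach the quote in Perl code.
sub by_split_quote {
    my @f = split /" /, $_[0];
    $_ .= '"' for @f[0 .. $#f - 1];
    return @f;
}

# Split on a space that sits between two quotes; the lookbehind and
# lookahead keep the quotes, at the cost of a more complicated pattern.
sub by_split_lookaround {
    my @f = split /(?<=") (?=")/, $_[0];
    return @f;
}

# Run each sub 20,000 times and print a relative-speed table.
cmpthese(20_000, {
    regex      => sub { my @f = by_regex($line) },
    split      => sub { my @f = by_split_quote($line) },
    lookaround => sub { my @f = by_split_lookaround($line) },
});
```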

    <BR>

    You do need the /g modifier at the end, of course, so that the match returns all of the fields rather than stopping at the first one.<BR>
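    <BR>
    A small illustration of why the /g matters, using a hypothetical sample line: in list context, a match without /g and without capture groups returns only (1) to signal success, while the same match with /g returns every matching field.<BR>

```perl
use strict;
use warnings;

# Hypothetical record: double-quoted fields separated by single spaces.
my $line = q{"abc" "def ghi" "jkl"};

# Without /g and without capture groups, a list-context match only
# reports success: it returns (1), not the matched text.
my @without_g = $line =~ /"[^"]*"/;

# With /g, list context returns every non-overlapping match.
my @with_g = $line =~ /"[^"]*"/g;

print scalar(@without_g), "\n";    # 1
print scalar(@with_g), "\n";       # 3
```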

    <BR>

    --Donnie<BR>

    <BR>

</BLOCKQUOTE>

Thanks, Donnie.&nbsp; I can live with that.&nbsp; :)<BR>

<BR>

David

<BR>

<BR>

</BODY>

</HTML>