[Purdue-pm] Efficiency, was: my 2015-08-12 challenge problem solution

Mon Aug 10 05:20:54 PDT 2015

Mark.  

Thanks for pointing out the subtleties of using split() without the -1.  For my example I was assuming actual data in the 3rd field but certainly in non-example real-life code you would want to take care of the edge case either before or during the split().

Your rewrite of #1 into the simpler #4 and of #2 into the one-line #5 are both valid constructs.  However, they are not radically different in function from the original constructs.  What I was trying to get at, and still am, is that programmers should think about what their code does instead of just writing any old set of statements and seeing what works.

In #1 — reverse grep split (which is code I have seen in a real working program) — the program has to (a) make an array, (b) work on each element in the array, (c) make another array, (d) extract the scalar.  

In #2 — last index of split — the program has to (a) make an array, (b) extract the scalar.

In #3 — position of element then substr — the program has to (a) walk through the original string, (b) extract the scalar.

Which construct is faster?  If we know what CPU effort it takes to create arrays or walk through strings (and, as programmers, we should), do we even have to spend a second thinking about the question?
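
For anyone who would rather measure than reason it out, here is a minimal sketch of how the three constructs could be timed with the core Benchmark module.  The %construct hash and its key names are made up for this illustration and are not part of the original examples, and the numbers will vary with the Perl version, the machine, and the length of $data, so no results are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $data = 'aaa:b:cccc';

# Hypothetical wrapper subs; each returns the last colon-separated field.
my %construct = (
    # #1: make an array, filter it, reverse it, take the first element
    one   => sub { my ($one) = reverse grep { $_ } split ':', $data; $one },
    # #2: make an array, take the last element
    two   => sub { my @twoarr = split ':', $data; $twoarr[-1] },
    # #3: find the last ':' in the string, take everything after it
    three => sub { my $pos = rindex $data, ':'; substr($data, $pos + 1) },
);

# Sanity check: all three must agree on this input before timing them.
for my $name (sort keys %construct) {
    my $got = $construct{$name}->();
    die "$name returned '$got'\n" unless $got eq 'cccc';
}

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative differences.
cmpthese(-1, \%construct);
```

Note that this input has real data in the third field; with an empty or "0" third field, #1 and #2 return something different from #3 and the comparison is no longer like for like.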

> As far as optimization goes, "first get it working, then make it
> faster---if needed".

I still think that statement is a crutch for sloppy programming.  Sort of akin to “code first, comment later.”  Yeah, it can be done, but it isn’t good practice.  IMO it takes no more effort to think about the underlying data structures and data manipulation, and to choose the path of close to minimal (not least) CPU time, than to not think about what the computer is doing.  This thought process should be second nature to any seasoned programmer, just as adding explanatory comments to one’s code (especially for regexps, IMO) should be second nature as one is programming.

Certainly one wants to avoid optimization until the code is complete, just as one should avoid full documentation (user guide, etc.) until the code is complete.  Doing either early will just be a waste of time.

People keep quoting Knuth’s statement “… premature optimization is the root of all evil…” without quoting the first part of that statement: “… we should forget about small efficiencies, say about 97% of the time …”.

The keyword is “small”.   Knuth does not give us a license to be sloppy.

--
Rick Westerman
westerman at purdue.edu

> On Aug 9, 2015, at 8:54 PM, Mark Senn <mark at ecn.purdue.edu> wrote:
> 
> Rick Westerman <westerman at purdue.edu> suggested:
> |  my $data = 'aaa:b:cccc' ;
> |  
> |  my ($one) = reverse grep { $_ } split ':', $data ;
> |  
> |  my @twoarr = split ':', $data ;
> |  my $two = $twoarr[-1] ;
> |  
> |  my $pos   = rindex $data , ':' ;
> |  my $three = substr($data, $pos + 1) ;
> 
> 
> EXECUTIVE SUMMARY
> 
> Use the code for $three.
> The split function is subtle and it's easy to make a mistake.
> As far as optimization goes, "first get it working, then make it
> faster---if needed".
> 
> 
> DETAILS
> 
> I am assuming that you want all characters after the second ":" in $data.
> Call this the third field.
> 
> The code for $one doesn't work if the third field is "" or "0".
> 
> The code for $two doesn't work if the third field is "" unless
> a split with -1 is done.
> 
> I added the code for $four, which is simpler than $one,
> and works if the third field is "" or "0", if a split with -1 is done.
> 
> I added the code for $five which I find more understandable than $two---it
> must do a split with -1.
> 
> 
> RUN THIS CODE AND STUDY THE OUTPUT
> 
> for my $data ('aaa:b:', 'aaa:b:0', 'aaa:b:cccc')
> {
>    print "data          ($data)\n";
> 
>    my ($one) = reverse grep { $_ } split ':', $data ;
>    my @twoarr = split ':', $data ;
>    my $two = $twoarr[-1] ;
>    my $pos   = rindex $data , ':' ;
>    my $three = substr($data, $pos + 1) ;
>    my ($four) = reverse split ':', $data ;
>    my $five = (split ':', $data)[-1];
>    print "split         ($one)  ($two)  ($three)  ($four)  ($five)\n";
> 
>    ($one) = reverse grep { $_ } split ':', $data, -1 ;
>    @twoarr = split ':', $data, -1 ;
>    $two = $twoarr[-1] ;
>    $pos   = rindex $data , ':' ;
>    $three = substr($data, $pos + 1) ;
>    ($four) = reverse split ':', $data, -1 ;
>    $five = (split ':', $data, -1)[-1];
>    print "split with -1 ($one)  ($two)  ($three)  ($four)  ($five)\n";
> 
>    print "----------------------------------------\n";
> }
> 
> # -mark