[Purdue-pm] Efficiency, was: my 2015-08-12 challenge problem solution

Mark Senn mark at ecn.purdue.edu
Mon Aug 10 08:35:21 PDT 2015

Rick Westerman <westerman at purdue.edu> wrote on 2015-08-10 at 08:20:54 -04:
> Thanks for pointing out the subtleties of using split() without the -1.
> For my example I was assuming actual data in the 3rd field but certainly
> in non-example real-life code you would want to take care of the edge
> case either before or during the split().

"" or "0" can be actual data or introduced unexpectedly on projects I
work on.  I learned about the weird but documented behaviour of split
when debugging code if I remember right.

> Your rewrite of #1 into the more simple #4 and the rewrite of #2 into
> the one line #5 are both valid constructs.  However they are not
> radically different in function from the original construct.  What I was
trying to get at, and still am, is that programmers should think about what
> their code does instead of just writing any old set of statements and
> seeing what works.

#1: my ($one) = reverse grep { $_ } split ':', $data ;

#4: my ($four) = reverse split ':', $data ;

From the grep function documentation
at http://perldoc.perl.org/functions/grep.html
    Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_
    to each element) and returns the list value consisting of those elements
    for which the expression evaluated to true.

The grep in #1 is gratuitous and just takes extra time if all elements
in $data evaluate to true.  If $data is "a:b:0" then $one is set to "b",
which would be wrong for software I write.  I don't put fancy logic
in colon-separated lists---too complicated.
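To make that concrete, here's what #1 and #4 do with "a:b:0":

```perl
use strict;
use warnings;

my $data = "a:b:0";

# #1: grep { $_ } discards any field that is false ("" or "0"),
# so the trailing "0" is silently filtered out.
my ($one) = reverse grep { $_ } split ':', $data;

# #4: no grep, so the actual last field survives.
my ($four) = reverse split ':', $data;

print "$one\n";    # b
print "$four\n";   # 0
```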

My guess is #4 runs faster because grep is not run.  But there is no way
to tell for sure without testing it.

#4 works better than #1, is simpler, has 2/3 the number of function
calls, and probably runs faster---I call that radically different in
function.  But like I said in my previous message, I liked your #3 the
best.  It is clear to other humans what you're trying to do and doesn't
depend on weird details about how split works.
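(#3 itself isn't quoted in this message, but as I understand the
description---find the position of the last colon, then substr---a sketch
of it might look like this.  The variable names here are mine, not Rick's.)

```perl
use strict;
use warnings;

my $data = "a:b:c";

# Find the position of the last colon; rindex returns -1 if none.
my $pos = rindex $data, ':';

# Everything after the last colon, or the whole string if no colon.
my $three = $pos >= 0 ? substr($data, $pos + 1) : $data;

print "$three\n";   # c
```

Note that on "a:b:" this yields the empty string, which agrees with
what split with LIMIT -1 would give for the last field.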

> In #1 — reverse grep split (which is code I have seen in a real working
> program) — the program has to (a) make an array, (b) work on each
> element in the array, (c) make another array, (d) extract the scalar.

In #4 - reverse split - the problematic and relatively time-consuming
grep (see above) doesn't have to run at all.

> In #2 — last index of split — the program has to (a) make an array, (b)
> extract the scalar.
> In #3 — position of element then substr — the program has to (a) walk
> through the original string, (b) extract the scalar.
> Which construct is faster?  If we know what CPU-effort it takes to
> create arrays or walk through strings (and, as programmers, we should)
> do we even have to spend a second thinking about the question?

There is no way to be sure what will run faster without measuring it.
Modern software can do lots of weird stuff internally.  I once read
about a computer manufacturer---I forget which one---that optimized
programs that didn't do any I/O down to a no-op to run benchmarks faster.
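The core Benchmark module makes measuring easy.  A quick sketch comparing
#1 and #4 (the numbers will vary by machine and perl version, so I won't
guess at them here):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Measure rather than guess: compare the two constructs on the
# same data.  cmpthese(-1, ...) runs each sub for about one CPU
# second and prints a rate-comparison table.
my $data = join ':', ('x') x 10;

cmpthese(-1, {
    grep_reverse => sub { my ($v) = reverse grep { $_ } split ':', $data },
    reverse_only => sub { my ($v) = reverse split ':', $data },
});
```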

But, when writing each statement I try to make a good guess for what
will be understandable to humans and also run well.  And avoid code that
is overly complex.  And ask myself if each step of the code is needed so
I don't do a gratuitous grep if one is not needed.  I sometimes like
to string several functions together on a line to finish a thought
instead of artificially breaking the line into multiple lines.
Some people don't like that but it works well for me, especially in
Perl 6, which lets you put function calls in left-to-right order.

On a larger scale I try to make code understandable to a human and also
to the computer.  And as you suggest, think about order-N issues.
When I start a program, the first thing I think about is what the overall
structure of the program will be and whether it will satisfy the run-time
constraints it has.

> > As far as optimization goes, "first get it working, then make it
> > faster---if needed".
> I still think that statement is a crutch for sloppy programming.  Sort
> of akin to “code first, comment later.”  Yeah, it can be done but it
> isn’t good practice.  IMO it takes no more effort to think about the
> underlying data structures and data manipulation and choose the path of
> close to minimal (not least) CPU time than to not think about what the
> computer is doing.  This thought process should be second nature to any
> seasoned programmer just as adding explanatory comments to one's code —
> especially for regexps, IMO -- should be second nature as one is
> programming.
> Certainly one wants to avoid optimization until the code is complete.
> Just as one should avoid full documentation — user guide, etc. — until
> the code is complete.  Doing either early will just be a waste of time.

I like to do statement level optimization of code along with making the
code understandable to other people when I type it in.  It is faster to
do development that way.  If I'm writing code to demonstrate an idea
sometimes I don't optimize the code---it can make it harder to
see what I'm trying to demonstrate.

> People keep quoting Knuth’s statement “… premature optimization is the
> root of all evil…” without quoting the first part of that statement “…
> we should forget about small efficiencies, say about 97% of the time …”
> The keyword is “small”.   Knuth does not give us a license to be sloppy.

And here's some more context if anyone is interested.
From https://en.wikiquote.org/wiki/Donald_Knuth
    Programmers waste enormous amounts of time thinking about, or worrying
    about, the speed of noncritical parts of their programs, and these
    attempts at efficiency actually have a strong negative impact when
    debugging and maintenance are considered. We should forget about small
    efficiencies, say about 97% of the time: premature optimization is the
    root of all evil. Yet we should not pass up our opportunities in that
    critical 3%.
See the web page for the original citation.  This quote is from the 1970s
according to the citation.

(Now, at least for me, because I run so many different programs, the easiest
and cheapest way to get better software performance is to get a new computer.
The 32 GB, Intel Core i7 computer I use now has much better performance
than the 28 KB, Digital Equipment Corporation PDP-11V03 I used many years
ago.  In practice, software performance has been noticeably better each
time a computer I use has been upgraded.)

