[Toulouse-pm] OSCON, tutorials
Michel Rodriguez
mirod at xmltwig.com
Tue Jul 8 15:49:07 CDT 2003
Hi,
OK, as promised (and not without difficulty: the wireless network is
down and the conference is seriously short of power outlets in the
rooms, so in general my laptop doesn't last until the end of the
sessions and I type up the end from my notes, grrr... as for YAPC,
don't worry guys, it can't possibly be worse!)
Making Programs Faster
MJD
Of course this morning the tutorial I had signed up for,
Advanced DBI, is cancelled, and my next choice, XSLT, is
sold out... I decide to sit in on MJD's "Making Programs Faster".
MJD is always an entertaining speaker. Plus I am terrible at
optimizing, the archetypal premature optimizer one might say.
MJD shows the Schwartzian transform.
He then shows a simple case (from a post on the newsgroup)
where it really doesn't make much sense to use it: "lc" is
fast, you can use it directly in a sort, and the overhead of
the ST is not worth it in this case; the ST version is slower
than the naive one.
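Roughly, the comparison was between something like this (my
reconstruction of the idea, not his actual slides):

    # naive version: just call lc() inside the comparison
    my @sorted = sort { lc($a) cmp lc($b) } @words;

    # Schwartzian transform: compute lc() once per element,
    # sort on the cached key, then throw the keys away
    my @sorted_st = map  { $_->[1] }
                    sort { $a->[0] cmp $b->[0] }
                    map  { [ lc($_), $_ ] } @words;

    # for something as cheap as lc(), building all those anonymous
    # arrays costs more than the repeated lc() calls it saves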
Now he gives us some examples of I/O bound, CPU bound and
memory bound code and adds some hints about how to optimize
them: parallelize I/O bound code (or switch from CGI to
mod_perl), optimize the code for CPU bound code, try reducing
the memory used or buy more memory for memory bound code. The
important thing is to figure out which factor impacts your
code the most.
Tools for optimizing
Timing
The shell "time" is the easiest way to quickly figure out how
much usr/CPU time a process takes. There are often 2 versions
of "time" on a system: the built-in and one in
"/usr/bin/time" or equivalent, with different output.
In Perl "time()" can be used, and "use Time::Hires;" to get
"time()" to work with a resolution better than 1s.
The usual way he writes benchmarks: an empty loop, then the
various options, so he has a base value to compare the
different options to.
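Something along these lines (my sketch of the approach, not
his exact code):

    use Time::HiRes qw( time );

    my $n = 100_000;

    # baseline: an empty loop, so we know what the loop itself costs
    my $t0 = time;
    for ( 1 .. $n ) { }
    my $base = time - $t0;

    # option 1: the code we actually want to measure
    $t0 = time;
    for ( 1 .. $n ) { my $x = lc "Hello, World" }
    my $opt1 = time - $t0;

    printf "empty loop: %.4fs  option 1: %.4fs  difference: %.4fs\n",
           $base, $opt1, $opt1 - $base;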
He does not use "Benchmark.pm": the results are different
from those of his simple benchmarks for some unknown reason,
they are not consistent if he repeats the test, and he even
gets results where the test takes... a negative time to
complete! Plus the machinery for a benchmark should be as
simple as possible, and you should be able to understand it.
Profiling
Profile your code before optimizing it, or you will probably
optimize the wrong functions.
Use "Devel::DProf":
perl -d:DProf toto.pl <args> > /dev/null
dprofpp
Send the output to "/dev/null" so that output-related
problems don't interfere.
This will give you a list of functions, and how much time the
process spent in each one.
The usual rule here is 90-10: 10% of the code accounts for
90% of the run time. So focus on the 10%!
"Devel::Smallprof" gives you an even more detailed report,
line by line.
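If I remember correctly it is run the same way, e.g.:

    perl -d:SmallProf toto.pl    # per-line counts and times end up in smallprof.out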
Examples
Generic advice
A very important piece of advice: TEST THE OPTIMIZED VERSION!
Make sure you don't break things in the process: *you
probably don't live in a world where you want to get the
wrong answer as fast as possible*.
Also think about the big picture: is the decrease in
maintainability worth it? Always remember that hardware is
cheap. We still live under the impression that hardware is
expensive and precious. This is no longer true.
MJD has a cute cat! And nice jokes about him that he somehow
manages to tie to the subject at hand.
Perception is important too: sometimes removing warning
messages that say "this will take a long time" actually makes
the users happier and stops them from complaining, as in fact
they don't notice the "long time"!
*Optimize for the common case*.
Speeding up a mailbox analyzer
- run and save the output (so we can check the optimized
version)
- profile
- look at the most used functions and see what can be done
In this case he found out that "Mail::Header" was taking a
lot of time. So he replaced it with custom code. This is
risky, as mail protocols (like most protocols IMHO) are hard
to get right.
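Something like this, a deliberately naive sketch of the idea,
certainly not his actual code, and it ignores most of the dark
corners of the mail format:

    my %header;
    my $field;
    while ( my $line = <> ) {                   # message on STDIN or in a file
        last if $line =~ /^\s*$/;               # blank line: end of the headers
        if ( $line =~ /^([\w-]+):\s*(.*)/ ) {   # "Field: value"
            $field = lc $1;
            $header{$field} = $2;
        }
        elsif ( defined $field && $line =~ /^\s+(\S.*)/ ) {
            $header{$field} .= " $1";           # folded (continuation) line
        }
    }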
As it turned out, in this case, when he checked, the output
of the program had changed... for the better! There was a
minor bug in "Mail::Header" that the simpler optimized code
fixed!
So the optimized version is better, and 81% faster!
Yipee!
A look at the profiler results, followed by a detailed
economic analysis, shows that he can stop here. That's
actually a very interesting analysis: if he spends 20 minutes
optimizing the next function, and actually gets it to improve
its speed by 20%, he will need to run the code 25 million
times to get a positive return on investment! I should do
this more often, that would save me a lot of time.
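The back-of-the-envelope calculation is simple enough to
script; the 240 microseconds per call below is my made-up
figure, picked so that the numbers come out like his:

    my $invested = 20 * 60;           # 20 minutes of programmer time, in seconds
    my $per_call = 240e-6;            # assumed current cost of the function, per call
    my $saved    = $per_call * 0.20;  # a 20% speedup saves this much per call
    printf "break-even after %.0f calls\n", $invested / $saved;  # 25 million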
Speeding up "pod2man"
This is useful because it is run quite often, any time you
install a new module for example.
- run and save the output (so we can check the optimized
version)
- profile
Here he finds out that he can optimize a POD tokenizer by
making it faster in the common case ("I<text>") at the
expense of the less common case "I<< text >>" (man, is it a
pain to type POD examples... in POD, as I am doing right now!
I even have to use the "Z<>" escape for the first time!)
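I did not copy the real code, but the idea is the classic one
of checking the cheap, common pattern first; purely as an
illustration (this is not the actual pod2man tokenizer):

    my $text = 'B<bold> and I<< italic >>';      # sample input
    my ( $code, $content );

    # try the common single-bracket form first...
    if ( $text =~ /([A-Z])<([^<>]*)>/ ) {
        ( $code, $content ) = ( $1, $2 );        # here it matches B<bold>
    }
    # ...and only pay for the rarer multi-bracket form when needed
    elsif ( $text =~ /([A-Z])<<\s+(.*?)\s+>>/ ) {
        ( $code, $content ) = ( $1, $2 );
    }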
To get a better idea of what's going on he needs more
detailed output than "Devel::DProf" provides.
"Devel::SmallProf" gives too much output, so he now teaches
us how to write our own profiling module, using the hooks
available to "Devel::" modules: "@{"::_<toto.pl"}",
%DB::sub, "DB::DB()" and "caller()".
It is not that difficult actually!
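From my notes the skeleton is roughly this (an untested
reconstruction; the file name Devel/LineCount.pm and the
report format are mine):

    # Devel/LineCount.pm - count how many times each line is executed
    package Devel::LineCount;

    package DB;                 # the debugger hooks live in package DB
    our %count;

    sub DB {                    # perl calls DB::DB before each statement under -d
        my ( $pkg, $file, $line ) = caller;
        $count{"$file:$line"}++;
    }

    END {                       # dump the counts, busiest lines first
        for my $spot ( sort { $count{$b} <=> $count{$a} } keys %count ) {
            printf "%8d  %s\n", $count{$spot}, $spot;
        }
    }

    1;

    # run it with: perl -d:LineCount toto.pl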
CGI (SOAP) application
A real-life example: speeding up a CGI application that was
accessed in bursts (several hundred times per minute for a
while, then nothing for a long time). It received XML data,
parsed it, updated a DB, then signaled success or failure.
The solution: as success depended only on whether the XML
parsed or not, just parse the XML, return success or error,
and save the XML to a file. A batch process then reads the
files and updates the DB at its own pace. The client gets its
answer much faster, and the DB still gets updated.
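A sketch of the fast path; the spool directory and the use of
XML::Parser to check the XML are my guesses, not necessarily
what he used:

    use XML::Parser;

    read STDIN, my $xml, $ENV{CONTENT_LENGTH} || 0;   # raw XML from the POST body

    if ( eval { XML::Parser->new->parse($xml); 1 } ) {
        # well-formed: spool it for the batch job and answer right away
        my $file = "/var/spool/myapp/" . time . ".$$.xml";
        open my $fh, '>', $file or die "cannot write $file: $!";
        print {$fh} $xml;
        close $fh;
        print "Content-Type: text/plain\n\nsuccess\n";
    }
    else {
        print "Content-Type: text/plain\n\nerror\n";
    }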
Plus he got rid of a couple of modules that were being used.
He doesn't like modules that do very simple stuff... but
through an OO interface (here the object was a string and the
method used was 2 lines of code).
Blunders
Now some examples of failed attempts at optimization.
He starts with pseudo-hashes. That's too easy! ;--) It's
quite a well-known story.
He actually gives a detailed, and interesting, explanation of
how they work and why they end up slower than regular hashes.
Then an example from the newsgroup where someone replaced an
"eval "$string"" with an "eval { $code }". This made the code
_much_ faster! That would be because "eval { $code }" does
not actually eval the code in $code.
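The difference is easy to see:

    my $code = 'print "I ran\n"';

    eval $code;      # string eval: compiles and runs the Perl in $code,
                     # so this prints "I ran"
    eval { $code };  # block eval: just evaluates $code as an expression
                     # (a plain string in void context), runs nothing...
                     # hence "much faster"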
Beware of benchmarks! Check that the different versions
return the same result first, and then benchmark.
Then he shows how the old trick of pre-allocating arrays was
found not to speed up execution. A benchmark showed that it
actually slowed things down.
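The trick in question is the old idiom of presizing the array
before filling it:

    my @buffer;
    $#buffer = 9_999;    # extend the array to 10_000 slots up front
    # ... then fill @buffer ...
    # his benchmark showed this no longer buys anything, and can even
    # be slower than just letting perl grow the array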
Darn! I thought I had guessed why the benchmark was wrong but
no!
It's a classic: Perl (or the OS) already optimizes things,
uses caches, pre-allocates arrays based on previous calls to
a function... so your optimization might be redundant, and
thus counter-productive.
He finishes with a similar example from "Tie::File", where he
wrote a really fancy caching algorithm for lines in the
file... that turned out not to be used in practice.
The next example is from a thread on Perlmonks... with tons
of silly advice on how to optimize a (rather simple)
numerical problem ([id://134419]). His morals: don't
micro-optimize, and *There is plenty of crappy optimization
advice*.
He finishes with some general advice that boils down to
"THINK before you start optimizing!" (then think some more).
Basically optimizing is rarely worth it!
Conclusion
Overall a good tutorial stressing the dangers and potential
pitfalls of optimization. As with most good tutorials, I liked
the fact that it showed how to go about optimizing code, from
the non-optimized version to the final one, through analysis,
refinements, mistakes and knowing when to stop. I find that
this is the most important thing you can get from such a
class: seeing how the instructor's mind works.
Efficient SQL
by Greg Sabino Mullane
The tutorial will focus on PostgreSQL and how to make DB
applications faster (do you see a trend in the tutorials I
attend?)
PostgreSQL 7.4
*SQL is usually the weakest link in the chain*: when someone
(usually not you) comes and tells you that your application
is too slow, the SQL code is usually where you can optimize.
How to speed things up? 6 ways: hardware, OS, DBMS,
Application, DB design and Query tuning.
Hardware and OS
RAM is the most important, fast disks can be useful too.
DBMS
By default some of the settings for PostgreSQL are set
way too low (in order to run out-of-the-box on low-end
machines), "sort_mem" and "shared_buffers" for example, see
section 3.4.2 of the admin guide.
Application
Not that important unless there is a major flaw in the
code.
He advises keeping the data as objects in Perl, with the
SQL in a separate module, isolated from the main code (see
the sketch after this list). This way optimizing the SQL
code is easier.
Try to leave as much work as possible on the DB side: it
will be faster and will make it easier to enforce
constraints on the DB.
DB Design
He thinks that normalization is really important, and
doesn't necessarily hurt speed. *That's what a database
does: JOINs*. Column order can impact the speed of the DB.
Query Tuning
That's what the tutorial is all about!
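Here is the kind of layout I understood him to mean for the
Application point above, keeping the SQL out of the main code
(the module name and the query are mine):

    # MyApp/Queries.pm - all the SQL lives here, nothing else does
    package MyApp::Queries;
    use strict;

    my %sql = (
        user_by_login => 'SELECT id, name, email FROM users WHERE login = ?',
    );

    sub statement { return $sql{ $_[0] } }   # the main code only asks by name

    1;

    # in the main code:
    #   my $sth = $dbh->prepare( MyApp::Queries::statement('user_by_login') );
    #   $sth->execute($login);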
When thinking about optimizing, first figure out whether the
problem is that all queries are slow (in which case you'd
better look at the previous items) or whether some specific
queries are too slow, in which case you can start working on
them.
He now describes how a SQL query is parsed, optimized and
executed by the DBMS.
"EXPLAIN" is used to show how the query will be parsed and
executed (without running it), the number of estimated rows
returned by each step and the estimated cost of each one.
"EXPLAIN ANALYZE" also runs the query and provide the actual
time spent in each step, number of rows returned by each
step.
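From Perl the same thing can be done through DBI, for example
(the connection parameters and the orders table are made up):

    use DBI;

    my $dbh = DBI->connect( 'dbi:Pg:dbname=test', '', '', { RaiseError => 1 } );

    # EXPLAIN shows the plan without running the query,
    # EXPLAIN ANALYZE runs it and adds the real times and row counts
    my $plan = $dbh->selectcol_arrayref(
        'EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42'
    );
    print "$_\n" for @$plan;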
At this point the tutorial became quite boring for a while as
he listed the various operators and their cost.
Indexes
Now we see how "ANALYZE" generates statistical data on the db
(in the "pg_stat" table). Frm there we can figure out which
columns should be indexed: anytime a slow operation
(typically a sequence scan) shows up we can add an index. The
results are as spectacular, as expected: from 27s to 5.5ms in
the example shown.
PostgreSQL can build indexes on functions, e.g. on
"lc(column)", to avoid having to recompute the function for
each row.
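In PostgreSQL the function is spelled "lower()"; assuming an
already connected $dbh and an invented users table, that
would look like:

    # the index stores lower(email), so queries written against
    # lower(email) can use it instead of recomputing it per row
    $dbh->do('CREATE INDEX users_email_lower_idx ON users ( lower(email) )');

    my ($id) = $dbh->selectrow_array(
        'SELECT id FROM users WHERE lower(email) = ?', undef, lc $address
    );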
Using a "WHERE" clause to build partial indexes that can be a
lot more efficient than full indexes. For exemple "NULL"
values can be excluded from the index. The results of
"ANALYZE" are important to figure out if it is worth to
narrow down the index.
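A partial index is just an index with its own "WHERE" clause,
e.g. (again with an invented table and $dbh):

    # only the not-yet-shipped orders end up in the index, so it
    # stays small and the planner uses it for the "to ship" queries
    $dbh->do(q{
        CREATE INDEX orders_unshipped_idx
                  ON orders ( customer_id )
               WHERE shipped_at IS NULL
    });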
"CLUSTER" can also be used to by moving the data physically
on the disks, which increases the access speed.
Miscellaneous tidbits
* prior to PostgreSQL 7.4, "EXISTS" should be favoured
over "IN",
* when doing "UPDATE"s, use "VACUUM", as "UPDATE" does a
"DELETE" and then an "INSERT",
* do not use "max(toto)" or "min(tata)"; use "ORDER BY toto
DESC LIMIT 1" and "ORDER BY tata ASC LIMIT 1"
respectively,
* the "ctid", which is the physical address of a record (a
page number and an index within the page), is the quickest
way to access a record.
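The min/max rewrite looks odd but lets the planner walk an
index instead of scanning the whole table; with the column
names from the talk and an invented table t:

    # instead of: SELECT max(toto) FROM t
    my ($max) = $dbh->selectrow_array(
        'SELECT toto FROM t ORDER BY toto DESC LIMIT 1' );

    # instead of: SELECT min(tata) FROM t
    my ($min) = $dbh->selectrow_array(
        'SELECT tata FROM t ORDER BY tata ASC LIMIT 1' );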
Conclusion
Overall the tutorial was very deep and thorough, showing
the process used to analyze and optimize SQL queries with
PostgreSQL. It was also thoroughly boring at times: I
found it hard to get excited by the fight to shave a
couple of milliseconds off a query (how to get from
6.07ms to 0.76ms in 5 painful^Heasy steps ;--)
There you go!
Michel Rodriguez
Perl & XML
http://www.xmltwig.com