Tuesday, April 03, 2012

Performance regression in perl with precompiled regexps

Some time ago I wrote about preparing regexps in advance for cases where you cannot use the /o modifier. For example, you have:

    my $re = qr{\d+};

And often you need to match:

    $str =~ /^$re$/;

You cannot use the /o modifier if $re may change. In that post I described precompiling and showed that it works as fast as a regexp with the /o modifier:

    my $re = qr{...};
    my $re_prepared = qr/^$re$/;
    $str =~ $re_prepared;
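
To recap why /o can't be used here: with /o perl compiles the interpolated pattern only once, so later values of $re are silently ignored. A toy loop (the values are made up just for illustration) shows it:

    for my $re ( qr{\d+}, qr{[a-z]+} ) {
        # with /o the interpolated pattern is compiled on the first pass
        # only, so the second value of $re never takes effect
        my $with_o    = "abc" =~ /^$re$/o ? "match" : "no match";
        my $without_o = "abc" =~ /^$re$/  ? "match" : "no match";
        print "$re: /o -> $with_o, no /o -> $without_o\n";
    }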

Imagine my surprise when I ran the same script with perl 5.14.1: every other method is at least two times faster than the prepared one, even ... =~ /^$re$/!

Results for 5.8.9:

    $ perl5.8.9 dynamic_regexp_with_o_effect.pl
                         Rate      no_o         o no_o_pre no_o_pre_braces    static
    no_o            1581903/s        --      -10%     -25%            -32%      -50%
    o               1764430/s       12%        --     -16%            -24%      -44%
    no_o_pre        2104366/s       33%       19%       --             -9%      -34%
    no_o_pre_braces 2323549/s       47%       32%      10%              --      -27%
    static          3174754/s      101%       80%      51%             37%        --

Results for 5.10.0 and newer:

    $ perl5.10.0 dynamic_regexp_with_o_effect.pl
                         Rate no_o_pre no_o_pre_braces      no_o         o    static
    no_o_pre         873813/s       --             -0%      -52%      -68%      -75%
    no_o_pre_braces  877714/s       0%              --      -52%      -68%      -75%
    no_o            1816840/s     108%            107%        --      -34%      -49%
    o               2759410/s     216%            214%       52%        --      -22%
    static          3542486/s     305%            304%       95%       28%        --
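
The post doesn't show dynamic_regexp_with_o_effect.pl itself; judging by the variant names and by the benchmark in the 2009 post below, it presumably looks roughly like this (the no_o_pre_braces case is only my guess at what was measured):

    use Benchmark qw(cmpthese);

    my $re     = qr/\d+/;
    my $re_pre = qr/^\d+$/;

    cmpthese( -10, {
        static          => sub { return "123456789" =~ /^\d+$/ },
        o               => sub { return "123456789" =~ /^$re$/o },
        no_o            => sub { return "123456789" =~ /^$re$/ },
        no_o_pre        => sub { return "123456789" =~ $re_pre },
        # guess: the same precompiled regexp matched via m{...}
        no_o_pre_braces => sub { return "123456789" =~ m{$re_pre} },
    } );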

Wednesday, March 28, 2012

Impact of index only scans in mysql

Introduction

An index-only scan is an access method where information is retrieved using only the index tree, without an additional seek to read the actual row.

It's a well-known concept: it is implemented in MySQL and Oracle, and PostgreSQL has patches for 9.2.

So I decided to see for myself how big the impact can be.

Goal

We have a big table 'CGM' with columns g, m and disabled, and a Users table with id and name.

We want to find all users who are members of group X:

    SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m
    WHERE CGM.g = X

Sometimes we add a 'disabled = 0' condition to the query.

I wrote a perl script that performs the profiling.
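
The script itself isn't included in the post. A minimal sketch of this kind of profiling with DBI and Benchmark, assuming it creates one index layout at a time and compares the combined timings, might look like this (the connection settings, the index name bench_idx and the group id 42 are made up):

    use strict;
    use warnings;
    use DBI;
    use Benchmark qw(timethese cmpthese);

    # assumed connection settings; adjust to your setup
    my $dbh = DBI->connect( 'dbi:mysql:database=test', 'user', 'password',
                            { RaiseError => 1 } );

    my %results;
    # benchmark one index layout at a time, then compare all of them
    for my $columns ( 'g', 'g,disabled', 'g,m', 'g,m,disabled' ) {
        $dbh->do("ALTER TABLE CGM ADD INDEX bench_idx ($columns)");
        my $sth = $dbh->prepare(
            'SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m WHERE CGM.g = ?'
        );
        my $r = timethese( -10, {
            $columns => sub {
                $sth->execute(42);    # arbitrary group id
                $sth->fetchall_arrayref;
            },
        } );
        %results = ( %results, %$r );
        $dbh->do('ALTER TABLE CGM DROP INDEX bench_idx');
    }
    cmpthese( \%results );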

First results

First results looked like this:

                    Rate            g   g,disabled          g,m g,m,disabled
    g            13307/s           --          -1%          -3%          -3%
    g,disabled   13474/s           1%           --          -2%          -2%
    g,m          13714/s           3%           2%           --          -0%
    g,m,disabled 13714/s           3%           2%          -0%           --

Not impressive, but it's clear that there is no I/O at all and the script is testing a CPU-bound task.

Impressive results

When I/O comes into play, the situation is completely different. To achieve this I lowered the InnoDB buffer pool size to its minimum (1M).

                    Rate            g   g,disabled g,m,disabled          g,m
    g             5531/s           --          -0%         -57%         -58%
    g,disabled    5547/s           0%           --         -57%         -57%
    g,m,disabled 12799/s         131%         131%           --          -2%
    g,m          13033/s         136%         135%           2%           --

In the above test we don't check the disabled column, so two of the indexes cover the query. The next test adds a condition on the disabled column.

                    Rate            g          g,m   g,disabled g,m,disabled
    g             5531/s           --          -0%          -5%         -57%
    g,m           5547/s           0%           --          -4%         -57%
    g,disabled    5802/s           5%           5%           --         -55%
    g,m,disabled 12800/s         131%         131%         121%           --

Memory considerations

To cover all possible access patterns on the CGM table with indexes we need two of them: (g, m, disabled) and (m, g, disabled). The latter is required when the query starts from the Users table and fetches groups. This means you need additional space.

Let's assume we use 4-byte integers for all three columns in CGM, so the row size is 12 bytes. The following situations are possible:

From the list, one can estimate that covering indexes need 1.8 times more memory than minimal indexing. However, MySQL no longer needs to access the table, so 36MB turns into 24MB, which is only 20% more than the minimal setup.

Conclusions

As you can see, covering indexes improve performance when the data doesn't fit into memory; otherwise the benefit is close to zero. Longer indexes need more memory, but this additional footprint can be kept down.

My recommendations: don't bother as long as the data fits into memory. Create only the required indexes at this point; smaller indexes give you more room to grow before you hit the memory limit. If you decide to use covering indexes, then cover all possible access patterns, or you will still need additional space in the buffer pool for table data.

That's it.

Monday, April 05, 2010

A challenge for an XS hacker

Recently we at Best Practical were discussing the performance cost of callbacks in RT. Callbacks have proved to be a very good way to extend the web UI: a component calls a function, the function checks for customizations registered for that particular place and calls them, so custom code can either change arguments or inject something into the result stream.
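
RT's actual callback machinery lives in its Mason components; as a rough, simplified sketch of the general pattern (the names here are made up, not RT's API):

    package Callbacks;
    use strict;
    use warnings;

    my %registry;    # place name => list of registered callbacks

    sub register {
        my ( $name, $code ) = @_;
        push @{ $registry{$name} ||= [] }, $code;
    }

    sub call {
        my ( $name, @args ) = @_;
        my $out = '';
        # every extension registered for this place gets a chance to
        # tweak the arguments or append to the result stream
        for my $cb ( @{ $registry{$name} || [] } ) {
            $out .= $cb->(@args) // '';
        }
        return $out;
    }

    1;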

The number of places where callbacks are called keeps growing, and an idea came to me: what if we could tell the caller to stop calling us? I believe it's possible to do.

For example, a function gets called and at some point we realize that, for the life of the process, we are always going to return early or return the same value. In this case we call a function that finds the caller's optree and rebuilds it, injecting a constant. So it's runtime constant-folding of the function's result at a particular call site.
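
Doing this properly needs XS to find and patch the caller's optree. A crude pure-Perl approximation of the idea, just to show what "stop calling us" means (using the toy Callbacks package above, not what the post actually proposes), is to redefine the dispatcher once its results are known to be constant:

    # once every callback result is known to be constant for the life
    # of the process, swap the dispatcher for a lookup of cached values;
    # unlike optree patching this works per sub, not per call site
    # ('MyPlace' and 'OtherPlace' are invented names)
    my %cached = map { $_ => Callbacks::call($_) } qw(MyPlace OtherPlace);
    {
        no warnings 'redefine';
        *Callbacks::call = sub { $cached{ $_[0] } };
    }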

I hope experienced XS hackers may be interested, as there are plenty of potential users for this on CPAN. Class::Trigger comes to mind.

Anyone?

Sunday, October 25, 2009

Faster composite regular expressions

Regular expressions are a powerful tool, but they quickly become too long to be readable. Some people use the //x modifier. I prefer to split them into many smaller regular expressions, for example:

    my $re_num = qr/.../;
    my $re_quoted = qr/.../;
    my $re_value = qr/$re_num|$re_quoted/;

It works just fine, and usually I compile them in package scope beforehand and then use them in functions with //o:

    my $re_foo = ...;
    sub foo {
        ...
        if ( /^$re_foo/o ) {
            ...
        }
        ...
    }

It doesn't matter what exactly you do; the question is how much speed you lose if you need these REs to be dynamic. I decided to write a simple test to understand which variant is faster:

    use Benchmark qw(cmpthese);
    my $count = -60;

    my $re = qr/\d+/;
    my $re_pre = qr/^\d+$/;

    cmpthese($count, {
        static => sub { return "123456789" =~ /^\d+$/ },
        o => sub { return "123456789" =~ /^$re$/o },
        no_o => sub { return "123456789" =~ /^$re$/ },
        no_o_pre => sub { return "123456789" =~ $re_pre },
    });

    cmpthese($count, {
        static => sub { return "123456w789" =~ /^\d+$/ },
        o => sub { return "123456w789" =~ /^$re$/o },
        no_o => sub { return "123456w789" =~ /^$re$/ },
        no_o_pre => sub { return "123456w789" =~ $re_pre },
    });

It just compares four variants: a plain old static regexp; a regexp in a variable with some additions around it; the same with //o; and finally another RE with all the additions compiled in, matched directly without any delimiters. Here are the results:

                  Rate     no_o no_o_pre        o   static
    no_o      851115/s       --     -30%     -41%     -47%
    no_o_pre 1222940/s      44%       --     -15%     -24%
    o        1443941/s      70%      18%       --     -11%
    static   1613818/s      90%      32%      12%       --
                  Rate     no_o no_o_pre        o   static
    no_o      923012/s       --     -33%     -37%     -46%
    no_o_pre 1376153/s      49%       --      -6%     -19%
    o        1471770/s      59%       7%       --     -14%
    static   1705241/s      85%      24%      16%       --

The results are consistent with my hopes. I'll try to explain them, though I can't say I know everything about this. In the 'no_o' case perl has to compile the regular expression each time the code runs; the time spent in compilation is enough to give up about 40% to the next variant. 'o' and 'no_o_pre' are very close, and I expected something like that: in the 'o' case perl compiles once at runtime and then checks the cache on each match, while in 'no_o_pre' perl has to check each time that the thing on the right-hand side is an RE object. It's probably possible to make the //o case very close to static by rebuilding the optree, however that would disappoint some deparse modules. The static case is the fastest, which is understandable.

Should you use this? Yes. All the time? No. For example, say you write a parser for Apache logs, not a simple one, but a parser that takes the log format string and builds a regular expression for that particular format. In this case I would think twice about the design and the way REs are used.
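
To make that example concrete, here is a rough sketch of what such a dynamic, composite RE might look like (the token table is trimmed, the format string is the common log format, and none of this code is from the original post; named captures need perl 5.10+):

    # map a few Apache LogFormat tokens to small subpatterns
    my %re_token = (
        '%h'  => qr/(?<host>\S+)/,
        '%l'  => qr/(?<logname>\S+)/,
        '%u'  => qr/(?<user>\S+)/,
        '%t'  => qr/\[(?<time>[^\]]+)\]/,
        '%r'  => qr/(?<request>[^"]*)/,
        '%>s' => qr/(?<status>\d{3})/,
        '%b'  => qr/(?<bytes>\d+|-)/,
    );

    sub build_log_re {
        my ($format) = @_;
        my $re = join '', map {
            exists $re_token{$_} ? $re_token{$_} : quotemeta $_
        } split /(%>?\w)/, $format;
        return qr/^$re$/;
    }

    # the common log format; the RE is built at runtime, so //o is off
    # the table and we match against the precompiled object instead
    my $re_line = build_log_re('%h %l %u %t "%r" %>s %b');
    while ( my $line = <STDIN> ) {
        next unless $line =~ $re_line;
        print "$+{host} $+{status}\n";
    }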