Thursday, April 19, 2012

Compiling perl for debugging with gdb

If you have ever looked inside the perl source, you know the code uses tons of C preprocessor macros. GDB has a nice feature to expand them for you. However, you have to compile the program so it contains information about macros, and it took me a while to figure out how to do that.

Short answer

    ./Configure -des -Dprefix=/Users/ruz/perl/blead -Dusedevel \
        -Doptimize='-O3 -g3' -DEBUGGING \
        -Dusemymalloc -Dusethreads -Dinc_version_list=none

Explanation

I find INSTALL very misleading, so I filed a bug report; maybe it will be explained in the docs or the Configure script will be improved.

Anyway, here is the explanation. -DEBUGGING=-g3 doesn't work; it always passes plain -g. It's not clear from the doc that -Doptimize is a different thing and that you can put -g3 there. There is more to this, but in short: if -DDEBUGGING is set and a -g* flag is present in -Doptimize, then nothing is adjusted and you get what you want. However, setting -Doptimize overrides the default optimization flags, so you have to repeat them.
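
With such a build gdb can expand the macros for you. Here is a sketch of a session; the breakpoint and the expression are just illustrations, pick whatever you are debugging:

    # illustrative session; any breakpoint inside the interpreter will do
    $ gdb ./perl
    (gdb) break Perl_pp_match
    (gdb) run -Ilib -e '"42" =~ /\d+/'
    (gdb) macro expand SvPVX(sv)
    (gdb) info macro SvPVX

macro expand does a purely textual expansion, so it works for any macro visible at the current source location.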

See also

Dumping perl data structures in GDB

Monday, April 16, 2012

Improving development environment on MacOS Snow Leopard

I spent the weekend improving my terminal, bash and vim configs and learning some new tricks.

vim config

I was using astrails' dotvim solution. It worked well, but braid is terrible. So I played with git submodules to see if they could do better. They cannot, but during the investigation I spotted vundle.

It was a nice surprise to see that astrails switched to vundle. The dotvim project is much better now. Clone it, edit vundles.vim to get the plugins you need from github, vim-scripts or any other git repo, and run make install. In five minutes you'll have everything installed.
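
Entries in vundles.vim are one Bundle line per plugin; the plugins below are hypothetical examples, not what dotvim ships with:

    " examples only: a github user/repo and a vim-scripts name
    Bundle 'altercation/vim-colors-solarized'
    Bundle 'L9'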

shell

I don't use Apple's bash but the one from Gentoo Prefix. It was configured in Terminal.app to start on open, which works for the primary user, but `sudo su -` was still colorless and had no bash completion.

To change the shell on MacOS you have to use the chsh command, and the full path to the shell must be listed in /etc/shells.
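
Something along these lines, assuming the Gentoo Prefix bash lives under /opt/gentoo (adjust to your prefix):

    # the path is an assumption; point it at your actual bash
    echo /opt/gentoo/bin/bash | sudo tee -a /etc/shells
    chsh -s /opt/gentoo/bin/bash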

bash history

I'm used to one-liners up to 500 characters long :). I always have 5-15 shell sessions running for months. My goal was to make history longer, shared across sessions and less bloated with frequent commands.

I found a helpful discussion on StackOverflow, but decided to stick with the variant where the history file is updated after every command, while a session's history is not refreshed from the file:

    export HISTSIZE=10000                                 # commands kept in memory
    export HISTFILESIZE=10000                             # commands kept in the file
    export HISTCONTROL="ignorespace:erasedups"            # skip space-prefixed commands, drop duplicates
    export PROMPT_COMMAND="history -a; $PROMPT_COMMAND"   # append to the file after every command
    export HISTIGNORE="cd *:ls *:mplayer *"               # never record these

I'm going to use history -r when I need to pull some command from another session into the current one.

Colors in Terminal.app

While playing with the vim config I noticed that the solarized theme doesn't look like the public screenshots. Terminal.app on Snow Leopard supports only 8 colors; 256 colors are only supported on Lion. I switched to the iTerm2 application and set TERM to xterm-256color.
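
For example, in ~/.bashrc (assuming every session really runs in a 256-color terminal):

    # only do this if the terminal actually supports 256 colors
    export TERM=xterm-256color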

Besides colors, iTerm2 has a full-screen mode and other nice features I'm going to use.

Friday, April 13, 2012

Fixing broken macports after lib upgrade

Many package management systems upgrade a library and leave existing applications and libraries with a broken dependency on a library that doesn't exist anymore.

MacPorts is no exception. A recent upgrade from libpng12 to libpng14 resulted in many broken ports. When you try to install/upgrade/run some application you get:

    dyld: Library not loaded: /opt/local/lib/libpng12.0.dylib

This was the last straw. I'm tired of hunting for ports I should rebuild, so I wrote a script to find broken ports.
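
The script itself isn't reproduced here, but the idea is roughly the following (a sketch, not the real port-broken-libs):

    #!/bin/sh
    # A sketch of the idea, not the original script. For every file given,
    # list the libraries it links against (otool -L), keep references into
    # /opt/local that no longer exist on disk, and map the broken file back
    # to the port that installed it (port provides).
    for file in "$@"; do
        otool -L "$file" 2>/dev/null | awk 'NR > 1 { print $1 }' \
        | grep '^/opt/local/' \
        | while read -r dep; do
            [ -e "$dep" ] && continue
            port provides "$file" | sed -n 's/.* is provided by: //p'
        done
    done | sort -u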

Usage

It's a good head start for fixing broken ports. Here is what I did to fix my system:

    ./port-broken-libs /opt/local/lib/*.dylib

The above command lists ports that have broken dependencies. Then you can pipe it into port upgrade:

    ./port-broken-libs /opt/local/lib/*.dylib \
    | xargs sudo port -pnfuc upgrade --force

A little about port's command line options (they match the -pnfuc in the command above):

-p: don't stop the upgrade if some port fails. This may happen for various reasons.

-n: the required part. If you omit this option, then the upgrade of port X may try to upgrade port Y first, and that may fail.

-f: tells port to ignore previous attempts to build a port.

-u and -c: it's up to you to decide if you want things cleaned up afterwards and replaced ports uninstalled.

--force: tells port to force a rebuild even if there is no newer version available.

It took me a few rounds to rebuild most ports. A few ports were still broken and the packager couldn't fix them, but the list was much smaller. I just uninstalled them with force applied and installed them once again.

After fixing all files in the /opt/local/lib dir, I fixed /opt/local/{bin,sbin} as well.

That's it. Enjoy.

Thursday, April 05, 2012

Lovely week connecting LDAP client on RHEL to Windows AD over TLS/SSL

A client contacted us to enable SSL connections in Request Tracker's integration with LDAP. It was an epic battle.

We had

On one side, Windows Server 2008 R2 with Active Directory, plus an "add-on" installed that hardens security. On the other side, RHEL 6 with Apache running RT under mod_perl and serving it via HTTPS.

Problem

Everything works just fine as long as you don't use a secure connection to LDAP. If start_tls is enabled or you try to connect via ldaps://, then AD drops the connection after the first packet and the client throws a "connection reset by peer" error.

Investigation

From the perl modules we moved down to the `openssl s_client` command and got the same results. The AD admins failed to provide any helpful information, so we captured an ldp.exe session with wireshark. No surprise, Microsoft's client worked just fine. Comparing the handshake packets ldp and s_client send showed that MS's LDAP client announces the ECDHE family of cipher suites, while openssl doesn't. The openssl project implemented this family of ciphers based on elliptic curves, but for whatever reason RHEL ships openssl without them.
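
The test itself is a one-liner; the host name below is made up:

    # talk to AD's LDAPS port directly; with RHEL's stock openssl the
    # connection died right after the ClientHello
    openssl s_client -connect ad.example.com:636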

Solution

So we compiled the latest openssl and installed it in its own location, then compiled Net::SSLeay against the new openssl and installed it into a custom location too. The client was using Apache with mod_perl, but we had to switch to FastCGI: apache uses the openssl library for HTTPS, and we don't want perl and apache to load different versions of the library into the same process. mod_fcgid is not available in RHEL, but you can get it from the EPEL repository. A simple patch to RT put the special library paths into @INC.
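
The @INC patch boils down to a couple of lines like these (the paths are hypothetical; use wherever the custom Net::SSLeay was installed):

    # hypothetical paths: make perl prefer the Net::SSLeay built against
    # the new openssl over the system one
    use lib '/opt/rt-ssl/lib/perl5/x86_64-linux-thread-multi';
    use lib '/opt/rt-ssl/lib/perl5';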

I hope this helps somebody.

Tuesday, April 03, 2012

Performance regression in perl with precompiled regexps

Some time ago I wrote about preparing regexps ahead of time for cases where you cannot use the /o modifier. For example, you have:

    my $re = qr{\d+};

And often you need to match:

    $str =~ /^$re$/;

You cannot use the /o modifier if $re may change. In that blog post I described precompiling and showed that it works as fast as a regexp with the /o modifier:

    my $re = qr{...};
    my $re_prepared = qr/^$re$/;
    $str =~ $re_prepared;

Imagine my surprise when I executed the same script with perl 5.14.1. Every other method is at least 2 times faster than the prepared one, even ... =~ /^$re$/!
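
The original dynamic_regexp_with_o_effect.pl isn't reproduced here, but the variants were roughly the following (my reconstruction; no_o_pre_braces in particular is a guess):

    # a reconstruction of the benchmarked variants, not the original script
    use strict; use warnings;
    use Benchmark qw(cmpthese);

    my $re = qr{\d+};
    my $re_prepared = qr/^$re$/;
    my $str = '12345';

    cmpthese( -2, {
        no_o            => sub { $str =~ /^$re$/ },
        o               => sub { $str =~ /^$re$/o },
        no_o_pre        => sub { $str =~ $re_prepared },
        no_o_pre_braces => sub { $str =~ m{$re_prepared} },  # guessed variant
        static          => sub { $str =~ /^\d+$/ },
    } );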

Results for 5.8.9:

    $ perl5.8.9 dynamic_regexp_with_o_effect.pl
                         Rate      no_o         o no_o_pre no_o_pre_braces    static
    no_o            1581903/s        --      -10%     -25%            -32%      -50%
    o               1764430/s       12%        --     -16%            -24%      -44%
    no_o_pre        2104366/s       33%       19%       --             -9%      -34%
    no_o_pre_braces 2323549/s       47%       32%      10%              --      -27%
    static          3174754/s      101%       80%      51%             37%        --

Results for 5.10.0 and newer:

    $ perl5.10.0 dynamic_regexp_with_o_effect.pl
                         Rate no_o_pre no_o_pre_braces      no_o         o    static
    no_o_pre         873813/s       --             -0%      -52%      -68%      -75%
    no_o_pre_braces  877714/s       0%              --      -52%      -68%      -75%
    no_o            1816840/s     108%            107%        --      -34%      -49%
    o               2759410/s     216%            214%       52%        --      -22%
    static          3542486/s     305%            304%       95%       28%        --

Wednesday, March 28, 2012

Impact of index only scans in mysql

Introduction

An index-only scan is an access method where data is retrieved using only the information in the index tree, without an additional seek to read the actual row.

It's a well-known concept implemented in mysql and Oracle; Pg 9.2 has patches.

So I decided to see for myself how big the impact can be.

Goal

We have a big table 'CGM' with g, m and disabled columns, and a Users table with id and name.

We want to find all users who are member of a group X:

    SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m
    WHERE CGM.g = X

Sometimes we add a 'disabled = 0' condition to the query.
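
The labels in the result tables below name the columns of the composite index on CGM. A sketch of the covering case (the column types are assumptions, the post doesn't show the real schema):

    -- assumed schema; the covering index holds every CGM column the query touches
    CREATE TABLE CGM (
        g INT NOT NULL,
        m INT NOT NULL,
        disabled INT NOT NULL DEFAULT 0,
        KEY g_m_disabled (g, m, disabled)
    ) ENGINE=InnoDB;

    -- with a covering index EXPLAIN reports "Using index" in the Extra column
    EXPLAIN SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m WHERE CGM.g = 1;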

I wrote a perl script that performs the profiling.
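
Roughly like this (a sketch of the approach, not the original script; the connection parameters are made up, and each labeled run assumes CGM carries only the named index):

    # a sketch of the profiling approach, not the original script
    use strict; use warnings;
    use DBI;
    use Benchmark qw(cmpthese);

    my $dbh = DBI->connect( 'dbi:mysql:database=test', 'root', '',
        { RaiseError => 1 } );
    my $sth = $dbh->prepare(
        'SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m WHERE CGM.g = ?'
    );

    cmpthese( -5, { 'g,m,disabled' => sub {
        $sth->execute( int rand 10_000 );
        $sth->fetchall_arrayref;
    } } );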

First results

First results looked like this:

                    Rate            g   g,disabled          g,m g,m,disabled
    g            13307/s           --          -1%          -3%          -3%
    g,disabled   13474/s           1%           --          -2%          -2%
    g,m          13714/s           3%           2%           --          -0%
    g,m,disabled 13714/s           3%           2%          -0%           --

Not impressive, but it's clear that there is no I/O at all and the script is testing a CPU-bound task.

Impressive results

When I/O comes into play the situation is completely different. To achieve this I lowered the InnoDB buffer pool size to its minimum (1M) with the my.cnf tweak shown below.
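
The setting (the section and value syntax may differ slightly between mysql versions):

    [mysqld]
    # 1M is far below any sane production value; it's only here to force I/O
    innodb_buffer_pool_size = 1M

With the tiny buffer pool the same benchmark gives: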

                    Rate            g   g,disabled g,m,disabled          g,m
    g             5531/s           --          -0%         -57%         -58%
    g,disabled    5547/s           0%           --         -57%         -57%
    g,m,disabled 12799/s         131%         131%           --          -2%
    g,m          13033/s         136%         135%           2%           --

In the above test we don't check the disabled column, so two of the indexes are suitable. The next test adds a condition on the disabled column.

                    Rate            g          g,m   g,disabled g,m,disabled
    g             5531/s           --          -0%          -5%         -57%
    g,m           5547/s           0%           --          -4%         -57%
    g,disabled    5802/s           5%           5%           --         -55%
    g,m,disabled 12800/s         131%         131%         121%           --

Memory considerations

To cover all possible access patterns on the CGM table with indexes, we need two of them: (g, m, disabled) and (m, g, disabled). The latter is required when a query starts from the Users table and fetches groups. This means you need additional space.

Let's assume we use 4-byte integers for all three columns in CGM, so the row size is 12 bytes. For a million records the following situations are possible:

    minimal indexing, (g) and (m):
        8MB of indexes + 12MB of row data = 20MB
    covering, (g, m, disabled) and (m, g, disabled):
        24MB of indexes + 12MB of row data = 36MB

From the list one can assume that covering indexes need 1.8 times more memory than minimal indexing (36MB vs 20MB). However, with covering indexes mysql doesn't need to access the table at all, so the 36MB turns into 24MB. That's 20% more than the minimal variant.

Conclusions

As you can see, covering indexes improve performance when the data doesn't fit into memory; otherwise the benefit is close to zero. Longer indexes need more memory, but this additional footprint can be lowered.

My recommendations: don't bother as long as the data fits into memory. Create only the required indexes at that point; smaller indexes give you more room to grow until you hit the memory limit. If you decide to use covering indexes, then cover all possible access patterns, or you will need additional space in the buffer pool for table data.

That's it.