Showing posts with label perl. Show all posts

Friday, September 06, 2013

Some details on perl's `my` internals

Recently I was asked why unwrapping arguments into a my variable is slower than unwrapping them into a variable declared outside the function:

    sub foo { my $bar = shift; ... } # slower
    # vs.
    my $bar;
    sub foo { $bar = shift; ... } # faster

On the other hand, the following shows the opposite behaviour:

    sub foo { my ($bar, $baz) = (@_); ... } # faster
    # vs.
    my ($bar, $baz);
    sub foo { ($bar, $baz) = (@_); ... } # slower

Short answer

A function has its own scope and my declares a variable with lexical scope, so when the scope ends perl has to take an additional action to close the scope.

In the second case the same rule applies, so the second variant could win. However, a list assignment without my on the left-hand side (LHS) and with @_ on the right-hand side (RHS) has to deal with the possibility that both sides share the same variable, as in ($foo, $bar) = ($bar, $foo). So perl has to copy the arguments on the RHS first, and this penalty is larger than the win.
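
To make the "shared variable" case concrete, here is a minimal sketch. The sub and variable names are made up for illustration. Because @_ aliases the caller's variables, the two sides of the assignment refer to the very same scalars, and the result is only correct because the right-hand side is copied first:

```perl
use strict;
use warnings;

my ($first, $second) = ('a', 'b');

# @_ aliases the arguments, so inside the sub the RHS (@_) and the LHS
# ($first, $second) are the same two scalars, just in swapped order.
sub assign_outer {
    ($first, $second) = @_;
    return;
}

assign_outer($second, $first);

print "$first $second\n";   # prints "b a"
```

With my ($first, $second) = @_ the variables are brand new, so nothing on the right can alias them and the extra copy is not needed.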

Long answer and how to draw conclusions on your own

Let's look at the following code

    my $bar;
    sub foo { $bar = shift; ... } # faster

It has the problem of not being recursion-ready: you cannot call foo from inside foo, because a "global" variable stores the argument. More correct code would be:

    our $bar;
    sub foo { local $bar = shift; ... }

That is certainly slower than either of the two variants.
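
A small sketch of the recursion problem, using made-up function names. The three subs try to sum the numbers from N down to 1; only the storage of the argument differs:

```perl
use strict;
use warnings;

# One shared variable: every nested call overwrites it.
my $n_shared;
sub sum_shared {
    $n_shared = shift;
    return 0 if $n_shared == 0;
    my $rest = sum_shared($n_shared - 1);   # clobbers $n_shared
    return $n_shared + $rest;
}

# A fresh lexical per call.
sub sum_my {
    my $n = shift;
    return 0 if $n == 0;
    my $rest = sum_my($n - 1);
    return $n + $rest;
}

# A package variable, saved and restored by local around each call.
our $n_local;
sub sum_local {
    local $n_local = shift;
    return 0 if $n_local == 0;
    my $rest = sum_local($n_local - 1);
    return $n_local + $rest;
}

print sum_shared(3), " ", sum_my(3), " ", sum_local(3), "\n";   # prints "0 6 6"
```

The shared version returns 0 because by the time the inner calls return, the variable holds the innermost argument.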

How to see difference

You can see the difference without looking into the perl source code; just look at the optree of your code with the B::Concise module:

    $ perl -MO=Concise,foo -E 'my $aa; sub foo { $aa = shift; undef }'
    ...
    4        <2> sassign vKS/2 ->5
    2           <0> shift s* ->3
    3           <0> padsv[$aa:FAKE:] sRM* ->4
    ...

I left out the other lines to highlight the assignment code. Compare the above to:

    $ perl -MO=Concise,foo -E 'sub foo { my $aa = shift; undef}'
    ...
    4        <2> sassign vKS/2 ->5
    2           <0> shift s* ->3
    3           <0> padsv[$aa:47,48] sRM*/LVINTRO ->4
    ...

As you can see, the primary difference is the LVINTRO flag on the padsv operation. pad* operations fetch a variable declared with 'my' from special lists for such variables (called PADs or PADLISTs). It's much harder to figure out what LVINTRO means without looking into the source code, but in most cases it means that the operation should be "localized", whatever localization means for that operation.

How to find it in the perl code

Let's look at the code of padsv. You can find it by searching for pp_padsv in the pp_*.c files:

    $ ack pp_padsv -A 30 pp_*.c
    pp_hot.c
    392:PP(pp_padsv)
    393-{
    ...
    406-    if (op->op_flags & OPf_MOD) {
    407-        if (op->op_private & OPpLVAL_INTRO)
    408-            if (!(op->op_private & OPpPAD_STATE))
    409-                save_clearsv(padentry);
    ...

Once again I've highlighted the relevant code, which says: if padsv is in lvalue context, is localizing, and is not a state declaration, then save (not safe) our target variable from the PAD so that it is cleared when the scope ends.
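
The effect of that clearing is visible from plain Perl. A minimal sketch with made-up sub names: a variable introduced with my starts out fresh on every call, while a variable declared outside keeps its value, so no cleanup has to be scheduled for it:

```perl
use strict;
use warnings;

# $count is introduced with my on every call and cleared at scope exit.
sub with_my {
    my $count;
    $count++;
    return $count;
}

# $outer is declared once outside; nothing is cleared when the sub returns.
my $outer;
sub with_outer {
    $outer++;
    return $outer;
}

my @my_results    = map { with_my() }    1 .. 3;
my @outer_results = map { with_outer() } 1 .. 3;

print "@my_results | @outer_results\n";   # prints "1 1 1 | 1 2 3"
```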

Exercise for readers

Now you have all the tools to figure out the differences in the other situations mentioned. Here are the commands to get you there quicker, so you have no excuse to skip it:

    perl -MO=Concise,foo -E 'sub foo { local $aa = shift; undef}'

    perl -MO=Concise,foo -E 'my ($aa, $bb); sub foo { ($aa, $bb) = (@_); undef }'

    perl -MO=Concise,foo -E 'sub foo { my ($aa, $bb) = (@_); undef }'

    perl -MO=Concise,foo -E 'my ($aa, $bb); sub foo { ($aa, $bb) = (shift, shift); undef }'

A comment explaining the list assignment would be nice :)

Thursday, February 21, 2013

My play-perl and perl5 core quests

If you don't know about play-perl then check this out. I've decided how I'm going to use this site.

Participating in Perl5 core development

I would love to participate more in perl5 core development. From time to time I read the perl5-porters mailing list and sometimes participate in discussions. However, nobody likes people who say "A" and then disappear for a week or more. The only thing I can do is code tiny things on my own schedule. I don't want to run an idea by the p5p list and block somebody else from implementing it. Often I don't know whether an idea is implementable or not, how big the impact would be, or how hard the result would be to maintain. So there is a big gap between having an idea and contacting the p5p list for review and assessment.

My play perl

This is where play-perl comes in. I'm going to push my ideas as quests with a link to this blog post and some common wording in the comment. The goals are:

  • sharing ideas very early
  • discussion at early stage
  • assessing importance with likes
  • hooking in contributors

You are free to take over a quest

At the moment play-perl is not very multiplayer, but there are plans. For now, to coordinate collaboration on a quest, you can do the following besides the usual discussion:

  • ask for status update
  • ask for more details
  • claim a quest for a day to a week

Let's see where it takes us. Looking forward to the experience.

Thursday, April 19, 2012

Compiling perl for debugging with gdb

If you have ever looked inside the perl source then you know the code uses tons of C preprocessor macros. GDB has a nice feature to expand them for you. However, you have to compile the program so that it contains information about macros, and it took me a while to figure out how to do it.

Short answer

    ./Configure -des -Dprefix=/Users/ruz/perl/blead -Dusedevel \
        -Doptimize='-O3 -g3' -DEBUGGING \
        -Dusemymalloc -Dusethreads -Dinc_version_list=none

Explanation

I find INSTALL very misleading, so I filed a bug report; maybe it will be explained in the docs or the Configure script will be improved.

Anyway, here is the explanation. -DEBUGGING=-g3 doesn't work; it always puts in the plain -g flag. It's not clear from the docs that -Doptimize is a different thing and that you can put -g3 there. There is more to this, but in short: if -DDEBUGGING is set and -g* is in -Doptimize, then the flags are not adjusted and you get what you want. However, setting -Doptimize overrides the default optimization flags, so you have to repeat them.

See also

Dumping perl data structures in GDB

Thursday, April 05, 2012

Lovely week connecting LDAP client on RHEL to Windows AD over TLS/SSL

A client contacted us to enable SSL connections in Request Tracker's integration with LDAP. It was an epic battle.

We had

First of all, Windows Server 2008 R2 with Active Directory, with an "add-on" installed that hardens security. On the other side, RHEL 6 with Apache running RT under mod_perl and serving it via HTTPS.

Problem

Everything works just fine as long as you don't use a secure connection to LDAP. If start_tls is enabled or you try to connect via ldaps://, then AD drops the connection after the first packet and the client throws a "connection reset by peer" error.

Investigation

From the perl modules we moved down to the `openssl s_client` command and got the same results. The AD admins failed to provide any helpful information, so we captured an ldp.exe session with Wireshark. No surprise, the Microsoft client worked just fine. Comparing the handshake packets that ldp and s_client send showed that Microsoft's LDAP client announces the ECDHE group of cipher suites and openssl doesn't. The OpenSSL project has implemented this family of ciphers based on elliptic curves, but for whatever reason RHEL ships openssl without them.

Solution

So we compiled the latest openssl and installed it in its own location, then compiled Net::SSLeay against the new openssl and installed it into a custom location. The client was using Apache with mod_perl, but we had to switch to FastCGI. This is required because Apache uses the openssl library for HTTPS and we don't want perl and Apache to load different versions of the library into the same process. mod_fcgid is not available in RHEL, but you can get it from the EPEL repository. Finally, a simple patch to RT puts the special library paths into @INC.
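
The @INC part can be as small as a use lib line. This is only a sketch: the directory below is a hypothetical location for the custom Net::SSLeay build, not the path we actually used, and the real patch to RT may look different:

```perl
use strict;
use warnings;

# Hypothetical directory holding the Net::SSLeay built against the new openssl.
# "use lib" puts it at the front of @INC, so it wins over the system copy.
use lib '/opt/rt-openssl/lib/perl5';

print "first search path: $INC[0]\n";
```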

Hope this helps somebody.

Tuesday, April 03, 2012

Performance regression in perl with precompiled regexps

Some time ago I wrote about preparing regexps earlier for cases where you cannot use the /o modifier. For example, you have:

    my $re = qr{\d+};

And often you need to match:

    $str =~ /^$re$/;

You cannot use the /o modifier if $re may change. In that blog post I described precompiling and showed that it works as fast as a regexp with the /o modifier:

    my $re = qr{...};
    # compile the anchored pattern once, as a regexp object
    my $re_prepared = qr/^$re$/;
    $str =~ $re_prepared;

Guess my surprise when I executed the same script with perl 5.14.1. Every other method is at least 2 times faster than the prepared one, even ... =~ /^$re$/!

Results for 5.8.9:

    $ perl5.8.9 dynamic_regexp_with_o_effect.pl
                         Rate      no_o         o no_o_pre no_o_pre_braces    static
    no_o            1581903/s        --      -10%     -25%            -32%      -50%
    o               1764430/s       12%        --     -16%            -24%      -44%
    no_o_pre        2104366/s       33%       19%       --             -9%      -34%
    no_o_pre_braces 2323549/s       47%       32%      10%              --      -27%
    static          3174754/s      101%       80%      51%             37%        --

Results for 5.10.0 and newer:

    $ perl5.10.0 dynamic_regexp_with_o_effect.pl
                         Rate no_o_pre no_o_pre_braces      no_o         o    static
    no_o_pre         873813/s       --             -0%      -52%      -68%      -75%
    no_o_pre_braces  877714/s       0%              --      -52%      -68%      -75%
    no_o            1816840/s     108%            107%        --      -34%      -49%
    o               2759410/s     216%            214%       52%        --      -22%
    static          3542486/s     305%            304%       95%       28%        --

Monday, May 17, 2010

A short interview for DevConf

How do you like the idea of DevConf?

The idea is great. Every modern language has interesting technologies. I'll try to attend as many non-Perl talks as possible. I've already printed the schedule. I "had to" circle my own talk and put a question mark over one more from the same column; everything else is in other tracks.

For me it's a chance to meet interesting people, and also to see and talk offline with those I have only talked to online.

What is your conference talk about?

My talk is mostly about open source business and only then about Perl. Our company managed to make its open product the main source of income. After an impromptu talk about our company at YAPC::Russia, I was told that it was interesting and was invited to give a talk.

I'll talk about our experience and try to show that closed companies can make money from things that bring them no money right now. I can't tell you about a special secret ingredient that will bring you fabulous profit right after you open your code, because "There is no secret ingredient. To make something special, you just have to believe it is special." (c)

Noodles are just noodles. Come and ea^Wlisten.

Who is your talk aimed at?

At everyone who doesn't know where free programs get their money. At those who write open source programs for themselves in their spare time and think about earning from them. At those who work at closed companies and are interested in open development. At those who are simply curious.

Monday, April 05, 2010

A challenge for a XS hacker

Recently we at Best Practical were discussing the penalties of callbacks in RT. Callbacks have proved to be a very good way to extend the web UI. We call a function from a component; the function checks for customizations registered for this particular place and calls them, so custom code can either change arguments or inject something into the result stream.
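
For readers who haven't seen the mechanism, here is a toy sketch of the idea. It is not RT's actual callback API; the registry, the function names and the 'title' place are all invented for illustration:

```perl
use strict;
use warnings;

my %callbacks;   # place name => list of registered code refs

sub register_callback {
    my ($place, $code) = @_;
    push @{ $callbacks{$place} }, $code;
    return;
}

# Called from a component; the lookup is paid on every call,
# even when nothing is registered for the place.
sub run_callbacks {
    my ($place, %args) = @_;
    $_->(%args) for @{ $callbacks{$place} || [] };
    return;
}

sub render_title {
    my $title = shift;
    run_callbacks('title', title => \$title);   # custom code may change the argument
    return "<h1>$title</h1>";
}

register_callback(title => sub { my %args = @_; ${ $args{title} } .= ' (custom)' });

print render_title('Ticket'), "\n";   # prints "<h1>Ticket (custom)</h1>"
```

The lookup in run_callbacks is the per-call penalty that grows with the number of call sites.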

The number of places where callbacks are called gets bigger and bigger. And an idea came to me: what if we could tell the caller to stop calling us? I believe it's possible to do.

For example, a function gets called and at some point we understand that for the whole life of the process we are going to bail out early or return the same value. In this case we call a function that finds the caller's optree and rebuilds it by injecting a constant. So it's runtime constant-folding of a function's result at a particular place.

I hope that experienced XS hackers may be interested, as there are plenty of potential users of this on the CPAN. Class::Trigger comes to mind.

Anyone?

Monday, March 01, 2010

Request Tracker master class in Moscow, May 2010

A unique opportunity to get to know Request Tracker better and learn what you don't know yet. A master class will take place in Moscow in May. This is a chance to get answers to your questions first-hand, from one of the system's developers. Taking part in the master class will let you raise your skills and successfully handle the tasks of administering and extending RT for your company's needs.

Fill in the form right now. On March 15 the price (there is still a chance to influence this figure), the exact dates and the venue will be known. All details are on the event page.

Wednesday, November 25, 2009

New features on rt.cpan.org

We implemented and deployed two new features on rt.cpan.org.

Subject Tags

People with many distributions will love it, as it allows you to add a custom string to the subject of emails per distribution. The subject of emails will be something like [rt.cpan.org YourTokenHere #123].

We decided to leave rt.cpan.org there, but I'm pretty sure there will be people who will try to abuse the feature. Be smart: don't use tags that are too short or too long, and remember that reporters also receive the emails.

Maintainers list with links

It's just UI sugar: each maintainer in the list is now wrapped in a link that leads to all modules this author maintains.

Sunday, October 25, 2009

Faster composite regular expressions

Regular expressions are a powerful tool, but they quickly become too long to be readable. Some people use the //x modifier. I prefer to split them into many smaller regular expressions, for example:

    my $re_num = qr/.../;
    my $re_quoted = qr/.../;
    my $re_value = qr/$re_num|$re_quoted/;

It works just fine, and usually I compile them in package space beforehand and then use them in functions with //o:

    my $re_foo = ...;
    sub foo {
        ...
        if ( /^$re_foo/o ) {
            ...
        }
        ...
    }

It doesn't matter what exactly you do; the question is how much speed you lose if you need these REs to be dynamic. I decided to run a simple test to understand which one is faster:

    use Benchmark qw(cmpthese);
    my $count = -60;

    my $re = qr/\d+/;
    my $re_pre = qr/^\d+$/;

    cmpthese($count, {
        static => sub { return "123456789" =~ /^\d+$/ },
        o => sub { return "123456789" =~ /^$re$/o },
        no_o => sub { return "123456789" =~ /^$re$/ },
        no_o_pre => sub { return "123456789" =~ $re_pre },
    });

    cmpthese($count, {
        static => sub { return "123456w789" =~ /^\d+$/ },
        o => sub { return "123456w789" =~ /^$re$/o },
        no_o => sub { return "123456w789" =~ /^$re$/ },
        no_o_pre => sub { return "123456w789" =~ $re_pre },
    });

It just compares four different variants: a plain old static regexp, a regexp in a variable with some additions, the same with //o, and finally another RE that already contains all the additions and is used without any quotes. Here are the results:

                  Rate     no_o no_o_pre        o   static
    no_o      851115/s       --     -30%     -41%     -47%
    no_o_pre 1222940/s      44%       --     -15%     -24%
    o        1443941/s      70%      18%       --     -11%
    static   1613818/s      90%      32%      12%       --
                  Rate     no_o no_o_pre        o   static
    no_o      923012/s       --     -33%     -37%     -46%
    no_o_pre 1376153/s      49%       --      -6%     -19%
    o        1471770/s      59%       7%       --     -14%
    static   1705241/s      85%      24%      16%       --

The results are consistent with my hopes. I'll try to describe them, though I can't say I know everything about this. In the 'no_o' case perl has to compile the regular expression each time you run the code. The time spent in compilation is enough to lose 30-40% to the next two variants. 'o' and 'no_o_pre' are very close, and I expected something like that. In the 'o' case perl has to compile once at runtime and check the cache each time. In 'no_o_pre' perl has to check each time that the thing on the right-hand side is an RE object. It's probably possible to make the //o case very close to static by rebuilding the op_tree, but that would disappoint some deparse modules. The static case is the fastest, which is understandable.
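
The "compile once" behaviour of //o is also the reason it cannot be used when the RE has to stay dynamic. A minimal sketch (function and variable names are made up): the first pattern that reaches the //o match is kept, and later values of the variable are ignored, while a precompiled qr object follows the variable:

```perl
use strict;
use warnings;

sub match_o    { my ($re, $str) = @_; return $str =~ /^$re$/o ? 1 : 0 }
sub match_no_o { my ($re, $str) = @_; return $str =~ /^$re$/  ? 1 : 0 }

my $digits = qr/\d+/;
my $words  = qr/[a-z]+/;

my $first_o  = match_o($digits, '123');    # compiles /^\d+$/ and keeps it
my $second_o = match_o($words,  'abc');    # still matches against /^\d+$/
my $no_o     = match_no_o($words, 'abc');  # interpolates the current $re

# Precompiled once with all additions, as in the 'no_o_pre' case.
my $words_pre = qr/^$words$/;
my $pre       = 'abc' =~ $words_pre ? 1 : 0;

print "$first_o $second_o $no_o $pre\n";   # prints "1 0 1 1"
```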

Should you use this? Yes. All the time? No. Consider, for example, a parser for Apache logs: not a simple one, but a parser that takes log file format strings and builds regular expressions for a particular format. In this case I would think twice about the design and the way REs are used.

Wednesday, October 07, 2009

Easy thing, but useful, strange that nobody implemented it earlier

This post is about Perl, Mason, memory leaks and hunting them easily in object-oriented applications based on these technologies.

It's no secret that you can cause a memory leak by introducing a cycle of references. It often happens in tree structures when a parent holds references to all its children and each child references its parent.

Perl has reference weakening, which helps avoid most of these problems, or you can ask people to call a method that destroys the structure. Developers who post modules on the CPAN are usually aware of the solutions and cover this. However, it can be done in different ways and it's easy to overlook in the docs.
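
A minimal sketch of the parent/child case. The Node class and its labels are invented for illustration; the only real API used is weaken from Scalar::Util. DESTROY records which objects were actually freed when their scope ended:

```perl
use strict;
use warnings;
use Scalar::Util ();

my %destroyed;   # label => 1 once the object has been freed

package Node;

sub new {
    my ($class, %args) = @_;
    return bless { %args, children => [] }, $class;
}

sub add_child {
    my ($self, $child, %opt) = @_;
    push @{ $self->{children} }, $child;
    $child->{parent} = $self;                              # closes the cycle
    Scalar::Util::weaken($child->{parent}) if $opt{weak};  # back link no longer counts
    return $child;
}

sub DESTROY { $destroyed{ $_[0]{label} } = 1 }

package main;

{
    my $parent = Node->new(label => 'strong');
    $parent->add_child(Node->new(label => 'strong-child'));
}   # cycle keeps both alive: leaked until the process exits

{
    my $parent = Node->new(label => 'weak');
    $parent->add_child(Node->new(label => 'weak-child'), weak => 1);
}   # both freed here

print join(', ', sort keys %destroyed), "\n";   # prints "weak, weak-child"
```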

For a long time I used different modules to catch leaks, for example Devel::Leak::Object. It's a really useful module, but I used it with custom patches for better diagnostics.

Recently I had to look into leaks once again and started to wonder how to find a leak that is not reproducible on my machine, while a customer sees it and cannot say which request causes it. I looked at the CPAN again and found Devel::LeakGuard::Object, a new incarnation of Devel::Leak::Object with additional ways to instrument reporting.

It was very easy to write a simple memory leak tracer for Mason-based applications as a Mason plugin. So far it has helped me identify three small memory leaks in Request Tracker just by enabling this new module in my development environment. The leaks simply popped up in the logs while I was testing things.

I hope that MasonX::LeakGuard::Object can help other people as well.

Saturday, September 12, 2009

Improving usability for people and code reuse

For my long-running tisql project I wrote the Parse::BooleanLogic module a while ago. Recently I found a new application for it and it worked pretty well.

In RT we have scrips: a condition, an action and a template. When something happens to a ticket, the change is checked against the conditions of the scrips. Only those actions are applied whose conditions returned a true value. Pretty simple, but a condition is just code and we want that code to be reusable.

Conditions are implemented as modules and controlled by an argument, usually a parsable string. Strings work well for people. Lots of RT admins cannot write the code for a condition, and it's pointless to ask them to wrap code they cannot write into a module.

We have the 'User Defined' condition module for such cases, where you can write code right in the UI. It helps, but only so far. Here comes another problem. If you have code in a module that nobody except you can write, then this module should be of help to everybody, but it's not. Often people want to mix complex things with simple conditions, and either you have to extend the format of the argument in the condition or invent a new thing.

I decided to "invent".

There was nothing to invent actually, only technologies to connect together, and that's what I did. Sometimes I think our work is to connect things together. Recall that User Defined allows you to type a condition using Perl right in the UI. OK. What if we replace perl with a custom syntax? Parse::BooleanLogic provides a good parser for pretty random things inside nested parentheses and joined with the boolean operators 'AND' and 'OR'. Almost any condition in RT falls into this category and looks like the following even in perl:

    return 1 if ( this OR that OR (that AND that) ) AND else...
    return 0;

In RT we have TicketSQL, with which you can search tickets using simple SQL-like conditions such as the following:

    x = 10 OR y LIKE 'string' OR z IS NULL

I decided that it would work pretty well in conditions too. In a condition we have the current ticket we are checking and the change (transaction).

    ( Type = 'Create' AND Ticket.Status = 'resolved' )
    OR ( Type = 'Set' AND Field = 'Status' AND NewValue = 'resolved' )

Looks good and every user can get the syntax, but it's not there yet.

We already have modules that implement conditions and we want to reuse them. That's pretty easy to solve:

    ModuleName{'argument'} OR !AnotherModule{'argument'}

I'm really proud that my parser allows me to parse syntax like this without much work:

    sub ParseCode {
        my $self = shift;

        my $code = $self->ScripObj->CustomIsApplicableCode;

        my @errors = ();
        my $res = $parser->as_array(
            $code,
            error_cb => sub { push @errors, $_[0]; },
            operand_cb => sub {
                my $op = shift;
                if ( $op =~ /^(!?)($re_exec_module)(?:{$re_module_argument})?$/o ) {
                    return {
                        module => $2,
                        negative => $1,
                        argument => $parser->dq($3),
                    };
                }
                elsif ( $op =~ /^($re_field)\s+($re_bin_op)\s+($re_value)$/o ) {
                    return { op => $2, lhs => $1, rhs => $3 };
                }
                elsif ( $op =~ /^($re_field)\s+($re_un_op)$/o ) {
                    return { op => $2, lhs => $1 };
                }
                else {
                    push @errors, "'$op' is not a check 'Complex' condition knows about";
                    return undef;
                }
            },
        );
        return @errors? (undef, @errors) : ($res);
    }

It's not only a parser, but a solver as well:

    my $solver = sub {
        my $cond = shift;
        my $self = $_[0];
        if ( $cond->{'op'} ) {
            return $self->OpHandler($cond->{'op'})->(
                $self->GetField( $cond->{'lhs'}, @_ ),
                $self->GetValue( $cond->{'rhs'}, @_ )
            );
        }
        elsif ( $cond->{'module'} ) {
            my $module = 'RT::Condition::'. $cond->{'module'};
            eval "require $module;1" || die "Require of $module failed.\n$@\n";
            my $obj = $module->new (
                TransactionObj => $_[1],
                TicketObj      => $_[2],
                Argument       => $cond->{'argument'},
                CurrentUser    => $RT::SystemUser,
            );
            return $obj->IsApplicable;
        } else {
            die "Boo";
        }
    };

    sub Solve {
        my $self = shift;
        my $tree = shift;

        my $txn = $self->TransactionObj;
        my $ticket = $self->TicketObj;

        return $parser->solve( $tree, $solver, $self, $txn, $ticket );
    }

That's it. Everything else is grammar regexps, column handlers and documentation. Available on the CPAN and on github.
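
To show what "solving" means without installing anything, here is a dependency-free sketch. It does not use Parse::BooleanLogic; the tree shape (operands as hash references, operators as plain 'AND'/'OR' strings, parentheses as nested array references) is an assumption made for illustration and may differ from what the module actually produces. The sketch evaluates strictly left to right, with no operator precedence, and skips operands whose value can no longer change the result:

```perl
use strict;
use warnings;

# $tree:  array ref of operands (hash refs), nested groups (array refs)
#         and operator strings 'AND' / 'OR' (assumed shape, see above).
# $check: callback that turns one operand into a true or false value.
sub solve_tree {
    my ($tree, $check) = @_;
    my ($result, $op);
    for my $node (@$tree) {
        if ( !ref $node ) {      # an operator between two operands
            $op = uc $node;
            next;
        }
        # Short circuit: the outcome is already decided, skip this operand.
        next if defined $op && $op eq 'AND' && !$result;
        next if defined $op && $op eq 'OR'  &&  $result;

        $result = ref $node eq 'ARRAY'
            ? solve_tree($node, $check)
            : $check->($node);
    }
    return $result ? 1 : 0;
}

# Made-up facts standing in for real checks against a ticket and transaction.
my %facts = ( is_create => 0, is_status_change => 1, is_resolved => 1 );
my $check = sub { return $facts{ $_[0]{name} } };

my $tree = [
    { name => 'is_create' },
    'OR',
    [ { name => 'is_status_change' }, 'AND', { name => 'is_resolved' } ],
];

print solve_tree($tree, $check), "\n";   # prints "1"
```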

Thursday, September 03, 2009

Even a child can package RT extensions with Module::Install::RTx

Request Tracker has many tools for extending functionality without patches, but you shouldn't keep your changes in the same directory as the installation either. Tomorrow you may want to copy them to a new server, or you may need to make changes and test them beforehand. Package your changes into an extension; doing so is trivial.

Sharing more memory between apache/fastcgi processes

If you have a forking apache with mod_perl or a FastCGI application, it's a good idea to load as many modules as possible before the forks. This lets you use copy-on-write more effectively and save memory for other needs.
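
A minimal sketch of such a preload file. The file name and the module list are placeholders; list whatever your application really uses, and load the file in the parent process before it forks:

```perl
# preload.pl (hypothetical name): run once in the parent, before forking
use strict;
use warnings;

# Example list only; these happen to be core modules.
my @preload = qw(POSIX Data::Dumper Storable);

for my $module (@preload) {
    (my $file = "$module.pm") =~ s{::}{/}g;   # Data::Dumper -> Data/Dumper.pm
    require $file;                            # compiled code is now shared by children
}

1;
```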

With modules that you use directly everything is simple: you know the list, so you can enumerate them in a file and load it before the forks. In some cases, however, external modules postpone loading until a certain moment. In such cases you can use a simple trick that was originally suggested by JJ and that I described in more detail, in Russian, in an article for Request Tracker users.

Monday, August 24, 2009

Trying Padre on MacOS

For a while I've wanted to play with Padre. It's an IDE written in the perl programming language, and at this point its main target is perl developers. I tried it once on Windows, but I don't develop on Windows. For development I use perl5.8 from MacPorts on Mac OS X.

First of all you find that Padre requires a threaded perl, and that's a reasonable requirement. So I had to switch perl. I deactivated perl5.8 and installed a new one with threads using the following commands:

    port deactivate perl5.8
    port install perl5.8 +threads

Sure enough, such things don't work well: binary incompatibility and path changes. The CPAN shell died complaining about a missing dzopen in Compress::Zlib. I installed it manually from the CPAN. That didn't help. So I deleted all directories that might affect things:

    # get rid of old perl files, the current version is 5.8.9
    find /opt/local/lib/perl5 -name '5.8.8' | xargs sudo rm -fr
    # get rid of everything related to compression
    find /opt/local/lib/perl5 | grep 'Compress' | xargs sudo rm -fr
    # get rid of everything related to old architecture, new one is darwin-threaded-multi-2level
    find /opt/local/lib/perl5 -type d -name darwin-2level | xargs sudo rm -fr

OK. CPAN started to work, as it can use the gzip and gunzip commands. I re-installed Compress::Zlib. Then I usually install the CPAN::Reporter module. It slows down installation a little bit, but it helps the perl community provide you with better solutions. Installation is simple:

    sh> cpan
    cpan> install CPAN::Reporter
    cpan> o conf init test_report
    cpan> o conf commit

Then I started installing Padre :)

    cpan> install Padre

It takes some time, so in another console I looked at what was broken in my perl. First of all, subversion-perlbindings was broken and svk didn't work. That was easy to fix by reinstalling it with the "port -f upgrade subversion-perlbindings" command. I upgraded all packages matching p5-* in the same way.

Installation of Padre failed, but that doesn't mean anything: I had deleted lots of files. For example, Algorithm::C3 was deleted while Class::MOP was not. It's all easy to fix. Just install modules whenever some tests die with a "module is missing" error.

In the end I had problems with File::HomeDir, something that affects loading Padre. I filed a bug report and decided to stop at this point.

Some conclusions. MacPorts is good, but not that good. For example, gentoo has the perl-cleaner utility that helps fix things, and revdep-rebuild is another gentoo tool that helps a lot during upgrades. In the end, switching from a perl without threads to a version with thread support means a lot of reinstallation. I'm fine with that, but I don't think a lot of other people are. I don't think that building everything from scratch for Padre is going to help its acceptance. I see more benefit in helping MacPorts and other distributions solve the problems of switching from a non-threaded perl to a threaded one and back. File::HomeDir needs more love. Otherwise I had no problems, but the app is not functioning at this moment. I'm going to try again some day later.

Tuesday, July 28, 2009

Debugging perl programs/internals in gdb

When it comes to perl internals, print-based debugging doesn't work that well. Compilation and installation are too slow, so you cannot place a print and quickly see the output. At some point gdb should be used. In the perl world we have Devel::Peek's Dump function to look behind the curtain. In the C world there is sv_dump.

    # threaded perl:
    (gdb) call Perl_sv_dump(my_perl, variable)
    # not threaded perl:
    (gdb) call Perl_sv_dump(variable)

The Perl_ prefix is some magic; I don't care much why, but in most cases you need to prefix things.
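
For comparison, the Perl-level counterpart mentioned above takes two lines. A minimal sketch; Dump writes to STDERR and the exact output depends on your perl version, so none is shown here:

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $value = 42;
my $text  = "$value";   # a string copy, to compare the flags of the two SVs

Dump($value);   # an integer SV
Dump($text);    # a string SV
```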

Using breakpoints is a must. Use pp_* functions to break on, for example Perl_pp_entersub. Here is a simple session where we stop before entering a sub and use the dumper to figure out the sub's name:

    > gdb ./perl
    GNU gdb 6.3.50-20050815 (Apple version gdb-962) (Sat Jul 26 08:14:40 UTC 2008)

    (gdb) break Perl_pp_entersub
    Breakpoint 1 at 0xe512c: file pp_hot.c, line 2663.

    (gdb) run -e 'sub foo { return $x } foo()'

    Breakpoint 1, Perl_pp_entersub (my_perl=0x800000) at pp_hot.c:2663
    2663        dVAR; dSP; dPOPss;
    (gdb) n
    2662    {
    (gdb) 
    2663        dVAR; dSP; dPOPss;
    (gdb) 
    2668        const bool hasargs = (PL_op->op_flags & OPf_STACKED) != 0;
    (gdb) 
    2670        if (!sv)

    (gdb) call Perl_sv_dump(my_perl, sv)
    SV = PVGV(0x8103fc) at 0x813ef0
      REFCNT = 2
      FLAGS = (MULTI,IN_PAD)
      NAME = "foo"
      NAMELEN = 3
      GvSTASH = 0x8038f0    "main"
      GP = 0x3078f0
        SV = 0x0
        REFCNT = 1
        IO = 0x0
        FORM = 0x0  
        AV = 0x0
        HV = 0x0
        CV = 0x813eb0
        CVGEN = 0x0
        LINE = 1
        FILE = "-e"
        FLAGS = 0xa
        EGV = 0x813ef0      "foo"

It's quite simple, but when you start investigating internals it's very helpful.

Monday, July 27, 2009

Proper double linked list

A doubly linked list is a well-known structure. Each element references the prev and next elements in the chain:

    use strict;
    use warnings;

    package List;

    sub new {
        my $proto = shift;
        my $self = bless {@_}, ref($proto) || $proto;
    }

    sub prev {
        my $self = shift;
        if ( @_ ) {
            my $prev = $self->{'prev'} = shift;
            $prev->{'next'} = $self;
        }
        return $self->{'prev'};
    }

    sub next {
        my $self = shift;
        if ( @_ ) {
            my $next = $self->{'next'} = shift;
            $next->{'prev'} = $self;
        }
        return $self->{'next'};
    }

    package main;

    my $head = List->new(v=>1);
    $head->next( List->new(v=>3)->prev( List->new(v=>2) ) );

Clean and simple. If you are experienced in perl you should know that such a thing leaks memory. Each element always has at least one reference from a neighbor, so perl's garbage collector never sees that the structure can be collected. It's called a reference cycle; google for it. You may also know that weaken from the Scalar::Util module can help you solve this:

    use Scalar::Util qw(weaken);

    sub prev {
        my $self = shift;
        if ( @_ ) {
            my $prev = $self->{'prev'} = shift;
            $prev->{'next'} = $self;
            weaken $self->{'prev'};
        }
        return $self->{'prev'};
    }

    # similar thing for next

So we weaken one group of references, in our example the prev links. It's both a win and a loss. Yes, perl frees the elements before exit and there are no more memory leaks; that's our win. But, and there is always a but, you cannot let the pointer to the first element go out of scope, or otherwise some elements may be freed against your wish. For a while I thought that it was impossible to solve this problem, but recent hacking, reading about perl internals and a question on a mailing list rang a bell. After a short discussion on the #p5p irc channel with Matt Trout, a solution was found. Actually it has all been there for a while, and Matt has even started a module that may help make it all easier, but here we're going to look at the guts.

The DESTROY method is called on destruction, we all know that, but few people know that we can prevent the actual destruction by incrementing the reference count of the object. One caveat: you shouldn't do it during global destruction, but there is a module to check when we're called:

    use Devel::GlobalDestruction;

    sub DESTROY {
        return if in_global_destruction;
        do_something_a_little_tricky();
    }

What can we do with this? We have two links: from the current element to the next one and from that next one back to the current. One of them is weak, and on destroy we can swap them if the element that is about to be destroyed is referenced by a weak link. It's easier in code than in words:

    sub DESTROY {
        return if in_global_destruction();

        my $self = shift;
        if ( $self->{'next'} && isweak $self->{'next'}{'prev'} ) {
            $self->{'next'}{'prev'} = $self;
            weaken $self->{'next'};
        }
        if ( $self->{'prev'} && isweak $self->{'prev'}{'next'} ) {
            $self->{'prev'}{'next'} = $self;
            weaken $self->{'prev'};
        }
    }

That's it. Now you can forget about the heads of lists and pass around any element you like. isweak is also part of the Scalar::Util module. Good luck with cool lists and other linked structures. Matt is looking for help with his module. You can always find user mst on irc.perl.org to chat about this.

Thursday, July 02, 2009

Nice article on perl internals nothingmuch wrote

If you are interested in perl5's internals even a little, you will find this article useful. It doesn't cover the already well-described SVs, AVs, HVs and other representations of perl structures, but introduces, by example, how perl code is executed.

I know a few things about the internals, but the author's point of view allowed me to better understand the RETURN and PUSHBACK macros, the stack pointer and the op_tree.

It's one tiny step towards understanding how cool things like Devel::NYTProf work.

Enjoy reading!

Friday, June 26, 2009

A Perl resource you may not know about

Do you know which country dominates the CPAN? The guys in RostovOnDon.pm know. They wrote a simple service for that. It looks nice and simple. It has one feature that may be useful: an RSS feed of releases per author.

Friday, May 29, 2009

The Dual-Lived Problem

Chromatic writes about the future of perl. His recent post on The Dual-Lived Problem caught my attention and I had time to read it to the end. I'll stop at the following idea and try to promote the tools we at Best Practical develop for our own needs:



First, improve the core's automated testing.
This helps everyone; it can identify changes
in the core code that affect the stability
and behavior of the standard library. It can
also identify changes in standard library
modules which do not work on important platforms.


I do believe that the CHIMPS toolset can help with this. I can't say that it's been tested for such a purpose, but it is used in exactly this way by BPS. We test each revision of our project against the most recent revision of another project it depends on.

I hope people will try it with some modules they care about and report back issues. You should prefer the developer releases I recently blogged about or repositories on github.