Regular expressions are a powerful tool, but they quickly become too long to be readable. Some people use the //x modifier. I prefer to split them into many smaller regular expressions, for example:
my $re_num = qr/.../;
my $re_quoted = qr/.../;
my $re_value = qr/$re_num|$re_quoted/;
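To make the idea concrete, here is a minimal sketch with the elided patterns filled in. The number and quoted-string patterns are my own assumptions for illustration, not the real ones, and $re_value_x is a hypothetical name for the same thing written as a single //x pattern.

```perl
use strict;
use warnings;

# Hypothetical building blocks; the real patterns are not shown in the post.
my $re_num    = qr/[-+]?\d+(?:\.\d+)?/;   # integer or decimal number
my $re_quoted = qr/"(?:[^"\\]|\\.)*"/;    # double-quoted string with backslash escapes
my $re_value  = qr/$re_num|$re_quoted/;   # either of the two

# The same combination written as one pattern with the x modifier.
my $re_value_x = qr/
    [-+]?\d+(?:\.\d+)?      # number
  |
    "(?:[^"\\]|\\.)*"       # quoted string
/x;

for my $input ('42', '-3.14', '"a \"quoted\" word"', 'bare') {
    my $split = $input =~ /\A$re_value\z/   ? 1 : 0;
    my $x     = $input =~ /\A$re_value_x\z/ ? 1 : 0;
    print "$input: split=$split x=$x\n";
}
```

A qr// object keeps its own grouping when interpolated, so the alternation in $re_value does not leak into the surrounding anchors.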
It works just fine, and I usually compile them in package scope beforehand and then use them in functions with //o:
my $re_foo = ...;
sub foo {
    ...
    if ( /^$re_foo/o ) {
        ...
    }
    ...
}
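One thing to keep in mind with //o: perl interpolates the variable only the first time that particular match operator runs and reuses the compiled pattern afterwards, so later changes to the variable are ignored. A small sketch of that behaviour; the pattern and the names match_o and match_plain are hypothetical:

```perl
use strict;
use warnings;

my $re_foo = qr/foo/;   # hypothetical pattern, compiled at package level

# With //o the pattern is frozen after the first call.
sub match_o     { my ($s) = @_; return $s =~ /^$re_foo/o ? 1 : 0 }
# Without //o the variable is interpolated again on every call.
sub match_plain { my ($s) = @_; return $s =~ /^$re_foo/  ? 1 : 0 }

my @before = (match_o('foobar'), match_plain('foobar'));

$re_foo = qr/bar/;      # change the pattern at runtime

my @after = (match_o('barfoo'), match_plain('barfoo'));

print "before change: o=$before[0] plain=$before[1]\n";
print "after change:  o=$after[0] plain=$after[1]\n";
```

After the change, match_o still tests against the old pattern, which is exactly why //o is unsuitable for REs that really are dynamic.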
It doesn't matter what exactly you do; the question is how much speed you lose if you need these REs to be dynamic. I decided to run a simple test to understand which variant is faster:
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $count  = -60;          # negative count: run each variant for at least 60 CPU seconds
my $re     = qr/\d+/;      # fragment that gets interpolated into a bigger pattern
my $re_pre = qr/^\d+$/;    # complete pattern, used as is

# First run: the string matches.
cmpthese($count, {
    static   => sub { return "123456789" =~ /^\d+$/ },
    o        => sub { return "123456789" =~ /^$re$/o },
    no_o     => sub { return "123456789" =~ /^$re$/ },
    no_o_pre => sub { return "123456789" =~ $re_pre },
});

# Second run: the string does not match.
cmpthese($count, {
    static   => sub { return "123456w789" =~ /^\d+$/ },
    o        => sub { return "123456w789" =~ /^$re$/o },
    no_o     => sub { return "123456w789" =~ /^$re$/ },
    no_o_pre => sub { return "123456w789" =~ $re_pre },
});
It compares four variants: a plain old static regexp, a regexp in a variable with some additions around it, the same with //o, and finally another RE that already contains all the additions and is used directly, without any enclosing slashes. Here are the results:
Matching string ("123456789"):
              Rate     no_o no_o_pre        o   static
no_o      851115/s       --     -30%     -41%     -47%
no_o_pre 1222940/s      44%       --     -15%     -24%
o        1443941/s      70%      18%       --     -11%
static   1613818/s      90%      32%      12%       --
Non-matching string ("123456w789"):
              Rate     no_o no_o_pre        o   static
no_o      923012/s       --     -33%     -37%     -46%
no_o_pre 1376153/s      49%       --      -6%     -19%
o        1471770/s      59%       7%       --     -14%
static   1705241/s      85%      24%      16%       --
The results are consistent with what I hoped for. I'll try to explain them, although I can't claim to know everything about this. In the 'no_o' case perl has to rebuild the pattern from the interpolated variable every time the code runs and decide whether it needs to be recompiled. That work is enough to make it 30-33% slower than the next variant. 'o' and 'no_o_pre' are fairly close (7-18% apart), and I expected something like that. In the 'o' case perl compiles once at runtime and checks the cache on each call. In the 'no_o_pre' case perl has to check each time that the thing on the right-hand side is an RE object. It's probably possible to bring the //o case very close to static by rebuilding the op tree, but that would upset some deparse modules. The static case is the fastest, which is understandable.
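To see the difference between an RE object and an interpolated one, you can poke at them with is_regexp from the core re module (available since perl 5.10). This is only a sketch of how I understand it, not a look at perl's internals:

```perl
use strict;
use warnings;
use re qw(is_regexp);

my $re     = qr/\d+/;
my $re_pre = qr/^\d+$/;

# A qr// value is a compiled regexp object, so "... =~ $re_pre" can use it as is.
my $is_object = is_regexp($re_pre) ? 1 : 0;

# Its string form is an ordinary string. My assumption is that /^$re$/ is
# assembled from this string form at run time, which is where the extra work goes.
my $as_string    = "$re";
my $is_string_re = is_regexp($as_string) ? 1 : 0;

print "is_regexp(\$re_pre): $is_object\n";
print "string form of \$re: $as_string\n";
print "is_regexp(string):  $is_string_re\n";
```

The exact string form depends on the perl version, so I don't show the output here.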
Should you use this? Yes. All the time? No. Take, for example, a parser for Apache logs: not a simple one, but a parser that takes log format strings and builds regular expressions for that particular format. In that case I would think twice about the design and the way the REs are used.
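For such a parser, one option is to build the RE once per format string, keep the compiled object in a cache, and match against it directly, as in the 'no_o_pre' case. Below is a rough sketch under my own assumptions: the directive table covers only a tiny, simplified subset of Apache's LogFormat directives, and regex_for_format and parse_line are hypothetical names. It needs perl 5.10 for the //= operator.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified subset of Apache LogFormat directives.
my %directive_re = (
    '%h'  => qr/\S+/,          # remote host
    '%l'  => qr/\S+/,          # remote logname
    '%u'  => qr/\S+/,          # remote user
    '%t'  => qr/\[[^\]]+\]/,   # [timestamp]
    '%r'  => qr/[^"]*/,        # request line
    '%>s' => qr/\d{3}/,        # final status
    '%b'  => qr/\d+|-/,        # response size or "-"
);

my %compiled;   # cache: format string => compiled RE object

sub regex_for_format {
    my ($format) = @_;
    return $compiled{$format} //= do {
        # Known directives become capture groups, everything else is literal text.
        my $pattern = join '', map {
            exists $directive_re{$_} ? "($directive_re{$_})" : quotemeta($_)
        } split /(%>?[a-z])/, $format;
        qr/\A$pattern\z/;
    };
}

sub parse_line {
    my ($format, $line) = @_;
    my $re     = regex_for_format($format);   # compiled once per format
    my @fields = $line =~ $re;                # match against the RE object directly
    return @fields ? \@fields : undef;
}

my $format = '%h %l %u %t "%r" %>s %b';
my $line   = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326';
my $fields = parse_line($format, $line);
print join(' | ', @$fields), "\n" if $fields;
```

The cost of building the pattern is paid once per format, and the per-line cost is only the hash lookup plus the match itself.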