Regular expressions are a powerful tool, but they quickly become too long to be readable. Some people use the //x modifier. I prefer to split them into many smaller regular expressions, for example:
my $re_num = qr/.../;
my $re_quoted = qr/.../;
my $re_value = qr/$re_num|$re_quoted/;
It works just fine. I usually compile them at package scope beforehand and then use them in functions with //o:
my $re_foo = ...;

sub foo {
    ...
    if ( /^$re_foo/o ) {
        ...
    }
    ...
}
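To make the idea concrete, here is a minimal runnable sketch of the same layout. The post elides the real patterns, so the sub-patterns below and the helper name `starts_with_value` are assumptions made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical building blocks; the real patterns are not shown in the post.
my $re_num    = qr/-?\d+(?:\.\d+)?/;      # integer or decimal number
my $re_quoted = qr/"(?:[^"\\]|\\.)*"/;    # double-quoted string with escapes
my $re_value  = qr/$re_num|$re_quoted/;   # either kind of value

# Hypothetical helper: true if the string starts with a value.
# The pattern never changes, so //o is safe here.
sub starts_with_value {
    my ($str) = @_;
    return $str =~ /^$re_value/o ? 1 : 0;
}

print starts_with_value('42 apples'),  "\n";   # prints 1
print starts_with_value('"hi" there'), "\n";   # prints 1
print starts_with_value('apples'),     "\n";   # prints 0
```

Each qr// object is wrapped in its own non-capturing group when it is interpolated, so the alternation in `$re_value` does not leak into the surrounding `^` anchor.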
It doesn't matter what exactly you do; the question is how much speed you lose if you need these REs to be dynamic. I decided to run a simple test to understand which variant is faster:
use Benchmark qw(cmpthese);

my $count = -60;
my $re = qr/\d+/;
my $re_pre = qr/^\d+$/;

cmpthese($count, {
    static   => sub { return "123456789" =~ /^\d+$/ },
    o        => sub { return "123456789" =~ /^$re$/o },
    no_o     => sub { return "123456789" =~ /^$re$/ },
    no_o_pre => sub { return "123456789" =~ $re_pre },
});

cmpthese($count, {
    static   => sub { return "123456w789" =~ /^\d+$/ },
    o        => sub { return "123456w789" =~ /^$re$/o },
    no_o     => sub { return "123456w789" =~ /^$re$/ },
    no_o_pre => sub { return "123456w789" =~ $re_pre },
});
The test compares four variants: a plain old static regexp ('static'), a regexp in a variable interpolated with some additions ('no_o'), the same with //o ('o'), and finally another RE that already contains all the additions and is used directly, without match delimiters ('no_o_pre'). Here are the results:
Matching string ("123456789"):

              Rate     no_o no_o_pre        o   static
no_o      851115/s       --     -30%     -41%     -47%
no_o_pre 1222940/s      44%       --     -15%     -24%
o        1443941/s      70%      18%       --     -11%
static   1613818/s      90%      32%      12%       --

Non-matching string ("123456w789"):

              Rate     no_o no_o_pre        o   static
no_o      923012/s       --     -33%     -37%     -46%
no_o_pre 1376153/s      49%       --      -6%     -19%
o        1471770/s      59%       7%       --     -14%
static   1705241/s      85%      24%      16%       --
The results are in line with what I hoped for. I'll try to explain them, though I can't claim to know everything about the internals. In the 'no_o' case perl has to interpolate the variable and, as far as I understand, compile the regular expression each time the code runs. That extra work makes it 30-33% slower than the next variant. 'o' and 'no_o_pre' are very close, which is what I expected. In the 'o' case perl compiles the pattern once at run time and then checks the cache on every call. In the 'no_o_pre' case perl has to check each time that the thing on the right-hand side is an RE object. It's probably possible to bring the //o case very close to the static one by rewriting the op tree, but that would upset some deparse modules. The static case is the fastest, which is understandable.
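The "compile once" behaviour of //o has a flip side that matters as soon as the REs need to be dynamic: the pattern is frozen at its first use, even if the interpolated variable changes later. A small sketch, with the helper names `match_o` and `match_no_o` invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helpers: match $str against whatever pattern is passed in.
sub match_o    { my ($str, $re) = @_; return $str =~ /^$re$/o ? 1 : 0 }
sub match_no_o { my ($str, $re) = @_; return $str =~ /^$re$/  ? 1 : 0 }

# First call: both helpers see the \d+ pattern.
my @first  = (match_o('123', qr/\d+/),    match_no_o('123', qr/\d+/));

# Second call: a different pattern is passed in, but //o ignores it.
my @second = (match_o('abc', qr/[a-z]+/), match_no_o('abc', qr/[a-z]+/));

print "@first\n";    # prints "1 1"
print "@second\n";   # prints "0 1": match_o still uses the \d+ pattern
```

So //o only fits patterns that really are fixed for the lifetime of the program; for anything that changes, a precompiled qr// object (the 'no_o_pre' variant) is the safer choice.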
Should you use this? Yes. All the time? No. For example, suppose you write a parser for Apache logs: not a simple one, but a parser that takes log format strings and builds regular expressions for that particular format. In this case I would think twice about the design and the way REs are used.
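One possible design for such a parser, sketched under heavy assumptions: build the full RE once per format string with qr// and cache it, so the hot path uses a precompiled object like the 'no_o_pre' variant. The sub-patterns, the `%compiled` cache, and the helper `re_for_format` are all hypothetical and cover only a few directives.

```perl
use strict;
use warnings;

# Hypothetical sub-patterns for a few log format directives;
# a real parser would need many more, and stricter ones.
my %re_for = (
    '%h'  => qr/(\S+)/,           # remote host
    '%t'  => qr/\[([^\]]+)\]/,    # timestamp in brackets
    '%>s' => qr/(\d{3})/,         # status code
    '%b'  => qr/(\d+|-)/,         # response size
);

my %compiled;   # cache: format string => compiled regex

# Hypothetical helper: build the regex for a format once, then reuse it.
sub re_for_format {
    my ($format) = @_;
    return $compiled{$format} if $compiled{$format};

    my @parts;
    for my $directive (split ' ', $format) {
        my $part = $re_for{$directive}
            or die "unsupported directive: $directive";
        push @parts, $part;
    }
    my $pattern = join '\s+', @parts;
    return $compiled{$format} = qr/^$pattern$/;
}

my $re     = re_for_format('%h %t %>s %b');
my @fields = '10.0.0.1 [01/Jan/2024:00:00:00 +0000] 200 512' =~ $re;
print join('|', @fields), "\n";
# prints "10.0.0.1|01/Jan/2024:00:00:00 +0000|200|512"
```

The compilation cost is paid once per distinct format rather than once per log line, and no //o is involved, so a second format string gets its own correct pattern.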