of, relating to or resembling an enigma
something puzzling, perplexing, inexplicable
Native Perl has always been an idiomatic language. Some people embrace the idioms; others eschew them. Some people change their opinions the more they learn about Perl. Most people sit somewhere in the middle, happy to work with some idioms but finding others obfuscatory and counter-intuitive. If you decide to use an idiom you should probably understand exactly what it is doing. And if you decide not to use an idiom you'll probably want to understand it anyway so that you can rip it out of that awful code you are having to maintain.
Here we will take a look at some common and not so common Perl idioms, what they really do and why they work.
Of $|
perlvar says:
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel.
So when people want to see their output straight away for some reason or another they will often set this variable to be nonzero.
$| = 1;
is common. But so is
$|++;
Since $| defaults to zero this would seem to do the same thing. And even if this statement gets executed more than once the value will still be nonzero which is also OK for us. Well, unless the variable wraps around and gets back to zero, I suppose. But Perl variables don't normally do that unless you are running with use integer. We would normally expect the variable to be promoted to a floating point representation and its value to keep on increasing until such time as incrementing it left its value indistinguishable from the previous value.
But even if we can safely increment $| and not have to worry about it wrapping around, what if some joker had previously decided to force flushing by setting $| to -1, which is a perfectly valid nonzero value?
$| = -1;
Or by decrementing $| instead of incrementing?
$|--;
That's going to mess us up, right? The docs don't say much about all of this, so maybe some experimentation is in order. Maybe we should see if $| does wrap around. Let's assume a 32 bit machine here (because that's what I'm currently using), and so let's increment $| 2**32 times. If it wraps around we should get back to zero.
$ perl -le '$|++ for 1 .. 2**32; print $|'
Range iterator outside integer range at -e line 1.
Oops.
Let's be a little less ambitious.
$ perl -le '$|++ for 1 .. 2**31; print $|'
Range iterator outside integer range at -e line 1.
OK, so let's start simply.
$ perl -le '$|++ for 1 .. 5; print $|'
1
WHAT?
Is that code wrong?
$ perl -le '$x++ for 1 .. 5; print $x'
5
No, it seems OK. So something strange is going on here. Let's try something a little different:
$ perl -le 'print $|++ for 1 .. 5'
0
1
1
1
1
Weird. So we don't seem to be able to increment $| over one. Maybe we can set it directly?
$ perl -le '$| = 2; print $|'
1
Seems not. So it looks as though we don't have to worry about any kind of wraparound. Perl seems to be stopping us from shooting ourselves in the feet.
But what about that joker who explicitly sets $| to -1?
$ perl -le '$| = -1; print $|'
1
He's been neutered too. And the joker decrementing $| instead of incrementing?
$ perl -le 'print $|-- for 1 .. 5'
0
1
0
1
0
Well, it seems that he has discovered a little trick beloved of the Perl golfing community, and unless this behaviour is what he is looking for, his program will only work providing the decrement is only executed once.
So it seems that $|++ will work and is functionally equivalent to $| = 1 in all cases. So what is going on here? Well, it turns out that $| isn't really a variable, at least not in the sense that $x is. It's actually just a bit in an IO structure related to the currently selected output channel, and perl exposes it as a special variable through the use of magic. If you never wanted to know that, and don't want to explain it to anyone who asks why you used $|++, then you should probably use $| = 1 instead, which seems to be what most people do nowadays.
Having learnt all about $|, and having decided to use $| = 1 (or maybe not), you now need to make sure that you are setting it for the correct channel. $| works on the currently selected filehandle, so if you want to operate on another channel you will first need to select the appropriate filehandle, then set $|, then reselect your original filehandle.
The select function comes in two flavours. We are interested in the first one here (as documented in perlfunc); we'll get to the second one a little later. In its first form select sets the current default filehandle for output and returns the previously selected filehandle. (This isn't completely accurate, see perlfunc for the full details.)
So code to do what we are looking for might look something like:
my $oldh = select($h);
$| = 1;
select($oldh);
But you don't really want $oldh hanging around, so maybe enclose the code in its own scope.
{
    my $oldh = select($h);
    $| = 1;
    select($oldh);
}
That's a bit of a mouthful, so you might sometimes see it all on one line:
{ my $oldh = select($h); $| = 1; select($oldh); }
But it can also be done without using a temporary variable:
select((select($h), $| = 1)[0])
What is going on here? To decipher the code we need to work from the inside out. First, let's look at (select($h), $| = 1).
The parentheses around this code tell us that it is creating a list, and our list has two elements. The first element is select($h), that is, it is the result of setting the current filehandle to $h, that result being the previously selected filehandle. The second element of the list is $| = 1, that is, the result of setting $| to one. That result is one, of course.
So now we have a list of two elements, the first being the original current filehandle and the second being one, and while we have created that list we have changed the current filehandle to $h and set $| to one on that filehandle. That's half of what we wanted to do. Now we just need to restore the original filehandle. This can be done by calling select on the first element of our list.
We access the first element of the list as (select($h), $| = 1)[0], and call select on it as select((select($h), $| = 1)[0]).
You might notice that this idiom only works if perl creates the list from left to right. Although perl has always done this, and lots of code would break if it didn't, this behaviour is now guaranteed in the documentation and so should remain. (Actually the docs are in bleadperl, so it's possible they might never see a stable release.)
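As a rough sketch, the idiom can be wrapped in a small helper and checked from a script. The names set_autoflush and autoflush_flag are made up for this illustration; they are not part of Perl or of any module.

```perl
use strict;
use warnings;

# set_autoflush is a hypothetical helper name, used only for this sketch.
sub set_autoflush {
    my ($fh) = @_;
    # Inner select($fh) makes $fh current and returns the previous handle.
    # $| = 1 then turns on autoflush for $fh.
    # The outer select() reselects element [0], the previous handle.
    select((select($fh), $| = 1)[0]);
    return;
}

# Hypothetical helper: read a handle's autoflush flag without leaving
# that handle selected afterwards.
sub autoflush_flag {
    my ($fh) = @_;
    my $oldh = select($fh);
    my $flag = $|;
    select($oldh);
    return $flag;
}

my $selected_before = select();    # name of the current default handle
set_autoflush(\*STDERR);
my $selected_after  = select();
my $flag_after      = autoflush_flag(\*STDERR);

print "default handle: $selected_before -> $selected_after\n";
print "STDERR autoflush is now: $flag_after\n";
```

If the idiom behaves as described, the default handle should be the same before and after the call, and only the flag on the other handle should have changed.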
If you are heavily into OO, or you don't mind loading the extra 33KB of code (or maybe you have already paid that price) then you might also prefer the OO solution which probably wins in terms of clarity.
use IO::Handle;
$h->autoflush(1);
Everyone knows what while (<>) means, right? Well maybe not. Let's have a look at what B::Deparse makes of it.
$ perl -MO=Deparse -e 'while (<>) {}'
while (defined($_ = <ARGV>)) {
    ();
}
-e syntax OK
Interesting. Quite a few things have changed there, indicating that perl has used a few defaults that we haven't specified. In particular, our <> has changed to defined($_ = <ARGV>).
perlvar explains that <ARGV> is pretty much the same as <>. So that's alright then. Then we have the explicit assignment to $_. I suppose that most people knew that that was happening too. But what about the defined stuff?
Well, before perl-5.005 that defined wouldn't have been there. (Actually, before perl-5.005 there was no B::Deparse, but that's not the point here.) But it was noted that many (most?) people omitted the defined and that this was rarely, if ever, the correct thing to do. It led to problems when reading a file whose last line was a single 0 with no trailing newline, for example, which would cause the loop to exit before processing that line. There were also similar problems reading a directory which contained a file named 0. So now perl helps you to do the right thing, and if you want the old behaviour you'll need to code for it explicitly.
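The difference between testing for truth and testing for definedness can be sketched with an in-memory file. This is only an illustration: the sub names are invented here, and the data is chosen so that the last line is the string "0" with no trailing newline, which is a false value in Perl.

```perl
use strict;
use warnings;

my $data = "1\n0";    # the last line is "0" with no trailing newline

# Hypothetical helper: loop on the truth of each line, which is what a
# loop without the defined() test effectively does.
sub read_while_true {
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    my @lines;
    while (1) {
        my $line = <$fh>;
        last unless $line;    # "0" is false, so the loop stops here
        push @lines, $line;
    }
    return @lines;
}

# Hypothetical helper: loop on definedness, which is the test perl
# supplies implicitly for while (<>).
sub read_while_defined {
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    my @lines;
    while (defined(my $line = <$fh>)) {
        push @lines, $line;
    }
    return @lines;
}

my @true_lines    = read_while_true();
my @defined_lines = read_while_defined();
printf "truth test read %d line(s), defined test read %d line(s)\n",
    scalar(@true_lines), scalar(@defined_lines);
```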
Sometimes a function returns a list, and you don't really care what is in the
list, only how many elements are in the list. How do you find that out?
Let's take /.../g as an example. The ... there is really perl-6's yadda yadda yadda operator, but in this case it actually works fine as a regular expression too. So let's call /.../g when $_ contains "1234567890". We want to determine how many times the regular expression matches. (This is the wrong way to determine a third of the length of a string, but it will do for our example.)
One method is to read the result into a temporary array and then get the size of that array.
my @tmp = /.../g;
my $count = @tmp;
That gives the correct answer, three, but it's kludgey and wasteful. Surely Perl can do better than that? How about just getting rid of the temporary array?
my $count = /.../g;
Oops. That gives the answer one. What has gone wrong? The big difference here is that we have called /.../g in scalar context instead of list context, and as perlop tells us, in scalar context this will return true or false, depending on whether there were any matches. In our case there were matches, so we got back one, which is a true value.
So we need to be sure we call the operator in list context. Easy!
my ($count) = /.../g;
Wrong! The value returned here is 123, the first match, not the count. We have a problem in that the operator doesn't actually return the count at all, only a true or false value in scalar context or the list of matches in list context.
But perl has a trick up its sleeve. perldata says:
List assignment in scalar context returns the number of elements produced by the expression on the right side of the assignment:
This is what allows us to write
while (my ($key, $val) = each %hash) { ... }
and have the loop stop when the hash entries are exhausted. So we are looking for list assignment in scalar context. The scalar context part is easy, we'll just say my $count = .... Now we just need to assign to a list.
One simple way is to assign to our temporary array again.
my $count = my @tmp = /.../g;
That works and we have our answer, three, but it hasn't got rid of the temporary array. We can assign to a list to save assigning to an array. But what should the list look like?
my $count = (my $dummy) = /.../g;
That works too, but isn't really very pleasing, exchanging one temporary variable for another. Perl has another trick to help us here. Instead of creating dummy variables we can assign to undef.
my $count = (undef) = /.../g;
Still working, and looking better. But that undef still looks ugly there. Do we even need it? Can we assign to an empty list?
my $count = () = /.../g;
Yes we can! And here is our solution. Some people will go one step further and change that to
my $count =()= /.../g;
Making =()= into a pseudo-operator, complete with a colourful name.
Don't go there.
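Here is a small sketch that puts the list forms side by side. It matches against a named variable rather than $_, and letters() is a made-up sub standing in for any function that returns a list. The scalar-context match is left out on purpose, because a scalar /g match moves pos() on the string and would change where the later matches start.

```perl
use strict;
use warnings;

my $str = "1234567890";

# List assignment to a single scalar keeps only the first match.
my ($first) = $str =~ /.../g;

# Assigning to an empty list and then taking that assignment in scalar
# context gives the number of elements on the right-hand side.
my $count = () = $str =~ /.../g;

# letters() is a hypothetical function that returns a list.
sub letters { return ('a' .. 'd') }
my $how_many = () = letters();

print "first=$first count=$count how_many=$how_many\n";
```

One caveat worth knowing: split treats an assignment to a list specially and limits the number of fields to one more than the number of variables on the left, so counting the fields of a split this way does not give the answer you might expect.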
There aren't too many times when you should be using local in your programs, and whenever you are you should know why you are using it instead of my. One time when you should be using local is when you want to temporarily change one of the special variables. (The word temporarily is a big clue here.)
perlvar explains that $/ is the input record separator. That means that when you read from a filehandle in standard line based mode, $/ will determine what constitutes a line. Normally it will be set to whatever line ending is usual for your system but you can alter that. The value is a string, not a regular expression ("awk has to be better for something") and there are a couple of magic values. Setting it to undef reads the rest of the file (slurp mode), and setting it to "" sets paragraph mode where one or more blank lines are used as the delimiter.
local $/ temporarily sets the value of $/, but to what? Not to its previous value, as some people expect, but to undef. So local $/ temporarily sets slurp mode.
But there's more! $/ can also be used to read in records. Most common operating systems don't have this feature, but this mode can still be used to read in a set amount of data as a line. For example, to read lines of 16 characters each, set $/ = \16. The details are to be found in perlvar.
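The different settings of $/ can be compared with a short sketch that reads the same text four ways from an in-memory filehandle. The helper name read_records and the sample text are assumptions made for this example only.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n\npara two\n";

# read_records is a hypothetical helper: it reads every record from an
# in-memory copy of $text using the separator it is given.
sub read_records {
    my ($separator) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;
    close($fh);
    return @records;
}

my @lines      = read_records("\n");     # ordinary line mode
my @slurped    = read_records(undef);    # slurp mode
my @paragraphs = read_records("");       # paragraph mode
my @fixed      = read_records(\10);      # fixed-size records of 10

printf "lines=%d slurped=%d paragraphs=%d fixed=%d\n",
    scalar(@lines), scalar(@slurped), scalar(@paragraphs), scalar(@fixed);
```

Because $/ is set with local inside the sub, the caller's value of $/ is untouched once each call returns.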
Most people know that for and foreach are synonyms. Many people like to use foreach to iterate over a list, but of course, for will do just as well for that. So what about a list with only one element? What's the point in iterating over that?
A side effect of iterating over a list is that $_ becomes an alias to each element of the list. Changing $_ changes the actual list element. So if you want to perform a number of operations on a variable which can be performed on $_, for ($val) { } is an easy way to set up $_. It also has the advantage of clearly showing the scope of the work, and of restoring $_ to its previous value at the end of that scope.
$_ = 1;
my $x = "xyz";
for ($x) {
    s/x/a/;
    s/y/--/;
    s/z/length/e;
}
print "$_:$x";
This prints out 1:a--4.
Perl's command line options are great. Using them appropriately can save you many lines of code and can make the difference between a one-liner and a short script. The -i option in particular can be extremely useful to ensure you have a backup of your files in case your carefully crafted code splats all over them.
In fact, it is so useful that sometimes you want to use it even though you are not writing a one-liner. But how can you do that? Let's assume we want to do the equivalent of:
$ perl -pi.bak -e 's/Pb/Au/' *
One option is to chuck the line above into a system call. That would work, and we could probably even call it pragmatic, but we might not want to hold it up as a shining example of Perl mastery.
We need to be able to specify two things that would normally be found on the command line: the backup extension and the files on which we will operate. Code to do that might look something like:
...
{
    local $^I = ".bak";
    local @ARGV = glob "*";
    while (<>) {
        s/Pb/Au/;
        print;
    }
}
...
Or you might prefer something a little shorter:
...
{
    local ($^I, @ARGV) = (".bak", glob "*");
    s/Pb/Au/, print while <>;
}
...
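To try this out without risking real files, here is a self-contained sketch that builds a scratch file in a temporary directory, edits it in place and then reads back both the edited file and the backup. The file name, its contents and the slurp helper are invented for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $file     = "$dir/metals.txt";      # hypothetical scratch file
my $original = "Pb ingot\nPb pipe\n";

open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} $original;
close($out) or die "Cannot close $file: $!";

{
    local $^I   = ".bak";      # same effect as -i.bak on the command line
    local @ARGV = ($file);     # the files that <> will read and rewrite
    while (<>) {
        s/Pb/Au/;
        print;                 # goes to the rewritten file, not the terminal
    }
}

# Hypothetical helper: return the whole contents of a file as one string.
sub slurp {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    local $/;
    my $contents = <$in>;
    return $contents;
}

my $edited = slurp($file);
my $backup = slurp("$file.bak");
print "edited:\n$edited", "backup:\n$backup";
```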
Sometimes your program just needs to wait for a bit. In the olden days you might put an empty loop into your code. The equivalent of:
for (1 .. 1_000_000) { }
But of course this was never particularly accurate. Did you ever try to play a game written for a 286 on a Pentium? Fortunately, perl has a sleep function which will cause your program to wait for a certain number of seconds:
sleep(3);
But what if you want to sleep for half a second? sleep only works with an integral number of seconds, and you would be ill advised to rely on the duration being too accurate. So you should turn to CPAN and specifically to Time::HiRes, which will provide an interface to time and timers to the accuracy provided by your system.
But Time::HiRes didn't always exist. You might also see an (ab)use of select. Earlier we mentioned that there are two flavours of select in Perl. We looked at one, which was used to set the default output filehandle. Now we can investigate the second a little.
This second form of select is a layer over the select(2) system call which is used to monitor file descriptors and wait until one of them becomes available for some type of IO. We don't need to go into detail about what happens exactly, or how the call is used. What is important to us is that the call waits with an accuracy of milliseconds, and we can use this to effect our half second sleep with a call of:
select(undef, undef, undef, 0.5);
Since we are not actually worried about any events on any file descriptors the first three arguments are undef. The fourth argument specifies the wait period in seconds.
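The two approaches can be timed side by side with a short sketch, assuming Time::HiRes is installed. The quarter second delay is an arbitrary choice for the example, and the measured times will vary with the system and its load.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);    # replaces time and sleep with hi-res versions

my $delay = 0.25;                  # seconds; arbitrary value for this sketch

my $start = time();
select(undef, undef, undef, $delay);    # four-argument select used as a timer
my $select_elapsed = time() - $start;

$start = time();
sleep($delay);                          # Time::HiRes::sleep accepts fractions
my $sleep_elapsed = time() - $start;

printf "select waited %.3fs, sleep waited %.3fs\n",
    $select_elapsed, $sleep_elapsed;
```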
If you hang around Perl for long enough you'll see all sorts of code.
Some people can make Perl look like C. Or more likely Java, nowadays. Some people can make Perl look like Bourne shell. A few people can make Perl look like OCaml or Haskell. It's really not that easy to make Perl look like Prolog.
Some people can make Perl look like a mess.
And some people can make Perl look beautiful.