Enigmatic Perl

Enigmatic: of, relating to or resembling an enigma
Enigma: something puzzling, perplexing, inexplicable

Introduction

Native Perl has always been an idiomatic language. Some people embrace the idioms; others eschew them. Some people change their opinions the more they learn about Perl. Most people sit somewhere in the middle, happy to work with some idioms but finding others obfuscatory and counter-intuitive. If you decide to use an idiom you should probably understand exactly what it is doing. And if you decide not to use an idiom you'll probably want to understand it anyway so that you can rip it out of that awful code you are having to maintain.

Here we will take a look at some common and not so common Perl idioms, what they really do and why they work.

$|++

Of $| perlvar says:

  If set to nonzero, forces a flush right away and after every write or print
  on the currently selected output channel.

So when people want to see their output straight away for some reason or another they will often set this variable to be nonzero.

  $| = 1;

is common. But so is

  $|++;

Since $| defaults to zero this would seem to do the same thing. And even if this statement gets executed more than once the value will still be nonzero which is also OK for us. Well, unless the variable wraps around and gets back to zero, I suppose. But Perl variables don't normally do that unless you are running with use integer. We would normally expect the variable to be promoted to a floating point representation and its value to keep on increasing until such time as incrementing it left its value indistinguishable from the previous value.

But even if we can safely increment $| and not have to worry about it wrapping around, what if some joker had previously decided to force flushing by setting $| to -1, which is a perfectly valid nonzero value?

  $| = -1;

Or by decrementing $| instead of incrementing?

  $|--;

That's going to mess us up, right? The docs don't say much about all of this, so maybe some experimentation is in order. Maybe we should see if $| does wrap around. Let's assume a 32-bit machine here (because that's what I'm currently using), and so let's increment $| 2**32 times. If it wraps around we should get back to zero.

  $ perl -le '$|++ for 1 .. 2**32; print $|'
  Range iterator outside integer range at -e line 1.

Oops.

Let's be a little less ambitious.

  $ perl -le '$|++ for 1 .. 2**31; print $|'
  Range iterator outside integer range at -e line 1.

OK, so let's start simply.

  $ perl -le '$|++ for 1 .. 5; print $|'
  1

WHAT?

Is that code wrong?

  $ perl -le '$x++ for 1 .. 5; print $x'
  5

No, it seems OK. So something strange is going on here. Let's try something a little different:

  $ perl -le 'print $|++ for 1 .. 5'
  0
  1
  1
  1
  1

Weird. So we don't seem to be able to increment $| over one. Maybe we can set it directly?

  $ perl -le '$| = 2; print $|'
  1

Seems not. So it looks as though we don't have to worry about any kind of wraparound. Perl seems to be stopping us from shooting ourselves in the feet.

But what about that joker who explicitly sets $| to -1 ?

  $ perl -le '$| = -1; print $|'
  1

He's been neutered too. And the joker decrementing $| instead of incrementing?

  $ perl -le 'print $|-- for 1 .. 5'
  0
  1
  0
  1
  0

Well, it seems that he has discovered a little trick beloved of the Perl golfing community: each decrement toggles $| between zero and one. Unless this behaviour is what he is looking for, his program will only work provided the decrement is executed an odd number of times, which in practice means once.

So it seems that $|++ will work and is functionally equivalent to $| = 1 in all cases. So what is going on here? Well, it turns out that $| isn't really a variable, at least not in the sense that $x is. It's actually just a bit in an IO structure related to the currently selected output channel, and perl exposes it as a special variable through the use of magic.
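The experiments above can be gathered into one small script. This is only a sketch of the observed behaviour, not a statement about perl's internals; the variable names (@readback, @toggle, $stdout_flag) are made up for the illustration.

```perl
use strict;
use warnings;

# Whatever nonzero value we assign, $| reads back as 1.
my @readback;
for my $value (2, -1, 0, 42) {
    $| = $value;
    push @readback, 0 + $|;    # 0 + ... takes a plain numeric copy
}

# Post-decrement returns the old value and then flips the flag.
$| = 0;
my @toggle;
push @toggle, 0 + $|-- for 1 .. 4;

# The flag belongs to a filehandle, not to the program as a whole.
$| = 1;                           # STDOUT (the selected handle) now autoflushes
my $previous = select(STDERR);
$| = 0;                           # this only touches STDERR
select($previous);
my $stdout_flag = 0 + $|;         # STDOUT's flag is untouched

print "readback: @readback\n";
print "toggle:   @toggle\n";
print "stdout:   $stdout_flag\n";
```

The last part is the reason the next idiom exists: setting $| only ever affects the handle that is selected at the time.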

If you never wanted to know that, and don't want to explain it to anyone who asks why you used $|++, then you should probably use $| = 1 instead, which seems to be what most people do nowadays.

select((select($h), $| = 1)[0])

Having learnt all about $|, and having decided to use $| = 1 (or maybe not), you now need to make sure that you are setting it for the correct channel. $| works on the currently selected filehandle, so if you want to operate on another channel you will first need to select the appropriate filehandle, then set $|, then reselect your original filehandle.

The select function comes in two flavours. We are interested in the first one here (as documented in perlfunc); we'll get to the second one a little later. In its first form select sets the current default filehandle for output and returns the previously selected filehandle. (This isn't completely accurate, see perlfunc for the full details.)

So code to do what we are looking for might look something like:

  my $oldh = select($h);
  $| = 1;
  select($oldh);

But you don't really want $oldh hanging around, so maybe enclose the code in its own scope.

  {
      my $oldh = select($h);
      $| = 1;
      select($oldh);
  }

That's a bit of a mouthful, so you might sometimes see it all on one line:

  { my $oldh = select($h); $| = 1; select($oldh); }

But it can also be done without using a temporary variable:

  select((select($h), $| = 1)[0])

What is going on here? To decipher the code we need to work from the inside out. First, let's look at (select($h), $| = 1).

The parentheses around this code tell us that it is creating a list, and our list has two elements. The first element is select($h), that is it is the result of setting the current filehandle to $h, that result being the previously selected filehandle. The second element of the list is $| = 1, that is the result of setting $| to one. That result is one, of course.

So now we have a list of two elements, the first being the original current filehandle and the second being one, and while we have created that list we have changed the current filehandle to $h and set $| to one on that filehandle. That's half of what we wanted to do. Now we just need to restore the original filehandle. This can be done by calling select on the first element of our list.

We access the first element of the list as (select($h), $| = 1)[0], and call select on it as select((select($h), $| = 1)[0]).

You might notice that this idiom only works if perl creates the list from left to right. Although perl has always done this, and lots of code would break if it didn't, this behaviour is now guaranteed in the documentation and so should remain. (Actually the docs are in bleadperl, so it's possible they might never see a stable release.)

If you are heavily into OO, or you don't mind loading the extra 33KB of code (or maybe you have already paid that price) then you might also prefer the OO solution which probably wins in terms of clarity.

  use IO::Handle;
  $h->autoflush(1);
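To check that the one-liner and the OO call really do the same job, here is a small sketch using in-memory filehandles. The helper names set_autoflush and autoflush_flag are invented for this example, and the handles $h and $h2 are just scratch handles.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: turn on autoflush for $h without disturbing
# whichever handle is currently selected.
sub set_autoflush {
    my ($h) = @_;
    select((select($h), $| = 1)[0]);
    return;
}

# Hypothetical helper: read a handle's autoflush flag the long way,
# using a temporary variable to restore the selected handle.
sub autoflush_flag {
    my ($h) = @_;
    my $oldh = select($h);
    my $flag = $| ? 1 : 0;
    select($oldh);
    return $flag;
}

open(my $h,  '>', \my $buffer)  or die "Cannot open in-memory handle: $!";
open(my $h2, '>', \my $buffer2) or die "Cannot open in-memory handle: $!";

my $selected_before = select();          # no argument: just report
my $flag_before     = autoflush_flag($h);

set_autoflush($h);                       # the idiom
$h2->autoflush(1);                       # the OO equivalent

my $flag_after     = autoflush_flag($h);
my $flag_oo        = autoflush_flag($h2);
my $selected_after = select();

print "before=$flag_before after=$flag_after oo=$flag_oo\n";
```

Both handles end up with the flag set, and the selected handle is the same before and after.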

while (<>)

Everyone knows what while (<>) means, right? Well maybe not. Let's have a look at what B::Deparse makes of it.

  $ perl -MO=Deparse -e 'while (<>) {}'
  while (defined($_ = <ARGV>)) {
      ();
  }
  -e syntax OK

Interesting. Quite a few things have changed there, indicating that perl has used a few defaults that we haven't specified. In particular, our <> has changed to defined($_ = <ARGV>).

perlvar explains that <ARGV> is pretty much the same as <>. So that's alright then. Then we have the explicit assignment to $_. I suppose that most people knew that that was happening too. But what about the defined stuff?

Well, before perl-5.005 that defined wouldn't have been there. (Actually, before perl-5.005 there was no B::Deparse, but that's not the point here.) But it was noted that many (most?) people omitted the defined and that this was rarely, if ever, the correct thing to do. It led to problems when reading a file whose last line was a single 0 with no trailing newline, for example, which would cause the loop to exit before that line was processed. There were also similar problems reading a directory which contained a file named 0. So now perl helps you to do the right thing, and if you wanted that old behaviour you'll need to explicitly code for it.
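The difference only shows up when a read returns something that is defined but false. A minimal sketch, reading from an in-memory file whose final line is a bare 0 with no newline; the two helper subs are invented for the illustration, and the first one deliberately spells out the plain truth test by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines with a plain truth test, which is
# what a loop without the implicit defined() amounts to.
sub count_by_truth {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (1) {
        my $line = <$fh>;
        last unless $line;    # stops on undef, but also on "0"
        $count++;
    }
    return $count;
}

# Hypothetical helper: count lines with an explicit defined() test.
sub count_by_defined {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    return $count;
}

my $text       = "3\n2\n1\n0";    # last line is "0", no trailing newline
my $by_truth   = count_by_truth($text);
my $by_defined = count_by_defined($text);

print "truth test saw $by_truth lines, defined test saw $by_defined\n";
```

Note that a line of "0\n" is a true value, so the truth test only trips over the final, unterminated 0.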

$count = () = /.../g

Sometimes a function returns a list, and you don't really care what is in the list, only how many elements are in the list. How do you find that out? Let's take /.../g as an example. The ... there is really perl-6's yadda yadda yadda operator, but in this case it actually works fine as a regular expression too. So let's call /.../g when $_ contains "1234567890". We want to determine how many times the regular expression matches. (This is the wrong way to determine a third of the length of a string, but it will do for our example.)

One method is to read the result into a temporary array and then get the size of that array.

  my @tmp = /.../g;
  my $count = @tmp;

That gives the correct answer, three, but it's kludgey and wasteful. Surely Perl can do better than that? How about just getting rid of the temporary array?

  my $count = /.../g;

Oops. That gives the answer one. What has gone wrong? The big difference here is that we have called /.../g in scalar context instead of list context, and as perlop tells us, in scalar context this will return true or false, depending on whether there were any matches. In our case there were matches, so we got back one, which is a true value.

So we need to be sure we call the operator in list context. Easy!

  my ($count) = /.../g;

Wrong! The value returned here is 123, the first match, not the count. We have a problem in that the operator doesn't actually return the count at all, only a true or false value in scalar context or the list of matches in list context.

But perl has a trick up its sleeve. perldata says:

  List assignment in scalar context returns the number of elements produced
  by the expression on the right side of the assignment:

This is what allows us to write

  while (my ($key, $val) = each %hash) { ... }

and have the loop stop when the hash entries are exhausted. So we are looking for list assignment in scalar context. The scalar context part is easy, we'll just say my $count = ... . Now we just need to assign to a list. One simple way is to assign to our temporary array again.

  my $count = my @tmp = /.../g;

That works and we have our answer, three, but it hasn't got rid of the temporary array. We can assign to a list to save assigning to an array. But what should the list look like?

  my $count = (my $dummy) = /.../g;

That works too, but isn't really very pleasing, exchanging one temporary variable for another. Perl has another trick to help us here. Instead of creating dummy variables we can assign to undef.

  my $count = (undef) = /.../g;

Still working, and looking better. But that undef still looks ugly there. Do we even need it? Can we assign to an empty list?

  my $count = () = /.../g;

Yes we can! And here is our solution. Some people will go one step further and change that to

  my $count =()= /.../g;

Making =()= into a pseudo-operator, complete with a colourful name. Don't go there.
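Putting the pieces side by side, here is a short sketch of the three contexts discussed above. The variable names and the primes sub are made up for the example; the match is bound to $string with =~ rather than relying on $_.

```perl
use strict;
use warnings;

my $string = "1234567890";

# List assignment to one scalar: we get the first match, not a count.
my ($first) = $string =~ /.../g;

# List assignment to an empty list, evaluated in scalar context:
# the number of elements on the right-hand side.
my $count = () = $string =~ /.../g;

# Plain scalar context: only true or false. A copy is used so that the
# position left behind by a scalar //g match cannot affect anything else.
my $copy    = $string;
my $matched = $copy =~ /.../g;

# The same trick works for any expression that produces a list.
sub primes { return (2, 3, 5, 7) }
my $how_many = () = primes();

print "first=$first count=$count how_many=$how_many\n";
```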

local $/

There aren't too many times when you should be using local in your programs, and whenever you are you should know why you are using it instead of my. One time when you should be using local is when you want to temporarily change one of the special variables. (The word temporarily is a big clue here.)

perlvar explains that $/ is the input record separator. That means that when you read from a filehandle in standard line based mode, $/ will determine what constitutes a line. Normally it will be set to whatever line ending is usual for your system but you can alter that. The value is a string, not a regular expression ("awk has to be better for something") and there are a couple of magic values. Setting it to undef reads the rest of the file (slurp mode), and setting it to "" sets paragraph mode where one or more blank lines are used as the delimiter.

local $/ temporarily sets the value of $/, but to what? Not to its previous value, as some people expect, but to undef. So local $/ temporarily sets slurp mode.

But there's more! $/ can also be used to read in fixed-size records. Most common operating systems don't have record-oriented files, but this mode can still be used to read in a set amount of data as a line. For example, to read lines of 16 characters each, set $/ to a reference to that number: $/ = \16. The details are to be found in perlvar.
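A short sketch of slurp mode and record mode, each confined to a do block so that $/ goes back to normal afterwards. The data and variable names are invented, and the handles read from in-memory strings rather than real files.

```perl
use strict;
use warnings;

my $data = "line one\nline two\nline three\n";

# Slurp mode: local $/ leaves $/ undefined until the block ends, so a
# single scalar read returns everything that is left in the file.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $contents = do { local $/; <$fh> };
close($fh);

# Outside the block $/ has its usual value again.
my $separator_restored = defined $/ && $/ eq "\n";

# Record mode: a reference to an integer gives records of that size,
# with a shorter final record if the data runs out.
my $bytes = "abcdefghij";
open(my $rec, '<', \$bytes) or die "Cannot open in-memory file: $!";
my @records = do { local $/ = \4; <$rec> };
close($rec);

print "slurped ", length($contents), " characters\n";
print "records: @records\n";
```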

for ($val) { }

Most people know that for and foreach are synonyms. Many people like to use foreach to iterate over a list, but of course, for will do just as well for that. So what about a list with only one element? What's the point in iterating over that?

A side effect of iterating over a list is that $_ becomes an alias to each element of the list. Changing $_ changes the actual list element. So if you want to perform a number of operations on a variable which can be performed on $_, for ($val) { } is an easy way to set up $_. It also has the advantage of clearly showing the scope of the work, and of restoring $_ to its previous value at the end of that scope.

  $_ = 1;
  my $x = "xyz";
  for ($x)
  {
      s/x/a/;
      s/y/--/;
      s/z/length/e;
  }
  print "$_:$x";

This prints out 1:a--4.
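The same aliasing applies when the list has more than one element. A small sketch, with made-up data, tidying a single string and then every element of an array while leaving the outer $_ alone:

```perl
use strict;
use warnings;

$_ = "outer";

# One variable: $_ is an alias for $name inside the block.
my $name = "  larry wall  ";
for ($name) {
    s/^\s+//;            # strip leading whitespace
    s/\s+$//;            # strip trailing whitespace
    s/(\w+)/\u$1/g;      # capitalise each word
}

# Several variables: each array element is modified in place.
my @words = ("a ", " b", " c ");
s/\s+//g for @words;

print "$_:$name:@words\n";
```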

local $^I = ""

Perl's command line options are great. Using them appropriately can save you many lines of code and can make the difference between a one-liner and a short script. The -i option in particular can be extremely useful to ensure you have a backup of your files in case your carefully crafted code splats all over them.

In fact, it is so useful that sometimes you want to use it even though you are not writing a one-liner. But how can you do that? Let's assume we want to do the equivalent of:

  $ perl -pi.bak -e 's/Pb/Au/' *

One option is to chuck the line above into a system call. That would work, and we could probably even call it pragmatic, but we might not want to hold it up as a shining example of Perl mastery.

We need to be able to specify two things that would normally be found on the command line: the backup extension and the files on which we will operate. Code to do that might look something like:

  ...
  {
      local $^I   = ".bak";
      local @ARGV = glob "*";
      while (<>)
      {
          s/Pb/Au/;
          print;
      }
  }
  ...

Or you might prefer something a little shorter:

  ...
  {
      local ($^I, @ARGV) = (".bak", glob "*");
      s/Pb/Au/, print while <>;
  }
  ...
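If you want to try this without risking your own files, here is a self-contained sketch that works on a scratch file in a temporary directory. The file name metals.txt, its contents and the read_file helper are all invented for the example; only the block in the middle is the idiom itself.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the whole contents of a file.
sub read_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    local $/;
    my $contents = <$in>;
    close($in);
    return $contents;
}

# Create a scratch file to edit; the directory is removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "metals.txt");
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} "Pb ingot\nSn ingot\n";
close($out);

# The idiom: in-place edit with a .bak backup, limited to this block.
{
    local $^I   = ".bak";
    local @ARGV = ($file);
    while (<>) {
        s/Pb/Au/;
        print;           # goes to the file being rewritten, not to STDOUT
    }
}

my $edited = read_file($file);
my $backup = read_file("$file.bak");

print "edited:\n$edited", "backup:\n$backup";
```

Once the last file has been read, print goes back to STDOUT, which is why the final print appears on the terminal rather than in the file.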

select undef, undef, undef, 0.5;

Sometimes your program just needs to wait for a bit. In the olden days you might put an empty loop into your code. The equivalent of:

  for (1 .. 1_000_000) { }

But of course this was never particularly accurate. Did you ever try to play a game written for a 286 on a Pentium? Fortunately, perl has a sleep function which will cause your program to wait for a certain number of seconds:

  sleep(3);

But what if you want to sleep for half a second? sleep only works with an integral number of seconds, and you would be ill advised to rely on the duration being too accurate. So you should turn to Time::HiRes, which is on CPAN and has been bundled with perl itself since 5.8, and which provides an interface to time and timers to the accuracy provided by your system.

But Time::HiRes didn't always exist. You might also see an (ab)use of select. Earlier we mentioned that there are two flavours of select in Perl. We looked at one, which was used to set the default output filehandle. Now we can investigate the second a little.

This second form of select is a layer over the select(2) system call which is used to monitor file descriptors and wait until one of them becomes available for some type of IO. We don't need to go into detail about what happens exactly, or how the call is used. What is important to us is that the call takes a timeout with sub-second resolution, and we can use this to effect our half-second sleep with a call of:

  select(undef, undef, undef, 0.5);

Since we are not actually worried about any events on any file descriptors the first three arguments are undef. The fourth argument specifies the wait period in seconds.
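As a rough check, the two approaches can be timed against each other. This is only a sketch: the quarter-second delay and the variable names are arbitrary, and the exact figures printed will vary with your system and its load.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Four-argument select used purely as a timer.
my $start = Time::HiRes::time();
select(undef, undef, undef, 0.25);
my $elapsed_select = Time::HiRes::time() - $start;

# The same delay using Time::HiRes::sleep, which accepts fractions.
$start = Time::HiRes::time();
Time::HiRes::sleep(0.25);
my $elapsed_hires = Time::HiRes::time() - $start;

printf "select waited %.3fs, Time::HiRes::sleep waited %.3fs\n",
    $elapsed_select, $elapsed_hires;
```

Time::HiRes is loaded with an empty import list here so that the built-in sleep and time are left alone and it is obvious which version each call uses.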

Conclusion

If you hang around Perl for long enough you'll see all sorts of code.

Some people can make Perl look like C. Or more likely Java, nowadays. Some people can make Perl look like Bourne shell. A few people can make Perl look like OCaml or Haskell. It's really not that easy to make Perl look like Prolog.

Some people can make Perl look like a mess.

And some people can make Perl look beautiful.