Writing better Perl


21.01.2009 | Categories: Perl


From recursion to iteration

I already talked briefly in a recent blog post about iterators, and about how they can be used to convert a recursive program to an iterative one.

Lets see a practical application: find all dependencies of a binary package, including its dependencies.

My example will be based on Ubuntu (and to keep it simple to just the main pocket), but this should work on any other Debian or Debian based system (and the principle itself will be valid for any other system which handle package dependencies).

The simplest way to implement this would be through recursion: we should first code a find_package function that, given a package as input, produces all the dependancies, and then recurse through all the found dependencies to produce further results.

The following program does exactly that (I hardcoded my binary cache in there, if you want to check this out you would obviously have to adapt it to your system):

#!/usr/bin/perl

use strict;
use warnings;

#prototype declaration
sub find_depends($);

#list of already processed depends
my %seen;

#hardcoded cache, change to suit your configuration
my $cache = '/var/lib/apt/lists/it.archive.ubuntu.com_ubuntu_dists_hardy_main_binary-amd64_Packages';

open my $fh, "<$cache" or die "Cannot open cache $cache: $!\n";

find_depends($ARGV[0]);

sub find_depends($) {

    my $package = shift;

    #stop conditions
    return if (!$package or $seen{$package});

    #mark package as seen
    $seen{$package} = 1;

    #set record separator
    local $/='';

    #reset cache pointer
    seek($fh, 0, 0);

    while(<$fh>) {

	#check if package record is found
	if (/^Package:\s$package$/m) {

	    #extract depends line
	    my $depends = (/^Depends:\s(.+)$/m) && $1;
	    print "Package: $package\n  Depends: $depends\n";

	    #extract and process dependencies
	    foreach ( split(',|\|',$depends) ) {
		s/\s+//g;
		s/\([^)]+\)//;
		find_depends($_);
	    }

	    #stop processing
	    last;
	}
    }

    return;
}

This function is pretty simple, for each record in the cache, it checks if this is the record we are looking for; if it is it then proceeds to excerpt the dependencies (note that for simplicity this only handle Depends) and then for each of these dependencies it calls itself with that dependancy as input.

Well, it does the job, but, beside everything else, it doesn't produce its output in the way I like it.

If, for instance, you check the defoma package (mind that this could be different in your case), you will see that it has the following depends:

file, perl (>= 5.6.0-16), whiptail | dialog

The program is then going to check file, which has the following depends:

libc6 (>= 2.6.1-1), libmagic1, libmagic1 (= 4.21-3)

Then it is going to check libc6, and so on.
As you can see, it works depth first, which is all you can do with recursion.

However, I'd rather prefer to see first all the depends of the first level's depends and then all of those of the second and so on.
In other words, I'm interested in a breadth first search.

This could be done by transforming this version into an iterative one.

Mark Dominus talks about iterators and using them to transform recursion extensively in HOP Chapter 4 and 5, so I will only recap the main ideas here.

First of all, lets imagine having an array which contains already all the packages you are looking for.
Easy does it, you just scan it and print the results.

Obviously you don't have such an array, however, you can construct it iteratively.

Lets take the defoma package again as an example.
Our array will initially just contain this package. We then check its dependancies, since the package is already processed we take it out from the array, and we put into it the dependancies we have found: file, perl, whiptail, dialog.

Now, lets iterate again this array (that we could also call stack, or queue). We find the package file, so we take it out, and we fill it, from the end, with its dependancies.

The array should now look like this: perl, whiptail, dialog, libc6, libmagic1, libmagic1 (never mind the double entry for now).

Again, lets iterate though the array, we take out perl, and we fill the array, from its end, with its dependencies.

As you can see, this will also do the job. In practice we have used our own version of the stack (which perl uses to handle recursion) to keep track of which packages (or function calls) to track.

The following listing will implement this approach:

#!/usr/bin/perl

use strict;
use warnings;

#prototype declaration
sub find_depends($);

#list of already processed depends
my %seen;

#hardcoded cache, change to suit your configuration
my $cache = '/var/lib/apt/lists/it.archive.ubuntu.com_ubuntu_dists_hardy_main_binary-amd64_Packages';

open my $fh, "<$cache" or die "Cannot open cache $cache: $!\n";

*deps = find_depends($ARGV[0]);

while (defined( deps() )) {}

sub find_depends($) {

    my @queue = shift;

    return sub {

	#stop condition
	return unless (@queue);

	# return if package has been processed already
	my $package = shift @queue;
	return $package if $seen{$package};

	#mark package as seen
	$seen{$package} = 1;

	#set record separator
	local $/='';

	#reset cache pointer
	seek($fh, 0, 0);

	while(<$fh>) {

	    #check if package record is found
	    if (/^Package:\s$package$/m) {

		#extract depends line
		my $depends = (/^Depends:\s(.+)$/m) && $1;
		print "Package: $package\n  Depends: $depends\n";

		#extract and process dependencies
		foreach ( split(',|\|',$depends) ) {
		    s/\s+//g;
		    s/\([^)]+\)//;
		    push(@queue, $_);
		}

		return $package;
	    }
	}
    }
}

find_depends is now a function factory. The function it returns (the iterator deps()), does now almost exactly what the recursive version was doing, the only difference is that it now also manages the array of packages to visit (and of course it doesn't recurse itself, the iteration is managed in the main program by calling the iterator in a while loop).

Note that here the return value of the iterator is not used, I could just return anything I wanted as long as its defined; the undefined value is used to stop the iteration.

If you check out this version, you will see that indeed I now print all my dependancies level by level:
it will check file, perl and whiptail first and then proceed with their dependencies: libc6, libmagic1, etc.

By modifying the way new entries are added to the array I can of course also modify this behaviour.

Well, I hope you enjoyed this, it is going to be my last blog post on perl for a while: I intend to work on my next OpenGL demo (fur rendering) for the next few weeks.

Comments