=note I added chomp to all the read_config functions.

=chapter Dispatch Tables

In the previous chapter, we were able to make functions more flexible by parametrizing their behaviors in terms of other functions. For example, instead of hardwiring the disk-moving function to print a certain message every time it wanted to move a disk, we had it call a secondary function that was passed in from outside. By supplying an appropriate secondary function, we could make it print out a list of instructions, or check its own moves, or generate a graphic display, without recoding the basic algorithm. Similarly, we were able to abstract the directory-walking behavior away from the file-size-computing behavior of our file-size function to get a more useful and generally applicable directory-walking function that could be used to do all sorts of different things.

To abstract behavior out of those functions, we made use of code references. We passed them additional functions as arguments, effectively treating the secondary functions as pieces of data. Code references make this possible.

Now we'll leave recursion for a while and go off in a different direction that shows another use of code references.

=section Configuration File Handling

Let's suppose that we have an application that reads in a configuration file in the following format:

        VERBOSITY       8
        CHDIR           /usr/local/app
        LOGFILE         log
        ...             ...

We would like to read in this configuration file and take an appropriate action for each directive. For example, for the C<VERBOSITY> directive, we just want to set a global variable. But for the C<LOGFILE> directive, we want to immediately redirect our diagnostic messages to the specified file. For C<CHDIR> we might like to C<chdir> to the specified directory so that subsequent file operations are relative to the new directory. This means that in the example above the C<LOGFILE> is C</usr/local/app/log>, and not the C<log> file in whatever directory the user happened to be in at the time the program was run.

Many programmers would see this problem and immediately envision a function with a giant C<if>-C<else> switch in it, perhaps something like this:

        sub read_config {
          my ($filename) = @_;
          open my($CF), $filename or return;  # Failure
          while (<$CF>) {
            chomp;
            my ($directive, $rest) = split /\s+/, $_, 2;
            if ($directive eq 'CHDIR') {
              chdir($rest) or die "Couldn't chdir to `$rest': $!; aborting";
            } elsif ($directive eq 'LOGFILE') {
              open STDERR, ">> $rest"
                or die "Couldn't open log file `$rest': $!; aborting";
            } elsif ($directive eq 'VERBOSITY') {
              $VERBOSITY = $rest;
            } elsif ($directive eq ...) {
              ...
            } ...
            } else {
              die "Unrecognized directive $directive on line $. of $filename; aborting";
            }
          }
          return 1;  # Success
        }

This function is in two parts. The first part opens the file and reads lines from it one at a time. It separates each line into a C<$directive> part (the first word) and a C<$rest> part (the rest). The C<$rest> part contains the arguments to the directive, such as the name of the log file to open when supplied with the C<LOGFILE> directive. The second part of the function is a big C<if>-C<else> tree that checks the C<$directive> variable to see which directive it is, and aborts the program if the directive is unrecognized.

This sort of function can get very large, because of the large number of alternatives in the C<if>-C<else> tree. Every time someone wants to add another directive, they change the function by adding another C<elsif> clause. The contents of the branches of the C<if>-C<else> tree don't have much to do with each other, except for the inessential fact that they're all configurable. Such a function violates an important law of programming: related things should be kept together; unrelated things should be separated.

Following this law suggests a different structure for this function: The part that reads and parses the file should be separate from the actions that are performed when the configuration directives are recognized.
Moreover, the code for implementing the various unrelated directives should not all be lumped together into a single function.

=subsection Table-driven configuration

=note Is 'flexibility' the word you want here?

We can do better by separating the code for opening, reading, and parsing the configuration file from the unrelated segments that implement the various directives. Dividing the program into two halves like this will give us better flexibility to modify each of the halves, and to separate the code for the directives.

Here's a replacement for F<read_config()>:

=listing read_config_tabular

        sub read_config {
          my ($filename, $actions) = @_;
          open my($CF), $filename or return;  # Failure
          while (<$CF>) {
            chomp;
            my ($directive, $rest) = split /\s+/, $_, 2;
            if (exists $actions->{$directive}) {
              $actions->{$directive}->($rest);
            } else {
              die "Unrecognized directive $directive on line $. of $filename; aborting";
            }
          }
          return 1;  # Success
        }

=endlisting read_config_tabular

We open, read, and parse the configuration file exactly as before. But we dispense with the giant C<if>-C<else> switch. Instead, this version of C<read_config()> receives an extra argument, C<$actions>, which is a table of actions; each time F<read_config()> reads a configuration directive, it will perform one of these actions. This table is called a dispatch table, because it contains the functions to which F<read_config()> will dispatch control as it reads the file. The C<$rest> variable has the same meaning as before, but now it is passed to the appropriate action as an argument.

A typical dispatch table might look like this:

        $dispatch_table =
          { CHDIR     => \&change_dir,
            LOGFILE   => \&open_log_file,
            VERBOSITY => \&set_verbosity,
            ...       => ...,
          };

The dispatch table is a hash, whose keys (generically called tags) are directive names, and whose values are actions, references to subroutines that are invoked when the appropriate directive name is recognized.
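
To see the table-driven idea in isolation, here is a minimal sketch that dispatches on lines held in a string instead of a file. The names C<process_lines> and C<%seen> are hypothetical and exist only for this illustration; the loop body mirrors the split-and-dispatch logic described above.

```perl
use strict;
use warnings;

# Hypothetical state updated by the actions (illustration only).
my %seen;

# Dispatch table: tags (directive names) map to actions (code references).
my $dispatch_table = {
    VERBOSITY => sub { $seen{verbosity} = $_[0] },
    LOGFILE   => sub { $seen{logfile}   = $_[0] },
};

# Hypothetical helper: like the file reader, but takes the
# configuration text as a string so nothing on disk is needed.
sub process_lines {
    my ($text, $actions) = @_;
    for my $line (split /\n/, $text) {
        # First word is the tag; everything after it is the argument.
        my ($directive, $rest) = split /\s+/, $line, 2;
        if (exists $actions->{$directive}) {
            $actions->{$directive}->($rest);
        } else {
            die "Unrecognized directive $directive; aborting";
        }
    }
    return 1;
}

process_lines("VERBOSITY 8\nLOGFILE log\n", $dispatch_table);
print "$seen{verbosity} $seen{logfile}\n";    # prints "8 log"
```

The parsing loop never mentions a specific directive; supporting a new one only means adding a key to the hash.
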
Action functions expect to receive the C<$rest> variable as an argument; typical actions look like these:

        sub change_dir {
          my ($dir) = @_;
          chdir($dir)
            or die "Couldn't chdir to `$dir': $!; aborting";
        }

        sub open_log_file {
          open STDERR, ">>", $_[0]
            or die "Couldn't open log file `$_[0]': $!; aborting";
        }

        sub set_verbosity {
          $VERBOSITY = shift
        }

If the actions are small, we can put them directly into the dispatch table:

        $dispatch_table =
          { CHDIR     => sub { my ($dir) = @_;
                               chdir($dir) or
                                 die "Couldn't chdir to `$dir': $!; aborting";
                             },
            LOGFILE   => sub { open STDERR, ">> $_[0]" or
                                 die "Couldn't open log file `$_[0]': $!; aborting";
                             },
            VERBOSITY => sub { $VERBOSITY = shift },
            ...       => ...,
          };

By switching to a dispatch table, we've eliminated the huge C<if>-C<else> tree, but in return we've gotten a table that is only a little smaller. That might not seem like a big win. But the table provides several benefits.

=test read_config

        do 'read_config_tabular';
        use File::Temp qw(tempfile);
        my ($fh, $filename) = tempfile();
        print $fh "ONE dog\nTWO 3\nONE cat\nTWO 1\n";
        close($fh);
        my $x = 0;
        my $y = "";
        my $dispatch_table = {
          ONE => \&one,
          TWO => sub { $x+=$_[0] },
        };
        sub one { $y .= $_[0]; chomp $y }
        read_config($filename,$dispatch_table);
        is($x,4);
        is($y,"dogcat");

=endtest read_config

=subsection Advantages of Dispatch Tables

The dispatch table is data, instead of code, so it can be modified at run-time. For example, you can insert new directives into the table whenever you want to.
Suppose the table has:

        'DEFINE' => \&define_config_directive,

where F<define_config_directive()> is:

=startlisting define_config_directive

        sub define_config_directive {
          my $rest = shift;
          $rest =~ s/^\s+//;
          my ($new_directive, $def_txt) = split /\s+/, $rest, 2;

          if (exists $CONFIG_DIRECTIVE_TABLE{$new_directive}) {
            warn "$new_directive already defined; skipping.\n";
            return;
          }

          my $def = eval "sub { $def_txt }";
          if (not defined $def) {
            warn "Could not compile definition for `$new_directive': $@; skipping.\n";
            return;
          }

          $CONFIG_DIRECTIVE_TABLE{$new_directive} = $def;
        }

=endlisting define_config_directive

=note Not testing this version. Testing define_config_directive_tablearg below instead

The configurator now accepts directives like this:

        DEFINE HOME       chdir('/usr/local/app');

F<define_config_directive()> puts C<HOME> into C<$new_directive> and C<chdir('/usr/local/app');> into C<$def_txt>. It uses C<eval> to compile the definition text into a subroutine, and installs the new subroutine into a master configuration table, C<%CONFIG_DIRECTIVE_TABLE>, using C<HOME> as the key. If C<%CONFIG_DIRECTIVE_TABLE> was in fact the dispatch table that was passed to F<read_config()> in the first place, then F<read_config()> will see the new definition, and will have an action associated with C<HOME> if it sees the C<HOME> directive on a later line of the input file. Now a config file can say

        DEFINE HOME       chdir('/usr/local/app');
        CHDIR /some/directory
        ...
        HOME

The directives in C<...> are invoked in the directory C</some/directory>, and when the processor reaches C<HOME>, it returns to its home directory. We can also define a more robust version of the same thing:

        DEFINE PUSHDIR  use Cwd; push @dirs, cwd(); chdir($_[0])
        DEFINE POPDIR   chdir(pop @dirs)

C<PUSHDIR> V<dir> uses the F<cwd()> function provided by the standard C<Cwd> module to figure out the name of the current directory. It saves the name of the current directory in the variable C<@dirs>, and then changes to V<dir>. C<POPDIR> undoes the effect of the last C<PUSHDIR>.

        PUSHDIR /tmp
        A
        PUSHDIR /usr/local/app
        B
        POPDIR
        C
        POPDIR

The program changes to C</tmp>, then executes directive A. Then it changes to C</usr/local/app> and executes directive B.
The following C<POPDIR> returns the program to C</tmp>, where it executes directive C; finally the second C<POPDIR> returns it to wherever it started out.

In order for C<DEFINE> to modify the configuration table, we had to store it in a global variable. It's probably better if we pass the table to C<define_config_directive()> explicitly. To do that we need to make a small change to C<read_config()>:

=listing read_config_tablearg

        sub read_config {
          my ($filename, $actions) = @_;
          open my($CF), $filename or return;  # Failure
          while (<$CF>) {
            chomp;
            my ($directive, $rest) = split /\s+/, $_, 2;
            if (exists $actions->{$directive}) {
*             $actions->{$directive}->($rest, $actions);
            } else {
              die "Unrecognized directive $directive on line $. of $filename; aborting";
            }
          }
          return 1;  # Success
        }

=endlisting read_config_tablearg

Now C<define_config_directive()> can look like this:

=startlisting define_config_directive_tablearg

        sub define_config_directive {
*         my ($rest, $dispatch_table) = @_;
          $rest =~ s/^\s+//;
          my ($new_directive, $def_txt) = split /\s+/, $rest, 2;

*         if (exists $dispatch_table->{$new_directive}) {
            warn "$new_directive already defined; skipping.\n";
            return;
          }

          my $def = eval "sub { $def_txt }";
          if (not defined $def) {
            warn "Could not compile definition for `$new_directive': $@; skipping.\n";
            return;
          }

*         $dispatch_table->{$new_directive} = $def;
        }

=endlisting define_config_directive_tablearg

With this change, we can add a really useful configuration directive:

        DEFINE INCLUDE   read_config(@_);

This installs a new entry into the dispatch table that looks like this:

        INCLUDE => sub { read_config(@_) }

Now, when we write this in the configuration file:

        INCLUDE extra.conf

the main F<read_config()> will invoke the action, passing it two arguments. The first argument will be the C<$rest> from the configuration file; in this case the filename C<extra.conf>. The second argument to the action will be the dispatch table again. These two arguments will be passed directly to a recursive call of C<read_config()>.
C<read_config()> will read C<extra.conf>, and when it's finished it will return control to the main invocation of C<read_config()>, which will continue with the main configuration file, picking up where it left off.

In order for the recursive call to work properly, F<read_config()> must be reentrant. The easiest way to break reentrancy is to use a global variable, for example by using a global filehandle instead of the lexical filehandle we did use. If we had used a global filehandle, the recursive call to F<read_config()> would open C<extra.conf> with the same filehandle that was being used by the main invocation; this would close the main configuration file. When the recursive call returned, F<read_config()> would be unable to read the rest of the main file, because its filehandle would have been closed.

The C<INCLUDE> definition was very simple and very useful. But it was also ingenious, and it might not have occurred to us when we were writing C<read_config()>. It would have been easy to say `Oh, C<read_config()> doesn't need to be reentrant.' But if we had written C<read_config()> in a nonreentrant way, the useful and ingenious C<INCLUDE> definition wouldn't have worked. There's an important lesson to learn here: make functions reentrant by default, because sometimes the usefulness of being able to call a function recursively will be a surprise.

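
The role of the lexical filehandle can be checked with a small self-contained sketch. It is not the chapter's listing: to avoid touching the disk it assumes a hypothetical C<%files> hash of in-memory "files" and a hypothetical C<SAY> directive that records its argument. Each call opens its own lexical handle, so the recursive call made on behalf of C<INCLUDE> leaves the outer read position alone.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files" (illustration only; nothing is read from disk).
my %files = (
    'main.conf'  => "SAY one\nINCLUDE extra.conf\nSAY three\n",
    'extra.conf' => "SAY two\n",
);

my @said;    # records the order in which SAY directives run

sub read_config {
    my ($name, $actions) = @_;
    # Lexical handle: every invocation, including a recursive one,
    # gets a private handle opened on the in-memory string.
    open my $CF, '<', \$files{$name} or return;    # Failure
    while (my $line = <$CF>) {
        chomp $line;
        my ($directive, $rest) = split /\s+/, $line, 2;
        exists $actions->{$directive}
            or die "Unrecognized directive $directive in $name; aborting";
        # Pass the table along so actions can recurse with it.
        $actions->{$directive}->($rest, $actions);
    }
    return 1;    # Success
}

my $dispatch_table = {
    SAY     => sub { push @said, $_[0] },
    INCLUDE => sub { read_config(@_) },
};

read_config('main.conf', $dispatch_table);
print "@said\n";    # prints "one two three"
```

Had the handle been a single global, the inner C<open> would have replaced it, and the outer loop would not have gone on to read the third line of the main file.
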
=test read_config_again

        do 'read_config_tablearg';
        do 'define_config_directive_tablearg';
        my @known = qw(/tmp /usr /var /usr /home);

        use File::Temp qw(tempfile);
        my ($fh0, $temp0) = tempfile();
        my $file0=<<"        EOF";
            DEFINE PUSHDIR  use Cwd; push \@dirs, cwd(); chdir(\$_[0])
            DEFINE POPDIR   chdir(pop \@dirs)
        EOF
        $file0 =~ s/^\s+//mg;
        print $fh0 $file0;
        close($fh0);

        my ($fh1, $temp1) = tempfile();
        my $file1=<<"        EOF";
            DEFINE HOME chdir('/home')
            INCLUDE $temp0
            CHDIR /tmp
            CHECK
            PUSHDIR /usr
            CHECK
            PUSHDIR /var
            CHECK
            POPDIR
            CHECK
            HOME
            CHECK
        EOF
        $file1 =~ s/^\s+//mg;
        print $fh1 $file1;
        close($fh1);

        my $x = 0;
        my $y = "";
        my $dispatch_table = {
          INCLUDE => sub { read_config(@_) },
          DEFINE  => \&define_config_directive,
          CHDIR   => sub { my ($dir) = @_;
                           chdir($dir) or
                             die "Couldn't chdir to `$dir': $!; aborting";
                         },
          CHECK   => \&check,
        };
        read_config($temp1,$dispatch_table);
        use Cwd;
        sub check {
          is( shift(@known), Cwd::getcwd );
        }

=endtest read_config_again

Reentrant functions exhibit a simpler and more predictable behavior than nonreentrant functions. They are more flexible, because they can be called recursively. Our C<INCLUDE> example above shows that we might not always anticipate all the reasons why someone might want to invoke a function recursively. It's better and safer to make everything reentrant if we can.

Another advantage of the dispatch table over hard-wired code in F<read_config()> is that we can use the same C<read_config()> function to process two unrelated files that have totally different directives, just by passing a different dispatch table to F<read_config()> each time. We can put the program into `beginner mode' by passing a stripped-down dispatch table to F<read_config()>. Or we can reuse F<read_config()> to process a different file with the same basic syntax by passing it a table with a different set of directives.

=subsection Dispatch Table Strategies

In our implementation of C<PUSHDIR> and C<POPDIR>, the action functions used a global variable, C<@dirs>, to maintain the stack of pushed directories. This is unfortunate.
We can get around this, and make the system more flexible, by having F<read_config()> support a user parameter. This is an argument, supplied by the caller of F<read_config()>, which is passed verbatim to the actions:

=listing read_config_userparam

        sub read_config {
*         my ($filename, $actions, $userparam) = @_;
          open my($CF), $filename or return;  # Failure
          while (<$CF>) {
            chomp;
            my ($directive, $rest) = split /\s+/, $_, 2;
            if (exists $actions->{$directive}) {
*             $actions->{$directive}->($rest, $userparam, $actions);
            } else {
              die "Unrecognized directive $directive on line $. of $filename; aborting";
            }
          }
          return 1;  # Success
        }

=endlisting read_config_userparam

=note not testing read_config_userparam. combining with read_config_default

This eliminates the global variable, because we can now define C<PUSHDIR> and C<POPDIR> like this:

        DEFINE PUSHDIR  use Cwd; push @{$_[1]}, cwd(); chdir($_[0])
        DEFINE POPDIR   chdir(pop @{$_[1]})

The C<$_[1]> parameter refers to the user parameter argument that is passed to F<read_config()>. If F<read_config()> is called with

        read_config($filename, $dispatch_table, \@dirs);

then C<PUSHDIR> and C<POPDIR> will use the array C<@dirs> as their stack; if it is called with

        read_config($filename, $dispatch_table, []);

then they will use a fresh, anonymous array as the stack.

It's often useful to pass an action callback the name of the tag on whose behalf it was invoked. To do this, we change F<read_config()> like this:

=listing read_config_tagarg

        sub read_config {
          my ($filename, $actions, $userparam) = @_;
          open my($CF), $filename or return;  # Failure
          while (<$CF>) {
            chomp;
            my ($directive, $rest) = split /\s+/, $_, 2;
            if (exists $actions->{$directive}) {
*             $actions->{$directive}->($directive, $rest, $actions, $userparam);
            } else {
              die "Unrecognized directive $directive on line $. of $filename; aborting";
            }
          }
          return 1;  # Success
        }

=endlisting read_config_tagarg

=note not testing read_config_tagarg. combining with read_config_default

Why is this useful?
Consider the action we defined for the C<VERBOSITY> directive:

        VERBOSITY => sub { $VERBOSITY = shift },

It's easy to imagine that there might be several configuration directives that all follow this general pattern:

        VERBOSITY => sub { $VERBOSITY = shift },
        TABLESIZE => sub { $TABLESIZE = shift },
        PERLPATH  => sub { $PERLPATH  = shift },
        ... etc ...

We would like to merge the three similar actions into a single function that does the work of all three. To do that, the function needs to know the name of the directive so that it can set the appropriate global variable:

        VERBOSITY => \&set_var,
        TABLESIZE => \&set_var,
        PERLPATH  => \&set_var,
        ... etc ...

        sub set_var {
          my ($var, $val) = @_;
          $$var = $val;   # symbolic reference: sets the global named by $var
        }

Or, if you don't like a bunch of global variables running around loose, you can store configuration information in a hash, and pass a reference to the hash as the user parameter:

        sub set_var {
          my ($var, $val, undef, $config_hash) = @_;
          $config_hash->{$var} = $val;
        }

For this example, not much is saved, because the action is so simple. But there might be several configuration directives that need to share a more complicated function. Here's a slightly more complicated example:

        sub open_input_file {
          my ($handle, $filename) = @_;
          unless (open $handle, $filename) {
            warn "Couldn't open $handle file `$filename': $!; ignoring.\n";
          }
        }

This F<open_input_file()> function can be shared by many configuration directives. For example, suppose a program has three sources of input: a history file, a template file, and a pattern file. We would like the locations of all three files to be configurable in the configuration file; this requires three entries in the dispatch table. But the three entries can all share the same F<open_input_file()> function:

        ...
        HISTORY  => \&open_input_file,
        TEMPLATE => \&open_input_file,
        PATTERN  => \&open_input_file,
        ...

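
The way a shared action learns which tag it is serving can be tried without any files at all. The following is only a sketch: C<dispatch_line> is a hypothetical stand-in for the body of the reading loop, and the values C<1024> and C</usr/bin/perl> are made up for the example. It passes arguments in the C<($directive, $rest, $actions, $userparam)> order and uses the hash-based C<set_var>.

```perl
use strict;
use warnings;

# Shared action: the tag arrives first, the user parameter (a hash ref) last.
sub set_var {
    my ($var, $val, undef, $config_hash) = @_;
    $config_hash->{$var} = $val;
}

my $dispatch_table = {
    VERBOSITY => \&set_var,
    TABLESIZE => \&set_var,
    PERLPATH  => \&set_var,
};

# Hypothetical helper standing in for one iteration of the reading loop.
sub dispatch_line {
    my ($line, $actions, $userparam) = @_;
    my ($directive, $rest) = split /\s+/, $line, 2;
    exists $actions->{$directive}
        or die "Unrecognized directive $directive; aborting";
    return $actions->{$directive}->($directive, $rest, $actions, $userparam);
}

my %config;    # supplied as the user parameter
dispatch_line($_, $dispatch_table, \%config)
    for 'VERBOSITY 8', 'TABLESIZE 1024', 'PERLPATH /usr/bin/perl';

print join(', ', map { "$_=$config{$_}" } sort keys %config), "\n";
```

One subroutine serves all three tags, and the caller decides where the settings end up by choosing which hash to pass.
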
Now suppose the configuration file says:

        HISTORY         /usr/local/app/history
        TEMPLATE        /usr/local/app/templates/main.tmpl
        PATTERN         /home/bill/app/patterns/default.pat

F<read_config()> will see the first line and dispatch to the F<open_input_file()> function, passing it the argument list C<('HISTORY', '/usr/local/app/history')>. F<open_input_file()> will take the C<HISTORY> argument as a filehandle name, and open the C<HISTORY> filehandle to come from the C</usr/local/app/history> file. On the second line, F<read_config()> will dispatch to F<open_input_file()> again, this time passing it C<('TEMPLATE', '/usr/local/app/templates/main.tmpl')>. This time, F<open_input_file()> will open the C