=chapter Higher-Order Functions and Currying Our F function of R was a function factory, building stub functions that served as replacements for other functions. The technique of using functions to build other functions is extremely powerful. In this chapter, we'll look at a technique called I, which transforms an ordinary function into a function factory for manufacturing more functions, and other techniques for transforming one function into another. A X is a function that operates on other functions instead of on data values. Some of these take data arguments and manufacture functions to order; others, like the F function of X, transform one function into another one. =section Currying We have seen several times so far how to use callbacks to parametrize the behavior of a function so that it can serve many purposes. For example, in R we saw how a generic directory-walking function could be used to print a list of dangling symbolic links, to return a list of interesting files, or to copy an entire directory. Callbacks are a way to make functions more general by supplying other functions to them as arguments. We saw how to write functions that used closures to generate other functions as return values. The I technique we'll see combines closures and callbacks, turning an ordinary function into a factory that manufactures functions on demand. Recall our F function from R. Its arguments were an HTML tree and a pair of callbacks, one to handle plain text and one to handle tagged text. We had found a way to use this to extract all the text that was enclosed in T

tags: =inline extract_headers We then observed that it would make sense to abstract the T

out of F, to make it more general: =inline promote_if The second callback in F is rather peculiar. It's an anonymous function that we manufactured solely to call F with the right arguments. The previous version of the code was tidier. What we need is a way to get F to I the F function we need. This seems like it should be possible, since after all F already knows how to perform the task that we want F to perform. All that we need to do is to have F wrap up that behavior into a new function: =startlisting promote_if_curried sub promote_if { my $is_interesting = shift; * return sub { my $element = shift; if ($is_interesting->($element->{_tag}) { return ['keeper', join '', map {$_->[1]} @_]; } else { return @_; } * } } =contlisting promote_if_curried Instead of accepting both arguments right away, F now gets the C<$is_interesting> callback only, and manufactures a new function that, given an HTML element, promotes it if it's considered interesting. Making this change to F, to turn it from a function of two arguments into a function of one argument that returns a function of one argument, is called X it, and the version of F immediately above is the X version of F.N in 1924. X X X, Moses|i>> The happy outcome is that the call to F is now much simpler: my @tagged_texts = walk_html($tree, sub { ['maybe', $_[0]] }, promote_if('h1'), }); Once you get used to the idea of currying, you start to see opportunities to do it all over. Recall our functions from R for adding and multiplying two streams together element-by-element: F and F. sub add2 { my ($s, $t) = @_; return unless $s && $t; node(head($s) + head($t), promise { add2(tail($s), tail($t)) }); } sub mul2 { my ($s, $t) = @_; return unless $s && $t; node(head($s) * head($t), promise { mul2(tail($s), tail($t)) }); } These functions are almost identical. We saw in R that two functions with similar code can often be combined into a single function that accepts a callback parameter. In this case, the callback, C<$op>, specifies the operation to use to combine C and C: sub combine2 { my ($s, $t, $op) = @_; return unless $s && $t; node($op->(head($s), head($t)), promise { combine2(tail($s), tail($t), $op) }); } Now we can build F and F from F: sub add2 { combine2(@_, sub { $_[0] + $_[1] }) } sub mul2 { combine2(@_, sub { $_[0] * $_[1] }) } Since a major use of F is to manufacture such functions, it would be more convenient for F to do what we wanted in the first place. We can turn F into a factory that manufactures stream-combining functions by currying it: =startlisting combine2 sub combine2 { * my $op = shift; * return sub { * my ($s, $t) = @_; return unless $s && $t; node($op->(head($s), head($t)), * promise { combine2($op)->(tail($s), tail($t)) }); * }; } =endlisting combine2 Now we simply have $add2 = combine2(sub { $_[0] + $_[1] }); $mul2 = combine2(sub { $_[0] * $_[1] }); This may also be fractionally more efficient, since we won't have to do an extra function call every time we call F or F. F is the function to add the two streams, rather than a function that re-invokes F in a way that adds two streams. If we want these functions to stick around, we can give them names, as above; alternatively, we can use them anonymously: my $catstrs = combine2(sub { "$_[0]$_[1]" })->($s, $t); =test catstrs use Stream qw(:all); do 'combine2'; my $s = upto(1,4); my $t = upto(5,9); my $catstrs = combine2(sub { "$_[0]$_[1]" })->($s, $t); for my $want (qw(15 26 37 48)) { is(head($catstrs),$want); $catstrs = tail($catstrs); } is($catstrs,undef); =endtest Instead of the F function we saw earlier, we might prefer this curried version: sub scale { my $s = shift; return sub { my $c = shift; return if $c == 0; transform { $_[0] * $c } $s; } } F is now a function factory. Instead of taking a stream and a number and returning a new stream, it takes a stream and manufactures a function that produces new streams. C<$scale_s = scale($s)> returns a function for scaling C<$s>; given a numeric argument, say C<$n>, C<$scale_s> produces a stream that has the elements of C<$s> scaled by C<$n>. For example C<$scale_s-\>(2)> returns a stream whose every element is twice C<$s>'s, and C<$scale_s-\>(3)> returns a stream whose every element is three times C<$s>'s. If we're planning to scale the same stream by several different factors, it might make sense to have a single scale function to generate all the outputs. Depending on how we're using it, we might have preferred to curry the function arguments in the other order: =listing scale sub scale { * my $c = shift; return sub { * my $s = shift; transform { $_[0] * $c } $s; } } =endlisting scale Now F is a factory for manufacturing scaling functions. C returns a function which takes any stream and doubles it; C returns a function which takes any stream and triples it. We could write C<$double = scale(2)> and then use C<$double-\>($s)> to double C<$s>, or C($s)> to double C<$s>. If you don't like the extra arrows in C<$double-\>($s)> you can get rid of them by using Perl's I feature: =note didn't you use this earlier in *sum? *double = scale(2); $s2 = double($s); =note currying = closures + callbacks =test scale use Stream qw(:all); do 'scale'; my $double = scale(2); my $triple = scale(3); my $s = upto(1,3); my $s2 = $double->($s); my $s3 = $triple->($s); for my $want (qw(2 4 6)) { is(head($s2),$want); $s2 = tail($s2); } is($s2,undef); for my $want (qw(3 6 9)) { is(head($s3),$want); $s3 = tail($s3); } is($s3,undef); =endtest scale Similarly, in R, we defined a F function that returned the slope of some other function at a particular point: sub slope { my ($f, $x) = @_; my $e = 0.00000095367431640625; ($f->($x+$e) - $f->($x-$e)) / (2*$e); } We could make this more flexible by currying the C<$x> argument: =listing slope0 sub slope { * my $f = shift; my $e = 0.00000095367431640625; * return sub { * my $x = shift; ($f->($x+$e) - $f->($x-$e)) / (2*$e); * }; } =endlisting slope0 =test slope0 do 'slope0'; my $d1 = slope( sub { 2 * $_[0] } ); is($d1->(3) ,2); is($d1->(-1233),2); my $d2 = slope( sub { cos( $_[0] ) } ); is(int(10000*$d2->(0)) ,int(10000*-sin(0))); is(int(10000*$d2->(.1)) ,int(10000*-sin(.1))); is(int(10000*$d2->(3.14)) ,int(10000*-sin(3.14))); =endtest slope0 F now takes a function and returns its derivative function! By evaluating the derivative function at a particular point, we compute the slope at that point. If we like, we can use Perl's polymorphism to put both behaviors into the same function: =listing slope sub slope { my $f = shift; my $e = 0.00000095367431640625; * my $d = sub { my ($x) = shift; ($f->($x+$e) - $f->($x-$e)) / (2*$e); }; * return @_ ? $d->(shift) : $d; } =endlisting slope Now we can call C as before, to compute the slope of C<$f> at the point C<$x>, or we can call C and get back the derivative function of C<$f>. =note if you get the kerning-pairs stuff back, then don't forget there is an example there, F, that uses it; bring this up. =test slope do 'slope'; my $d1 = slope( sub { 2 * $_[0] } ); is($d1->(3) ,2); is($d1->(-1233),2); is(slope( sub { 2 * $_[0] }, 12345 ), 2 );; my $d2 = slope( sub { cos( $_[0] ) } ); is(int(10000*$d2->(0)) ,int(10000*-sin(0))); is(int(10000*$d2->(.1)) ,int(10000*-sin(.1))); is(int(10000*$d2->(3.14)) ,int(10000*-sin(3.14))); =endtest slope Currying can be a good habit to get into. Earlier, we wrote sub iterate_function { my ($f, $x) = @_; my $s; $s = node($x, promise { &transform($f, $s) }); } But it's almost as easy to write it this way instead: =startlisting iterate_function sub iterate_function { * my $f = shift; * return sub { * my $x = shift; my $s; $s = node($x, promise { &transform($f, $s) }); * }; } =endlisting iterate_function It requires hardly any extra thought to do it this way, and the payoff is substantially increased functionality. We now have a function which manufactures stream-building functions to order. We could construct F as a special case of F, for example: *upfrom = iterate_function(sub { $_[0] + 1 }); Or similarly, our earlier example of F: *pow2_from = iterate_function(sub { $_[0] * 2 }); =test iterate_function use Stream qw(:all); do 'iterate_function'; *upfrom = iterate_function(sub { $_[0] + 1 }); *pow2_from = iterate_function(sub { $_[0] * 2 }); my $from3toinfinity = upfrom(3); for (qw(3 4 5)) { is( head($from3toinfinity) , $_); $from3toinfinity = tail($from3toinfinity); } # we're not going to infinity my $andbeyond = pow2_from->(4); for (qw(4 8 16 32 64)) { is( head($andbeyond), $_); $andbeyond = tail($andbeyond); } =endtest iterate_function One final lexical point about currying: when currying a recursive function, it's often possible to get a small time and memory performance improvement by tightening up the recursion. For example, consider F again: sub combine2 { my $op = shift; return sub { my ($s, $t) = @_; return unless $s && $t; node($op->(head($s), head($t)), promise { combine2($op)->(tail($s), tail($t)) }); }; } C will return the same result function every time. So we should be able to get a speedup by caching its value and using the cached value in the promise instead of repeatedly calling jC. Moreover, C is precisely the value that F is about to return anyway. So we can change this to: =listing combine2.1 sub combine2 { my $op = shift; * my $r; * $r = sub { my ($s, $t) = @_; return unless $s && $t; node($op->(head($s), head($t)), * promise { $r->(tail($s), tail($t)) }); }; } =endlisting combine2.1 =test combine2.1 use Stream qw(:all); do 'combine2.1'; my $s = upto(1,4); my $t = upto(5,9); my $catstrs = combine2(sub { "$_[0]$_[1]" })->($s, $t); for my $want (qw(15 26 37 48)) { is(head($catstrs),$want); $catstrs = tail($catstrs); } is($catstrs,undef); =endtest combine2.1 Now the promise no longer needs to call F; we've cached the value that F is about to return by storing it in C<$r>, and the promise can call C<$r> directly. The code is also more perspicuous this way: now the promise says explicitly that the function will be calling itself on the tails of the two streams. These curried functions are examples of X. Ordinary functions operate on values: You put some values in, and you get some other values out. Higher-order functions are functions that operate on other functions: You put some functions in, and you get some other functions out. For example, in F we put in a function to operate on two scalars and we got out an analogous function to operate on two streams. =note multiple levels of currying =note See SOAP thing in IDEAS file @20020227 =section Common Higher-Order Functions Probably the two most fundamental higher-order functions for any list or other kind of sequence are analogs of F and F. F and F are higher-order functions because each of them takes an argument which is itself another function. We've already seen versions of F and F for iterators and streams. Perl's standard F and F each take a function and a list and return a new list, for example map { $_ * 2 } (1..5); # returns 2, 4, 6, 8, 10 grep { $_ % 2 == 0 } (1..10); # returns 2, 4, 6, 8, 10 Often it's more convenient to have curried versions of these: =startlisting cmap sub cmap (&) { my $f = shift; my $r = sub { my @result; for (@_) { push @result, $f->($_); } @result; }; return $r; } =endlisting cmap =test cmap-oops 1 BEGIN { do "cmap" } my @a; # my @empty = cmap { $_ * 2 } @a; SKIP: { skip "this was to check if poymorphic cmap failed on an empty list", 1; is(scalar(@empty), 0, "oops"); } =endtest =startlisting cgrep sub cgrep (&) { my $f = shift; my $r = sub { my @result; for (@_) { push @result, $_ if $f->($_); } @result; }; return $r; } =endlisting cgrep These should be called like this: $double = cmap { $_ * 2 }; $find_slashdot = cgrep { $_->{referer} =~ /slashdot/i }; After which C<$double-\>(1..5)> returns (2, 4, 6, 8, 10) and C<$find_slashdot-\>(weblog())> returns the weblog records that represent referrals from Slashdot. It may be tempting to try to make F and F polymorphic, as we did with F. (I was tempted, anyway.) * sub cmap (&;@) { my $f = shift; my $r = sub { my @result; for (@_) { push @result, $f->($_); } @result; }; * return @_ ? $r->(@_) : $r; } Then we would also be able to use F and F like regular F and F: @doubles = cmap { $_ * 2 } (1..5); @evens = cgrep { $_ % 2 == 0 } (1..10); Unfortunately, this apparently happy notation hides an evil surprise: @doubles = cmap { $_ * 2 } @some_array; If C<@some_array> is empty, C<@doubles> is assigned a reference to a doubling function. =test cmapgrep 2 BEGIN { do 'cmap'; do 'cgrep'; } $double = cmap { $_ * 2 }; $find_evens = cgrep { $_ % 2 == 0 }; is_deeply( [$double->(1..5)], [2,4,6,8,10] ); is_deeply( [$find_evens->(1..10)], [2,4,6,8,10] ); # SKIP: { # skip "you took out the stuff about polymorphic cmap and cgrep", 2; # @doubles = cmap { $_ * 2 } (1..5); # @evens = cgrep { $_ % 2 == 0 } (1..10); # # is_deeply( \@doubles, [2,4,6,8,10] ); # is_deeply( \@evens, [2,4,6,8,10] ); # } =endtest cmapgrep =subsection Automatic Currying We've written the same code several times to implement curried functions: sub some_curried_function { my $first_arg = shift; my $r = sub { ... }; return @_ ? $r->(@_) : $r; } (Possibly with the poymorphism trick omitted from the final line.) As usual, once we recognize this pattern, we should see if it makes sense to abstract it into a function: =listing Curry.pm package Curry; use base 'Exporter'; @EXPORT = ('curry'); @EXPORT_OK = qw(curry_listfunc curry_n); sub curry_listfunc { my $f = shift; return sub { my $first_arg = shift; return sub { $f->($first_arg, @_) }; }; } sub curry { my $f = shift; return sub { my $first_arg = shift; my $r = sub { $f->($first_arg, @_) }; return @_ ? $r->(@_) : $r; }; } 1; =endlisting Curry.pm F takes any function and returns a curried version of that function. For example, consider the F function from R: sub imap (&$) { my ($transform, $it) = @_; return sub { my $next = NEXTVAL($it); return unless defined $next; return $transform->($next); } } F is analogous to F, but operates on iterators rather than on lists. We might use it like this: my $doubles_iterator = imap { $_[0] * 2 } $it; If we end up doubling a lot of iterators, we have to repeat the C<{$_[0] * 2}> part: my $doubles_a = imap { $_[0] * 2 } $it_a; my $doubles_b = imap { $_[0] * 2 } $it_b; my $doubles_c = imap { $_[0] * 2 } $it_c; We might wish we had a single, special purpose function for doubling every element of an iterator, so we could write instead my $doubles_a = double $it_a; my $doubles_b = double $it_b; my $doubles_c = double $it_c; Or even my ($doubles_a, $doubles_b, $doubles_c) = map double($_), $it_a, $it_b, $it_c; If we had written F in a curried style, we could have done *double = imap { $_[0] * 2 }; but we didn't, so we can't. But that's no problem, because F will manufacture a curried version of F on the fly: *double = curry(\&imap)->(sub { $_[0] * 2 }); Since the curried F function came in handy once, perhaps we should keep it around in case we need it again: *c_imap = curry(\&imap); Then to manufacture F we do: *double = c_imap(sub { $_[0] * 2 }); =test curry use Curry; # easier to just drop this here, so we know what version we've got sub imap (&$) { my ($transform, $it) = @_; return sub { my $next = $it->(); return unless defined $next; return $transform->($next); } } sub upto { my ($m, $n) = @_; return sub { return $m <= $n ? $m++ : undef; }; } *c_imap = curry(\&imap); *double = c_imap(sub { $_[0] * 2 }); my $it = upto(1,4); my $doubleit = double($it); is($doubleit->(),2); is($doubleit->(),4); is($doubleit->(),6); is($doubleit->(),8); is($doubleit->(),undef); =endtest curry =subsection Prototypes The only drawback of this approach is that we lose F's pretty calling syntax, which is enabled by the C<(&@)> prototype at compile time. We can get it back, although the results are somewhat peculiar. First, we modify F so that the function it manufactures has the appropriate prototype: sub curry { my $f = shift; * return sub (&;@) { my $first_arg = shift; my $r = sub { $f->($first_arg, @_) }; return @_ ? $r->(@_) : $r; }; } Then we call F at compile time instead of at run time: BEGIN { *c_imap = curry(\&imap); } Now we can say *double = c_imap { $_[0] * 2 }; and we can still use F in place of regular F: $doubles_a = c_imap { $_[0] * 2 } $it_a; =subsubsection Prototype Problems The problem with this technique is that the prototype must be hardwired into F, so now it will I generate curried functions with the prototype C<(&;@)>. This isn't a problem for functions like F or F, which would have had that prototype anyway. But that prototype is inappropriate for the curried version of the F function from R. The uncurried version was: sub scale { my ($s, $c) = @_; $s->transform(sub { $_[0]*$c }); } C returns a function that behaves like this: sub { my $s = shift; my $r = sub { scale($s, @_) }; return @_ ? $r->(@_) : $r; } The internals of this function are correct, and it will work just fine, as long as it I have a C<(&;@)> prototype. Such a prototype would be inappropriate, since the function is expecting to get one or two scalar arguments. The correct prototype would be C<($;$)>. But if we did: BEGIN { *c_scale = curry(\&scale) } then the resulting F function wouldn't work, because it would have a C<(&;@)> prototype when we expectedto call it as though it had a C<($;$)> prototype. We want to call it in one of these two ways: my $double = c_scale(2); my $doubled_it = c_scale(2, $it); but because F would have a prototype of C<(&;@)>, these both would be syntax errors, yielding: Type of arg 1 to main::c_scale must be block or sub {} (not constant item)... This isn't a show-stopper. This works: *c_scale = curry(\&scale); my $double = c_scale(2); my $doubled_it = c_scale(2, $it); Here the call to F is compiled, with no prototype, before C<*c_scale> is assigned to; the call to F that sets up the bad prototype occurs too late to foul up our attempt to (correctly) call F. But now we have a somewhat confusing situation. Our F function creates curried functions with C<(&;@)> prototypes, and these prototypes may be inappropriate. But the prototypes are inoperative unless F is called in a C block. To add to the confusion, this doesn't work: *c_scale = curry(\&scale); my $double = eval 'c_scale(2)'; because, once again, the call to F has been compiled after the prototype was set up by F. There isn't really any easy way to fix this. The obvious thing to do is to tell F what prototype we desire by supplying it with an optional parameter: # Doesn't really work sub curry { my $f = shift; * my $PROTOTYPE = shift; * return sub ($PROTOTYPE) { my $first_arg = shift; my $r = sub { $f->($first_arg, @_) }; return @_ ? $r->(@_) : $r; }; } =note You can't just make shit up and expect the computer to know what you mean, retardo! Unfortunately, this is illegal; C<($PROTOTYPE)> does I indicate that the desired prototype is stored in C<$PROTOTYPE>. Perl 5.8.1 provides a C function to set the prototype of a particular function: =listing curry.set_prototype # Doesn't work before 5.8.1 * use Scalar::Util 'set_prototype'; sub curry { my $f = shift; my $PROTOTYPE = shift; * set_prototype(sub { my $first_arg = shift; my $r = sub { $f->($first_arg, @_) }; return @_ ? $r->(@_) : $r; * }, $PROTOTYPE); } =endlisting curry.set_prototype If you don't have 5.8.1 yet, the only way to dynamically specify the prototype of a function is to use string C: =listing curry.eval sub curry { my $f = shift; my $PROTOTYPE = shift; * $PROTOTYPE = "($PROTOTYPE)" if defined $PROTOTYPE; * my $CODE = q{sub PROTOTYPE { my $first_arg = shift; my $r = sub { $f->($first_arg, @_) }; return @_ ? $r->(@_) : $r; * }}; * $CODE =~ s/PROTOTYPE/$PROTOTYPE/; * eval $CODE; } =endlisting curry.eval =note probably should test the curry.set_prototype and the curry.eval =subsection More Currying We can extend the idea of F and build a function that generates a generic curried version of another function: =contlisting Curry.pm sub curry_n { my $N = shift; my $f = shift; my $c; $c = sub { if (@_ >= $N) { $f->(@_) } else { my @a = @_; curry_n($N-@a, sub { $f->(@a, @_) }); } }; } =endlisting Curry.pm F takes two arguments: a number V, and a function V, which expects at least V arguments. The result is a new function, V, which does the same thing V does, but which accepts curried arguments. If V is called with V or more arguments, it just passes them on to V and returns the result. If there are fewer than V arguments, V generates a new function that remembers the arguments that were passed; if this new function is called with the remaining arguments, both old and new arguments are given to V. For example: *add = curry_n(2, sub { $_[0] + $_[1] }); And now we can call add(2, 3); # Returns 5 or: *increment = add(1); increment(8); # return 9 Or perhaps more realistically: *csubstr = curry_n(3, sub { defined $_[3] ? substr($_[0], $_[1], $_[2], $_[3]) : substr($_[0], $_[1], $_[2]) }); Then we can use any of: # Just like regular substr $ss = csubstr($target, $start, $length); csubstr($target, $start, $length, $replacement); # Not just like regular substr $target = "I like pie"; # This '$part' function gets two arguments: a start position # and a length; it returns the apporpriate part of $target. $part = csubstr($target); my $ss = $part->($start, $length); # This function gets an argument N and returns that many characters # from the beginning of $target. $first_N_chars = csubstr($target, 0); my $prefix_3 = $first_N_chars->(3); # "I l" my $prefix_7 = $first_N_chars->(7); # "I like " =test curry_n use Curry 'curry_n'; *add = curry_n(2, sub { $_[0] + $_[1] }); is(add(2, 3), 5); *increment = add(1); is(increment(8),9); *csubstr = curry_n(3, sub { defined $_[3] ? substr($_[0], $_[1], $_[2], $_[3]) : substr($_[0], $_[1], $_[2]) }); { my $target = "I like pie"; is(csubstr($target, 2, 4), "like"); is(csubstr($target, 2, 4, "eat"), "like"); is($target, "I eat pie"); } { # prove it works with substr too my $target = "I like pie"; is(substr($target, 2, 4), "like"); is(substr($target, 2, 4, "eat"), "like"); is($target, "I eat pie"); } # This '$part' function gets two arguments: a start position # and a length; it returns the apporpriate part of $target. { my $target = "I like pie"; my $part = csubstr($target); is($part->(2,4), "like"); my $ss = $part->(2,4,"eat"); is($ss,"like"); # hrm. i expected this to work like the previous ones. - rspier # hrm. me too. - mjd ### is($target,"I eat pie"); } { my $target = "I like pie"; my $first_N_chars = csubstr($target, 0); is($first_N_chars->(3), "I l"); is($first_N_chars->(7), "I like "); } =endtest curry_n =subsection Yet More Currying Many of the functions we saw earlier in the book would benefit from currying. For example, F from R: =inline dir_walk_callbacks Here we specify a top directory and two callback functions. But the callback functions are constant through any call to F, and we might like to specify them in advance, because we might know them well before we know what directories we want to search. The conversion is easy: =startlisting dir_walk_curried sub dir_walk { * unshift @_, undef if @_ < 3; my ($top, $filefunc, $dirfunc) = @_; * my $r; * $r = sub { my $DIR; * my $top = shift; if (-d $top) { my $file; unless (opendir $DIR, $top) { warn "Couldn't open directory $code: $!; skipping.\n"; return; } my @results; while ($file = readdir $DIR) { next if $file eq '.' || $file eq '..'; * push @results, $r->("$top/$file"); } return $dirfunc->($top, @results); } else { return $filefunc->($top); } }; * defined($top) ? $r->($top) : $r; } We can still call C and get the same result, or we can omit the C<$top> argument (or pass C) and get back a specialized file-walking function. As a minor added bonus, the recursive call will be fractionally more efficient because the callback arguments don't need to be explicitly passed. =endlisting dir_walk_curried =note skipped testing for dir_walk_curried because of external dependencies -rsp =test dir_walk_curried 3 do "dir_walk_curried"; my @RESULT; sub accumulate { @_ } my $TOP = "Tests/TESTDIR"; my @items = ($TOP, qw(a a/a1 a/a2 b b/b1 c c/c1 c/c2 c/c3 c/d c/d/d1 c/d/d2)); @RESULT = dir_walk($TOP, \&accumulate, \&accumulate); s{^$TOP/}{}o for @RESULT; print "# @RESULT\n"; is_deeply(\@RESULT, \@items, "uncurried version"); my $DW = dir_walk(\&accumulate, \&accumulate); @RESULT = $DW->($TOP); s{^$TOP/}{}o for @RESULT; print "# @RESULT\n"; is_deeply(\@RESULT, \@items, "curried version"); $DW = dir_walk(undef, \&accumulate, \&accumulate); @RESULT = $DW->($TOP); s{^$TOP/}{}o for @RESULT; print "# @RESULT\n"; is_deeply(\@RESULT, \@items, "curried version"); =endtest =section F and F The standard Perl X module> provides several commonly requested functions that are not built in to Perl. These include F and F functions, which respectively return the largest and smallest numbers in their argument lists, F and F, which are the analogous functions for strings, and F, which returns the sum of the numbers in a list. If we write sample code for these five functions, we'll see the similarity immediately: sub max { my $max = shift; for (@_) { $max = $_ > $max ? $_ : $max } return $max; } sub min { my $min = shift; for (@_) { $min = $_ < $min ? $_ : $min } return $min; } sub maxstr { my $max = shift; for (@_) { $max = $_ gt $max ? $_ : $max } return $max; } sub minstr { my $min = shift; for (@_) { $min = $_ lt $min ? $_ : $min } return $min; } sub sum { my $sum = shift; for (@_) { $sum = $sum + $_ } return $sum; } Generalizing this gives us the F function that is also provided by C: sub reduce { my $code = shift; my $val = shift; for (@_) { $val = $code->($val, $_) } return $val; } (C is actually written in C for speed, but what it does it equivalent to this Perl code.) The idea is that we're going to scan the list one element at a time, accumulating a 'total' of some sort. We provide a function (C<$code>) which says how to compute the new 'total', given the old total (first argument) and the current element (second argument). If our goal is just to add up all the list elements, then we compute the total at each stage by adding the previous total to the current element: reduce(sub { $_[0] + $_[1] }, @VALUES) == sum(@VALUES) If our goal is to find the maximum element, then the 'total' is actually the maximum so far, then we compute the total at each stage by taking whichever of the current maximum and the current element is larger: reduce(sub { $_[0] > $_[1] ? $_[0] : $_[1] }, @VALUES) == max(@VALUES) The F function provided by C is easier to call than the one above. It places the total-so-far in C<$a> and the current list element into C<$b> before invoking the callback, so that one can write reduce(sub { $a + $b }, @VALUES) reduce(sub { $a > $b ? $a : $b }, @VALUES) We saw how to make this change back in R, when we arranged to have F's callback invoked with the current iterator value in C<$_> in addition to C<$_[0]>; this allowed it to have a more F-line calling syntax. We can arrange F similarly: sub reduce (&@) { my $code = shift; my $val = shift; for (@_) { * local ($a, $b) = ($val, $_); $val = $code->($val, $_) } return $val; } Here we're using the global variables C<$a> and C<$b> to pass the total and the current list element. Use of global variables normally causes a compile-time failure under X>, but there is a special exemption for the variables C<$a> and C<$b>. The exemption is there to allow usages just like this one, and in particular to support the analogous feature of Perl's built-in F function. The C version of F already has this feature built in. If we curry the F function, we can use it to I functions like F and F: BEGIN { *reduce = curry(\&List::Util::reduce); *sum = reduce { $a + $b }; *max = reduce { $a > $b ? $a : $b }; } =note also 'every' and 'any' This version of F isn't quite as general as it could be. All the functions manufactured by F have one thing in common: given an empty list of arguments, they always return undef. For F and F this may be appropriate, but for F it's wrong; the sum of an empty list should be taken to be 0. (The F function provided by C also has this defect.) This small defect masks a larger one: when the argument list is nonempty, the F above assumes that the 'total' should be initialized to the first data item. This happens to work for F and F, but it isn't appropriate for all functions. C can be made much more general if we drop this assumption. As a trivial example, suppose we want a function to produce the length of a list. This is I what we want: reduce { $a + 1 }; But it only produces the correct length when given a list whose first element is 1, since otherwise C<$val> is incorrectly initialized. A more general version of F accepts an explicit parameter to say what value should be returned for an empty list: * sub reduce (&$@) { my $code = shift; my $val = shift; for (@_) { local ($a, $b) = ($val, $_); $val = $code->($val, $_) } return $val; } A version with optional currying is: =listing reduce sub reduce (&;$@) { my $code = shift; my $f = sub { my $base_val = shift; my $g = sub { my $val = $base_val; for (@_) { local ($a, $b) = ($val, $_); $val = $code->($val, $_); } return $val; }; @_ ? $g->(@_) : $g; }; @_ ? $f->(@_) : $f; } =endlisting reduce =note the above reduce function is missing a }; and it wouldn't hurt if $val was initialized The list-length function is now *listlength = reduce { $a + 1 } 0; where the C<0> here is the correct result for an empty list. Similarly, *product = reduce { $a * $b } 1; is a function which multiplies all the elements in a list of numbers. We can even use F to compute both at the same time: *length_and_product = reduce { [$a->[0]+1, $a->[1]*$b] } [0, 1]; This makes only one pass over the list to compute both the length and the product. For an empty list, the result is C<[0, 1]>, and for a list with one element V, the result is C<[1, x]>. F can only manufacture functions that return undef for the empty list, and that return the first list element for a single-element list. The F function can't be generated by F because it doesn't have these properties. =note counting function sub count { my $s = @_ } ? =note hashpush %h, k=>v, k=>v,... ? =test reduce sub reduce (&;$@); do 'reduce'; # we're only testing the final super-duper version right now. # reduce(sub { $_[0] + $_[1] }, @VALUES) == sum(@VALUES) is(reduce(sub { $a + $b },1,2,3),6); # reduce(sub { $_[0] > $_[1] ? $_[0] : $_[1] }, @VALUES) == max(@VALUES) my $z = reduce { $_[0] > $_[1] ? $_[0] : $_[1] } 2,3,1; is($z, 3); *listlength = reduce { $a + 1 } 0; is(listlength(10..20), 11, "listlength nonempty"); is(listlength(), 0, "listlength empty"); *product = reduce { $a * $b } 1; is(product(2..7), 5040, "7!"); is(product(), 1, "0!"); =endtest reduce =note Here's a nice example. fold() on gcd() produces a version of gcd that works on multiple values instead of on only two. A properly general version of F gets an additional argument, which says that the function should return when given an empty list as its argument. In the programming literature, the properly general version of F is more typically called F: sub fold { my $f = shift; my $fold; $fold = sub { my $x = shift; sub { return $x unless @_; my $first = shift; $fold->($f->($x, $first), @_) } } } Eliminating the recursion yields: =startlisting fold sub fold { my $f = shift; sub { my $x = shift; sub { my $r = $x; * while (@_) { * $r = $f->($r, shift()); * } * return $r; } } } =endlisting fold =test fold do 'fold'; do 'gcd'; my $gcdf = fold(\&gcd)->(0); is($gcdf->(6,9),3); is($gcdf->(7,5),1); is($gcdf->(9,81,15),3); is($gcdf->(9,81,15,2),1); is($gcdf->(9,81,18,27),9); =endtest fold =note we discussed 'reduce' for streams in Chapter VI already =subsection Boolean operators Back in R we saw a system that would search backwards through a log file looking for records that matched a simple query. To extend this into a useful database system, we need to be able to combine simple queries into more complex ones. Let's suppose that C<$a> and C<$b> are iterators that will produce data items that match queries V and V, respectively. How can we manufacture an iterator that matches the query M? One way we could do this is to interleave the elements of C<$a> and C<$b>: =listing interleave sub interleave { my ($a, $b) = @_; return sub { my $next = $a->(); unless (defined $next) { $a = $b; $next = $a->(); } ($a, $b) = ($b, $a); $next; } } =endlisting interleave =test interleave do 'interleave'; sub upto { my ($m, $n) = @_; return sub { return $m <= $n ? $m++ : undef; }; } my $i1 = upto(1,3); my $i2 = upto(4,6); my $i = interleave($i1,$i2); for (qw(1 4 2 5 3 6)) { is($i->(),$_); } # this should be the end of stream, but it's returning a CODEref # instead. $i->()->() == undef. Is this just a sign of "the # interleaved outputs including some records (the end) more than # once?" is($i->(),undef); =endtest interleave But this has the drawback that if the record sets produced by C<$a> and C<$b> happen to overlap, the interleaved outputs will include some records more than once. We can do better if we suppose that the records will be produced in some sort of canonical order. This assumption isn't unreasonable. Typically, a database will have a natural order dictated by the physical layout of the information on the disk and will always produce records in this natural order, at least until the data is modified. For example, our program for searching the web log file always produces matching records in the order they appear in the file. Even DBM files, which don't appear to keep records in any particular order, have a natural order; this is the order in which the records will be generated by the F function. Supposing that C<$a> and C<$b> will produce records in the same order, we can perform an 'or' operation as follows: =listing Iterator_Logic.pm package Iterator_Logic; use base 'Exporter'; @EXPORT = qw(i_or_ i_or i_and_ i_and i_without_ i_without); sub i_or_ { my ($cmp, $a, $b) = @_; my ($av, $bv) = ($a->(), $b->()); return sub { if (! defined $av && ! defined $bv) { return } elsif (! defined $av) { $rv = $bv; $bv = $b->() } elsif (! defined $bv) { $rv = $av; $av = $a->() } else { my $d = $cmp->($av, $bv); if ($d < 0) { $rv = $av; $av = $a->() } elsif ($d > 0) { $rv = $bv; $bv = $b->() } else { $rv = $av; $av = $a->(); $bv = $b->() } } return $rv; } } use Curry; BEGIN { *i_or = curry(\&i_or_) } =endlisting Iterator_Logic.pm F gets a comparator function, C<$cmp>, which defines the canonical order, and two iterators, C<$a> and C<$b>. It returns a new iterator which returns the next record from either C<$a> or C<$b> in the canonical order. If C<$a> and C<$b> both produce the same record, the duplicate is discarded. It begins by kicking C<$a> and C<$b> to obtain the next record from each. If either is exhausted, it returns the record from the other; if both are exhaused, it returns C to indicate that there are no more records. C<$rv> holds the record that is to be the return value. If both input iterators produce records, the new iterator compares the records to see which should come out first. If the comparator returns zero, it means the two records are the same, and only one of them should be emitted. C<$rv> is assigned one of the two records, as appropriate, and then one or both of the iterators is kicked to produce new records for the next call. The logic is very similar to the F function of R. In fact, F is the stream analog of the 'or' operator. F is a curried version of F, called like this: BEGIN { *numeric_or = i_or { $_[0] <=> $_[1] }; *alphabetic_or = i_or { $_[0] cmp $_[1] }; } $event_times = numeric_or($access_request_times, numeric_or($report_request_times, $server_start_times)); =note make sure you test this 'and' is similar: =contlisting Iterator_Logic.pm sub i_and_ { my ($cmp, $a, $b) = @_; my ($av, $bv) = ($a->(), $b->()); return sub { my $d; until (! defined $av || ! defined $bv || ($d = $cmp->($av, $bv)) == 0) { if ($d < 0) { $av = $a->() } else { $bv = $b->() } } return unless defined $av && defined $bv; my $rv = $av; ($av, $bv) = ($a->(), $b->()); return $rv; } } BEGIN { *i_and = curry \&i_and_ } =endlisting Iterator_Logic.pm =note the code here should be more nearly analogous to 'or'. =test and-or use Curry; use Iterator_Logic; my @a = (2, 3, 5, 7, 11, 13, 17); my @b = (1, 2, 3, 4, 5, 6, 7); my (@and, @or); { my %count; for (@a, @b) { $count{$_}++ } @and = grep $count{$_}==2, sort { $a <=> $b } keys %count; @or = grep $count{$_}!=0, sort { $a <=> $b } keys %count; } print "# and: @and\n"; print "# or: @or\n"; sub l2i { my @a = @_; my $i = 0; return sub { $a[$i++]; }; } { my $and = i_and(sub { $_[0] <=> $_[1] }, l2i(@a), l2i(@b)); my $andf = i_and(sub { $_[0] <=> $_[1] }); my $andc = $andf->(l2i(@a), l2i(@b)); for (@and) { is($and->(), $_, "and uncurried"); is($andc->(), $_, "and curried"); } is($and->(), undef, "and uncurried exhausted"); is($andc->(), undef, "and curried exhausted"); } { my $or = i_or(sub { $_[0] <=> $_[1] }, l2i(@a), l2i(@b)); my $orf = i_or(sub { $_[0] <=> $_[1] }); my $orc = $orf->(l2i(@a), l2i(@b)); for (@or) { is($or->(), $_, "or uncurried"); is($orc->(), $_, "or curried"); } is($or->(), undef, "or uncurried exhausted"); is($orc->(), undef, "or curried exhausted"); } =endtest and-or =section Databases Back in R we saw the beginnings of a database system that would manufacture an iterator containing the results of a simple query. To open the database we did my $dbh = FlatDB->new($datafile); and then to perform a query, $dbh->query($filename, $value); or $dbh->callbackquery(sub { ... }); which selects the records for which the subroutine returns true. Let's extend this system to handle compound queries. Eventually, we'll want the system to support calls like this: $dbh->select("STATE = 'NY' | OWES > 100 & STATE = 'MA'"); This will require parsing of the query string, which we'll see in detail in R. In the meantime, we'll build the internals that are required to support such queries. The internals for simple queries like C<"STATE = 'NY'"> are already done, since that's exactly what the C<$dbh-\>query('STATE', 'NY')> does. We can assume that other simple queries are covered by similar simple functions, or perhaps by calls to F. What we need now are ways to combine simple queries into compound queries. =note figure out the details of simple queries. For example, is there a good way to specify the operator first and the filename and value after? The F and F functions we saw earlier will do what we want, if we modify them suitably. The main thing we need to arrange is to define a canonical order for records produced by one of the simple query iterators. In particular, we need some way for the F and F operators to recognize that their two argument iterators have generated the same output record. The natural way to do this is to tag each record with a unique ID number as it comes out of the query. Two different records will have different ID numbers. For flat-file databases, there's a natural record ID number already to hand: the record number of the record in the file. We'll need to adjust the F function so that the iterators it returns will generate record numbers. When we last saw the F function, it returned each record as a single string; this is a good opportunity to have it return a more structured piece of data: =startlisting FlatDB_Composable.pm package FlatDB_Composable; use base 'FlatDB'; use base 'Exporter'; @EXPORT_OK = qw(query_or query_and query_not query_without); use Iterator_Logic; # usage: $dbh->query(fieldname, value) # returns all records for which (fieldname) matches (value) sub query { my $self = shift; my ($field, $value) = @_; my $fieldnum = $self->{FIELDNUM}{uc $field}; return unless defined $fieldnum; my $fh = $self->{FH}; seek $fh, 0, 0; <$fh>; # discard header line my $position = tell $fh; * my $recno = 0; return sub { local $_; seek $fh, $position, 0; while (<$fh>) { chomp; * $recno++; $position = tell $fh; my @fields = split $self->{FIELDSEP}; my $fieldval = $fields[$fieldnum]; * return [$recno, @fields] if $fieldval eq $value; } return; }; } =endlisting FlatDB_Composable.pm =test query-composable use FlatDB_Composable; do 'query-composable'; my $dbh = FlatDB_Composable->new('Programs/db.txt') or die $!; my $q = $dbh->query('STATE', 'NY'); # assume order of records is_deeply($q->(),[1,'Adler','David','New York','NY','157.00']); is_deeply($q->(),[5,'Schwern','Michael','New York','NY','149658.23']); is($q->(),undef); =endtest query-composable It might be tempting to try to use Perl's built-in X variable> here instead of carrying our own synthetic C<$recno>, but that's a bad idea. We took some pains to make sure that a single database filehandle could be shared among more than one query. However, the information for C<$.> is stored inside the filehandle; since we don't want the current record number to be shared between queries, we need to store it in the query object (which is private) rather than in the filehandle (which isn't). An alternative to maintaining a special C<$recno> variable would be to use C<$position> as a record identifier, since it's already lying around, and since it has the necesary properties of being different for different records and of increasing as the query proceeds through the file. Now we need to manufacture versions of F and F that use the record ID numbers when deciding what to pass along. Because these functions are curried, we don't need to rewrite any code to do this: =contlisting FlatDB_Composable.pm BEGIN { *query_or = i_or(sub { $_[0][0] <=> $_[1][0] }); *query_and = i_and(sub { $_[0][0] <=> $_[1][0] }); } =endlisting FlatDB_Composable.pm =contlisting FlatDB_Composable.pm invisible BEGIN { *query_without = i_without(sub { $_[0][0] <=> $_[1][0] }); } sub callbackquery { my $self = shift; my $is_interesting = shift; my $fh = $self->{FH}; seek $fh, 0, SEEK_SET; <$fh>; # discard header line my $position = tell $fh; * my $recno = 0; return sub { local $_; seek $fh, $position, SEEK_SET; while (<$fh>) { $position = tell $fh; chomp; * $recno++; my %F; my @fieldnames = @{$self->{FIELDS}}; my @fields = split $self->{FIELDSEP}; for (0 .. $#fieldnames) { $F{$fieldnames[$_]} = $fields[$_]; } * return [$recno, @fields] if $is_interesting->(%F); } return; }; } 1; =endlisting FlatDB_Composable.pm The comparator function says that arguments C<$_[0]> and C<$_[1]> will be arrays of record data, and that we should compare the first element of each, which is the record number, to decide which data should come out first and to decide record identity. =note with the provided dataset, this query isn't very interesting In R, we'll build a parser which, given this query: "STATE = 'NY' | OWES > 100 & STATE = 'MA'" makes this call: query_or($dbh->query('STATE', 'NY'), query_and($dbh->callbackquery(sub { $F{OWES} > 100 }), $dbh->query('STATE', 'MA') )) and returns the resulting iterator. In the meantime, we can manufacture the iterator manually. =test query-or use Iterator_Logic; BEGIN { *query_or = i_or(sub { $_[0][0] <=> $_[1][0] }); *query_and = i_and(sub { $_[0][0] <=> $_[1][0] }); } use FlatDB_Composable; do 'query-composable'; my $dbh = FlatDB_Composable->new('Programs/db.txt') or die $!; my $q = query_or($dbh->query('STATE', 'NY'), query_and($dbh->callbackquery(sub { $F{OWES} > 100 }), $dbh->query('STATE', 'MA') )); is_deeply($q->(),[1,'Adler','David','New York','NY','157.00']); is_deeply($q->(),[5,'Schwern','Michael','New York','NY','149658.23']); is($q->(),undef); =endtest query-or The one important logical connective that's still missing is 'not'. 'not' is a little bit peculiar, logically, because its meaning is tied to the original database. If C<$q> is a query for all the people in a database who are male, then C should produce all the people from the database who are female. But the C function can't do that without visiting the original database to find the female persons. Unlike the outputs of F and F, the output of F is not a selection of the inputs. One way around this is for each query to capture a reference back to the original database that it's a query on. An alternative is to specify the database explicitly, as C<$dbh-\>query_not($q)>. Then we can implement a more general operator on queries, the so-called X operator, also known as X. =contlisting Iterator_Logic.pm # $a but not $b sub i_without_ { my ($cmp, $a, $b) = @_; my ($av, $bv) = ($a->(), $b->()); return sub { while (defined $av) { my $d; while (defined $bv && ($d = $cmp->($av, $bv)) > 0) { $bv = $b->(); } if ( ! defined $bv || $d < 0 ) { my $rv = $av; $av = $a->(); return $rv; } else { $bv = $b->(); $av = $a->(); } } return; } } BEGIN { *i_without = curry \&i_without_; *query_without = i_without(sub { my ($a,$b) = @_; $a->[0] <=> $b->[0] }); } 1; =endlisting Iterator_Logic.pm If C<$a> and C<$b> are iterators on the same database, C is an iterator which produces every record that appears in C<$a> but I in C<$b>. This is useful on its own, and it also gives us a base for 'not', which becomes something like this: =contlisting FlatDB_Composable.pm sub query_not { my $self = shift; my $q = shift; query_without($self->all, $q); } C<$self-\>all> is a database method which performs a trivial query that disgorges all the records in the database. We could implement it specially, or, less efficiently, we could simply use sub all { $_[0]->callbackquery(sub { 1 }); } 1; =endlisting FlatDB_Composable.pm A possibly amusing note is that once we have F, we no longer need F, since (V and V) is the same as (V without (V without V)). =note cartesian product corresponds to a table join =test query-not 5 use FlatDB_Composable; # MJD ??? # *FlatDB::query_without = i_without(sub { my ($a,$b) = @_; $a->[0] <=> $b->[0] }); # do 'query-not'; my $dbh = FlatDB_Composable->new('Programs/db.txt') or die $!; my $q = $dbh->query_not($dbh->query('STATE', 'NY')); # problem #1: the wrong query function is beign called. we're # getting a string instead of an arrayref. it has something to do # with the query_not. #while (my $t = $q->()) { # ok(! ref $t, "$t is a plain string"); # unlike( $t, qr/:NY:/, "$t not in NY" ) #} # arrayref version while (my $t = $q->()) { print "# @$t\n"; isnt( $t->[4], "NY", "$t->[4] is not NY"); $count++; } is($count, 4, "four non-NY records"); =endtest query-not =subsection Operator Overloading R Perl provides a feature called X that lets us write complicated query expressions more conveniently. Operator overloading allows us to redefine of Perl's built-in operator symbols to have whatever meaning we like when they are applied to our objects. Enabling the feature is simple. First we make a small change to methods like F so that they return iterators that are blessed into package C. Then we add =startlisting FlatDB_Overloaded.pm invisible package FlatDB_Overloaded; BEGIN { for my $f (qw(and or without)) { *{"query_$f"} = \&{"FlatDB_Composable::query_$f"}; } } use base 'FlatDB_Composable'; sub query { $self = shift; my $q = $self->SUPER::query(@_); bless $q => __PACKAGE__; } sub callbackquery { $self = shift; my $q = $self->SUPER::callbackquery(@_); bless $q => __PACKAGE__; } 1; =endlisting FlatDB_Overloaded.pm =contlisting FlatDB_Overloaded.pm use overload '|' => \&query_or, '&' => \&query_and, '-' => \&query_without, 'fallback' => 1; =endlisting FlatDB_Overloaded.pm at the top of F. From then on, any time a C object participates in an C<|> or C<&> operation, the specified function will be invoked instead. Now, given the following simple queries: my ($ny, $debtor, $ma) = ($dbh->query('STATE', 'NY'), $dbh->callbackquery(sub { $F{OWES} > 100 }), $dbh->query('STATE', 'MA') ); we'll be able to replace this: my $interesting = query_or($ny, query_and($debtor, $ma)) with this: my $interesting = $ny | $debtor & $ma; The operators are still Perl's built-in operators, and so obey the usual precedence and associativity rules. In particular, C<&> has higher precedence than C<|>. =test query-overload use FlatDB_Overloaded; my $dbh = FlatDB_Overloaded->new('Programs/db.txt') or die $!; my ($ny, $debtor, $ma) = ($dbh->query('STATE', 'NY'), $dbh->callbackquery(sub { $F{OWES} > 100 }), $dbh->query('STATE', 'MA') ); my $q = $ny | $debtor & $ma; print "# --$q--\n"; # <-- perl bug? note the non hex values is_deeply($q->(),[1,'Adler','David','New York','NY','157.00']); is_deeply($q->(),[5,'Schwern','Michael','New York','NY','149658.23']); is($q->(),undef); ($ny, $debtor, $ma) = ($dbh->query('STATE', 'NY'), $dbh->callbackquery(sub { my %F=@_; $F{OWES} > 200 }), $dbh->query('STATE', 'MA') ); my $q = $ny & $debtor | $ma; print "# --$q--\n"; # <-- perl bug? note the non hex values is($q->()->[0],2); is($q->()->[0],4); is($q->()->[0],5); is($q->(),undef); =endtest query-overload =Stop =section Products So far nearly all of our transformations of stream values have been essentially one-dimensional, like F and F, which take as input sequences of values and which emit sequences of new values in the same order. Similarly, functions like F take in two sequences and return a single sequence. One important type of operation we haven't covered so far produces a fundamentally two-dimensional result. The simplest example is called the X. Given two streams, (M, M, M, ...) and (M, M, M, ...) the Cartesian product contains all possible pairs of elements from the two streams: (a1, b1) (a1, b2) (a1, b3) ... (a2, b1) (a2, b2) (a2, b3) ... (a3, b1) (a3, b2) (a3, b3) ... ... To see why this might be useful, consider the Unix shell's wildcard expansion. An expression of the form C<{a,b,c}> in the shell will expand to each of the strings C, C, and C. So, for example, the command rm tricks.{ps,pdf} is equivalent to rm tricks.ps tricks.pdf One can combine wildcards; this: rm tricks.{ps,pdf}{,.gz} expands to this: rm tricks.ps tricks.ps.gz tricks.pdf tricks.pdf.gz (The C<{,.gz}> expands to either the empty string or to "C<.gz>".) The general rule for expanding such expressions is that an expression of the form A{B,C,..,Y}Z expands to ABZ ACZ ... AYZ If any of C, C, ... C, or C contains further braces, then the result above is expanded further. So, returning the the example above, rm tricks.{ps,pdf}{,.gz} is expanded first to rm tricks.ps{,.gz} tricks.pdf{,.gz} (here C is "C" and C is "C<{,.gz}>"); then C is expanded: rm tricks.ps tricks.ps.gz tricks.pdf{,.gz} and finally rm tricks.ps tricks.ps.gz tricks.pdf tricks.pdf.gz =endlisting i_without =note missing section on products like cartesian product and 'zip' Is pythagorean triples the best example you can come up with? =note Example 1: Shell curly brace expansion. Example 2: SQL table joins. Note that cartesian product of finite lists is easy; just use nested loops. This generates the matrix starting with the first row, then the second row, etc. This strategy clearly doesn't work for infinite lists, because the first row of the resulting matrix is infinite. Instead we have to adopt a diagonal approach. * Text-based database? (Tibia?) demonstrate composition operators * 20010824 Program (with iterators) to generate all the strings that match a certain regex. Use Japhy's parser? Have a bounded-length and an unbounded-length version. Interesting composition operators here. * append, ordered merge, ordered and, cartesian product * 20020121 Maybe move the currying section here from Chap. VI? * 20020123 More examples of functions produced by 'reduce'? Where did you read about the use of 'fold', anyway? One of those papers in ~mjd/misc/papers/FP? One possibility: Graham Hutton's paper "A tutorial on the universality and expressiveness of fold" in .../fold.ps. Also I think it was in the Bird and Wadler book on functional programming. Examples: Given list of digits, construct base-10 number. Given list of path components, resolve to inodes. (e.g., ("/", "usr", "local", "bin", "ezmlm", "ezmlm-make") => (2, 277985, 2130089, 3229158, ...) or resolve to true paths: -> ("/", "/usr", "/data/local", "/data/local/bin", ...). Given a hash and a list of keys, produce a list of values. * 20020121 Don't forget to use game trees as an example somewhere. Reread the Hughes paprt on why FP matters; it has an extensive example of how to build a game tree search using functional techniques. * 20020123 Function to turn streams into chapter-4 iterators. * 20030919 Function that takes a binary operator and turns it into an any-ary operator. e.g., turns a+b into sum(@p); gcd(a,b) into gcd(@p); concat2() into concatmany(); merge2 into merge * You should be able to define generic and and or operators that work reasonably on any kind of iterator. * Analogues of 'fold' for other data structures. Generic function for HTML trees: supply callback for empty tree, for text node, for tag node. Generic function for directory hierarchies: Supply callback for directory, for nondirectory file. I think they won't disappear entirely, but they will turn into a single chapter. Here's a revised outline: VII. Higher-order functions * Refer back to currying examples from Ch. VI; 'function factories' * Curried file tree walker (variation on Ch. I example) * curried version of ordered merge from Ch. VI * 'reduce' and 'combine' * For lists * For streams * Abstract versions of 'reduce' and 'combine' * 'fold' * 'fold' is universal! * Extended example: Database queries * simple queries ('name = "Smith"') * 'and'; 'or'; 'not' * operator overloading (sets up technique for Ch. IX)