Return-Path: <mjd@plover.com>
Mailing-List: contact mjd-book-help@plover.com; run by ezmlm
Delivered-To: mailing list mjd-book@plover.com
Received: (qmail 3771 invoked by uid 119); 17 Oct 2001 05:31:29 -0000
Date: 17 Oct 2001 05:31:29 -0000
Message-ID: <20011017053129.3770.qmail@plover.com>
From: mjd@plover.com
To: mjd-book@plover.com
Subject: Mark Dominus exciting book news: Chapter IV finished at last

If you forgot what this list is about, or you don't know why you're
getting this message, please see

        http://perl.plover.com/book/

To unsubscribe, send a blank message to mjd-book-unsubscribe@plover.com.

The draft of chapter IV is done!  It took about four months.  One
reason is that I kept going away to conferences and such over the
summer and couldn't work.  The important reason is that chapter IV is
HUGE.  Towards the end, I was starting to feel like I was trapped in
Zeno's Paradox, with the end of the chapter as the slowly-advancing
tortoise that I somehow never seemed to catch.  But I did finally
catch up to it.  Back in September, I estimated that the final length
would be about 28,000-32,000 words.  It turned out to be 31,533.
I may end up splitting it into smaller chapters.

Chapter IV is about iterators, which are like a generalization of
filehandles: They're objects that produce data on demand.  The big
benefit of programming with iterators is that you can interrupt a
long-running task in the middle, go off and do something else, and
then pick it up again later, continuing where you left off.  This, of
course, is the great benefit of filehandles, so it's not surprising
that it turns out to be so useful to apply the same interface to other
sources of data.  
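
To make the idea concrete, here is a minimal sketch of an iterator
built from a closure.  It is written in Python rather than Perl and is
not taken from the chapter; the name upto and the convention of
returning None when the data runs out are assumptions made for the
illustration.

```python
def upto(m, n):
    """Return an iterator: a function that produces m, m+1, ..., n, then None."""
    current = m

    def next_value():
        nonlocal current
        if current > n:
            return None           # exhausted: nothing more to produce
        value = current
        current += 1
        return value

    return next_value


it = upto(1, 5)
first_two = [it(), it()]          # start the task...
interruption = sum(range(10))     # ...go off and do something else...
rest = [it(), it(), it()]         # ...then continue where we left off
done = it()                       # None, because the iterator is exhausted
```

The iterator keeps its own position between calls, so the caller can
stop and resume at any point, much as with reading from a filehandle.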

This is really going to pay off in chapter VIII when I discuss parsing
strategies, but in the meantime I've found some excellent examples.  I
spend a lot of time talking about directory hierarchy traversal,
because it's a common task that also serves as a template for a lot of
other hierarchy traversal and search problems.  There's an example of
a database query system where the database is your HTTP log file.  Big
deal, it's just flat file databases.  Oh, but this one searches the
log file backwards, producing the most recent matching records before
the older ones.  Since you're probably most interested in the recent
records, you don't have to grovel over the entire file just to get to
the end.  And once you find what you're looking for, you can throw
away the query handle, just like with a regular filehandle.
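
A rough sketch of the backwards search, in Python and not the
chapter's code: lines_backwards reads the file in blocks from the end
and produces the last line first.  The function name, the block size,
and the sample log format are all assumptions made for the
illustration.

```python
import os
import tempfile


def lines_backwards(path, block_size=4096):
    """Yield the lines of a file, last line first, reading blocks from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        if position == 0:
            return                      # empty file: no lines at all
        tail = b""                      # start of a line whose beginning is not read yet
        first = True
        while position > 0:
            step = min(block_size, position)
            position -= step
            f.seek(position)
            lines = (f.read(step) + tail).split(b"\n")
            tail = lines.pop(0)         # may be incomplete; finished on a later pass
            if first and lines and lines[-1] == b"":
                lines.pop()             # ignore the empty piece after a final newline
            first = False
            for line in reversed(lines):
                yield line.decode("utf-8", errors="replace")
        yield tail.decode("utf-8", errors="replace")


# A made-up four-line log, oldest record first.
log_bytes = (
    b"10.0.0.1 GET /index.html 200\n"
    b"10.0.0.2 GET /missing.html 404\n"
    b"10.0.0.3 GET /book/ 200\n"
    b"10.0.0.4 GET /gone.html 404\n"
)
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(log_bytes)
    log_path = tmp.name

records = lines_backwards(log_path, block_size=16)
query = (line for line in records if line.endswith(" 404"))
newest = next(query)                    # most recent matching record
second = next(query)                    # the one before that
records.close()                         # throw the query away; this closes the file
os.remove(log_path)
```

Only as much of the file is read as is needed to satisfy the requests
actually made, and closing the iterator abandons the rest.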

The chapter closes out by developing a complete web spider
application, roughly equivalent to WWW::SimpleRobot.  But it has a few
twists: It respects robots.txt files, which WWW::SimpleRobot doesn't,
and it only requires half as much code, because the canned iterator
functions I develop in the chapter make it easy to plug together small
parts.
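
As an illustration of how small iterator parts can be plugged
together, here is a sketch in Python.  It is not the spider from the
chapter: the in-memory WEB table stands in for real HTTP fetches, the
DISALLOWED prefix stands in for a robots.txt check, and the names
imap, igrep, and traverse are chosen for the example.

```python
from collections import deque


def imap(transform, it):
    """Iterator producing transform(x) for each x from `it`."""
    return (transform(x) for x in it)


def igrep(keep, it):
    """Iterator producing only those x from `it` for which keep(x) is true."""
    return (x for x in it if keep(x))


def traverse(start, links_of):
    """Breadth-first traversal: produce each reachable page once, on demand."""
    queue = deque([start])
    seen = {start}
    while queue:
        page = queue.popleft()
        yield page
        for nxt in links_of(page):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


# Hypothetical stand-in for the web: each page maps to the pages it links to.
WEB = {
    "/": ["/book/", "/private/"],
    "/book/": ["/book/chap04.html", "/"],
    "/book/chap04.html": [],
    "/private/": ["/private/draft.html"],
    "/private/draft.html": [],
}
DISALLOWED = ("/private/",)             # stand-in for robots.txt rules


def allowed(url):
    return not url.startswith(DISALLOWED)


def links_of(url):
    return igrep(allowed, WEB.get(url, []))


# Pair each visited page with its number of outgoing links.
spider = imap(lambda url: (url, len(WEB[url])), traverse("/", links_of))
report = list(spider)
```

The traversal, the filter, and the transformation are written
separately and combined in one line, which is the kind of economy the
canned iterator functions are meant to provide.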

The complete outline is available at:

        http://perl.plover.com/book/chap04.html

I'm going to take a couple of days off, and then start work on the
next chapter, maybe V, or 5.5.  Some advance information about the
mysterious chapter 5.5 is available from

        http://perl.plover.com/book/announce/03

Whatever I do next, you will be able to track my daily progress at

        http://perl.plover.com/book/chap05.html


----------------------------------------------------------------


As promised, I've made the draft text of chapter IV available online.

     [ Sorry, freebies are available only to mailing list subscribers. 
       Send mail to 
             mjd-book-subscribe@plover.com
       to subscribe.
     ]

I'd like to remind you that you absolutely must not distribute this to
anyone else, and in fact I'd prefer it if you didn't even download a
permanent copy to your computer.  When the book is finally published,
I will be very unhappy if crappy old drafts are circulating on the
Internet.  If the drafts started circulating, I would have to stop
making them available.  That would be a shame, because I like sharing
the drafts.

It might be tempting to save a copy, because you probably are thinking
that you don't know how long the file will be around.  But there's no
point in doing this.  Once the book is published, the complete,
revised, corrected version will be available on my web site for free,
so you will have no use for the old draft copies.  Please make me
happy and repay my trust by not copying or distributing the drafts.

Also, please do not advertise the draft URL to other people.  I want
the drafts to be a special present for the people on my mailing list.
But please do advertise the mailing list to other people.  I will be
very happy if more people join the mailing list.  To subscribe, send
email to mjd-book-subscribe@plover.com.

Thank you all for your interest.  I will send another message next
time something happens.  I hope the wait will not be as long as last
time.

