Return-Path:
Mailing-List: contact mjd-book-help@plover.com; run by ezmlm
Delivered-To: mailing list mjd-book@plover.com
Received: (qmail 3771 invoked by uid 119); 17 Oct 2001 05:31:29 -0000
Date: 17 Oct 2001 05:31:29 -0000
Message-ID: <20011017053129.3770.qmail@plover.com>
From: mjd@plover.com
To: mjd-book@plover.com
Subject: Mark Dominus exciting book news: Chapter IV finished at last

If you forgot what this list is about, or you don't know why you're getting this message, please see

    http://perl.plover.com/book/

To unsubscribe, send a blank message to mjd-book-unsubscribe@plover.com.

The draft of chapter IV is done! It took about four months. One reason is that I kept going away to conferences and such over the summer and couldn't work. The important reason is that chapter IV is HUGE. Towards the end, I was starting to feel like I was trapped in Zeno's Paradox, with the end of the chapter as the slowly-advancing tortoise that I somehow never seemed to catch. But I did finally catch up to it.

Back in September, I estimated that the final length would be about 28,000-32,000 words. It turned out to be 31,533. I may end up splitting it into smaller chapters.

Chapter IV is about iterators, which are like a generalization of filehandles: They're objects that produce data on demand. The big benefit of programming with iterators is that you can interrupt a long-running task in the middle, go off and do something else, and then pick it up again later, continuing where you left off. This, of course, is the great benefit of filehandles, so it's not surprising that it turns out to be so useful to apply the same interface to other sources of data.

This is really going to pay off in chapter VIII when I discuss parsing strategies, but in the meantime I've found some excellent examples. I spend a lot of time talking about directory hierarchy traversal, because it's a common task that also serves as a template for a lot of other hierarchy traversal and search problems.

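To make the filehandle analogy concrete, here is a rough sketch of a directory-walking iterator. It is not taken from the chapter: it is written in Python rather than Perl, and the name dir_walker and the convention of returning None when the traversal is finished are assumptions made purely for illustration.

```python
import os

def dir_walker(top):
    """Return a function that produces one path per call, or None when done.

    Hypothetical illustration only; not the code from the chapter.
    """
    pending = [top]  # paths that have been discovered but not yet returned

    def next_path():
        if not pending:
            return None  # nothing left to visit
        path = pending.pop()
        if os.path.isdir(path) and not os.path.islink(path):
            # Queue the directory's entries. They are pushed in reverse
            # name order so that pop() hands them back in name order.
            for name in sorted(os.listdir(path), reverse=True):
                pending.append(os.path.join(path, name))
        return path

    return next_path
```

All of the traversal state lives in the pending list, so the caller can ask for a few paths, go off and do something else, and call the function again later to continue where it left off, much as with reading a few lines from a filehandle.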
There's an example of a database query system where the database is your HTTP log file. Big deal, it's just flat file databases. Oh, but this one searches the log file backwards, producing the most recent matching records before the older ones. Since you're probably most interested in the recent records, you don't have to grovel over the entire file just to get to the end. And once you find what you're looking for, you can throw away the query handle, just like with a regular filehandle.

The chapter closes out by developing a complete web spider application, roughly equivalent to WWW::SimpleRobot. But it has a few twists: It respects robots.txt files, which WWW::SimpleRobot doesn't, and it only requires half as much code, because the canned iterator functions I develop in the chapter make it easy to plug together small parts.

The complete outline is available at:

    http://perl.plover.com/book/chap04.html

I'm going to take a couple of days off, and then start work on the next chapter, maybe V, or 5.5. Some advance information about the mysterious chapter 5.5 is available from

    http://perl.plover.com/book/announce/03

Whatever I do next, you will be able to track my daily progress at

    http://perl.plover.com/book/chap05.html

----------------------------------------------------------------

As promised, I've made the draft text of chapter IV available online.

[ Sorry, freebies are available only to mailing list subscribers. Send mail to mjd-book-subscribe@plover.com to subscribe. ]

I'd like to remind you that you absolutely must not distribute this to anyone else, and in fact I'd prefer it if you didn't even download a permanent copy to your computer. When the book is finally published, I will be very unhappy if crappy old drafts are circulating on the Internet. If the drafts started circulating, I would have to stop making them available. That would be a shame, because I like sharing the drafts.

It might be tempting to save a copy, because you probably are thinking that you don't know how long the file will be around. But there's no point to doing this. Once the book is published, the complete, revised, corrected version will be available on my web site for free, so you will have no use for the old draft copies. Please make me happy and repay my trust by not copying or distributing the drafts.

Also, please do not advertise the draft URL to other people. I want the drafts to be a special present for the people on my mailing list. But please do advertise the mailing list to other people. I will be very happy if more people join the mailing list. To subscribe, send email to mjd-book-subscribe@plover.com.

Thank you all for your interest. I will send another message next time something happens. I hope the wait will not be as long as last time.