2 # This is a .pm just to (try to) make some CPAN document converters
3 # convert it happily as part of the dist's documentation tree.
4 package HTML::Element::traverse;
5 # Time-stamp: "2002-11-22 23:53:39 MST"
7 $VERSION = $VERSION = $HTML::Element::VERSION;
14 HTML::Element::traverse - discussion of HTML::Element's traverse method
18 # $element->traverse is unnecessary and obscure.
19 # Don't use it in new code.
23 C<HTML::Element> provides a method C<traverse> that traverses the tree
24 and calls user-specified callbacks for each node, in pre- or
25 post-order. However, use of the method is quite superfluous: if you
26 want to recursively visit every node in the tree, it's almost always
27 simpler to write a subroutine does just that, than it is to bundle up
28 the pre- and/or post-order code in callbacks for the C<traverse>
33 Suppose you want to traverse at/under a node $tree and give elements
34 an 'id' attribute unless they already have one.
36 You can use the C<traverse> method:
39 my $counter = 'x0000';
40 $start_node->traverse(
45 $x->attr('id', $counter++) unless defined $x->attr('id');
46 return HTML::Element::OK; # keep traversing
48 # post-order callback:
51 1, # don't call the callbacks for text nodes
55 or you can just be simple and clear (and not have to understand the
56 calling format for C<traverse>) by writing a sub that traverses the
57 tree by just calling itself:
60 my $counter = 'x0000';
63 $x->attr('id', $counter++) unless defined $x->attr('id');
64 foreach my $c ($x->content_list) {
65 give_id($c) if ref $c; # ignore text nodes
71 See, isn't that nice and clear?
73 But, if you really need to know:
75 =head1 THE TRAVERSE METHOD
77 The C<traverse()> method is a general object-method for traversing a
78 tree or subtree and calling user-specified callbacks. It accepts the
83 =item $h->traverse(\&callback)
85 =item or $h->traverse(\&callback, $ignore_text)
87 =item or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
91 These all mean to traverse the element and all of its children. That
92 is, this method starts at node $h, "pre-order visits" $h, traverses its
93 children, and then will "post-order visit" $h. "Visiting" means that
94 the callback routine is called, with these arguments:
96 $_[0] : the node (element or text segment),
97 $_[1] : a startflag, and
100 If the $ignore_text parameter is given and true, then the pre-order
101 call I<will not> be happen for text content.
103 The startflag is 1 when we enter a node (i.e., in pre-order calls) and
104 0 when we leave the node (in post-order calls).
106 Note, however, that post-order calls don't happen for nodes that are
107 text segments or are elements that are prototypically empty (like "br",
110 If we visit text nodes (i.e., unless $ignore_text is given and true),
111 then when text nodes are visited, we will also pass two extra
112 arguments to the callback:
114 $_[3] : the element that's the parent
116 $_[4] : the index of this text node
117 in its parent's content list
119 Note that you can specify that the pre-order routine can
120 be a different routine from the post-order one:
122 $h->traverse( [\&pre_callback,\&post_callback], ...);
124 You can also specify that no post-order calls are to be made,
125 by providing a false value as the post-order routine:
127 $h->traverse([ \&pre_callback,0 ], ...);
129 And similarly for suppressing pre-order callbacks:
131 $h->traverse([ 0,\&post_callback ], ...);
133 Note that these two syntaxes specify the same operation:
135 $h->traverse([\&foo,\&foo], ...);
136 $h->traverse( \&foo , ...);
138 The return values from calls to your pre- or post-order
139 routines are significant, and are used to control recursion
142 These are the values you can return, listed in descending order
143 of my estimation of their usefulness:
147 =item HTML::Element::OK, 1, or any other true value
149 ...to keep on traversing.
151 Note that C<HTML::Element::OK> et
152 al are constants. So if you're running under C<use strict>
153 (as I hope you are), and you say:
154 C<return HTML::Element::PRUEN>
155 the compiler will flag this as an error (an unallowable
156 bareword, specifically), whereas if you spell PRUNE correctly,
157 the compiler will not complain.
159 =item undef, 0, '0', '', or HTML::Element::PRUNE
161 ...to block traversing under the current element's content.
162 (This is ignored if received from a post-order callback,
163 since by then the recursion has already happened.)
164 If this is returned by a pre-order callback, no
165 post-order callback for the current node will happen.
166 (Recall that if your callback exits with just C<return;>,
167 it is returning undef -- at least in scalar context, and
168 C<traverse> always calls your callbacks in scalar context.)
170 =item HTML::Element::ABORT
172 ...to abort the whole traversal immediately.
173 This is often useful when you're looking for just the first
174 node in the tree that meets some criterion of yours.
176 =item HTML::Element::PRUNE_UP
178 ...to abort continued traversal into this node and its parent
179 node. No post-order callback for the current or parent
182 =item HTML::Element::PRUNE_SOFTLY
184 Like PRUNE, except that the post-order call for the current
189 Almost every task to do with extracting information from a tree can be
190 expressed in terms of traverse operations (usually in only one pass,
191 and usually paying attention to only pre-order, or to only
192 post-order), or operations based on traversing. (In fact, many of the
193 other methods in this class are basically calls to traverse() with
194 particular arguments.)
196 The source code for HTML::Element and HTML::TreeBuilder contain
197 several examples of the use of the "traverse" method to gather
198 information about the content of trees and subtrees.
200 (Note: you should not change the structure of a tree I<while> you are
203 [End of documentation for the C<traverse()> method]
205 =head2 Traversing with Recursive Anonymous Routines
207 Now, if you've been reading
208 I<Structure and Interpretation of Computer Programs> too much, maybe
209 you even want a recursive lambda. Go ahead:
212 my $counter = 'x0000';
216 $x->attr('id', $counter++) unless defined $x->attr('id');
217 foreach my $c ($x->content_list) {
218 $give_id->($c) if ref $c; # ignore text nodes
221 $give_id->($start_node);
225 It's a bit nutty, and it's I<still> more concise than a call to the
228 It is left as an exercise to the reader to figure out how to do the
229 same thing without using a C<$give_id> symbol at all.
231 It is also left as an exercise to the reader to figure out why I
232 undefine C<$give_id>, above; and why I could achieved the same effect
235 $give_id = 'I like pie!';
239 $give_id = sub { print "Mmmm pie!\n" };
243 $give_id = sub { print "I'm $give_id and I like pie!\n" };
245 $give_id = \$give_id;
247 $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
249 =head2 Doing Recursive Things Iteratively
251 Note that you may at times see an iterative implementation of
252 pre-order traversal, like so:
255 my @to_do = ($tree); # start-node
257 my $this = shift @to_do;
260 $this->attr('id', $counter++)
261 unless defined $this->attr('id');
263 unshift @to_do, grep ref $_, $this->content_list;
264 # Put children on the stack -- they'll be visited next
268 This can I<under certain circumstances> be more efficient than just a
269 normal recursive routine, but at the cost of being rather obscure. It
270 gains efficiency by avoiding the overhead of function-calling, but
271 since there are several method dispatches however you do it (to
272 C<attr> and C<content_list>), the overhead for a simple function call
275 =head2 Pruning and Whatnot
277 The C<traverse> method does have the fairly neat features of
278 the C<ABORT>, C<PRUNE_UP> and C<PRUNE_SOFTLY> signals. None of these
279 can be implemented I<totally> straightforwardly with recursive
280 routines, but it is quite possible. C<ABORT>-like behavior can be
281 implemented either with using non-local returning with C<eval>/C<die>:
283 my $died_on; # if you need to know where...
286 ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
287 ... else call thing($child) for each child...
288 ...any post-order visiting $_[0]...
290 eval { thing($node) };
292 if($@ =~ m<^ABORT_TRAV>) {
293 ...it died (aborted) on $died_on...
295 die $@; # some REAL error happened
299 or you can just do it with flags:
301 my($abort_flag, $died_on);
304 ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
305 foreach my $c ($_[0]->content_list) {
307 return if $abort_flag;
309 ...any post-order visiting $_[0]...
313 $abort_flag = $died_on = undef;
315 ...if defined $abort_flag, it died on $died_on
323 Copyright 2000,2001 Sean M. Burke
327 Sean M. Burke, E<lt>sburke@cpan.orgE<gt>