1 Changelog for HTML-Tree
3 3.23 Sun Nov 12 11:09:31 CST 2006
4 [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
* Mark-Jason Dominus points out that the fix for as_html was not
proper, and broken behavior should never be codified. Fixed
as_html so an empty string doesn't encode entities, instead of
blaming the behavior on HTML::Entities. (RT 18571)
10 3.22 Sat Nov 11 21:23:22 CST 2006
11 [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
12 * HTML::Element::as_XML now only escapes five characters, instead
13 of escaping everything but alphanumerics and spaces. This is
14 more in line with the XML spec, and will no longer escape wide
15 characters as two (or more) entities. Resolves RT 14260. Thanks
16 to Carl Franks and somewhere [at] confuzzled.lu for assistance.
19 * A string comparison was commented to use lc() on both sides, but
20 didn't. This caused HTML::Element::look_down to not properly find
21 elements in certain cases. Thanks to Andrew Suhachov. (RT 21114)
24 * Added several new tests and enhanced others. Thanks to Rocco
25 Caputo for t/attributes.t, and several others for providing
26 test cases in their RT bugs.
29 * Fixed description of HTML::Element::all_attr_names. Thanks
30 to dsteinbrunner [at] pobox.com for catching it.
31 * Fixed example code in HTML::Element::push_content. Thanks
32 to dsteinbrunner [at] pobox.com for catching it. (RT 21293)
33 * Fixed description of HTML::Element::as_HTML. Thanks to
34 Mark-Jason Dominus for catching it. (RT 18569)
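
The as_XML change listed under 3.22 can be pictured with a small sketch.
It assumes the five characters are the ones with predefined XML entities
(& < > " and '), and it emits numeric references, in line with the older
note that as_XML always uses numeric entities. The name xml_escape_sketch
is hypothetical; this is not the module's own routine.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not HTML::Element's actual code.
# Assumption: the five escaped characters are & < > " and '
sub xml_escape_sketch {
    my ($text) = @_;
    # Replace each of the five characters with a numeric character reference.
    # Everything else, including wide characters, is left untouched.
    $text =~ s/([&<>"'])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print xml_escape_sketch(q{Tom & "Jerry" <b>}), "\n";
```

Under this approach a wide character passes through as a single character
instead of being split into two or more entities.
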
37 3.21 Sun Aug 6 19:10:00 CDT 2006
* Updated HTML::Parser requirement to 3.46 to fix a bug in
tag-rendering.t, noted in RT 20816 and 19796. Thanks to
Gordon Lack and Ricardo Signes.
* Fixed HTML::TreeBuilder to not remove &nbsp; where it shouldn't,
using patch supplied in RT 17481. Thanks to Chris Madsen.
46 * HTML-Tree has a new maintainer: Pete Krawczyk <petek@cpan.org>
48 3.20 Sun Jun 4 22:24:38 CDT 2006
49 No code changes. Just making sure all notes go to Andy Lester,
52 3.19_04 Wed Feb 1 09:57:35 PST 2006
54 * Modified starttag() so that it could render a literal HTML::Element
55 correctly. Added a test case for this in tag-rendering.t
56 Thanks to Terrence Brannon.
59 3.19_03 Fri Nov 25 22:20:51 CST 2005
60 [THINGS THAT MAY BREAK YOUR CODE]
61 * The store_declarations() method has been restored, but defaults
62 to true instead of false.
65 3.19_02 Thu Nov 24 22:51:40 CST 2005
67 [THINGS THAT MAY BREAK YOUR CODE]
68 * The store_declarations() method has been removed.
69 * Non-closing HTML tags like <IMG> are now rendered as <IMG />.
70 * All values in tags are now double-quoted. Previously, all-numeric
71 values weren't quoted.
74 * The DOCTYPE declaration now always gets put back at the top of
75 the recreated document. Thanks, Terrence Brannon.
76 * Non-closing HTML tags like <IMG> are now rendered as <IMG />.
77 Thanks to Ian Malpass.
78 * All values in tags are now double-quoted.
81 * Updated docs from Terrence Brannon.
84 2005-11-09 Andy Lester
86 Release 3.19_01 -- No new functionality. New tests, though!
87 Thanks to the Chicago Perl Mongers for their work.
89 2003-09-15 Sean M. Burke <sburke@cpan.org>
91 Release 3.18 -- bugfix to test, adding qr// to look_(down|up)
93 Accepting Iain 'Spoon' Truskett's neat patch for qr// as lookdown
94 operators (previously you had to do sub { $_[0]=~ m/.../}).
96 Rewrote some tests, notably parsefile.t, which was pointlessly
97 failing because of an incompatibility with an HTML::Parser version.
99 Removed the disused ancient utils "dtd2pm" and "ent" from the dist.
104 2003-01-18 Sean M. Burke <sburke@cpan.org>
106 Release 3.17 -- minor bugfix
108 HTML::Element : Making as_HTML('',...) work just like
109 as_HTML(undef,...). Also fixing as_XML's docs to remove mention of
110 an unimplemented feature (specifying what characters to escape).
113 2002-11-06 Sean M. Burke <sburke@cpan.org>
115 Release 3.16 -- just fixing a doc typo.
118 2002-11-06 Sean M. Burke <sburke@cpan.org>
120 Release 3.15 -- a few new features.
122 Added the aliases "descendents" and "find" to HTML::Element.
124 Added a new method "simplify_pres" to HTML::Element.
127 2002-10-19 Sean M. Burke <sburke@cpan.org>
128 Release 3.14 -- minor bugfix
130 Just fixes a few problems in HTML::Element with the number_lists
134 2002-08-16 Sean M. Burke <sburke@cpan.org>
136 Release 3.13 -- basically a bugfix version
138 It turns out that 3.12 had a hideous HTML::TreeBuilder bug that
139 made the whole thing damn near useless. Fixed.
140 Many many thanks to Michael Koehne for catching this!
142 Wrote t/parse.t, to catch this sort of thing from happening again.
144 Fixed a bug that would treat <td> outside any table context
145 as <tr><table><td> instead of <table><tr><td>
149 2002-07-30 Sean M. Burke <sburke@cpan.org>
153 Added as_trimmed_text method to HTML::Element, as described
154 (prophesied?) in the fantabulous new book /Perl & LWP/.
156 Bugfix: fixed unshift_content when given a LoL. (_parent wasn't
159 HTML::Element and HTML::TreeBuilder now enforce at least some
160 minimal sanity on what can be in a tag name. (Notably, no spaces,
161 slashes, or control characters.)
163 Semi-bugfix: $element->replace_with(...) can now take LoLs in its
166 Bumped HTML::Element version up to 3.12 (right from 3.09)
168 Semi-bugfix: as_XML now doesn't use named entities in its return
169 value -- it always uses numeric entities.
Added behavior: new_from_lol can now do clever things in list context.
174 HTML::Tree -- added blurb for /Perl & LWP/
176 HTML::TreeBuilder -- added blurb for /Perl & LWP/
177 Also added a few tweaks to do better with XHTML parsing.
178 Added guts() and disembowel() methods, for parsing document fragments.
181 TODO: desperately need to add tests to t/
184 2001-03-14 Sean M. Burke <sburke@cpan.org>
188 Bugfix: Klaus-Georg Adams <Klaus-Georg.Adams@sap.com> reported that
189 the topmost frameset element in an HTML::TreeBuilder tree wasn't
190 getting its _parent attribute set. Fixed.
192 Minor bugfix: the root element of a new HTML::TreeBuilder tree was
193 missing its initial "_implicit" attribute. Fixed.
195 Two handy new methods in HTML::TreeBuilder:
196 * HTML::TreeBuilder->new_from_content(...)
197 * HTML::TreeBuilder->new_from_file($filename)
198 a.k.a.: HTML::TreeBuilder->new_from_file($fh)
200 2001-03-10 Sean M. Burke <sburke@cpan.org>
Now bundling three relevant The Perl Journal articles by me:
HTML::Tree::AboutObjects, HTML::Tree::AboutTrees, and
HTML::Tree::Scanning.
208 Vadims_Beilins@swh-t.lv observes that $h->push_content(LoL)
209 doesn't take care of _parent bookkeeping right. FIXED.
210 John Woffindin <john@xoren.co.nz> notes a similar bug in clone();
213 Adding no_space_compacting feature to TreeBuilder, at suggestion of
214 Victor Wagner <vitus@ice.ru>.
216 Incorporating the clever suggestion (from Martin H. Sluka,
217 <martin@sluka.de>) that $element->extract_links's returned LoL
218 should contain a third item (for the attribute name) in the
219 per-link listref. I also add a fourth item, the tagname of the
222 New method, "elementify", in HTML::TreeBuilder.
224 Various improvements and clarifications to the POD in
225 HTML::TreeBuilder and HTML::Element.
227 Some new methods in HTML::Element: "number_lists",
228 "objectify_text", and "deobjectify_text".
230 HTML::Element and HTML::TreeBuilder versions both bumped up from
231 3.08 to 3.10, to keep pace with the HTML::Tree version.
233 2001-01-21 Sean M. Burke <sburke@cpan.org>
237 Changed HTML/Element/traverse.pod to HTML/Element/traverse.pm
239 Wrote overview file: HTML/Tree.pm
241 2000-11-03 Sean M. Burke <sburke@cpan.org>
245 In Element and TreeBuilder: fixed handling of textarea content --
246 Thanks to Ronald J Kimball <rjk@linguist.dartmouth.edu> for
249 In Element: a few internal changes to make it subclassable by the
250 forthcoming XML::Element et al.
252 2000-10-20 Sean M. Burke <sburke@cpan.org>
In Element: made new_from_lol accept existing HTML::Element objects
as part of the loltree. Thanks to Bob Glickstein
<bobg@zanshin.com> for the suggestion.
260 In Element: feeding an arrayref to push_content, unshift_content,
261 or splice_content now implicitly calls new_from_lol.
263 In Element: reversed the change in as_HTML/XML/Lisp_form that would
264 skip dumping attributes with references for values. It reacted
265 undesirably with objects that overload stringify; to wit, URI.pm
268 2000-10-15 Sean M. Burke <sburke@cpan.org>
272 In Element: methods added: $x->id, $x->idf, $x->as_XML,
275 In Element: internal optimization: as_HTML no longer uses the
276 tag() accessor. Should cause no change in behavior.
278 In Element: as_HTML (via starttag) no longer tries to dump
279 attributes whose values are references, or whose names
280 are null-string or "/". This should cause no change in
281 behavior, as there's no normal way for any document to parse
282 to a tree containing any such attributes.
284 In Element: minor rewordings or typo-fixes in the POD.
286 2000-10-02 Sean M. Burke <sburke@cpan.org>
290 In Element: fixed typo in docs for the content_refs_list method.
292 foreach my $item ($h->content_array_ref) {
294 foreach my $item (@{ $h->content_array_ref }) {
296 In Element: fixed bug in $h->left that made it useless in scalar
297 context. Thanks to Toby Thurston <toby@wildfire.dircon.co.uk> for
300 In Element: added new method $h->tagname_map
302 In TreeBuilder: Some minor corrections to the logic of handling TD
303 and TH elements -- basically bug fixes, in response to an astute
304 bug report from Toby Thurston <toby@wildfire.dircon.co.uk>.
306 In TreeBuilder: Fixed lame bug that made strict-p mode nearly
307 useless. It may now approach usability!
309 This dist contains a simple utility called "htmltree" that parses
310 given HTML documents, and dumps their parse tree. (It's not
311 actually new in this version, but was never mentioned before.)
313 In TreeBuilder, a change of interest only to advanced programmers
314 familiar with TreeBuilder's source and perpetually undocumented
315 features: there is no $HTML::TreeBuilder::Debug anymore.
317 If you want to throw TreeBuilder into Debug mode, you have to do it
318 at compile time -- by having a line like this BEFORE any line that
319 says "use HTML::TreeBuilder":
321 sub HTML::TreeBuilder::DEBUG () {3};
where "3" is whatever debug level (0 for no debug output) that you
want TreeBuilder to be in. All the statements in TreeBuilder that
used to say
    print "...stuff..." if $Debug > 1;
now say
    print "...stuff..." if DEBUG > 1;
where DEBUG is the constant-sub whose default value set at compile
time is 0. The point of this is that the typical
compilation-instance of TreeBuilder will run with DEBUG = 0, and
having that set at compile time means that all the "print ... if
DEBUG" can be optimized away at compile time, so they don't appear
in the code tree for TreeBuilder. This leads to a typical ~10%
speedup in TreeBuilder code, since it's no longer having to
constantly interrogate $Debug.
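
The compile-time trick described above can be sketched in a few lines.
This is an illustration only: My::Builder and parse_step are hypothetical
stand-ins, and the fallback BEGIN block is an assumption about how a
module might supply the default, not a copy of TreeBuilder's source.

```perl
use strict;
use warnings;

# The caller's line. It must be compiled before the code that tests DEBUG.
sub My::Builder::DEBUG () { 2 }

package My::Builder;    # hypothetical stand-in, not HTML::TreeBuilder

# Fall back to level 0 only when the caller did not define DEBUG first.
BEGIN { *DEBUG = sub () { 0 } unless defined &DEBUG; }

sub parse_step {
    my ($what) = @_;
    my @log;
    # DEBUG is a constant sub, so each condition is decided when the
    # code is compiled rather than every time parse_step runs.
    push @log, "starting $what"   if DEBUG > 1;   # 2 > 1: statement kept
    push @log, "details of $what" if DEBUG > 2;   # 2 > 2: statement dropped
    return @log;
}

package main;

print "$_\n" for My::Builder::parse_step('start-tag');
```

Running such a file through B::Deparse is one way to check which of the
DEBUG-conditional statements survive compilation.
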
341 Note that if you really do NEED the debug level to vary at runtime,
343 sub HTML::TreeBuilder::DEBUG () { $HTML::TreeBuilder::DEBUG };
344 and then change that variable's value as need be. Do this only if
347 BTW, useful line to have in your ~/.cshrc:
348 alias deparse 'perl -MO=Deparse \!*'
349 I found it useful for deparsing TreeBuilder.pm to make sure that
350 the DEBUG-conditional statements really were optimized away
353 2000-09-04 Sean M. Burke <sburke@cpan.org>
357 In TreeBuilder: added p_strict, an option to somewhat change
358 behavior of implicating "</p>"s.
359 Added store_comments, store_declarations, store_pis, to control
360 treatment of comments, declarations, and PIs when parsing.
362 In Element: documented the pseudo-elements (~comment, ~declaration,
363 ~pi, and ~literal). Corrected as_HTML dumping of ~pi elements.
365 Removed formfeeds from source of Element and TreeBuilder --
366 different editors (and Perl) treat them differently as far as
367 incrementing the line counter; so Perl might report an error on
368 line 314, but preceding formfeeds might make your editor think that
369 that line is actually 316 or something, resulting in confusion all
372 2000-08-26 Sean M. Burke <sburke@cpan.org>
376 Introduced an optimization in TreeBuilder's logic for checking that
377 body-worthy elements are actually inserted under body. Should
378 speed things up a bit -- it saves two method calls per typical
379 start-tag. Hopefully no change in behavior.
381 Whoops -- 3.01's change in the return values of TreeBuilder's
382 (internal) end(...) method ended up breaking the processing of list
383 elements. Fixed. Thanks to Claus Schotten for spotting this.
Whoops 2 -- Margarit A. Nickolov spotted that TreeBuilder
documented an implicit_body_p_tag method, but the module didn't
define it. I must have deleted it some time or other. Restored.
390 2000-08-20 Sean M. Burke <sburke@cpan.org>
394 Fixed a silly typo in Element that made delete_ignorable_whitespace
397 Made Element's $tree->dump take an optional output-filehandle
400 Added (restored?) "use integer" to TreeBuilder.
403 2000-08-20 Sean M. Burke <sburke@cpan.org>
407 Now depends on HTML::Tagset for data tables of HTML elements and
408 their characteristics.
410 Version numbers for HTML::TreeBuilder and HTML::Element, as well as
411 for the package, moved forward to 3.01.
413 Minor changes to HTML::TreeBuilder's docs.
415 HTML::TreeBuilder now knows not to amp-decode text children of
416 CDATA-parent elements. Also exceptionally stores comments under
417 CDATA-parent elements.
419 TreeBuilder should now correctly parse documents with frameset
420 elements. Tricky bunch of hacks.
422 TreeBuilder now ignores those pointless "x-html" tags that a
423 certain standards-flouting monopolistic American software/OS
424 company's mailer wraps its HTML in.
426 Introduced "tweaks" in HTML::TreeBuilder -- an experimental
427 (and quite undocumented) feature to allow specifying callbacks
428 to be called when specific elements are closed; makes possible
429 rendering (or otherwise scanning and/or manipulating) documents
430 as they are being parsed. Inspired by Michel Rodriguez's clever
431 XML::Twig module. Until I document this, email me if you're
434 HTML::Element's as_HTML now knows not to amp-escape children of
435 CDATA-parent elements. Thanks to folks who kept reminding me about this.
437 HTML::Element's as_HTML can now take an optional parameter
438 specifying which non-empty elements will get end-tags omitted.
440 HTML::Element's traverse's docs moved into separate POD,
441 HTML::Element::traverse.
443 Added HTML::Element methods all_attr_names and
444 all_external_attr_names. Fixed bug in all_external_attr.
446 Added HTML::Element method delete_ignorable_whitespace.
447 (Actually just moved from HTML::TreeBuilder, where it was
448 undocumented, and called tighten_up.)
450 Adding a bit of sanity checking to Element's look_down, look_up.
452 Added some formfeeds to the source of Element and TreeBuilder,
453 to make hardcopy a bit more readable.
455 2000-06-28 Sean M. Burke <sburke@cpan.org>
459 Fixed doc typo for HTML::Element's lineage_tag_names method.
461 Fixed lame bug in HTML::Element's all_external_attr that made it
462 quite useless. Thanks to Rich Wales <richw@webcom.com> for the bug
Changed as_text to no longer DEcode entities, as it formerly did,
and was documented to. Since entities are already decoded by the time
text is stored in the tree, another decoding step is wrong. Neither
Gisle Aas nor I can remember what that was doing there in the
first place.
471 Changed as_text to not traverse under 'style' and 'script'
472 elements. Rewrote as_text's traverser to be iterative.
474 Added a bit of text to HTML::AsSubs to recommend using XML::Generator.
477 2000-06-12 Sean M. Burke <sburke@cpan.org>
479 Release 0.67. Just changes to HTML::Element...
481 Introduced look_up and look_down. Thanks to the folks on the
482 libwww list for helping me find the right form for that idea.
483 Deprecated find_by_attribute
485 Doc typo fixed: at one point in the discussion of "consolidating
486 text", I said push_content('Skronk') when I meant
487 unshift_content('Skronk'). Thanks to Richard Y. Kim (ryk@coho.net)
488 for pointing this out.
490 Added left() and right() methods.
492 Made address([address]) accept relative addresses (".3.0.1")
494 Added content_array_ref and content_refs_list.
496 Added a bit more clarification to bits of the Element docs here and there.
498 Made find_by_tag_name work iteratively now, for speed.
501 2000-05-18 Sean M. Burke <sburke@cpan.org>
505 Noting my new email address.
Fixed bug in HTML::Element::detach_content -- it would return
empty-list, instead of returning the nodes detached.
Fixed bug in HTML::Element::replace_with_content -- it would
accidentally completely kill the parent's content list!
Thanks to Reinier Post and others for spotting this error.
Fixed bug in HTML::Element::replace_with -- it put replacers
in the content list of the new parent, !but! forgot to update
each replacer's _parent attribute.
Thanks to Matt Sisk for spotting this error.
520 2000-03-26 Sean M. Burke <sburke@netadventure.net>
524 Important additions to HTML::Element :
526 Totally reimplemented the traverse() method, and added features,
527 now providing a somewhat-new interface. It's still
528 backwards-compatible both syntactically and semantically.
530 Added methods: content_list, detach_content, replace_linkage,
531 normalize_content, preinsert, postinsert, and has_insane_linkage.
533 $h->attr('foo', undef) now actually deletes the attribute
534 'foo' from $h, instead of setting it to undef. Hopefully
535 this won't break any existing code!
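
The new attr() semantics can be sketched with a plain hash. This is a
hypothetical accessor (attr_sketch is not part of HTML::Element), and it
assumes that a set call returns the attribute's previous value.

```perl
use strict;
use warnings;

# Hypothetical accessor over a plain hash; not HTML::Element's code.
sub attr_sketch {
    my ($attrs, $name, @value) = @_;
    my $old = $attrs->{$name};
    if (@value) {
        if (defined $value[0]) {
            $attrs->{$name} = $value[0];    # ordinary set
        }
        else {
            delete $attrs->{$name};         # undef now means "remove"
        }
    }
    return $old;    # assumption: the previous value is handed back
}

my %img = (src => 'a.png', alt => 'A');
attr_sketch(\%img, 'alt', undef);
print exists $img{alt} ? "alt still present\n" : "alt deleted\n";
```

Code that relied on the key still existing with an undef value would
notice the difference, which is the breakage the entry worries about.
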
537 Rearranged the order of some sections in the Element docs
538 for purely pedagogical reasons.
Bugfix: $tree->clone failed to delete the internal
_head and _body attributes of the clone (used by TreeBuilder), so
$tree->clone->delete ended up deleting most/all of the original!
Fixed. Added caveats to the docs warning against cloning
TreeBuilder objects that are in mid-parse (not that I think most
users are exactly rushing to do this).
Thanks to Bob Glickstein for finding and reporting this bug.
548 Added some regression/sanity tests in t/
550 A bit more sanity checking in TreeBuilder: checks for _head and
551 _body before including it.
Modded TreeBuilder's calls to traverse() to use the new [sub{...},0]
calling syntax, for sake of efficiency.
556 Added some undocumented and experimental code in Element and
557 TreeBuilder for using HTML::Element objects to represent
558 comments, PIs, declarations, and "literals".
560 2000-03-08 Sean M. Burke <sburke@netadventure.net>
564 Bugfix: $element->replace_with_content() would cause
565 a fatal error if any of $element's content nodes were
566 text segments. Fixed.
568 2000-03-08 Sean M. Burke <sburke@netadventure.net>
572 Fixed a typo in the SYNOPSIS of TreeBuilder.pm: I had "->destroy" for
575 Added $element->clone and HTML::Element->clone_list(nodes) methods,
576 as Marek Rouchal very helpfully suggested.
578 $tree->as_HTML can now indent, hopefully properly. The logic to do
579 so is pretty frightening, and regrettably doesn't wrap, and it's
580 not obvious how to make it capable of doing so.
582 $tree->as_text can now take a 'skip_dels' parameter.
584 Added $h->same_as($j) method.
586 Added $h->all_attr method.
588 Added $h->new_from_lol constructor method.
591 1999-12-18 Sean M. Burke <sburke@netadventure.net>
595 Incremented HTML::AsSubs version to 1.13, and HTML::Parse version
596 to 2.7, to avoid version confusion with the old (<0.60) HTML-Tree
599 Re-simplified the options to HTML::Element::traverse, removing the
600 verbose_for_text option. (The behavior that it turned on, is now
601 always on; this should not cause any problems with any existing
604 Fixed HTML::Element::delete_content, and made an
605 HTML::TreeBuilder::delete to override it for TreeBuilder nodes,
606 which have their own special attributes.
HTML::Element::find_by_tag_name, find_by_attribute, and get_attr_i
now behave differently in scalar context, if you're the sort that
likes context on method calls. HTML::Element::descendant is now
optimized in scalar context.
613 Fixed up some of the reporting of lineages in some $Debug-triggered
616 Fixed minor bug in updating pos when a text node under HTML
617 implicates BODY (and maybe P).
619 You should not use release 0.61
623 1999-12-15 Sean M. Burke <sburke@netadventure.net>
627 Versions in this dist:
629 HTML::TreeBuilder: 2.91
633 No longer including the Formatter modules.
635 Lots of new methods and changes in HTML::Element; reorganized docs.
637 Added new HTML tags to HTML::Element's and HTML::TreeBuilder's
Reworked the logic in HTML::TreeBuilder. Previous versions dealt
badly with tables, and attempts to enforce content-model rules
occasionally went quite awry. This new version is much less
aggressive about content-model rules, and works on the principle
that if the HTML source is cock-eyed, there's limits to what can be
done to keep the syntax tree from being cock-eyed.
647 HTML::TreeBuilder now also tries to ignore ignorable whitespace.
648 The resulting parse trees often have half (or fewer) the number of
649 nodes, without all the ignorable " " nodes like before.
653 1999-12-15 Gisle Aas <gisle@aas.no>
657 Make it compatible with HTML-Parser-3.00
661 1999-11-10 Gisle Aas <gisle@aas.no>
665 Fix SYNOPSIS for HTML::FormatText as suggested by
666 Michael G Schwern <schwern@pobox.com>
668 Updated my email address.
672 1998-07-07 Gisle Aas <aas@sn.no>
676 Avoid new warnings introduced by perl5.004_70
680 1998-04-01 Gisle Aas <aas@sn.no>
Release 0.50, the HTML::* modules that dealt with HTML syntax trees
were unbundled from libwww-perl-5.22.