HTML::Parse - Deprecated, a wrapper around HTML::TreeBuilder

See the documentation for HTML::TreeBuilder
Disclaimer: This module is provided only for backwards compatibility
with earlier versions of this library. New code should I<not> use
this module; it should use the HTML::Parser and
HTML::TreeBuilder modules directly instead.

The C<HTML::Parse> module provides functions to parse HTML documents.
There are two functions exported by this module:
=item parse_html($html) or parse_html($html, $obj)

This function is just a synonym for C<< $obj->parse($html) >>, where $obj
is assumed to be an object of a subclass of C<HTML::Parser>. Refer to
L<HTML::Parser> for more documentation.

If $obj is not specified, it defaults to an internally
created C<HTML::TreeBuilder> object configured with strict_comment()
turned on. That class implements a parser that builds (and is) an HTML
syntax tree with HTML::Element objects as nodes.

The return value from parse_html() is $obj.

=item parse_htmlfile($file, [$obj])

Same as parse_html(), but reads the HTML to parse from the named file.

Returns C<undef> if the file could not be opened, or $obj otherwise.
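The following is a minimal usage sketch of both functions, not part of the original documentation. It assumes the HTML-Tree distribution is installed; the sample markup and the temporary file are made up for illustration. The explicit eof() calls after parse_html() are a precaution on the assumption that the wrapper only calls parse() and does not signal end of input itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Parse;         # exports parse_html() and parse_htmlfile()
use HTML::TreeBuilder;

# Made-up sample document.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>Hello, <b>world</b></p></body></html>';

# 1. Default parser: parse_html() creates an HTML::TreeBuilder internally.
my $tree = parse_html($html);
$tree->eof;    # signal that no more input is coming
my $title = $tree->look_down(_tag => 'title')->as_text;

# 2. Caller-supplied parser object; parse_html() hands it back.
my $builder = HTML::TreeBuilder->new;
my $same    = parse_html($html, $builder);
$same->eof;

# 3. Parse from a file; a temporary file stands in for a real document.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $html;
close $fh or die "close failed: $!";
my $from_file = parse_htmlfile($path);
defined $from_file or die "could not open $path";

# A file that cannot be opened yields undef rather than an exception.
my $missing = parse_htmlfile("$path.does-not-exist");

# Release the trees once they are no longer needed.
$_->delete for $tree, $builder, $from_file;
```

Checking the result of parse_htmlfile() for C<undef> before calling methods on it avoids a "Can't call method on an undefined value" error when the file is missing.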
When a C<HTML::TreeBuilder> object is created, the following variables
control how parsing takes place:

=item $HTML::Parse::IMPLICIT_TAGS

Setting this variable to true will instruct the parser to try to
deduce implicit elements and implicit end tags. If this variable is
false, you get a parse tree that just reflects the text as it stands.
This might be useful for quick and dirty parsing. Default is true.

Implicit elements have the implicit() attribute set.

=item $HTML::Parse::IGNORE_UNKNOWN

This variable controls whether unknown tags should be represented as
elements in the parse tree. Default is true.

=item $HTML::Parse::IGNORE_TEXT

Do not represent the text content of elements. This saves space if
all you want is to examine the structure of the document. Default is

=item $HTML::Parse::WARN

Call warn() with an appropriate message for syntax errors. Default is
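As a hedged illustration of how these variables might be used, the sketch below overrides two of them with local() so the change stays confined to one block. It relies on the variables being read when the internal C<HTML::TreeBuilder> object is created, as described above; the sample markup and the variable name $skeleton are invented for the example.

```perl
use strict;
use warnings;
use HTML::Parse;

# Made-up sample markup.
my $html = '<p>Some <b>bold</b> text</p>';

my $skeleton;
{
    # local() restores the previous values when this block exits.
    local $HTML::Parse::IMPLICIT_TAGS = 0;   # do not deduce implicit elements
    local $HTML::Parse::IGNORE_TEXT   = 1;   # do not represent text content
    $skeleton = parse_html($html);
    $skeleton->eof;                          # signal end of input
}

# Dump whatever structure the parser kept.
print $skeleton->as_HTML, "\n";
$skeleton->delete;
```

Using local() rather than a plain assignment keeps the settings from leaking into other code that also calls parse_html() or parse_htmlfile().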
HTML::TreeBuilder objects should be explicitly destroyed when you're
finished with them. See L<HTML::TreeBuilder>.
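One possible pattern, sketched here under the assumption that the data you need can be copied out of the tree first, is to wrap parsing in a small function that calls delete() before returning. The helper name link_texts and the sample markup are hypothetical.

```perl
use strict;
use warnings;
use HTML::Parse;

# Hypothetical helper: collect the text of every <a> element,
# then destroy the tree before returning.
sub link_texts {
    my ($html) = @_;
    my $tree = parse_html($html);
    $tree->eof;                                   # signal end of input
    my @texts = map { $_->as_text } $tree->look_down(_tag => 'a');
    $tree->delete;                                # explicit destruction
    return @texts;
}

my @links = link_texts('<a href="/one">One</a> <a href="/two">Two</a>');
print join(', ', @links), "\n";
```

Only plain strings leave the function, so nothing holds a reference to the deleted tree afterwards.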
L<HTML::Parser>, L<HTML::TreeBuilder>, L<HTML::Element>

Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke, 2005 Andy Lester,

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.

Currently maintained by Pete Krawczyk C<< <petek@cpan.org> >>

Original authors: Gisle Aas, Sean Burke and Andy Lester.
@EXPORT = qw(parse_html parse_htmlfile);
$IMPLICIT_TAGS $IGNORE_UNKNOWN $IGNORE_TEXT $WARN
# Backwards compatibility
require HTML::TreeBuilder;
$p = _new_tree_maker() unless $p;
sub parse_htmlfile ($;$)
open(HTML, $file) or return undef;
$p = _new_tree_maker() unless $p;
$p->parse_file(\*HTML);
my $p = HTML::TreeBuilder->new(
    implicit_tags => $IMPLICIT_TAGS,
    ignore_unknown => $IGNORE_UNKNOWN,
    ignore_text => $IGNORE_TEXT,
$p->strict_comment(1);