The html-parser package is a Ruby port of Python's SGML parser (sgmllib.py), HTML parser (htmllib.py), and formatter (formatter.py).
sgml-parser.rb defines the class SGMLParser, which serves as the basis for parsing text files formatted in SGML (Standard Generalized Markup Language). It is not a full SGML parser: it parses SGML only insofar as HTML uses it, and the module exists only as a base for the HTMLParser class.
Please see <URL:http://www.python.org/doc/current/lib/module-sgmllib.html> for details.
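The base-class design described above can be sketched with a toy, self-contained example. This is not the package's SGMLParser: the class names TinyTagParser and TitleGrabber, the regular expression, and the unknown_tag hook are all assumptions made for illustration. It only shows the general idea, borrowed from Python's sgmllib, of a base class that dispatches to start_TAG and end_TAG methods defined by a subclass.

```ruby
# Toy illustration only: hypothetical classes, NOT the package's SGMLParser.
class TinyTagParser
  # Matches either a tag (optional "/", then a name) or a run of character data.
  TOKEN = /<(\/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)/

  def feed(text)
    text.scan(TOKEN) do |closing, name, data|
      if data
        handle_data(data)
      else
        # Build the handler name, e.g. "start_title" or "end_title".
        prefix = closing.empty? ? "start_" : "end_"
        handler = prefix + name.downcase
        if respond_to?(handler, true)
          send(handler)
        else
          unknown_tag(name.downcase, !closing.empty?)
        end
      end
    end
  end

  # Default hooks do nothing; subclasses override the ones they need.
  def handle_data(data); end
  def unknown_tag(name, closing); end
end

class TitleGrabber < TinyTagParser
  attr_reader :title

  def initialize
    @title = ""
    @in_title = false
  end

  def start_title; @in_title = true; end
  def end_title; @in_title = false; end

  # Collect character data only while inside <title>...</title>.
  def handle_data(data)
    @title += data if @in_title
  end
end

grabber = TitleGrabber.new
grabber.feed("<html><head><title>Hello</title></head><body><p>text</p></body></html>")
puts grabber.title
```

A subclass only names the tags it cares about; everything else falls through to the do-nothing defaults, which is why a base class of this kind is useful mainly as a foundation for a more specific parser.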
html-parser.rb defines the class HTMLParser, a parser for HTML documents.
Please see <URL:http://www.python.org/doc/current/lib/module-htmllib.html> for details.
formatter.rb defines four classes (NullFormatter, AbstractFormatter, NullWriter and DumbWriter) that together provide a generic output formatter and device interface.
Please see <URL:http://www.python.org/doc/current/lib/module-formatter.html> for details.
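The split between a formatter and a writer (the "device interface") can be illustrated with a toy sketch. The classes ToyFormatter and ToyWriter, the send_word method, and the 20-column limit are hypothetical and are not taken from formatter.rb; the sketch only assumes the general division of labour in which the formatter normalizes text and the writer decides how it is laid out on the output device.

```ruby
# Toy illustration only: hypothetical classes, NOT the package's formatter.rb.
class ToyWriter
  attr_reader :lines

  def initialize(maxcol = 20)
    @maxcol = maxcol
    @lines = [""]
  end

  # Append one word, starting a new line when the current one would overflow.
  def send_word(word)
    current = @lines.last
    if current.empty?
      @lines[-1] = word
    elsif current.length + 1 + word.length <= @maxcol
      @lines[-1] = current + " " + word
    else
      @lines << word
    end
  end

  def send_line_break
    @lines << ""
  end
end

class ToyFormatter
  def initialize(writer)
    @writer = writer
  end

  # Collapse runs of whitespace and hand individual words to the writer.
  def add_flowing_data(text)
    text.split.each { |word| @writer.send_word(word) }
  end

  def end_paragraph
    @writer.send_line_break
  end
end

writer = ToyWriter.new(20)
formatter = ToyFormatter.new(writer)
formatter.add_flowing_data("The quick   brown fox\njumps over the lazy dog")
formatter.end_paragraph
puts writer.lines
```

Because the formatter talks to the writer only through a small set of methods, a different writer (for example, one that discards everything) could be substituted without touching the formatter.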
htmltest.rb is a sample script that uses the html-parser package.
Usage: htmltest.rb [HTML_FILE]
Example: htmltest.rb index.html
ruby install.rb
or
cp -p formatter.rb html-parser.rb sgml-parser.rb SOMEWHERE
Takahiro Maebashi <maebashi@iij.ad.jp>
Katsuyuki Komatsu <komatsu@sarion.co.jp>
Fix the array concatenation statement in install_rb() of install.rb for Ruby 1.6.2 or later. Reported by Ed L Cashin <ecashin@terry.uga.edu>.
[ruby-list:22188] Incorporated html-parser.rb and sgml-parser.rb patch contributed by Ryunosuke Ohshima <ryu@jaist.ac.jp>. Add README.rd and install.rb by the packager.
[ruby-dev:8302] Avoid arity check error of NullFormatter#push_font in formatter.rb by the packager.
html-parser-19990912.tar.gz released by the author.
[ruby-list:13345] html-parser-19990406.tar.gz released by the author.
[ruby-list:12521] html-parser-19990303.tar.gz released by the author.
[ruby-list:5974] html-parser.tar.gz (first release) released by the author.