TagSoup - Just Keep On Truckin'

TagSoup is a SAX-compliant parser written in Java that, instead of parsing
well-formed or valid XML, parses HTML as it is found in the wild: poor,
nasty and brutish, though quite often far from short.  TagSoup is designed
for people who have to process this stuff using some semblance of a rational
application design.  By providing a SAX interface, it allows standard XML
tools to be applied to even the worst HTML.  TagSoup also includes
a command-line processor that reads HTML files and can generate either
clean HTML or well-formed XML that is a close approximation to XHTML.

WWW: http://home.ccil.org/~cowan/tagsoup