TagSoup - Just Keep On Truckin'
TagSoup is a SAX-compliant parser written in Java that, instead of parsing
well-formed or valid XML, parses HTML as it is found in the wild: poor,
nasty and brutish, though quite often far from short. TagSoup is designed
for people who have to process this stuff using some semblance of a rational
application design. By providing a SAX interface, it allows standard XML
tools to be applied to even the worst HTML. TagSoup also includes
a command-line processor that reads HTML files and can generate either
clean HTML or well-formed XML that is a close approximation to XHTML.
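
Because TagSoup presents itself as a SAX parser, code written against the
standard org.xml.sax interfaces should work with it unchanged. The following
is a minimal sketch of that idea, not TagSoup's own sample code. The class
ElementCollector and its helper methods are hypothetical names made up for
this illustration. It assumes the TagSoup jar is on the classpath and loads
its parser class, org.ccil.cowan.tagsoup.Parser, by name; the JDK's built-in
parser is included only for comparison on well-formed input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the name of every element a SAX parser reports.
class ElementCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Non-namespace-aware parsers leave localName empty, so fall back to qName.
        names.add(localName.isEmpty() ? qName : localName);
    }

    // Runs any SAX XMLReader over the markup and returns the element names seen.
    static List<String> collect(XMLReader reader, String markup) throws Exception {
        ElementCollector handler = new ElementCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.names;
    }

    // Assumption: the TagSoup jar is on the classpath. Loaded reflectively so
    // this sketch still compiles without it.
    static XMLReader tagSoupReader() throws Exception {
        return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                .getDeclaredConstructor().newInstance();
    }

    // The JDK's own strict parser, which only accepts well-formed XML.
    static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }
}
```

Calling ElementCollector.collect(ElementCollector.tagSoupReader(), html) on
messy HTML should yield the elements TagSoup reconstructs, whereas passing the
same input to jdkReader() would typically fail with a SAXParseException. The
handler itself does not change between the two, which is the point of the SAX
interface.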
WWW: http://home.ccil.org/~cowan/tagsoup