[plug] PDF to TXT

Steve Grasso steveg at calm.wa.gov.au
Thu Aug 3 12:04:17 WST 2000


Mike,

> Anyone recommend a good (cheap --> free, it's a one-off use) PDF to TXT
> ripper? The ones I have found so far on freshmeat either mean I send the
> PDF to them or I pay US$500 ... (ouch)

I hunted down an open-source PDF-to-HTML ripper a while back.

The site (http://www.ra.informatik.uni-stuttgart.de/~gosho/pdftohtml/) appears
to be unavailable now, but I have the source tarball (~250k) if you're
interested.

Blurb from the author:
Pdftohtml v. 0.22 converts Portable Document Format files to HTML. This
release converts text and links. Bold and italic face are preserved. Pdftohtml
v.0.21 extracts all images as JPEG or PNG. Currently "pdf" vector drawings are
not extracted. The current version is tested on Linux and Solaris 2.6.
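
Since you're after plain text rather than HTML, the output would still need
its tags stripped. Here is a rough sketch of one way to do that with Python's
standard html.parser module. It is not part of pdftohtml, and the names
TextExtractor and html_to_text are made up for illustration. It also assumes
the HTML is simple enough that line breaks at block-level tags are good enough.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes from HTML, skipping script/style content.

    Hypothetical helper for illustration; not part of pdftohtml.
    """

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            # Start a new line at block-level tags so paragraphs stay separate.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

You would read the converted file into a string, pass it to html_to_text(),
and write the result out as your .txt file. Bold, italics and links are
simply dropped, which is presumably what you want for a one-off text dump.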

Regards,
Steve


