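Tesseract is a very powerful OCR package that has no graphical interface and runs from the command line. ImageMagick is a very powerful image conversion toolkit. To OCR a PDF, convert it to a TIFF with ImageMagick, then run Tesseract on the result:
convert inputfile.pdf convertedfile.tiff
tesseract convertedfile.tiff textfile
Tesseract adds the .txt extension itself, so the recognized text ends up in textfile.txt. That's amazingly easy.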
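If you do this often, the two steps above can be wrapped in a small shell function. This is only a rough sketch: the names ocr_pdf and run, the DRY_RUN switch, and the 300 dpi density setting are my own illustrative choices and are not required by either tool. It assumes ImageMagick's convert and tesseract are both on your PATH.

```shell
#!/bin/sh
# Sketch: OCR a PDF by chaining ImageMagick and Tesseract.
# ocr_pdf, run, DRY_RUN and the 300 dpi density are illustrative assumptions.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ocr_pdf INPUT_PDF OUTPUT_BASE
# Writes OUTPUT_BASE.tiff as an intermediate file and OUTPUT_BASE.txt as the result.
ocr_pdf() {
    input_pdf=$1
    output_base=$2
    tiff_file="${output_base}.tiff"

    # Rasterize the PDF; -density must come before the input file to take effect.
    run convert -density 300 "$input_pdf" "$tiff_file" || return 1

    # Tesseract appends .txt to the output base name on its own.
    run tesseract "$tiff_file" "$output_base" || return 1
}
```

Calling ocr_pdf inputfile.pdf textfile should leave the text in textfile.txt. Setting DRY_RUN=1 first only prints the two commands, which is handy for checking file names before running anything. A higher density usually helps recognition on small print, at the cost of a larger TIFF.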