Remove existing text layer before writing final file #62

Sometimes a PDF's existing text layer is poor (wrong or garbled characters, etc.), and the text layer this package creates is more reliable. Is it possible to remove the existing text layer and replace it with the one your package creates?

I think I've found where this happens (links below), but I don't know whether `original_page` has a method for removing text.

I don't have a Linux dev environment (I'm just running this from a Docker container), so I haven't been able to actually debug this yet.

https://github.com/virantha/pypdfocr/blob/c88a305b61f7004df7f0f9c336ee29a9b90f9998/pypdfocr/pypdfocr.py#L363

https://github.com/virantha/pypdfocr/blob/c88a305b61f7004df7f0f9c336ee29a9b90f9998/pypdfocr/pypdfocr_pdf.py#L256

https://github.com/virantha/pypdfocr/blob/c88a305b61f7004df7f0f9c336ee29a9b90f9998/pypdfocr/pypdfocr_pdf.py#L166

https://github.com/virantha/pypdfocr/blob/c88a305b61f7004df7f0f9c336ee29a9b90f9998/pypdfocr/pypdfocr_pdf.py#L199

From this Stack Overflow post, it looks like PyPDF2 can extract the text, but I don't know whether it can remove it:
http://stackoverflow.com/questions/35090948/pypdf2-wont-extract-all-text-from-pdf/36914534#36914534
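I'm not aware of a per-page "remove text" method on PyPDF2 page objects. PyPDF2 1.x's `PdfFileWriter` does appear to have a `removeText()` method, but it works on the whole writer, so it would have to run before the OCR text is merged in. A lower-level option: in a PDF content stream, text-showing operators are only allowed inside `BT ... ET` text objects, so dropping those blocks removes the existing text while leaving images and vector graphics alone. Below is a minimal sketch, assuming you already have a page's *decompressed* content stream as bytes (reading and writing `/Contents` is library-specific and not shown). `strip_text_objects` is a hypothetical helper, not a PyPDF2 API. Caveats: it removes *all* text, including visible text, so it only makes sense for scanned PDFs, and it does not handle inline image data or text inside Form XObjects.

```python
# Sketch: drop every BT ... ET text object from a decompressed PDF content
# stream. strip_text_objects is a hypothetical helper, not a PyPDF2 API.

_WHITESPACE = b" \t\r\n\x0c\x00"
_DELIMS = b"()<>[]{}/%"


def _skip_literal_string(data: bytes, i: int) -> int:
    """Return the index just past the literal string starting at data[i] == '('."""
    depth = 0
    while i < len(data):
        c = data[i]
        if c == 0x5C:  # backslash escape: skip it and the escaped byte
            i += 2
            continue
        if c == 0x28:  # '(' nests one level deeper
            depth += 1
        elif c == 0x29:  # ')' closes one level
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return len(data)  # unterminated string: stop at end of data


def strip_text_objects(data: bytes) -> bytes:
    """Remove all BT ... ET blocks, keeping every other operator untouched.

    Literal strings, hex strings and comments are skipped so that a "BT" or
    "ET" inside them is not mistaken for an operator. Inline image data
    (BI ... ID ... EI) and Form XObjects are NOT handled.
    """
    out = bytearray()
    n = len(data)
    i = 0
    keep_from = 0      # start of the region still to be copied to `out`
    text_start = None  # offset of the open BT, or None outside a text object
    while i < n:
        c = data[i]
        if c in _WHITESPACE:
            i += 1
        elif c == 0x25:  # '%' comment runs to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == 0x28:  # '(' literal string
            i = _skip_literal_string(data, i)
        elif c == 0x3C:  # '<' starts a dict ('<<') or a hex string
            if data[i + 1:i + 2] == b"<":
                i += 2
            else:
                end = data.find(b">", i)
                i = n if end == -1 else end + 1
        elif c in b"[]{}>":  # single-byte delimiters ('>>' is two of these)
            i += 1
        else:
            # Regular token: number, /Name, or operator such as BT, Tj, ET.
            j = i + 1
            while j < n and data[j] not in _WHITESPACE and data[j] not in _DELIMS:
                j += 1
            token = data[i:j]
            if token == b"BT" and text_start is None:
                text_start = i
            elif token == b"ET" and text_start is not None:
                out += data[keep_from:text_start]  # keep what preceded BT
                keep_from = j                      # resume copying after ET
                text_start = None
            i = j
    out += data[keep_from:]  # an unterminated BT is left as-is
    return bytes(out)


stream = b"q /Im0 Do Q\nBT /F1 12 Tf 72 700 Td (Hello \\(BT\\) ET) Tj ET\nq 0 0 1 rg Q"
print(strip_text_objects(stream))
```

The image-drawing operators (`/Im0 Do`) survive and the whole text object is gone, even though the string itself contains "BT" and "ET". In PyPDF2 you would still need to decode the page's content stream(s), run this, and write the result back before merging the OCR text page; I haven't checked how that fits into the code at the lines linked above.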
