carret1268/PdfHandlerETC

PdfHandlerETC

PdfHandlerETC is a lightweight command-line and Python toolkit for common PDF tasks: text extraction, encryption, decryption, permissions inspection, word counting, page resizing, file merging, and text-based duplicate detection.

This project is released under the CC0 1.0 Public Domain Dedication.

Features

  • Extract text from PDFs by page or range
  • Encrypt and decrypt PDFs with customizable permissions
  • Count words across entire documents or selected pages
  • Inspect encryption status and permissions
  • Resize page dimensions
  • Merge two PDFs with optional visual separators (blank page or black bar)
  • Detect duplicate PDFs based on text content
  • Includes both a Python API and command-line interface (CLI)
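To make "duplicates based on text content" concrete, here is a minimal sketch of one way such a check could work: normalize the extracted text, hash it, and compare the digests. This is an illustration only, not necessarily how PdfHandlerETC implements its check; `text_fingerprint` and `looks_like_duplicate` are hypothetical names, and the inputs are plain strings standing in for text already extracted from each PDF.

```python
import hashlib
import re


def text_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text after normalization."""
    # Collapse every run of whitespace to a single space and lowercase the
    # result, so layout differences (line breaks, indentation, letter case)
    # do not change the fingerprint.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def looks_like_duplicate(text_a: str, text_b: str) -> bool:
    """Treat two documents as duplicates when their fingerprints match."""
    return text_fingerprint(text_a) == text_fingerprint(text_b)


# Hypothetical extracted text from two files that differ only in layout.
first = "Quarterly Report\n\nRevenue grew in Q3."
second = "quarterly report revenue   grew in Q3."
print(looks_like_duplicate(first, second))  # True
```

A hash comparison only catches documents whose normalized text is identical; near-duplicates with small wording changes would need a similarity measure instead.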

Installation

Install from PyPI:

pip install pdfhandleretc

Command-Line Usage

After installation, run the CLI through the pdfhandler module:

python -m pdfhandler wordcount document.pdf --pages "1, 3"
python -m pdfhandler encrypt document.pdf --output secure.pdf
python -m pdfhandler decrypt secure.pdf --in-place
python -m pdfhandler permissions secure.pdf
python -m pdfhandler resize document.pdf 612 792 --output resized.pdf
python -m pdfhandler dupe-check file1.pdf file2.pdf
python -m pdfhandler merge intro.pdf appendix.pdf merged.pdf --add-separator black
python -m pdfhandler extract document.pdf --pages "1-3, 5" > document_text.txt
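The --pages values above ("1, 3" and "1-3, 5") combine single pages and ranges. As a rough sketch of how a specification like that can be expanded, the hypothetical helper below assumes pages are 1-based and ranges are inclusive. The README does not state those rules, so treat them as assumptions rather than the tool's documented behavior.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5" into sorted, unique page numbers.

    Assumes 1-based page numbers and inclusive ranges (not confirmed by
    the PdfHandlerETC documentation).
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1, , 3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_page_spec("1-3, 5"))  # [1, 2, 3, 5]
print(parse_page_spec("1, 3"))    # [1, 3]
```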

Use --help for details:

python -m pdfhandler --help
python -m pdfhandler extract --help
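The resize example passes 612 792. Assuming those values are PDF points (the default PDF user-space unit, 1/72 inch), they correspond to US Letter, 8.5 x 11 inches. The README does not name the unit, so that reading is an assumption. The small helpers below are hypothetical and only show the arithmetic for deriving point values from inches or millimetres.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to PDF points."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# US Letter is 8.5 x 11 inches.
letter = (inches_to_points(8.5), inches_to_points(11))
print(letter)  # (612.0, 792.0)

# A4 is 210 x 297 mm; rounded to whole points.
a4 = (round(mm_to_points(210)), round(mm_to_points(297)))
print(a4)  # (595, 842)
```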

Python Usage

from pdfhandler import PdfHandler

handler = PdfHandler("example.pdf")

# Extract text
text = handler.get_pdf_text("1-2, 4")
print(text)

# Word count
print("Words:", handler.word_count("1-3"))

# Encrypt the file
handler.encrypt(output="example-encrypted.pdf")

# Show permissions
handler.print_permissions()

# Resize pages
handler.resize(width=612, height=792, output_path="resized.pdf")

# Merge with a visual separator (black bar or blank page)
PdfHandler.merge_pdfs(
    "intro.pdf",
    "appendix.pdf",
    "merged.pdf",
    add_separator=True,
    separator_type="black"  # or "blank"
)
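Word counts depend on how "word" is defined. The sketch below shows one possible definition applied to already-extracted text: a run of word characters, optionally joined by apostrophes or hyphens. `count_words` is a hypothetical helper for illustration; `PdfHandler.word_count` may tokenize differently.

```python
import re

# A word is a run of word characters, optionally joined by apostrophes or
# hyphens, so "It's" and "well-known" each count once.
WORD_PATTERN = re.compile(r"\b\w+(?:['’-]\w+)*\b")


def count_words(text: str) -> int:
    """Count the words in a block of plain text."""
    return len(WORD_PATTERN.findall(text))


sample = "Hello, world! It's a well-known fact."
print(count_words(sample))  # 6
```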

License

This project is licensed under the CC0 1.0 Universal public domain dedication. You may use, modify, and distribute it freely without attribution or restriction.

Dependencies

  • pdfminer.six - for text extraction
  • pikepdf - for encryption and PDF manipulation
  • colorama - for cross-platform terminal colors
