A Python package that automatically identifies and extracts figures and tables from PDF documents.
- 🔍 Automatic Detection: Identifies Figures and Tables in PDF documents
- 🎯 Smart Merging: Combines related elements with their captions
- 🎨 High-Quality Output: Generates clean, merged images
- 🖥️ Command Line Interface: Easy-to-use CLI tool
- ⚡ Batch Processing: Process multiple PDFs efficiently
pip install pdf-element-extractor

Or install from source:

git clone https://github.com/shenh10/pdf-element-extractor.git
cd pdf-element-extractor
pip install -e .

# Basic usage
pdf-element-extractor input.pdf --output results
# Extract only merged images
pdf-element-extractor input.pdf --output results --merged-only
# Process specific pages
pdf-element-extractor input.pdf --output results --pages 1,3,5

The tool extracts figures and tables from research papers.
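
For scripted use, the quick-start commands above can be assembled and run from Python. The following is a minimal sketch, not part of the package: it assumes the `pdf-element-extractor` command is on your `PATH`, uses only the flags shown in this README, and simply invokes the CLI once per PDF. The helper names `build_command` and `extract_all` and the `papers` folder are hypothetical, and the tool may offer its own batch mode that is not documented here.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def build_command(pdf: Path, output: Path,
                  pages: Optional[Iterable[int]] = None,
                  merged_only: bool = False) -> List[str]:
    """Assemble an argument list from the flags documented in this README."""
    cmd = ["pdf-element-extractor", str(pdf), "--output", str(output)]
    if pages is not None:
        # --pages takes a comma-separated list such as 1,3,5
        cmd += ["--pages", ",".join(str(p) for p in pages)]
    if merged_only:
        cmd.append("--merged-only")
    return cmd


def extract_all(pdf_dir: Path, out_root: Path, dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one command per PDF found in pdf_dir.

    Each PDF gets its own output folder named after the file stem
    (an assumption made here to keep results separate).
    """
    commands = []
    if pdf_dir.is_dir():
        for pdf in sorted(pdf_dir.glob("*.pdf")):
            cmd = build_command(pdf, out_root / pdf.stem)
            commands.append(cmd)
            if not dry_run:
                # check=True raises CalledProcessError on a non-zero exit code
                subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in extract_all(Path("papers"), Path("results")):
        print(shlex.join(command))
```

Passing `dry_run=False` executes each command; keeping the default lets you inspect the generated command lines first.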
pdf-element-extractor [PDF_FILE] [OPTIONS]

Options:
  --output PATH    Output directory
  --pages PAGES    Specific pages to process (e.g., 1,3,5)
  --merged-only    Generate only merged images
  --no-viz         Skip visualization generation
  --verbose        Enable verbose output
  --help           Show help message

output_directory/
├── merged_images/ # Combined Figure/Table images
├── figure_images/ # Individual Figure elements
├── table_images/ # Individual Table elements
└── Page_*_analysis.png # Page analysis visualizations
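
To work with the results programmatically, the layout above can be walked with `pathlib`. This is a rough sketch under the assumption that the folder and file names match the tree exactly; the function name `collect_outputs` and the `page_analysis` key are hypothetical, and no assumption is made about the image format inside the subfolders. Folders that are absent (for example, if an option suppresses them) yield empty lists.

```python
from pathlib import Path
from typing import Dict, List

# Subfolder names taken from the output layout shown above.
SUBDIRS = ("merged_images", "figure_images", "table_images")


def collect_outputs(output_dir: Path) -> Dict[str, List[Path]]:
    """Group the files found in an output directory by category."""
    found: Dict[str, List[Path]] = {}
    for name in SUBDIRS:
        folder = output_dir / name
        if folder.is_dir():
            found[name] = sorted(p for p in folder.iterdir() if p.is_file())
        else:
            found[name] = []
    # Page analysis visualizations sit directly in the output directory.
    if output_dir.is_dir():
        found["page_analysis"] = sorted(output_dir.glob("Page_*_analysis.png"))
    else:
        found["page_analysis"] = []
    return found


if __name__ == "__main__":
    for category, files in collect_outputs(Path("results")).items():
        print(f"{category}: {len(files)} file(s)")
```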
This project is released under the MIT License.














