A powerful Python-based Facebook scraping tool with a PyQt6 GUI for extracting posts, comments, and images from Facebook pages, groups, and individual posts without using the official Facebook API.
- Pure Requests-Based: No browser automation or Selenium required - uses direct HTTP requests to Facebook's GraphQL API
- Lightweight & Fast: Minimal dependencies, efficient memory usage, and faster execution
- No Browser Intervention: Operates entirely through HTTP requests without spawning browser instances
- Headless Operation: Perfect for servers and automated workflows
Multiple Scraping Modes:
- Single post scraping (text, images, comments, and replies)
- Page/Profile posts scraping
- Facebook Group posts scraping
- High-quality image extraction
Rich Data Extraction:
- Post content (text, reactions, shares)
- Comments and nested replies
- User information (names, IDs, profile links)
- Media content (images with multiple resolution support)
- Timestamps and engagement metrics
User-Friendly GUI:
- PyQt6-based desktop interface
- Real-time logging and progress tracking
- Tabbed interface for different scraping types
- Easy configuration and export
Robust Architecture:
- Pure requests library implementation (no browser/Selenium)
- Automatic retry mechanism with exponential backoff
- Proxy support for privacy and rate limiting
- Pagination handling for large data sets
- JSON export for easy data processing
- Direct GraphQL API communication
Enhanced Comment Detection:
- 6 extraction paths for comment counts
- Handles deeply nested comment structures
- Ensures posts with 49+ comments are correctly detected
- Never skips posts due to missing comment count data
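The multi-path lookup described above could look roughly like the sketch below. The key paths are hypothetical placeholders chosen for illustration (only three are shown, and they are not taken from the project's code); the point is the fallback order and returning None when nothing matches.

```python
from typing import Any, Optional, Sequence

# Hypothetical candidate paths; the real response keys may differ.
CANDIDATE_PATHS: Sequence[Sequence[str]] = (
    ("feedback", "comment_count", "total_count"),
    ("feedback", "total_comment_count"),
    ("story", "feedback", "comment_count", "total_count"),
)


def dig(data: Any, path: Sequence[str]) -> Any:
    """Follow a sequence of dict keys, returning None if any step is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def extract_comment_count(node: dict) -> Optional[int]:
    """Return the first integer found along the candidate paths, else None."""
    for path in CANDIDATE_PATHS:
        value = dig(node, path)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    return None
```

Returning None instead of 0 lets the caller tell "count unknown" apart from "no comments", so a post with missing count data can still be kept.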
Advanced Story Node Discovery:
- Multi-location Story node detection (Group edges, timeline edges, direct nodes)
- Handles complex JSON structures from Facebook's varying response formats
- Discovers posts that were previously hidden in nested structures
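One way to find Story nodes wherever they sit in a response is to walk the whole JSON tree. The sketch below assumes that story objects carry a "__typename" field equal to "Story"; that marker is an assumption made for illustration, not something confirmed by this README.

```python
from typing import Any, Iterator


def iter_story_nodes(data: Any) -> Iterator[dict]:
    """Yield every dict whose "__typename" is "Story", at any nesting depth."""
    stack = [data]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            if current.get("__typename") == "Story":
                yield current
            # Keep descending: stories may be nested inside other objects.
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
```

An explicit stack is used instead of recursion so that very deep responses cannot hit Python's recursion limit. The nodes are not yielded in document order.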
Complete Album Scraping:
- Automatically fetches ALL images from posts (up to 50 per post)
- Uses media ID iteration to navigate through large albums
- No longer limited to first 5 images
- Perfect for posts with 10-20+ images
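Media ID iteration can be pictured as following a chain: each lookup returns one image plus the ID of the next one. In the sketch below, fetch_media is a hypothetical callable standing in for the real request; only the 50-image cap comes from the list above.

```python
from typing import Callable, List, Optional, Tuple

MAX_IMAGES_PER_POST = 50  # cap stated in the feature list


def collect_album_images(
    first_media_id: Optional[str],
    fetch_media: Callable[[str], Tuple[str, Optional[str]]],
    limit: int = MAX_IMAGES_PER_POST,
) -> List[str]:
    """Follow next-media IDs until the album ends, loops back, or hits the cap.

    fetch_media(media_id) is assumed to return (image_url, next_media_id).
    """
    urls: List[str] = []
    seen = set()
    media_id = first_media_id
    while media_id is not None and media_id not in seen and len(urls) < limit:
        seen.add(media_id)
        image_url, next_id = fetch_media(media_id)
        urls.append(image_url)
        media_id = next_id
    return urls
```

Tracking seen IDs matters because an album viewer may wrap around to the first image after the last one.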
Smart Deduplication:
- Detects already-scraped posts by checking saved JSON files
- Skips duplicate posts when resuming interrupted sessions
- Saves bandwidth and processing time
- Automatic folder structure validation
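Deduplication against saved files might work along these lines. This is a minimal sketch that assumes each saved JSON file is an object with a top-level "post_id" key, as in the output format this README documents; the directory layout and the function name are assumptions.

```python
import json
from pathlib import Path
from typing import Set


def load_scraped_post_ids(output_dir: str) -> Set[str]:
    """Collect post IDs from JSON files already saved under output_dir."""
    scraped: Set[str] = set()
    root = Path(output_dir)
    if not root.is_dir():
        return scraped
    for json_path in root.rglob("*.json"):
        try:
            data = json.loads(json_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or partially written file: treat it as not scraped.
            continue
        if isinstance(data, dict) and "post_id" in data:
            scraped.add(str(data["post_id"]))
    return scraped
```

A resumed session could build this set once at startup and skip any post whose ID is already in it.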
Intelligent Retry Logic:
- 3-attempt retry for transient Facebook API errors
- 2-second delays between retry attempts
- Handles empty response arrays gracefully
- Prevents infinite loops on persistent failures
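The retry behaviour listed above (three attempts, two seconds apart, empty responses treated as failures) can be sketched as below. The fetch callable and the function name are hypothetical, and the broad except clause is only for brevity; real code would more likely catch requests.RequestException.

```python
import time
from typing import Callable, List

MAX_ATTEMPTS = 3           # from the feature list
RETRY_DELAY_SECONDS = 2.0  # from the feature list


def fetch_with_retry(
    fetch: Callable[[], List[dict]],
    attempts: int = MAX_ATTEMPTS,
    delay: float = RETRY_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch() up to `attempts` times; return [] if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
        except Exception:
            result = []  # treat an error like an empty response
        if result:
            return result
        if attempt < attempts:
            sleep(delay)  # wait before the next attempt, not after the last
    return []
```

Because the loop is bounded by `attempts`, a persistent failure ends with an empty list rather than retrying forever. The injectable `sleep` parameter keeps the function easy to test without real delays.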
Content Filtering:
- Automatic reel and video post detection and skipping
- Configurable minimum comment threshold
- Focus on high-engagement photo posts only
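As a rough illustration of these filters, the sketch below drops video and reel posts and applies a minimum comment threshold. The "media_type" field and its values are hypothetical; "comments_count" follows the output format this README documents.

```python
from typing import Iterable, List

VIDEO_TYPES = {"Video", "Reel"}  # hypothetical media type labels


def filter_posts(posts: Iterable[dict], min_comments: int = 0) -> List[dict]:
    """Keep non-video posts that meet the comment threshold."""
    kept: List[dict] = []
    for post in posts:
        if post.get("media_type") in VIDEO_TYPES:
            continue  # skip reels and videos
        count = post.get("comments_count")
        # A missing count is not treated as zero, so such posts are kept.
        if count is not None and count < min_comments:
            continue
        kept.append(post)
    return kept
```

For example, `filter_posts(posts, min_comments=49)` would keep photo posts with at least 49 comments, plus any post whose count could not be determined.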
Robust Error Handling:
- Safe pagination with proper break conditions
- No infinite loops on empty responses
- Comprehensive error logging
- Graceful degradation on failures
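Safe pagination usually comes down to a few explicit stop conditions. The sketch below assumes cursor-based paging, where a hypothetical fetch_page(cursor) returns the page's items and the next cursor. It stops on an empty page, a missing cursor, a repeated cursor, or a page cap.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 100,
) -> List[dict]:
    """Collect items page by page until one of the break conditions is met."""
    items: List[dict] = []
    seen_cursors = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard upper bound on requests
        page_items, next_cursor = fetch_page(cursor)
        if not page_items:
            break  # empty response: nothing more to read
        items.extend(page_items)
        if next_cursor is None or next_cursor in seen_cursors:
            break  # last page, or the server sent a cursor already used
        seen_cursors.add(next_cursor)
        cursor = next_cursor
    return items
```

The repeated-cursor check guards against a server that keeps returning the same page, which is one common cause of infinite loops in scrapers.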
- Python 3.8 or higher
- Valid Facebook session tokens (extracted from browser - one-time setup)
- No browser automation tools required (Selenium, Playwright, etc.)
- Works with pure HTTP requests
- Clone the repository:
git clone https://github.com/mohdtalal3/facebook_post_comment_scraper
cd facebook_post_comment_scraper
- Install required dependencies (minimal and lightweight):
pip install requests PyQt6 python-dotenv
Note: Only requests is needed for scraping - no browser automation libraries required!
- Create a .env file in the project root:
# Optional: Add your proxy if needed
PROXY=http://your-proxy:port
# Optional: Add any other configuration
Create a requirements.txt file:
requests>=2.28.0
PyQt6>=6.4.0
python-dotenv>=0.20.0
Install with:
pip install -r requirements.txt
Launch the graphical interface:
python facebook_ui.py
The GUI provides three main tabs:
- Simple Post: Scrape a single post with all its comments and images
- Page Posts: Extract multiple posts from a Facebook page or profile
- Group Posts: Scrape posts from Facebook groups
Advanced users can skip the GUI and call the scraper functions directly from Python:
from main import extract_post_id_from_url, fetch_comments_for_post, save_post_data
# Extract post ID
post_id = extract_post_id_from_url("https://www.facebook.com/permalink.php?story_fbid=123...")
# Fetch comments
comments = fetch_comments_for_post(post_id, max_comments=100)
# Save data
save_post_data(post_id, comments, "output_dir")
Add your proxy to the .env file:
PROXY=http://username:password@proxy-server:port
facebook-scraper/
├── main.py                   # Main orchestration and utilities
├── facebook_ui.py            # PyQt6 GUI interface
├── post_scraper.py           # Page/Profile post scraper
├── group_post_scraper_v2.py  # Group post scraper
├── comment_scraper.py        # Comment and reply scraper
├── single_post_image.py      # Image extraction module
├── simple_post/              # Output directory for posts
├── page_post/                # Output directory for page posts
├── ex/                       # Example outputs
└── extras/                   # Additional scripts and tools
Data is saved in JSON format with the following structure:
{
  "post_id": "123456789",
  "author": "User Name",
  "author_id": "100001234567890",
  "content": "Post text content",
  "timestamp": "2024-01-01T12:00:00",
  "reactions": 150,
  "shares": 25,
  "images": ["url1.jpg", "url2.jpg"],
  "comments_count": 45
}
{
  "comment_id": "987654321",
  "author": "Commenter Name",
  "author_id": "100009876543210",
  "text": "Comment text",
  "timestamp": "2024-01-01T12:30:00",
  "replies": [...]
}
- Terms of Service: This tool may violate Facebook's Terms of Service. Use at your own risk.
- Rate Limiting: Implement appropriate delays between requests to avoid detection.
- Privacy: Respect user privacy and data protection laws (GDPR, CCPA, etc.).
- Personal Use: This tool is intended for educational and research purposes only.
- Doc IDs: Facebook's GraphQL document IDs change frequently. You'll need to update them periodically.
- Authentication: Requires valid Facebook session tokens that expire.
- Rate Limits: Excessive requests may result in temporary blocks or account restrictions.
- Private Content: Cannot access content that requires authentication beyond what's provided.
1. "Failed after 5 attempts" error
- Check your internet connection
- Verify proxy settings
- Update DOC_ID values
- Ensure session tokens are valid
2. No data returned
- Verify the URL/ID is correct
- Check if content is publicly accessible
- Update authentication headers
3. GUI not launching
- Ensure PyQt6 is properly installed:
pip install --upgrade PyQt6
- Check Python version compatibility
Enable verbose logging by modifying the scripts:
import logging
logging.basicConfig(level=logging.DEBUG)
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Make your changes
- Test thoroughly
- Submit a pull request
This project is provided for educational purposes only. Users are responsible for ensuring compliance with Facebook's Terms of Service and applicable laws.
- Built with Python and PyQt6
- Uses pure requests library for HTTP communication
- Direct GraphQL API integration (unofficial)
- No browser automation required
- Inspired by the need for lightweight, efficient data research tools
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing issues for solutions
- Review the troubleshooting section
Completed:
- Enhanced comment count detection with 6 extraction paths
- Advanced Story node discovery in nested structures
- Complete album scraping (up to 50 images per post)
- Post deduplication for interrupted sessions
- Automatic retry logic for transient API errors
- Robust pagination with proper error handling
- Reel/video filtering
- Configurable comment threshold filtering
Upcoming:
- Add support for Facebook Stories
- Implement video download functionality
- Add data export to CSV/Excel
- Improve authentication flow
- Add scheduling and automation features
- Create web-based interface
- Add data analysis and visualization tools
Disclaimer: This tool is not affiliated with or endorsed by Facebook/Meta. Use responsibly and ethically.