
FakeReasoning


This repository contains the source code for Toward Generalizable Forgery Detection and Reasoning. In this paper:

  • We formulate detection and explanation as a unified Forgery Detection and Reasoning task (FDR-Task), leveraging Multi-Modal Large Language Models (MLLMs) to provide accurate detection through reliable reasoning over forgery attributes.
  • We introduce the Multi-Modal Forgery Reasoning dataset (MMFR-Dataset), a large-scale dataset containing 120K images across 10 generative models, with 378K reasoning annotations on forgery attributes, enabling comprehensive evaluation of the FDR-Task.
  • We propose FakeReasoning, a forgery detection and reasoning framework with three key components: 1) a dual-branch visual encoder that integrates CLIP and DINO to capture both high-level semantics and low-level artifacts; 2) a Forgery-Aware Feature Fusion Module that leverages DINO's attention maps and cross-attention mechanisms to guide MLLMs toward forgery-related clues; 3) a Classification Probability Mapper that couples language modeling and forgery detection, enhancing overall performance.

News

  • Mar 9 2026: Our paper has been accepted to IEEE TIP!
  • Aug 27 2025: The pretrained model and source code are released. If you have followed our earlier work, please note that both the dataset and the method have been updated; see the details on arXiv.
  • Jun 11 2025: The MMFR-Dataset is released! We also provide code to reproduce our dataset construction pipeline.
  • Apr 15 2025: The project page of our paper is live! Visit it for more on the performance of FakeReasoning and for samples from the MMFR-Dataset.
  • Mar 27 2025: Our paper is released on arXiv.

Dataset

The training set of MMFR-Dataset contains 50K fake images with 129K reasoning annotations and 50K real images with 183K reasoning annotations. The evaluation sets of MMFR-Dataset contain 20K images with 66K reasoning annotations across 10 generative models.

Download

MMFR-Dataset is available on Hugging Face. Download all split .tar files, concatenate them into a single archive, and then extract the dataset.
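The concatenate-then-extract step can be sketched in Python as below. The part-file naming pattern here is an assumption for illustration; use whatever names the split archives actually carry on Hugging Face, sorted so the parts are joined in order.

```python
import glob
import tarfile
from pathlib import Path

def merge_and_extract(parts_pattern: str, merged_path: str, dest_dir: str) -> None:
    """Concatenate split .tar parts in name order, then extract the result."""
    parts = sorted(glob.glob(parts_pattern))
    if not parts:
        raise FileNotFoundError(f"no parts match {parts_pattern}")
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                # Split archives are plain byte slices, so copying them
                # back-to-back reconstructs the original tar file.
                while chunk := f.read(1 << 20):
                    out.write(chunk)
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(merged_path) as tar:
        tar.extractall(dest_dir)

# Example (hypothetical part names):
# merge_and_extract("MMFR-Dataset.tar.part*", "MMFR-Dataset.tar", "./MMFR-Dataset")
```

This is equivalent to `cat *.tar.part* > merged.tar && tar -xf merged.tar` on a Unix shell.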

Structure

./
├── diffusiondb
│   ├── part-000001
│   │   ├── 0a3c75bb-4bd0-47c8-a2ba-e2aee92ad43f.png
│   │   └── [...]
│   ├── [...]
│   ├── part-000051
│   └── diffusiondb_reasoning.json
├── laion
│   ├── 00000
│   │   ├── 000000000.jpg
│   │   └── [...]
│   ├── [...]
│   ├── 00047
│   └── laion_reasoning.json
├── evaluation_sets
│   ├── stablediffusion
│   │   ├── 0_real
│   │   ├── 1_fake
│   │   └── stablediffusion_reasoning.json
│   ├── [...]
│   └── gigagan
└── forgery_reasoning_cot.json

forgery_reasoning_cot.json contains instruction-CoT annotations for the training set. The original reasoning annotations for the training set are provided in diffusiondb_reasoning.json and laion_reasoning.json. Reasoning annotations for the evaluation sets, such as stablediffusion_reasoning.json, are located in their respective subfolders.
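A minimal loader for these annotation files might look like the sketch below. The exact schema of the MMFR-Dataset JSON files is not documented in this README, so the function only assumes the top level is either a list of records or a dict keyed by image id.

```python
import json
from pathlib import Path

def count_annotations(json_path: str) -> int:
    """Load a reasoning-annotation file and return how many entries it holds.

    Schema assumption: the top level is a list of records, or a dict
    mapping image ids to records. Inspect the real files to confirm.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if isinstance(data, (list, dict)):
        return len(data)
    raise ValueError(f"unexpected top-level type: {type(data).__name__}")

# Example (hypothetical path):
# n = count_annotations("evaluation_sets/stablediffusion/stablediffusion_reasoning.json")
```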

Generation

The code is included in ./mmfr_generation/. We use the GPT-4o batch API for dataset generation. To follow our construction pipeline:

  1. Generate .jsonl files with get_jsonl.py for batch requests.
  2. Upload your .jsonl files and retrieve the GPT-4o output with batch_api_generation.ipynb.
  3. Convert the raw GPT-4o output into structured reasoning annotations with output_to_reasoning.py.
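For reference, each line of a batch-request .jsonl file follows the OpenAI Batch API request format sketched below. The `custom_id`, prompt text, and image URL here are hypothetical placeholders; the actual prompts are produced by get_jsonl.py.

```python
import json

def make_batch_line(custom_id: str, image_url: str, prompt: str) -> str:
    """Build one JSONL line in the OpenAI Batch API request format."""
    request = {
        "custom_id": custom_id,            # echoed back in the batch output
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [
                {"role": "user", "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ]},
            ],
            "max_tokens": 1024,            # illustrative value
        },
    }
    return json.dumps(request)

# Example (hypothetical values):
# line = make_batch_line("img-0001", "https://example.com/a.png",
#                        "Describe the forgery attributes of this image.")
```

One such line per image, written to a .jsonl file, is what gets uploaded in step 2.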

Install

The implementation is based on torch==2.1.2+cu121.

  1. Clone this repository and navigate to the LLaVA folder

git clone https://github.com/PRIS-CV/FakeReasoning.git
cd LLaVA

  2. Install required packages

conda create -n fakereasoning python=3.10
conda activate fakereasoning
pip install -e .

  3. Install additional dependencies for training

pip install -e ".[train]"
pip install flash-attn --no-build-isolation

If the installation of flash-attn fails, please visit the official GitHub release page and install the corresponding .whl package.

  4. Install additional dependencies for evaluation

pip install nltk
pip install rouge-score

  5. Download base models

FakeReasoning is built upon the following models:

  • LLaVA (with the MoF pretrained models)
  • openai/clip-vit-large-patch14-336
  • DINOv2 (dinov2_vitl14_pretrain.pth)

Please download the corresponding pretrained weights before running the framework.

Inference

The pretrained model of FakeReasoning is available on Hugging Face. To use a local copy of the openai/clip-vit-large-patch14-336 weights, set "mm_vision_tower" in config.json to path_to_clip-vit-large-patch14-336.
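That one-line config edit can also be scripted, as in this small sketch (the paths are placeholders for your local checkout):

```python
import json
from pathlib import Path

def point_config_to_local_clip(config_path: str, clip_dir: str) -> None:
    """Rewrite "mm_vision_tower" in the model's config.json to a local path."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Replace the Hub id with a local directory, e.g. a downloaded
    # clip-vit-large-patch14-336 checkout.
    config["mm_vision_tower"] = clip_dir
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")

# Example (hypothetical paths):
# point_config_to_local_clip("FakeReasoning/config.json",
#                            "/models/clip-vit-large-patch14-336")
```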

Inference with a single image

cd LLaVA/forgery_eval
export DINO_PATH='path_to_dinov2-main'
export DINO_WEIGHT='path_to_dinov2_vitl14_pretrain.pth'
python inference.py \
    --model-path path_to_FakeReasoning_weights \
    --img_path commonFake_COCO_if_stage_III_189.png

Evaluation on the MMFR-Dataset

python eval.py \
    --model-path path_to_FakeReasoning_weights \
    --dataset_path path_to_MMFR-Dataset \
    --result_folder ./results \
    --clip_path path_to_clip-vit-large-patch14-336

⚠️ Note: Multi-GPU inference is not currently supported. Make sure at least 30 GB of GPU memory is available on a single GPU to run inference and evaluation.

Training

FakeReasoning is trained on 8× A800 GPUs (40 GB) for 3 epochs; the entire training completes in about 7 hours.

cd LLaVA
export DINO_PATH='path_to_dinov2-main'
export DINO_WEIGHT='path_to_dinov2_vitl14_pretrain.pth'
bash finetune_task_lora.sh \
    --data_path path_to_forgery_reasoning_cot.json \
    --model_name_or_path path_to_MoF_Models \
    --image_folder path_to_MMFR-Dataset \
    --vision_tower path_to_clip-vit-large-patch14-336

⚠️ Note: If you change the number of training devices, always ensure:

per_device_train_batch_size × gradient_accumulation_steps × num_gpus = 128
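A quick sanity check for this invariant (the function name is just for illustration):

```python
def check_global_batch(per_device: int, grad_accum: int, num_gpus: int,
                       target: int = 128) -> bool:
    """Verify the effective global batch size matches the target (128)."""
    return per_device * grad_accum * num_gpus == target

# With the default 8-GPU setup, e.g. per-device batch 4 with 4
# gradient-accumulation steps keeps the global batch at 128:
# check_global_batch(4, 4, 8)  -> True
# Dropping to 4 GPUs without adjusting the other two factors breaks it:
# check_global_batch(4, 4, 4)  -> False (need per_device * grad_accum = 32)
```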

Citation

If you find this work useful for your research, please cite our paper:

@article{gao2026toward,
  title={Toward Generalizable Forgery Detection and Reasoning},
  author={Gao, Yueying and Chang, Dongliang and Yu, Bingyao and Qin, Haotian and Diao, Muxi and Chen, Lei and Liang, Kongming and Ma, Zhanyu},
  journal={IEEE Transactions on Image Processing},
  volume={35},
  pages={3395--3410},
  year={2026},
  publisher={IEEE}
}

Acknowledgement

We thank LLaVA, MMVP, DINOv2, UniFD, and MCAN for open-sourcing their models and code.