# AeroReformer

This repository is the official implementation of "AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation" (paper).

🚀 AeroReformer is a novel vision-language framework for UAV-based referring image segmentation (UAV-RIS), designed to tackle the unique challenges of aerial imagery, such as complex spatial scales, occlusions, and diverse object orientations.

Our approach integrates a Vision-Language Cross-Attention Module (VLCAM) for enhanced multimodal understanding and a Rotation-Aware Multi-Scale Fusion (RAMSF) decoder to improve segmentation accuracy in aerial scenes.
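
For intuition, the sketch below shows one common way to wire up vision-language cross-attention in PyTorch, with flattened visual features attending to text tokens. It is an illustration only, not the repository's VLCAM: the class name, dimensions, and use of `nn.MultiheadAttention` are our assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionSketch(nn.Module):
    """Illustrative vision-language cross-attention (not the actual VLCAM)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, H*W, dim) flattened visual features, used as queries
        # txt: (B, L, dim)   text token features, used as keys/values
        fused, _ = self.attn(query=vis, key=txt, value=txt)
        return self.norm(vis + fused)  # residual connection keeps visual detail

# Toy usage: a 30x30 feature map and a 12-token referring expression.
vis = torch.randn(1, 900, 256)
txt = torch.randn(1, 12, 256)
print(CrossAttentionSketch()(vis, txt).shape)  # torch.Size([1, 900, 256])
```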

*Figure: AeroReformer overview.*

## Setting Up

The code has been verified only on Ubuntu; please adapt and test it on your own platform as needed.

### Preliminaries

The code has been verified to work with PyTorch v2.3.1 and Python 3.10.

1. Clone this repository.
2. Change the directory to the root of this repository.

### Package Dependencies

1. Create a new Conda environment with Python 3.10 and activate it:

   ```bash
   conda create -n AeroReformer python=3.10
   conda activate AeroReformer
   ```

2. Install PyTorch with CUDA 12.4 support (ensure your NVIDIA driver is compatible):

   ```bash
   pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
   ```

3. Install the packages listed in `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```
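
To confirm the CUDA-enabled build works before moving on, run a quick sanity check (standard PyTorch calls, nothing repository-specific):

```python
import torch

# Expect the installed version (e.g. 2.3.1+cu124) and True if a GPU is visible.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```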

### Initialization Weights for Training

1. Create a `./pretrained_weights` directory to store the weights:

   ```bash
   mkdir ./pretrained_weights
   ```

2. Download the pre-trained classification weights of the Swin Transformer from this link.
3. Place the downloaded `.pth` file into the `./pretrained_weights` directory. These weights are required to initialize the model during training.
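
To verify the file downloaded intact, it can be loaded on the CPU and its top-level keys listed. A minimal check; the filename below is a placeholder for whichever `.pth` file you downloaded:

```python
import torch

# Placeholder filename -- substitute the actual checkpoint you downloaded.
ckpt = torch.load("./pretrained_weights/swin_base.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])  # e.g. a 'model' key wrapping the state dict
```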

## Datasets

**Warning:** Experiments are conducted on the UAVid-RIS and VDD-RIS datasets. The text expressions in these datasets were generated by the Qwen and LLaMA models and may contain errors or inconsistencies. We welcome any collaboration to help improve the quality of the data.

To ensure full reproducibility, we provide the necessary preprocessing code in this GitHub repository, so you can generate the exact image data used in our experiments from the original datasets. The text labels, which were generated by our team, can be downloaded directly from Hugging Face.

## Usage and Dataset Preparation

Follow these steps to prepare the datasets for training and testing.

### 1. Download the Original Datasets

Download the raw datasets and save them under `./data/UAVid_RIS` and `./data/VDD_RIS` (or similar paths of your choice).

### 2. Download the Text References

Download the generated text expressions, save them in the corresponding dataset folders, and extract them if necessary.


### 3. Folder Structure

After downloading and extracting, the dataset folders should look like this:

UAVid-RIS:

```
$DATA_PATH_UAVID  # ./data/UAVid_RIS
├── uavid_ris
│   ├── refs(uow).p
│   ├── refs_llama(uow).p
│   └── instances.json
└── images
    └── uavid_ris
        ├── PNGImages
        ├── ann_split
        ├── ann_split_llama
        └── annotations
```

VDD-RIS:

```
$DATA_PATH_VDD  # ./data/VDD_RIS
├── vdd_ris
│   ├── refs(uow).p
│   ├── refs_llama(uow).p
│   └── instances.json
└── images
    └── vdd_ris
        ├── PNGImages
        ├── ann_split
        ├── ann_split_llama
        └── annotations
```
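
As a quick sanity check on the layout, the reference pickle can be inspected directly. We assume here that it follows the REFER-style format used by LAVT and RMSIN (a pickled list of reference records); if it differs, the prints will show the actual structure:

```python
import pickle

# Path follows the UAVid-RIS layout shown above.
with open("./data/UAVid_RIS/uavid_ris/refs(uow).p", "rb") as f:
    refs = pickle.load(f)

print(type(refs))
if isinstance(refs, list) and refs:
    print(len(refs))  # number of referring expressions
    print(refs[0])    # one record, to see its fields
```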

### 4. Preprocess the Images

Use the preprocessing scripts provided in this repository to split the images into training, validation, and test sets (or script all six calls in one go, as sketched after this block):

```bash
# UAVid-RIS
python tool/image_split_uavid.py train
python tool/image_split_uavid.py val
python tool/image_split_uavid.py test

# VDD-RIS
python tool/image_split_vdd.py train
python tool/image_split_vdd.py val
python tool/image_split_vdd.py test
```
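
If you prefer a single command, all six splits can be scripted; a minimal sketch, assuming the two CLIs behave exactly as invoked above:

```python
import subprocess

# Run every split for both datasets; abort on the first failure.
for script in ("tool/image_split_uavid.py", "tool/image_split_vdd.py"):
    for split in ("train", "val", "test"):
        subprocess.run(["python", script, split], check=True)
```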

### 5. Demo Images

The UAV images we captured at the University of Warwick are available in the `demo` subfolder of this repository.

## Training

We use PyTorch's DistributedDataParallel for training. To run on a single GPU (ID 0), use the following commands.

Train on UAVid-RIS:

```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 train.py --dataset uavid_ris --model_id AeroReformer --epochs 40 --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4 --output-dir ./checkpoints/UAVid_RIS
```

Train on VDD-RIS:

```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 train.py --dataset vdd_ris --model_id AeroReformer --epochs 10 --img_size 480 --refer_data_root ./data/VDD_RIS/ --mha 4-4-4-4 --output-dir ./checkpoints/VDD_RIS
```
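
To scale out, raise `--nproc_per_node` and list more devices (e.g. `CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 ...`). For readers new to the pattern, here is a minimal, generic sketch of the DistributedDataParallel setup that a `torchrun`-launched script relies on; it is not the repository's `train.py`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets LOCAL_RANK (among others) for each spawned process.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Toy model standing in for the real network.
model = torch.nn.Linear(16, 2).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])  # syncs gradients across ranks

out = model(torch.randn(4, 16).cuda(local_rank))
print(f"rank {dist.get_rank()}: output shape {tuple(out.shape)}")
dist.destroy_process_group()
```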

## Testing

Test on UAVid-RIS:

```bash
python test.py --swin_type base --dataset uavid_ris --resume ./checkpoints/UAVid_RIS/model_best_AeroReformer.pth --model_id AeroReformer --split test --workers 4 --window12 --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```

Test on VDD-RIS:

```bash
python test.py --swin_type base --dataset vdd_ris --resume ./checkpoints/VDD_RIS/model_best_AeroReformer.pth --model_id AeroReformer --split test --workers 4 --window12 --img_size 480 --refer_data_root ./data/VDD_RIS/ --mha 4-4-4-4
```

## Acknowledgements

The code in this repository builds upon LAVT and RMSIN. We thank the authors for making their projects open source.

