This repository is the official implementation for "AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation" (paper).
🚀 AeroReformer is a novel vision-language framework for UAV-based referring image segmentation (UAV-RIS), designed to tackle the unique challenges of aerial imagery, such as complex spatial scales, occlusions, and diverse object orientations.
Our approach integrates a Vision-Language Cross-Attention Module (VLCAM) for enhanced multimodal understanding and a Rotation-Aware Multi-Scale Fusion (RAMSF) decoder to improve segmentation accuracy in aerial scenes.
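The actual module implementations live in this repository's model code. Purely as an illustration of the cross-attention idea behind VLCAM (a minimal sketch built on `torch.nn.MultiheadAttention`, not the module used in the paper), the visual features can act as queries attending over the language tokens:

```python
import torch
import torch.nn as nn

class VisionLanguageCrossAttentionSketch(nn.Module):
    """Illustrative only: flattened visual features (queries) attend to language tokens (keys/values)."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
        # vis:  (B, H*W, C) flattened visual feature map
        # lang: (B, L, C)   language token embeddings
        fused, _ = self.attn(query=vis, key=lang, value=lang)
        return self.norm(vis + fused)  # residual connection keeps the visual stream

# Toy shapes: batch of 2, a 30x40 feature map with C=256, and 20 language tokens.
out = VisionLanguageCrossAttentionSketch()(torch.randn(2, 1200, 256), torch.randn(2, 20, 256))
print(out.shape)  # torch.Size([2, 1200, 256])
```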
The code has been verified only on Ubuntu, with PyTorch v2.3.1 and Python 3.10. Please adapt and test it on your own platform as needed.
- Clone this repository.
- Change the directory to the root of this repository.
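  For example (the URL below is a placeholder; substitute this repository's actual address):

  ```bash
  git clone https://github.com/<user>/AeroReformer.git   # placeholder URL
  cd AeroReformer
  ```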
- Create a new Conda environment with Python 3.10 and then activate it:

  ```bash
  conda create -n AeroReformer python=3.10
  conda activate AeroReformer
  ```
- Install PyTorch with CUDA 12.4 support (ensure your NVIDIA driver is compatible):

  ```bash
  pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  ```
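  To confirm that PyTorch sees your GPU (a quick sanity check; not part of the original setup steps):

  ```bash
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
  ```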
- Install the packages listed in `requirements.txt` using `pip`:

  ```bash
  pip install -r requirements.txt
  ```
- Create the `./pretrained_weights` directory to store the weights:

  ```bash
  mkdir ./pretrained_weights
  ```
- Download the pre-trained classification weights of the Swin Transformer from this link.
- Place the downloaded `.pth` file into the `./pretrained_weights` directory. These weights are necessary for initializing the model during training.
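  A quick way to confirm the file is readable (the filename below is an example; use the name of the checkpoint you actually downloaded):

  ```bash
  python -c "import torch; ckpt = torch.load('./pretrained_weights/swin_base_patch4_window12_384_22k.pth', map_location='cpu'); print(type(ckpt))"
  ```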
Warning: Experiments are conducted on the UAVid-RIS and VDD-RIS datasets. The text expressions in these datasets were generated by the Qwen and LLaMA models and may contain errors or inconsistencies. We welcome any collaboration to help improve the quality of the data.
To ensure full reproducibility, we provide the necessary preprocessing code in this GitHub repository. This allows you to generate the exact image data used in our experiments from the original datasets. The text labels, which were generated by our team, are directly available for download from Hugging Face.
Follow these steps to prepare the datasets for training and testing:
Download the raw datasets and save them under `./data/UAVid_RIS` and `./data/VDD_RIS` (or similar paths you prefer).
- UAVid: UAVid Official Website
- VDD: Hugging Face
Download the generated text expressions and save them in the corresponding dataset folders. Extract them if necessary.
- UAVid-RIS Texts: Hugging Face link
- VDD-RIS Texts: Hugging Face link
After downloading and extracting, the dataset folders should look like this:
UAVid-RIS:

```
$DATA_PATH_UAVID  # ./data/UAVid_RIS
├── uavid_ris
│   ├── refs(uow).p
│   ├── refs_llama(uow).p
│   └── instances.json
└── images
    └── uavid_ris
        ├── PNGImages
        ├── ann_split
        ├── ann_split_llama
        └── annotations
```
VDD-RIS:

```
$DATA_PATH_VDD  # ./data/VDD_RIS
├── vdd_ris
│   ├── refs(uow).p
│   ├── refs_llama(uow).p
│   └── instances.json
└── images
    └── vdd_ris
        ├── PNGImages
        ├── ann_split
        ├── ann_split_llama
        └── annotations
```
Use the preprocessing scripts provided in this repository to split the images for training, validation, and testing:
```bash
# UAVid-RIS
python tool/image_split_uavid.py train
python tool/image_split_uavid.py val
python tool/image_split_uavid.py test

# VDD-RIS
python tool/image_split_vdd.py train
python tool/image_split_vdd.py val
python tool/image_split_vdd.py test
```

You can also access the UAV images we captured at the University of Warwick in the `demo` subfolder of this repository.
We use DistributedDataParallel from PyTorch for training. To run on a single GPU (ID 0), use the following commands.
Train on UAVid-RIS:

```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 train.py --dataset uavid_ris --model_id AeroReformer --epochs 40 --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4 --output-dir ./checkpoints/UAVid_RIS
```

Train on VDD-RIS:
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 train.py --dataset vdd_ris --model_id AeroReformer --epochs 10 --img_size 480 --refer_data_root ./data/VDD_RIS/ --mha 4-4-4-4 --output-dir ./checkpoints/VDD_RIS
```
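To train on multiple GPUs, raise `--nproc_per_node` and expose the extra devices. A sketch assuming two GPUs with IDs 0 and 1 (this variant is not part of the commands above):

```bash
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=12345 train.py --dataset uavid_ris --model_id AeroReformer --epochs 40 --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4 --output-dir ./checkpoints/UAVid_RIS
```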
Test on UAVid-RIS:

```bash
python test.py --swin_type base --dataset uavid_ris --resume ./checkpoints/UAVid_RIS/model_best_AeroReformer.pth --model_id AeroReformer --split test --workers 4 --window12 --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```

Test on VDD-RIS:
```bash
python test.py --swin_type base --dataset vdd_ris --resume ./checkpoints/VDD_RIS/model_best_AeroReformer.pth --model_id AeroReformer --split test --workers 4 --window12 --img_size 480 --refer_data_root ./data/VDD_RIS/ --mha 4-4-4-4
```

The code in this repository is built upon the work of LAVT and RMSIN. We would like to thank the authors for making their projects open source.