
FreETAD [ACM MM 2025]

This is the source code of the paper "Ex Pede Herculem, Predicting Global Actionness Curve from Local Clips", which was accepted at ACM MM 2025.


[Paper Link]

Overview

We present FreETAD, a Frequency-based End-to-end Temporal Action Detection approach that shifts the focus from local actionness scores to frequency component estimation. Using the short-time Fourier transform, FreETAD reconstructs the global actionness curve seamlessly from per-clip predictions. With a DETR-like decoder and frequency-encoded query vectors, it strengthens multi-scale time-frequency interactions. FreETAD exploits end-to-end training effectively, boosting mAP by 1.5% on Charades and 2.7% on MultiTHUMOS, and outperforms the current state of the art.
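As an illustration of the core idea (not the authors' implementation), the sketch below shows why frequency components of short local clips determine the global actionness curve: splitting a toy curve into non-overlapping clips, taking a real FFT per clip, and inverting the spectra recovers the curve exactly. The window size, curve shape, and variable names here are assumptions made for this example.

```python
import numpy as np

# Toy global actionness curve over T frames (a smooth synthetic signal).
T, win = 256, 32
t = np.arange(T) / T
curve = 0.5 + 0.4 * np.sin(2 * np.pi * 3 * t)

# Split into non-overlapping local clips and take per-clip frequency
# components -- the quantities a FreETAD-style model would predict.
clips = curve.reshape(-1, win)          # (num_clips, win)
coeffs = np.fft.rfft(clips, axis=1)     # (num_clips, win // 2 + 1)

# Inverting each clip's spectrum and concatenating recovers the
# global curve exactly: the local frequency components determine it.
recon = np.fft.irfft(coeffs, n=win, axis=1).reshape(-1)
print(np.allclose(recon, curve))        # True
```

In practice the model predicts these coefficients rather than computing them, and the paper's actual reconstruction uses a short-time (windowed, overlapping) transform rather than this simplified non-overlapping split.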

Data Preparation

We use MultiTHUMOS and Charades for evaluation; please download them from their official websites. Then prepare the RGB frames as follows:

  • Clone the repository and cd FreETAD; then mkdir data, or specify your own path in /dataset/dataset_cfg.yaml.

  • For MultiTHUMOS:

    • Download the raw videos of THUMOS14 into /data/thumos14_videos;
    • Extract the RGB frames from the raw videos with utils/extract_frames.py; the frames will be placed in /data/multithumos_frames;
    • Generate multithumos_frames.json for the extracted frames with /util/generate_frame_dict.py and put the JSON file in the /dataset folder.
  • For Charades:

    • Download the RGB frames of Charades from here, and place them at /data/charades_v1_rgb.
  • Replace the frame folder path or image tensor path in /data/dataset_cfg.yml.
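For reference, the frame-dictionary step for MultiTHUMOS above can be sketched as below. The actual JSON schema is defined by the repository's generate_frame_dict.py; this minimal version, which simply maps each video folder to its frame count and is demonstrated on a throwaway temporary directory, is an assumption.

```python
import json
import os
import tempfile

def build_frame_dict(frame_root):
    """Map each video folder under frame_root to its number of frames.
    NOTE: the real schema is defined by the repo's generate_frame_dict.py;
    this layout is assumed here for illustration only."""
    frame_dict = {}
    for video in sorted(os.listdir(frame_root)):
        video_dir = os.path.join(frame_root, video)
        if os.path.isdir(video_dir):
            frame_dict[video] = sum(
                1 for f in os.listdir(video_dir) if f.endswith((".jpg", ".png"))
            )
    return frame_dict

# Demo on a temporary directory standing in for /data/multithumos_frames.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "video_test_0000001"))
for i in range(5):
    open(os.path.join(root, "video_test_0000001", f"img_{i:05d}.jpg"), "w").close()

frame_dict = build_frame_dict(root)
with open(os.path.join(root, "multithumos_frames.json"), "w") as fp:
    json.dump(frame_dict, fp)
print(frame_dict)  # {'video_test_0000001': 5}
```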

Training

Use train.sh to train FreETAD:

  • MultiTHUMOS:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos
  • Charades:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades

Testing

Use test.sh to evaluate:

  • MultiTHUMOS:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos --eval --load multithumos_best.pth
  • Charades:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades --eval --load charades_best.pth

Acknowledgements

Our code is mainly built on PointTAD and also references parts of RTD-Net and E2ETAD. We are grateful to the authors for open-sourcing their excellent code!

About

A multi-label temporal action detection method based on frequency estimation.
