This is the source code of the paper "Ex Pede Herculem, Predicting Global Actionness Curve from Local Clips", accepted at ACM MM 2025.
We present FreETAD, a Frequency-based End-to-end Temporal Action Detection approach that shifts the focus from local actionness scores to frequency-component estimation. Using the short-term Fourier transform, FreETAD reconstructs the global actionness curve seamlessly. With a DETR-like decoder and frequency-encoded query vectors, it enhances multi-scale time-frequency interactions. FreETAD leverages end-to-end training effectively, boosting mAP by 1.5% on Charades and 2.7% on MultiTHUMOS and outperforming the current state of the art.
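As a toy illustration of the core idea (not the paper's actual model), the snippet below analyzes a synthetic actionness curve clip-by-clip with the short-term Fourier transform and recovers the global curve with the inverse transform. The frame rate, clip length, and curve shape are all made-up values; in FreETAD the per-clip frequency components would be predicted by the network rather than computed from ground truth.

```python
import numpy as np
from scipy.signal import stft, istft

# Toy global "actionness" curve, sampled once per frame.
fs = 30.0                                              # assumed frame rate
t = np.arange(256) / fs
actionness = 0.5 + 0.4 * np.sin(2 * np.pi * 0.5 * t)   # synthetic curve

# Short-term Fourier transform over overlapping local clips (64 frames each).
f, clip_times, Z = stft(actionness, fs=fs, nperseg=64)

# Inverting the per-clip frequency components stitches the clips back into
# one seamless global curve.
_, reconstructed = istft(Z, fs=fs, nperseg=64)

err = np.max(np.abs(actionness - reconstructed[:actionness.size]))
print(f"max reconstruction error: {err:.2e}")
```

With the default Hann window and 50% overlap, the STFT satisfies the constant-overlap-add condition, so the inverse transform recovers the curve up to floating-point error.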
We use MultiTHUMOS and Charades for evaluation; please download them from the official websites. Then prepare the RGB frames as follows:
- Clone the repository and run `cd FreETAD; mkdir data`, or specify your own data path in `/dataset/dataset_cfg.yaml`.
- For MultiTHUMOS:
  - Download the raw videos of THUMOS14 into `/data/thumos14_videos`;
  - Extract the RGB frames from the raw videos using `utils/extract_frames.py`. The frames will be placed in `/data/multithumos_frames`;
  - Generate `multithumos_frames.json` for the extracted frames with `/util/generate_frame_dict.py` and put the JSON file into the `/dataset` folder.
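For orientation, here is a minimal sketch of what a frame-extraction step could look like. The repo's actual `utils/extract_frames.py` may work differently; the example video filename, the sampling fps, and the `img_%05d.jpg` naming pattern below are assumptions, not the repo's confirmed conventions.

```python
from pathlib import Path

def build_extract_cmd(video_path: str, frame_root: str, fps: int = 10):
    """Build an ffmpeg command that dumps a video's RGB frames as JPEGs
    into <frame_root>/<video_id>/img_00001.jpg, img_00002.jpg, ...
    (Hypothetical helper; only illustrates the general recipe.)"""
    video_id = Path(video_path).stem
    out_pattern = str(Path(frame_root) / video_id / "img_%05d.jpg")
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",   # sample frames at a fixed rate
        "-q:v", "2",           # high JPEG quality
        out_pattern,
    ]

cmd = build_extract_cmd("data/thumos14_videos/example_video.mp4",
                        "data/multithumos_frames")
print(" ".join(cmd))
```

The output directory for each video must exist before running the command (e.g. via `Path(...).mkdir(parents=True, exist_ok=True)` or a shell `mkdir -p`), since ffmpeg does not create it.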
- For Charades:
  - Download the RGB frames of Charades from here, and place the frames at `/data/charades_v1_rgb`.
- Replace the frame folder path or image tensor path in `/data/dataset_cfg.yml`.
Use `train.sh` to train FreETAD:

- MultiTHUMOS:

  ```shell
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos
  ```

- Charades:

  ```shell
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades
  ```
Use `test.sh` to evaluate:

- MultiTHUMOS:

  ```shell
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos --eval --load multithumos_best.pth
  ```

- Charades:

  ```shell
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades --eval --load charades_best.pth
  ```
Our code is mainly based on PointTAD and also references parts of the code from RTD-Net and E2ETAD. We are grateful to the authors for open-sourcing such excellent code!
