

Tex2Sem

Tex2Sem: Learning from Textures to Semantics for Robust Semantic Correspondence

This paper has been published in IEEE Transactions on Circuits and Systems for Video Technology (https://ieeexplore.ieee.org/document/10311392/).

Our code and configurations are being organized and will be released in this repository.

Abstract

Recent work on semantic correspondence has shown growing interest in Stable Diffusion (SD) and DINO features. However, existing methods underutilize the matching potential of SD and DINOv2 features and suffer from similar background interference patterns. They lack texture-to-semantic learning as well as intra- and inter-image feature interaction. This study proposes Tex2Sem, a framework that learns from textures to semantics, to address these two problems. For the first problem, we propose a texture-to-semantic learning paradigm that achieves texture-semantic trade-offs on both features and correlation maps: SD and DINOv2 features are aggregated from textures to semantics to produce multi-stage progressive fusion features. For the second problem, we propose MamFormer, a hybrid architecture of Mamba-2 and Transformer, to improve intra- and inter-image feature aggregation and interaction. A terminal-stage aggregation and interaction mechanism (TAIM) is further proposed to enhance feature learning efficiency.
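The texture-to-semantic progression described above can be sketched as follows. This is purely an illustration of the general idea, not the paper's implementation: the stage count, the linear blending schedule, and the assumption that SD and DINOv2 features are already projected to a common channel dimension are all ours.

```python
import numpy as np

def progressive_fusion(sd_feats, dino_feats, num_stages=3):
    """Illustrative texture-to-semantic fusion (not Tex2Sem's exact method).

    sd_feats, dino_feats: arrays of shape (H, W, C), assumed to be
    already projected to a common channel dimension C.
    Early stages keep the texture-rich SD features; each later stage
    shifts weight toward the semantic DINOv2 features, yielding a
    sequence of multi-stage progressive fusion features.
    """
    fused_stages = []
    fused = sd_feats
    for s in range(num_stages):
        # alpha goes from 0 (texture-heavy) to 1 (semantic-heavy)
        alpha = s / (num_stages - 1)
        fused = (1.0 - alpha) * fused + alpha * dino_feats
        fused_stages.append(fused)
    return fused_stages

# Toy features standing in for SD and DINOv2 outputs
sd = np.random.rand(16, 16, 64)
dino = np.random.rand(16, 16, 64)
stages = progressive_fusion(sd, dino)
```

With this schedule, the first stage is pure SD (texture) and the last stage is pure DINOv2 (semantics), with intermediate blends in between; the actual Tex2Sem aggregation is learned rather than fixed.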

Tex2Sem-based Video Swap

Tex2Sem-based Human Pose Estimation

Citation

@ARTICLE{Wang2025Tex2Sem,
  author={Wang, Zenghui and Du, Songlin and Yan, Yaping and Xiao, Guobao and Lu, Xiaobo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Tex2Sem: Learning from Textures to Semantics for Robust Semantic Correspondence}, 
  year={2025},
  volume={35},
  number={11},
  pages={10875-10890},
  doi={10.1109/TCSVT.2025.3576772}}