    Repositories list

    • M3VQA

      Public
      Python
      0000 Updated Apr 20, 2026
    • SciVQR

      Public
      This repo holds the official evaluation code for "SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation".
      Python
      MIT License
      0000 Updated Apr 18, 2026
    • VisReason

      Public
      0000 Updated Apr 16, 2026
    • Python
      MIT License
      12800 Updated Apr 9, 2026
    • DeepSlyme

      Public
      Python
      Apache License 2.0
      0000 Updated Apr 7, 2026
    • UrbanNav

      Public
      [AAAI 2026] Official implementation of paper "UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories"
      Python
      MIT License
      46030 Updated Mar 27, 2026
    • S1-MMAlign: a scientific multimodal dataset (landing page; data hosted on Hugging Face)
      0000 Updated Mar 19, 2026
    • ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval
      Python
      Apache License 2.0
      0910 Updated Jan 6, 2026
    • VRoPE

      Public
      [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.
      Python
      12700 Updated Nov 18, 2025
    • An efficient GRPO training util.
      Python
      MIT License
      35500 Updated Jun 13, 2025
    • VideoNIAH

      Public
      VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
      Python
      05740 Updated Mar 9, 2025
    • COSA

      Public
      [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
      Python
      MIT License
      34330 Updated Dec 25, 2024
    • VALOR

      Public
      [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
      Python
      MIT License
      1831080 Updated Dec 25, 2024
    • DANet

      Public
      Dual Attention Network for Scene Segmentation (CVPR2019)
      Python
      MIT License
      4852.5k611 Updated Dec 23, 2024
    • MRES

      Public
      This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation", accepted by CVPR 2…
      Apache License 2.0
      07350 Updated Jun 3, 2024
    • SC-Tune

      Public
      Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"
      Python
      MIT License
      11610 Updated Apr 22, 2024
    • VAST

      Public
      [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
      Jupyter Notebook
      MIT License
      18301220 Updated Mar 14, 2024
    • GLOBER

      Public
      Python
      Other
      0910 Updated Jan 11, 2024
    • ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations…
      Python
      BSD 3-Clause "New" or "Revised" License
      15560 Updated Sep 4, 2023
    • Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
      Python
      MIT License
      11510 Updated Aug 9, 2023
    • MOSO

      Public
      Python
      23550 Updated Jun 6, 2023