diff --git a/README.md b/README.md
index c3fae28..50a0225 100644
--- a/README.md
+++ b/README.md
@@ -135,7 +135,7 @@ cargo run -r -F openvino -F ort-load-dynamic --example yolo -- --device openvino
-## What's Next?
+## 🔍 What's Next?
 - 📖 Online Documentation
 - 📚 API Reference
@@ -143,6 +143,246 @@ cargo run -r -F openvino -F ort-load-dynamic --example yolo -- --device openvino
 - 🚀 Examples
+## 📦 Model Zoo
+
+> [!NOTE]
+>
+> **Status:** ✅ **Supported** | ❓ **Unknown** | ❌ **Not Supported For Now**
+
+🔍 All ONNX models are available from the [ONNX Models Repository](https://github.com/jamjamjon/assets)
+
+
+🔥 YOLO-Series
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [YOLOv5](https://github.com/ultralytics/yolov5) | Image Classification<br />Object Detection<br />Instance Segmentation | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLOv6](https://github.com/meituan/YOLOv6) | Object Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLOv7](https://github.com/WongKinYiu/yolov7) | Object Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLOv8](https://github.com/ultralytics/ultralytics) | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLO11](https://github.com/ultralytics/ultralytics) | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLOv9](https://github.com/WongKinYiu/yolov9) | Object Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLOv10](https://github.com/THU-MIG/yolov10) | Object Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [YOLOv12](https://github.com/sunsmarterjie/yolov12) | Image Classification<br />Object Detection<br />Instance Segmentation | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [YOLOv13](https://github.com/iMoonLab/yolov13) | Object Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [YOLO26](https://github.com/ultralytics/ultralytics) | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ + +
+🏷️ Image Classification & Tagging + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [BEiT](https://github.com/microsoft/unilm/tree/master/beit) | Image Classification | [demo](./examples/image-classification) | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ | ❌ | +| [ConvNeXt](https://github.com/facebookresearch/ConvNeXt) | Image Classification | [demo](./examples/image-classification) | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ | ❌ | +| [FastViT](https://github.com/apple/ml-fastvit) | Image Classification | [demo](./examples/image-classification) | βœ… | βœ…| βœ… | βœ…| ❌ | ❌ | ❌ | +| [MobileOne](https://github.com/apple/ml-mobileone) | Image Classification | [demo](./examples/image-classification) | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ | ❌ | +| [DeiT](https://github.com/facebookresearch/deit) | Image Classification | [demo](./examples/image-classification) | βœ… | βœ… | βœ… | βœ…| ❌ | ❌ | ❌ | +| [RAM](https://github.com/xinyu1205/recognize-anything) | Image Tagging | [demo](./examples/image-classification) | βœ… | ❓| βœ… | βœ…| βœ… | βœ… | βœ… | +| [RAM++](https://github.com/xinyu1205/recognize-anything) | Image Tagging | [demo](./examples/image-classification) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | + +
+ + +
+🎯 Object Detection + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [RT-DETRv1](https://github.com/lyuwenyu/RT-DETR) | Object Detection | [demo](./examples/object-detection) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [RT-DETRv2](https://github.com/lyuwenyu/RT-DETR) | Object Detection | [demo](./examples/object-detection) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [RT-DETRv4](https://github.com/RT-DETRs/RT-DETRv4) | Object Detection | [demo](./examples/object-detection) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [RF-DETR](https://github.com/roboflow/rf-detr) | Object Detection | [demo](./examples/object-detection)| βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.8/configs/picodet) | Object Detection | [demo](./examples/ocr) | ❌ | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [D-FINE](https://github.com/manhbd-22022602/D-FINE) | Object Detection | [demo](./examples/object-detection) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | ❌ | +| [DEIM](https://github.com/ShihuaHuang95/DEIM) | Object Detection | [demo](./examples/object-detection) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | ❌ | +| [DEIMv2](https://github.com/Intellindust-AI-Lab/DEIMv2) | Object Detection | [demo](./examples/object-detection) | βœ… | ❓ |βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | + +
+ +
+🎨 Image Segmentation + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [SAM](https://github.com/facebookresearch/segment-anything) | Segment Anything | [demo](./examples/image-segmentation) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [SAM-HQ](https://github.com/SysCV/sam-hq) | Segment Anything | [demo](./examples/image-segmentation) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [MobileSAM](https://github.com/ChaoningZhang/MobileSAM) | Segment Anything | [demo](./examples/image-segmentation) | βœ… |❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [EdgeSAM](https://github.com/chongzhou96/EdgeSAM) | Segment Anything | [demo](./examples/image-segmentation) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [YOLOE-v8/11-Prompt-Free](https://github.com/THU-MIG/yoloe) | Open-Set Detection And Segmentation | [demo](./examples/image-segmentation/yoloe_prompt_free) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [YOLOE-26-Prompt-Free](https://github.com/ultralytics/ultralytics) | Open-Set Detection And Segmentation | [demo](./examples/image-segmentation/yoloe_prompt_free) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | Instance Segmentation | [demo](./examples/image-segmentation) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [SAM2](https://github.com/facebookresearch/segment-anything-2) | Segment Anything | [demo](./examples/image-segmentation) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [SAM3-Tracker](https://github.com/facebookresearch/segment-anything-3) | Segment Anything | [demo](./examples/image-segmentation) | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | +| [BiRefNet - COD](https://github.com/ZhengPeng7/BiRefNet) | Camouflaged Object Detection | [demo](./examples/birefnet) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [BiRefNet - DIS](https://github.com/ZhengPeng7/BiRefNet) | Dichotomous Image Segmentation | [demo](./examples/birefnet) | βœ… | ❓ | 
βœ… | βœ… | βœ… | βœ… | βœ… | +| [BiRefNet - HRSOD](https://github.com/ZhengPeng7/BiRefNet) | High-Resolution Salient Object Detection | [demo](./examples/birefnet) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [BiRefNet - Massive](https://github.com/ZhengPeng7/BiRefNet) | Multi-Dataset Robust Segmentation | [demo](./examples/birefnet) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | + + +
+ +
+✨ Background Removal
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [RMBG](https://huggingface.co/briaai/RMBG-2.0) | Image Segmentation<br />Background Removal | [demo](./examples/background-removal) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BEN2](https://huggingface.co/PramaLLC/BEN2) | Image Segmentation<br />Background Removal | [demo](./examples/background-removal) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+
+
+ +
+👀 Gaze Estimation
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [MobileGaze](https://github.com/yakhyo/gaze-estimation) | Eye Gaze Estimation | [demo](./examples/pose-estimation) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ +
+✂️ Image Matting & Portrait Segmentation
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [MODNet](https://github.com/ZHKKKe/MODNet) | Image Matting | [demo](./examples/image-matting) | ✅ | ❓ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [MediaPipe Selfie](https://ai.google.dev/edge/mediapipe/solutions/vision/image_segmenter) | Image Segmentation | [demo](./examples/image-matting) | ✅ | ❓ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [BiRefNet - Portrait](https://github.com/ZhengPeng7/BiRefNet) | Portrait Background Removal | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BiRefNet - Matting](https://github.com/ZhengPeng7/BiRefNet) | Portrait Matting & Background Removal | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BiRefNet - HR Matting](https://github.com/ZhengPeng7/BiRefNet) | High-Resolution Portrait Matting | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BiRefNet - General](https://github.com/ZhengPeng7/BiRefNet) | General Purpose Segmentation | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BiRefNet - HR General](https://github.com/ZhengPeng7/BiRefNet) | High-Resolution General Segmentation | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BiRefNet - Lite General](https://github.com/ZhengPeng7/BiRefNet) | Lightweight General Segmentation (2K) | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [BiRefNet - General Tiny](https://github.com/ZhengPeng7/BiRefNet) | Lightweight General Segmentation with Swin-V1-Tiny | [demo](./examples/birefnet) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ + + +
+🗺️ Open-Set Detection & Segmentation
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) | Open-Set Detection With Language | [demo](./examples/open-set-detection) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [MM-GDINO](https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/README.md) | Open-Set Detection With Language | [demo](./examples/open-set-detection) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [LLMDet](https://github.com/iSEE-Laboratory/LLMDet) | Open-Set Detection With Language | [demo](./examples/open-set-detection) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [OWLv2](https://huggingface.co/google/owlv2-base-patch16-ensemble) | Open-Set Object Detection | [demo](./examples/open-set-detection) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [YOLO-World](https://github.com/AILab-CVC/YOLO-World) | Open-Set Detection With Language | [demo](./examples/yolo) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [YOLOE-Prompt-Based](https://github.com/THU-MIG/yoloe) | Open-Set Detection And Segmentation | [demo](./examples/open-set-segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [YOLOE-26-Prompt-Based](https://github.com/ultralytics/ultralytics) | Open-Set Detection And Segmentation | [demo](./examples/open-set-segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [SAM3-Image](https://github.com/facebookresearch/segment-anything-3) | Open-Set Detection And Segmentation | [demo](./examples/open-set-segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ + +
+🏃 Multi-Object Tracking
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [ByteTrack](https://github.com/FoundationVision/ByteTrack) | Multi-Object Tracking | [demo](./examples/mot) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+
+
+ + +
+💎 Image Super-Resolution
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [Swin2SR](https://github.com/mv-lab/swin2sr) | Image Restoration | [demo](./examples/super-resolution) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [APISR](https://github.com/Kiteretsu77/APISR) | Anime Super-Resolution | [demo](./examples/super-resolution) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ + +
+🀸 Pose Estimation + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [RTMPose](https://github.com/open-mmlab/mmpose/tree/dev-1.x/projects/rtmpose) | Keypoint Detection | [demo](./examples/pose-estimation) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [DWPose](https://github.com/IDEA-Research/DWPose) | Keypoint Detection | [demo](./examples/pose-estimation) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [RTMW](https://arxiv.org/abs/2407.08634) | Keypoint Detection | [demo](./examples/pose-estimation) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) | Keypoint Detection | [demo](./examples/pose-estimation) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | ❌ | + +
+ +
+🔍 OCR & Document Understanding
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [DB (PaddleOCR det v4 / v5)](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [FAST](https://github.com/czczup/FAST) | Text Detection | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [LinkNet](https://arxiv.org/abs/1707.03718) | Text Detection | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [SVTR (PaddleOCR rec v4 / v5)](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [TrOCR](https://huggingface.co/microsoft/trocr-base-printed) | Text Recognition | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [SLANet (PaddleOCR tab v4 / v5)](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/table_recognition/algorithm_table_slanet.html) | Table Recognition | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO) | Object Detection | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [PP-DocLayout-v1-Plus-L](https://huggingface.co/PaddlePaddle/PP-DocLayout_plus-L) | Object Detection | [demo](./examples/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| [PP-DocLayout-v2](https://huggingface.co/PaddlePaddle/PP-DocLayoutV2) | Object Detection | [demo](./examples/ocr) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [PP-DocLayout-v3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) | Object Detection | [demo](./examples/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ +
+🧩 Vision-Language Models (VLM) + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [BLIP](https://github.com/salesforce/BLIP) | Image Captioning | [demo](./examples/vlm) | βœ… | ❓ | βœ… |❓ | ❌ | ❌ | ❌ | +| [Florence2](https://arxiv.org/abs/2311.06242) | A Variety of Vision Tasks | [demo](./examples/vlm) | βœ… | ❓ | βœ… |βœ… | ❌ | ❌ | ❌ | +| [Moondream2](https://github.com/vikhyat/moondream/tree/main) | Open-Set Object Detection
Open-Set Keypoints Detection
Image Captioning
Visual Question Answering | [demo](./examples/vlm) | βœ… | ❓ | ❌ | ❌ |βœ… | βœ… | ❌ | +| [SmolVLM](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct) | Visual Question Answering | [demo](./examples/vlm) | βœ… | ❓| βœ… | ❓ | ❓ | ❓ | ❓ | +| [SmolVLM2](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct) | Visual Question Answering | [demo](./examples/vlm) | βœ… | ❓| βœ… | ❓ | ❓ | ❓ | ❓ | +| [FastVLM](https://github.com/apple/ml-fastvlm) | Vision Language Models | [demo](./examples/vlm) | βœ… | ❓ | βœ… | βœ…|βœ… | βœ… | βœ… | + +
+ + +
+🧬 Embedding Model + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [CLIP](https://github.com/openai/CLIP) | Vision-Language Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1) | Vision-Language Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2) | Vision-Language Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [mobileclip](https://github.com/apple/ml-mobileclip) | Vision-Language Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [SigLIP](https://huggingface.co/collections/google/siglip) | Vision-Language Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [SigLIPv2](https://huggingface.co/collections/google/siglip2) | Vision-Language Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [DINOv2](https://github.com/facebookresearch/dinov2) | Vision Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | +| [DINOv3](https://github.com/facebookresearch/dinov3) | Vision Embedding | [demo](./examples/embedding) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +
+ +
+📐 Depth Estimation
+
+| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| [DepthAnything v1](https://github.com/LiheYoung/Depth-Anything) | Monocular Depth Estimation | [demo](./examples/depth-estimation) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [DepthAnything v2](https://github.com/LiheYoung/Depth-Anything) | Monocular Depth Estimation | [demo](./examples/depth-estimation) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [DepthPro](https://github.com/apple/ml-depth-pro) | Monocular Depth Estimation | [demo](./examples/depth-estimation) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [Depth-Anything-3](https://github.com/ByteDance-Seed/Depth-Anything-3) | Monocular<br />Metric<br />Multi-View | [demo](./examples/depth-estimation) | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+
+ + +
+🌌 Others + +| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 | +| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| [Sapiens](https://github.com/facebookresearch/sapiens/tree/main) | Foundation for Human Vision Models | [demo](./examples/sapiens) | βœ… | ❓ | βœ… | βœ… | βœ… | βœ… | βœ… | +| [YOLOPv2](https://arxiv.org/abs/2208.11434) | Panoptic Driving | [demo](./examples/image-segmentation) | βœ… | ❓ | βœ… | ❌ | ❌ | ❌ | ❌ | + +
+
 ## 🤝 Contributing
 
 This is a personal project maintained in spare time, so progress on performance optimization and new model support may vary.
diff --git a/docs/model-zoo/ocr.md b/docs/model-zoo/ocr.md
index 324cb6b..a3c9a3d 100644
--- a/docs/model-zoo/ocr.md
+++ b/docs/model-zoo/ocr.md
@@ -9,12 +9,12 @@ hide:
 | Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
 | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
-| [DB](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [DB (PaddleOCR det v4 / v5)](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
 | [FAST](https://github.com/czczup/FAST) | Text Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
 | [LinkNet](https://arxiv.org/abs/1707.03718) | Text Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
-| [SVTR](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [SVTR (PaddleOCR rec v4 / v5)](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
 | [TrOCR](https://huggingface.co/microsoft/trocr-base-printed) | Text Recognition | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
-| [SLANet](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/table_recognition/algorithm_table_slanet.html) | Table Recognition | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| [SLANet (PaddleOCR tab v4 / v5)](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/table_recognition/algorithm_table_slanet.html) | Table Recognition | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
 | [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO) | Object Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ❓ | ✅ | ✅ | ✅ | ❌ | ❌ |
 | [PP-DocLayout-v1-Plus-L](https://huggingface.co/PaddlePaddle/PP-DocLayout_plus-L) | Object Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [PP-DocLayout-v2](https://huggingface.co/PaddlePaddle/PP-DocLayoutV2) | Object Detection | [demo](https://github.com/jamjamjon/usls/tree/main/examples/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |