Zennolab_tt

Approach

Used multi-modal zero-shot model CLIPSeg and converted mask to bounding boxes. Tried zero-shot detection models from Autodistill, all of them showed worse performance globally, but better in some categories.

Launch

git clone https://github.com/grk717/Zennolab_tt.git
cd Zennolab_tt
docker build -t zennolab .
docker run -t --name zenno --gpus all zennolab

Then wait for results in terminal.

Results

Inferenced with batch_size=16 on 3050Ti GPU. Inference took 16.13 minutes.

Category	Accuracy	Mean distance
squirrels_head	0.871	0.061
squirrels_tail	0.238	0.192
the_center_of_the_gemstone	0.884	0.047
the_center_of_the_koalas_nose	0.281	0.134
the_center_of_the_owls_head	0.952	0.036
the_center_of_the_seahorses_head	0.841	0.065
the_center_of_the_teddy_bear_nose	0.459	0.118
Global	0.648	0.089

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Dockerfile		Dockerfile
README.md		README.md
data_utils.py		data_utils.py
evaluator.py		evaluator.py
requirements.txt		requirements.txt
script.py		script.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Zennolab_tt

Approach

Launch

Results

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Zennolab_tt

Approach

Launch

Results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages