varosick/VMM
Bag of Features

This project implements an image similarity search system using the Bag of Features model with:

  • SIFT features
  • k-means visual vocabulary
  • TF-IDF weighted histograms
  • Cosine similarity

Backend: FastAPI
Frontend: React/Vite


Project Structure

VMM/
│
├── backend/
├── frontend/
│
├── data/
│   ├── descriptors.pkl
│   ├── kmeans_model.pkl
│   ├── idf.npy
│   └── bow_vectors.pkl
│
├── images/          
│
├── src/
│   ├── extract_sift.py
│   ├── build_dictionary.py
│   ├── compute_bow.py
│   └── search.py
│
├── uploads/
├── requirements.txt
└── README.md

Algorithm Overview

1. Extracting SIFT Descriptors

Each image in images/ is converted into SIFT descriptors using:

sift = cv2.SIFT_create(nfeatures=2000)
kp, des = sift.detectAndCompute(img, None)

descriptors[filename] = des

This means:

  • At most 2000 keypoints are extracted.
  • Each keypoint becomes a 128-dimensional descriptor.
  • Output: matrix des of shape (N, 128).
  • Descriptors saved to data/descriptors.pkl.
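Taken together, the extraction step produces a dict mapping each filename to its (N, 128) descriptor matrix, pickled to data/descriptors.pkl. A minimal sketch of that structure, with synthetic arrays standing in for real SIFT output (the filenames and sizes here are illustrative, not from the repo):

```python
import io
import pickle

import numpy as np

# Synthetic stand-ins for SIFT output: each image yields an (N, 128) float32 matrix.
rng = np.random.default_rng(0)
descriptors = {
    "cat.jpg": rng.random((150, 128), dtype=np.float32),
    "dog.jpg": rng.random((80, 128), dtype=np.float32),
}

# Round-trip through pickle, the same serialization data/descriptors.pkl uses.
buf = io.BytesIO()
pickle.dump(descriptors, buf)
buf.seek(0)
restored = pickle.load(buf)
print(restored["cat.jpg"].shape)  # (150, 128)
```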

2. Building Visual Vocabulary (k-means)

Instead of using all descriptors (which may be millions), the code samples at most 200 random descriptors per image. A min() guard is needed for images with fewer than 200 keypoints, where sampling with replace=False would otherwise raise an error:

n_samples = min(200, des.shape[0])
idx = np.random.choice(des.shape[0], n_samples, replace=False)
sample = des[idx]

All samples are concatenated:

all_samples = np.vstack(all_samples)

Then k-means with K = 700 clusters is trained:

kmeans = MiniBatchKMeans(n_clusters=K, batch_size=2000, verbose=1)
kmeans.fit(all_samples)

Each centroid of k-means becomes a visual word.
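The clustering step can be sketched end-to-end on synthetic data; a smaller K than the project's 700 keeps the example fast, and the random descriptors are stand-ins for real sampled SIFT vectors:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic 128-D vectors in place of the concatenated SIFT samples.
rng = np.random.default_rng(42)
all_samples = rng.random((5000, 128)).astype(np.float32)

K = 50  # the project uses K = 700; reduced here for speed
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=2000, random_state=0)
kmeans.fit(all_samples)

# Each of the K centroids is one visual word; predict() quantizes descriptors
# to integer word indices in [0, K-1].
print(kmeans.cluster_centers_.shape)  # (50, 128)
labels = kmeans.predict(all_samples[:10])
```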

The trained vocabulary is saved to data/kmeans_model.pkl.

3. Constructing TF-IDF BoW

Quantizing descriptors into visual words:

labels = kmeans.predict(des)

This transforms each descriptor into an integer cluster index [0 … K-1].

Building the histogram:

hist, _ = np.histogram(labels, bins=np.arange(K + 1))

L1 normalization:

hist = hist.astype(float)
hist /= hist.sum()

This ensures all histograms are comparable across images.

Computing IDF:

idf = np.log((N + 1) / (df + 1))

Where df (the document frequency: for each visual word, the number of images whose histogram contains it) is accumulated per image using:

df += (hist > 0).astype(int)

TF-IDF + L2 normalization

tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)

The final vectors are saved to: data/bow_vectors.pkl
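The operations of step 3 compose into a short pipeline. A self-contained numeric sketch over a tiny made-up vocabulary (K = 5 and the label arrays are invented for illustration):

```python
import numpy as np

K = 5  # tiny vocabulary for illustration; the project uses K = 700
# Cluster labels for two "images", as kmeans.predict(des) would return them.
labels_per_image = [np.array([0, 0, 1, 2]), np.array([1, 1, 3, 3, 3])]

N = len(labels_per_image)
df = np.zeros(K)
hists = []
for labels in labels_per_image:
    hist, _ = np.histogram(labels, bins=np.arange(K + 1))
    hist = hist.astype(float)
    hist /= hist.sum()              # L1 normalization: term frequencies
    df += (hist > 0).astype(int)    # document frequency per visual word
    hists.append(hist)

idf = np.log((N + 1) / (df + 1))    # smoothed IDF; 0 for words in every image

bow_vectors = []
for hist in hists:
    tfidf = hist * idf
    tfidf = tfidf / np.linalg.norm(tfidf)  # L2 normalization
    bow_vectors.append(tfidf)

print(round(np.linalg.norm(bow_vectors[0]), 6))  # 1.0
```

Note that word 1, which occurs in both images, gets an IDF of exactly 0 and so contributes nothing to the final vectors; that is the intended down-weighting of uninformative visual words.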

4. Searching for Similar Images

Given a query image, the function search_similar_image() performs the entire retrieval pipeline.

Extract SIFT from the query:

des_query = extract_sift_from_image(query_image)

Quantize the query descriptors:

labels = kmeans.predict(des_query)

Build the query histogram:

hist, _ = np.histogram(labels, bins=np.arange(K + 1))

L1 normalization:

hist = hist.astype(float)
hist /= hist.sum()

Apply TF-IDF:

tfidf = hist * idf

L2 normalization:

tfidf = tfidf / np.linalg.norm(tfidf)

Cosine similarity with database

This line performs the actual similarity computation:

score = np.dot(tfidf, bow)

Because all vectors are L2-normalized, the dot product is exactly the cosine similarity.

Sort results

results.sort(key=lambda x: -x[1])

This ranking is based on TF-IDF cosine similarity.
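The whole retrieval loop reduces to one dot product per stored image. A sketch with random unit vectors standing in for the pickled database (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def l2norm(v):
    return v / np.linalg.norm(v)

# L2-normalized TF-IDF vectors, standing in for data/bow_vectors.pkl.
database = {f"img_{i}.jpg": l2norm(rng.random(700)) for i in range(5)}
query = l2norm(rng.random(700))

# Because every vector has unit length, the dot product IS the cosine similarity.
results = [(name, float(np.dot(query, bow))) for name, bow in database.items()]
results.sort(key=lambda x: -x[1])  # best match first
print(results[0][1] >= results[-1][1])  # True
```

Since both sides already have unit length, no score normalization is needed after the dot product, which is why the L2 step at index-build time matters.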

Backend (FastAPI)

Handles uploads, calls the search function, and returns the similarity results.

@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)

    with open(file_path, "wb") as f:
        f.write(await file.read())

    result = search_similar_image(file_path)

    return JSONResponse({"uploaded_file": file_path, "result": result})

Frontend (React)

Lets the user upload an image and displays the similar images returned by the backend.


Installation

pip install -r requirements.txt

Build index:

python src/extract_sift.py
python src/build_dictionary.py
python src/compute_bow.py

Run backend:

uvicorn backend.main:app --reload

Run frontend:

npm install
npm run dev
