This project implements an image similarity search system using the Bag of Features model with:
- SIFT features
- k-means visual vocabulary
- TF-IDF weighted histograms
- Cosine similarity
- Backend: FastAPI
- Frontend: React/Vite
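The last two steps of the pipeline (TF-IDF vectors compared by cosine similarity) reduce to a dot product once the vectors are L2-normalized. A minimal numpy sketch with made-up vectors:

```python
import numpy as np

# Two made-up TF-IDF histograms (not real image data)
a = np.array([0.2, 0.0, 0.5, 0.3])
b = np.array([0.1, 0.4, 0.4, 0.1])

# L2-normalize so that the dot product equals cosine similarity
a = a / np.linalg.norm(a)
b = b / np.linalg.norm(b)

similarity = float(np.dot(a, b))
print(similarity)
```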
```
VMM/
│
├── backend/
├── frontend/
│
├── data/
│   ├── descriptors.pkl
│   ├── kmeans_model.pkl
│   ├── idf.npy
│   └── bow_vectors.pkl
│
├── images/
│
├── src/
│   ├── extract_sift.py
│   ├── build_dictionary.py
│   ├── compute_bow.py
│   └── search.py
│
├── uploads/
├── requirements.txt
└── README.md
```
```python
sift = cv2.SIFT_create(nfeatures=2000)
kp, des = sift.detectAndCompute(img, None)
descriptors[filename] = des
```

- At most 2000 keypoints are extracted per image.
- Each keypoint becomes a 128-dimensional descriptor.
- Output: matrix `des` of shape (N, 128).
- Descriptors are saved to `data/descriptors.pkl`.
Instead of using all descriptors (which may number in the millions), the code samples only 200 random descriptors per image:

```python
# Guard against images with fewer than 200 descriptors
n = min(200, des.shape[0])
idx = np.random.choice(des.shape[0], n, replace=False)
sample = des[idx]
```

The samples from all images are stacked and used to train the vocabulary:

```python
all_samples = np.vstack(all_samples)
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=2000, verbose=1)
kmeans.fit(all_samples)
```

The vocabulary is saved as `data/kmeans_model.pkl`.
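The vocabulary step can be demonstrated end to end on synthetic data; here random 128-D vectors stand in for sampled SIFT descriptors, and a tiny K is used for illustration (the project uses a much larger K):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Stand-in for sampled SIFT descriptors: 1000 random 128-D vectors
all_samples = rng.random((1000, 128)).astype(np.float32)

K = 10  # toy vocabulary size
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=200, random_state=0)
kmeans.fit(all_samples)

# One cluster center (visual word) per cluster, each 128-D like a descriptor
print(kmeans.cluster_centers_.shape)  # (10, 128)
```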
Training k-means and quantizing descriptors:

```python
kmeans = MiniBatchKMeans(n_clusters=700, batch_size=2000)
kmeans.fit(all_samples)
labels = kmeans.predict(des)
```

This transforms each descriptor into an integer cluster index in [0 … K-1].
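A small sketch of the quantization step, using a toy vocabulary fit on random stand-in data, shows that every descriptor maps to an index in [0, K-1]:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(1)
K = 5
# Fit a toy vocabulary, then quantize "descriptors" against it
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=100, random_state=0)
kmeans.fit(rng.random((200, 128)).astype(np.float32))

des = rng.random((50, 128)).astype(np.float32)  # stand-in for one image's descriptors
labels = kmeans.predict(des)
print(labels.min(), labels.max())  # both within [0, K-1]
```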
```python
hist, _ = np.histogram(labels, bins=np.arange(K + 1))
hist = hist.astype(float)
hist /= hist.sum()
```

This normalization ensures all histograms are comparable across images.
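With toy cluster labels for one image, the histogram step looks like this (made-up labels, K = 5):

```python
import numpy as np

K = 5
# Toy cluster labels for one image's descriptors
labels = np.array([0, 2, 2, 4, 1, 2, 0])

# Count occurrences of each visual word, then normalize to frequencies
hist, _ = np.histogram(labels, bins=np.arange(K + 1))
hist = hist.astype(float)
hist /= hist.sum()

print(hist)  # word 2 is the most frequent; frequencies sum to 1
```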
For each visual word, the document frequency `df` counts how many images contain it:

```python
df += (hist > 0).astype(int)
```

The IDF weights are then computed with smoothing:

```python
idf = np.log((N + 1) / (df + 1))
```

Each histogram is weighted by the IDF and L2-normalized:

```python
tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)
```

The final vectors are saved to `data/bow_vectors.pkl`.
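A worked numerical sketch of the weighting, with made-up counts (3 images, 4 visual words): a word present in every image gets IDF 0 and is suppressed, while rarer words are boosted, and the result is a unit vector.

```python
import numpy as np

N = 3                          # toy number of images
df = np.array([3, 1, 2, 0])    # toy document frequencies per visual word
idf = np.log((N + 1) / (df + 1))

hist = np.array([0.5, 0.25, 0.25, 0.0])  # a made-up normalized histogram
tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)

print(np.linalg.norm(tfidf))  # unit length after L2 normalization
```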
Given a query image, the function `search_similar_image()` performs the entire retrieval pipeline:

```python
des_query = extract_sift_from_image(query_image)
labels = kmeans.predict(des_query)
hist, _ = np.histogram(labels, bins=np.arange(K + 1))
hist = hist.astype(float)  # cast before in-place division
hist /= hist.sum()
tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)
```

This line performs the actual similarity computation:

```python
score = np.dot(tfidf, bow)
```

Because all vectors are L2-normalized, the dot product equals cosine similarity. Results are sorted in descending order of score:

```python
results.sort(key=lambda x: -x[1])
```

This ranking is based on TF-IDF cosine similarity.
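The scoring-and-ranking loop can be sketched with a hypothetical two-image database (made-up filenames and 3-D vectors instead of real BoW vectors):

```python
import numpy as np

def l2(v):
    return v / np.linalg.norm(v)

query = l2(np.array([1.0, 0.0, 1.0]))
# Hypothetical database of pre-normalized BoW vectors
db = {
    "a.jpg": l2(np.array([1.0, 0.1, 0.9])),
    "b.jpg": l2(np.array([0.0, 1.0, 0.0])),
}

# Dot product of unit vectors == cosine similarity; sort descending
results = [(name, float(np.dot(query, bow))) for name, bow in db.items()]
results.sort(key=lambda x: -x[1])
print(results[0][0])  # a.jpg is the closest match
```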
Handles uploads, calls the search function, and returns similarity results:

```python
@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_path, "wb") as f:
        f.write(await file.read())
    result = search_similar_image(file_path)
    return JSONResponse({"uploaded_file": file_path, "result": result})
```

The frontend lets the user upload an image and displays the similar images returned by the backend.
Install dependencies:

```bash
pip install -r requirements.txt
```

Build the index:

```bash
python src/extract_sift.py
python src/build_dictionary.py
python src/compute_bow.py
```

Run the backend:

```bash
uvicorn backend.main:app --reload
```

Run the frontend:

```bash
npm install
npm run dev
```