This project implements an image similarity search system using the Bag of Features model with:
- SIFT features
- k-means visual vocabulary
- TF-IDF weighted histograms
- Cosine similarity
- Backend: FastAPI
- Frontend: React/Vite
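The last two steps of the pipeline (TF-IDF vectors compared by cosine similarity) reduce to a dot product once the vectors are L2-normalized. A minimal numpy sketch with made-up vectors:

```python
import numpy as np

# Two made-up TF-IDF histograms (not real image data)
a = np.array([0.2, 0.0, 0.5, 0.3])
b = np.array([0.1, 0.4, 0.4, 0.1])

# L2-normalize so that the dot product equals cosine similarity
a = a / np.linalg.norm(a)
b = b / np.linalg.norm(b)

similarity = float(np.dot(a, b))
print(similarity)
```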
```
VMM/
│
├── backend/
├── frontend/
│
├── data/
│   ├── descriptors.pkl
│   ├── kmeans_model.pkl
│   ├── idf.npy
│   └── bow_vectors.pkl
│
├── images/
│
├── src/
│   ├── extract_sift.py
│   ├── build_dictionary.py
│   ├── compute_bow.py
│   └── search.py
│
├── uploads/
├── requirements.txt
└── README.md
```
```python
sift = cv2.SIFT_create(nfeatures=2000)
kp, des = sift.detectAndCompute(img, None)
descriptors[filename] = des
```

- At most 2000 keypoints are extracted per image.
- Each keypoint becomes a 128-dimensional descriptor.
- Output: matrix `des` of shape (N, 128).
- Descriptors are saved to `data/descriptors.pkl`.
Instead of using all descriptors (which may number in the millions), the code samples only 200 random descriptors per image:

```python
# Guard against images with fewer than 200 descriptors
n = min(200, des.shape[0])
idx = np.random.choice(des.shape[0], n, replace=False)
sample = des[idx]
```

The samples from all images are stacked and used to train the vocabulary:

```python
all_samples = np.vstack(all_samples)
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=2000, verbose=1)
kmeans.fit(all_samples)
```

The vocabulary is saved as `data/kmeans_model.pkl`.
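The vocabulary step can be demonstrated end to end on synthetic data; here random 128-D vectors stand in for sampled SIFT descriptors, and a tiny K is used for illustration (the project uses a much larger K):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Stand-in for sampled SIFT descriptors: 1000 random 128-D vectors
all_samples = rng.random((1000, 128)).astype(np.float32)

K = 10  # toy vocabulary size
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=200, random_state=0)
kmeans.fit(all_samples)

# One cluster center (visual word) per cluster, each 128-D like a descriptor
print(kmeans.cluster_centers_.shape)  # (10, 128)
```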
Training k-means and quantizing descriptors:

```python
kmeans = MiniBatchKMeans(n_clusters=700, batch_size=2000)
kmeans.fit(all_samples)
labels = kmeans.predict(des)
```

This transforms each descriptor into an integer cluster index in [0 … K-1].
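A small sketch of the quantization step, using a toy vocabulary fit on random stand-in data, shows that every descriptor maps to an index in [0, K-1]:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(1)
K = 5
# Fit a toy vocabulary, then quantize "descriptors" against it
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=100, random_state=0)
kmeans.fit(rng.random((200, 128)).astype(np.float32))

des = rng.random((50, 128)).astype(np.float32)  # stand-in for one image's descriptors
labels = kmeans.predict(des)
print(labels.min(), labels.max())  # both within [0, K-1]
```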
```python
hist, _ = np.histogram(labels, bins=np.arange(K + 1))
hist = hist.astype(float)
hist /= hist.sum()
```

This normalization ensures all histograms are comparable across images.
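With toy cluster labels for one image, the histogram step looks like this (made-up labels, K = 5):

```python
import numpy as np

K = 5
# Toy cluster labels for one image's descriptors
labels = np.array([0, 2, 2, 4, 1, 2, 0])

# Count occurrences of each visual word, then normalize to frequencies
hist, _ = np.histogram(labels, bins=np.arange(K + 1))
hist = hist.astype(float)
hist /= hist.sum()

print(hist)  # word 2 is the most frequent; frequencies sum to 1
```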
For each visual word, the document frequency `df` counts how many images contain it:

```python
df += (hist > 0).astype(int)
```

The IDF weights are then computed with smoothing:

```python
idf = np.log((N + 1) / (df + 1))
```

Each histogram is weighted by the IDF and L2-normalized:

```python
tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)
```

The final vectors are saved to `data/bow_vectors.pkl`.
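A worked numerical sketch of the weighting, with made-up counts (3 images, 4 visual words): a word present in every image gets IDF 0 and is suppressed, while rarer words are boosted, and the result is a unit vector.

```python
import numpy as np

N = 3                          # toy number of images
df = np.array([3, 1, 2, 0])    # toy document frequencies per visual word
idf = np.log((N + 1) / (df + 1))

hist = np.array([0.5, 0.25, 0.25, 0.0])  # a made-up normalized histogram
tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)

print(np.linalg.norm(tfidf))  # unit length after L2 normalization
```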
Given a query image, the function `search_similar_image()` performs the entire retrieval pipeline:

```python
des_query = extract_sift_from_image(query_image)
labels = kmeans.predict(des_query)
hist, _ = np.histogram(labels, bins=np.arange(K + 1))
hist = hist.astype(float)  # cast before in-place division
hist /= hist.sum()
tfidf = hist * idf
tfidf = tfidf / np.linalg.norm(tfidf)
```

This line performs the actual similarity computation:

```python
score = np.dot(tfidf, bow)
```

Because all vectors are L2-normalized, the dot product equals cosine similarity. Results are sorted in descending order of score:

```python
results.sort(key=lambda x: -x[1])
```

This ranking is based on TF-IDF cosine similarity.
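The scoring-and-ranking loop can be sketched with a hypothetical two-image database (made-up filenames and 3-D vectors instead of real BoW vectors):

```python
import numpy as np

def l2(v):
    return v / np.linalg.norm(v)

query = l2(np.array([1.0, 0.0, 1.0]))
# Hypothetical database of pre-normalized BoW vectors
db = {
    "a.jpg": l2(np.array([1.0, 0.1, 0.9])),
    "b.jpg": l2(np.array([0.0, 1.0, 0.0])),
}

# Dot product of unit vectors == cosine similarity; sort descending
results = [(name, float(np.dot(query, bow))) for name, bow in db.items()]
results.sort(key=lambda x: -x[1])
print(results[0][0])  # a.jpg is the closest match
```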
Handles uploads, calls the search function, and returns similarity results:

```python
@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_path, "wb") as f:
        f.write(await file.read())
    result = search_similar_image(file_path)
    return JSONResponse({"uploaded_file": file_path, "result": result})
```

The frontend lets the user upload an image and displays the similar images returned by the backend.
Install dependencies:

```bash
pip install -r requirements.txt
```

Build the index:

```bash
python src/extract_sift.py
python src/build_dictionary.py
python src/compute_bow.py
```

Run the backend:

```bash
uvicorn backend.main:app --reload
```

Run the frontend:

```bash
npm install
npm run dev
```