We study how feature design affects binary floor-vs-non-floor segmentation when using a single classifier (SVM) on a COCO-annotated dataset. The experiments move from minimal features (raw color) to richer representations (spatial layout, region summaries, texture), so we can see what each layer of structure contributes.
Rather than jumping to deep learning, we wanted to see how far classical CV + SVM can go when we systematically add structure to the input: first color only, then color + position, then region-level color, and finally texture (HOG). The same pipeline and dataset are used across experiments so comparisons are fair.
We use the Floor Segmentation dataset (Roboflow):
- Roboflow: Floor Segmentation (universe.roboflow.com)
- Google Drive (short): Drive link
Details:
- Format: COCO segmentation
- Labels: Binary — floor (1) vs non-floor (0), with pixel-level masks
- Content: Indoor scenes, mixed lighting and textures
Download and point the notebooks to the train split (paths are set in each notebook).
We run four strategies, each in its own notebook. They share the same train/test protocol and evaluation; only the feature extraction and unit of prediction change.
Pixels are sampled at random; each sample is just (R, G, B). No position or context.
- Features: (R, G, B)
- Unit: Pixel
- Classifier: SVM
Result: weak and noisy; color alone is not enough and is sensitive to lighting.
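A minimal sketch of Strategy A, assuming scikit-learn; the random image and bottom-half mask here are toy stand-ins for a real image and its COCO-derived floor mask, and the sample count and SVM settings are illustrative, not the notebook's exact values:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins for one image and its binary floor mask
# (in the notebooks these come from the COCO annotations).
H, W = 64, 64
image = rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8)
mask = np.zeros((H, W), dtype=np.uint8)
mask[H // 2:, :] = 1  # pretend the bottom half is floor

# Sample pixels at random; each sample is just (R, G, B), no position
n_samples = 2000
ys = rng.integers(0, H, size=n_samples)
xs = rng.integers(0, W, size=n_samples)
X = image[ys, xs].astype(np.float32) / 255.0  # shape (n_samples, 3)
y = mask[ys, xs]

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
```

Prediction is then run per pixel, which is what makes the output noisy: nearby pixels with similar colors but different labels get confused.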
Same pixel sampling, but we append normalized (x, y) coordinates so the model can use the pixel's position in the image.
- Features: (R, G, B, x, y)
- Unit: Pixel
- Classifier: SVM
Result: large gain over A; spatial layout is very informative (e.g. the floor tends to appear near the bottom of the frame). Still no texture.
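The feature change from A to B is small: append the pixel's normalized coordinates to its color. A sketch, again on toy data, with the [0, 1] normalization an assumption on our part (it keeps position on the same scale as the normalized color channels):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64
image = rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8)

n_samples = 2000
ys = rng.integers(0, H, size=n_samples)
xs = rng.integers(0, W, size=n_samples)

rgb = image[ys, xs].astype(np.float32) / 255.0
# Normalize coordinates to [0, 1] so position and color share a scale
xy = np.stack([xs / (W - 1), ys / (H - 1)], axis=1).astype(np.float32)
X = np.concatenate([rgb, xy], axis=1)  # shape (n_samples, 5)
```

The SVM training step is unchanged from Strategy A; only the feature matrix gains two columns.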
The image is tiled into fixed-size blocks; each block is one sample. Features are the mean RGB and mean (x, y) over the block.
- Features: (mean R, mean G, mean B, mean x, mean y)
- Unit: Region (e.g. 32×32)
- Classifier: SVM
Result: best trade-off in our setup — stable, good accuracy, blocky but clean boundaries. Less noise than per-pixel.
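The tiling step can be sketched as follows; the 32×32 block size matches the example above, while the majority-vote labeling rule and the use of the block center as its (x, y) are assumptions about reasonable choices, not confirmed notebook details:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, B = 256, 256, 32  # image size and block size (32x32 as above)
image = rng.integers(0, 256, size=(H, W, 3)).astype(np.float32) / 255.0
mask = np.zeros((H, W), dtype=np.uint8)
mask[H // 2:, :] = 1  # toy mask: bottom half is floor

feats, labels = [], []
for by in range(0, H, B):
    for bx in range(0, W, B):
        block = image[by:by + B, bx:bx + B]
        mean_rgb = block.reshape(-1, 3).mean(axis=0)
        # Block center, normalized to [0, 1]
        cx, cy = (bx + B / 2) / W, (by + B / 2) / H
        feats.append(np.concatenate([mean_rgb, [cx, cy]]))
        # Majority vote over the block's pixels gives the block label
        labels.append(int(mask[by:by + B, bx:bx + B].mean() >= 0.5))

X = np.array(feats, dtype=np.float32)  # (64, 5) for a 256x256 image
y = np.array(labels)
```

Averaging over blocks is also why the boundaries look blocky: each 32×32 tile gets a single label, so the prediction map has block-level resolution.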
Patches are described by HOG; no color. We use two data regimes: small and large.
- Features: HOG descriptor
- Unit: Patch
- Classifier: SVM
- Small data: overfits (high training accuracy, low test accuracy).
- Large data: generalizes well and can reach strong recall.
HOG is discriminative but needs enough data and does not use color.
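A minimal HOG extraction sketch, assuming scikit-image; the patch size and HOG parameters here are common defaults, not necessarily the notebook's exact settings:

```python
import numpy as np
from skimage.feature import hog

rng = np.random.default_rng(0)

# Toy 32x32 grayscale patch; in the notebooks patches are cut from real images
patch = rng.random((32, 32))

# HOG ignores color: only local gradient orientations are encoded
desc = hog(patch, orientations=9,
           pixels_per_cell=(8, 8), cells_per_block=(2, 2))
```

With these parameters a 32×32 patch yields a 324-dimensional descriptor (3×3 block positions × 2×2 cells × 9 orientation bins), which is far higher-dimensional than the 3-to-5-dimensional features of Strategies A–C; that dimensionality is one reason the small-data regime overfits.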
| Strategy | Features | Test accuracy (approx.) | Notes |
|---|---|---|---|
| A | RGB | ~0.66 | Poor, noisy |
| B | RGB + (x,y) | ~0.86 | Good |
| C | Region mean RGB + (x,y) | ~0.91 | Very good |
| D (small) | HOG | ~0.70 | Overfitting |
| D (large) | HOG | ~0.95 | Strong |
- Low C: wider margin, more tolerance for errors, simpler boundary, less overfitting (can underfit).
- High C: fits training data tightly, complex boundary, more overfitting.
In our runs: the pixel-level strategies (A, B) worked best with moderate C; the region-based strategy (C) was less sensitive to the choice of C; HOG (D) overfit badly with high C on small data.
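The effect of C can be seen on a toy problem; this is an illustration of the regularization trade-off, not a reproduction of our runs (data and values of C are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Noisy 2-D toy data: two overlapping Gaussian blobs
X = np.concatenate([rng.normal(-1.0, 1.2, size=(200, 2)),
                    rng.normal(1.0, 1.2, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

# Training accuracy rises with C as the boundary bends to fit the data;
# held-out accuracy would typically peak at some moderate C instead.
scores = {C: SVC(kernel="rbf", C=C).fit(X, y).score(X, y)
          for C in (0.01, 1.0, 100.0)}
```

In practice a grid search over C with cross-validation (e.g. scikit-learn's `GridSearchCV`) is the standard way to pick this value.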
- Linear: Fast and stable; worked well for B and C (color + layout, region).
- RBF: More expressive; helped D (HOG) when data was sufficient, but overfit on small HOG sets.
- Polynomial: No clear benefit here; slower and less stable, so we did not rely on it.
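Comparing kernels under an identical train/test split can be sketched like this, assuming scikit-learn; the 5-D Gaussian toy features stand in for our real descriptors, so the scores it prints illustrate the comparison procedure rather than our reported numbers:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy 5-D features standing in for the real descriptors
X = np.concatenate([rng.normal(-1.0, 1.0, size=(300, 5)),
                    rng.normal(1.0, 1.0, size=(300, 5))])
y = np.array([0] * 300 + [1] * 300)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      random_state=0)

# Same split, same C: only the kernel changes
results = {k: SVC(kernel=k, C=1.0).fit(Xtr, ytr).score(Xte, yte)
           for k in ("linear", "rbf", "poly")}
```

Holding the split and C fixed is what makes the kernel comparison meaningful; changing several knobs at once would confound the results.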
- Raw RGB alone is not sufficient for reliable floor segmentation.
- Adding (x, y) brings a big improvement; layout matters a lot.
- Region-level (tiled) color + layout gave the best balance of robustness and simplicity in our setting.
- HOG works well with enough data but is data-hungry and ignores color.
- Careful feature design matters as much as the choice of classifier.
├── approach1_pixel_rgb.ipynb → Strategy A (color-only)
├── approach2_pixel_rgb_xy.ipynb → Strategy B (color + layout)
├── approach3_region_rgb_xy.ipynb → Strategy C (tiled color + layout)
├── approach4_hog_small_dataset.ipynb
├── approach4_hog_large_dataset.ipynb → Strategy D (HOG, two data sizes)
└── README.md
- Hybrid HOG + color features.
- Handling class imbalance (e.g. sampling or weighting).
- Comparison with a small deep segmentation model.
- Cross-dataset evaluation.