# Windows compatibility issues with installation and embedding pipeline

## Summary

Five issues prevent the project from installing and running the embedding pipeline cleanly on Windows. Tested on Windows 11 with Python 3.12 and CUDA 12.8 (Quadro P2000).

## Issues

### 1. DataLoader crash: unpicklable lambda in `ImageFolderDataset`
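Windows uses `spawn` (not `fork`) for multiprocessing, so the dataset object must be pickled and sent to each DataLoader worker process. `ImageFolderDataset._init_uuid_generator()` assigns lambdas to `self.generate_uuid`, and lambdas cannot be pickled, so loading fails whenever `num_workers > 0`.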

```
AttributeError: Can't get local object 'ImageFolderDataset._init_uuid_generator.<locals>.<lambda>'
```

**Fix:** Either set `num_workers=0` on Windows in `embedding_service.py` (a workaround that gives up parallel data loading), or replace the lambdas in `hpc-inference` with a regular method, which pickles cleanly.
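
A minimal sketch of both options, for illustration only. `LambdaDataset` and `MethodDataset` are hypothetical stand-ins, not the real `ImageFolderDataset`, and `safe_num_workers` is an assumed helper name that does not exist in either repository. The sketch reproduces the pickling failure with the standard library alone, without PyTorch.

```python
import pickle
import sys
import uuid


class LambdaDataset:
    """Hypothetical stand-in for the failing pattern: a lambda stored on the instance."""

    def __init__(self):
        self.generate_uuid = lambda: str(uuid.uuid4())


class MethodDataset:
    """Same behavior, but generate_uuid is a regular method, so no lambda is stored."""

    def generate_uuid(self):
        return str(uuid.uuid4())


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, which is what spawn requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def safe_num_workers(requested, platform=sys.platform):
    """Workaround option: force single-process loading on Windows."""
    return 0 if platform == "win32" else requested


if __name__ == "__main__":
    print("lambda attribute picklable:", is_picklable(LambdaDataset()))
    print("regular method picklable:", is_picklable(MethodDataset()))
    print("num_workers on this platform:", safe_num_workers(4))
```

When run as a script, the method-based class should pickle while the lambda-based one does not. The `num_workers=0` route is the smaller change but keeps loading single-process on Windows; replacing the lambdas fixes the root cause for every `spawn` platform.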

### 2. `gpu-cu12` / `gpu-cu13` extras fail on Windows
cuML (RAPIDS) only ships Linux wheels. `uv pip install -e ".[gpu-cu12]"` fails with `RuntimeError: Didn't find wheel for cuml-cu12`.

**Fix:** Document that GPU extras (cuML) are Linux-only. Embedding still uses CUDA via PyTorch on Windows.
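
To make the Linux-only constraint visible at runtime, a check along the following lines could be used. This is a sketch under assumptions: the helper names `cuml_expected` and `cuml_installed` are hypothetical and not part of the project.

```python
import importlib.util
import sys


def cuml_expected(platform=sys.platform):
    """cuML wheels are only published for Linux, so only expect it there."""
    return platform.startswith("linux")


def cuml_installed():
    """Check for the cuml package without importing it."""
    return importlib.util.find_spec("cuml") is not None


if __name__ == "__main__":
    if not cuml_expected():
        print("cuML extras are Linux-only; skipping on", sys.platform)
    elif not cuml_installed():
        print("cuML is not installed; install the gpu-cu12 or gpu-cu13 extra")
    else:
        print("cuML is available")
```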

### 3. PyTorch installs as CPU-only by default
`uv pip install -e .` pulls the CPU-only `torch` from PyPI on Windows. Users must manually reinstall from PyTorch's CUDA index:
```bash
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```

**Fix:** Add Windows-specific CUDA PyTorch install instructions to README.
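
The README instructions could also include a quick way to confirm which build is installed. The sketch below is one possible check; the function name `describe_torch_build` is hypothetical, while `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()` are existing PyTorch attributes.

```python
def describe_torch_build():
    """Report whether the installed torch is a CPU-only or CUDA build."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    # torch.version.cuda is None for CPU-only builds.
    if torch.version.cuda is None:
        return f"torch {torch.__version__} is a CPU-only build"

    # A CUDA build can still fail to see a GPU (driver or device issues).
    if not torch.cuda.is_available():
        return (
            f"torch {torch.__version__} was built for CUDA {torch.version.cuda}, "
            "but no usable GPU was detected"
        )

    return (
        f"torch {torch.__version__} with CUDA {torch.version.cuda} "
        f"on {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(describe_torch_build())
```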

### 4. Missing `pynvml` dependency
Embedding fails at runtime with `No module named 'pynvml'`.

**Fix:** Add `pynvml` or `nvidia-ml-py` to the project dependencies. Both provide the `pynvml` module; `nvidia-ml-py` is NVIDIA's official package.
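
Until the dependency is declared, the missing module can be detected up front rather than partway through embedding. The following is a sketch, not the project's code: `count_nvml_gpus` is a hypothetical helper, and it relies only on the `nvmlInit`, `nvmlDeviceGetCount`, `nvmlShutdown`, and `NVMLError` names from the `pynvml` module.

```python
def count_nvml_gpus():
    """Return the number of GPUs NVML reports, or None if NVML is unusable."""
    try:
        import pynvml
    except ImportError:
        # The situation described above: the module is not installed.
        return None

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # Module present, but no driver or NVML library was found.
        return None

    try:
        return pynvml.nvmlDeviceGetCount()
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    count = count_nvml_gpus()
    if count is None:
        print("pynvml is missing or NVML could not be initialized")
    else:
        print(f"NVML reports {count} GPU(s)")
```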

### 5. README venv activation command is Linux-only
`source .venv/bin/activate` fails on Windows because `.venv/bin/activate` does not exist there. The activation script is at `.venv\Scripts\activate`.

**Fix:** Add Windows activation instructions to README.
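
The difference between the two layouts can be summarized in a few lines. This is an illustrative sketch only; `activation_command` is a hypothetical helper, and it assumes the default `.venv` directory name used in the README.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath


def activation_command(venv=".venv", platform=sys.platform):
    """Return the shell command that activates a virtual environment."""
    if platform == "win32":
        # Windows venvs place their scripts under Scripts\ rather than bin/.
        return str(PureWindowsPath(venv) / "Scripts" / "activate")
    return "source " + str(PurePosixPath(venv) / "bin" / "activate")


if __name__ == "__main__":
    print(activation_command())
```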

