Description
Describe the bug
FOG’s Image Replicator reports that image replication has completed successfully, but the storage nodes end up with inconsistent, partially synced, or corrupted image files. Large Windows images (~190GB) frequently result in missing .img files, zero-byte files, or mismatched partition contents between nodes. No error is logged when this happens. As a result, we cannot trust replication and have had to replace it with manual rsync runs.
To Reproduce
Steps to reproduce the behavior:
- Capture or upload a large multi-partition Windows image (~150–200GB total).
- Allow FOG Replicator to sync the image from the master node to multiple storage nodes.
- After Replicator reports “complete,” compare replica directories on each node.
- Observe that some nodes have missing, zero-byte, or mismatched .img files.
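The comparison in the last two steps can be scripted. The following is a minimal sketch, not part of FOG: the helper names `file_sha256`, `build_manifest`, and `compare_manifests` are hypothetical, and it assumes both image directories are readable from the machine running it (for example, a manifest built on each node and collected in one place).

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large .img files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir):
    """Map each file's relative path to (size in bytes, SHA-256 hex digest)."""
    root = Path(image_dir)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, file_sha256(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(master, replica):
    """Classify every master file as missing, zero-byte, or mismatched on the replica."""
    report = {"missing": [], "zero_byte": [], "mismatched": []}
    for name, (size, digest) in sorted(master.items()):
        if name not in replica:
            report["missing"].append(name)
            continue
        r_size, r_digest = replica[name]
        if r_size == 0 and size > 0:
            report["zero_byte"].append(name)
        elif (r_size, r_digest) != (size, digest):
            report["mismatched"].append(name)
    return report
```

Calling `compare_manifests(build_manifest(master_dir), build_manifest(replica_dir))` returns three lists of relative paths. A replica is consistent with the master only when all three lists are empty.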
Expected behavior
Replication should produce byte-identical image copies across all storage nodes.
FOG should not report replication success unless:
- all files are fully copied,
- checksums match, and
- partial or corrupted transfers are detected and retried.
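To illustrate the behavior requested above, here is a minimal sketch of a copy that only reports success after a checksum match and retries otherwise. It is not FOG's implementation: `copy_verified` and `sha256_of` are hypothetical names, and a local `shutil.copyfile` stands in for whatever transfer mechanism the replicator actually uses.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst, max_attempts=3):
    """Copy src to dst; return the attempt number on which the copy verified.

    The data is written to a temporary ".partial" file first and only renamed
    into place once its digest matches the source, so a failed or partial
    transfer never appears under the final file name.
    """
    expected = sha256_of(src)
    tmp = str(dst) + ".partial"
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, tmp)
        if sha256_of(tmp) == expected:
            os.replace(tmp, dst)  # rename into place (atomic on one POSIX filesystem)
            return attempt
        os.remove(tmp)  # discard the bad copy before retrying
    raise OSError(f"{src}: checksum mismatch after {max_attempts} attempts")
```

The point of the design is that "complete" is only ever reported after verification, and that an unverified file is never left where a deployment could pick it up.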
Screenshots
N/A
Software (please complete the following information):
- FOG version: 1.5.10.1721
- FOS kernel: 6.12.35
- OS: Ubuntu 22.04 LTS (Azure Kernel 6.14.0-1014-azure)
Additional context
Environment uses 1 master + 3 storage nodes.
Some nodes receive correct files; others do not, in the same replication cycle.
There is no checksum-based verification in FOG replication.
Manual rsync with checksum (-c) reliably produces correct results:
rsync -avhc --inplace --progress /images/<IMAGE>/ fog@<node>:/images/<IMAGE>/
Because of this issue, we have to disable all storage nodes during deployments and perform all replication manually.
We can provide logs, sample image directories, and test runs to assist debugging.