
Phase 2: Distributed Training Correctness #3

Merged
ALJainProjects merged 2 commits into main from phase2/distributed-training-fixes on Feb 9, 2026
Conversation

@ALJainProjects
Owner

Summary

  • Replace the biased custom shuffle with a proper Fisher-Yates shuffle using std::mt19937 in DistributedSampler (first sketch below)
  • Upgrade HashBasedSharding::hash_index() from weak FNV-1a to splitmix64 for uniform shard distribution across GPUs (second sketch below)
  • Fix checkpoint resume for large shuffled datasets: regenerate the shuffle from seed+epoch instead of storing the full permutation (third sketch below)
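
A minimal sketch of the shuffle fix, assuming the sampler derives its RNG state from seed+epoch; the helper name `make_epoch_indices` and the seeding scheme are illustrative, not the PR's exact DistributedSampler API:

```cpp
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sketch: unbiased Fisher-Yates shuffle seeded per epoch.
// `make_epoch_indices` is a hypothetical helper; seed+epoch
// combination shown here is an assumption.
std::vector<uint64_t> make_epoch_indices(uint64_t num_samples,
                                         uint64_t seed, uint64_t epoch) {
    std::vector<uint64_t> indices(num_samples);
    std::iota(indices.begin(), indices.end(), uint64_t{0});

    std::mt19937 rng(static_cast<uint32_t>(seed + epoch));
    // Classic Fisher-Yates: swap position i-1 with a uniformly chosen
    // position in [0, i-1]. Every permutation is equally likely, unlike
    // ad-hoc swap loops that bias toward certain orderings.
    for (uint64_t i = num_samples; i > 1; --i) {
        std::uniform_int_distribution<uint64_t> dist(0, i - 1);
        std::swap(indices[i - 1], indices[dist(rng)]);
    }
    return indices;
}
```

Because the permutation is a pure function of (seed, epoch), every rank can regenerate the identical order independently; nothing has to be stored or broadcast.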
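For the sharding change, splitmix64's finalizer avalanches all 64 input bits, so consecutive dataset indices spread evenly across ranks; byte-oriented FNV-1a mixes small integer keys poorly by comparison. A sketch, where `shard_for_index` is a hypothetical wrapper standing in for HashBasedSharding::hash_index():

```cpp
#include <cstdint>

// splitmix64 finalizer with the standard constants.
uint64_t splitmix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical wrapper mirroring the hash-based shard assignment.
uint32_t shard_for_index(uint64_t index, uint32_t world_size) {
    return static_cast<uint32_t>(splitmix64(index) % world_size);
}
```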
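And a sketch of the checkpoint-resume idea, reusing `make_epoch_indices` from the first sketch; the struct and field names are illustrative, not the PR's actual serialization format:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Persist only what is needed to reproduce the epoch's order.
struct SamplerCheckpoint {
    uint64_t seed;
    uint64_t epoch;
    uint64_t cursor;  // samples already consumed in this epoch
};

// Rebuild the remaining order instead of loading a stored permutation.
std::vector<uint64_t> resume(const SamplerCheckpoint& ckpt,
                             uint64_t num_samples) {
    // Deterministic regeneration: same seed+epoch yields the same
    // permutation, so dropping the first `cursor` entries resumes
    // exactly where training stopped, with O(1) checkpoint size.
    auto order = make_epoch_indices(num_samples, ckpt.seed, ckpt.epoch);
    order.erase(order.begin(),
                order.begin() + static_cast<std::ptrdiff_t>(ckpt.cursor));
    return order;
}
```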

Test plan

  • Added hash balance test: 10000 samples across 8 ranks, each rank's count within ±5% of the expected share (sketch after this list)
  • Added Fisher-Yates permutation test: no duplicates in shuffled output
  • Added epoch determinism tests: same seed+epoch → same order, different epochs → different order
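
A sketch of what the balance check could look like, reusing `shard_for_index` from the sharding sketch above; the test-harness names are hypothetical and the PR's tests may use a different framework:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical balance check mirroring the PR's stated tolerance:
// 10000 indices over 8 ranks, each rank within ±5% of the expected
// 1250 samples.
void check_hash_balance() {
    constexpr uint64_t kSamples = 10000;
    constexpr uint32_t kRanks = 8;
    std::vector<uint64_t> counts(kRanks, 0);
    for (uint64_t i = 0; i < kSamples; ++i) {
        ++counts[shard_for_index(i, kRanks)];
    }
    const double expected = static_cast<double>(kSamples) / kRanks;
    for (uint64_t c : counts) {
        assert(std::abs(static_cast<double>(c) - expected)
               <= 0.05 * expected);
    }
}
```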

…resume

- Replace biased shuffle with proper Fisher-Yates using std::mt19937
- Upgrade hash_index() from weak FNV-1a to splitmix64 for uniform shard distribution
- Fix checkpoint resume for large shuffled datasets via seed+epoch regeneration
- Add tests for hash balance, shuffle permutation correctness, and epoch determinism
The shuffle tests reference turboloader::distributed::DistributedConfig
and DistributedSampler, which live in distributed_dataloader.hpp, not
sharding_strategies.hpp.
@ALJainProjects ALJainProjects merged commit 491d46a into main Feb 9, 2026
7 checks passed
@ALJainProjects ALJainProjects deleted the phase2/distributed-training-fixes branch February 9, 2026 00:51