Distributed-File-Storage-System is a high-performance, decentralized file storage infrastructure engineered in Go. It solves the "Large File" problem in peer-to-peer networks by combining Content-Addressable Storage (CAS), Custom TCP Transport, and Streaming Cryptography into a single, cohesive engine.
Designed for scalability and resilience, this system eliminates the need for central authority while providing industrial-grade data integrity and security—mirroring the architectural principles found in large-scale distributed systems like Cloud Object Storage or BitTorrent.
- Scalability: Designed to handle arbitrarily large files without memory spikes.
- Resilience: A decentralized "Shared Nothing" architecture where any node can fail without data loss.
- Observability: Consistent logging and clear separation of transport vs. storage concerns.
- Reliability: Self-correcting stream logic with proper synchronization.
Our design prioritizes Zero-Buffer I/O and Stateless Discovery. Unlike centralized storage, every node in this network acts as both an indexer and a provider.
graph TD
subgraph "External API / CLI"
A[User Request] -->|Upload/Download| B[FileServer Orchestrator]
end
subgraph "Internal Engine (Node)"
B --> C{Orchestrator}
C -->|Stream| D[Crypto Layer: AES-CTR]
C -->|Hash & Path| E[CAS Store Engine]
C -->|Broadcast| F[P2P Transport: TCP]
end
subgraph "Distributed Network"
F <-->|"Custom Binary Protocol"| G[Peer Cluster]
end
D -->|Encrypted Chunk| E
E -->|"O(1) Write"| J[(Local Storage: Sharded Hierarchy)]
.
├── bin/ # Compiled binaries
├── docs/ # Technical specifications & design docs
├── p2p/ # Peer-to-peer networking logic (TCP, Encoding)
├── recovered_files/ # Default directory for downloaded/recovered files
├── test_files/ # Sample files for cluster simulation
├── crypto.go # Streaming AES-256 CTR implementation
├── main.go # CLI entry point & cluster orchestrator
├── server.go # FileServer core logic (Upload/Download coordination)
├── store.go # Content-Addressable Storage (CAS) engine
├── Makefile # Automation for building and running nodes
└── go.mod # Go module definition
Traditional storage uses file names. This system uses Mathematical Identity.
- The Process: Files are hashed using SHA1. The hash is the pointer.
- The Value:
- Self-Verifying Integrity: The data address is a cryptographic proof of its content.
- Global Deduplication: Identical data across the network occupies exactly one address.
- Scalability: Uses a Sharded Directory Hierarchy (e.g., `/af32d/12c3b/...`) to prevent directory "hotspotting" and filesystem degradation.
Instead of standard HTTP, we implemented a custom, lightweight TCP protocol designed for Large Object Transfers.
- Stateful Decoding: Our `DefaultDecoder` peeks at the wire using a `Switch-Case` strategy, allowing it to transition seamlessly between GOB-encoded metadata and raw binary streams without resetting connections.
- Multiplexed Logic: Pause/Resume mechanics allow nodes to process control messages while data streams in the background.
A core requirement for production systems is that memory usage must not scale with file size.
- Streaming Pipeline: Uses Go's `io.Reader` and `io.Writer` interfaces throughout. Data moves through the crypto engine and out to the disk in small chunks (32KB buffers).
- Constant Memory: Whether you store a 10MB photo or a 100GB 8K video, the node's memory footprint remains nearly constant.
Data integrity and privacy are baked into the transport layer.
- AES-256 CTR: Padding-free, parallelizable encryption that ensures files are never stored or transmitted in plain text.
- Deterministic Integrity: SHA1 hashes serve as both the file address and a tamper-evident seal.
- IV Management: Every file operation uses a unique 16-byte cryptographically secure IV prepended to the data stream.
For in-depth analysis of the system internals, please refer to the following guides:
- 📖 System Architecture Documentation — Understanding the node lifecycle and CAS engine.
- 📡 Wire Protocol Specification — Decoding the custom TLV TCP framing.
- 🔐 Crypto & Security Deep Dive — Details on streaming AES-CTR and integrity checks.
- Write Complexity: $O(1)$ directory lookup via sharded CAS pathing.
- Read Latency: Minimized through parallel discovery broadcast.
- Replication: Fully decentralized replication across all interconnected nodes.
git clone https://github.com/ibesuperv/Distributed-File-Storage-System
cd Distributed-File-Storage-System
go build -o bin/dfs

# Start nodes and upload a sample file
make run ARGS="-u test_files/audio.mpeg"
# Download and verify integrity
make run ARGS="-d audio.mpeg"