Distributed-File-Storage-System

Distributed-File-Storage-System is a high-performance, decentralized file storage infrastructure engineered in Go. It solves the "Large File" problem in peer-to-peer networks by combining Content-Addressable Storage (CAS), Custom TCP Transport, and Streaming Cryptography into a single, cohesive engine.

Designed for scalability and resilience, the system needs no central authority while still providing hash-verified data integrity and encrypted storage and transfer. It follows architectural principles found in large-scale distributed systems such as cloud object storage and BitTorrent.


✨ Key Capabilities

  • Scalability: Designed to handle arbitrarily large files without memory spikes.
  • Resilience: A decentralized "Shared Nothing" architecture where any single node can fail without data loss, provided the file has already been replicated to at least one other peer.
  • Observability: Consistent logging and clear separation of transport vs. storage concerns.
  • Reliability: Self-correcting stream logic with proper synchronization.

🏛️ System Architecture

Our design prioritizes Zero-Buffer I/O and Stateless Discovery. Unlike centralized storage, every node in this network acts as both an indexer and a provider.

graph TD
    subgraph "External API / CLI"
        A[User Request] -->|Upload/Download| B[FileServer Orchestrator]
    end

    subgraph "Internal Engine (Node)"
        B --> C{Orchestrator}
        C -->|Stream| D[Crypto Layer: AES-CTR]
        C -->|Hash & Path| E[CAS Store Engine]
        C -->|Broadcast| F[P2P Transport: TCP]
    end

    subgraph "Distributed Network"
        F <-->|"Custom Binary Protocol"| G[Peer Cluster]
    end

    D -->|Encrypted Chunk| E
    E -->|"O(1) Write"| J[(Local Storage: Sharded Hierarchy)]

📂 Project Structure

.
├── bin/                # Compiled binaries
├── docs/               # Technical specifications & design docs
├── p2p/                # Peer-to-peer networking logic (TCP, Encoding)
├── recovered_files/    # Default directory for downloaded/recovered files
├── test_files/         # Sample files for cluster simulation
├── crypto.go           # Streaming AES-256 CTR implementation
├── main.go             # CLI entry point & cluster orchestrator
├── server.go           # FileServer core logic (Upload/Download coordination)
├── store.go            # Content-Addressable Storage (CAS) engine
├── Makefile            # Automation for building and running nodes
└── go.mod              # Go module definition

🛠️ Technical Implementation

1. Content-Addressable Storage (CAS)

Traditional storage uses file names. This system uses Mathematical Identity.

  • The Process: Files are hashed with SHA-1, and the resulting digest serves as the file's address.
  • The Value:
    • Self-Verifying Integrity: The data address is a cryptographic proof of its content.
    • Global Deduplication: Identical data across the network occupies exactly one address.
  • Scalability: Uses a Sharded Directory Hierarchy (e.g., /af32d/12c3b/...) to prevent directory "hotspotting" and filesystem degradation.

2. High-Performance TCP Wire Protocol

Instead of standard HTTP, we implemented a custom, lightweight TCP protocol designed for Large Object Transfers.

  • Stateful Decoding: Our DefaultDecoder peeks at the incoming bytes and switches on what it finds, so it can move between GOB-encoded metadata and raw binary streams on the same connection without reconnecting.
  • Multiplexed Logic: Pause/Resume mechanics allow nodes to process control messages while data streams in the background.

3. Handling "Massive" Files ($O(1)$ Memory)

A core requirement for production systems is that memory usage must not scale with file size.

  • Streaming Pipeline: Uses Go's io.Reader and io.Writer interfaces throughout. Data moves through the crypto engine and out to the disk in small chunks (32KB buffers).
  • Constant Memory: Whether you store a 10MB photo or a 100GB 8K video, the node's memory footprint remains nearly constant.

4. Streaming Security

Data integrity and privacy are baked into the transport layer.

  • AES-256 CTR: Padding-free, parallelizable encryption that ensures files are never stored or transmitted in plain text.
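  • Deterministic Integrity: SHA-1 hashes serve as both the file address and an integrity check: rehashing a file and comparing the result with its address reveals corruption. SHA-1 has known collision attacks, so this guards against accidental corruption rather than a determined adversary.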
  • IV Management: Every file operation uses a unique 16-byte cryptographically secure IV prepended to the data stream.

📚 Technical Documentation

For in-depth analysis of the system internals, please refer to the following guides:


📈 Scalability & Performance Metrics

  • Write Complexity: $O(1)$ directory lookup via sharded CAS pathing.
  • Read Latency: Minimized through parallel discovery broadcast.
  • Replication: Fully decentralized replication across all interconnected nodes.

🚦 Getting Started

1. Installation

git clone https://github.com/ibesuperv/Distributed-File-Storage-System
cd Distributed-File-Storage-System
go build -o bin/dfs

2. Launch Local Cluster (Simulation)

# Start nodes and upload a sample file
make run ARGS="-u test_files/audio.mpeg"

# Download and verify integrity
make run ARGS="-d audio.mpeg"
