FaizarM/bitcoin-forecasting

Bitcoin Hourly Price Forecasting — Multi-Horizon Seq2Seq LSTM

An end-to-end multi-horizon time series forecasting model that predicts Bitcoin's hourly closing price for the next 24 hours. It is built in TensorFlow from low-level primitives, including a custom Multi-Head Attention layer, a custom training loop with tf.GradientTape, and a Seq2Seq LSTM architecture trained with teacher forcing.

Completed as the final project of Dicoding's Advanced Deep Learning Project Development certification (April 2026).


Problem Statement

Cryptocurrency price prediction is a challenging multivariate time series problem characterized by high volatility, non-stationary patterns, and complex dependencies between technical indicators. This project tackles multi-horizon forecasting (predicting 24 consecutive future values), which is harder than single-step prediction because uncertainty grows with the horizon and, when predictions are fed back as inputs, errors compound from one step to the next.

The goal: build a Seq2Seq model that outperforms a standard LSTM + Attention baseline on this task, and implement key components from scratch to understand the network's internals rather than relying on high-level abstractions.


Dataset

  • Size: 53,150 hourly records
  • Features (6):
    • Close — target variable (Bitcoin closing price)
    • Volume USDT — trading volume in USDT
    • RSI — Relative Strength Index
    • MACD_Hist — MACD histogram
    • ATR — Average True Range (volatility)
    • KAMAO — Kaufman's Adaptive Moving Average
  • Target: predict next 24 hours of Close values
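The README does not say how the indicators were computed or with which parameters. As a reference point, here is a minimal sketch of the textbook RSI with Wilder's smoothing; the 14-period window and seeding with a simple average are assumptions, not the project's settings, and library implementations may handle the warm-up period differently.

```python
def wilder_rsi(closes, period=14):
    """RSI with Wilder's smoothing. Returns one value per close; None during warm-up."""
    rsi = [None] * len(closes)
    if len(closes) <= period:
        return rsi
    # Per-step price changes split into non-negative gains and losses.
    changes = [cur - prev for prev, cur in zip(closes[:-1], closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0.0:
            return 100.0  # no losses in the window: RSI saturates at 100
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed with a simple average of the first `period` changes (assumption).
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    rsi[period] = to_rsi(avg_gain, avg_loss)
    # Wilder's smoothing: new_avg = (prev_avg * (period - 1) + current) / period.
    for i in range(period, len(changes)):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        rsi[i + 1] = to_rsi(avg_gain, avg_loss)
    return rsi
```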

Architecture

Baseline: LSTM + Multi-Head Attention

Input (window_size, 6) 
  → LSTM(128, return_sequences=True) 
  → CustomDropout(0.2) 
  → CustomMultiHeadAttention(d_model=128, heads=4) 
  → CustomLayerNorm 
  → LSTM(64) 
  → CustomDropout(0.2) 
  → CustomDense(64, relu) 
  → CustomDense(24)  [24-hour forecast]
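CustomMultiHeadAttention in both diagrams is built around scaled dot-product attention, softmax(QKᵀ/√d_k)·V, computed independently for each head. The pure-Python sketch below illustrates only that core. To keep it short it treats the learned Q/K/V and output projections as identity maps and simply splits d_model into equal slices per head, whereas the project's layer is written in TensorFlow and learns those projections. The function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """q: (Tq, d_k), k: (Tk, d_k), v: (Tk, d_v) -> (output (Tq, d_v), weights (Tq, Tk))."""
    scale = math.sqrt(len(q[0]))
    # Score every query against every key, scaled by sqrt(d_k).
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    # Each output row is the attention-weighted sum of the value rows.
    output = [
        [sum(w * vj[c] for w, vj in zip(w_row, v)) for c in range(len(v[0]))]
        for w_row in weights
    ]
    return output, weights


def multi_head_attention(queries, keys_values, num_heads):
    """Split d_model into num_heads equal slices, attend per head, concatenate.

    Learned Q/K/V and output projections are omitted (identity) for brevity.
    For cross-attention, `queries` come from the decoder and `keys_values`
    from the encoder outputs.
    """
    d_model = len(queries[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    depth = d_model // num_heads
    per_head = []
    for h in range(num_heads):
        cols = slice(h * depth, (h + 1) * depth)
        q = [row[cols] for row in queries]
        kv = [row[cols] for row in keys_values]
        out, _ = scaled_dot_product_attention(q, kv, kv)
        per_head.append(out)
    # Concatenate head outputs along the feature axis for each query position.
    return [[x for head in per_head for x in head[t]] for t in range(len(queries))]
```

Dividing by √d_k keeps the dot products from growing with the head size, which would otherwise push the softmax toward near one-hot weights.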

Proposed: Seq2Seq LSTM (Encoder-Decoder with Teacher Forcing)

ENCODER:
  Input → LSTM(128) → CustomMultiHeadAttention → CustomLayerNorm → CustomDropout
         └─ output encoder states + context

DECODER (teacher-forced in training, autoregressive at inference):
  For each of 24 timesteps:
    LSTMCell(input + prev_output) 
      → CustomMultiHeadAttention (cross-attention with encoder)
      → CustomLayerNorm 
      → CustomDense(1)  [predict next hour]

During training, the decoder uses teacher forcing: the ground-truth value for the previous hour is fed in as the next input. At test time it runs autoregressively, feeding its own previous prediction back in as the next input.
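The two decoding modes differ only in what is fed back at each step. The plain-Python sketch below uses a hypothetical step_fn(prev_value, state) as a stand-in for the LSTMCell, cross-attention, LayerNorm and Dense(1) stack; it illustrates the control flow and is not the notebook's code.

```python
def run_decoder(step_fn, last_observed, state, horizon, targets=None):
    """Unroll a one-value-per-step decoder for `horizon` steps.

    step_fn(prev_value, state) -> (prediction, new_state) is a hypothetical
    stand-in for the decoder cell stack. With `targets` (training), teacher
    forcing feeds the ground truth for step t as the input to step t + 1.
    Without `targets` (inference), the model's own prediction is fed back.
    """
    if targets is not None and len(targets) < horizon:
        raise ValueError("teacher forcing needs one target per horizon step")
    predictions = []
    prev = last_observed
    for t in range(horizon):
        pred, state = step_fn(prev, state)
        predictions.append(pred)
        prev = targets[t] if targets is not None else pred
    return predictions


def biased_step(prev_value, state):
    """Toy 'model': predicts prev + 1.1 while the true hourly drift is +1.0."""
    return prev_value + 1.1, state
```

With true values [1.0, 2.0, 3.0, 4.0] and a last observation of 0.0, the toy model's per-step error stays at about 0.1 under teacher forcing but grows to about 0.1, 0.2, 0.3, 0.4 when it feeds back its own predictions. That gap is the error compounding mentioned in the problem statement, and it is why evaluation has to use the autoregressive mode the model faces in practice.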


Custom Components (Built From Scratch)

To deepen architectural understanding beyond high-level Keras abstractions, the following were implemented using the TensorFlow low-level API:

| Component | Purpose |
| --- | --- |
| CustomDense | Linear transformation with manual weight/bias initialization |
| CustomMultiHeadAttention | Scaled dot-product attention with multi-head parallelism |
| CustomDropout | Stochastic regularization with training/inference modes |
| CustomLayerNorm | Per-feature normalization with learnable scale and shift |
| custom_mae_loss | Horizon-weighted MAE (later timesteps weighted higher) |
| CustomEarlyStopping | Halts training when validation loss stops improving |
| CustomReduceLROnPlateau | Reduces the learning rate when validation loss plateaus |
| Custom Training Loop | Manual tf.GradientTape forward/backward pass instead of model.fit() |
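CustomEarlyStopping and CustomReduceLROnPlateau both watch the validation loss for a plateau. A minimal sketch of that shared bookkeeping, as it might be called once per epoch from a custom training loop, is below. The class name, parameter names and defaults are assumptions for illustration, not the values used in the notebook.

```python
import math


class PlateauMonitor:
    """Early stopping + reduce-LR-on-plateau bookkeeping (hypothetical class)."""

    def __init__(self, lr, stop_patience=10, lr_patience=5, factor=0.5,
                 min_lr=1e-6, min_delta=0.0):
        self.lr = lr
        self.stop_patience = stop_patience
        self.lr_patience = lr_patience
        self.factor = factor
        self.min_lr = min_lr
        self.min_delta = min_delta
        self.best = math.inf
        self.best_epoch = None
        self.improved = False
        self._epochs_since_best = 0
        self._epochs_since_lr_change = 0

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        self.improved = val_loss < self.best - self.min_delta
        if self.improved:
            # New best: the caller would checkpoint the weights at this point.
            self.best = val_loss
            self.best_epoch = epoch
            self._epochs_since_best = 0
            self._epochs_since_lr_change = 0
            return False
        self._epochs_since_best += 1
        self._epochs_since_lr_change += 1
        if self._epochs_since_lr_change >= self.lr_patience:
            # Plateau: shrink the learning rate, but never below min_lr.
            self.lr = max(self.lr * self.factor, self.min_lr)
            self._epochs_since_lr_change = 0
        return self._epochs_since_best >= self.stop_patience
```

After each update, the training loop would copy monitor.lr into the optimizer's learning rate and, when monitor.improved is true, save the current weights (the role played by the "best by val loss" model file in the repository).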

Methodology Highlights

  • Split-before-normalize: train/val/test split performed before fitting the MinMaxScaler to prevent data leakage; the scaler is fit only on the training split.
  • ACF/PACF analysis: autocorrelation and partial autocorrelation plots used to choose the window size (input sequence length) from the data rather than arbitrarily.
  • Time series decomposition: STL decomposition applied to identify trend and seasonality components in the Close price.
  • tf.data pipeline: production-style tf.data.Dataset with windowing, batching, and prefetching instead of raw numpy arrays.
  • Feature engineering: rolling statistics and selected technical indicators feeding into the model.
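The split-before-normalize and windowing steps can be sketched for a single series in plain Python. The 70/15/15 chronological split, the window and horizon lengths in the usage, the hand-rolled MinMaxScaler1D (standing in for scikit-learn's MinMaxScaler) and the helper names are all assumptions for illustration; the notebook works on all six features and builds its windows with tf.data.

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split in time order (no shuffling) so validation/test lie strictly after training."""
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return series[:n_train], series[n_train:n_train + n_val], series[n_train + n_val:]


class MinMaxScaler1D:
    """Minimal single-column stand-in for scikit-learn's MinMaxScaler."""

    def fit(self, values):
        # Statistics come from the training split only, which prevents leakage.
        self.lo, self.hi = min(values), max(values)
        self.span = (self.hi - self.lo) or 1.0  # guard against a constant column
        return self

    def transform(self, values):
        # Val/test values may fall outside [0, 1]; that is expected, not a bug.
        return [(v - self.lo) / self.span for v in values]

    def inverse_transform(self, values):
        return [v * self.span + self.lo for v in values]


def make_windows(values, window_size, horizon):
    """Pair each input window with the `horizon` values that immediately follow it."""
    pairs = []
    for start in range(len(values) - window_size - horizon + 1):
        x = values[start:start + window_size]
        y = values[start + window_size:start + window_size + horizon]
        pairs.append((x, y))
    return pairs
```

Fitting the scaler on the full series instead would let the test period's minimum and maximum shape the training inputs, which is exactly the leakage the split-before-normalize step avoids.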

Results

Both models were evaluated on the held-out test set with MAE in scaled space (MinMax [0,1]):

| Model | Test MAE (scaled) | Change in MAE vs. baseline |
| --- | --- | --- |
| LSTM + Multi-Head Attention (baseline) | 0.0139 | (reference) |
| Seq2Seq LSTM (proposed) | 0.0052 | -62% |

Target: test MAE < 0.015 (scaled). Achieved: both models meet it, and the Seq2Seq model clears it by a wide margin.

The Seq2Seq model's forecasts track actual prices much more closely across the 24-hour horizon. A plausible explanation is that the encoder-decoder keeps context over the whole input sequence and generates the horizon one step at a time, while teacher forcing stabilizes training.

See the inference plots in the notebook for a visual comparison of predicted vs. actual prices.
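Because MinMax scaling is linear, the scaled MAE converts to price units with a single factor, and the relative improvement does not depend on the scaling at all. Assuming Close is scaled column-wise with x_min and x_max taken from the training split (the split-before-normalize setup described in the methodology), the relationship is:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\quad\Rightarrow\quad
\lvert \hat{y}' - y' \rvert = \frac{\lvert \hat{y} - y \rvert}{x_{\max} - x_{\min}}
\quad\Rightarrow\quad
\mathrm{MAE}_{\text{price}} = \mathrm{MAE}_{\text{scaled}} \, (x_{\max} - x_{\min})

\frac{\mathrm{MAE}_{\text{Seq2Seq}} - \mathrm{MAE}_{\text{baseline}}}{\mathrm{MAE}_{\text{baseline}}}
= \frac{0.0052 - 0.0139}{0.0139} \approx -0.626
```

The common factor (x_max − x_min) cancels in the ratio, so the relative change is the same in scaled and price units; with the rounded table values it comes to about -62.6%. The training range of Close is not given in this README, so a USD-denominated MAE cannot be computed from these figures alone.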


Repository Structure

.
├── Muhammad_Fariz_Abizar_Submission_Akhir_DLTM.ipynb  # Full pipeline notebook
├── model_baseline_LSTM.keras                           # Trained baseline model
├── model_seq2seq_LSTM.keras                            # Trained Seq2Seq model  
├── best_model_seq2seq_LSTM.keras                       # Best Seq2Seq (by val loss)
├── requirements.txt                                    # Python dependencies
└── README.md

How to Reproduce

1. Clone the repository

git clone https://github.com/FaizarM/bitcoin-forecasting.git
cd bitcoin-forecasting

2. Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Run the notebook

jupyter notebook Muhammad_Fariz_Abizar_Submission_Akhir_DLTM.ipynb

The dataset is loaded directly from a public Google Drive link inside the notebook — no manual download required.


Tech Stack

  • Deep Learning: TensorFlow 2.19, Keras
  • Data Manipulation: NumPy, pandas
  • Visualization: Matplotlib, Seaborn
  • Statistics: statsmodels (ACF/PACF, decomposition)
  • Preprocessing: scikit-learn (MinMaxScaler)
  • Environment: Python 3.10+, Jupyter Notebook

Key Learnings

  • Implementing Multi-Head Attention from scratch made the paper "Attention Is All You Need" tangible: it meant working through the Q/K/V projections and scaled dot-product attention rather than relying on library abstractions.
  • Custom training loops with tf.GradientTape expose what model.fit() does internally, giving fine-grained control over gradients, metrics, and callbacks.
  • Teacher forcing significantly stabilizes seq2seq training but requires careful inference-time switching to autoregressive mode.
  • A horizon-weighted loss pushes the model to focus more on the later (harder) timesteps of a multi-horizon forecast.
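The README only says that custom_mae_loss weights later timesteps more heavily. One simple way to do that, sketched below in plain Python, is a linear ramp of per-step weights rescaled to average 1, so the loss stays on the same scale as plain MAE. The ramp shape and the 1.0 to 2.0 range are assumptions, not the notebook's values. In TensorFlow the equivalent is tf.reduce_mean(weights * tf.abs(y_true - y_pred)), with a length-24 weights vector broadcast across the batch.

```python
def horizon_weights(horizon, min_weight=1.0, max_weight=2.0):
    """Linearly increasing per-step weights, rescaled so they average to 1."""
    if horizon == 1:
        return [1.0]
    raw = [min_weight + (max_weight - min_weight) * t / (horizon - 1) for t in range(horizon)]
    mean = sum(raw) / horizon
    return [w / mean for w in raw]


def horizon_weighted_mae(y_true, y_pred, weights):
    """Mean over samples and horizon steps of w_t * |y_true - y_pred|."""
    total, count = 0.0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for w, t, p in zip(weights, true_row, pred_row):
            total += w * abs(t - p)
            count += 1
    return total / count
```

With these weights, the same absolute error costs twice as much at hour 24 as at hour 1, while a uniform error across all 24 steps gives exactly the plain MAE.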

Author

Muhammad Fariz Abizar
Data Science undergraduate @ BINUS University Online Learning
Associate Data Scientist (BNSP Certified)


If you find this project useful or learned something from it, consider giving it a ⭐ — it helps and motivates future work!

About

Multi-horizon Bitcoin price forecasting using a Seq2Seq LSTM with custom Multi-Head Attention and a training loop built from scratch using the TensorFlow low-level API.
