Please star this repository if you find it useful! It means a lot to me.
Email marlon.dominguez307@gmail.com with implementation requests, contribution requests, or bug reports.
A high-performance, parallelized market data ingestion and normalization engine designed to download, parse, organize, and resample raw Dukascopy forex tick data into a clean ML-ready format.
This project focuses on data normalization infrastructure, not trading logic.
- Parallelized hourly tick downloads
- Automatic retry queue with exponential backoff
- Corrupted/empty response detection
- Structured parquet-based storage
- BI5 → Parquet normalization pipeline
- Multithreaded parsing and resampling
- ML-ready dataframe generation
- CLI and code-driven execution
- Hierarchical dataset organization
- Timeframe aggregation using Pandas resampling
- Detailed logging and ingestion diagnostics
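Of the features above, timeframe aggregation is the easiest to picture in a few lines. The following is a minimal sketch of Pandas resampling on parsed ticks, not the engine's actual code. The function name `resample_ticks`, the input column names, and the choice to build bars from the mid price are all assumptions made for illustration.

```python
import pandas as pd


def resample_ticks(ticks: pd.DataFrame, timeframe: str = "1min") -> pd.DataFrame:
    """Aggregate ticks into OHLC bars plus summed bid/ask volume.

    Assumes columns: timestamp, mid, bid_volume, ask_volume (hypothetical schema).
    """
    indexed = ticks.copy()
    indexed["timestamp"] = pd.to_datetime(indexed["timestamp"])
    indexed = indexed.set_index("timestamp").sort_index()

    # Series.resample(...).ohlc() yields open/high/low/close columns per bin.
    bars = indexed["mid"].resample(timeframe).ohlc()
    volumes = indexed[["bid_volume", "ask_volume"]].resample(timeframe).sum()

    # Drop bins that contained no ticks (their OHLC values are NaN).
    bars = bars.join(volumes).dropna(subset=["open"])
    return bars.reset_index()


example_ticks = pd.DataFrame(
    {
        "timestamp": [
            "2024-01-02 00:00:05",
            "2024-01-02 00:00:30",
            "2024-01-02 00:00:59",
            "2024-01-02 00:01:10",
        ],
        "mid": [1.1000, 1.1005, 1.0995, 1.1002],
        "bid_volume": [1.0, 2.0, 1.5, 0.5],
        "ask_volume": [0.5, 1.0, 1.0, 2.0],
    }
)
example_bars = resample_ticks(example_ticks, "1min")
```

Here the four example ticks fall into two one-minute bins, so `example_bars` has two rows with the columns `timestamp`, `open`, `high`, `low`, `close`, `bid_volume`, and `ask_volume`.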
The system is designed around a clean separation of concerns:
- Downloader → fetch raw .bi5 tick data
- Storage Layer → organize files by symbol/date/hour
- Parser → decode bid/ask/mid and normalize
- Resampler → produce a dataframe at the requested timeframe

The output is ready for feature extraction and ML.
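The Storage Layer convention of organizing files by symbol/date/hour can be sketched as a few path helpers. This is an illustration only: the helper names are hypothetical, and the file-name patterns are inferred from the example directory trees shown later in this README rather than taken from the engine's source.

```python
from datetime import datetime
from pathlib import PurePosixPath


def raw_tick_path(symbol: str, hour: datetime, root: str = "raw_data") -> PurePosixPath:
    """Path of the raw .bi5 file for one symbol and one hour."""
    day_dir = hour.strftime("%Y-%m-%d")
    file_name = f"{symbol}_{hour:%Y%m%d}_{hour:%H}h.bi5"
    return PurePosixPath(root) / symbol / day_dir / file_name


def parsed_tick_path(symbol: str, hour: datetime, root: str = "parsed_data") -> PurePosixPath:
    """Same layout as the raw file, but with a .parquet suffix."""
    return raw_tick_path(symbol, hour, root).with_suffix(".parquet")


def resampled_path(symbol: str, timeframe: str, day: datetime,
                   root: str = "resampled_data") -> PurePosixPath:
    """Resampled output is grouped by timeframe, one file per day."""
    return PurePosixPath(root) / symbol / timeframe / f"{day:%Y%m%d}.parquet"


first_hour = datetime(2024, 1, 2, 0)
example_raw = raw_tick_path("EURUSD", first_hour)
example_parsed = parsed_tick_path("EURUSD", first_hour)
example_resampled = resampled_path("EURUSD", "1min", first_hour)
```

Keeping path construction in one place means the downloader, parser, and resampler can agree on file locations without sharing any other state.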
Fetches raw .bi5 tick data directly from Dukascopy servers.
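The feature list mentions an automatic retry queue with exponential backoff and empty-response detection. A minimal sketch of that idea is shown below. The function `fetch_with_retry`, its parameters, and the decision to treat an empty payload as a failed attempt are assumptions for illustration and may differ from what the downloader actually does.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], bytes],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call `fetch` until it returns a non-empty payload or attempts run out.

    The wait doubles after each failure (1s, 2s, 4s, ...) and is capped at
    `max_delay`. Returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            payload = fetch()
            if payload:  # assumption: an empty body counts as a failure
                return payload
        except OSError:
            # Network-level errors (timeouts, resets) are retried.
            pass
        if attempt < max_attempts - 1:
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return None
```

The `sleep` argument is injectable so the backoff schedule can be checked without actually waiting. In a real downloader, `fetch` would wrap the HTTP request for a single hourly file.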
Decodes compressed binary tick data into normalized parquet datasets.
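To make the decoding step concrete, here is a hedged sketch of parsing one hourly file. It assumes the commonly documented BI5 layout: an LZMA-compressed stream of 20-byte big-endian records holding a millisecond offset, ask, bid, ask volume, and bid volume, with prices stored as integers scaled by a point size (1e-5 is assumed here). The README does not confirm that layout, and `decode_bi5` is a hypothetical name, not the parser's API.

```python
import lzma
import struct
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Assumed record layout: uint32 ms offset, uint32 ask, uint32 bid,
# float32 ask volume, float32 bid volume (big-endian, 20 bytes).
TICK = struct.Struct(">IIIff")


def decode_bi5(blob: bytes, hour_start: datetime, point: float = 1e-5) -> List[Dict]:
    """Decode one compressed hourly payload into a list of tick dicts."""
    if not blob:
        return []  # hours with no ticks produce an empty file
    raw = lzma.decompress(blob)
    if len(raw) % TICK.size:
        raise ValueError("payload length is not a multiple of the record size")

    ticks = []
    for ms, ask_int, bid_int, ask_vol, bid_vol in TICK.iter_unpack(raw):
        ask = ask_int * point
        bid = bid_int * point
        ticks.append(
            {
                "timestamp": hour_start + timedelta(milliseconds=ms),
                "bid": bid,
                "ask": ask,
                "mid": (bid + ask) / 2,
                "bid_volume": bid_vol,
                "ask_volume": ask_vol,
            }
        )
    return ticks


# Round-trip check on a synthetic record (not real market data).
sample_blob = lzma.compress(
    TICK.pack(1500, 110005, 110000, 1.5, 0.75), format=lzma.FORMAT_ALONE
)
sample_ticks = decode_bi5(sample_blob, datetime(2024, 1, 2, 0, tzinfo=timezone.utc))
```

The resulting list of dicts can be passed to `pandas.DataFrame(...)` and written to parquet. Instruments with a different point size (JPY pairs are often quoted with 1e-3) would need a different `point` value.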
Organizes data hierarchically by:
symbol/date/hour

Aggregates tick data into configurable timeframes such as:
1min
5min
15min
1h
4h
1d
1w

Additional, less common timeframes supported:
1s
5s
15s
30s

raw_data/
    EURUSD/
        2024-01-02/
            EURUSD_20240102_00h.bi5
            EURUSD_20240102_01h.bi5

parsed_data/
    EURUSD/
        2024-01-02/
            EURUSD_20240102_00h.parquet
            EURUSD_20240102_01h.parquet

resampled_data/
    EURUSD/
        1min/
            20240102.parquet
        5min/
            20240102.parquet
        1h/
            20240102.parquet

By default, the engine downloads and parses automatically: if no operation is specified, both the downloader and the parser run.
python dukascopy_data_engine.py --symbol EURUSD --start-date 2024-01-02

python dukascopy_data_engine.py --symbol EURUSD --start-date 2024-01-01 --end-date 2024-01-10

python dukascopy_data_engine.py --symbol EURUSD --start-date 2024-01-02 --raw-data-dir custom_raw_data --parsed-data-dir custom_parsed_data

python dukascopy_data_engine.py --operation resample --symbol EURUSD --parsed-data-dir parsed_data --timeframe 1min

The engine can also be used programmatically.
from dukascopy_data_downloader import begin_downloader_process
from dukascopy_bi5_data_parser import begin_parser_process
import resampler

begin_downloader_process(
    symbol,
    start_date,
    end_date=None,
    location="raw_data"
)

from dukascopy_data_downloader import begin_downloader_process

begin_downloader_process(
    symbol="EURUSD",
    start_date="2024-01-02",
    end_date=None,
    location="raw_data"
)

from dukascopy_bi5_data_parser import begin_parser_process

begin_parser_process(
    "raw_data",
    "parsed_data"
)

import resampler

results = resampler.invoke_resampler(
    parquet_dir="parsed_data",
    symbol="EURUSD",
    timeframe="1d"
)

The resampler returns:

dict[date] -> pandas.DataFrame

where each dataframe contains normalized OHLCV-style bars.
timestamp
open
high
low
close
bid_volume
ask_volume

This project is licensed under the Apache 2.0 License.
