Merged
2 changes: 1 addition & 1 deletion docs/contributing.md
@@ -30,7 +30,7 @@ If you would like to build a new component for Numerblox, please consider the fo
1. Place the new component in the appropriate section. Is it a Downloader (`download.py`), a Preprocessor (`preprocessing.py`) or a Submitting tool (`submission.py`)? Also check the documentation on that section for templates, conventions and how these blocks are constructed in general.
2. Add tests for this new component in the appropriate test file. If you are introducing a new Downloader, add tests in `tests/test_downloader.py`. If you are introducing a new Preprocessor, add tests in `tests/test_preprocessing.py`. etc.
3. When making a preprocessor or postprocessor, make sure the component follows [scikit-learn conventions](https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator). The core things to implement are inheriting from `BaseEstimator` and implementing `fit`, `transform` and `get_feature_names_out` methods.
-4. If your component introduces new dependencies, make sure to add them to poetry with `poetry add <library>`.
+4. If your component introduces new dependencies, make sure to add them to uv with `uv add <library>`.
5. Consider adding support for [metadata routing](https://scikit-learn.org/stable/metadata_routing.html) if your component uses additional arguments for `fit`, `transform` and/or `predict`. Check out the documentation and other Numerblox components that use this feature for examples. We are also happy to help out with implementation of metadata routing.
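
As an illustration of the scikit-learn conventions named in step 3, here is a minimal sketch of a transformer. `ColumnDemeaner` and its `suffix` parameter are hypothetical names invented for this example and are not part of Numerblox. Inheriting from `TransformerMixin` in addition to `BaseEstimator` is an assumption made here to get `fit_transform` for free.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnDemeaner(BaseEstimator, TransformerMixin):
    """Hypothetical example: subtract each column's training mean."""

    def __init__(self, suffix: str = "_demeaned"):
        # scikit-learn expects __init__ to store its arguments unchanged.
        self.suffix = suffix

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Attributes learned during fit end with an underscore by convention.
        self.means_ = X.mean(axis=0)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return X - self.means_

    def get_feature_names_out(self, input_features=None):
        # Fall back to generic names when the caller does not pass any.
        if input_features is None:
            input_features = [f"x{i}" for i in range(self.n_features_in_)]
        return np.asarray(
            [f"{name}{self.suffix}" for name in input_features], dtype=object
        )


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
demeaner = ColumnDemeaner().fit(X_train)
X_out = demeaner.transform(X_train)
feature_names = demeaner.get_feature_names_out(["a", "b"])
```

Keeping learned state in trailing-underscore attributes and returning `self` from `fit` is what lets such a component be cloned and chained inside a scikit-learn `Pipeline`.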


28 changes: 28 additions & 0 deletions docs/download.md
@@ -131,6 +131,34 @@ kd = KaggleDownloader(directory_path="my_numerai_signals_folder")
kd.download_live_data("code1110/yfinance-stock-price-data-for-numerai-signals")
```

### Google Cloud Storage Integration

All NumerBlox downloaders inherit from `BaseIO`, which provides built-in support for Google Cloud Storage (GCS). This allows you to easily upload and download data to/from GCS buckets.

#### Prerequisites

Make sure you have Google Cloud Storage credentials configured. You'll need:
- The `google-cloud-storage` Python package installed
- Authentication set up (typically via `GOOGLE_APPLICATION_CREDENTIALS` environment variable or default credentials)

#### Usage

```py
from numerblox.download import NumeraiClassicDownloader

dl = NumeraiClassicDownloader(directory_path="my_numerai_data_folder")

# Download from GCS
dl.download_file_from_gcs(bucket_name="my-bucket", gcs_path="path/to/file.parquet")
dl.download_directory_from_gcs(bucket_name="my-bucket", gcs_path="path/to/directory")

# Upload to GCS
dl.upload_file_to_gcs(bucket_name="my-bucket", gcs_path="path/to/file.parquet", local_path="local_file.parquet")
dl.upload_directory_to_gcs(bucket_name="my-bucket", gcs_path="path/to/directory")
```

This functionality is available for all downloaders (NumeraiClassicDownloader, NumeraiSignalsDownloader, NumeraiCryptoDownloader, EODDownloader, and KaggleDownloader) since they all inherit from BaseIO.

### Rolling your own downloader

We invite users to build out their own downloaders for Numerai Signals. The only requirements are that you inherit from `numerblox.download.BaseDownloader` and implement the `download_training_data` and `download_live_data` methods. Below you will find a template for this.
6 changes: 3 additions & 3 deletions docs/preprocessing.md
@@ -98,12 +98,12 @@ diff_data = pipe.transform(X, ticker_series=tickers_series)

### PandasTaFeatureGenerator

-`PandasTaFeatureGenerator` uses the `pandas-ta` library to generate technical analysis features. It's a powerful tool for those interested in financial time-series data.
+`PandasTaFeatureGenerator` uses the `pandas-ta-classic` library to generate technical analysis features. It's a powerful tool for those interested in financial time-series data.

-Make sure you have `pandas-ta` installed before using this feature generator:
+Make sure you have `pandas-ta-classic` installed before using this feature generator:

```bash
-!pip install pandas-ta
+!pip install pandas-ta-classic
```

Currently `PandasTaFeatureGenerator` only works on `pd.DataFrame` input. Its input is a DataFrame with columns `[ticker, date, open, high, low, close, volume]`.
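
The input layout described above can be illustrated with a toy frame. The tickers, dates and prices below are made up for illustration only, and the missing-column check is a hypothetical helper rather than anything provided by NumerBlox.

```python
import pandas as pd

# Column layout taken from the sentence above.
REQUIRED_COLUMNS = ["ticker", "date", "open", "high", "low", "close", "volume"]

# Made-up toy data: two tickers, two trading days each.
prices = pd.DataFrame(
    {
        "ticker": ["AAA", "AAA", "BBB", "BBB"],
        "date": pd.to_datetime(
            ["2024-01-02", "2024-01-03", "2024-01-02", "2024-01-03"]
        ),
        "open": [10.0, 10.5, 20.0, 19.5],
        "high": [10.8, 11.0, 20.4, 19.9],
        "low": [9.9, 10.2, 19.3, 19.0],
        "close": [10.5, 10.9, 19.5, 19.8],
        "volume": [1000, 1200, 800, 950],
    }
)

# Hypothetical sanity check before handing the frame to a feature generator.
missing = [col for col in REQUIRED_COLUMNS if col not in prices.columns]
if missing:
    raise ValueError(f"Input is missing required columns: {missing}")
```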
10 changes: 5 additions & 5 deletions docs/targets.md
@@ -16,14 +16,14 @@ By integrating these processors into your workflow, you can enhance your modelin

## BayesianGMMTargetProcessor

-The `BayesianGMMTargetProcessor`` generates synthetic targets based on a Bayesian Gaussian Mixture model. It's primarily used for creating fake targets, which are useful for experimenting and validating model structures without exposing true labels.
+The `BayesianGMMTargetProcessor` generates synthetic targets based on a Bayesian Gaussian Mixture model. It's primarily used for creating fake targets, which are useful for experimenting and validating model structures without exposing true labels.

### Example:
```py
from numerblox.targets import BayesianGMMTargetProcessor
processor = BayesianGMMTargetProcessor(n_components=3)
-processor.fit(X=train_features, y=train_targets, eras=train_eras)
-fake_target = processor.transform(X=train_features, eras=train_eras)
+processor.fit(X=train_features, y=train_targets, era_series=train_eras)
+fake_target = processor.transform(X=train_features, era_series=train_eras)
```

For more detailed examples and use-cases, check out `examples/synthetic_data_generation.ipynb.`
@@ -35,7 +35,7 @@ The `SignalsTargetProcessor` is specifically designed to engineer targets for Nu

### Example:
```py
-from numerblox.target_processing import SignalsTargetProcessor
+from numerblox.targets import SignalsTargetProcessor
processor = SignalsTargetProcessor(price_col="close")
-signals_target_data = processor.transform(dataf=data, eras=eras_column)
+signals_target_data = processor.transform(dataf=data, era_series=eras_column)
```
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "numerblox"
-version = "1.6.0"
+version = "1.6.1"
description = "Solid Numerai Pipelines"
authors = [
{name = "CrowdCent", email = "support@crowdcent.com"},
@@ -10,10 +10,10 @@ readme = "README.md"
requires-python = ">=3.10,<4"
dependencies = [
"tqdm>=4.66.1",
-"numpy>=1.26.3,<2.0.0",
+"numpy>=1.26.3",
"scipy>=1.10.0",
"pandas>=2.1.1",
-"pandas-ta==0.3.14b",
+"pandas-ta-classic>=0.3.14b",
"joblib>=1.3.2",
"pyarrow>=14.0.1",
"numerapi>=2.19.1",
8 changes: 4 additions & 4 deletions src/numerblox/preprocessing/signals.py
@@ -5,7 +5,7 @@

import numpy as np
import pandas as pd
-import pandas_ta as ta
+import pandas_ta_classic as ta
from joblib import Parallel, delayed
from sklearn.preprocessing import QuantileTransformer
from sklearn.utils.validation import check_is_fitted
@@ -382,14 +382,14 @@ def get_feature_names_out(self, input_features=None) -> List[str]:

class PandasTaFeatureGenerator(BasePreProcessor):
"""
-Generate features with pandas-ta.
-https://github.com/twopirllc/pandas-ta
+Generate features with pandas-ta-classic.
+https://github.com/xgboosted/pandas-ta-classic
Usage in Pipeline works only with Pandas API.
Run `.set_output("pandas")` on your pipeline first.

:param strategy: Valid Pandas Ta strategy. \n
For more information on creating a strategy, see: \n
-https://github.com/twopirllc/pandas-ta#pandas-ta-strategy \n
+https://github.com/xgboosted/pandas-ta-classic#pandas-ta-strategy \n
By default, a strategy with RSI(14) and RSI(60) is used. \n
:param ticker_col: Column name for grouping by tickers. \n
:param num_cores: Number of cores to use for multiprocessing. \n