Forward paper-validation harness for Hyperliquid directional models #2

Open
varonbot wants to merge 1 commit into main from feat/forward-validation-harness


varonbot (Contributor) commented Jun 6, 2026

What & why

This PR concludes an extensive directional-ML research program. Every backtest edge we found either overfit, decayed in the current regime, or failed to transfer to Hyperliquid, so the only trustworthy test is forward out-of-sample. This adds an observe-only harness that measures candidate models' forward performance on live Hyperliquid perps.

forward_validate.py runs every 4h: pulls live HL 4h candles (complete bars only), forms vol-targeted positions for four candidates (ta_ml, trend, flat, buyhold), logs them, and scores prior logs against realized next-bar returns net of 4.32 bps taker + 20% APR funding — accumulating an honest record vs always-flat. It places no orders.
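
As a rough illustration of what "net of 4.32 bps taker + 20% APR funding" could mean per bar, the sketch below charges the taker fee on position turnover and accrues funding pro rata on absolute exposure. Both conventions, the function name `net_bar_return`, and the bars-per-year figure (six 4h bars per day, 365 days) are assumptions made for illustration; they are not taken from forward_validate.py.

```python
TAKER_BPS = 4.32          # taker fee per unit of turnover (figure from the PR description)
FUNDING_APR = 0.20        # flat funding rate (figure from the PR description)
BARS_PER_YEAR = 6 * 365   # assumption: six 4h bars per day, 365 days per year


def net_bar_return(pos, prev_pos, ret):
    """Net return of holding `pos` over one 4h bar with realized return `ret`.

    Assumptions (hypothetical, not confirmed by the harness):
    - the taker fee is paid on the absolute change in position;
    - funding is paid on absolute exposure regardless of side.
    """
    gross = pos * ret
    fee = abs(pos - prev_pos) * TAKER_BPS / 1e4
    funding = abs(pos) * FUNDING_APR / BARS_PER_YEAR
    return gross - fee - funding
```

Under these assumptions an always-flat candidate scores exactly zero on every bar, which is what makes it a clean baseline for the other three.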

Research findings (in README)

| Approach | Backtest | What killed it |
| --- | --- | --- |
| DRQN / QR-DQN | overfit | underperformed a 2-param rule |
| 5-min ML + microstructure | IC≈0 | no edge after taker cost |
| DMN (Sharpe-loss NN) + 5yr transfer | val Sharpe<0 | overfit; didn't generalize on Binance |
| Meta-labeling NN | BCE=random | can't predict trade success |
| 4h MA-cross rule | SR~2 on HL 2026 | lucky window; SR 0.34 over 6.7y, loses to buy-hold |
| TA-feature LightGBM (best) | SR 1.08, every regime 2021–26, survivorship-clean | recent-holdout IC≈0; SR −1.12 zero-shot on HL 2026; decayed and doesn't transfer |

The TA-feature model (RSI/ADX/MACD/NATR… on 528 Binance perps, 6.7y) was a real historical edge that decayed to ~0 in 2025–26 on both venues — consistent with directional crypto-alpha being competed away.

Contents

  • forward_validate.py — the harness (self-contained, idempotent, observe-only)
  • model/ta_lgbm.txt (+meta) — frozen TA-feature model (the strongest candidate)
  • research/ — full reproduction pipeline (Binance fetch → TA features → train → survivorship/cost/transfer verification)
  • forward_results/ — runtime log/report (gitignored; accumulates where scheduled)

Honest status

This is not a profitable strategy — it is the apparatus to determine, forward and un-foolably, whether one exists. Run it every 4h (cron / platform scheduler); after weeks of accumulation, the question is simply whether ta_ml/trend beat flat/buyhold forward, net of cost. If yes → promote toward the platform shadow→paper path; if no → directional ML on current data is honestly ruled out.

🤖 Generated with Claude Code

…dels

Every backtest edge we tested (DRQN/QR-DQN, 5-min ML+microstructure, DMN +5yr
transfer, meta-labeling, the 4h MA-cross rule, and the strongest — a TA-Lib-feature
LightGBM) either overfit, decayed in the current regime, or failed to transfer to
Hyperliquid. The TA-feature model backtested at SR ~1.08 across 2021-2026 / 528
symbols (survivorship-clean) but its recent-holdout IC is ~0 and it scores SR -1.12
zero-shot on Hyperliquid 2026 -> a decayed, non-transferring edge.

Conclusion: no backtest here is trustworthy; only forward OOS counts. This adds an
observe-only harness that runs every 4h on live Hyperliquid data, records each
candidate model's vol-targeted positions (ta_ml / trend / flat / buyhold), and scores
them against realized next-bar returns net of 4.32bps + funding, accumulating an
honest forward record vs always-flat. It places no orders.

Includes the frozen TA-feature model artifact, the reproduction pipeline under
research/ (Binance fetch -> features -> train -> survivorship/cost/transfer
verification), and an updated README documenting the full research findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0395996e1

Comment thread forward_validate.py
Comment on lines +111 to +112
if r["bar_ts"] in logged:
    continue

[P2] Key logged predictions by symbol and bar

In scheduled runs where any coin fails to fetch or lags while other coins for the same bar_ts were already logged, this timestamp-only dedupe skips the recovered coin forever on the next run. The forward report then has missing symbols for that bar, and if that symbol appears again at a later bar its previous return spans multiple 4h bars while still being scored as one bar. The log idempotency should track (symbol, bar_ts) rather than bar_ts alone.
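
A minimal sketch of the suggested idempotency check, assuming each log row is a dict carrying `symbol` and `bar_ts` fields. The helper name `rows_to_append` is hypothetical and does not exist in forward_validate.py.

```python
def rows_to_append(new_rows, logged_rows):
    """Return the rows of `new_rows` whose (symbol, bar_ts) key is not logged yet.

    Keying on the pair means a coin that failed to fetch on one run can still
    be logged for the same bar on a later run.
    """
    seen = {(r["symbol"], r["bar_ts"]) for r in logged_rows}
    fresh = []
    for r in new_rows:
        key = (r["symbol"], r["bar_ts"])
        if key in seen:
            continue  # this exact symbol/bar is already in the log
        seen.add(key)  # also guards against duplicates within new_rows
        fresh.append(r)
    return fresh
```

With a timestamp-only set, the second coin for an already-seen bar would be dropped; with the pair as the key, only true duplicates are.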

Comment thread research/train_model.py
Comment on lines +13 to +14
OUTDIR = "/home/varon/.openclaw/agents/varon/workspace/github/varon-fi/poc/models"
import os; os.makedirs(OUTDIR, exist_ok=True)

[P2] Save retrained artifacts into the repo model path

On any machine other than the original developer's, research/train_model.py writes the retrained LightGBM artifacts under this hard-coded personal absolute path instead of the repository's model/ directory that forward_validate.py loads. That means following the documented “re-train offline” flow can either fail due to the missing home path/permissions or silently produce a refreshed model that the harness never uses, leaving model/ta_lgbm.txt stale.
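
One possible shape for the fix is to derive the output directory from the script's own location. The sketch assumes the layout described in this PR (research/train_model.py next to a top-level model/ directory); the helper name `repo_model_dir` is hypothetical.

```python
from pathlib import Path


def repo_model_dir(script_path):
    """Resolve <repo>/model for a script located at <repo>/research/<script>.py.

    Assumes the script sits exactly one directory below the repository root.
    """
    return Path(script_path).resolve().parent.parent / "model"
```

Inside research/train_model.py this could be used as `OUTDIR = repo_model_dir(__file__)` followed by `OUTDIR.mkdir(parents=True, exist_ok=True)`, so retrained artifacts land where the harness loads them on any machine.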

Comment thread forward_validate.py
df = pd.DataFrame(rows)
df["bar_ts"] = pd.to_datetime(df["bar_ts"], utc=True)
df = df.sort_values(["symbol", "bar_ts"])
df["ret"] = df.groupby("symbol")["close"].pct_change().shift(-1) # realized next-bar return

[P2] Detect gaps before treating returns as one bar

If the 4h cron is delayed or skips a run, the next logged close may be 8h+ after the prior logged close, but this line still scores that multi-bar move as a single “next-bar” return. The report then undercharges funding/turnover time and annualizes the observation as one 4h bar, so a short scheduler outage can materially distort the forward record that is supposed to be an honest OOS benchmark. Consider checking that consecutive bar_ts values are exactly one interval apart before scoring, or account for the elapsed bars.
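
A sketch of the first option (only score strictly consecutive bars), assuming the log frame has `symbol`, `bar_ts` (timezone-aware datetimes) and `close` columns as in the snippet above. The function name `next_bar_returns` and the choice to blank out gapped rows rather than rescale them are assumptions.

```python
import pandas as pd

BAR = pd.Timedelta(hours=4)  # bar interval used by the harness


def next_bar_returns(df, bar=BAR):
    """Add a `ret` column holding the next-bar return, NaN across any gap.

    A row is scored only when the next logged row for the same symbol is
    exactly one bar later; otherwise the return is left missing.
    """
    df = df.sort_values(["symbol", "bar_ts"]).copy()
    grouped = df.groupby("symbol")
    next_close = grouped["close"].shift(-1)
    next_ts = grouped["bar_ts"].shift(-1)
    df["ret"] = next_close / df["close"] - 1.0
    contiguous = (next_ts - df["bar_ts"]) == bar
    df.loc[~contiguous, "ret"] = float("nan")
    return df
```

Dropping gapped observations loses a little data after an outage but keeps every scored row a true single-bar return, so the annualization stays valid.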

Comment thread research/verify_ta2.py
return df, FCOLS
df, FCOLS = build(path)
raw = pd.read_csv(path); raw["ts"] = pd.to_datetime(raw["ts"], utc=True)
raw["vol"] = raw.groupby("symbol")["close"].pct_change().transform(lambda s: s.ewm(span=60, min_periods=20).std())

[P2] Reset volatility EWMA per symbol

Because pct_change() returns a plain Series, the following .transform(...) is no longer grouped by symbol; the EWMA volatility carries from one coin into the next, and after the first symbol min_periods has already been satisfied so early rows for each new coin get a previous coin's volatility. Since this script's reported Sharpe/turnover relies on vol-targeting, the deployability checks can be materially distorted unless the EWMA is computed inside each symbol group.
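
A sketch of one way to keep the EWMA inside each symbol group: move both `pct_change()` and the `ewm(...).std()` into a single grouped `transform`. The helper name `per_symbol_ewm_vol` is hypothetical; the span and min_periods defaults mirror the values in the line under review.

```python
import pandas as pd


def per_symbol_ewm_vol(df, span=60, min_periods=20):
    """EWMA std of close-to-close returns, computed independently per symbol.

    The lambda receives one symbol's closes at a time, so the EWMA state and
    the min_periods warm-up both restart for every symbol.
    """
    return df.groupby("symbol")["close"].transform(
        lambda s: s.pct_change().ewm(span=span, min_periods=min_periods).std()
    )
```

It would be assigned as `raw["vol"] = per_symbol_ewm_vol(raw)`. Each symbol's first 20 rows then come out as NaN instead of inheriting the previous coin's volatility.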
