Forward paper-validation harness for Hyperliquid directional models #2
varonbot wants to merge 1 commit into
…dels

Every backtest edge we tested (DRQN/QR-DQN, 5-min ML+microstructure, DMN +5yr transfer, meta-labeling, the 4h MA-cross rule, and the strongest, a TA-Lib-feature LightGBM) either overfit, decayed in the current regime, or failed to transfer to Hyperliquid. The TA-feature model backtested at SR ~1.08 across 2021-2026 / 528 symbols (survivorship-clean) but its recent-holdout IC is ~0 and it scores SR -1.12 zero-shot on Hyperliquid 2026 -> a decayed, non-transferring edge. Conclusion: no backtest here is trustworthy; only forward OOS counts.

This adds an observe-only harness that runs every 4h on live Hyperliquid data, records each candidate model's vol-targeted positions (ta_ml / trend / flat / buyhold), and scores them against realized next-bar returns net of 4.32bps + funding, accumulating an honest forward record vs always-flat. It places no orders.

Includes the frozen TA-feature model artifact, the reproduction pipeline under research/ (Binance fetch -> features -> train -> survivorship/cost/transfer verification), and an updated README documenting the full research findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
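As a rough illustration of the net-of-cost scoring the commit message describes, the sketch below charges the 4.32 bps taker fee on position changes and the 20% APR funding figure given in the PR description on the held position. How the harness actually applies these costs is not shown on this page; the function name `net_bar_pnl`, the per-bar funding proration, and the choice to always charge funding (never receive it) are assumptions made for illustration.

```python
TAKER_BPS = 4.32          # taker fee in basis points, from the PR text
FUNDING_APR = 0.20        # funding rate stated in the PR description
BARS_PER_YEAR = 6 * 365   # number of 4h bars in a year


def net_bar_pnl(prev_pos, pos, next_ret):
    """Net return of holding `pos` over the next 4h bar (hypothetical helper).

    Assumptions: the fee is charged on the absolute position change,
    and funding is always paid on the absolute position held.
    """
    gross = pos * next_ret
    trade_cost = abs(pos - prev_pos) * TAKER_BPS / 1e4
    funding = abs(pos) * FUNDING_APR / BARS_PER_YEAR
    return gross - trade_cost - funding
```

Under these assumptions the always-flat candidate scores exactly zero on every bar, which is what makes it a clean baseline for the other three.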
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0395996e1
    if r["bar_ts"] in logged:
        continue
Key logged predictions by symbol and bar
In scheduled runs where any coin fails to fetch or lags while other coins for the same bar_ts were already logged, this timestamp-only dedupe skips the recovered coin forever on the next run. The forward report then has missing symbols for that bar, and if that symbol appears again at a later bar its previous return spans multiple 4h bars while still being scored as one bar. The log idempotency should track (symbol, bar_ts) rather than bar_ts alone.
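A minimal sketch of the suggested fix, assuming each log row is a dict with `symbol` and `bar_ts` keys (the `bar_ts` key appears in the snippet under review; the function name `filter_unlogged` and the list-of-dicts shape are hypothetical):

```python
def filter_unlogged(logged_rows, new_rows):
    """Return the rows in new_rows whose (symbol, bar_ts) pair is not yet logged."""
    logged = {(r["symbol"], r["bar_ts"]) for r in logged_rows}
    fresh = []
    for r in new_rows:
        key = (r["symbol"], r["bar_ts"])
        if key in logged:
            continue  # this exact symbol/bar was already recorded
        logged.add(key)  # also guards against duplicates within new_rows
        fresh.append(r)
    return fresh
```

With a composite key, a coin that failed to fetch on the first attempt can still be logged for the same bar on a later run.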
    OUTDIR = "/home/varon/.openclaw/agents/varon/workspace/github/varon-fi/poc/models"
    import os; os.makedirs(OUTDIR, exist_ok=True)
Save retrained artifacts into the repo model path
On any machine other than the original developer's, research/train_model.py writes the retrained LightGBM artifacts under this hard-coded personal absolute path instead of the repository's model/ directory that forward_validate.py loads. That means following the documented “re-train offline” flow can either fail due to the missing home path/permissions or silently produce a refreshed model that the harness never uses, leaving model/ta_lgbm.txt stale.
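One way to address this is to derive the output directory from the script's own location. The sketch below assumes the script sits one level below the repository root (as `research/train_model.py` does) and that the harness reads from a sibling `model/` directory; the helper name `model_dir` is hypothetical.

```python
from pathlib import Path


def model_dir(script_path):
    """Resolve <repo>/model for a script located at <repo>/research/<script>.py."""
    repo_root = Path(script_path).resolve().parent.parent
    return repo_root / "model"


# Hypothetical usage inside research/train_model.py:
#   OUTDIR = model_dir(__file__)
#   OUTDIR.mkdir(parents=True, exist_ok=True)
```

This keeps the retrained artifact where the harness looks for it, regardless of which machine or user runs the training script.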
    df = pd.DataFrame(rows)
    df["bar_ts"] = pd.to_datetime(df["bar_ts"], utc=True)
    df = df.sort_values(["symbol", "bar_ts"])
    df["ret"] = df.groupby("symbol")["close"].pct_change().shift(-1)  # realized next-bar return
Detect gaps before treating returns as one bar
If the 4h cron is delayed or skips a run, the next logged close may be 8h+ after the prior logged close, but this line still scores that multi-bar move as a single “next-bar” return. The report then undercharges funding/turnover time and annualizes the observation as one 4h bar, so a short scheduler outage can materially distort the forward record that is supposed to be an honest OOS benchmark. Consider checking that consecutive bar_ts values are exactly one interval apart before scoring, or account for the elapsed bars.
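The first option (only score rows whose next logged bar is exactly one interval later) could look roughly like this. It assumes the same `symbol`, `bar_ts`, and `close` columns as the snippet under review; the function name `next_bar_returns` and the choice to blank out gapped rows rather than rescale them are assumptions.

```python
import pandas as pd

BAR = pd.Timedelta(hours=4)  # harness interval stated in the PR


def next_bar_returns(df, bar=BAR):
    """Add a 'ret' column holding the next-bar return, NaN where bars are not contiguous."""
    df = df.sort_values(["symbol", "bar_ts"]).copy()
    grouped = df.groupby("symbol")
    next_close = grouped["close"].shift(-1)
    next_ts = grouped["bar_ts"].shift(-1)
    df["ret"] = next_close / df["close"] - 1.0
    # A row is scorable only if the following logged bar is exactly one interval later.
    contiguous = (next_ts - df["bar_ts"]) == bar
    df.loc[~contiguous, "ret"] = float("nan")
    return df
```

Rows left as NaN drop out of the scoring, so a scheduler outage shortens the forward record instead of distorting it.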
        return df, FCOLS
    df, FCOLS = build(path)
    raw = pd.read_csv(path); raw["ts"] = pd.to_datetime(raw["ts"], utc=True)
    raw["vol"] = raw.groupby("symbol")["close"].pct_change().transform(lambda s: s.ewm(span=60, min_periods=20).std())
Reset volatility EWMA per symbol
Because pct_change() returns a plain Series, the following .transform(...) is no longer grouped by symbol; the EWMA volatility carries from one coin into the next, and after the first symbol min_periods has already been satisfied so early rows for each new coin get a previous coin's volatility. Since this script's reported Sharpe/turnover relies on vol-targeting, the deployability checks can be materially distorted unless the EWMA is computed inside each symbol group.
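A minimal sketch of a per-symbol version, assuming the same `symbol` and `close` columns and the same `span`/`min_periods` values as the line under review (the function name `per_symbol_ewm_vol` is hypothetical):

```python
import pandas as pd


def per_symbol_ewm_vol(raw, span=60, min_periods=20):
    """EWMA volatility of returns, restarted at the first row of every symbol."""
    ret = raw.groupby("symbol")["close"].pct_change()
    # Group the returns by symbol again so the EWMA state does not carry across coins.
    return ret.groupby(raw["symbol"]).transform(
        lambda s: s.ewm(span=span, min_periods=min_periods).std()
    )
```

Used as `raw["vol"] = per_symbol_ewm_vol(raw)`, each coin's first rows stay NaN until that coin alone has accumulated `min_periods` return observations.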
What & why
This PR concludes an extensive directional-ML research program. Every backtest edge we found either overfit, decayed in the current regime, or failed to transfer to Hyperliquid, so the only trustworthy test is forward out-of-sample. This adds an observe-only harness that measures candidate models' forward performance on live Hyperliquid perps.
forward_validate.py runs every 4h: pulls live HL 4h candles (complete bars only), forms vol-targeted positions for four candidates (ta_ml, trend, flat, buyhold), logs them, and scores prior logs against realized next-bar returns net of 4.32 bps taker + 20% APR funding, accumulating an honest record vs always-flat. It places no orders.

Research findings (in README)
The TA-feature model (RSI/ADX/MACD/NATR… on 528 Binance perps, 6.7y) was a real historical edge that decayed to ~0 in 2025–26 on both venues — consistent with directional crypto-alpha being competed away.
Contents
- forward_validate.py: the harness (self-contained, idempotent, observe-only)
- model/ta_lgbm.txt (+meta): frozen TA-feature model (the strongest candidate)
- research/: full reproduction pipeline (Binance fetch → TA features → train → survivorship/cost/transfer verification)
- forward_results/: runtime log/report (gitignored; accumulates where scheduled)

Honest status
This is not a profitable strategy. It is the apparatus to determine, forward and un-foolably, whether one exists. Run it every 4h (cron / platform scheduler); after weeks of accumulation, the question is simply whether ta_ml/trend beat flat/buyhold forward, net of cost. If yes → promote toward the platform shadow→paper path; if no → directional ML on current data is honestly ruled out.

🤖 Generated with Claude Code