sandbox: slim image from 1.65GB to ~570MB #230
Open
samcm wants to merge 5 commits into
Trim the sandbox Python deps to a curated core and stop persisting build-essential in the final image.

- requirements.in: drop unused heavy libs (polars, scikit-learn, statsmodels, networkx, dask, fastparquet) and redundant viz stacks (altair, vl-convert-python, bokeh, plotnine, pygwalker, kaleido). Nothing in product code, docs, or LLM-facing content references them; the only advertised dataframe type is pandas. matplotlib/seaborn cover static chart export (the visualization eval path); plotly is kept for interactive HTML. kaleido is dropped because plotly static export already required a Chrome install that was never present.
- Dockerfile: install build-essential only to compile any sdist-only deps and purge it in the same layer so it never lands in the image.

site-packages 1090MB -> 551MB; build tools removed from the final image.
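The claim that nothing in product code references the dropped libraries can be spot-checked mechanically. The following is a minimal sketch, not the check the author actually ran: it walks a source tree and reports top-level imports of the dropped packages. The function name `find_dropped_imports` and the `DROPPED` set are hypothetical; the set lists import names, which differ from the distribution names for scikit-learn (`sklearn`) and vl-convert-python (`vl_convert`). It covers Python sources only, so docs and LLM-facing content would still need a plain text search.

```python
import ast
from pathlib import Path

# Import names for the packages listed in the first commit (hypothetical constant).
# A later commit in this PR reintroduces fastparquet as a dependency; nothing in
# the source says product code imports it directly.
DROPPED = {
    "polars", "sklearn", "statsmodels", "networkx", "dask", "fastparquet",
    "altair", "vl_convert", "bokeh", "plotnine", "pygwalker", "kaleido",
}


def find_dropped_imports(root: Path, names: set[str]) -> dict[str, list[str]]:
    """Map each dropped top-level module to the files that import it."""
    hits: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that are not parseable Python
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules = [node.module]  # absolute "from x import y" only
            else:
                continue
            for module in modules:
                top = module.split(".")[0]
                if top in names:
                    hits.setdefault(top, []).append(str(path))
    return hits
```

An empty result from `find_dropped_imports(Path("."), DROPPED)` would be consistent with the description above. Dynamic imports such as `importlib.import_module("polars")` are not detected by this approach.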
🐼 Smoke eval —
| question | result | tokens | tools |
|---|---|---|---|
| forky_node_coverage | ✅ | 13,334 | 4 |
| tracoor_node_coverage | ✅ | 12,927 | 3 |
| mainnet_block_arrival_p50 | ✅ | 15,541 | 8 |
| list_datasources | ✅ | 12,061 | 2 |
| block_count_24h | ✅ | 14,676 | 10 |
| missed_slots_24h | ✅ | 14,510 | 6 |
🔭 Langfuse traces (6 runs; ⚠️ = failed)
The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.
Move uv + build-essential into a builder stage and copy only the resolved site-packages into the final image. Neither uv (~50MB) nor the C toolchain ships anymore, and splitting the toolchain install out of the dependency install keeps each layer cached independently: editing requirements.txt no longer re-runs the build-essential apt install. Final image: 749MB -> 701MB (1.65GB baseline). Session-mode shell helpers (sh/sleep/find/mkdir/chmod/base64/rm) and the full python stack verified present.
pyarrow bundles the entire Arrow C++ stack (~140MB: libarrow, Flight, Substrait, Acero, compute, parquet) but nothing in product code uses Arrow format or parquet beyond a MIME-type mapping. clickhouse-connect runs fine on its native format without it. fastparquet (~9MB + cramjam) gives pandas the same read_parquet/to_parquet capability for sandbox-authored code, and pandas auto-selects it when pyarrow is absent. Final image: 701MB -> 574MB (1.65GB baseline).
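The fallback described in this commit follows the documented behaviour of pandas' default parquet engine setting: try pyarrow, then fall back to fastparquet. A simplified sketch of that preference order, using only the standard library, is below. It is not pandas' own code; `resolve_parquet_engine` and its `is_installed` parameter are hypothetical names introduced for illustration.

```python
import importlib.util
from typing import Callable, Optional


def resolve_parquet_engine(is_installed: Optional[Callable[[str], bool]] = None) -> str:
    """Return the first available parquet engine, preferring pyarrow.

    Mirrors the preference order pandas documents for engine="auto";
    the real pandas logic also validates engine versions.
    """
    if is_installed is None:
        # Default probe: is the module importable in this environment?
        def is_installed(name: str) -> bool:
            return importlib.util.find_spec(name) is not None

    for engine in ("pyarrow", "fastparquet"):
        if is_installed(engine):
            return engine
    raise ImportError("no parquet engine available (need pyarrow or fastparquet)")
```

In the slimmed image, where pyarrow is absent and fastparquet is installed, `resolve_parquet_engine()` would be expected to return `"fastparquet"`, which matches the behaviour the commit relies on for `read_parquet`/`to_parquet`.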
…oolchain)

Address review feedback on the multi-stage build:

- Install deps into a staging prefix and COPY the whole /install onto /usr/local, so the final image keeps console scripts and wheel data files, not just site-packages.
- Add --only-binary=:all: so a missing wheel fails the build loudly instead of silently compiling an sdist against builder-only libraries.
- Since every dependency now resolves to a prebuilt wheel and the ethpandaops package is pure Python, drop build-essential entirely from the builder (smaller, faster, deterministic).

Final image: 568MB (1.65GB baseline). Verified: shell helpers for session mode, full python stack, parquet via fastparquet, and chart export all work; uv and the C toolchain are absent.
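The verification listed in this commit could be scripted along the following lines. This is a sketch under stated assumptions, not the procedure the author used: `audit_tools` is a hypothetical helper, the required list is the set of session-mode shell helpers named in the earlier builder-stage commit, and `gcc` and `make` stand in for "the C toolchain" as an assumption.

```python
import shutil
from typing import Callable, Iterable, Optional

# Shell helpers that session mode needs, as named in the builder-stage commit.
REQUIRED_TOOLS = ("sh", "sleep", "find", "mkdir", "chmod", "base64", "rm")
# Assumption: uv plus gcc/make as representatives of the build toolchain.
FORBIDDEN_TOOLS = ("uv", "gcc", "make")


def audit_tools(
    required: Iterable[str],
    forbidden: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[list[str], list[str]]:
    """Return (missing required tools, unexpectedly present forbidden tools)."""
    missing = [tool for tool in required if which(tool) is None]
    unexpected = [tool for tool in forbidden if which(tool) is not None]
    return missing, unexpected
```

Run inside the final image, `audit_tools(REQUIRED_TOOLS, FORBIDDEN_TOOLS)` returning two empty lists would agree with the "verified present" and "absent" statements above. The `which` parameter exists only so the lookup can be replaced in tests.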
The fastparquet floor was carelessly set to 2024.11.0; bump it to 2026.3.0 (resolves to 2026.5.0, released 2026-05-15). Recompile the lock with --exclude-newer=2026-06-03 so every resolved package is recent but at least two weeks old rather than a brand-new release. No other pins changed - they were already settled.
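The "at least two weeks old" rule amounts to simple date arithmetic. The sketch below shows one way to derive a cutoff and test a release against it. The helper names are hypothetical, and the reference date of 2026-06-17 is an assumption chosen only because it yields the 2026-06-03 cutoff used above; the source does not state when the lock was recompiled. How uv treats a release on the cutoff date itself is not covered here, so the comparison is deliberately strict.

```python
from datetime import date, timedelta


def exclude_newer_cutoff(reference: date, min_age_days: int = 14) -> date:
    """Cutoff date such that anything released before it is at least min_age_days old."""
    return reference - timedelta(days=min_age_days)


def is_eligible(released: date, cutoff: date) -> bool:
    """Conservative check: only releases strictly before the cutoff qualify."""
    return released < cutoff


# Assumed reference date (hypothetical); gives the cutoff from the commit message.
cutoff = exclude_newer_cutoff(date(2026, 6, 17))
# fastparquet 2026.5.0, released 2026-05-15 per the commit message.
fastparquet_ok = is_eligible(date(2026, 5, 15), cutoff)
```

Under that assumed reference date, `cutoff` is 2026-06-03 and the 2026-05-15 fastparquet release falls before it, so it remains a candidate for the resolver.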
Trims the sandbox image from 1.65GB to ~570MB.
Note: plotly static export (kaleido) was already broken in the shipped image because it needs a Chrome install that was never present, so dropping kaleido removes dead weight rather than a working capability.