refactor(spec-specs,test-types): replace pycryptodome with hashlib for keccak#2370
refactor(spec-specs,test-types): replace pycryptodome with hashlib for keccak#2370kevaundray wants to merge 5 commits into
pycryptodome with hashlib for keccak#2370Conversation
060074f to
816ec24
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2370 +/- ##
===================================================
+ Coverage 88.62% 89.97% +1.34%
===================================================
Files 577 539 -38
Lines 35659 32632 -3027
Branches 3490 3032 -458
===================================================
- Hits 31604 29361 -2243
+ Misses 3492 2712 -780
+ Partials 563 559 -4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pycryptodome with hashlib
|
Seems like this is only compatible with python 3.12+. Tagging @danceratopz for follow up. |
There was a problem hiding this comment.
Thanks @kevaundray! More speed-up always good and as hashlib is in the standard library, this seems like a worthwhile change.
I'm assuming we need to keep pycrptodome as a dep for other crypto functions? And that's why you didn't update pyproject.toml?
Happy to limit the change to replacing the keccak implementation though.
From @marioevz
Seems like this is only compatible with python 3.12+. Tagging @danceratopz for follow up.
I think it depends on the version of OPENSSL that your Python is linked against, or in the case of a uv-managed Python, bundled with (the [python-build-standalone] project, now also maintained by astral, (https://github.com/astral-sh/python-build-standalone) statically links these deps).
I did a quick test of uv-managed Pythons: Initially, this failed locally for me with 3.11, 3.12. Then I upgraded my uv-managed 3.11 and 3.12 versions and it worked across all of them.
E.g,
uv python upgrade 3.11
Installed Python 3.11.14 in 4.11s
+ cpython-3.11.14-linux-x86_64-gnu
uv run --python 3.11 python -c 'import ssl, sys, hashlib; print(sys.version, ssl.OPENSSL_VERSION); h=hashlib.new("keccak-256", b"hashme"); print(h.hexdigest())'
3.11.14 (main, Feb 12 2026, 00:42:50) [Clang 21.1.4 ] OpenSSL 3.5.5 27 Jan 2026
7f98885dc9cf152c0bb08eaf056668f99c47cabd8fe01b1276f9a305b1389646
Happy to keep it as is, if it may give compilation errors :) (I'm not that familiar with uv-managed)
Ah I was only focussed on keccak since that was where I was measuring (Title is misleading in that case) |
pycryptodome with hashlibpycryptodome with hashlib for keccak
There was a problem hiding this comment.
Is it possible for keccak-256 or keccak-512 to be disabled in OpenSSL? If so, should we do something like:
if "keccak-256" in hashlib.algorithms_available:
...
else:
...There was a problem hiding this comment.
I guess I would lean towards not adding hashlib since it violates the "it just works" and KISS motto
There was a problem hiding this comment.
Or atleast to my eyes, this part from Dan's message:
depends on the version of OPENSSL that your Python is linked against
Is a quirk that I think is not worth it, given we are in the python ecosystem
There was a problem hiding this comment.
I think optimistically using OpenSSL would be fine, especially if we're keeping pycrptodome around for other reasons.
There was a problem hiding this comment.
I agree, it's not much boilerplate code. But I wanted to see whether it was worth it, by checking how much of the performance gain would propagate to fill. TLDR: ~6.5% speed-up. Not huge but still worth it imo. More details in the updated PR description.
There was a problem hiding this comment.
Unfortunately, hashlib.algorithms_available doesn't return "keccak-256" even when it is available, so I went for a try-except.
|
This PR has had no recent activity and has been marked as stale. |
Add tests that verify the active backend matches published Keccak-256
and Keccak-512 vectors, both backends produce byte-identical output, and
the pycryptodome fallback engages cleanly when `hashlib.new("keccak-256")`
is patched to raise. Also covers EEST `Bytes.keccak256` and
`trie.keccak256` routing.
Without the fallback dispatch these tests fail (or, on Pythons whose
hashlib lacks Keccak entirely, the test session aborts at collection),
so the suite drives the fix that follows.
OpenSSL only gained default-provider Keccak in 3.2.0 (Nov 2023). LTS
3.0.x and 3.1.x, still shipped by Debian 12 and RHEL 9, do not provide
it, so `hashlib.new("keccak-256", ...)` raises `ValueError` there and
every call to `keccak256` crashes.
At module import time, probe with `hashlib.new("keccak-256", b"")`. On
`ValueError`, rebind the digest helpers to pycryptodome's bundled
implementation. The probe is more reliable than checking
`hashlib.algorithms_available`, which omits some OpenSSL-provider
digests on python-build-standalone builds.
Route the EEST keccak helpers in `base_types.Bytes.keccak256` and
`test_types.trie.keccak256` through `ethereum.crypto.hash`, so the
dispatch lives in one place. Widen `keccak256` and `keccak512` parameter
types from `Bytes | bytearray` to `bytes | bytearray` so the EEST `Bytes`
subclass passes through without coercion.
816ec24 to
ad1440b
Compare
danceratopz
left a comment
There was a problem hiding this comment.
This looks good to me and think it's worth including. As we only use uv-managed Pythons in CI, this speed-up (~7%; see updated description for details) will be visible across all CI jobs that fill. I don't have a strong opinion though. I would just like to get resolution on this PR 🙂
@SamWilsn, could you please re-review the recent changes which added the fallback to pycryptodome you requested?
There was a problem hiding this comment.
Unfortunately, hashlib.algorithms_available doesn't return "keccak-256" even when it is available, so I went for a try-except.
pycryptodome with hashlib for keccakpycryptodome with hashlib for keccak
The previous `try/except` block defined `_keccak256_digest` and
`_keccak512_digest` inside its branches. `docc`'s static analyzer does
not traverse `try/except` at module scope when discovering module-level
names, so it could not resolve the references from `keccak256` and
`keccak512`, breaking the `Build Documentation` job with:
docc.plugins.references.ReferenceError: in `src/ethereum/crypto/hash.py`,
undefined identifier: `ethereum.crypto.hash._keccak256_digest`
Define the helpers unconditionally at module scope and branch on a
module-level `_USE_HASHLIB` flag inside their bodies. The probe runs
once at import; the runtime branch adds about 20ns per call, dwarfed by
the hash itself. `pycryptodome` is now imported unconditionally, which
is free since it is already a hard dep via `elliptic_curve.py`.
Update the dispatch tests to assert on `_USE_HASHLIB` instead of the
presence of `_pycryptodome_keccak` (now always imported).
| sys.modules.pop("ethereum.crypto.hash", None) | ||
| return importlib.import_module("ethereum.crypto.hash") |
There was a problem hiding this comment.
| sys.modules.pop("ethereum.crypto.hash", None) | |
| return importlib.import_module("ethereum.crypto.hash") | |
| mod = importlib.import_module("ethereum.crypto.hash") | |
| return importlib.reload(mod) |
I think this would make me more comfortable?
| def _keccak256_digest(buffer: bytes | bytearray) -> bytes: | ||
| if _USE_HASHLIB: | ||
| return hashlib.new("keccak-256", buffer).digest() | ||
| return _pycryptodome_keccak.new(digest_bits=256).update(buffer).digest() | ||
|
|
||
|
|
||
| def _keccak512_digest(buffer: bytes | bytearray) -> bytes: | ||
| if _USE_HASHLIB: | ||
| return hashlib.new("keccak-512", buffer).digest() | ||
| return _pycryptodome_keccak.new(digest_bits=512).update(buffer).digest() |
There was a problem hiding this comment.
Should we invert these? For example:
if _USE_HASHLIB:
def _keccak256_digest(buffer: bytes | bytearray) -> bytes:
return hashlib.new("keccak-256", buffer).digest()
else:
def _keccak256_digest(buffer: bytes | bytearray) -> bytes:
return _pycryptodome_keccak.new(digest_bits=256).update(buffer).digest()That way there's no conditional on every function call.
🗒️ Description
Both expose keccak, however I found hashlib to be around 60% faster on my laptop
Extracted from #2368
Update (2026-05-11)
Benchmark of
fill tests/osaka, hashlib vs pycryptodomeUsing ad1440b.
Command (per row, only
--pythondiffers):Host: Ubuntu (kernel
6.11.0-1017-oem), x86_64.n=10runs per configuration.hashlibpycryptodomefallbackDelta: 4.81 s (≈ 6.5 %) faster on the hashlib path. Welch t ≈ 1.63 (df ≈ 18), one-tailed p ≈ 0.06 — directionally consistent with prior runs, just shy of p < 0.05 two-tailed at this sample size.
The Ubuntu-system interpreter is linked against OpenSSL 3.0.13, which predates the OpenSSL 3.2.0 default-provider Keccak addition (Nov 2023), so
hashlib.new("keccak-256", ...)raisesValueErrorand the module dispatches topycryptodomeinstead. The uv-managedcpython-3.12.12frompython-build-standalonebundles OpenSSL 3.5.5, which exposes Keccak via the default provider, so the fasthashlibpath is taken.Targeted Benchmarking
🔗 Related Issues or PRs
N/A.
✅ Checklist
toxchecks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx tox -e statictype(scope):.mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_frommarker.Cute Animal Picture