Skip to content

refactor(spec-specs,test-types): replace pycryptodome with hashlib for keccak#2370

Open
kevaundray wants to merge 5 commits into
ethereum:forks/amsterdamfrom
kevaundray:kw/use-hashlib
Open

refactor(spec-specs,test-types): replace pycryptodome with hashlib for keccak#2370
kevaundray wants to merge 5 commits into
ethereum:forks/amsterdamfrom
kevaundray:kw/use-hashlib

Conversation

@kevaundray
Copy link
Copy Markdown
Contributor

@kevaundray kevaundray commented Mar 1, 2026

🗒️ Description

Both expose keccak, however I found hashlib to be around 60% faster on my laptop

Extracted from #2368

Update (2026-05-11)

Benchmark of fill tests/osaka, hashlib vs pycryptodome

Using ad1440b.

Command (per row, only --python differs):

taskset -c 0,2,4,5,6,7 uv run --python <interp> \
    fill tests/osaka --output=<dir> --clean -x --no-html --skip-index -n=6

Host: Ubuntu (kernel 6.11.0-1017-oem), x86_64. n=10 runs per configuration.

Python Source OpenSSL (linked) Keccak backend Mean ± σ (s) Range (s)
CPython 3.12.12 uv-managed 3.5.5 (27 Jan 2026) hashlib 69.27 ± 6.74 58.04 – 76.62
CPython 3.12.3 Ubuntu system 3.0.13 (30 Jan 2024) pycryptodome fallback 74.08 ± 6.47 65.95 – 81.54

Delta: 4.81 s (≈ 6.5 %) faster on the hashlib path. Welch t ≈ 1.63 (df ≈ 18), one-tailed p ≈ 0.06 — directionally consistent with prior runs, just shy of p < 0.05 two-tailed at this sample size.

The Ubuntu-system interpreter is linked against OpenSSL 3.0.13, which predates the OpenSSL 3.2.0 default-provider Keccak addition (Nov 2023), so hashlib.new("keccak-256", ...) raises ValueError and the module dispatches to pycryptodome instead. The uv-managed cpython-3.12.12 from python-build-standalone bundles OpenSSL 3.5.5, which exposes Keccak via the default provider, so the fast hashlib path is taken.

image

Targeted Benchmarking

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 9.09091% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.97%. Comparing base (462ee80) to head (0a184a8).
⚠️ Report is 7 commits behind head on forks/amsterdam.

Files with missing lines Patch % Lines
src/ethereum/crypto/hash.py 9.09% 19 Missing and 1 partial ⚠️
Additional details and impacted files
@@                 Coverage Diff                 @@
##           forks/amsterdam    #2370      +/-   ##
===================================================
+ Coverage            88.62%   89.97%   +1.34%     
===================================================
  Files                  577      539      -38     
  Lines                35659    32632    -3027     
  Branches              3490     3032     -458     
===================================================
- Hits                 31604    29361    -2243     
+ Misses                3492     2712     -780     
+ Partials               563      559       -4     
Flag Coverage Δ
unittests 89.97% <9.09%> (+1.34%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kevaundray kevaundray changed the title chore(crypto): replace pycryptodome with hashlib chore(crypto): replace pycryptodome with hashlib Mar 1, 2026
@marioevz
Copy link
Copy Markdown
Member

marioevz commented Mar 2, 2026

Seems like this is only compatible with python 3.12+. Tagging @danceratopz for follow up.

@marioevz marioevz requested a review from danceratopz March 2, 2026 22:26
@danceratopz danceratopz self-assigned this Mar 2, 2026
Copy link
Copy Markdown
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kevaundray! More speed-up always good and as hashlib is in the standard library, this seems like a worthwhile change.

I'm assuming we need to keep pycrptodome as a dep for other crypto functions? And that's why you didn't update pyproject.toml?

from Crypto.Util.asn1 import DerSequence

Happy to limit the change to replacing the keccak implementation though.

From @marioevz

Seems like this is only compatible with python 3.12+. Tagging @danceratopz for follow up.

I think it depends on the version of OPENSSL that your Python is linked against, or in the case of a uv-managed Python, bundled with (the [python-build-standalone] project, now also maintained by astral, (https://github.com/astral-sh/python-build-standalone) statically links these deps).

I did a quick test of uv-managed Pythons: Initially, this failed locally for me with 3.11, 3.12. Then I upgraded my uv-managed 3.11 and 3.12 versions and it worked across all of them.

E.g,

uv python upgrade 3.11
Installed Python 3.11.14 in 4.11s
 + cpython-3.11.14-linux-x86_64-gnu

uv run --python 3.11 python -c 'import ssl, sys, hashlib; print(sys.version, ssl.OPENSSL_VERSION); h=hashlib.new("keccak-256", b"hashme"); print(h.hexdigest())'
3.11.14 (main, Feb 12 2026, 00:42:50) [Clang 21.1.4 ] OpenSSL 3.5.5 27 Jan 2026
7f98885dc9cf152c0bb08eaf056668f99c47cabd8fe01b1276f9a305b1389646

@kevaundray
Copy link
Copy Markdown
Contributor Author

Thanks @kevaundray! More speed-up always good and as hashlib is in the standard library, this seems like a worthwhile change.

I'm assuming we need to keep pycrptodome as a dep for other crypto functions? And that's why you didn't update pyproject.toml?

from Crypto.Util.asn1 import DerSequence

Happy to limit the change to replacing the keccak implementation though.

From @marioevz

Seems like this is only compatible with python 3.12+. Tagging @danceratopz for follow up.

I think it depends on the version of OPENSSL that your Python is linked against, or in the case of a uv-managed Python, bundled with (the [python-build-standalone] project, now also maintained by astral, (https://github.com/astral-sh/python-build-standalone) statically links these deps).

I did a quick test of uv-managed Pythons: Initially, this failed locally for me with 3.11, 3.12. Then I upgraded my uv-managed 3.11 and 3.12 versions and it worked across all of them.

E.g,

uv python upgrade 3.11
Installed Python 3.11.14 in 4.11s
 + cpython-3.11.14-linux-x86_64-gnu

uv run --python 3.11 python -c 'import ssl, sys, hashlib; print(sys.version, ssl.OPENSSL_VERSION); h=hashlib.new("keccak-256", b"hashme"); print(h.hexdigest())'
3.11.14 (main, Feb 12 2026, 00:42:50) [Clang 21.1.4 ] OpenSSL 3.5.5 27 Jan 2026
7f98885dc9cf152c0bb08eaf056668f99c47cabd8fe01b1276f9a305b1389646

Happy to keep it as is, if it may give compilation errors :) (I'm not that familiar with uv-managed)

I'm assuming we need to keep pycrptodome as a dep for other crypto functions? And that's why you didn't update pyproject.toml?

Ah I was only focussed on keccak since that was where I was measuring (Title is misleading in that case)

@kevaundray kevaundray changed the title chore(crypto): replace pycryptodome with hashlib chore(crypto): replace pycryptodome with hashlib for keccak Mar 3, 2026
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for keccak-256 or keccak-512 to be disabled in OpenSSL? If so, should we do something like:

if "keccak-256" in hashlib.algorithms_available:
    ...
else:
    ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would lean towards not adding hashlib since it violates the "it just works" and KISS motto

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or atleast to my eyes, this part from Dan's message:

depends on the version of OPENSSL that your Python is linked against

Is a quirk that I think is not worth it, given we are in the python ecosystem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think optimistically using OpenSSL would be fine, especially if we're keeping pycrptodome around for other reasons.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's not much boilerplate code. But I wanted to see whether it was worth it, by checking how much of the performance gain would propagate to fill. TLDR: ~6.5% speed-up. Not huge but still worth it imo. More details in the updated PR description.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, hashlib.algorithms_available doesn't return "keccak-256" even when it is available, so I went for a try-except.

@github-actions
Copy link
Copy Markdown

This PR has had no recent activity and has been marked as stale.

@github-actions github-actions Bot added the stale The Issue/PR has not had any activity for 60 days. PRs will be automatically closed. label May 10, 2026
kevaundray and others added 3 commits May 11, 2026 11:38
Add tests that verify the active backend matches published Keccak-256
and Keccak-512 vectors, both backends produce byte-identical output, and
the pycryptodome fallback engages cleanly when `hashlib.new("keccak-256")`
is patched to raise. Also covers EEST `Bytes.keccak256` and
`trie.keccak256` routing.

Without the fallback dispatch these tests fail (or, on Pythons whose
hashlib lacks Keccak entirely, the test session aborts at collection),
so the suite drives the fix that follows.
OpenSSL only gained default-provider Keccak in 3.2.0 (Nov 2023). LTS
3.0.x and 3.1.x, still shipped by Debian 12 and RHEL 9, do not provide
it, so `hashlib.new("keccak-256", ...)` raises `ValueError` there and
every call to `keccak256` crashes.

At module import time, probe with `hashlib.new("keccak-256", b"")`. On
`ValueError`, rebind the digest helpers to pycryptodome's bundled
implementation. The probe is more reliable than checking
`hashlib.algorithms_available`, which omits some OpenSSL-provider
digests on python-build-standalone builds.

Route the EEST keccak helpers in `base_types.Bytes.keccak256` and
`test_types.trie.keccak256` through `ethereum.crypto.hash`, so the
dispatch lives in one place. Widen `keccak256` and `keccak512` parameter
types from `Bytes | bytearray` to `bytes | bytearray` so the EEST `Bytes`
subclass passes through without coercion.
Copy link
Copy Markdown
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me and think it's worth including. As we only use uv-managed Pythons in CI, this speed-up (~7%; see updated description for details) will be visible across all CI jobs that fill. I don't have a strong opinion though. I would just like to get resolution on this PR 🙂

@SamWilsn, could you please re-review the recent changes which added the fallback to pycryptodome you requested?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, hashlib.algorithms_available doesn't return "keccak-256" even when it is available, so I went for a try-except.

@danceratopz danceratopz requested a review from SamWilsn May 11, 2026 14:10
@danceratopz danceratopz added A-spec-specs Area: Specification—The Ethereum specification itself (eg. `src/ethereum/*`) A-test-tests Area: tests for packages/testing A-test-types Area: execution_testing.base_types and execution_testing.test_types C-refactor Category: refactor labels May 11, 2026
@danceratopz danceratopz changed the title chore(crypto): replace pycryptodome with hashlib for keccak refactor(spec-specs,test-types): replace pycryptodome with hashlib for keccak May 11, 2026
@danceratopz danceratopz marked this pull request as ready for review May 11, 2026 14:11
The previous `try/except` block defined `_keccak256_digest` and
`_keccak512_digest` inside its branches. `docc`'s static analyzer does
not traverse `try/except` at module scope when discovering module-level
names, so it could not resolve the references from `keccak256` and
`keccak512`, breaking the `Build Documentation` job with:

    docc.plugins.references.ReferenceError: in `src/ethereum/crypto/hash.py`,
    undefined identifier: `ethereum.crypto.hash._keccak256_digest`

Define the helpers unconditionally at module scope and branch on a
module-level `_USE_HASHLIB` flag inside their bodies. The probe runs
once at import; the runtime branch adds about 20ns per call, dwarfed by
the hash itself. `pycryptodome` is now imported unconditionally, which
is free since it is already a hard dep via `elliptic_curve.py`.

Update the dispatch tests to assert on `_USE_HASHLIB` instead of the
presence of `_pycryptodome_keccak` (now always imported).
@github-actions github-actions Bot removed the stale The Issue/PR has not had any activity for 60 days. PRs will be automatically closed. label May 13, 2026
Comment on lines +64 to +65
sys.modules.pop("ethereum.crypto.hash", None)
return importlib.import_module("ethereum.crypto.hash")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sys.modules.pop("ethereum.crypto.hash", None)
return importlib.import_module("ethereum.crypto.hash")
mod = importlib.import_module("ethereum.crypto.hash")
return importlib.reload(mod)

I think this would make me more comfortable?

Comment on lines +42 to +51
def _keccak256_digest(buffer: bytes | bytearray) -> bytes:
if _USE_HASHLIB:
return hashlib.new("keccak-256", buffer).digest()
return _pycryptodome_keccak.new(digest_bits=256).update(buffer).digest()


def _keccak512_digest(buffer: bytes | bytearray) -> bytes:
if _USE_HASHLIB:
return hashlib.new("keccak-512", buffer).digest()
return _pycryptodome_keccak.new(digest_bits=512).update(buffer).digest()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we invert these? For example:

if _USE_HASHLIB:
    def _keccak256_digest(buffer: bytes | bytearray) -> bytes:
        return hashlib.new("keccak-256", buffer).digest()
else:
    def _keccak256_digest(buffer: bytes | bytearray) -> bytes:
        return _pycryptodome_keccak.new(digest_bits=256).update(buffer).digest()

That way there's no conditional on every function call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-spec-specs Area: Specification—The Ethereum specification itself (eg. `src/ethereum/*`) A-test-tests Area: tests for packages/testing A-test-types Area: execution_testing.base_types and execution_testing.test_types C-refactor Category: refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants