Polygon mainnet: Erigon v3.3.7 intermittently falls behind by several thousand blocks #112

@sokiaobag

System information

Erigon version: v3.3.7

OS & Version: Ubuntu Server 22.04 LTS "Jammy Jellyfish"

Commit hash:

Erigon Command (with flags/config):

      - '--chain'
      - bor-mainnet
      - '--datadir'
      - /mnt/erigon/data
      - '--bor.heimdall'
      - http://127.0.0.1:1317/
      - '--port'
      - '30303'
      - '--http'
      - '--http.addr'
      - 0.0.0.0
      - '--http.port'
      - '8545'
      - '--http.api'
      - 'admin,eth,debug,net,trace,web3,erigon,bor'
      - '--http.vhosts'
      - '*'
      - '--http.corsdomain'
      - '*'
      - '--ws'
      - '--torrent.port'
      - '42069'
      - '--txpool.pricelimit'
      - '30000000000'
      - --bootnodes=enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303,enode://8729e0c825f3d9cad382555f3e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303,enode://07bc4cf87ff8f4e7dc51280991809940f26e846c944609ae4726309be73742a830040cd783989f6941e1b41c02405834bc6365059403a59ca9255ac695156235@34.89.75.187:30303,enode://2c3be2e637a68dc694498a44b6e0d57b5c762925ea97f941079a91f8a080b032fe2eb9e6c3230076e9fb046f626b5dcd3fb045dc9c194689a359aa7167ae0f6c@34.142.43.249:30303,enode://a0bc4dd2b59370d5a375a7ef9ac06cf531571005ae8b2ead2e9aaeb8205168919b169451fb0ef7061e0d80592e6ed0720f559bd1be1c4efb6e6c4381f1bdb986@35.246.99.203:30303,enode://f2b0d50e0b843d38ddcab59614f93065e2c82130100032f86ae193eb874505de12fcaf12502dfd88e339b817c0b374fa4b4f7c4d5a4d1aa04f29c503d95e0228@35.197.233.240:30303,enode://8a3f21c293c913a1148116a295aa69fdf41b9c5b0b0628d49be751aa8c025ae2ec1973d6d84cea8e2aba5541b5d76219dfaae41a124d42d0f56d4e1af50b74f8@35.246.95.65:30303,enode://f5cfe35f47ed928d5403aa28ee616fd64ed7daa527b5ae6a7bc412ca25eaad9b6bf2f776144fd9f8e7e9c80b5360a9c03b67f1d47ea88767def7d391cc7e0cd1@34.105.180.11:30303,enode://fc7624241515f9d5e599a396362c29de92b13a048ad361c90dd72286aa4cca835ba65e140a46ace70cc4dcb18472a476963750b3b69d958c5f546d48675880a8@34.147.169.102:30303,enode://7400e4bc70c56de26d5d240474a1b78af0bf8f0db567edfa851c9724ed697ca7692a92483369e9633d4342a036d10223958007160765d0317a1073f86f2a80c8@34.89.55.74:30303
      - '--torrent.upload.rate=1024mb'
      - '--torrent.download.rate=1024mb'
      - '--batchSize'
      - 512MB
      - '--etl.bufferSize'
      - 512MB
      - '--bodies.cache'
      - '8GB'
      - '--verbosity'
      - info
      - '--log.console.verbosity'
      - info
      - '--db.size.limit=16TB'
      - '--db.pagesize=8KB'
      - '--db.read.concurrency=512'
      - '--rpc.batch.concurrency=512'
      - '--pprof'
      - '--prune.mode=archive'
      - '--discovery.dns=enrtree://AKUEZKN7PSKVNR65FZDHECMKOJQSGPARGTPPBI7WS2VUL4EGR6XPC@pos.polygon-peers.io'

Consensus Layer: heimdall-v2@v0.5.6

Consensus Layer Command (with flags/config):

      - start
      - '--chain=mainnet'
      - '--moniker'
      - [DUMMY]
      - '--p2p.seeds'
      - 'a0ef6f328949adc077c59ab1f6b03711ae8d32d2@34.185.209.56:26656,f1e632758dfaf616a833900c0b8845bb2547b7c2@34.185.162.14:26656,e49bb5d9cb22943fb2b9f49a4c5d0f773917efaf@34.179.171.228:26656,babb8151d6fae45fcbb9229bd9faba173f3feaf3@35.246.166.189:26656,9c92984a5aad02c43955da94bb0a979a8dadbcfe@34.142.28.190:26656,3643aeae6a5965053709303e97257f62012fdd9c@34.39.56.114:26656,830d44b0d11ab25c9a03135859049d55daf73a03@34.147.169.102:26656,b0e795afc432ea3557b377d7763f6fb6dd102e60@34.105.180.11:26656'
      - '--rpc.laddr'
      - 'tcp://0.0.0.0:26657'

Chain/Network: Polygon Mainnet

Expected behaviour

The node should stay in sync with the Polygon mainnet continuously under normal load.

Actual behaviour

About once per day, the node falls behind by several thousand blocks. When this happens, recovery takes a few hours. Outside of these periods, the node operates normally.
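
To pin down exactly when the node starts falling behind, the lag could be sampled periodically. The following is a minimal sketch, not part of the original setup: it assumes the node's HTTP RPC is reachable at the port configured above (8545) and that a second, trusted Polygon RPC endpoint is available as a reference. The function names and the reference endpoint are hypothetical; only the standard `eth_blockNumber` JSON-RPC method is used.

```python
import json
import urllib.request


def parse_block_number(response: dict) -> int:
    """Decode the hex quantity returned by eth_blockNumber."""
    return int(response["result"], 16)


def block_number(rpc_url: str, timeout: float = 5.0) -> int:
    """Ask a JSON-RPC endpoint for its latest block number."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_block_number(json.load(resp))


def lag(local_height: int, reference_height: int) -> int:
    """Blocks the local node is behind the reference (never negative)."""
    return max(0, reference_height - local_height)


def check(local_url: str, reference_url: str, threshold: int = 1000) -> bool:
    """Return True when the local node is more than `threshold` blocks behind."""
    return lag(block_number(local_url), block_number(reference_url)) > threshold
```

For example, `check("http://127.0.0.1:8545", REFERENCE_RPC)` could be run from cron every minute, where `REFERENCE_RPC` is a placeholder for whichever reference endpoint is trusted. Logging the timestamp of the first `True` result would make it easier to correlate the start of each incident with the memory and log data below.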

The server specs are:

  • CPU: 16 cores / 32 threads (4.2 GHz base / 5.7 GHz boost)
  • RAM: 128 GB
  • Deployment: Bare metal

When the node falls behind, logs like the following are observed.

[INFO] [01-26|17:41:59.342] [backward-block-downloader] downloading initial header from all peers hash=0x433602a8d7929052df5c798ce35f160937f84c7d902e51643a9de0e6de1f8821
[INFO] [01-26|17:41:59.366] [backward-block-downloader] downloading header chain backward from initial header num=82164558 hash=0x433602a8d7929052df5c798ce35f160937f84c7d902e51643a9de0e6de1f8821
[INFO] [01-26|17:41:59.882] [backward-block-downloader] downloading initial header from all peers hash=0x8a4837a53ea3b22690af3aef58b974cd91e3c31924d73634b09aae42e2914559
[INFO] [01-26|17:41:59.906] [backward-block-downloader] downloading header chain backward from initial header num=82164559 hash=0x8a4837a53ea3b22690af3aef58b974cd91e3c31924d73634b09aae42e2914559
[INFO] [01-26|17:42:00.010] [backward-block-downloader] fetching headers backward periodic progress num=82163715 hash=0x8c695511e0ab41a0d194b8b09056711c9e0803219d810cf2fd465790edd0b50e amount=256 peerId=f0f899fd7ee954e8977183c30705a9753b7a18ea14edf980c4aade35196e4353611d1f257bd03ab486a04090037be31c17b6ff40790fb5c3f2d15dce4ce6f1bd
[INFO] [01-26|17:42:00.533] [backward-block-downloader] fetching headers backward periodic progress num=82163716 hash=0x804b3bc3f581943c5beb79086e195642235110858a7674389bc03037ce696f72 amount=256 peerId=f0f899fd7ee954e8977183c30705a9753b7a18ea14edf980c4aade35196e4353611d1f257bd03ab486a04090037be31c17b6ff40790fb5c3f2d15dce4ce6f1bd
[INFO] [01-26|17:42:01.010] [backward-block-downloader] downloading initial header from all peers hash=0x78d73e829dab9a5d2b6bd2f87c99310e351fdee42123af51b9e7a421d01593d9
[INFO] [01-26|17:42:01.035] [backward-block-downloader] downloading header chain backward from initial header num=82164560 hash=0x78d73e829dab9a5d2b6bd2f87c99310e351fdee42123af51b9e7a421d01593d9
[INFO] [01-26|17:42:01.037] [backward-block-downloader] fetching headers backward periodic progress num=82163717 hash=0x280b229d13c2c8ebca111f5db1b412026c876261bf280f6b50c3bef56a058dee amount=256 peerId=f0f899fd7ee954e8977183c30705a9753b7a18ea14edf980c4aade35196e4353611d1f257bd03ab486a04090037be31c17b6ff40790fb5c3f2d15dce4ce6f1bd
[INFO] [01-26|17:42:01.195] [bor.heimdall] scraper progress          name=milestones rangeStart=6481713 rangeEnd=6481713 priorLastKnownId=6481712 newLast=6481713
[INFO] [01-26|17:42:01.600] [backward-block-downloader] downloading initial header from all peers hash=0xc8efe6261d570e235cd8d9d1f95a740c3acbe30792fd1068d4a9b537a864e77a
[INFO] [01-26|17:42:01.625] [backward-block-downloader] downloading header chain backward from initial header num=82164561 hash=0xc8efe6261d570e235cd8d9d1f95a740c3acbe30792fd1068d4a9b537a864e77a
[INFO] [01-26|17:42:01.807] [backward-block-downloader] downloading initial header from all peers hash=0x4576c46a853b7b77fc857300a37b51efbd5a04713c3b1a9d0d51f9cdf5251789
[INFO] [01-26|17:42:01.832] [backward-block-downloader] downloading header chain backward from initial header num=82164562 hash=0x4576c46a853b7b77fc857300a37b51efbd5a04713c3b1a9d0d51f9cdf5251789
[INFO] [01-26|17:42:02.701] [backward-block-downloader] fetching headers backward periodic progress num=82163720 hash=0x4e7c1962400b67347a33ec2a328bc18665a19e686a413813014a043d78f50565 amount=256 peerId=f0f899fd7ee954e8977183c30705a9753b7a18ea14edf980c4aade35196e4353611d1f257bd03ab486a04090037be31c17b6ff40790fb5c3f2d15dce4ce6f1bd
[INFO] [01-26|17:42:02.959] [backward-block-downloader] starting forward downloading of blocks count=992 fromNum=82163419 fromHash=0x25629c0476122578dbdd50c0ffaa6415a6ba1ed8ac244a8a47d98177279c7799 toNum=82164410 toHash=0x27f447d6105d4a54dbf3b199dd8a184ff77a470ce2561ab08b9b0fa9219a40d4
[INFO] [01-26|17:42:03.070] [backward-block-downloader] fetching headers backward periodic progress num=82163721 hash=0x92122c93f617e55d3154f48226a17c66410840e6a47698467d5bc1d8ca9b9a28 amount=256 peerId=f0f899fd7ee954e8977183c30705a9753b7a18ea14edf980c4aade35196e4353611d1f257bd03ab486a04090037be31c17b6ff40790fb5c3f2d15dce4ce6f1bd
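
To quantify the gap visible in these lines, the `num=`, `fromNum=`, `toNum=`, and `count=` fields can be extracted with a small script. This is a rough sketch that relies only on the message text shown above. The interpretation of the fields (newest announced header versus current backward-fetch position) is inferred from the wording of the log messages, not from the Erigon source, and the function name is hypothetical.

```python
import re

# "num=" is matched case-sensitively, so "fromNum=" and "toNum=" are not picked up.
NUM_RE = re.compile(r"\bnum=(\d+)")
FORWARD_RE = re.compile(r"count=(\d+) fromNum=(\d+) .*?toNum=(\d+)")


def summarize(lines):
    """Collect the highest initial header, the latest backward-progress
    position, and the latest forward-download range from log lines."""
    tip = None        # highest "initial header" block number seen
    backward = None   # last "periodic progress" block number seen
    forward = None    # (count, fromNum, toNum) of the last forward download
    for line in lines:
        if "[backward-block-downloader]" not in line:
            continue
        if "downloading header chain backward from initial header" in line:
            match = NUM_RE.search(line)
            if match:
                num = int(match.group(1))
                tip = num if tip is None else max(tip, num)
        elif "fetching headers backward periodic progress" in line:
            match = NUM_RE.search(line)
            if match:
                backward = int(match.group(1))
        elif "starting forward downloading of blocks" in line:
            match = FORWARD_RE.search(line)
            if match:
                forward = tuple(int(g) for g in match.groups())
    gap = tip - backward if tip is not None and backward is not None else None
    return {"tip": tip, "backward": backward, "gap": gap, "forward": forward}
```

Applied to the excerpt above, this reports a highest initial header of 82164562 and a last backward-progress position of 82163721, a difference of 841 blocks, alongside a forward download of 992 blocks (82163419 to 82164410). Running it over the full log of an incident would show whether the gap grows steadily or jumps.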

Additionally, during these events, the polygon_erigon container’s memory usage reaches approximately 106 GiB out of a 108 GiB limit, suggesting that memory pressure may be involved.

CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O         PIDS
5c190eb6e568   polygon_erigon       182.96%   106.2GiB / 108GiB   98.38%    0B / 0B           15.6TB / 17.9TB   139
b48d12a84769   polygon_heimdall     21.66%    7.248GiB / 48GiB    15.10%    0B / 0B           754GB / 8.45TB    15
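
To check whether memory usage approaches the limit before each incident rather than only during it, `docker stats --no-stream` output could be sampled and parsed. The sketch below assumes the default column layout shown above; the function names and the 95% threshold are illustrative choices, not taken from the original setup.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# container id, name, CPU %, "used / limit" memory
ROW_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+([\d.]+)%\s+"
    r"([\d.]+)([A-Za-z]+)\s*/\s*([\d.]+)([A-Za-z]+)"
)


def parse_row(line):
    """Parse one `docker stats` row; return None for headers or unknown units."""
    match = ROW_RE.match(line.strip())
    if not match:
        return None
    _cid, name, cpu, used, used_unit, limit, limit_unit = match.groups()
    if used_unit not in UNITS or limit_unit not in UNITS:
        return None
    used_bytes = float(used) * UNITS[used_unit]
    limit_bytes = float(limit) * UNITS[limit_unit]
    return {
        "name": name,
        "cpu_percent": float(cpu),
        "mem_ratio": used_bytes / limit_bytes,
    }


def near_limit(lines, threshold=0.95):
    """Names of containers whose memory usage exceeds `threshold` of the limit."""
    rows = (parse_row(line) for line in lines)
    return [row["name"] for row in rows if row and row["mem_ratio"] > threshold]
```

With the two rows above, only `polygon_erigon` exceeds the threshold (about 98% of its limit), while `polygon_heimdall` sits at roughly 15%. Recording these samples with timestamps would show whether the memory climb precedes the sync lag or follows it.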

Steps to reproduce the behaviour

The issue occurs intermittently during normal operation (approximately once per day).
No specific manual steps are required to trigger it.

Backtrace

[backtrace]
