My testnet validator posts many log messages of the form:
2022-01-17T19:34:58Z INF backing off (retry in 6.950446643s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1623, got 1622: incorrect account sequence: invalid request module=vald
ie. sequence mismatch off by 1 for a couple hundred attempts now for hours. This behaviour persists even after restarting the node. (This puts the validator in missed-too-many-blocks status.)
More logs:
2022-01-17T20:04:58Z INF operator axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak sending heartbeat acknowledging keys: [avax-master-1 avax-master-genesis avax-secondary-1 avax-secondary-genesis eth-master-1 eth-master-genesis eth-secondary-1 eth-secondary-genesis ftm-master-1 ftm-master-genesis ftm-secondary-1 ftm-secondary-genesis moonbeam-master-1 moonbeam-master-genesis moonbeam-secondary-1 moonbeam-secondary-genesis poly-master-1 poly-master-genesis poly-secondary-1 poly-secondary-genesis] listener=tss module=vald
2022-01-17T20:04:58Z INF operator axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak sending heartbeat acknowledging keys: [avax-master-1 avax-master-genesis avax-secondary-1 avax-secondary-genesis eth-master-1 eth-master-genesis eth-secondary-1 eth-secondary-genesis ftm-master-1 ftm-master-genesis ftm-secondary-1 ftm-secondary-genesis moonbeam-master-1 moonbeam-master-genesis moonbeam-secondary-1 moonbeam-secondary-genesis poly-master-1 poly-master-genesis poly-secondary-1 poly-secondary-genesis] listener=tss module=vald
2022-01-17T20:05:08Z INF backing off (retry in 5s ): RPC error -32603 - Internal error: timed out waiting for tx to be included in a block module=vald
2022-01-17T20:05:08Z INF backing off (retry in 5s ): RPC error -32603 - Internal error: timed out waiting for tx to be included in a block module=vald
2022-01-17T20:05:16Z INF backing off (retry in 6.903285946s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
2022-01-17T20:05:16Z INF backing off (retry in 6.903285946s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
...
2022-01-17T20:08:16Z ERR handler goroutine: failure to broadcast outgoing heartbeat msg: aborting after 10 retries: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
2022-01-17T20:08:16Z ERR handler goroutine: failure to broadcast outgoing heartbeat msg: aborting after 10 retries: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
A while later I also see
2022-01-17T21:16:03Z ERR tx response with error: : tx already in mempool module=vald
2022-01-17T21:16:03Z ERR job panicked:runtime error: invalid memory address or nil pointer dereference module=vald
2022-01-17T21:16:03Z ERR tx response with error: : tx already in mempool module=vald
2022-01-17T21:16:03Z ERR job panicked:runtime error: invalid memory address or nil pointer dereference module=vald
@milapsheth suggests that perhaps there are two instances of vald running. My investigation supports this. A ps inside the vald container shows:
docker exec -it vald sh
/ # ps
PID USER TIME COMMAND
1 root 0:00 {vald-entrypoint} /bin/sh /home/axelard/shared/vald-entrypoint.sh startValdProc
30 root 0:00 {vald-entrypoint} /bin/sh /home/axelard/shared/vald-entrypoint.sh startValdProc
33 root 0:20 axelard vald-start --tofnd-host tofnd --node http://axelar-core:26657 --validator-addr axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak
44 root 0:00 sh
51 root 0:00 ps
suggesting that startValdProc is being called twice.
Look at lines 341 and 344 of the following snippet:
|
docker run \ |
|
-d \ |
|
--name vald \ |
|
--network "${docker_network}" \ |
|
--user 0:0 \ |
|
--restart unless-stopped \ |
|
--env TOFND_HOST=tofnd \ |
|
--env HOME=/home/axelard \ |
|
--env VALIDATOR_HOST=http://axelar-core:26657 \ |
|
--env PRESTART_SCRIPT=/home/axelard/shared/init-vald.sh \ |
|
--env CONFIG_PATH=/home/axelard/shared/ \ |
|
--env SLEEP_TIME=2s \ |
|
--env VALIDATOR_ADDR="${validator}" \ |
|
--env RECOVERY_FILE=/home/axelard/.axelar/recovery.json \ |
|
--env KEYRING_PASSWORD="${KEYRING_PASSWORD}" \ |
|
--env AXELAR_MNEMONIC_PATH=/home/axelard/shared/broadcaster.txt \ |
|
--env AXELARD_CHAIN_ID="${chain_id}" \ |
|
--entrypoint /home/axelard/shared/vald-entrypoint.sh \ |
|
-v "${vald_directory}/:/home/axelard/.axelar" \ |
|
-v "${shared_directory}/:/home/axelard/shared" \ |
|
"axelarnet/axelar-core:${axelar_core_version}" startValdProc |
344: run startValdProc
341: call vald-entrypoint.sh which does:
My testnet validator posts many log messages of the form:
ie. sequence mismatch off by 1 for a couple hundred attempts now for hours. This behaviour persists even after restarting the node. (This puts the validator in
missed-too-many-blocksstatus.)More logs:
A while later I also see
@milapsheth suggests that perhaps there are two instances of vald running. My investigation supports this. A
psinside the vald container shows:suggesting that
startValdProcis being called twice.Look at lines 341 and 344 of the following snippet:
axelarate-community/scripts/validator-tools-docker.sh
Lines 324 to 344 in fddf430
344: run
startValdProc341: call
vald-entrypoint.shwhich does:axelarate-community/configuration/vald-entrypoint.sh
Line 67 in fddf430