Skip to content

incorrect account sequence errors in vald #321

@ggutoski

Description

@ggutoski

My testnet validator posts many log messages of the form:

2022-01-17T19:34:58Z INF backing off (retry in 6.950446643s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1623, got 1622: incorrect account sequence: invalid request module=vald

ie. sequence mismatch off by 1 for a couple hundred attempts now for hours. This behaviour persists even after restarting the node. (This puts the validator in missed-too-many-blocks status.)

More logs:

2022-01-17T20:04:58Z INF operator axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak sending heartbeat acknowledging keys: [avax-master-1 avax-master-genesis avax-secondary-1 avax-secondary-genesis eth-master-1 eth-master-genesis eth-secondary-1 eth-secondary-genesis ftm-master-1 ftm-master-genesis ftm-secondary-1 ftm-secondary-genesis moonbeam-master-1 moonbeam-master-genesis moonbeam-secondary-1 moonbeam-secondary-genesis poly-master-1 poly-master-genesis poly-secondary-1 poly-secondary-genesis] listener=tss module=vald
2022-01-17T20:04:58Z INF operator axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak sending heartbeat acknowledging keys: [avax-master-1 avax-master-genesis avax-secondary-1 avax-secondary-genesis eth-master-1 eth-master-genesis eth-secondary-1 eth-secondary-genesis ftm-master-1 ftm-master-genesis ftm-secondary-1 ftm-secondary-genesis moonbeam-master-1 moonbeam-master-genesis moonbeam-secondary-1 moonbeam-secondary-genesis poly-master-1 poly-master-genesis poly-secondary-1 poly-secondary-genesis] listener=tss module=vald
2022-01-17T20:05:08Z INF backing off (retry in 5s ): RPC error -32603 - Internal error: timed out waiting for tx to be included in a block module=vald
2022-01-17T20:05:08Z INF backing off (retry in 5s ): RPC error -32603 - Internal error: timed out waiting for tx to be included in a block module=vald
2022-01-17T20:05:16Z INF backing off (retry in 6.903285946s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
2022-01-17T20:05:16Z INF backing off (retry in 6.903285946s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald

...

2022-01-17T20:08:16Z ERR handler goroutine: failure to broadcast outgoing heartbeat msg: aborting after 10 retries: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
2022-01-17T20:08:16Z ERR handler goroutine: failure to broadcast outgoing heartbeat msg: aborting after 10 retries: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald

A while later I also see

2022-01-17T21:16:03Z ERR tx response with error: : tx already in mempool module=vald
2022-01-17T21:16:03Z ERR job panicked:runtime error: invalid memory address or nil pointer dereference module=vald
2022-01-17T21:16:03Z ERR tx response with error: : tx already in mempool module=vald
2022-01-17T21:16:03Z ERR job panicked:runtime error: invalid memory address or nil pointer dereference module=vald

@milapsheth suggests that perhaps there are two instances of vald running. My investigation supports this. A ps inside the vald container shows:

docker exec -it vald sh
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 {vald-entrypoint} /bin/sh /home/axelard/shared/vald-entrypoint.sh startValdProc
   30 root      0:00 {vald-entrypoint} /bin/sh /home/axelard/shared/vald-entrypoint.sh startValdProc
   33 root      0:20 axelard vald-start --tofnd-host tofnd --node http://axelar-core:26657 --validator-addr axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak
   44 root      0:00 sh
   51 root      0:00 ps

suggesting that startValdProc is being called twice.

Look at lines 341 and 344 of the following snippet:

docker run \
-d \
--name vald \
--network "${docker_network}" \
--user 0:0 \
--restart unless-stopped \
--env TOFND_HOST=tofnd \
--env HOME=/home/axelard \
--env VALIDATOR_HOST=http://axelar-core:26657 \
--env PRESTART_SCRIPT=/home/axelard/shared/init-vald.sh \
--env CONFIG_PATH=/home/axelard/shared/ \
--env SLEEP_TIME=2s \
--env VALIDATOR_ADDR="${validator}" \
--env RECOVERY_FILE=/home/axelard/.axelar/recovery.json \
--env KEYRING_PASSWORD="${KEYRING_PASSWORD}" \
--env AXELAR_MNEMONIC_PATH=/home/axelard/shared/broadcaster.txt \
--env AXELARD_CHAIN_ID="${chain_id}" \
--entrypoint /home/axelard/shared/vald-entrypoint.sh \
-v "${vald_directory}/:/home/axelard/.axelar" \
-v "${shared_directory}/:/home/axelard/shared" \
"axelarnet/axelar-core:${axelar_core_version}" startValdProc

344: run startValdProc
341: call vald-entrypoint.sh which does:

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions