
DHIS2 server unresponsive after heavy concurrent requests

Arnau Sanchez edited this page Aug 11, 2017 · 2 revisions

Description

The problem appears when many concurrent requests hit an endpoint that stresses the database. For example, this ab (ApacheBench) command, which sends 50 concurrent requests, leaves the server unresponsive:

$ ab -q -s 9999 -A admin:district -n 50 -c 50 -m GET \
  https://play.dhis2.org/android-previous1/api/reportTables/xIWpSo5jjT1/data.html

Preliminary analysis

The problem needs further investigation. As a starting point, some notes:

  • Neither the Tomcat nor the DHIS2 logs contain errors that hint at the cause.
  • When the server is unresponsive, there are as many "postgres: dhis previous1 127.0.0.1(33822) idle in transaction" processes as simultaneous requests. This may be a sign of a deadlock in transactions opened by the DHIS2/Hibernate code (caveat: it may also be the consequence of an error somewhere else).

Testing snippet:

watch '
  echo -n "pg_locks: "
  echo "select * from pg_locks;" | sudo -u postgres psql -t previous1  | grep "." | wc -l
  echo -n "idle in transaction: "
  ps awx | grep "[i]dle in transaction" | wc -l
'
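The ps-based count above only says how many backends are idle in transaction, not what they last ran or how long they have been open. A minimal sketch of a complementary check, assuming PostgreSQL 9.2 or later (where pg_stat_activity exposes the state, state_change and query columns); the function names idle_tx_sql and show_idle_tx are hypothetical helpers, not part of DHIS2:

```shell
# Hypothetical helper: prints the SQL that lists sessions stuck "idle in transaction".
# Assumes PostgreSQL >= 9.2 (pg_stat_activity.state, state_change, query).
idle_tx_sql() {
  cat <<'SQL'
SELECT pid,
       now() - xact_start   AS xact_age,
       now() - state_change AS idle_for,
       left(query, 80)      AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
SQL
}

# Hypothetical helper: runs the query against a database
# (defaults to previous1, the database used in the snippet above).
show_idle_tx() {
  idle_tx_sql | sudo -u postgres psql "${1:-previous1}"
}
```

Calling show_idle_tx while the server is frozen should print one row per stuck session. The last_query column shows the last statement each session sent before going idle, which may help to locate the code path that opened the transaction and never finished it.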

When idle:

pg_locks: 2
idle in transaction: 0

With 40 concurrent requests from ab, the server freezes and the snippet reports:

pg_locks: 4102
idle in transaction: 40
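The raw total of 4102 does not tell which kinds of locks are held or whether any lock request is actually waiting. A minimal sketch that groups pg_locks by type, mode and granted status; the function names lock_summary_sql and show_lock_summary are hypothetical helpers, and the default database name is taken from the snippet above:

```shell
# Hypothetical helper: prints the SQL that summarizes pg_locks.
# Rows with granted = f are lock requests that are still waiting.
lock_summary_sql() {
  cat <<'SQL'
SELECT locktype, mode, granted, count(*) AS n
FROM pg_locks
GROUP BY locktype, mode, granted
ORDER BY n DESC;
SQL
}

# Hypothetical helper: runs the summary against a database (default: previous1).
# Note that pg_locks covers the whole cluster, not only the connected database.
show_lock_summary() {
  lock_summary_sql | sudo -u postgres psql "${1:-previous1}"
}
```

If show_lock_summary reports no rows with granted = f while the server is frozen, the sessions are probably not blocked on each other's database locks, which would point at the "error somewhere else" caveat (for instance, the application side holding transactions open) rather than at a deadlock inside PostgreSQL. This is only a hint, not a conclusion.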

Maybe relevant:

https://stackoverflow.com/questions/32255557/postgresql-hang-forever-on-serializable-transaction

https://dba.stackexchange.com/questions/118922/select-1-idle-in-transaction
