Bug Report: Hardcoded 4096-byte STATUS_QUERY_MAX_SIZE_BYTES rejects legitimate gateway /status polling (1.8.1+)
Summary
The /status endpoint in indexer-service-rs has a hardcoded 4096-byte body size limit (STATUS_QUERY_MAX_SIZE_BYTES) that rejects the indexingProgress query sent by Graph protocol gateways when the indexer has more than ~75 active allocations. The gateway batches deployment IDs into groups of 100, but 100 deployment IDs produce a ~5,275-byte body that exceeds the limit. This results in HTTP 400 errors for the majority of status polling requests.
We have observed 394,716 rejected requests over 61.5 hours (~107/min) from multiple Graph gateway IPs.
Environment
- indexer-service-rs: latest release (Rust rewrite)
- Active allocations: ~825
- Affected endpoint:
POST /status
Source IPs (Graph Gateway Infrastructure)
All rejected requests originate from known Graph protocol gateway infrastructure across multiple cloud providers and regions:
| Source IP | Cloud Provider | Region | Rejected Requests |
| --- | --- | --- | --- |
| 34.185.191.203 | GCP | US | 49,158 |
| 116.202.192.158 | Hetzner | DE (Germany) | 47,829 |
| 34.116.224.212 | GCP | PL (Poland) | 46,848 |
| 35.221.213.148 | GCP | TW (Taiwan) | 40,191 |
| 35.226.217.37 | GCP | US | 36,487 |
| 34.106.165.124 | GCP | US | 29,090 |
| 35.245.28.4 | GCP | US | 28,330 |
| 34.106.30.165 | GCP | US | 27,424 |
| 35.221.97.60 | GCP | TW (Taiwan) | 25,040 |
| 34.86.113.72 | GCP | US | 23,600 |
| 136.111.129.249 | GCP | US | 21,256 |
| 35.200.111.181 | GCP | TW (Taiwan) | 17,400 |
| (+ 5 others) | | | 2,063 |
| Total | | | 394,716 |
Observation window: 2026-02-06T00:30:13Z to 2026-02-08T13:59:26Z (61.5 hours).
The Rejected Query (captured via tcpdump)
The gateway sends the following indexingProgress query to poll subgraph sync status. The query template is small (~210 bytes of GraphQL), but it includes all deployment IDs for the current batch in the $deployments variable:
```json
{
  "query": "\n        query indexingProgress($deployments: [String!]!) {\n            indexingStatuses(subgraphs: $deployments) {\n                subgraph\n                chains {\n                    network\n                    latestBlock { number }\n                    earliestBlock { number }\n                }\n            }\n        }",
  "variables": {
    "deployments": [
      "QmXZQLa1fVsZyTU1asb4NaHy1WJxDMpTGSP5RmZcdm379u",
      "QmdCKcx2br3W7XbcMjLwwfXRZxrgL6WvRsqLnmAh4qkGGH",
      "... (100 deployment IDs total)"
    ]
  }
}
```
Total body size: 5,275 bytes (exceeds the 4,096-byte limit by 1,179 bytes).
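The size math is easy to reproduce offline. A minimal sketch with placeholder deployment IDs (not the captured ones) and a compacted query template; the captured body carries extra whitespace, so the real request is slightly larger:

```python
import json

# Placeholder 46-character deployment IDs (real CIDv0 hashes are the same length).
deployments = ["Qm" + "x" * 44 for _ in range(100)]

# Compact form of the gateway's indexingProgress query.
query = (
    "query indexingProgress($deployments: [String!]!) { "
    "indexingStatuses(subgraphs: $deployments) { subgraph chains { "
    "network latestBlock { number } earliestBlock { number } } } }"
)

body = json.dumps({"query": query, "variables": {"deployments": deployments}})
print(len(body))  # > 4096 — over the STATUS_QUERY_MAX_SIZE_BYTES limit
```

Even with all formatting whitespace stripped, 100 IDs cannot fit under 4,096 bytes.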
Batching Behavior
The gateway splits deployment IDs into batches of 100. With ~825 allocations:
| Batch | Deployments | Body Size | HTTP Status | Result |
| --- | --- | --- | --- | --- |
| 1-8 | 100 each | 5,275 bytes | 400 | Rejected |
| 9 (remainder) | 25 | 1,600 bytes | 200 | Success |
| Version check | 0 | 35 bytes | 200 | Success |
8 out of 9 batches are rejected per polling cycle. The gateway only receives indexing status for 25 out of 825 subgraphs.
Root Cause in Code
The limit is a hardcoded constant, not configurable via TOML config or environment variables:
crates/service/src/constants.rs:

```rust
/// 4KB is generous for legitimate status queries, which are typically
/// under 500 bytes. Complex queries with many fields rarely exceed 2KB.
pub const STATUS_QUERY_MAX_SIZE_BYTES: usize = 4096;
```
crates/service/src/routes/status.rs:

```rust
pub async fn status(
    State(state): State<GraphNodeState>,
    body: Bytes,
) -> Result<impl IntoResponse, SubgraphServiceError> {
    if body.len() > STATUS_QUERY_MAX_SIZE_BYTES {
        return Err(SubgraphServiceError::InvalidStatusQuery(anyhow::anyhow!(
            "Query exceeds maximum size of {} bytes",
            STATUS_QUERY_MAX_SIZE_BYTES
        )));
    }
    // ...
}
```
The check is on the raw HTTP body (JSON envelope + query + all variables), not just the GraphQL query string.
Note: the existing max_request_body_size config option (default 2MB) only applies to the main /subgraphs/id/{id} query endpoint, not to the /status route.
Why the Limit Is Too Low
- Each IPFS CID (deployment ID) is 46 characters, plus ~3 bytes of JSON encoding overhead per array element (two quotes and a comma) = ~49 bytes per ID
- Query template overhead: ~380 bytes
- Maximum deployments per batch at 4,096 bytes: (4096 - 380) / 49 = ~75 deployments
- The gateway sends 100 per batch, producing 5,275 bytes
- Any indexer with more than ~75 allocations will hit this on every status poll
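Plugging in the estimates above (the 380-byte template overhead is this report's estimate from the capture, not a measured constant):

```python
CID_LEN = 46      # IPFS CIDv0 is 46 characters
JSON_PER_ID = 3   # two quotes + comma per array element
TEMPLATE = 380    # JSON envelope + query text (estimate)

def max_ids(limit_bytes):
    """How many deployment IDs fit in a /status body under a given limit."""
    return (limit_bytes - TEMPLATE) // (CID_LEN + JSON_PER_ID)

print(max_ids(4096))  # 75  — current hardcoded limit
print(max_ids(8192))  # 159 — the proposed 8 KiB default
```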
The comment in the code ("typically under 500 bytes") reflects a status query without the $deployments variable (e.g., { version { version } } which is 35 bytes). The indexingProgress query with deployment IDs is a legitimate and expected gateway query pattern.
Secondary Issue: costModels Batch Limit
During investigation, a related issue was also observed. The gateway sends a costModels query with all 825 deployment IDs in a single request, which hits a separate batch limit:
{"data":null,"errors":[{"message":"Batch size 825 exceeds maximum allowed (200)","locations":[{"line":3,"column":17}],"path":["costModels"]}]}
This limit (max_cost_model_batch_size) is configurable in the TOML config (default: 200), but the gateway does not batch this query — it sends all deployment IDs at once.
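A gateway-side fix would be to split the request to respect the limit. An illustrative chunking sketch (not gateway code; the helper name is made up):

```python
def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 825 deployment IDs split at the default max_cost_model_batch_size of 200
ids = [f"deployment-{i}" for i in range(825)]
batches = chunked(ids, 200)
print([len(b) for b in batches])  # [200, 200, 200, 200, 25] — 5 requests
```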
Suggested Fix
- Make STATUS_QUERY_MAX_SIZE_BYTES configurable via the [service] TOML config section (e.g., status_query_max_size_bytes), similar to how max_request_body_size is configurable for the query endpoint.
- Increase the default to something that accommodates the gateway's batch size of 100 deployment IDs. A value of 8,192 bytes (8KB) would support up to ~159 deployments per batch, giving comfortable headroom. Alternatively, align it with max_request_body_size since the same DoS protections (rate limiting, authentication) apply to the /status route.
- Consider the costModels batching gap: the gateway sends all deployment IDs in a single costModels request without respecting max_cost_model_batch_size. Either the gateway needs to batch, or the default needs to be higher.
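If the limit becomes configurable, the config could take the following shape. This is a sketch of the suggestion above; the key status_query_max_size_bytes does not exist in any current release:

```toml
[service]
# Hypothetical option (proposed in this report, not yet implemented).
# 8 KiB accommodates the gateway's 100-ID batches with headroom.
status_query_max_size_bytes = 8192
```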
Impact
- Gateways cannot determine indexing progress for the vast majority of allocated subgraphs
- This likely affects query routing decisions, as the gateway has incomplete sync status data
- Every indexer with >75 allocations is affected
- The error rate scales with the number of allocations and the number of gateway nodes polling
How This Was Captured
Traffic was captured on the reverse proxy host using tcpdump on the plain HTTP segment between Traefik and the upstream indexer-service (port 7610), then reassembled with tshark:
```shell
tcpdump -i enp6s18 -s 0 -w capture.pcap 'port 7610'
tshark -r capture.pcap -qz "follow,tcp,ascii,<stream_id>"
```