[kafka-connect]: Add troubleshooting entry on handling large connection counts #6158
kurnoolsaketh wants to merge 2 commits into main from
Conversation
High insertion frequency may result in many open connections to your database. This is common in large task count or distributed connector deployments where the insert rate to a single ClickHouse instance is high. In ClickHouse Cloud, a common symptom of this issue is requests being rate limited by the cloud proxy/load balancer.

Some strategies to reduce the number of open connections are:
1. Adjust the connection pool settings on the Java client (note that these may reduce overall throughput):
@chernser are Java client options configurable on the connector?
Yes, they are configured via `jdbcConnectionProperties`; this is similar to a JDBC URL. However, if the configuration contains an unknown property, it may be treated as a ClickHouse setting for V1.
transforms.keyToValue.field=_key
```
#### "There are too many open connections to my ClickHouse instance" {#too-many-open-connections}

I would use a shorter name, e.g. "Too many DB connections".
```

#### "There are too many open connections to my ClickHouse instance" {#too-many-open-connections}
High insertion frequency may result in many open connections to your database. This is common in large task count or distributed connector deployments where the insert rate to a single ClickHouse instance is high. In ClickHouse Cloud, a common symptom of this issue is requests being rate limited by the cloud proxy/load balancer.
This needs a careful explanation: there may be many records and that is fine, and there can be many small batches. The current text states the problem far too broadly and does not explain why it happens.
It may be worth structuring the entry like this:
<Description>
Symptoms on prem:
Symptoms on cloud:
Metrics to check:
<What configuration to change, how to troubleshoot>
- `max_open_connections`: defaults to 10. Reducing this will bound the number of open connections per task.
- `connection_ttl`: defaults to -1 (no TTL). Setting this to >0 will eagerly reclaim connections after the TTL expires.
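Per the reviewer note that Java client options are passed through the connector's `jdbcConnectionProperties` setting (similar to a JDBC URL), the two pool settings above might be applied like this — a sketch only; the query-string form and the values shown are assumptions to verify against the connector docs:

```properties
# Hypothetical sketch: tighten the Java client connection pool via the
# connector's jdbcConnectionProperties (query-string style). The setting
# names come from the entry above; exact syntax and values are assumptions.
jdbcConnectionProperties=?max_open_connections=5&connection_ttl=30000
```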
See the [Java client Connection & Endpoints configuration tab](https://clickhouse.com/docs/integrations/language-clients/java/client#configuration) for more details.

The Java client is not configured via an endpoint.
Also, this troubleshooting guide assumes the user already knows how to configure the client in the connector — if that is not explained earlier, it should be fixed.
2. Increase `bufferCount`: in high data volume/throughput deployments, this will increase the number of records buffered between inserts and reduce the frequency of insert queries sent to your database. This will reduce the number of new connections needed to write to your database.
Is it really about data volume and throughput?
It is not — before this, the sink was handling big batches just fine.
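For context, the `bufferCount` adjustment the entry describes would be a single connector property — a sketch only; the value below is illustrative, not a recommendation, and the default should be confirmed in the connector docs:

```properties
# Illustrative sketch: buffer more records between inserts so that fewer,
# larger insert queries (and thus fewer new connections) are issued.
# The value is an assumption for illustration.
bufferCount=100000
```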
Summary
Adds an entry addressing an edge case in which high data volume connector deployments may result in a large number of open connections to ClickHouse.