-
Notifications
You must be signed in to change notification settings - Fork 143
Description
Occasionally we get HTTP-TPC pull transfer failures
socket timeout on GET (received 1611825152 B (1.50 GiB) of data; 0 B pending): Read timed out
This is usually somehow related to the source (same error for multiple transfers), but it is not tight to one specific transfer source implementation (dCache, StoRM, XRootD). It seems to me that we have more these errors for the long-distance transfers (e.g. EU<->US), so links with higher latency / may be packet loss(?).
Would it be possible to improve this error message and signalize clearly what exactly failed? 0 B pending looks seems suspicious...
Also as was already written in th GGUS ticket, it is not completely clear to me:
- which side (source vs. destination) is responsible for “stuck transfer” at the end of dCache active party HTTP GET / why dCache waits when all data were transferred
- why socket timeout kicks in after “random” time (I would expect ~ 200s based on dCache GET_RETRY_DURATION code)
e.g. FTS transfer from AGLT2 to DESY-HH that produced following log
INFO Mon, 19 Jan 2026 01:26:39 +0100; Davix: Hop: https://dcache-atlas-webdav-wan.desy.de:2880/pnfs/desy.de/atlas/dq2/atlasdatadisk/rucio/mc20_13TeV/ec/e5/AOD.35308546._000142.pool.root.1
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix:
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > COPY /pnfs/desy.de/atlas/dq2/atlasdatadisk/rucio/mc20_13TeV/ec/e5/AOD.35308546._000142.pool.root.1 HTTP/1.1
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > Host: dcache-atlas-webdav-wan.desy.de:2880
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > Accept: */*
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > Source: https://webdav.aglt2.org:2880/atlasdatadisk/rucio/mc20_13TeV/ec/e5/AOD.35308546._000142.pool.root.1?<redacted>
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > X-Number-Of-Streams: 1
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > Secure-Redirection: 1
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > ClientInfo: job-id=807c5e22-f4cd-11f0-932c-fa163e52bca9;file-id=9270280509;retry=0
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > TransferHeaderAuthorization: XXXXX
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > Credential: none
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > Authorization: XXXXX
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > RequireChecksumVerification: false
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: > User-Agent: libdavix/0.8.10 libcurl/7.76.1
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix:
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: < HTTP/1.1 202 Accepted
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: < Date: Mon, 19 Jan 2026 00:26:40 GMT
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: < Server: dCache/10.2.10
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: < Content-Type: text/perf-marker-stream
INFO Mon, 19 Jan 2026 01:26:40 +0100; Davix: < Transfer-Encoding: chunked
INFO Mon, 19 Jan 2026 01:26:50 +0100; [1768782410084] DEST http_plugin IPV6 TRUE
INFO Mon, 19 Jan 2026 01:26:50 +0100; Gfal2: Event triggered: DESTINATION http_plugin IPV6 TRUE
INFO Mon, 19 Jan 2026 01:26:50 +0100; bytes: 3473408, avg KiB/sec:0, inst KiB/sec:0, elapsed sec:0
INFO Mon, 19 Jan 2026 01:26:50 +0100; Davix: PerformanceMarker:
Perf Marker
Timestamp: 1768782405
State: Running
State description: Mover created
Stripe Index: 0
Stripe Start Time: 1768782400
Stripe Last Transferred: 1768782405
Stripe Transfer Time: 4
Stripe Bytes Transferred: 3473408
Stripe Status: RUNNING
Total Stripe Count: 1
RemoteConnections: tcp:[2001:48a8:68f7:8001:192:41:236:73]:2880
End
...
INFO Mon, 19 Jan 2026 02:06:05 +0100; bytes: 1611825152, avg KiB/sec:666.969, inst KiB/sec:73.6025, elapsed sec:2360
INFO Mon, 19 Jan 2026 02:06:05 +0100; Davix: PerformanceMarker:
Perf Marker
Timestamp: 1768784765
State: Running
State description: Mover created
Stripe Index: 0
Stripe Start Time: 1768782400
Stripe Last Transferred: 1768784761
Stripe Transfer Time: 2364
Stripe Bytes Transferred: 1611825152
Stripe Status: RUNNING
Total Stripe Count: 1
RemoteConnections: tcp:[2001:48a8:68f7:8001:192:41:236:73]:2880
End
INFO Mon, 19 Jan 2026 02:06:10 +0100; bytes: 1611825152, avg KiB/sec:665.559, inst KiB/sec:0, elapsed sec:2365
INFO Mon, 19 Jan 2026 02:06:10 +0100; Davix: PerformanceMarker:
Perf Marker
Timestamp: 1768784770
State: Running
State description: Mover created
Stripe Index: 0
Stripe Start Time: 1768782400
Stripe Last Transferred: 1768784761
Stripe Transfer Time: 2369
Stripe Bytes Transferred: 1611825152
Stripe Status: RUNNING
Total Stripe Count: 1
RemoteConnections: tcp:[2001:48a8:68f7:8001:192:41:236:73]:2880
End
...
INFO Mon, 19 Jan 2026 02:19:15 +0100; bytes: 1611825152, avg KiB/sec:499.697, inst KiB/sec:0, elapsed sec:3150
INFO Mon, 19 Jan 2026 02:19:15 +0100; Davix: PerformanceMarker:
Perf Marker
Timestamp: 1768785555
State: Running
State description: Mover created
Stripe Index: 0
Stripe Start Time: 1768782400
Stripe Last Transferred: 1768784761
Stripe Transfer Time: 3154
Stripe Bytes Transferred: 1611825152
Stripe Status: RUNNING
Total Stripe Count: 1
RemoteConnections:
End
INFO Mon, 19 Jan 2026 02:19:15 +0100; Davix: PerformanceMarker:
failure: socket timeout on GET (received 1611825152 B (1.50 GiB) of data; 0 B pending): Read timed out
As you can see all data were transferred after ~ 40 minutes and than dCache active party waited 16minutes before killing "stuck" transfer. During this transfer the throughput was fluctuating between 60kB - 2MB, full details in attached log