Add relay snapshot test scenario with low bandwidth on data socket #149
frdeso wants to merge 3 commits into lttng:master
PSRCode left a comment:
Minor nitpicks here and there.
As for the pull request description and commit:
"This test highlights a race in the relay snapshot mode. This testcase triggers a
race where the trace is unreadable because tracing is stopped even though
data is still in flight."
If I remember the problem correctly, the main issue is the lack of "synchronization" (a data pending phase) at the end of a snapshot record command. The client (lttng-cli) gets a response saying the snapshot was recorded correctly even if data is still "in-flight" toward the relayd. This results in an inconsistent trace (snapshot) on the relayd side if the snapshot is read before all data has made its way to the relayd.
Tracing does not need to be "stopped" at any point for the problem to occur.
Do you agree?
Setting low bandwidth on the data port only accentuates the problem.
The big question is: what guarantee regarding trace validity does the return of the lttng-snapshot-record command offer?
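To make the race described above concrete, here is a minimal Python model. It is only a sketch and not lttng-tools code: `RelaySim`, `snapshot_record`, `data_pending` and the `wait_for_data` flag are hypothetical names chosen for illustration, and the "slow data socket" is simulated by delivering chunks one at a time.

```python
from collections import deque


class RelaySim:
    """Toy model (hypothetical) of a relayd fed through a slow data socket."""

    def __init__(self):
        self.in_flight = deque()  # chunks sent but not yet received
        self.on_disk = []         # chunks already written to the trace

    def send(self, chunk):
        self.in_flight.append(chunk)

    def deliver(self, count=1):
        """Simulate the data socket delivering up to `count` chunks."""
        for _ in range(min(count, len(self.in_flight))):
            self.on_disk.append(self.in_flight.popleft())

    def data_pending(self):
        return bool(self.in_flight)


def snapshot_record(relay, chunks, wait_for_data=False):
    """Reply to the client once the snapshot is considered recorded.

    wait_for_data=False: reply as soon as the data is queued (racy).
    wait_for_data=True: reply only once no data is pending (assumed
    synchronization, modeled here by draining the queue).
    """
    for chunk in chunks:
        relay.send(chunk)
    if wait_for_data:
        while relay.data_pending():
            relay.deliver()
    return "recorded"


chunks = ["packet-0", "packet-1", "packet-2"]

# Racy case: the command returns, then the trace is read after only
# one chunk has crossed the slow link.
racy = RelaySim()
snapshot_record(racy, chunks)
racy.deliver(1)
racy_complete = racy.on_disk == chunks

# Synchronized case: the command returns only once nothing is pending.
synced = RelaySim()
snapshot_record(synced, chunks, wait_for_data=True)
synced_complete = synced.on_disk == chunks

print(racy_complete, synced_complete)  # prints "False True"
```

In this model tracing is never stopped; the incomplete trace comes only from reading the snapshot while data is still pending.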
Prepare for addition of new test

Signed-off-by: Francis Deslauriers <[email protected]>
Signed-off-by: Francis Deslauriers <[email protected]>
3aae351 to 9f1f618
@PSRCode Are you okay with the first two commits (cleanup)? I could merge those now and wait for a fix before merging the new test.
This commit adds a testcase that simulates a snapshot on a relayd with a data socket with very low bandwidth. This configuration can trigger a race where the trace is unreadable because the snapshot is reported as completed even though data is still in flight.

As of right now, this testcase fails because the trace is unreadable. Babeltrace outputs the following error:

    [error] Packet size (4194304 bits) is larger than remaining file size (175104 bits) in trace with UUID "d9e6182e6469405094f839a08f438c3b", at path: "/tmp/tmp.sY3M2G54Oy/raton/snapshot-1-20181203-100618-0/ust/uid/0/64-bit", within stream id 0, at relative path: "chan1_1".

Signed-off-by: Francis Deslauriers <[email protected]>
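For scale, the two sizes in the Babeltrace error can be converted to bytes. The helper below is an illustrative sketch only: the function name is made up, and it is not how Babeltrace performs its check. It simply restates the arithmetic behind the message.

```python
def truncated_packet_report(packet_size_bits, remaining_bits):
    """Compare a declared packet size with what is left in the stream file."""
    missing_bits = max(packet_size_bits - remaining_bits, 0)
    return {
        "packet_bytes": packet_size_bits // 8,
        "remaining_bytes": remaining_bits // 8,
        "missing_bytes": missing_bits // 8,
        "truncated": packet_size_bits > remaining_bits,
    }


# Values taken from the error message above.
report = truncated_packet_report(4194304, 175104)
print(report)
# {'packet_bytes': 524288, 'remaining_bytes': 21888, 'missing_bytes': 502400, 'truncated': True}
```

In other words, the packet header announces a 512 KiB packet, but only about 21 KiB of it is present in the file when the trace is read.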
9f1f618 to 1bf4cb5
I updated the commit message, but we will need to update it again when we come up with a fix.
This test highlights a race in the relay snapshot mode. This testcase triggers a
race where the trace is unreadable because tracing is stopped even though
data is still in flight.
This PR only provides a testcase/reproducer and does not provide a fix.
This PR also includes some cleanups of the snapshot testcases.