Skip to content

feat(telemetry): add :telemetry.span for handle_failed/2 callback#369

Merged
josevalim merged 1 commit intodashbitco:mainfrom
yordis:yordis/handle-failed-telemetry-span
Apr 4, 2026
Merged

feat(telemetry): add :telemetry.span for handle_failed/2 callback#369
josevalim merged 1 commit intodashbitco:mainfrom
yordis:yordis/handle-failed-telemetry-span

Conversation

@yordis
Copy link
Copy Markdown
Contributor

@yordis yordis commented Apr 2, 2026

Add [:broadway, :handle_failed, :start | :stop | :exception] telemetry span events for the handle_failed/2 callback. It will help with the o11y around handling the failure case, we can hook metrics, and OTEL spans as well

@yordis yordis force-pushed the yordis/handle-failed-telemetry-span branch 3 times, most recently from b22bfe7 to 61375f6 Compare April 2, 2026 20:24
@yordis yordis marked this pull request as draft April 2, 2026 20:24
@yordis yordis force-pushed the yordis/handle-failed-telemetry-span branch 5 times, most recently from f76f8e5 to c22a417 Compare April 2, 2026 20:28
@yordis yordis marked this pull request as ready for review April 2, 2026 20:29
@yordis yordis force-pushed the yordis/handle-failed-telemetry-span branch from c22a417 to a989ad9 Compare April 2, 2026 20:30
Comment thread lib/broadway/acknowledger.ex Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try-catch most likely should be outside of the telemetry, otherwise you are hiding failures. We should probably have a test for that.

Comment thread lib/broadway/acknowledger.ex Outdated
fn ->
result = do_handle_failed_messages(messages, module, context)
{result, Map.put(metadata, :messages, result)}
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please investigate if the handle_failed is not included in the other telemetries. It may be. In this case, a regular event may be better than a span.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josevalim
Copy link
Copy Markdown
Member

Thank you. Comments inline. Keep in mind you can implement handle_failed today in your own callbacks.

@yordis
Copy link
Copy Markdown
Contributor Author

yordis commented Apr 2, 2026

@josevalim I am aware of the own callback; I am doing that today. The reason I added it to Broadway is that it is the recurrent situation of creating macros and other things for stuff like this. After a while, I feel that the pattern holds water, primarily so that most engineers do not need to worry as much about doing the right thing.

Should I continue working on it, or do you prefer I continue copy+pasting? I do not have strong preferences, just this time decided to open a PR to at least bring it upstream. I have been working on OTEL and o11y, my goal is to have very good insights from the getgo.

@josevalim
Copy link
Copy Markdown
Member

I think we can merge it if the concerns above are addressed, but I also need to catch up with the codebase and see if there are any inconsistencies by doing this change.

@yordis
Copy link
Copy Markdown
Contributor Author

yordis commented Apr 2, 2026

@josevalim I have another PR I want to make, same vane, OTEL stuff to be able to instrument OTEL a bit better, I will keep going 😄

Addressing your concerns at the moment

@yordis yordis force-pushed the yordis/handle-failed-telemetry-span branch from a989ad9 to a2364af Compare April 2, 2026 22:14
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
@yordis yordis force-pushed the yordis/handle-failed-telemetry-span branch from a2364af to 09cf1de Compare April 2, 2026 22:15
@yordis yordis requested a review from josevalim April 2, 2026 22:15
@yordis
Copy link
Copy Markdown
Contributor Author

yordis commented Apr 2, 2026

@josevalim sorry for the diff-ing 😭 if you have a better way, let me know. I tried to add enough tests to cover the situation

@josevalim
Copy link
Copy Markdown
Member

If you have another PR, then please let me know or submit a proposal. I'd rather understand all the telemetry events that need to be added, so I can take a holistic view, rather than adding incompatible ones one by one.

@yordis
Copy link
Copy Markdown
Contributor Author

yordis commented Apr 3, 2026

main...yordis:broadway:yordis/add-producer-to-telemetry

@josevalim they are unrelated, the other one is simply adding the producer to existing one, no new spans or anything.

@josevalim josevalim merged commit 966c59a into dashbitco:main Apr 4, 2026
2 checks passed
@josevalim
Copy link
Copy Markdown
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants