From: Basecamp CLI team
To: BC3 Rails development
Re: Surfacing <bc-attachment> metadata as structured data in API responses
Context: CLI PR #326 — basecamp attachments download
The CLI needs to let agents and humans download inline file attachments
(images, PDFs, etc.) embedded in messages, todos, cards, and documents.
Today the only way to discover these attachments is to parse the HTML body
and regex out <bc-attachment> elements:
<bc-attachment sgid="BAh7CEk"
content-type="application/pdf"
href="https://storage.3.basecamp.com/123/blobs/abc/download/report.pdf"
filename="report.pdf"
filesize="12345">
</bc-attachment>This works, but it's fragile client-side work that every API consumer has to replicate independently — and it depends on Trix/Basecamp internal HTML conventions that aren't part of the documented API contract.
The CLI's richtext.ExtractAttachments(html) function:
- Regex-scans for
<bc-attachment ...>opening tags - Skips mentions (
content-type="application/vnd.basecamp.mention") - Skips tags without
href(not downloadable) - Extracts
href,filename,filesize,content-type,sgid
This produces an []InlineAttachment array that the CLI surfaces as an
inline_attachments field in show command responses and uses as the
download manifest for basecamp attachments download.
Content field ambiguity. The rich-text body lives in different fields depending on recording type:
| Type | Plain-text field | Rich HTML field |
|---|---|---|
| Todo | content |
description |
| Message | subject |
content |
| Card | — | content |
| Document | title |
content |
The CLI has to sniff both content and description, check which one
contains HTML, and pick the right one. This is the kind of thing that
breaks silently.
Storage URL opacity. The href values in <bc-attachment> tags are
storage URLs (https://storage.3.basecamp.com/...). The SDK rewrites
these through the API host for auth, then follows a redirect to a signed
S3 URL. This two-hop dance works, but the URLs aren't guaranteed stable —
they're an implementation detail of how Trix stores blob references.
No discoverability. An API consumer can't tell whether a recording has inline attachments without fetching and parsing the full HTML body. For agents that want to decide whether to download images for multimodal analysis, this is a wasted round-trip.
Add an inline_attachments array to recording responses that contain rich
text. Return it alongside the existing content/description fields.
{
"id": 789,
"type": "Message",
"subject": "Q4 Report",
"content": "<p>See attached: <bc-attachment ...>report.pdf</bc-attachment></p>",
"inline_attachments": [
{
"sgid": "BAh7CEk",
"filename": "report.pdf",
"content_type": "application/pdf",
"byte_size": 12345,
"download_url": "https://3.basecampapi.com/123/blobs/abc/download/report.pdf"
}
]
}Fields:
sgid— the signed global ID (already in the HTML, used for ActionText references)filename— original upload filenamecontent_type— MIME typebyte_size— integer, not string (the HTMLfilesizeattribute is a string today)download_url— a stable API-routable URL that the client can GET with auth headers, rather than a raw storage URL that requires rewriting. Ideally the same URL shape thatUpload#download_urlalready returns.
Only file attachments. Mentions (application/vnd.basecamp.mention) are
excluded. Tags without a downloadable blob reference are excluded.
Any type whose API response includes a rich-text HTML body:
Message(field:content)Todo(field:description)Kanban::Card(field:content)Document(field:content)Comment(field:content)Question::Answer(field:content)
The field would appear only when attachments are present (empty array or omitted when none).
- CLI: Drop the regex parser, use structured data, remove the content-vs-description sniffing
- SDK: Add
InlineAttachmentsfield to recording structs - Agents: Discover attachments from list/show responses without parsing HTML — enables "does this message have images I should look at?" decisions
- Third-party integrations: Any API consumer that wants to mirror or process attachments gets a stable contract instead of HTML scraping
Additive. The HTML body continues to contain <bc-attachment> elements
as before. The new field is additional structured metadata derived from
the same source. No breaking changes.
If this ships, the CLI can:
- Check for
inline_attachmentsin the API response - Fall back to
richtext.ExtractAttachments(html)when absent (older API versions, or types not yet covered) - Eventually remove the regex path once the API field is universal
Question for BC3: Is this something that could be derived at the
serializer level (walking the ActionText body's <bc-attachment> nodes
and emitting structured metadata), or does it need deeper plumbing through
the blob/attachment infrastructure?