
Optimize message write path and DB footprint #5542

@chibenwa


Why ?

  • In the Messagev3 properties, we needlessly store information that is either already in the headers or easy to obtain. This takes up space in Cassandra...

Tables (not tiered, with per-message entries) for 66 million emails (3 nodes, RF=3):

  • messagev3 table 17 GB
  • imapuidtable table 10 GB
  • messageidtable table 7 GB
  • email_query_view_received_at table 2 GB
  • firstunseen table 287 MB
  • thread_lookup_3 table 6 GB

We see a footprint of ~2-3KB (replicated, tiered) per message.

We can expect a 33% reduction of the messagev3 table size by removing the content description and properties fields, which translates to a 10-13% overall space saving. At scale, for 10 billion messages, this means 20 TB -> 18 TB... Sad for something that is useful only for IMAP FETCH BODYSTRUCTURE and could easily be recomputed.
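The arithmetic behind the 10-13% figure can be checked with a short sketch. The table sizes are copied from the list above; the class and method names are hypothetical, and treating 287 MB as 0.287 GB is an assumption. With these inputs, the saving on messagev3 comes out at roughly 13% of the summed table sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of James: reproduces the space-saving estimate.
final class FootprintEstimate {
    // Table sizes in GB for 66 million emails, as listed in this issue.
    static final Map<String, Double> TABLE_SIZES_GB = new LinkedHashMap<>();
    static {
        TABLE_SIZES_GB.put("messagev3", 17.0);
        TABLE_SIZES_GB.put("imapuidtable", 10.0);
        TABLE_SIZES_GB.put("messageidtable", 7.0);
        TABLE_SIZES_GB.put("email_query_view_received_at", 2.0);
        TABLE_SIZES_GB.put("firstunseen", 0.287); // 287 MB, assuming 1 GB = 1000 MB
        TABLE_SIZES_GB.put("thread_lookup_3", 6.0);
    }

    // Expected shrink of messagev3 once the properties are dropped (from the issue).
    static final double MESSAGEV3_REDUCTION = 0.33;

    static double totalGb() {
        return TABLE_SIZES_GB.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    static double savedGb() {
        return TABLE_SIZES_GB.get("messagev3") * MESSAGEV3_REDUCTION;
    }

    // Share of the summed per-message tables that would be reclaimed.
    static double savedFraction() {
        return savedGb() / totalGb();
    }

    public static void main(String[] args) {
        System.out.printf("total=%.1f GB, saved=%.1f GB, fraction=%.3f%n",
            totalGb(), savedGb(), savedFraction());
    }
}
```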

  • For each message with content type text/*, we count lines using an unoptimized input stream, reading byte by byte (PERF KILLER!), although the line count is useful only upon IMAP FETCH BODYSTRUCTURE. We would rather move this to read time.

  • Finally, MessageStorer triggers parsing for each and every message. After removing PropertyBuilder, we could easily carry over the content type and trigger this expensive parsing if and only if the content type is multipart/* or the content-disposition is attachment in the main headers, saving CPU on the write path.
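The condition in the last point could look roughly like the sketch below. It is not the MessageStorer code: `ParsingGate` and `needsFullParsing` are hypothetical names, and the sketch assumes the two top-level header values are already available as raw strings (null when the header is absent).

```java
import java.util.Locale;

// Hypothetical helper, not part of James: decides from the main headers alone
// whether the expensive full MIME parsing is worth triggering.
final class ParsingGate {
    static boolean needsFullParsing(String contentType, String contentDisposition) {
        return startsWithIgnoreCase(contentType, "multipart/")
            || startsWithIgnoreCase(contentDisposition, "attachment");
    }

    // Header values are case-insensitive and may carry parameters
    // (e.g. "; boundary=..."), so only the leading token is compared.
    private static boolean startsWithIgnoreCase(String headerValue, String prefix) {
        if (headerValue == null) {
            return false;
        }
        return headerValue.trim().toLowerCase(Locale.ROOT).startsWith(prefix);
    }
}
```

With such a check, a plain `text/plain` message without an attachment disposition would skip parsing entirely on the write path, which is where the CPU saving would come from.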

How ?

Remove PropertyBuilder from the Message POJOs.

IMAP FETCH BODYSTRUCTURE operates on full content: we can easily recompute this in MessageResult POJO when (and only when) needed.
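As an illustration of computing "when (and only when) needed", here is a minimal sketch of a lazily evaluated line count. It is not the MessageResult API: `LazyLineCount` is a hypothetical class, the content is assumed to be reachable through a `Supplier<InputStream>`, and counting LF bytes is an assumed definition of "line". The stream is read in chunks rather than one `read()` call per byte, and nothing is read until the value is first requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical sketch, not part of James: the line count is computed on first
// access (e.g. when a BODYSTRUCTURE is requested) and then cached.
final class LazyLineCount {
    private final Supplier<InputStream> content;
    private Long cached; // stays null until the count is first requested

    LazyLineCount(Supplier<InputStream> content) {
        this.content = content;
    }

    synchronized long get() {
        if (cached == null) {
            try (InputStream in = content.get()) {
                cached = countLines(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return cached;
    }

    // Reads in 8 KB chunks instead of issuing one read() per byte.
    static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long lines = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }
}
```

Messages that are appended but never queried for their body structure would then never pay for the count at all.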

Take care to still carry over contentType and ContentDescription for the separate but related (and interesting) MessageStorer optimization.

Expected gains

Significant CPU gains for text/* message APPEND / reception

~10% data reduction on Cassandra
