Data structure to represent summarized cell-level data for one slide

For memory- and time-efficient manipulation, a simple binary data structure is used to represent the cell data for one slide.

The memory map is as follows:

Byte range start	Byte range end	Number of bytes	Section	Description	Data type
1	4	4	Header	The number of cells represented by this serialization.	32-bit integer, big-endian byte order
5	8	4		Minimum first pixel coordinate occurring in this file.	32-bit integer, big-endian byte order
9	12	4		Maximum first pixel coordinate occurring in this file.	32-bit integer, big-endian byte order
13	16	4		Minimum second pixel coordinate occurring in this file.	32-bit integer, big-endian byte order
17	20	4		Maximum second pixel coordinate occurring in this file.	32-bit integer, big-endian byte order
21	24	4	Cell 1	Cell 1 index integer.	32-bit integer, big-endian byte order
25	28	4		Cell 1 location's first pixel coordinate integer.	32-bit integer, big-endian byte order
29	32	4		Cell 1 location's second pixel coordinate integer.	32-bit integer, big-endian byte order
33	40	8		Cell 1 phenotype membership bit-mask, up to 64 channels.	64-bit mask
41	44	4	Cell 2	Cell 2 location's first pixel coordinate integer.	32-bit integer, big-endian byte order
...	...	...	...	...

The ellipsis represents repetition of the per-cell section once for each cell. This is 4 + 4 + 4 + 8 = 20 bytes per cell. The "header" preceding the per-cell sections is 20 bytes.

A representation of an example of the cell sections can be found here.

There is a convenient way to preview the contents at the command line using xxd:

tail -c +21 payload.bin | xxd -b -c 20

(The initial tail command strips out the header.)

Whole-study subsample aggregate

A random subsample from the cells in a given study is provided to reduce overall size for whole-study analysis purposes. The file format for this consists of two sections, delimited by the ASCII file separator character (decimal 28):

JSON metadata section
binary section with intensity data for subsampled cells

The JSON metadata section is structured as in the following example (exact model is here):

{
  "subsample_counts": [
    {
      "specimen": "ABC_001",
      "count": 2500,
      "thresholds": [5, 240, ...]
    },
    {
      "specimen": "ABC_002",
      "count": 2000,
      "thresholds": [50, 100, ...]
    },
    ...
  ],
  "channel_order": [
    "CD3",
    "CD4",
    ...
  ]
}

The integers above use the custom 8-bit float format.

The binary section is as follows:

Byte range start	Byte range end	Number of bytes	Description	Data types
1	N	N	The intensity values for each of the N channels, for first cell.	Custom 8-bit float format
N+1	2N	N	The intensity values for each of the N channels, for second cell.	Custom 8-bit float format
...	...	...	...	...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data structure to represent summarized cell-level data for one slide

Whole-study subsample aggregate

FilesExpand file tree

cells.md

Latest commit

History

cells.md

File metadata and controls

Data structure to represent summarized cell-level data for one slide

Whole-study subsample aggregate