Spectr’s Main Index File

The index file is used to rapidly find relevant scans and the location of specific scans in the binary scan data file. This file is itself binary, to save space. All data necessary to regenerate this index file are present in the data file.

Purpose

Note that the spectr binary data formats are not meant to be used as a generalized format for representing mass spectrometry data. They are designed specifically to serve as a backend storage format for use by spectr, and all access to the data should be done via the web services provided by spectr. The documentation of these formats is purely informational.

File Name / AWS S3 Object Name

Spectr may be installed on a server with locally attached storage or use Amazon AWS S3 for storage. In either case, the filename or object name for the index file is the SHA-384 hash of the uploaded file plus “.index”. E.g., 98c11ffdfdd540676b1a137cb1a22b2a70350c9a44171d6b1180c6be5cbb2ee3f79d532c8a1dd9ef2e8e08e752a3babb.index. Since this is the same hash string used to query the data, spectr can rapidly find the required file in the configured storage directory or S3 bucket.
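The naming rule above can be sketched in a few lines. This is an illustration only, not spectr's own code: the function name `index_object_name` and the chunked read are assumptions, and the digest is rendered as lowercase hexadecimal to match the example name shown above.

```python
import hashlib


def index_object_name(path, chunk_size=1 << 20):
    """Return '<SHA-384 hex digest>.index' for the file at `path`.

    Hypothetical helper: the file is hashed in chunks so that large
    uploads do not need to fit in memory.
    """
    digest = hashlib.sha384()
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() + ".index"
```

A SHA-384 digest is 48 bytes, so the resulting name is always 96 hexadecimal characters followed by “.index”.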

Endianness

All shorts, integers, and longs are written high byte first (big-endian). Floats and doubles are converted to ints and longs, respectively, according to the IEEE 754 floating-point “single format” and “double format” bit layouts, and then written as ints and longs. See writeByte, writeShort, writeInt, writeLong, writeFloat, writeDouble at https://docs.oracle.com/javase/8/docs/api/java/io/DataOutputStream.html for more information.
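For readers working outside Java, the same byte layout can be reproduced with Python's `struct` module, where the `>` prefix selects big-endian order with standard sizes and no padding. The values below are arbitrary examples chosen for illustration, not values taken from a real index file.

```python
import struct

# Integer types: high byte first.
version_bytes = struct.pack(">h", 3)        # short, 2 bytes
count_bytes = struct.pack(">i", 1000)       # integer, 4 bytes
location_bytes = struct.pack(">q", 1)       # long, 8 bytes

# Floating-point types: IEEE 754 bit patterns, also high byte first.
float_bytes = struct.pack(">f", 1.5)        # float, 4 bytes
double_bytes = struct.pack(">d", 1.5)       # double, 8 bytes

# Reinterpreting the float's bytes as a big-endian int yields the
# IEEE 754 single-format bit pattern of the value.
float_as_int = struct.unpack(">i", float_bytes)[0]

print(version_bytes.hex())   # 0003
print(count_bytes.hex())     # 000003e8
print(float_bytes.hex())     # 3fc00000
print(double_bytes.hex())    # 3ff8000000000000
```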

File Header

This section appears once at the beginning of the file and contains information describing this file.

Header sections:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Version | short | 2 | The file format version for this file. Currently, there is only one version (3). |
| Full write indicator | byte | 1 | Is the binary file fully written? 0 = no, 1 = yes, 2 = undefined (always 2 in S3). |
| Centroided? | byte | 1 | A whole-file designation for centroidedness. 0 = no scans are centroided, 1 = all scans are centroided, 2 = mixed. |
| Count of scan levels | byte | 1 | Number of scan levels. E.g., 2 for ms1 and ms2. |

Then for each scan level:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Scan level | byte | 1 | The scan level. E.g., 1 for ms1 or 2 for ms2. |
| Number of scans | integer | 4 | Number of scans for this scan level. |
| Total ion current | double | 8 | Total ion current for this scan level (sum of the intensities of all peaks). |
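The header fields described so far (version through the per-level summaries) could be read as in the following sketch. The names `LevelSummary`, `read_exact`, and `read_header_start` are hypothetical, and the fields are decoded as signed values, mirroring Java's signed types; whether spectr treats any of them as unsigned is not stated here.

```python
import struct
from typing import NamedTuple


class LevelSummary(NamedTuple):
    scan_level: int
    scan_count: int
    total_ion_current: float


def read_exact(stream, size):
    """Read exactly `size` bytes or raise EOFError on a short read."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("expected %d bytes, got %d" % (size, len(data)))
    return data


def read_header_start(stream):
    """Decode the fixed leading fields and the per-level summaries."""
    # short version, then three single-byte fields (5 bytes in total).
    version, fully_written, centroided, level_count = struct.unpack(
        ">hbbb", read_exact(stream, 5)
    )
    # One 13-byte record per scan level: byte + integer + double.
    levels = [
        LevelSummary(*struct.unpack(">bid", read_exact(stream, 13)))
        for _ in range(level_count)
    ]
    return version, fully_written, centroided, levels
```

After this function returns, the stream is positioned at the next header field.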

Then continuing:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Scan number sorted? | byte | 1 | 0 if not sorted by scan number. 1 if sorted by scan number. |
| Ret. time sorted? | byte | 1 | 0 if not sorted by retention time. 1 if sorted by retention time. |
| Scan count | integer | 4 | Total number of scans in file. |
| Total scan data size | long | 8 | Total size of data file, excluding header. |
| First scan number | integer | 4 | First scan number in file. |
| First scan location | long | 8 | Byte location in data file of first scan. |
| Scan number offset type | byte | 1 | The data type used to store the offset between scan numbers below: 1 = byte; 2 = short; 3 = integer; 8 = none (no offset is stored; the offset between scans is assumed to be 1). |
| Scan size type | byte | 1 | The data type used to store the scan size below: 1 = byte; 2 = short; 3 = integer. |
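The fixed-size fields that follow the per-level summaries add up to 28 bytes and can be decoded with a single format string. This is a minimal sketch: `HeaderTail` and `parse_header_tail` are hypothetical names, and all fields are read as signed values in the manner of Java's `DataInputStream`.

```python
import struct
from typing import NamedTuple

# byte, byte, integer, long, integer, long, byte, byte (big-endian).
TAIL_FORMAT = ">bbiqiqbb"
TAIL_SIZE = struct.calcsize(TAIL_FORMAT)  # 28 bytes


class HeaderTail(NamedTuple):
    sorted_by_scan_number: int   # 0 or 1
    sorted_by_retention_time: int  # 0 or 1
    scan_count: int
    total_scan_data_size: int
    first_scan_number: int
    first_scan_location: int
    scan_number_offset_type: int  # 1, 2, 3, or 8
    scan_size_type: int           # 1, 2, or 3


def parse_header_tail(data):
    """Decode the 28 bytes that follow the per-level summaries."""
    return HeaderTail(*struct.unpack(TAIL_FORMAT, data[:TAIL_SIZE]))
```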

Then for each scan:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Scan size | See above | Varies | The number of bytes for this scan in the data file (including header). |
| Scan number offset | See above | Varies | Offset from the previous scan number (i.e., scan number - previous scan number). Not present if the type above is 8, in which case all offsets are assumed to be 1. |
| Scan level | byte | 1 | The scan level. E.g., 1 for ms1 or 2 for ms2. |
| Retention time | float | 4 | Retention time for this scan. |
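Because the width of each per-scan record depends on the two type codes in the header, a reader has to build the record layout at run time. The sketch below shows one way to do that. The names `read_scan_entries`, `TYPE_CODES`, and `NO_OFFSET` are hypothetical, and two assumptions are made that this document does not confirm: that the fields are stored in the order listed above, and that sizes and offsets are signed, as Java's types are.

```python
import struct

# Header type code -> struct format character (1, 2, and 4 bytes wide).
TYPE_CODES = {1: "b", 2: "h", 3: "i"}
NO_OFFSET = 8  # no scan number offset stored; the offset is taken as 1


def read_scan_entries(data, scan_count, offset_type, size_type):
    """Return (scan_size, scan_number_offset, scan_level, retention_time)
    tuples decoded from `data`, which starts at the first per-scan record.
    """
    size_code = TYPE_CODES[size_type]
    offset_code = "" if offset_type == NO_OFFSET else TYPE_CODES[offset_type]
    # Assumed field order: size, optional offset, level byte, float time.
    record = struct.Struct(">" + size_code + offset_code + "bf")
    entries = []
    for i in range(scan_count):
        fields = record.unpack_from(data, i * record.size)
        if offset_code:
            size, step, level, retention_time = fields
        else:
            size, level, retention_time = fields
            step = 1
        entries.append((size, step, level, retention_time))
    return entries
```

If scans are stored contiguously in the data file, a running sum of the scan sizes starting from the first scan location would give the byte location of each scan; that contiguity is likewise an assumption here.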