Spectr’s Main Index File¶
The index file is used to rapidly find relevant scans and to locate specific scans in the binary scan data file. The index file is itself binary, to save space. All data necessary to regenerate the index file is present in the data file.
Purpose¶
Note that the spectr binary data formats are not meant to be used as a generalized format for representing mass spectrometry data. They are designed specifically to serve as backend storage formats for spectr, and all access to the data should go through the web services provided by spectr. The documentation of these formats is purely informational.
File Name / AWS S3 Object Name¶
Spectr may be installed on a server with locally attached storage or use Amazon AWS S3 for storage.
In either case, the filename or object name for the index file is the SHA-384 hash of the uploaded file plus “.index”. E.g., 98c11ffdfdd540676b1a137cb1a22b2a70350c9a44171d6b1180c6be5cbb2ee3f79d532c8a1dd9ef2e8e08e752a3babb.index.
Since this is the same hash string used to query the data, spectr can rapidly find the required file in the configured storage directory or S3 bucket.
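The naming rule above can be sketched in a few lines of Python. This is only an illustration, not spectr's own code: the function name `index_object_name` is hypothetical, and the sketch assumes the hash is taken over the raw bytes of the uploaded file and rendered as lowercase hexadecimal, as in the example name above.

```python
import hashlib
import io
from typing import BinaryIO


def index_object_name(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return '<SHA-384 hex digest>.index' for the bytes read from stream.

    Hypothetical helper: assumes the hash covers the raw uploaded bytes.
    """
    digest = hashlib.sha384()
    # Hash in chunks so a large scan file does not have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() + ".index"


# Synthetic input for illustration; a real caller would open the uploaded
# file in binary mode instead.
name = index_object_name(io.BytesIO(b"example upload bytes"))
```

A SHA-384 digest is 48 bytes, so the resulting name is always 96 hexadecimal characters followed by “.index”.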
Endianness¶
All shorts, integers, and longs are written high byte first (big-endian). Floats are converted to ints according to the IEEE 754 floating-point “single format” bit layout, and doubles are converted to longs according to the IEEE 754 “double format” bit layout; the resulting ints and longs are then written high byte first. See writeByte, writeShort, writeInt, writeLong, writeFloat, writeDouble at https://docs.oracle.com/javase/8/docs/api/java/io/DataOutputStream.html for more information.
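For readers working outside Java, the same byte layout can be reproduced with Python's standard `struct` module. The sketch below is illustrative only: the `FORMATS` table and the `encode`/`decode` helpers are hypothetical names, and the `>` prefix selects big-endian byte order with standard sizes and no padding.

```python
import struct

# Big-endian struct formats corresponding to the DataOutputStream methods.
FORMATS = {
    "byte": ">b",    # writeByte: 1 byte
    "short": ">h",   # writeShort: 2 bytes
    "int": ">i",     # writeInt: 4 bytes
    "long": ">q",    # writeLong: 8 bytes
    "float": ">f",   # writeFloat: IEEE 754 single format, 4 bytes
    "double": ">d",  # writeDouble: IEEE 754 double format, 8 bytes
}


def encode(kind: str, value) -> bytes:
    """Encode one value the way the matching write method would."""
    return struct.pack(FORMATS[kind], value)


def decode(kind: str, data: bytes):
    """Decode one value from exactly the number of bytes its type needs."""
    return struct.unpack(FORMATS[kind], data)[0]


# The high byte comes first: the int 5 is 00 00 00 05.
int_bytes = encode("int", 5)
# 1.0 has the single-format bit pattern 0x3F800000 and the
# double-format bit pattern 0x3FF0000000000000.
float_bytes = encode("float", 1.0)
double_bytes = encode("double", 1.0)
```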
File Version¶
The latest version is 5, and all newly created files use that version. Versions 3 and 4 also exist, so an existing installation of Spectral Storage Service may still have files with those versions.
File Format¶
File Header¶
This section appears once at the beginning of the file and contains information describing this file.
Header sections:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Version | short | 2 | The file format version for this file. Version 5. |
| Full write indicator | byte | 1 | Indicates if the binary file is fully written: 0 = no, 1 = yes, 2 = undefined (always 2 in S3). |
| All Scans: Total Ion Current Computed | byte | 1 | Whether the Total Ion Current per scan is computed by the Spectral Storage Service Importer: 0 = no, 1 = yes. |
| All Scans: Ion injection time NOT populated | byte | 1 | Indicates if the Ion Injection Time per scan is missing: 0 = no, 1 = yes. |
| Count of scan levels | byte | 1 | The number of scan levels. For example, 2 for ms1 and ms2. |
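As a rough illustration of how these first five fields might be read, here is a minimal Python sketch. It assumes the fields are stored back to back in the order listed, with no padding; `HeaderStart` and `read_header_start` are hypothetical names, and the sample bytes are synthetic.

```python
import io
import struct
from typing import BinaryIO, NamedTuple


class HeaderStart(NamedTuple):
    version: int
    fully_written: int                     # 0 = no, 1 = yes, 2 = undefined
    tic_computed: int                      # 0 = no, 1 = yes
    ion_injection_time_not_populated: int  # 0 = no, 1 = yes
    scan_level_count: int


# One big-endian short followed by four bytes: 2 + 1 + 1 + 1 + 1 = 6 bytes.
HEADER_START = struct.Struct(">hbbbb")


def read_header_start(stream: BinaryIO) -> HeaderStart:
    """Read the fixed-size fields at the very start of the index file."""
    raw = stream.read(HEADER_START.size)
    if len(raw) != HEADER_START.size:
        raise ValueError("index file is too short to hold the header start")
    return HeaderStart(*HEADER_START.unpack(raw))


# Synthetic bytes for illustration: version 5, fully written, total ion
# current not computed, ion injection time present, two scan levels.
sample = io.BytesIO(bytes([0, 5, 1, 0, 0, 2]))
header = read_header_start(sample)
```

The value of `scan_level_count` tells a reader how many per-scan-level records follow.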
Then for each scan level:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Scan level | byte | 1 | The scan level. E.g., 1 for ms1 or 2 for ms2. |
| Number of scans | integer | 4 | Number of scans for this scan level. |
| Centroided? | byte | 1 | Designation for centroidedness at this scan level: 0 = only no, 1 = only yes, and 2 = mixed. |
| Ion injection time set? | byte | 1 | All scans at this scan level have Ion Injection Time populated: 0 = no, 1 = yes, 2 = some no, some yes. |
| Total ion current | double | 8 | Total ion current for this scan level, from one of the following: the sum of the Total Ion Current field for all scans with this scan level (Total Ion Current Computed == 0), or the same value as Total ion current sum of scan peaks, the next field (Total Ion Current Computed == 1). |
| Total ion current sum of scan peaks | double | 8 | Sum of intensity of all peaks for all scans with this scan level. |
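The per-scan-level records could be decoded along the following lines. This is a sketch under the assumption that each record is the six fields above written consecutively (23 bytes per scan level); `ScanLevelSummary` and `read_scan_level_summaries` are hypothetical names, and the sample values are made up.

```python
import io
import struct
from typing import BinaryIO, List, NamedTuple


class ScanLevelSummary(NamedTuple):
    scan_level: int
    scan_count: int
    centroided: int              # 0 = only no, 1 = only yes, 2 = mixed
    ion_injection_time_set: int  # 0 = no, 1 = yes, 2 = some no, some yes
    total_ion_current: float
    total_ion_current_from_peaks: float


# byte, integer, byte, byte, double, double: 1 + 4 + 1 + 1 + 8 + 8 = 23 bytes.
SCAN_LEVEL_RECORD = struct.Struct(">bibbdd")


def read_scan_level_summaries(stream: BinaryIO, level_count: int) -> List[ScanLevelSummary]:
    """Read level_count consecutive scan-level records from stream."""
    summaries = []
    for _ in range(level_count):
        raw = stream.read(SCAN_LEVEL_RECORD.size)
        if len(raw) != SCAN_LEVEL_RECORD.size:
            raise ValueError("index file ended inside a scan-level record")
        summaries.append(ScanLevelSummary(*SCAN_LEVEL_RECORD.unpack(raw)))
    return summaries


# Synthetic records for illustration: 100 ms1 scans and 400 ms2 scans.
sample = io.BytesIO(
    SCAN_LEVEL_RECORD.pack(1, 100, 0, 1, 2.5e9, 2.5e9)
    + SCAN_LEVEL_RECORD.pack(2, 400, 1, 1, 1.0e9, 1.0e9)
)
summaries = read_scan_level_summaries(sample, level_count=2)
```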
Then continuing:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Scan numbers sequential | byte | 1 | Scan numbers are sequential (1,2,3,…,n-1,n). 0 = no, 1 = yes. |
| Ret. time sorted? | byte | 1 | 0 if not sorted by retention time. 1 if sorted by retention time. |
| Scan count | integer | 4 | Total number of scans in file. |
| Total scan data size | long | 8 | Total size of data file, excluding header. |
| First scan number | integer | 4 | First scan number in file. |
| First scan location | long | 8 | Byte location in data file of first scan. |
| Scan number offset type | byte | 1 | The data type used to store the offset between scan numbers below: 1 = byte; 2 = short; 3 = integer; 8 = none (no offset is stored, and the offset between scan numbers is assumed to be 1). |
| Scan size type | byte | 1 | The data type used to store the scan size below: 1 = byte; 2 = short; 3 = integer. |
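A hedged Python sketch of reading this part of the header follows. It assumes the eight fields are stored consecutively in the order listed (28 bytes in total); `HeaderEnd`, `read_header_end`, and `scan_entry_size` are hypothetical names, and the sample values are synthetic.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

# Width in bytes of each storage type code: 1 = byte, 2 = short, 3 = integer.
TYPE_CODE_BYTES = {1: 1, 2: 2, 3: 4}
NO_OFFSET_STORED = 8  # only meaningful for the scan number offset type


class HeaderEnd(NamedTuple):
    scan_numbers_sequential: int
    retention_time_sorted: int
    scan_count: int
    total_scan_data_size: int
    first_scan_number: int
    first_scan_location: int
    scan_number_offset_type: int
    scan_size_type: int


# byte, byte, integer, long, integer, long, byte, byte = 28 bytes.
HEADER_END = struct.Struct(">bbiqiqbb")


def read_header_end(stream: BinaryIO) -> HeaderEnd:
    """Read the header fields that follow the scan-level records."""
    raw = stream.read(HEADER_END.size)
    if len(raw) != HEADER_END.size:
        raise ValueError("index file ended inside the header")
    header = HeaderEnd(*HEADER_END.unpack(raw))
    if header.scan_size_type not in TYPE_CODE_BYTES:
        raise ValueError("unknown scan size type code")
    offset_type = header.scan_number_offset_type
    if offset_type != NO_OFFSET_STORED and offset_type not in TYPE_CODE_BYTES:
        raise ValueError("unknown scan number offset type code")
    return header


def scan_entry_size(header: HeaderEnd) -> int:
    """Bytes per scan entry: size + optional offset + level (1) + time (4)."""
    offset_bytes = TYPE_CODE_BYTES.get(header.scan_number_offset_type, 0)
    return TYPE_CODE_BYTES[header.scan_size_type] + offset_bytes + 1 + 4


# Synthetic values for illustration: 500 sequential scans, no stored
# offsets (type 8), scan sizes stored as integers (type 3).
sample = io.BytesIO(HEADER_END.pack(1, 1, 500, 123456789, 1, 64, 8, 3))
header_end = read_header_end(sample)
entry_size = scan_entry_size(header_end)
```

Because the two type codes fix the width of every per-scan entry, a reader can compute the entry size once and then step through the entries at a constant stride.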
Then for each scan:

| Name | Data Type | Bytes | Description |
|------|-----------|-------|-------------|
| Scan size | See above (Scan size type) | 1, 2, or 4 | The number of bytes for this scan in the data file (including header). |
| Scan number offset | See above (Scan number offset type) | 0, 1, 2, or 4 | Offset from the previous scan number (i.e., scan number - previous scan number). Not present if the type above is 8, in which case all offsets are assumed to be 1. |
| Scan level | byte | 1 | The scan level. E.g., 1 for ms1 or 2 for ms2. |
| Retention time | float | 4 | Retention time for this scan. |
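The variable-width per-scan entries might be decoded as in the sketch below. It is an illustration under stated assumptions, not spectr's reader: the fields are assumed to be written in the order listed, byte, short, and integer values are assumed to be signed (as Java's DataInputStream would read them), and `ScanEntry`, `scan_entry_struct`, and `read_scan_entries` are hypothetical names. How the first entry's offset relates to the First scan number field is not covered here, so the sketch returns the raw offsets without accumulating them.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# struct codes for the storage type codes: 1 = byte, 2 = short, 3 = integer.
TYPE_CODES = {1: "b", 2: "h", 3: "i"}
NO_OFFSET_STORED = 8  # scan number offset type meaning "always 1"


class ScanEntry(NamedTuple):
    scan_size: int
    scan_number_offset: int
    scan_level: int
    retention_time: float


def scan_entry_struct(scan_size_type: int, offset_type: int) -> struct.Struct:
    """Build the big-endian layout of one entry from the two type codes."""
    fmt = ">" + TYPE_CODES[scan_size_type]
    if offset_type != NO_OFFSET_STORED:
        fmt += TYPE_CODES[offset_type]
    return struct.Struct(fmt + "bf")  # scan level (byte), retention time (float)


def read_scan_entries(stream: BinaryIO, scan_count: int,
                      scan_size_type: int, offset_type: int) -> Iterator[ScanEntry]:
    """Yield scan_count entries, filling in an offset of 1 when none is stored."""
    layout = scan_entry_struct(scan_size_type, offset_type)
    for _ in range(scan_count):
        raw = stream.read(layout.size)
        if len(raw) != layout.size:
            raise ValueError("index file ended inside a scan entry")
        fields = layout.unpack(raw)
        if offset_type == NO_OFFSET_STORED:
            size, level, retention_time = fields
            offset = 1
        else:
            size, offset, level, retention_time = fields
        yield ScanEntry(size, offset, level, retention_time)


# Synthetic entries for illustration: integer scan sizes, no stored offsets.
layout = scan_entry_struct(3, NO_OFFSET_STORED)
sample = io.BytesIO(layout.pack(1200, 1, 0.5) + layout.pack(800, 2, 0.75))
entries = list(read_scan_entries(sample, 2, 3, NO_OFFSET_STORED))
```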