REIMS Datasets
This framework utilizes a series of datasets derived from Rapid Evaporative Ionization Mass Spectrometry (REIMS) analysis of seafood samples.
Note
The source code of this project is open source (MIT License), but the dataset itself is private research data. Authorized users can download the data with the command given in step 3, "Download the Private Dataset".
Data Source and Acquisition
The REIMS data were provided by AgResearch, New Zealand, as part of research into quality assurance systems for marine biomass processing.
The analysis used a Laser-Assisted REIMS setup, coupling a CO2 laser interface to a Xevo G2 XS quadrupole time-of-flight mass spectrometer (Waters Ltd). Samples were received frozen and analyzed within 10 minutes of removal from frozen storage to prevent lipid oxidation.
Mode: Negative ionization (MS1 only).
Scan Range: m/z 50–1200.
Features: 2,080 distinct m/z features per spectrum.
Data Curation and Preprocessing
Data processing included baseline removal and lock mass correction (using oleic acid, C18:1). Each spectrum comprises 2,080 distinct m/z features, spanning approximately m/z 77.04 to m/z 999.32.
Preprocessing includes:
1. TIC (total ion current) normalization: accounts for sample-to-sample variation in ionization efficiency.
2. Min-max scaling: rescales feature intensities to the range [0, 1].
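The two preprocessing steps can be sketched with NumPy as below. This is a minimal illustration, not the framework's own code: it assumes spectra are stored as a samples × features matrix, that TIC normalization divides each spectrum by the sum of its intensities, and that min-max scaling is applied per m/z feature across samples. The function names are hypothetical.

```python
import numpy as np


def tic_normalize(spectra: np.ndarray) -> np.ndarray:
    """Divide each spectrum (row) by its total ion current, i.e. the sum of its intensities."""
    tic = spectra.sum(axis=1, keepdims=True)
    tic[tic == 0] = 1.0  # guard against empty spectra (avoids division by zero)
    return spectra / tic


def min_max_scale(spectra: np.ndarray) -> np.ndarray:
    """Rescale each m/z feature (column) to [0, 1] across all samples."""
    lo = spectra.min(axis=0, keepdims=True)
    hi = spectra.max(axis=0, keepdims=True)
    span = hi - lo
    span[span == 0] = 1.0  # constant features map to 0 instead of producing NaN
    return (spectra - lo) / span


def preprocess(spectra) -> np.ndarray:
    """Hypothetical pipeline: TIC normalization followed by per-feature min-max scaling."""
    return min_max_scale(tic_normalize(np.asarray(spectra, dtype=float)))


# Toy example: 3 spectra with 2 features each (real spectra have 2,080 features).
X = np.array([[1.0, 3.0], [2.0, 2.0], [4.0, 0.0]])
print(preprocess(X))
```

Here the scaling statistics are computed on the whole matrix for simplicity; inside a cross-validation loop one would typically fit the per-feature minimum and maximum on the training folds only and reuse them on the held-out fold.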
Analytical Tasks
The curated data are split into five datasets, each tailored to a specific analytical task:
1. Fish Species Identification
Goal: Distinguish between Hoki and Mackerel.
Samples: 106.
Significance: Fraud prevention and food authentication.
2. Body Part Identification
Goal: Identify 7 fish parts (Fillets, Heads, Livers, Skins, Gonads, Guts, Frames).
Samples: 33.
Significance: Process automation and biomass value maximization.
3. Oil Contamination Detection
Goal: Predict oil concentration at 7 levels (0% to 50%).
Samples: 126.
Significance: Equipment safety and lubricant detection.
4. Cross-species Adulteration
Goal: Detect whether a premium species (Hoki) has been diluted with a cheaper one (Mackerel).
Samples: 144.
Significance: Economic fraud detection.
5. Batch Detection (Pairwise)
Goal: Determine whether two samples originate from the same processing batch.
Format: 2,556 pairwise comparisons.
Significance: Traceability and food safety.
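The pairwise format can be sketched as follows: every unordered pair of samples becomes one example, labelled 1 if both samples share a batch and 0 otherwise. This is an assumption about how such pairs could be built; the source does not state how its 2,556 comparisons were formed. For reference, 72 samples yield exactly C(72, 2) = 2,556 unordered pairs, which would be consistent with an all-pairs construction but is not confirmed. The function name `make_pairs` is hypothetical.

```python
from itertools import combinations


def make_pairs(batch_ids):
    """Return (i, j, label) for every unordered sample pair; label 1 = same batch, 0 = different."""
    return [
        (i, j, int(batch_ids[i] == batch_ids[j]))
        for i, j in combinations(range(len(batch_ids)), 2)
    ]


# Toy example: samples 0 and 1 come from batch "A", sample 2 from batch "B".
print(make_pairs(["A", "A", "B"]))  # [(0, 1, 1), (0, 2, 0), (1, 2, 0)]
```

A model then receives the two spectra of each pair (for example concatenated, or as their element-wise difference) and predicts Same/Different; the input representation here is likewise an illustrative choice.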
Summary Table
| Dataset | Examples | Features | Class Labels | Split Type |
|---|---|---|---|---|
| Fish Species | 106 | 2,080 | Hoki, Mackerel | 5-fold CV |
| Fish Body Part | 33 | 2,080 | 7 categories | 3-fold CV |
| Oil Contamination | 126 | 2,080 | 7 levels | 5-fold CV |
| Cross-species | 144 | 2,080 | Pure/Mixed | 5-fold CV |
| Batch Detection | 2,556 (pairs) | 2,080 | Same/Different | Fixed 60/20/20 split |
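The two split types in the table above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the framework's splitter: it assumes the k-fold splits are stratified by class label (round-robin assignment after shuffling within each class) and that the fixed 60/20/20 split means train/validation/test drawn from one seeded shuffle. Both function names and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Assign each sample index to one of k folds so every class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class, key=str):  # deterministic class order
        idxs = by_class[y]
        rng.shuffle(idxs)
        for n, idx in enumerate(idxs):
            folds[(offset + n) % k].append(idx)
        # Continue the round-robin across classes so fold sizes stay balanced.
        offset += len(idxs)
    return folds


def split_60_20_20(n, seed=0):
    """Shuffle indices 0..n-1 once and cut them into train/validation/test (60/20/20)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a, b = int(0.6 * n), int(0.8 * n)
    return idx[:a], idx[a:b], idx[b:]


# Example: 5 folds over a toy species label list, and the pairwise split sizes.
folds = stratified_kfold(["Hoki"] * 6 + ["Mackerel"] * 4, k=5)
train, val, test = split_60_20_20(2556)
print([len(f) for f in folds], len(train), len(val), len(test))
```

With very small classes (for example 33 body-part samples over 7 categories and 3 folds) some folds may contain only one or two examples of a class. For the pairwise task, splitting at the pair level can place the same sample in both training and test pairs; the source does not say whether its fixed split is drawn at the pair or the sample level.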