Spectrum

class ims.gcims.Spectrum(name, values, ret_time, drift_time, time, meta_attr)[source]

Bases: object

Represents one GC-IMS spectrum with the data matrix and the retention and drift time coordinates. Sample or file name and timestamp are included as unique identifiers.

This class contains all methods that can be applied on a per spectrum basis, like I/O, plotting and some preprocessing tools. Methods that return a Spectrum change the instance in place. Use the copy method to keep the original unchanged.

Parameters:
  • name (str) – File or sample name as a unique identifier. Reader methods set this attribute to the file name without extension.

  • values (numpy.array) – Intensity matrix.

  • ret_time (numpy.array) – Retention time coordinate.

  • drift_time (numpy.array) – Drift time coordinate.

  • time (datetime object) – Timestamp when the spectrum was recorded.

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.plot()
asymcorr(lam=10000000.0, p=0.001, niter=20)[source]

Retention time baseline correction using asymmetric least squares.

Parameters:
  • lam (float, optional) – Controls smoothness. Larger numbers return smoother curves, by default 1e7

  • p (float, optional) – Controls asymmetry, by default 1e-3

  • niter (int, optional) – Number of iterations during optimization, by default 20

Return type:

Spectrum
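
The following is a minimal sketch of asymmetric least squares smoothing on a single 1D trace, using the same parameter names and defaults as above. The function name als_baseline is hypothetical, the dense linear algebra is only suitable for short signals, and whether the library solves the problem in exactly this way is an assumption.

```python
import numpy as np


def als_baseline(y, lam=1e7, p=1e-3, niter=20):
    """Estimate a smooth baseline of a 1D signal (dense sketch, short signals only)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference matrix with shape (n - 2, n)
    d = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (d.T @ d)
    w = np.ones(n)
    z = y.copy()
    for _ in range(niter):
        # Weighted, penalized least squares: (W + lam * D'D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline get the small weight p, all others 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic trace: linear drift plus one Gaussian peak
x = np.arange(200, dtype=float)
true_baseline = 5.0 + 0.01 * x
signal = true_baseline + 10.0 * np.exp(-0.5 * ((x - 100.0) / 5.0) ** 2)

baseline = als_baseline(signal)
corrected = signal - baseline
```

Larger lam values stiffen the baseline, and smaller p values push it further toward the lower envelope of the signal.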

binning(n=2)[source]

Downsamples the spectrum by binning the array with factor n. Similar to ims.Spectrum.resample but works on both dimensions simultaneously. If a dimension is not divisible by the binning factor, it is shortened by the remainder at the long end. Very effective data reduction because a factor of n=2 already reduces the number of features to a quarter.

Parameters:

n (int, optional) – Binning factor, by default 2.

Returns:

Downsampled data matrix.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> print(sample.shape)
(4082, 3150)
>>> sample.binning(2)
>>> print(sample.shape)
(2041, 1575)
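
As a rough illustration of the cropping and block reduction described above, the sketch below bins a plain NumPy matrix. The helper bin_matrix is hypothetical, and averaging each n by n block (rather than summing it) is an assumption about the library's behavior.

```python
import numpy as np


def bin_matrix(values, n=2):
    """Average non-overlapping n x n blocks, cropping any remainder at the end."""
    rows, cols = values.shape
    rows_cropped = rows - rows % n
    cols_cropped = cols - cols % n
    cropped = values[:rows_cropped, :cols_cropped]
    # Split both axes into (number of blocks, block size) and average each block
    blocks = cropped.reshape(rows_cropped // n, n, cols_cropped // n, n)
    return blocks.mean(axis=(1, 3))


values = np.arange(5 * 7, dtype=float).reshape(5, 7)
binned = bin_matrix(values, n=2)
print(binned.shape)
```

The last row and the last column of the 5 by 7 input are dropped before binning, which leaves a 2 by 3 result.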
calc_reduced_mobility(T=318.15, p=1013.25, Ud=2132, L=5.3)[source]

Calculates the reduced mobility values for the drift times denoted in the peak table. The formula for the calculation of the reduced mobility values originates from Ahrens, A., Zimmermann, S. Towards a hand-held, fast, and sensitive gas chromatograph-ion mobility spectrometer for detecting volatile compounds. Anal Bioanal Chem 413, 1009–1016 (2021). https://doi.org/10.1007/s00216-020-03059-9

Parameters:
  • T (float, optional) – Temperature of the IMS cell, by default 318.15

  • p (float, optional) – Pressure in the IMS cell, by default 1013.25

  • Ud (int, optional) – Drift voltage, by default 2132

  • L (float, optional) – Length of the drift tube, by default 5.3

Returns:

The original DataFrame from the find_peaks method with an added column for the reduced mobility.

Return type:

pandas.DataFrame
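
For orientation, the sketch below evaluates the common textbook definition of reduced mobility, K0 = L² / (Ud · td) · (p / p0) · (T0 / T), with the defaults from the signature. The function name and the units (T in K, p in hPa, Ud in V, L in cm, drift time in ms) are assumptions, and the exact expression used by the library should be checked against the cited paper.

```python
def reduced_mobility(drift_time_ms, T=318.15, p=1013.25, Ud=2132, L=5.3):
    """Reduced mobility in cm^2/(V*s) under the unit assumptions stated above."""
    T0 = 273.15   # reference temperature in K
    p0 = 1013.25  # reference pressure in hPa
    drift_time_s = drift_time_ms / 1000.0
    # Mobility from drift length, drift voltage and drift time
    mobility = L ** 2 / (Ud * drift_time_s)
    # Normalize to reference pressure and temperature
    return mobility * (p / p0) * (T0 / T)


k0 = reduced_mobility(8.0)
print(round(k0, 3))
```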

copy()[source]

Uses deepcopy from the copy module in the standard library. Most operations happen inplace. Use this method if you do not want to change the original variable.

Returns:

deepcopy of self.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> new_variable = sample.copy()
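
The difference between in-place modification and working on a copy can be shown with a small stand-in class. MiniSpectrum and its scale method are hypothetical and not part of ims. They only mimic the pattern of mutating the instance and returning it.

```python
import copy


class MiniSpectrum:
    """Hypothetical stand-in for a Spectrum, not part of ims."""

    def __init__(self, name, values):
        self.name = name
        self.values = values

    def scale(self, factor):
        # Mutates the instance and returns it, like the in-place methods
        self.values = [v * factor for v in self.values]
        return self

    def copy(self):
        return copy.deepcopy(self)


original = MiniSpectrum("sample", [1.0, 2.0, 3.0])
scaled = original.copy().scale(10)  # the original is untouched
alias = original.scale(2)           # same object, now modified
```

After these calls scaled is an independent object, while alias and original refer to the same modified instance.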
cut_dt(start, stop=None)[source]

Cuts data along the drift time coordinate. The range between start and stop is kept. If stop is not given, the end of the array is used instead. Combining this with RIP relative drift time values makes it easier to cut the RIP away and focus on the peak area.

Parameters:
  • start (int or float) – Start value on drift time coordinate.

  • stop (int or float, optional) – Stop value on drift time coordinate. If None uses the end of the array, by default None.

Returns:

New drift time range.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> print(sample.shape)
(4082, 3150)
>>> sample.riprel().cut_dt(1.05, 2)
>>> print(sample.shape)
(4082, 1005)
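
Cutting along a coordinate amounts to selecting the rows or columns whose coordinate value falls in the requested range. The sketch below shows this with a boolean mask. The helper cut_axis is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import numpy as np


def cut_axis(values, coord, start, stop=None, axis=1):
    """Keep the slices of values whose coordinate lies in [start, stop]."""
    if stop is None:
        stop = coord[-1]  # default to the end of the array
    keep = (coord >= start) & (coord <= stop)
    return np.compress(keep, values, axis=axis), coord[keep]


drift_time = np.linspace(0.0, 3.0, 7)  # 0.0, 0.5, ..., 3.0
values = np.arange(4 * 7, dtype=float).reshape(4, 7)

cut_values, cut_drift_time = cut_axis(values, drift_time, 1.05, 2)
print(cut_values.shape, cut_drift_time)
```

With axis=0 and the retention time coordinate, the same helper covers the retention time case.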
cut_rt(start, stop=None)[source]

Cuts data along the retention time coordinate. The range between start and stop is kept. If stop is not given, the end of the array is used instead.

Parameters:
  • start (int or float) – Start value on retention time coordinate.

  • stop (int or float, optional) – Stop value on retention time coordinate. If None uses the end of the array, by default None.

Returns:

New retention time range.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> print(sample.shape)
(4082, 3150)
>>> sample.cut_rt(80, 500)
>>> print(sample.shape)
(2857, 3150)
detect_peaks(threshold_rel=0.5, peak_size=10)[source]

Fast peak detection using simple thresholding and connected components. Returns a labeled mask and a list of peak outlines. Make sure to cut out the RIP before using this method to avoid incorrect thresholding.

Parameters:
  • threshold_rel (float, default=0.5) – Relative threshold for peak detection. Decrease to be more sensitive, increase to detect more intense peaks only.

  • peak_size (int, default=10) – Minimum pixel peak size (number of connected pixels) required for a region to be considered a peak. Peaks with fewer pixels than this threshold will be filtered out as noise.

Returns:

self – The spectrum with updated peaklist attribute containing peak information

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.riprel().cut_dt(1.05, 2).cut_rt(80, 500)
>>> sample.detect_peaks()
>>> sample.plot_thresholding()
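
A dependency-light sketch of thresholding followed by connected component labeling is given below. The function label_peaks is hypothetical. Taking the threshold relative to the global maximum and using 4-connectivity are assumptions, and in practice a routine such as scipy.ndimage.label performs the labeling step.

```python
from collections import deque

import numpy as np


def label_peaks(values, threshold_rel=0.5, peak_size=10):
    """Label connected regions above a relative threshold (4-connectivity)."""
    # Assumption: the threshold is relative to the global maximum
    mask = values > threshold_rel * values.max()
    labels = np.zeros(values.shape, dtype=int)
    n_rows, n_cols = values.shape
    n_peaks = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed] != 0 or not mask[seed]:
            continue  # already labeled or discarded as too small
        # Flood fill the region that contains the seed pixel
        region = [seed]
        queue = deque([seed])
        labels[seed] = -1  # temporary "visited" marker
        while queue:
            r, c = queue.popleft()
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if inside and mask[rr, cc] and labels[rr, cc] == 0:
                    labels[rr, cc] = -1
                    region.append((rr, cc))
                    queue.append((rr, cc))
        idx = np.array(region)
        if len(region) >= peak_size:
            n_peaks += 1
            labels[idx[:, 0], idx[:, 1]] = n_peaks
        else:
            # Too few pixels: treat as noise and remove from the mask
            labels[idx[:, 0], idx[:, 1]] = 0
            mask[idx[:, 0], idx[:, 1]] = False
    return labels, n_peaks


values = np.zeros((20, 20))
values[2:6, 2:6] = 10.0     # 16 connected pixels, kept
values[10:12, 10:12] = 8.0  # 4 connected pixels, filtered out
labels, n_peaks = label_peaks(values)
```

Because the threshold depends on the maximum, an uncut RIP would dominate it, which is why the RIP should be removed first.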
export_plot(path=None, dpi=300, file_format='jpg', **kwargs)[source]

Saves the figure as an image file. See the docs for the matplotlib savefig function for supported file formats and kwargs (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html).

Parameters:
  • path (str, optional) – Directory to save the image, by default current working directory.

  • file_format (str, optional) – See matplotlib savefig docs for information about supported formats, by default ‘jpg’.

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.export_plot()
find_peaks(limit=None, denoise='fastnl', window=30, verbose=0)[source]

Automated GC-IMS peak detection based on persistent homology.

Parameters:
  • spectrum (ims.Spectrum) – GC-IMS spectrum to use.

  • limit (float) – Values > limit are active search areas to detect regions of interest (ROI). If None limit is selected by the minimum persistence score, by default None.

  • denoise (string, (default : ‘fastnl’, None to disable)) –

    Filtering method to remove noise:
    • None

    • ‘fastnl’

    • ‘bilateral’

    • ‘lee’

    • ‘lee_enhanced’

    • ‘kuan’

    • ‘frost’

    • ‘median’

    • ‘mean’

  • window (int, (default : 30)) – Denoising window. Increasing the window size may remove noise better but, with certain denoising methods, may also remove details of the image.

  • verbose (int (default : 0)) – Print to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace.

Returns:

Peak table with drift and retention times, the corresponding x and y indices, the maximum intensity of the peak, birth and death levels and scores.

Return type:

pandas.DataFrame

References

Taskesen, E. (2020). findpeaks is for the detection of peaks and valleys in a 1D vector and 2D array (image). (Version 2.3.1) [Computer software]. https://erdogant.github.io/findpeaks

normalization()[source]

Normalize a single spectrum by scaling its values to the range [0, 1].

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.normalization()
Returns:

self – The spectrum with normalized values.

Return type:

Spectrum
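
Scaling to the range [0, 1] is commonly done with min-max normalization, sketched below on a plain NumPy matrix. The helper name is hypothetical, and whether the library uses exactly this formula is an assumption.

```python
import numpy as np


def min_max_normalize(values):
    """Map the smallest value to 0 and the largest to 1."""
    vmin = values.min()
    vmax = values.max()
    # Note: a constant matrix (vmax == vmin) would divide by zero here
    return (values - vmin) / (vmax - vmin)


values = np.array([[30.0, 80.0], [130.0, 230.0]])
normalized = min_max_normalize(values)
print(normalized)
```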

plot(vmin=30, vmax=400, width=6, height=6)[source]

Plots the spectrum using matplotlib's imshow. Use %matplotlib widget in IPython or %matplotlib notebook in Jupyter notebooks to get an interactive plot widget. The figure is returned because the plot export utilities need it.

Parameters:
  • vmin (int, optional) – Minimum of color range, by default 30.

  • vmax (int, optional) – Maximum of color range, by default 400.

  • width (int, optional) – Width in inches, by default 6.

  • height (int, optional) – Height in inches, by default 6.

Returns:

(matplotlib.figure.Figure, matplotlib.pyplot.axes)

Return type:

tuple

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> fig, ax = sample.plot()
plot_peaks()[source]

Plots the GC-IMS spectrum with peak labels from the find_peaks method.

Return type:

matplotlib.pyplot.axes

plot_persistence()[source]

Persistence plot of birth vs death levels from the find_peaks method.

Return type:

matplotlib.pyplot.axes

plot_thresholding(outline_color='yellow', linewidth=2, annotate=False, **kwargs)[source]

Plots GC-IMS spectrum with peak outlines from detect_peaks method. Must be called after detect_peaks method.

Parameters:
  • outline_color (str, default="yellow") – Color for the peak outlines

  • linewidth (float, default=2) – Width of the outline lines

  • annotate (bool, default=False) – If True, display peak labels on the plot

  • **kwargs – Additional keyword arguments passed to the plot() method:

    • vmin : int, default=30 Minimum intensity for the colormap

    • vmax : int, default=400 Maximum intensity for the colormap

    • width : int, default=6 Figure width in inches

    • height : int, default=6 Figure height in inches

Returns:

The axes object with the plotted spectrum and peak outlines

Return type:

matplotlib.pyplot.axes

Example

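>>> import ims
>>> import matplotlib.pyplot as plt
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.detect_peaks()
>>> sample.plot_thresholding(annotate=True)
>>> plt.show()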
classmethod read_csv(path)[source]

Reads generic csv files. The first row must contain the drift time values and the first column must contain the retention time values. The values in between are the intensity matrix. Uses the time when the file was created as the timestamp.

Parameters:

path (str) – Absolute or relative file path.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_csv("sample.csv")
>>> print(sample)
GC-IMS Spectrum: sample
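
The expected csv layout can be illustrated by parsing a tiny in-memory table with NumPy. This only demonstrates the layout and is not the library's reader. The empty top-left corner cell and the comma delimiter are assumptions.

```python
import io

import numpy as np

# First row: drift times, first column: retention times, rest: intensities
csv_text = """,5.0,5.1,5.2
0.0,10,11,12
0.5,20,21,22
"""

# The empty corner cell is read as nan and ignored below
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
drift_time = table[0, 1:]
ret_time = table[1:, 0]
values = table[1:, 1:]

print(values.shape, drift_time, ret_time)
```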
classmethod read_hdf5(path)[source]

Reads hdf5 files exported by the to_hdf5 method. Convenient way to store preprocessed spectra. Especially useful for larger datasets as preprocessing requires more time. Preferred to csv because of very fast read and write speeds.

Parameters:

path (str) – Absolute or relative file path.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.to_hdf5()
>>> sample = ims.Spectrum.read_hdf5("sample.hdf5")
classmethod read_mea(path)[source]

Reads mea files from G.A.S Dortmund instruments. Alternative constructor for ims.Spectrum class. Much faster than reading csv files and therefore preferred.

Parameters:

path (str) – Absolute or relative file path.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> print(sample)
GC-IMS Spectrum: sample
classmethod read_zip(path)[source]

Reads zipped csv and json files from G.A.S Dortmund mea2zip converting tool. Present for backwards compatibility. Reading mea files is much faster and saves the manual extra step of converting.

Parameters:

path (str) – Absolute or relative file path.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_zip("sample.zip")
>>> print(sample)
GC-IMS Spectrum: sample
resample(n=2)[source]

Resamples the spectrum by calculating the mean of every n rows. If the length of the retention time coordinate is not divisible by n, it and the data matrix are cropped by the remainder at the long end.

Parameters:

n (int, optional) – Number of rows to mean, by default 2.

Returns:

Resampled values.

Return type:

Spectrum

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> print(sample.shape)
(4082, 3150)
>>> sample.resample(2)
>>> print(sample.shape)
(2041, 3150)
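
The row-wise averaging and cropping can be sketched as follows. The helper resample_rows is hypothetical, and averaging the retention time coordinate in the same way as the data matrix is an assumption.

```python
import numpy as np


def resample_rows(values, ret_time, n=2):
    """Average every n consecutive rows, cropping any remainder at the end."""
    rows, cols = values.shape
    rows_cropped = rows - rows % n
    # Group rows into chunks of n and average within each chunk
    new_values = values[:rows_cropped].reshape(rows_cropped // n, n, cols).mean(axis=1)
    new_ret_time = ret_time[:rows_cropped].reshape(rows_cropped // n, n).mean(axis=1)
    return new_values, new_ret_time


values = np.arange(5 * 3, dtype=float).reshape(5, 3)
ret_time = np.arange(5, dtype=float)

new_values, new_ret_time = resample_rows(values, ret_time, n=2)
print(new_values.shape, new_ret_time)
```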
rip_scaling()[source]

Scales values relative to global maximum. Can be useful to directly compare spectra from instruments with different sensitivity.

Returns:

With scaled values.

Return type:

Spectrum

riprel()[source]

Replaces the drift time coordinate with RIP relative values. Useful for cutting away the RIP because its position is set to 1.

Does not interpolate the data matrix to a completely artificial axis like ims.Dataset.interp_riprel.

Returns:

RIP relative drift time coordinate otherwise unchanged.

Return type:

Spectrum
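
One way to picture the rescaling is to divide the drift time axis by the drift time of the RIP, so that the RIP sits at 1. In the sketch below the helper name is hypothetical, and locating the RIP as the drift time column with the largest summed intensity is an assumption.

```python
import numpy as np


def rip_relative_axis(values, drift_time):
    """Return the drift time axis divided by the RIP drift time."""
    # Assumption: the RIP is the column with the largest summed intensity
    rip_index = values.sum(axis=0).argmax()
    return drift_time / drift_time[rip_index]


drift_time = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
values = np.ones((3, 5))
values[:, 1] = 100.0  # dominant RIP column at a drift time of 4.0

relative = rip_relative_axis(values, drift_time)
print(relative)
```

Only the coordinate changes. The data matrix keeps its shape and values.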

savgol(window_length=10, polyorder=2, direction='both')[source]

Applies a Savitzky-Golay filter to the intensity values. Can be applied in the drift time direction, the retention time direction or both.

Parameters:
  • window_length (int, optional) – The length of the filter window, by default 10

  • polyorder (int, optional) – The order of the polynomial used to fit the samples, by default 2

  • direction (str, optional) – The direction in which to apply the filter. Can be ‘drift time’, ‘retention time’ or ‘both’. By default ‘both’

Return type:

Spectrum

property shape

Shape property of the data matrix. Equivalent to ims.Spectrum.values.shape.

sub_first_rows(n=1)[source]

Subtracts first n rows from every row in spectrum. Effective and simple baseline correction if RIP tailing is a concern but can hide small peaks.

Return type:

Spectrum
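
A minimal sketch of this correction on a plain NumPy matrix is shown below. The helper name is hypothetical, and using the mean of the first n rows as the baseline when n is larger than 1 is an assumption.

```python
import numpy as np


def subtract_first_rows(values, n=1):
    """Subtract the mean of the first n rows from every row."""
    baseline = values[:n].mean(axis=0)
    # Broadcasting subtracts the baseline row from each row of the matrix
    return values - baseline


values = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]])
corrected = subtract_first_rows(values, n=1)
print(corrected)
```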

to_hdf5(path=None)[source]

Exports spectrum as hdf5 file. Useful to save preprocessed spectra, especially for larger datasets. Preferred to csv format because of very fast read and write speeds.

Parameters:

path (str, optional) – Directory to export files to, by default the current working directory.

Example

>>> import ims
>>> sample = ims.Spectrum.read_mea("sample.mea")
>>> sample.to_hdf5()
>>> sample = ims.Spectrum.read_hdf5("sample.hdf5")
tophat(size=15)[source]

Applies white tophat filter on data matrix as a baseline correction. Size parameter is the diameter of the circular structuring element. (Slow with large size values.)

Parameters:

size (int, optional) – Size of structuring element, by default 15

Return type:

Spectrum

watershed_segmentation(threshold)[source]

Finds boundaries for overlapping peaks using watershed segmentation. Requires peak_table for starting coordinates.

Parameters:

threshold (int) – Threshold is used to binarize the intensity values to calculate the distances.

Returns:

Labels array with same shape as intensity values.

Return type:

numpy.ndarray

wavecompr(direction='ret_time', wavelet='db3', level=3)[source]

Data reduction by wavelet compression. Can be applied to the drift time axis, the retention time axis or both axes.

Parameters:
  • direction (str, optional) – The direction in which to apply the compression. Can be ‘ret_time’, ‘drift_time’ or ‘both’. By default ‘ret_time’.

  • wavelet (str, optional) – Wavelet object or name string, by default “db3”.

  • level (int, optional) – Decomposition level (must be >= 0), by default 3.

Return type:

Spectrum

Raises:

ValueError – When direction is not one of ‘ret_time’, ‘drift_time’ or ‘both’.