PCA_Model

class ims.pca.PCA_Model(dataset, n_components=None, svd_solver='auto', **kwargs)

Bases: object

PCA_Model is a wrapper class around the scikit-learn PCA implementation and provides prebuilt plots for GC-IMS datasets.

See the original scikit-learn documentation for a detailed description: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

Parameters
  • dataset (ims.Dataset) – The dataset is needed for the retention and drift time coordinates.

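  • n_components (int or float, optional) – Number of components to keep. If None, all components are kept. A float between 0 and 1 keeps as many components as are needed to explain more than that fraction of the variance. By default None.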

  • svd_solver (str, optional) – “auto”, “full”, “arpack” or “randomized” are valid, by default “auto”.

  • **kwargs (optional) – Additional keyword arguments are passed to the scikit-learn PCA. See the original documentation for valid parameters.
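When n_components is a float between 0 and 1, scikit-learn keeps the smallest number of components whose cumulative explained variance ratio is strictly greater than that fraction. The hypothetical helper below sketches that selection rule with NumPy; it is not part of ims and assumes the explained variance ratios of all components are already known.

```python
import numpy as np


def n_components_for_fraction(explained_variance_ratio, fraction):
    """Smallest number of components whose cumulative explained variance
    ratio is strictly greater than `fraction` (0 < fraction < 1).

    Hypothetical helper sketching scikit-learn's rule for a float
    n_components; not an ims or scikit-learn function.
    """
    cumulative = np.cumsum(explained_variance_ratio)
    # side="right" skips entries equal to `fraction`, so the chosen count
    # explains strictly more than the requested share of the variance.
    return int(np.searchsorted(cumulative, fraction, side="right") + 1)


ratios = np.array([0.5, 0.25, 0.125, 0.125])  # cumulative: 0.5, 0.75, 0.875, 1.0
print(n_components_for_fraction(ratios, 0.7))   # 2 components reach 0.75
print(n_components_for_fraction(ratios, 0.75))  # 0.75 is not "more than", so 3
```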

scores

X projected onto the principal components, i.e. X with dimensionality reduction applied.

Type

numpy.ndarray of shape (n_samples, n_components)

loadings

PCA loadings, already corrected if a scaling method was applied to the dataset.

Type

numpy.ndarray of shape (n_components, n_features)

explainded_variance

The amount of variance explained by each component.

Type

numpy.ndarray of shape (n_components,)

explained_variance_ratio

Fraction of the total variance explained by each component.

Type

numpy.ndarray of shape (n_components,)

singular_values

The singular values corresponding to each component.

Type

numpy.ndarray of shape (n_components,)

mean

Per-feature mean estimated from the training data.

Type

numpy.ndarray of shape (n_features,)
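The attributes above correspond to a singular value decomposition of the mean-centered training data. The sketch below reproduces them with NumPy on random data; variable names mirror the attribute names, but nothing here calls ims. It assumes no scaling was applied (so no loading correction is needed), and signs of loadings and scores may differ from scikit-learn, which flips component signs for deterministic output.

```python
import numpy as np

# Synthetic stand-in for the feature matrix returned by ds.get_xy().
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
n_samples = X.shape[0]
n_components = 3

# `mean`: per-feature mean estimated from the training data.
mean = X.mean(axis=0)
X_centered = X - mean

# Thin SVD of the centered data: X_centered = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# `loadings`: principal axes, one row per component.
loadings = Vt[:n_components]                          # (n_components, n_features)
# `scores`: centered data projected onto the kept components.
scores = X_centered @ loadings.T                      # (n_samples, n_components)
# `singular_values`: the singular values of the kept components.
singular_values = S[:n_components]                    # (n_components,)
# `explained_variance`: variance of each score column (ddof=1).
explained_variance = singular_values ** 2 / (n_samples - 1)
# `explained_variance_ratio`: share of the total variance per component.
explained_variance_ratio = singular_values ** 2 / np.sum(S ** 2)
```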

Example

>>> import ims
>>> ds = ims.Dataset.read_mea("IMS_data")
>>> X, _ = ds.get_xy()
>>> pca = ims.PCA_Model(ds, n_components=20)
>>> pca.fit(X)
>>> pca.plot()

Tsq_Q_plot(annotate=False)

Plots Hotelling's T² values and Q residuals with 95 % confidence limits for outlier detection. The Q confidence limit is determined empirically; the T² limit is calculated from the F distribution.

Parameters

annotate (bool, optional) – Annotates markers with sample names when True, by default False.

Return type

matplotlib.axes.Axes
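A sketch of the two statistics behind Tsq_Q_plot, written with NumPy and SciPy on random data. The description above does not give the exact limit formulas, so two choices here are assumptions: the T² limit uses the common form k(n-1)/(n-k) · F(k, n-k), and the empirical Q limit is the 95th percentile of the training Q values. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def hotelling_t2(scores, explained_variance):
    """Hotelling's T² per sample: squared scores weighted by 1 / component variance."""
    return np.sum(scores ** 2 / explained_variance, axis=1)


def q_residuals(X, scores, loadings, mean):
    """Q residual (squared prediction error) of each sample."""
    X_hat = scores @ loadings + mean  # reconstruction from the kept components
    return np.sum((X - X_hat) ** 2, axis=1)


def t2_limit(n_samples, n_components, confidence=0.95):
    """T² limit from the F distribution (one common textbook form; assumed)."""
    k, n = n_components, n_samples
    return k * (n - 1) / (n - k) * stats.f.ppf(confidence, k, n - k)


def q_limit_empirical(q, confidence=0.95):
    """Empirical Q limit: the given quantile of the observed Q values (assumed)."""
    return np.quantile(q, confidence)


# Synthetic training data and a 2-component PCA via SVD.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
mean = X.mean(axis=0)
_, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 2
loadings = Vt[:k]
scores = (X - mean) @ loadings.T
explained_variance = S[:k] ** 2 / (X.shape[0] - 1)

t2 = hotelling_t2(scores, explained_variance)
q = q_residuals(X, scores, loadings, mean)
# Samples above either limit are outlier candidates.
outliers = (t2 > t2_limit(len(X), k)) | (q > q_limit_empirical(q))
```

Samples far along the kept components show up in T², while samples poorly described by those components show up in Q, which is why the two are plotted against each other.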

fit(X_train)

Fit the PCA model with training data.

Parameters

X_train (numpy.ndarray of shape (n_samples, n_features)) – The training data.

Returns

The fitted model.

Return type

self

plot(PC_x=1, PC_y=2, annotate=False)

Scatter plot of selected principal components.

Parameters
  • PC_x (int, optional) – Principal component shown on the x axis, by default 1.

  • PC_y (int, optional) – Principal component shown on the y axis, by default 2.

  • annotate (bool, optional) – Labels data points with their sample names when True, by default False.

Return type

matplotlib.axes.Axes
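plot() numbers components from 1, so PC_x=1 refers to the first column of scores. The hypothetical helper below sketches that selection; the axis label format with the explained variance percentage is an assumption, not necessarily what ims draws.

```python
import numpy as np


def select_pcs(scores, explained_variance_ratio, PC_x=1, PC_y=2):
    """Return coordinates and axis labels for a scatter plot of two PCs.

    Hypothetical helper: PC numbers are 1-based, as in plot(), so PC 1
    is column 0 of the scores matrix.
    """
    if min(PC_x, PC_y) < 1:
        # A 0 or negative index would silently wrap around in NumPy.
        raise ValueError("principal components are numbered from 1")
    ix, iy = PC_x - 1, PC_y - 1
    x, y = scores[:, ix], scores[:, iy]
    xlabel = f"PC {PC_x} ({100 * explained_variance_ratio[ix]:.1f} %)"
    ylabel = f"PC {PC_y} ({100 * explained_variance_ratio[iy]:.1f} %)"
    return x, y, xlabel, ylabel


scores = np.arange(12.0).reshape(4, 3)
ratio = np.array([0.5, 0.3, 0.2])
x, y, xlabel, ylabel = select_pcs(scores, ratio, PC_x=1, PC_y=3)
```

The returned arrays can be passed to matplotlib's ax.scatter(x, y) and the labels to ax.set_xlabel and ax.set_ylabel. The explicit check rejects 0 because NumPy would otherwise read index -1 as the last component without complaint.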

plot_loadings(PC=1, color_range=0.1, width=6, height=6)

Plots the loadings of a principal component on the original retention and drift time coordinates.

Parameters
  • PC (int, optional) – Principal component, by default 1.

  • color_range (float, optional) – The color scale ranges from -color_range to +color_range, centered at 0, by default 0.1.

  • width (int or float, optional) – Plot width in inches, by default 6.

  • height (int or float, optional) – Plot height in inches, by default 6.

Return type

matplotlib.axes.Axes
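plot_loadings maps each loading back to its position in the retention time by drift time grid and colors it on a scale symmetric around zero. The sketch below assumes, hypothetically, that each spectrum was flattened row-major from a (retention time, drift time) array before fitting; loading_map is an illustrative name, not an ims function.

```python
import numpy as np


def loading_map(loadings, PC, n_ret, n_drift, color_range=0.1):
    """Reshape one component's loadings onto the retention x drift time grid.

    Assumes (hypothetically) that each spectrum was flattened row-major from
    a (retention time, drift time) array before fitting.
    """
    grid = np.asarray(loadings)[PC - 1].reshape(n_ret, n_drift)
    # Symmetric limits put zero at the centre of a diverging colormap,
    # so positive and negative loadings get opposite colors.
    vmin, vmax = -color_range, color_range
    return grid, vmin, vmax


loadings = np.arange(12.0).reshape(2, 6) / 100  # 2 components, 6 features
grid, vmin, vmax = loading_map(loadings, PC=2, n_ret=2, n_drift=3)
```

The grid can then be drawn with ax.imshow(grid, cmap="RdBu_r", vmin=vmin, vmax=vmax, aspect="auto"); loadings outside ±color_range saturate at the ends of the colormap.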

scree_plot()

Plots the explained variance ratio of each principal component together with the cumulative explained variance ratio.

Return type

matplotlib.axes.Axes
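The values behind a scree plot are the per-component explained variance ratio and its running sum. A minimal NumPy sketch, assuming the ratios are shown as percentages; the scree_values helper is hypothetical and not part of ims.

```python
import numpy as np


def scree_values(explained_variance_ratio):
    """Per-component and cumulative explained variance, in percent."""
    per_component = 100 * np.asarray(explained_variance_ratio)
    cumulative = np.cumsum(per_component)
    components = np.arange(1, per_component.size + 1)  # 1-based PC numbers
    return components, per_component, cumulative


components, per_pc, cum = scree_values([0.5, 0.25, 0.125, 0.125])
```

The per-component values are usually drawn as bars and the cumulative values as a line on the same axes, which makes it easy to read off how many components reach a target share of the variance.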