PCA_Model

class ims.pca.PCA_Model(dataset, n_components=None, svd_solver='auto', **kwargs)

Bases: object

PCA_Model is a wrapper class around the scikit-learn PCA implementation and provides prebuilt plots for GC-IMS datasets.

See the original scikit-learn documentation for a detailed description: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

Parameters

dataset (ims.Dataset) – The dataset that provides the retention and drift time coordinates used by the plots.
n_components (int or float, optional) – Number of components to keep. If None, all components are kept. As in scikit-learn, a float between 0 and 1 selects the number of components needed to explain that fraction of the variance (see the scikit-learn documentation for the solvers that support this). By default None.
svd_solver (str, optional) – “auto”, “full”, “arpack” or “randomized” are valid, by default “auto”.
**kwargs (optional) – Additional keyword arguments are passed to the scikit-learn PCA. See the original documentation for valid parameters.
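To illustrate the wrapper pattern, here is a minimal sketch of how constructor arguments could be forwarded to sklearn.decomposition.PCA and how the fitted scikit-learn attributes map to the attributes documented below. The class name MiniPCAModel is hypothetical: it ignores the dataset argument, omits the scaling correction of the loadings, and is not the actual ims implementation.

```python
import numpy as np
from sklearn.decomposition import PCA


class MiniPCAModel:
    """Hypothetical, stripped-down wrapper illustrating the delegation pattern."""

    def __init__(self, n_components=None, svd_solver="auto", **kwargs):
        # Constructor arguments and extra keywords are forwarded unchanged
        # to scikit-learn's PCA.
        self._sk_pca = PCA(n_components=n_components, svd_solver=svd_solver, **kwargs)

    def fit(self, X_train):
        self._sk_pca.fit(X_train)
        # Copy the fitted scikit-learn attributes onto the wrapper,
        # dropping the trailing underscore scikit-learn uses.
        self.scores = self._sk_pca.transform(X_train)
        self.loadings = self._sk_pca.components_  # no scaling correction here
        self.explained_variance = self._sk_pca.explained_variance_
        self.explained_variance_ratio = self._sk_pca.explained_variance_ratio_
        self.singular_values = self._sk_pca.singular_values_
        self.mean = self._sk_pca.mean_
        return self  # returning self allows chaining, e.g. MiniPCAModel().fit(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))  # 30 samples, 50 features
model = MiniPCAModel(n_components=5, random_state=0).fit(X)
print(model.scores.shape, model.loadings.shape)  # (30, 5) (5, 50)
```

Here random_state is simply an example of a keyword passed through **kwargs to scikit-learn.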

scores

The training data X projected onto the principal components.

Type: numpy.ndarray of shape (n_samples, n_components)
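As a rough illustration of what the scores are, the NumPy sketch below centers the data with the per-feature mean, obtains the principal axes from an SVD and projects the centered data onto them. It uses random data and plain NumPy rather than the class itself; scikit-learn additionally applies a sign convention to the components, so individual columns may differ in sign from this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))  # 20 samples, 8 features
n_components = 3

# Center the data with the per-feature mean (the `mean` attribute).
mean = X.mean(axis=0)
Xc = X - mean

# Principal axes from the SVD of the centered data; the rows of Vt
# play the role of the loadings (one row per component).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:n_components]  # shape (n_components, n_features)

# Scores: centered data projected onto the loadings.
scores = Xc @ loadings.T  # shape (n_samples, n_components)
print(scores.shape)  # (20, 3)
```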

loadings

PCA loadings, already corrected for any scaling method applied to the dataset.

Type: numpy.ndarray of shape (n_components, n_features)

explained_variance

The amount of variance explained by each component.

Type: numpy.ndarray of shape (n_components,)

explained_variance_ratio

Fraction of the total variance explained by each component (values between 0 and 1).

Type: numpy.ndarray of shape (n_components,)

singular_values

The singular values corresponding to each component.

Type: numpy.ndarray of shape (n_components,)
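The three variance-related attributes are tied together. The sketch below follows the definitions used by scikit-learn's PCA: the explained variance is the squared singular value divided by n_samples - 1, and the ratio divides it by the total variance of the data. It is plain NumPy on random data, not a call into ims.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 6))
n_samples = X.shape[0]

Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)

# Variance explained by each component, using the unbiased (n - 1) estimator.
explained_variance = singular_values**2 / (n_samples - 1)

# Ratio relative to the total variance of the centered data.
total_variance = Xc.var(axis=0, ddof=1).sum()
explained_variance_ratio = explained_variance / total_variance

print(round(explained_variance_ratio.sum(), 6))  # 1.0 when all components are kept
```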

mean

Per-feature mean estimated from the training data.

Type: numpy.ndarray of shape (n_features,)

Example

>>> import ims
>>> ds = ims.Dataset.read_mea("IMS_data")
>>> X, _ = ds.get_xy()
>>> pca = ims.PCA_Model(ds, n_components=20)
>>> pca.fit(X)
>>> pca.plot()

Tsq_Q_plot(annotate=False)

Plots Hotelling's T² values and Q residuals with 95 % confidence limits for outlier detection. The Q confidence limit is determined empirically, and the T² limit is calculated from the F distribution.

Parameters: annotate (bool, optional) – Annotates markers with sample names when True, by default False.
Return type: matplotlib.axes.Axes
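The quantities behind this plot can be sketched with NumPy and SciPy. This assumes the common calibration-set formulas: T² as the sum of squared scores divided by each component's variance, Q as the squared reconstruction error, a T² limit of k(n - 1)/(n - k) · F(0.95; k, n - k), and an empirical Q limit taken as the 95th percentile of the Q values. The exact formulas used by the method may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
n, k = X.shape[0], 3  # samples, retained components

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T  # loadings, shape (n_features, k)
T = Xc @ P  # scores, shape (n_samples, k)
eigvals = s[:k] ** 2 / (n - 1)  # variance of each score column

# Hotelling's T²: squared scores scaled by the variance of each component.
t_sq = np.sum(T**2 / eigvals, axis=1)

# Q residuals (squared prediction error): the part the k components miss.
residuals = Xc - T @ P.T
q = np.sum(residuals**2, axis=1)

# 95 % limits: T² from the F distribution, Q empirically (95th percentile).
t_sq_limit = k * (n - 1) / (n - k) * stats.f.ppf(0.95, k, n - k)
q_limit = np.percentile(q, 95)

# Indices of samples outside either limit.
outliers = np.where((t_sq > t_sq_limit) | (q > q_limit))[0]
print(outliers)
```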

fit(X_train)

Fit the PCA model with training data.

Parameters: X_train (numpy.ndarray of shape (n_samples, n_features)) – The training data.
Returns: The fitted model (self).
Return type: PCA_Model

plot(PC_x=1, PC_y=2, annotate=False)

Scatter plot of selected principal components.

Parameters

PC_x (int, optional) – Principal component shown on the x axis (1-based), by default 1.
PC_y (int, optional) – Principal component shown on the y axis (1-based), by default 2.
annotate (bool, optional) – Labels data points with sample names when True, by default False.

Return type

matplotlib.axes.Axes
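Because PC_x and PC_y count from 1, PC 1 corresponds to column 0 of the scores. The helper select_pcs below is hypothetical and only sketches that index shift plus one possible way of labelling the axes with the explained variance; the labels drawn by plot may look different.

```python
import numpy as np


def select_pcs(scores, explained_variance_ratio, PC_x=1, PC_y=2):
    """Hypothetical helper: pick two score columns using 1-based PC numbers."""
    x = scores[:, PC_x - 1]  # PC 1 is column 0
    y = scores[:, PC_y - 1]
    # Axis labels with the percentage of variance explained by each PC.
    xlabel = f"PC {PC_x} ({explained_variance_ratio[PC_x - 1] * 100:.1f} %)"
    ylabel = f"PC {PC_y} ({explained_variance_ratio[PC_y - 1] * 100:.1f} %)"
    return x, y, xlabel, ylabel


scores = np.arange(12.0).reshape(4, 3)  # 4 samples, 3 components
ratio = np.array([0.5, 0.3, 0.2])
x, y, xlabel, ylabel = select_pcs(scores, ratio, PC_x=1, PC_y=3)
print(xlabel, "|", ylabel)  # PC 1 (50.0 %) | PC 3 (20.0 %)
# With matplotlib: fig, ax = plt.subplots(); ax.scatter(x, y); ax.set_xlabel(xlabel)
```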

plot_loadings(PC=1, color_range=0.1, width=6, height=6)

Plots the loadings of a principal component with the original retention and drift time coordinates.

Parameters

PC (int, optional) – Principal component to plot, by default 1.
color_range (float, optional) – The color scale ranges from -color_range to +color_range, centered at 0, by default 0.1.
width (int or float, optional) – Plot width in inches, by default 6.
height (int or float, optional) – Plot height in inches, by default 6.

Return type

matplotlib.axes.Axes
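Plotting loadings against retention and drift time requires folding each loading vector back into a two-dimensional map. The sketch below assumes that every sample was flattened row-major from an array of shape (n_retention_times, n_drift_times); that layout and the helper name loading_image are assumptions for illustration, not the ims implementation.

```python
import numpy as np


def loading_image(loadings, PC, shape, color_range=0.1):
    """Hypothetical helper: turn one loading vector back into a 2D map.

    Assumes each sample was flattened row-major from an array of
    shape (n_retention_times, n_drift_times).
    """
    image = loadings[PC - 1].reshape(shape)  # PC is 1-based
    # Symmetric color limits centered at 0, as described for color_range.
    vmin, vmax = -color_range, color_range
    return image, vmin, vmax


n_ret, n_drift = 4, 5
loadings = np.random.default_rng(4).normal(scale=0.05, size=(2, n_ret * n_drift))
image, vmin, vmax = loading_image(loadings, PC=1, shape=(n_ret, n_drift))
print(image.shape, vmin, vmax)  # (4, 5) -0.1 0.1
# With matplotlib, a diverging colormap makes the sign visible:
# plt.imshow(image, cmap="RdBu_r", vmin=vmin, vmax=vmax, aspect="auto")
```

Symmetric limits keep zero at the center of a diverging colormap, so positive and negative loadings of equal magnitude get equally strong colors.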

scree_plot()

Plots the explained variance ratio of each principal component together with the cumulative explained variance ratio.

Return type: matplotlib.axes.Axes
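The cumulative curve in a scree plot is the running sum of explained_variance_ratio. The sketch below uses made-up ratios and adds one common way of reading the curve, the smallest number of components that reaches a chosen threshold; the 90 % threshold is an arbitrary example, not something scree_plot computes.

```python
import numpy as np

# Made-up example ratios, sorted in decreasing order as PCA returns them.
explained_variance_ratio = np.array([0.45, 0.25, 0.15, 0.10, 0.05])

# Cumulative curve shown alongside the per-component values.
cumulative = np.cumsum(explained_variance_ratio)

# Smallest number of components reaching at least 90 % explained variance.
n_needed = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_needed)  # 4
```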