PCA_Model¶
- class ims.pca.PCA_Model(dataset, n_components=None, svd_solver='auto', **kwargs)[source]¶
Bases:
object
PCA_Model is a wrapper class around the scikit-learn PCA implementation and provides prebuilt plots for GC-IMS datasets.
See the original scikit-learn documentation for a detailed description: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
- Parameters
dataset (ims.Dataset) – The dataset is needed for the retention and drift time coordinates.
n_components (int or float, optional) – Number of components to keep. If None, all components are kept. By default None.
svd_solver (str, optional) – “auto”, “full”, “arpack” or “randomized” are valid, by default “auto”.
**kwargs (optional) – Additional keyword arguments are passed to the scikit-learn PCA. See the original documentation for valid parameters.
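The float form of n_components is easy to misread. In scikit-learn, a value between 0 and 1 is treated as a target fraction of explained variance, and the smallest number of components exceeding it is kept. The following is a minimal NumPy sketch of that selection rule, assuming PCA_Model forwards the value unchanged to scikit-learn. The helper name `n_components_for_variance` and the synthetic data are illustrative only and are not part of ims.

```python
import numpy as np


def n_components_for_variance(X, threshold):
    """Smallest number of PCs whose cumulative variance ratio exceeds threshold.

    Hypothetical helper for illustration; not part of ims or scikit-learn.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    ratio = s**2 / np.sum(s**2)                  # explained variance ratio per PC
    cumulative = np.cumsum(ratio)
    k = np.searchsorted(cumulative, threshold, side="right") + 1
    return int(min(k, len(s)))                   # never more PCs than available


# Synthetic data: three features with very different variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])

k_90 = n_components_for_variance(X, 0.9)
k_999 = n_components_for_variance(X, 0.999)
```

With an integer, n_components is simply the number of components kept, so no such lookup is involved.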
- scores¶
X with dimension reduction applied.
- Type
numpy.ndarray of shape (n_samples, n_components)
- loadings¶
PCA loadings already corrected when a scaling method was applied on the dataset.
- Type
numpy.ndarray of shape (n_components, n_features)
- explained_variance¶
The amount of variance explained by each component.
- Type
numpy.ndarray of shape (n_components,)
- explained_variance_ratio¶
Percentage of variance explained by each component.
- Type
numpy.ndarray of shape (n_components,)
- singular_values¶
The singular values corresponding to each component.
- Type
numpy.ndarray of shape (n_components,)
- mean¶
Per feature mean estimated from training data.
- Type
numpy.ndarray of shape (n_features,)
Example
>>> import ims
>>> ds = ims.Dataset.read_mea("IMS_data")
>>> X, _ = ds.get_xy()
>>> pca = ims.PCA_Model(ds, n_components=20)
>>> pca.fit(X)
>>> pca.plot()
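To make the attribute shapes listed above concrete, here is a minimal sketch that reproduces the same quantities with plain NumPy on synthetic data. It assumes the attributes mirror their scikit-learn counterparts (`components_`, `explained_variance_` and so on) and ignores the scaling correction that PCA_Model applies to the loadings. The variable names are chosen to match the attribute names and are not calls into ims.

```python
import numpy as np

# Synthetic stand-in for a flattened GC-IMS data matrix:
# 12 samples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))
n_components = 3
n_samples = X.shape[0]

# PCA via singular value decomposition of the centered data.
mean = X.mean(axis=0)                                  # (n_features,)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

loadings = Vt[:n_components]                           # (n_components, n_features)
scores = (X - mean) @ loadings.T                       # (n_samples, n_components)
singular_values = S[:n_components]                     # (n_components,)
explained_variance = singular_values**2 / (n_samples - 1)
total_variance = np.sum(S**2) / (n_samples - 1)
explained_variance_ratio = explained_variance / total_variance
```

The variance of each score column equals the corresponding entry of explained_variance, which is a quick sanity check on a fitted model.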
- Tsq_Q_plot(annotate=False)[source]¶
Plots T-squared values and Q residuals with 95 % confidence limits for outlier detection. The Q confidence limit is determined empirically; the T-squared limit is calculated using the F-distribution.
- Parameters
annotate (bool, optional) – Annotates markers with sample names when True, by default False.
- Return type
matplotlib.pyplot.axes
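The two statistics shown by this plot can be sketched as follows. This is a NumPy illustration under stated assumptions, not the code used by ims: T-squared is taken as the sum of squared scores divided by the variance of each component, Q as the squared reconstruction error per sample, and the empirical Q limit as the 95th percentile of the Q values. The F-distribution limit for T-squared is omitted here because its exact form in ims is not given on this page.

```python
import numpy as np

# Synthetic data: 30 samples, 40 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 40))
n_samples = X.shape[0]
a = 2                                        # number of retained components

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:a].T                                 # loadings, (n_features, a)
T = Xc @ P                                   # scores, (n_samples, a)
eigvals = S[:a]**2 / (n_samples - 1)         # variance of each score column

# Hotelling's T-squared: distance from the model center inside the PC space.
t_squared = np.sum(T**2 / eigvals, axis=1)

# Q residuals: squared distance of each sample from the PC space.
residuals = Xc - T @ P.T
q_residuals = np.sum(residuals**2, axis=1)

# Assumption: "empirical" limit taken as the 95th percentile of Q.
q_limit = np.percentile(q_residuals, 95)
```

Samples with a large T-squared are extreme within the model, while samples with a large Q are poorly described by it, which is why both are plotted together.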
- fit(X_train)[source]¶
Fit the PCA model with training data.
- Parameters
X_train (numpy.ndarray of shape (n_samples, n_features)) – The training data.
- Returns
The fitted model.
- Return type
self
- plot(PC_x=1, PC_y=2, annotate=False)[source]¶
Scatter plot of selected principal components.
- Parameters
PC_x (int, optional) – PC x axis, by default 1.
PC_y (int, optional) – PC y axis, by default 2.
annotate (bool, optional) – Labels data points with sample names when True, by default False.
- Return type
matplotlib.pyplot.axes
- plot_loadings(PC=1, color_range=0.1, width=6, height=6)[source]¶
Plots loadings of a principal component with the original retention and drift time coordinates.
- Parameters
PC (int, optional) – principal component, by default 1.
color_range (int or float, optional) – The color scale ranges from -color_range to +color_range, centered at 0, by default 0.1.
width (int or float, optional) – plot width in inches, by default 6.
height (int or float, optional) – plot height in inches, by default 6.
- Return type
matplotlib.pyplot.axes
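The idea behind this plot can be sketched with NumPy: one row of the loadings matrix is folded back onto the two-dimensional retention and drift time grid and displayed on a symmetric color scale. The grid size, the row-major feature ordering and the random loadings below are assumptions made for illustration; the actual layout is defined by the ims.Dataset.

```python
import numpy as np

# Hypothetical grid: 4 retention time points x 5 drift time points.
n_ret, n_drift = 4, 5
rng = np.random.default_rng(2)
loadings = rng.normal(scale=0.1, size=(3, n_ret * n_drift))  # (n_components, n_features)

PC = 1                    # 1-based component index, as in plot_loadings
color_range = 0.1

# Fold the flat loading vector back onto the 2D grid
# (assumes features were flattened in row-major order).
loading_map = loadings[PC - 1].reshape(n_ret, n_drift)

# Symmetric color limits centered at 0.
vmin, vmax = -color_range, color_range
clipped = np.clip(loading_map, vmin, vmax)   # values as the color scale would saturate them
```

An array like loading_map could then be passed to matplotlib's imshow with the vmin and vmax arguments and a diverging colormap, so that positive and negative loadings are visually distinct.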