PLS_DA

class ims.plsda.PLS_DA(dataset, n_components=2, **kwargs)

Bases: object

PLS-DA classifier built using the scikit-learn PLSRegression implementation. Provides prebuilt plots and feature selection via variable importance in projection (VIP) scores.

See the scikit-learn documentation for more details: https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html

Parameters

dataset (ims.Dataset) – Needed for the retention and drift time coordinates in the plots.
n_components (int, optional) – Number of components to keep, by default 2.
kwargs (optional) – Additional keyword arguments are passed on to the scikit-learn PLSRegression.

x_scores

X scores.

Type: numpy.ndarray of shape (n_samples, n_components)

y_scores

y scores.

Type: numpy.ndarray of shape (n_samples, n_components)

x_weights

The left singular vectors of the cross-covariance matrices of each iteration.

Type: numpy.ndarray of shape (n_features, n_components)

y_weights

The right singular vectors of the cross-covariance matrices of each iteration.

Type: numpy.ndarray of shape (n_targets, n_components)

x_loadings

The loadings of X. If scaling was applied to the dataset, the loadings are corrected using the weights.

Type: numpy.ndarray of shape (n_features, n_components)

y_loadings

The loadings of y.

Type: numpy.ndarray of shape (n_targets, n_components)

coefficients

The coefficients of the linear model.

Type: numpy.ndarray of shape (n_features, n_targets)

vip_scores

Variable importance in projection (VIP) scores.

Type: numpy.ndarray of shape (n_features,)

y_pred_train

Stores the predicted values from the training data for the plot method.

Type: numpy.ndarray
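
The VIP scores listed above can be derived from the score, weight and loading arrays. The following is a minimal sketch of one common VIP formulation, not necessarily the exact computation used by ims. The function name vip_from_arrays is hypothetical, and the random arrays only stand in for the x_scores, x_weights and y_loadings attributes of a fitted model.

```python
import numpy as np


def vip_from_arrays(x_scores, x_weights, y_loadings):
    """One common VIP formulation (assumed, not taken from the ims source)."""
    n_features = x_weights.shape[0]
    # Sum of squares of y explained by each component, shape (n_components,)
    ss = np.sum(x_scores**2, axis=0) * np.sum(y_loadings**2, axis=0)
    # Squared weights, normalized so that each component's column sums to 1
    w_sq = (x_weights / np.linalg.norm(x_weights, axis=0)) ** 2
    return np.sqrt(n_features * (w_sq @ ss) / ss.sum())


# Random stand-ins for model.x_scores, model.x_weights and model.y_loadings
rng = np.random.default_rng(0)
n_samples, n_features, n_targets, n_components = 20, 50, 3, 2
vips = vip_from_arrays(
    rng.normal(size=(n_samples, n_components)),
    rng.normal(size=(n_features, n_components)),
    rng.normal(size=(n_targets, n_components)),
)
print(vips.shape)
```

With this formulation the mean of the squared VIP scores is 1, which is why a threshold of 1 is a frequent choice for feature selection.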

Example

>>> import ims
>>> ds = ims.Dataset.read_mea("IMS_data")
>>> X_train, X_test, y_train, y_test = ds.train_test_split()
>>> model = ims.PLS_DA(ds, n_components=5)
>>> model.fit(X_train, y_train)
>>> model.predict(X_test)
>>> model.plot()

fit(X_train, y_train)

Fits the model with training data. Converts the labels into a binary matrix.

Parameters

X_train (numpy.ndarray of shape (n_samples, n_features)) – Training vectors with features.
y_train (numpy.ndarray of shape (n_samples,)) – True class labels for training data.

Returns

Fitted model.

Return type

self
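
The conversion of class labels into a binary matrix can be sketched with NumPy alone. This illustrates the general technique (one-hot encoding) on made-up labels; the variable names are hypothetical and the ims implementation may differ.

```python
import numpy as np

# Made-up class labels for four samples
y_train = np.array(["A", "B", "A", "C"])

# Unique class names and, for each sample, the index of its class
groups, class_idx = np.unique(y_train, return_inverse=True)

# One row per sample, one column per class; a 1 marks the sample's class
y_binary = np.eye(len(groups), dtype=int)[class_idx]
print(y_binary)
```

Each column of the binary matrix then acts as one target for the underlying PLS regression.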

plot(x_comp=1, y_comp=2, annotate=False)

Plots PLS components as a scatter plot.

Parameters

x_comp (int, optional) – Component x axis, by default 1.
y_comp (int, optional) – Component y axis, by default 2.
annotate (bool, optional) – If True, adds sample names to the markers, by default False.

Return type

matplotlib.pyplot.axes

plot_coefficients(group=0, width=6, height=6)

Plots PLS coefficients of the selected group as an image with retention and drift time axes.

Parameters

group (int or str, optional) – Index or name of group, by default 0.
width (int or float, optional) – Width of the plot in inches, by default 6.
height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes

plot_loadings(component=1, color_range=0.02, width=6, height=6)

Plots PLS x loadings as an image with retention and drift time coordinates.

Parameters

component (int, optional) – Component to plot, by default 1.
color_range (float, optional) – Minimum and maximum of the color range, to adjust for different scaling methods, by default 0.02.
width (int or float, optional) – Width of the plot in inches, by default 6.
height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes

plot_vip_scores(threshold=None, width=6, height=6)

Plots VIP scores as an image with retention and drift time axes.

Parameters

threshold (int, optional) – If set, only VIP scores above the threshold are plotted and values below it are displayed as 0, by default None.
width (int or float, optional) – Width of the plot in inches, by default 6.
height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes
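
The effect of the threshold parameter can be illustrated with plain NumPy. This is a sketch under assumptions: the VIP values and the 2 x 3 image shape are made up, and how ims treats values exactly equal to the threshold is not specified here.

```python
import numpy as np

# Made-up VIP scores for a 2 x 3 grid (retention time x drift time)
vip_scores = np.array([0.5, 1.2, 0.9, 2.0, 1.4, 0.1])
threshold = 1

# Keep scores above the threshold, display everything else as 0
masked = np.where(vip_scores > threshold, vip_scores, 0)

# Reshape the flat feature vector back into image coordinates
image = masked.reshape(2, 3)
print(image)
```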

predict(X_test)

Predicts class labels for test data. Converts the predicted binary label matrix back to class names. To compute the accuracy on labeled test data, use score.

Parameters

X_test (numpy.ndarray of shape (n_samples, n_features)) – Feature vectors of test dataset.

Returns

Predicted class labels.

Return type

numpy.ndarray of shape (n_samples,)
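
Converting a predicted label matrix back to class names is commonly done by picking the column with the largest value in each row. The sketch below assumes this argmax approach; the arrays and variable names are made up and the ims implementation may differ.

```python
import numpy as np

# Class names in the column order of the binary label matrix
groups = np.array(["A", "B", "C"])

# Made-up continuous predictions from a PLS regression, one row per sample
y_pred_matrix = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [-0.1, 0.9, 0.2],
])

# The column with the highest predicted value decides the class
y_pred = groups[np.argmax(y_pred_matrix, axis=1)]
print(y_pred)
```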

score(X_test, y_test, sample_weight=None)

Calculates accuracy score for predicted data.

Parameters

X_test (numpy.ndarray of shape (n_samples, n_features)) – Feature vectors of the test data.
y_test (numpy.ndarray of shape (n_samples,)) – True classification labels.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights, by default None.

Returns

score – Mean accuracy score.

Return type

float
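
Mean accuracy is the fraction of correctly predicted labels, optionally weighted per sample. A minimal NumPy sketch with made-up labels and weights follows; it illustrates the metric only and is not the ims code.

```python
import numpy as np

y_test = np.array(["A", "B", "C", "A"])
y_pred = np.array(["A", "B", "B", "A"])

# Unweighted mean accuracy: fraction of correct predictions
accuracy = np.mean(y_pred == y_test)

# Weighted variant: each sample counts in proportion to its weight
sample_weight = np.array([1, 1, 2, 0])
weighted_accuracy = np.average(y_pred == y_test, weights=sample_weight)
print(accuracy, weighted_accuracy)
```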

transform(X, y=None)

Apply the dimensionality reduction.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Feature matrix.
y (numpy.ndarray of shape (n_samples, n_targets), optional) – Dependent variables, by default None.

Returns

X_scores

Return type

tuple