PLSR

class ims.plsr.PLSR(dataset, n_components=2, **kwargs)[source]

Bases: object

Applies a scikit-learn PLSRegression to GC-IMS data and provides prebuilt plots as well as a feature selection via variable importance in projection (VIP) scores.

See the scikit-learn documentation for more details: https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html

Parameters:
  • dataset (ims.Dataset) – Needed for the retention and drift time coordinates in the plots.

  • n_components (int, optional) – Number of components to keep, by default 2.

  • kwargs (optional) – Additional keyword arguments are passed on to the scikit-learn PLSRegression.

x_scores

X scores.

Type:

numpy.ndarray of shape (n_samples, n_components)

y_scores

y scores.

Type:

numpy.ndarray of shape (n_samples, n_components)

x_weights

The left singular vectors of the cross-covariance matrices of each iteration.

Type:

numpy.ndarray of shape (n_features, n_components)

y_weights

The right singular vectors of the cross-covariance matrices of each iteration.

Type:

numpy.ndarray of shape (n_targets, n_components)

x_loadings

The loadings of X. If scaling was applied to the dataset, the loadings are corrected using the weights.

Type:

numpy.ndarray of shape (n_features, n_components)

y_loadings

The loadings of y.

Type:

numpy.ndarray of shape (n_targets, n_components)

coefficients

The coefficients of the linear model.

Type:

numpy.ndarray of shape (n_features, n_targets)

y_pred_train

Stores the predicted values from the training data for the plot method.

Type:

numpy.ndarray

Example

>>> import ims
>>> import pandas as pd
>>> ds = ims.Dataset.read_mea("IMS_data")
>>> responses = pd.read_csv("responses.csv")
>>> ds.labels = responses
>>> X_train, X_test, y_train, y_test = ds.train_test_split()
>>> model = ims.PLSR(ds, n_components=5)
>>> model.fit(X_train, y_train)
>>> model.predict(X_test, y_test)
>>> model.plot()

fit(X_train, y_train)[source]

Fits the model with training data.

Parameters:
  • X_train (numpy.ndarray of shape (n_samples, n_features)) – Training vectors with features.

  • y_train (numpy.ndarray of shape (n_samples, n_targets)) – Target vectors with response variables.

Returns:

Fitted model.

Return type:

self

plot(annotate=False)[source]

Plots predicted vs. actual values and shows the regression line. It is recommended to call predict with test data first.

Parameters:
  • annotate (bool, optional) – If True, annotates the plot with sample names, by default False.

Return type:

matplotlib.pyplot.axes

plot_coefficients(width=6, height=6)[source]

Plots PLS coefficients as an image with retention and drift time axes.

Parameters:
  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type:

matplotlib.pyplot.axes

plot_loadings(component=1, color_range=0.01, width=6, height=6)[source]

Plots PLS x loadings as image with retention and drift time coordinates.

Parameters:
  • component (int, optional) – Component to plot, by default 1.

  • color_range (float, optional) – Minimum and maximum to adjust to different scaling methods, by default 0.01.

  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type:

matplotlib.pyplot.axes
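
Because each sample is presumably flattened into a single feature vector before fitting, a column of x_loadings has to be reshaped back to the two-dimensional retention time by drift time grid before it can be shown as an image. The following is only a sketch of that idea with NumPy. The grid size, the random loadings, and the symmetric use of color_range as colormap limits are assumptions and are not taken from the ims source.

```python
import numpy as np

n_ret, n_drift, n_components = 4, 6, 2  # hypothetical grid size
rng = np.random.default_rng(1)

# Stand-in for x_loadings with shape (n_features, n_components).
x_loadings = rng.normal(scale=0.01, size=(n_ret * n_drift, n_components))

component = 1       # 1-based, as in plot_loadings
color_range = 0.01

# Select the column for the requested component and restore the 2D layout.
image = x_loadings[:, component - 1].reshape(n_ret, n_drift)

# Symmetric limits that could be passed as vmin/vmax to an image plot.
vmin, vmax = -color_range, color_range
clipped = np.clip(image, vmin, vmax)
```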

plot_selectivity_ratio(threshold=None, width=6, height=6)[source]

Plots selectivity ratios as an image with retention and drift time axes.

Parameters:
  • threshold (int, optional) – Only plots selectivity ratios above threshold if set. Values below are displayed as 0, by default None.

  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type:

matplotlib.pyplot.axes
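
The selectivity ratio is usually defined through target projection: the data are projected onto the normalised regression coefficient vector, and for each variable the explained variance is divided by the residual variance. A minimal NumPy sketch of that definition follows, for a single response and random stand-in data. The function name selectivity_ratio is hypothetical, and the ims implementation may differ in detail.

```python
import numpy as np


def selectivity_ratio(X, coef):
    """Selectivity ratio per feature for centred X (n, p) and coefficients (p,)."""
    w_tp = coef / np.linalg.norm(coef)   # target-projection weights
    t_tp = X @ w_tp                      # target-projected scores, shape (n,)
    p_tp = X.T @ t_tp / (t_tp @ t_tp)    # target-projected loadings, shape (p,)
    X_explained = np.outer(t_tp, p_tp)   # part of X described by the projection
    X_residual = X - X_explained
    return np.sum(X_explained ** 2, axis=0) / np.sum(X_residual ** 2, axis=0)


rng = np.random.default_rng(3)
X = rng.normal(size=(30, 8))
X -= X.mean(axis=0)                      # centre the columns
coef = rng.normal(size=8)                # stand-in regression coefficients
sr = selectivity_ratio(X, coef)
```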

plot_vip_scores(threshold=None, width=6, height=6)[source]

Plots VIP scores as an image with retention and drift time axes.

Parameters:
  • threshold (int, optional) – Only plots VIP scores above threshold if set. Values below are displayed as 0, by default None.

  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type:

matplotlib.pyplot.axes

Raises:

ValueError – If VIP scores have not been calculated prior.
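
VIP scores are commonly computed from the X weights, X scores, and y loadings of a fitted PLS model. The sketch below implements that common textbook formula with NumPy on random stand-in arrays. Whether ims.PLSR uses exactly this variant is an assumption, and the function name vip_scores is hypothetical. The last lines illustrate the thresholding described above, where values that do not exceed the threshold are set to 0.

```python
import numpy as np


def vip_scores(x_scores, x_weights, y_loadings):
    """VIP per feature from scores T (n, A), weights W (p, A), y loadings Q (q, A)."""
    n_features = x_weights.shape[0]
    # Sum of squares in y explained by each component: sum_k q_ka^2 * t_a^T t_a.
    ss = np.sum(y_loadings ** 2, axis=0) * np.sum(x_scores ** 2, axis=0)
    # Normalise each weight vector to unit length.
    w_norm = x_weights / np.linalg.norm(x_weights, axis=0)
    return np.sqrt(n_features * ((w_norm ** 2) @ ss) / ss.sum())


rng = np.random.default_rng(2)
T = rng.normal(size=(20, 3))   # stand-in X scores
W = rng.normal(size=(50, 3))   # stand-in X weights
Q = rng.normal(size=(1, 3))    # stand-in y loadings

vip = vip_scores(T, W, Q)

threshold = 1
vip_shown = np.where(vip > threshold, vip, 0.0)
```

With this definition the squared VIP scores average to 1 over all features, which is why 1 is a common threshold.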

predict(X_test, y_test=None)[source]

Predicts responses for features of the test data.

Parameters:
  • X_test (numpy.ndarray of shape (n_samples, n_features)) – Features of test data.

  • y_test (numpy.ndarray of shape (n_samples, n_targets), optional) – True labels for test data. If set, allows automatic plotting of validation data, by default None.

Returns:

Predicted responses for test data.

Return type:

numpy.ndarray of shape (n_samples, n_targets)

score(X_test, y_test, sample_weight=None)[source]

Calculates the R^2 score for predicted data.

Parameters:
  • X_test (numpy.ndarray of shape (n_samples, n_features)) – Feature vectors of the test data.

  • y_test (numpy.ndarray of shape (n_samples, n_targets)) – True regression responses.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights, by default None.

Returns:

score – R^2 score.

Return type:

float
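
For reference, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. A small NumPy sketch for a single response without sample weights follows; the function name r2 is only illustrative.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination for one response, unweighted."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, y_true)                   # predictions equal the truth
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])    # always predict the mean
```

A perfect prediction gives 1, and always predicting the mean of the true values gives 0.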

transform(X, y=None)[source]

Apply the dimensionality reduction.

Parameters:
  • X (numpy.ndarray of shape (n_samples, n_features)) – Feature matrix.

  • y (numpy.ndarray of shape (n_samples, n_targets), optional) – Dependent variables, by default None.

Return type:

X_scores