PLSR

class ims.plsr.PLSR(dataset, n_components=2, **kwargs)[source]

Bases: object

Applies a scikit-learn PLSRegression to GC-IMS data and provides prebuilt plots as well as a feature selection via variable importance in projection (VIP) scores.

See the scikit-learn documentation for more details: https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html

Parameters
  • dataset (ims.Dataset) – Needed for the retention and drift time coordinates in the plots.

  • n_components (int, optional) – Number of components to keep, by default 2.

  • kwargs (optional) – Additional keyword arguments are passed on to the scikit-learn PLSRegression.

x_scores

X scores.

Type

numpy.ndarray of shape (n_samples, n_components)

y_scores

y scores.

Type

numpy.ndarray of shape (n_samples, n_components)

x_weights

The left singular vectors of the cross-covariance matrices of each iteration.

Type

numpy.ndarray of shape (n_features, n_components)

y_weights

The right singular vectors of the cross-covariance matrices of each iteration.

Type

numpy.ndarray of shape (n_targets, n_components)

x_loadings

The loadings of X. When scaling was applied on the dataset, corrects the loadings using the weights.

Type

numpy.ndarray of shape (n_features, n_components)

y_loadings

The loadings of y.

Type

numpy.ndarray of shape (n_targets, n_components)

coefficients

The coefficients of the linear model.

Type

numpy.ndarray of shape (n_features, n_targets)

y_pred_train

Stores the predicted values from the training data for the plot method.

Type

numpy.ndarray

Example

>>> import ims
>>> import pandas as pd
>>> ds = ims.Dataset.read_mea("IMS_data")
>>> responses = pd.read_csv("responses.csv")
>>> ds.labels = responses
>>> X_train, X_test, y_train, y_test = ds.train_test_split()
>>> model = ims.PLSR(ds, n_components=5)
>>> model.fit(X_train, y_train)
>>> model.predict(X_test, y_test)
>>> model.plot()

fit(X_train, y_train)[source]

Fits the model with training data.

Parameters
  • X_train (numpy.ndarray of shape (n_samples, n_features)) – Training vectors with features.

  • y_train (numpy.ndarray of shape (n_samples, n_targets)) – Target vectors with response variables.

Returns

Fitted model.

Return type

self

plot(annotate=False)[source]

Plots predicted vs. actual values and shows the regression line. Predicting with test data first is recommended.

Parameters
  • annotate (bool, optional) – If True, annotates the plot with sample names, by default False.

Return type

matplotlib.pyplot.axes
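As a rough illustration of the regression line in a predicted vs. actual plot, the sketch below fits a straight line with NumPy. The arrays are made-up values, and using np.polyfit is an assumption for illustration; it is not necessarily how plot() computes the line.

```python
import numpy as np

# Hypothetical actual and predicted responses (not real measurements).
y_actual = np.array([1.0, 2.0, 3.0, 4.0])
y_predicted = 0.9 * y_actual + 0.2

# Least-squares line through the (actual, predicted) points.
# A slope near 1 and an intercept near 0 indicate unbiased predictions.
slope, intercept = np.polyfit(y_actual, y_predicted, deg=1)
```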

plot_coefficients(width=6, height=6)[source]

Plots PLS coefficients as image with retention and drift time axes.

Parameters
  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes
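To show how a per-feature vector such as the coefficients can be displayed as an image, the sketch below reshapes a flat vector back into a two-dimensional array. It assumes each spectrum was flattened row by row from a (retention time, drift time) matrix; n_ret, n_drift and the coefficient values are hypothetical and not taken from the library.

```python
import numpy as np

# Hypothetical spectrum dimensions: retention time x drift time.
n_ret, n_drift = 4, 5

# Hypothetical flat coefficient vector, one value per feature.
coef_flat = np.arange(n_ret * n_drift, dtype=float) - 10.0

# Restore the two-dimensional layout of a single spectrum.
coef_img = coef_flat.reshape(n_ret, n_drift)

# Symmetric color limits keep zero in the middle of a diverging colormap.
limit = np.abs(coef_img).max()
vmin, vmax = -limit, limit
```

The resulting array could then be passed to an image plotting function together with the retention and drift time coordinates of the dataset.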

plot_loadings(component=1, color_range=0.01, width=6, height=6)[source]

Plots PLS x loadings as image with retention and drift time coordinates.

Parameters
  • component (int, optional) – Component to plot, by default 1.

  • color_range (float, optional) – Minimum and maximum to adjust to different scaling methods, by default 0.01.

  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes

plot_selectivity_ratio(threshold=None, width=6, height=6)[source]

Plots the selectivity ratio as image with retention and drift time axes.

Parameters
  • threshold (int, optional) – Only plots values above the threshold if set. Values below are displayed as 0, by default None.

  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes
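The selectivity ratio named in this method is commonly defined through target projection: for each variable, the variance explained by the target-projected component is divided by the residual variance. The sketch below follows that common definition for a single response. It is an assumption about the general technique, not a copy of the library's implementation, and the function name and random data are hypothetical.

```python
import numpy as np

def selectivity_ratio_sketch(X, coefficients):
    """Selectivity ratio per feature via target projection (single target)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center the features
    b = np.asarray(coefficients, dtype=float).ravel()
    w_tp = b / np.linalg.norm(b)               # normalized regression vector
    t_tp = Xc @ w_tp                           # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)         # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)                # part of X explained by the projection
    residual = Xc - X_tp
    return np.sum(X_tp ** 2, axis=0) / np.sum(residual ** 2, axis=0)

# Hypothetical data: 10 samples, 6 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 6))
b_demo = rng.normal(size=6)
sr = selectivity_ratio_sketch(X_demo, b_demo)
```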

plot_vip_scores(threshold=None, width=6, height=6)[source]

Plots VIP scores as image with retention and drift time axes.

Parameters
  • threshold (int, optional) – Only plots VIP scores above the threshold if set. Values below are displayed as 0, by default None.

  • width (int or float, optional) – Width of the plot in inches, by default 6.

  • height (int or float, optional) – Height of the plot in inches, by default 6.

Return type

matplotlib.pyplot.axes

Raises

ValueError – If VIP scores have not been calculated beforehand.
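For orientation, VIP scores are commonly computed from the X scores, X weights and y loadings of a fitted PLS model. The sketch below uses that widely used formula together with the thresholding described above. It is an illustration under these assumptions and may differ from the library's own calculation; the function name and the random input arrays are hypothetical.

```python
import numpy as np

def vip_scores_sketch(x_scores, x_weights, y_loadings):
    """VIP score per feature from PLS scores, weights and y loadings."""
    t = np.asarray(x_scores, dtype=float)    # (n_samples, n_components)
    w = np.asarray(x_weights, dtype=float)   # (n_features, n_components)
    q = np.asarray(y_loadings, dtype=float)  # (n_targets, n_components)
    n_features = w.shape[0]
    # Sum of squares of y explained by each component.
    ss = np.sum(t ** 2, axis=0) * np.sum(q ** 2, axis=0)
    # Scale each weight vector to unit length.
    w_norm = w / np.linalg.norm(w, axis=0, keepdims=True)
    return np.sqrt(n_features * ((w_norm ** 2) @ ss) / ss.sum())

# Hypothetical model arrays: 20 samples, 50 features, 3 components, 1 target.
rng = np.random.default_rng(0)
T = rng.normal(size=(20, 3))
W = rng.normal(size=(50, 3))
Q = rng.normal(size=(1, 3))
vip = vip_scores_sketch(T, W, Q)

# Thresholding: values not above the threshold are set to 0.
vip_masked = np.where(vip > 1.0, vip, 0.0)
```

With this definition the mean of the squared VIP scores equals 1, which is why a threshold of 1 is often used for feature selection.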

predict(X_test, y_test=None)[source]

Predicts responses for features of the test data.

Parameters
  • X_test (numpy.ndarray of shape (n_samples, n_features)) – Features of test data.

  • y_test (numpy.ndarray of shape (n_samples, n_targets), optional) – True labels for test data. If set, allows automatic plotting of validation data, by default None.

Returns

Predicted responses for test data.

Return type

numpy.ndarray of shape (n_samples, n_targets)

score(X_test, y_test, sample_weight=None)[source]

Calculates the R^2 score for predicted data.

Parameters
  • X_test (numpy.ndarray of shape (n_samples, n_features)) – Feature vectors of the test data.

  • y_test (numpy.ndarray of shape (n_samples, n_targets)) – True regression responses.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights, by default None.

Returns

score – R^2 score.

Return type

float
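The R^2 score is the coefficient of determination, 1 - SS_res / SS_tot. The sketch below computes it with NumPy for a single response to make the definition concrete. The function name and the example values are hypothetical, and the method itself presumably relies on scikit-learn rather than on code like this.

```python
import numpy as np

def r2_score_sketch(y_true, y_pred, sample_weight=None):
    """Coefficient of determination for one response variable."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if sample_weight is None:
        weights = np.ones_like(y_true)
    else:
        weights = np.asarray(sample_weight, dtype=float)
    y_mean = np.average(y_true, weights=weights)
    ss_res = np.sum(weights * (y_true - y_pred) ** 2)  # residual sum of squares
    ss_tot = np.sum(weights * (y_true - y_mean) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical responses: only the last prediction is off by 1.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
r2 = r2_score_sketch(y_true, y_pred)  # 1 - 1/5 = 0.8
```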

transform(X, y=None)[source]

Apply the dimensionality reduction.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features)) – Feature matrix.

  • y (numpy.ndarray of shape (n_samples, n_targets), optional) – Dependent variables, by default None.

Return type

X_scores
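In PLS, the X scores of new data are usually obtained by centering X and multiplying it with a rotation matrix derived from the weights and loadings. The sketch below shows that projection with NumPy. It assumes centering only (no scaling), and the function name, the x_mean argument and the toy arrays are hypothetical; the method itself presumably delegates to scikit-learn.

```python
import numpy as np

def transform_sketch(X, x_mean, x_weights, x_loadings):
    """Project X onto the PLS components: T = (X - mean) W (P^T W)^-1."""
    W = np.asarray(x_weights, dtype=float)   # (n_features, n_components)
    P = np.asarray(x_loadings, dtype=float)  # (n_features, n_components)
    rotations = W @ np.linalg.pinv(P.T @ W)
    return (np.asarray(X, dtype=float) - x_mean) @ rotations

# Toy example: weights and loadings pick out the first two features.
X_demo = np.arange(12, dtype=float).reshape(4, 3)
W_demo = np.eye(3)[:, :2]
P_demo = W_demo.copy()
scores = transform_sketch(X_demo, X_demo.mean(axis=0), W_demo, P_demo)
```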