Nyström Approximation

While linear techniques operating in the primal (input) space are able to achieve good generalization capabilities in some specific application areas, one cannot in general approximate with the linear model more complex or highly nonlinear functions. We apply a Fixed-Size approach [FS2010] and Nyström approximation [WS2001] to approximate a kernel-induced feature map with some higher dimensional explicit and approximate feature vector.

We select prototype vectors (a small working sample of size m \ll n) and construct, for instance an RBF kernel matrix K with

K_{ij} = e^{-\frac{\Vert x_i-x_j \Vert ^2}{2\sigma^2} }.

By following the approach in [WS2001] an expression for the entries of the approximated feature map \hat{\Phi}(x) : \mathbb{R}^d \rightarrow \mathbb{R}^m, with \hat{\Phi}(x) = (\hat{\Phi}_1(x),\ldots,\hat{\Phi}_m(x))^T is given by

\hat{\Phi}_i(x) = \frac{1}{\sqrt{\lambda_{i,m}}} \sum_{t=1}^m u_{ti,m}k(x_t,x),

where \lambda_{i,m} and u_{i,m} denote the i-th eigenvalue and the i-th eigenvector of K.

Available API

AFEm(Xs, kernel, X)

Performs Automatic Feature Extraction (AFE) by Nyström method [WS2001] using a subsample X_s \in X. We restrict kernel <: Kernel to be a subclass of Kernel, for instance RBFKernel.

Parameters:
  • Xs – subset which is used to construct kernel matrix K
  • kernel – kernel function, e.g. RBFKernel(), used to construct kernel matrix K
  • X – full dataset
Returns:

new dataset X_f derived from stacking together feature maps for every x_i \in X

entropy_subset(X, kernel, subset_size)

Performs maximization of the quadratic Rényi Entropy by the representative points selection from X which can be supplied to AFEm as Xs subset.

Parameters:
  • X – full dataset
  • kernel – kernel function, e.g. RBFKernel(), used to construct kernel matrix K over which we compute Rényi Entropy
  • subset_size – number of representative data points

Available Kernel Functions

LinearKernel()

Defines an implementation of the Linear Kernel, i.e. k(x,y) = \langle x,y \rangle.

PolynomialKernel()

Defines an implementation of the Polynomial Kernel, i.e. k(x,y) = (\langle x,y \rangle + \tau)^d.

RBFKernel()

Defines an implementation of the Radial Basis Function (RBF) Kernel, i.e. k(x,y) = \exp(-\frac{\|x - y\|^2}{2\sigma^2}).

[WS2001](1, 2, 3) Williams C. and Seeger M., “Using the Nyström method to speed up kernel machines”, in Proceedings of the 14th Annual Conference on Neural Information Processing (NIPS), pp. 682-688, 2001.