Welcome to SALSA’s documentation!¶
SALSA (Software Lab for Advanced Machine Learning with Stochastic Algorithms) is a native Julia implementation of stochastic algorithms for:
- linear and non-linear Support Vector Machines
- sparse linear modelling
SALSA is an open-source project available on GitHub under the GPLv3 license.
Installation¶
The SALSA package can be installed from the Julia command line with Pkg.add("SALSA"), or by running the same command directly with the Julia executable: julia -e 'Pkg.add("SALSA")'.
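For convenience, the same two options as a copy-pasteable snippet. Note that the commands above use the pre-1.0 package manager API; whether your Julia version needs a preceding using Pkg is an assumption about your setup, not something stated here:

    # Inside the Julia REPL (add `using Pkg` first on Julia >= 0.7):
    Pkg.add("SALSA")

    # Or non-interactively from a shell:
    #   julia -e 'Pkg.add("SALSA")'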
Mathematical background¶
The SALSA package aims at stochastically learning a classifier or regressor via the Regularized Empirical Risk Minimization [Vapnik1992] framework. We approach a family of well-known Machine Learning problems of the type:

    \min_{w} \frac{1}{n} \sum_{i=1}^{n} \ell(w, \xi_i) + \lambda \Omega(w),

where each \xi_i = (x_i, y_i) is given as a pair of input-output variables and belongs to a set \Xi of independent observations, the loss function \ell measures the disagreement between the true target y_i and the model prediction, while the regularization term \Omega(w) penalizes the complexity of the model w. We draw \xi_i uniformly from \Xi at most n times due to the i.i.d. assumption and a fixed computational budget. Online passes and optimization with the full dataset are available too. The package includes stochastic algorithms for linear and non-linear Support Vector Machines [Boser1992] and sparse linear modelling [Hastie2015].
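For intuition, here is a minimal sketch of one such stochastic learning loop, assuming the hinge loss and \ell_2-regularization in the spirit of the Pegasos solver cited under References below. It is illustrative Julia code, not SALSA's actual implementation or API:

    # Illustrative sketch: stochastic regularized ERM with hinge loss and
    # l2-regularization (Pegasos-style step sizes). Not SALSA's exported API.
    function sgd_hinge(X, y, lambda, T)
        n, d = size(X)
        w = zeros(d)
        for t in 1:T                          # fixed computational budget of T draws
            i = rand(1:n)                     # draw one observation uniformly (i.i.d.)
            eta = 1.0 / (lambda * t)          # decaying step size
            xi = vec(X[i, :])
            grad = lambda .* w                # subgradient of the regularization term
            if y[i] * sum(w .* xi) < 1.0      # hinge loss subgradient, when active
                grad .-= y[i] .* xi
            end
            w .-= eta .* grad
        end
        return w
    end

Here X is an n-by-d data matrix, y a vector of +1/-1 labels, lambda the regularization trade-off, and T the budget of stochastic draws.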
Particular choices of the loss function \ell include, among others, the hinge loss used by Support Vector Machines and the least-squares loss used in regression and sparse linear modelling.
Particular choices of the regularization term \Omega(w) are:
- \ell_2-regularization, i.e. \Omega(w) = \frac{\lambda}{2} \|w\|_2^2
- \ell_1-regularization, i.e. \Omega(w) = \lambda \|w\|_1
- reweighted \ell_1-regularization
- reweighted \ell_2-regularization
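As a concrete illustration of these building blocks, the sketch below writes the hinge and least-squares losses and the two plain regularization terms as ordinary Julia functions. The function names are illustrative only and do not correspond to SALSA's API:

    # Illustrative definitions only -- not part of SALSA's exported API.
    # w: weight vector, x: input vector, y: target (+1/-1 for classification).
    hinge_loss(w, x, y)         = max(0.0, 1.0 - y * sum(w .* x))   # SVM classification loss
    least_squares_loss(w, x, y) = 0.5 * (sum(w .* x) - y)^2         # regression loss

    # Regularization terms with trade-off parameter lambda:
    l2_reg(w, lambda) = (lambda / 2) * sum(abs2, w)   # ridge-type penalty
    l1_reg(w, lambda) = lambda * sum(abs, w)          # sparsity-inducing penalty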
References¶
SALSA stems from the following algorithmic approaches:
- Pegasos: S. Shalev-Shwartz, Y. Singer, N. Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, in: Proceedings of the 24th international conference on Machine learning, ICML ’07, New York, NY, USA, 2007, pp. 807–814.
- RDA: L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res. 11 (2010), pp. 2543–2596.
- Adaptive RDA: J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res. 12 (2011), pp. 2121–2159.
- Reweighted RDA: V. Jumutc, J.A.K. Suykens, Reweighted stochastic learning, Neurocomputing Special Issue - ISNN2014, 2015. (In Press)
Dependencies¶
- MLBase: to support generic Machine Learning routines
- StatsBase: to support generic routines from Statistics
- Distances: to support distance metrics between vectors
- Distributions: to support sampling from various distributions
- DataFrames: to support processing data from files instead of in-memory matrices
- Clustering: to support Stochastic K-means Clustering (experimental feature)
- ProgressMeter: to support progress bars and ETA of different routines
[Vapnik1992] Vapnik, Vladimir. "Principles of risk minimization for learning theory", in Advances in Neural Information Processing Systems (NIPS), pp. 831-838, 1992.
[Boser1992] Boser, B., Guyon, I., Vapnik, V. "A training algorithm for optimal margin classifiers", in Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT '92), pp. 144-152, 1992.
[Hastie2015] Hastie, T., Tibshirani, R., Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 2015.