Data Preprocessing
This part of the package provides a simple set of preprocessing utilities.
Data Normalization
mapstd(X)
    Normalizes each column of X to zero mean and unit standard deviation. Returns the normalized matrix together with the column-wise means and standard deviations that were extracted.

        using SALSA
        mapstd([0 1; -1 2]) # --> ([0.707107 -0.707107; -0.707107 0.707107], [-0.5 1.5], [0.707107 0.707107])
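To make the computation concrete, here is a minimal sketch of column-wise standardization in plain Julia 1.x. It is not SALSA's implementation: `mapstd_sketch` is a hypothetical name, and the use of the sample standard deviation (n-1 denominator) is an assumption inferred from the values in the documented example.

```julia
using Statistics  # standard library in Julia 1.x

# Hypothetical helper, not part of SALSA: z-score each column of X.
function mapstd_sketch(X::AbstractMatrix{<:Real})
    mu = mean(X, dims=1)      # 1×n row of column means
    sigma = std(X, dims=1)    # 1×n row of sample standard deviations (n-1 denominator)
    return (X .- mu) ./ sigma, mu, sigma
end

Xn, mu, sigma = mapstd_sketch([0 1; -1 2])
```

For the 2×2 input above, `mu` is `[-0.5 1.5]` and both entries of `sigma` are approximately 0.707107, which agrees with the documented output.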
mapstd(X, mean, std)
    Normalizes each column of X using the specified column-wise mean and std. Returns the normalized matrix.

        using SALSA
        mapstd([0 1; -1 2], [-0.5 1.5], [0.707107 0.707107]) # --> [0.707107 -0.707107; -0.707107 0.707107]
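A typical reason to pass precomputed statistics is to rescale unseen data with the means and standard deviations of the training set. The sketch below illustrates that idea in plain Julia 1.x; `apply_stats`, `Xtrain`, and `Xtest` are hypothetical names, and the arithmetic is an assumption about what the three-argument form computes, not SALSA's code.

```julia
using Statistics  # standard library in Julia 1.x

# Hypothetical helper, not part of SALSA: rescale with precomputed statistics.
apply_stats(X, mu, sigma) = (X .- mu) ./ sigma

Xtrain = [0.0 1.0; -1.0 2.0]
mu, sigma = mean(Xtrain, dims=1), std(Xtrain, dims=1)

Xtest = [1.0 3.0]                         # one unseen sample
Xtest_n = apply_stats(Xtest, mu, sigma)   # scaled with training statistics
```

Reusing the training statistics keeps both sets on the same scale, which would not be guaranteed if the test set were standardized on its own.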
Sparse Data Preparation
make_sparse(tuples[, sizes, delim])
    Creates a SparseMatrixCSC object from a Matrix{ASCIIString} of tuples, where each cell contains an index:value pair. The index and value are separated by the delim character, e.g. ":". The user can optionally specify the final dimensions of the SparseMatrixCSC object as the sizes tuple.

    Parameters:
    - tuples – matrix of tuples (Matrix{ASCIIString}) containing index:value pairs
    - sizes – optional tuple of final dimensions, e.g. (100000,10) (empty by default)
    - delim – optional character separating the index and value in each cell of tuples; default is ":"

    Returns: SparseMatrixCSC object.
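The conversion described above can be sketched in plain Julia 1.x, where `String` replaces `ASCIIString`. This is not SALSA's implementation: `pairs_to_sparse` is a hypothetical name, and the sketch assumes that row i of `tuples` describes row i of the result, that each index is a 1-based column number, and that empty cells are skipped.

```julia
using SparseArrays  # standard library in Julia 1.x

# Hypothetical helper, not part of SALSA. Assumes row i of `tuples` describes
# row i of the result and the index in each pair is a 1-based column number.
function pairs_to_sparse(tuples::AbstractMatrix{<:AbstractString};
                         sizes=(), delim=':')
    I, J, V = Int[], Int[], Float64[]
    for r in axes(tuples, 1), cell in tuples[r, :]
        isempty(cell) && continue          # skip blank cells
        idx, val = split(cell, delim)      # "3:2.0" -> "3", "2.0"
        push!(I, r); push!(J, parse(Int, idx)); push!(V, parse(Float64, val))
    end
    # Without explicit sizes, the dimensions are inferred from the largest indices.
    isempty(sizes) ? sparse(I, J, V) : sparse(I, J, V, sizes...)
end

S = pairs_to_sparse(["1:0.5" "3:2.0"; "2:1.5" ""]; sizes=(2, 5))
```

Passing `sizes` matters when the largest index present in the data is smaller than the true feature dimension, for example when a test set never uses the last few columns.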
Data Management
DelimitedFile(name, header, delim)
    Creates a wrapper around any delimited file which can be passed to low-level routines, for instance pegasos_alg(). A DelimitedFile is processed in online mode even if the online_pass==0 flag is passed to those routines.

    Parameters:
    - name – file name
    - header – flag indicating if a header is present
    - delim – delimiting character
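To illustrate what processing a delimited file in online mode can look like, the sketch below reads one row at a time instead of loading the whole file into memory. It is a stand-in written for Julia 1.x, not SALSA's `DelimitedFile`: the `DelimitedSource` type and `foreach_row` function are hypothetical, and the assumption that every field is numeric is made only for this example.

```julia
# Hypothetical stand-in, not SALSA's DelimitedFile type.
struct DelimitedSource
    name::String
    header::Bool
    delim::Char
end

# Call f on each parsed row, reading one line at a time.
function foreach_row(f, src::DelimitedSource)
    open(src.name) do io
        src.header && !eof(io) && readline(io)   # discard the header line
        for line in eachline(io)
            isempty(line) && continue            # ignore blank lines
            f(parse.(Float64, split(line, src.delim)))
        end
    end
end

# Demo on a temporary file with a header and two data rows.
path = tempname()
write(path, "x1,x2,y\n0.0,1.0,1\n-1.0,2.0,-1\n")
rows = Vector{Float64}[]
foreach_row(r -> push!(rows, r), DelimitedSource(path, true, ','))
```

Only the current line is held in memory at any time, which is the property that makes a single-pass, streaming algorithm practical on files too large to load at once.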