Data Preprocessing
This part of the package provides a simple set of preprocessing utilities.
Data Normalization
mapstd(X)
Normalizes each column of X to zero mean and unit standard deviation. Returns the normalized matrix X together with the extracted column-wise means and standard deviations.

    using SALSA
    mapstd([0 1; -1 2]) # --> ([0.707107 -0.707107; -0.707107 0.707107], [-0.5 1.5], [0.707107 0.707107])
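To make the computation concrete, the following minimal sketch reproduces the same column-wise normalization in plain Julia using only the Statistics standard library. The helper name `standardize_columns` is hypothetical and not part of SALSA. The sketch assumes the sample standard deviation (n-1 denominator), which is consistent with the value 0.707107 in the documented output.

```julia
using Statistics  # standard library: mean, std

# Hypothetical helper, not part of SALSA: column-wise standardization.
function standardize_columns(X::AbstractMatrix)
    mu = mean(X, dims=1)      # 1×d row of column means
    sigma = std(X, dims=1)    # 1×d row of sample standard deviations (n-1 denominator)
    return (X .- mu) ./ sigma, mu, sigma
end

Xn, mu, sigma = standardize_columns([0 1; -1 2])
```

Note that a constant column has zero standard deviation, so this sketch would divide by zero there; how SALSA handles that case is not stated in this section.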
mapstd(X, mean, std)
Normalizes each column of X using the specified column-wise mean and std. Returns the normalized matrix X.

    using SALSA
    mapstd([0 1; -1 2], [-0.5 1.5], [0.707107 0.707107]) # --> [0.707107 -0.707107; -0.707107 0.707107]
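A typical reason to pass the mean and standard deviation explicitly is to apply statistics extracted from one data set (for example, training data) to another. The sketch below illustrates this in plain Julia; `apply_standardization` and `X_new` are hypothetical names introduced for illustration, not part of SALSA.

```julia
# Hypothetical helper, not part of SALSA: normalize with given column statistics.
apply_standardization(X, mu, sigma) = (X .- mu) ./ sigma

# Statistics taken from the documented example above.
mu = [-0.5 1.5]
sigma = [0.707107 0.707107]

# New data with the same number of columns (made-up values).
X_new = [1 3; 0 0]
Xn_new = apply_standardization(X_new, mu, sigma)
```

Because the new data is shifted and scaled with the old statistics, its columns will generally not have exactly zero mean and unit standard deviation.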
Sparse Data Preparation
make_sparse(tuples[, sizes, delim])
Creates a SparseMatrixCSC object from a matrix of tuples, Matrix{ASCIIString}, containing index:value pairs. The index and value in each pair are separated by the delim character, e.g. ":". The user can optionally specify the final dimensions of the SparseMatrixCSC object as the sizes tuple.

Parameters:
- tuples – matrix of tuples Matrix{ASCIIString} containing index:value pairs
- sizes – optional tuple of final dimensions, e.g. (100000,10) (empty by default)
- delim – optional character separating the index and value in each cell of tuples, default is ":"

Returns: SparseMatrixCSC object.
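The conversion described above can be sketched in plain Julia as follows. This is an illustrative re-implementation, not SALSA's code: the function name `parse_index_value` is hypothetical, and it assumes that row i of the input corresponds to row i of the output and that the index in each cell is a column index. It also uses String rather than ASCIIString, since the latter does not exist in current Julia versions.

```julia
using SparseArrays  # standard library: sparse, SparseMatrixCSC, nnz

# Hypothetical helper for illustration; SALSA's make_sparse may behave differently.
# Assumption: row i of `tuples` is sample i, and a cell "j:v" stores value v in column j.
function parse_index_value(tuples::AbstractMatrix{<:AbstractString}; sizes=(), delim=":")
    I = Int[]; J = Int[]; V = Float64[]
    for i in axes(tuples, 1), cell in tuples[i, :]
        isempty(cell) && continue          # skip empty padding cells
        idx, val = split(cell, delim)      # "j:v" -> ("j", "v")
        push!(I, i)
        push!(J, parse(Int, idx))
        push!(V, parse(Float64, val))
    end
    # Infer the dimensions from the data unless the caller fixes them.
    m, n = isempty(sizes) ? (size(tuples, 1), maximum(J)) : sizes
    return sparse(I, J, V, m, n)
end

S = parse_index_value(["1:0.5" "3:2.0"; "2:1.5" ""])
S_fixed = parse_index_value(["1:0.5" "3:2.0"; "2:1.5" ""]; sizes=(4, 5))
```

Fixing the dimensions explicitly is useful when different data sets must share the same number of columns even if some high-index features never occur in one of them.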
Data Management
DelimitedFile(name, header, delim)
Creates a wrapper around any delimited file which can be passed to low-level routines, for instance pegasos_alg(). A DelimitedFile will be processed in the online mode regardless of the online_pass==0 flag passed to low-level routines.

Parameters:
- name – file name
- header – flag indicating if a header is present
- delim – delimiting character
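As a rough illustration of what processing a delimited file in an online fashion can look like, the sketch below visits a file one row at a time instead of loading it into memory. It is written in plain Julia and is not SALSA's implementation; the function `foreach_row`, its keyword arguments, and the temporary file contents are hypothetical and merely mirror the name, header, and delim parameters above.

```julia
# Hypothetical sketch, not SALSA's implementation: visit a delimited file row by row.
function foreach_row(f, name::AbstractString; header::Bool=false, delim::Char=',')
    open(name) do io
        header && readline(io)                      # discard the header line if present
        for line in eachline(io)
            isempty(strip(line)) && continue        # ignore blank lines
            f(parse.(Float64, split(line, delim)))  # only one sample is held in memory
        end
    end
end

# Usage on a small temporary file with a header line.
path, io = mktemp()
write(io, "x1,x2\n1.0,2.0\n3.0,4.0\n")
close(io)

row_sums = Float64[]
foreach_row(r -> push!(row_sums, sum(r)), path; header=true, delim=',')
```

The callback receives each parsed row in turn, which is the access pattern a single-pass (online) learning routine needs.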