The new packages work just like the older versions if you don't use any of the new features. The file format has changed slightly, but old data files should still work. The only change to the format is the option to ignore some vector components. Some of the new features only work in a UNIX-like environment because they require the UNIX popen() function call. New features are:
Compressed files are recognized by the suffix .gz
at the end of the filename. The file is automatically uncompressed or
compressed as it is being read or written. SOM/LVQ use
gzip for compressing and uncompressing. They can also read
files compressed with the regular UNIX compress command (since
gzip handles that format). The commands used for compressing and decompressing can
be changed with command line options or at compile time.
Example: with vsom, to use a compressed data file for teaching:
vsom -din data.dat.gz ...
Since compressing/uncompressing uses the popen() function to run gzip, compressed files won't work in some environments (MSDOS).
To read a file from standard input, give - as the filename:
vsom -din - ...
If a filename starts with |, the rest of the name is run as a command and the data is read from or written to that command through a pipe.
For example:
vsom -cin "|randinit ..." ...
Vsom would start the program randinit when it wants to read the initial codebook. However, the same thing could be done with:
randinit ... | vsom -cin - ...
This feature is useful in the saving of snapshots. The reading and writing of compressed files is actually a special case of this feature (the same restrictions about popen() apply).
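The filename conventions above can be summarized in a small sketch. This is only an illustration in Python, not the packages' own code: the helper name classify_input, the order in which the cases are checked, and the decompression command gzip -dc are all assumptions.

```python
import shlex

def classify_input(name, compressed_suffix=".gz"):
    """Return (kind, argument) for an input filename (hypothetical helper)."""
    if name == "-":
        # A lone dash stands for standard input.
        return ("stdin", None)
    if name.startswith("|"):
        # Everything after the bar is a command to run through a pipe.
        return ("pipe", name[1:])
    if name.endswith(compressed_suffix):
        # Compressed files are a special case of the pipe mechanism.
        # Assumption: "gzip -dc" is the decompression command.
        return ("pipe", "gzip -dc " + shlex.quote(name))
    return ("file", name)

if __name__ == "__main__":
    for arg in ["-", "|randinit ...", "data.dat.gz", "ex.dat"]:
        print(arg, "->", classify_input(arg))
```

Treating the compressed case as a pipe mirrors the remark that compressed files are a special case of piped commands, which is also why both share the popen() restriction.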
Note that when the whole file has been read once and has to be read again, the file has to be rewound (for regular files) or the uncompressing command has to be rerun. This is done automatically and the user doesn't usually have to worry about it, but it places some restrictions on the input file: if the source is a pipe, it can't be rewound. Regular files, compressed files and standard input (if it is a file) work. Using a pipe works fine if you don't have to rewind it, i.e. the data doesn't end or the number of iterations is smaller than the number of data vectors.
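The difference between a rewindable source and a pipe can be shown with a short Python sketch. The function names can_rewind and rewind are hypothetical, and an in-memory buffer stands in for a regular file; the packages themselves do this in C.

```python
import io
import os

def can_rewind(stream):
    """True if the stream can be moved back to its start."""
    return stream.seekable()

def rewind(stream):
    """Move a stream back to its start, or fail if it is a pipe."""
    if not can_rewind(stream):
        raise OSError("source is a pipe and cannot be rewound")
    stream.seek(0)
    return stream

if __name__ == "__main__":
    data = io.BytesIO(b"1.0 2.0\n3.0 4.0\n")   # stand-in for a regular file
    data.read()                                 # first pass over the data
    print(rewind(data).read())                  # second pass works

    read_fd, write_fd = os.pipe()
    pipe_reader = os.fdopen(read_fd, "rb")
    print(can_rewind(pipe_reader))              # a pipe is not seekable
    os.close(write_fd)
    pipe_reader.close()
```

A compressed source is handled differently from both cases: instead of seeking, the uncompressing command is simply run again.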
Most programs support the buffered reading of data files. It is
activated with the command line option -buffer
followed by the
maximum number of data vectors to be kept in memory. For example:
vsom ... -buffer 10000
would read the input data file 10000 lines at a time.
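Buffered reading amounts to pulling a bounded number of lines into memory at a time. A minimal Python sketch, assuming one data vector per line; the function name read_buffered is hypothetical and this is not the packages' own code:

```python
from itertools import islice

def read_buffered(lines, buffer_size):
    """Yield lists of at most buffer_size lines from an iterable of lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, buffer_size))
        if not chunk:
            return          # the source is exhausted
        yield chunk

if __name__ == "__main__":
    sample = ["v1", "v2", "v3", "v4", "v5"]
    for chunk in read_buffered(sample, 2):
        print(chunk)
```

Only one chunk is held in memory at any time, so the memory use is bounded by the buffer size rather than by the size of the data file.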
1.1 2.0 0.5 4.0 5.5 1.3 6.0 x 2.9 x 1.9 1.5 0.1 0.3 x
Components marked with x are ignored when vector distances are calculated, when the winner is searched for, when codebook vectors are adapted, and in labeling: they are not used in distance calculations, and the corresponding component of the codebook vector is not adapted. The string that indicates a component that should be ignored can be changed with a command line option or set at compile time.
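The effect of ignored components can be sketched in Python. This is an illustration under stated assumptions, not the packages' implementation: the squared Euclidean distance, the simple update rule c + alpha * (s - c), and all function names are assumptions made for the example.

```python
def parse_vector(line, mask_str="x"):
    """Parse one data line; masked components become None."""
    return [None if tok == mask_str else float(tok) for tok in line.split()]

def masked_sq_distance(sample, codebook):
    """Squared Euclidean distance over the unmasked components only."""
    return sum((s - c) ** 2 for s, c in zip(sample, codebook) if s is not None)

def adapt(codebook, sample, alpha):
    """Move the codebook vector toward the sample, skipping masked components."""
    return [c if s is None else c + alpha * (s - c)
            for s, c in zip(sample, codebook)]

if __name__ == "__main__":
    sample = parse_vector("1.3 6.0 x 2.9 x")      # a short made-up line
    codebook = [1.3, 5.0, 9.0, 2.9, 9.0]
    print(masked_sq_distance(sample, codebook))   # masked positions add nothing
    print(adapt(codebook, sample, 0.5))           # masked positions stay as they were
```

Passing a different mask_str to parse_vector corresponds to changing the ignore string on the command line.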
vsom -snapinterval 10000 -snapfile "ex.%d.cod" ...
gives you snapshot files every 10000 iterations, with names ex.10000.cod, ex.20000.cod, ex.30000.cod, etc.
Another example:
./vsom -din ex.dat -cin ex2.cod -cout ex.cod -rlen 10000 -alpha 0.02 -radius 3 -snapfile "|./vcal -din ex_fts.dat -cin - -cout foo.%d.cod" -snapinterval 1000
This command would teach the map in file ex2.cod with data from file ex.dat for 10000 iterations. The taught codebook is saved in file ex.cod. Every 1000 iterations the codebook is piped to vcal, which labels the codebook units with data from ex_fts.dat. The labeled codebooks are saved in files foo.1000.cod, foo.2000.cod, etc.
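How the %d in the snapshot filename expands can be sketched in Python, whose % operator follows the same printf-style convention. The function name snapshot_names is hypothetical, and whether a snapshot is also written at the very last iteration is an assumption of this sketch.

```python
def snapshot_names(template, snapinterval, rlen):
    """List the names produced by substituting the iteration count for %d."""
    return [template % iteration
            for iteration in range(snapinterval, rlen + 1, snapinterval)]

if __name__ == "__main__":
    for name in snapshot_names("ex.%d.cod", 10000, 30000):
        print(name)
    # The same substitution applies when the snapshot target is a piped command.
    print(snapshot_names("|./vcal -din ex_fts.dat -cin - -cout foo.%d.cod", 1000, 2000))
```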
Randomized entry order
By default the data vectors are used in the order they appear in the
data file. To use them in random order use the -rand option
followed by a seed for the random number generator. For example
-rand 10 would initialize the random number generator with
the seed value of 10. Seed 0 initializes the random number generator
with the current time.
Random entry order works as follows: whenever data is read from the file or
reused, the order of the vectors is randomized. If the whole data
file is loaded into memory (no buffered loading), the whole set
is randomized. If buffered loading is used, the randomization is done
only within the piece of the file that is currently loaded into memory.
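The interaction between random order and buffering can be sketched in Python. This is an illustration only: the function name randomized_chunks is hypothetical, and Python's random number generator is not the one the packages use, so the actual orderings will differ.

```python
import random
import time
from itertools import islice

def randomized_chunks(vectors, buffer_size, seed):
    """Yield shuffled chunks; buffer_size=None loads everything as one chunk."""
    # Seed 0 means: initialize the generator from the current time.
    rng = random.Random(int(time.time()) if seed == 0 else seed)
    iterator = iter(vectors)
    while True:
        chunk = list(islice(iterator, buffer_size))
        if not chunk:
            return
        rng.shuffle(chunk)      # shuffling never crosses a chunk boundary
        yield chunk

if __name__ == "__main__":
    for chunk in randomized_chunks(range(10), 4, seed=10):
        print(chunk)
```

With a small buffer, vectors that are far apart in the file can never end up next to each other, so the randomization is only local.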
Common options
These options are common to all programs:
vsom ... -mask_str "*" ...
would ignore components that are marked with '*' instead of 'x'. Longer strings can also be used.