
NAME
	tseg - TreeSegment: optimal segmentation using tree models

USAGE
	tseg [options] [input-file]

DESCRIPTION
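	tseg partitions the symbol sequence in the input file into segments and fits a tree model
	(a VLMC or a fixed-depth tree) to each segment. The number of segments, up to the maximum
	given by -K, is chosen to optimize the selected score.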

OPTIONS
-M m: fitted model, where:
	m=vlmc: means fitting a VLMC to each segment
	m=mc: means fitting a fixed-depth tree to each segment

-S s: scoring method, where:
	s=bic:	means using the Context algorithm to obtain a VLMC and the BIC criterion to score the tree
	s=kt:	means using the original version of the CTM algorithm to obtain a VLMC that optimizes the KT
		probability of the tree by considering a local decision: the parent node versus its children.
	s=kt1:	means using the modified version of the CTM algorithm to obtain a VLMC that optimizes the KT
		probability of the tree by considering a local decision: the parent node versus all subsets
		of children. Note that this method was mainly developed for DNA, where the alphabet size
		|A|=4. For |A| > 4 this method is inefficient (exponential in |A|) and the gain in segmentation
		accuracy may not be large enough to justify using it instead of the original CTM
		algorithm (s=kt)
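
	The local decision used by s=kt can be illustrated with a minimal Python sketch. It assumes
	that "KT probability" refers to the standard Krichevsky-Trofimov estimator, and it compares a
	single parent node with its children treated as leaves. The function names are hypothetical
	and this is not tseg's own code.

```python
import math

def kt_log_prob(counts):
    """Log KT probability of the symbols counted at one node.

    counts[a] is the number of times symbol a followed this context.
    Each new symbol a is assigned probability (n_a + 1/2) / (n + |A|/2),
    where n_a and n are the counts seen so far.
    """
    alphabet_size = len(counts)
    log_p = 0.0
    seen = 0  # total symbols accounted for so far
    for c in counts:
        for j in range(c):
            log_p += math.log((j + 0.5) / (seen + alphabet_size / 2.0))
            seen += 1
    return log_p

def children_preferred(parent_counts, children_counts):
    """Local decision (assumed form): keep the children if the product of
    their KT probabilities exceeds the KT probability of the parent."""
    split_score = sum(kt_log_prob(c) for c in children_counts)
    return split_score > kt_log_prob(parent_counts)
```

	For s=kt1 the same comparison would be made against every subset of children, which is where
	the exponential dependence on |A| comes from.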

-D d: the depth of the tree

-K k: maximum number of segments. The algorithm selects the optimal number of segments from 1 to k

-Q q: 	step size: segment scores are computed only for segment boundaries that are a multiple of q.
	The default value of q is 1. Setting q > 1 reduces the running time of the segmentation
	algorithm by a factor of order q^{2} (the cost scales as O(1/q^{2}))
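
	The quadratic effect of the step size can be checked by counting candidate segments. The
	sketch below assumes that a segment is any pair of boundaries (start < end) taken from the
	multiples of q between 0 and the sequence length n. How tseg treats a tail that is not a
	multiple of q is not documented here, so the function name and boundary convention are
	assumptions.

```python
def candidate_segments(n, q=1):
    """Number of (start, end) pairs with start < end whose boundaries
    are multiples of q in the range 0..n."""
    boundaries = range(0, n + 1, q)
    m = len(boundaries)
    return m * (m - 1) // 2

full = candidate_segments(1000, q=1)     # every position is a boundary
coarse = candidate_segments(1000, q=10)  # boundaries at 0, 10, 20, ...
ratio = full / coarse                    # close to q**2 = 100
```

	With n=1000 the count drops from 500500 to 5050, a ratio of about 99, in line with a
	reduction of order q^{2}.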

-P p: 	pseudocount for computing the BIC score (s=bic). It sets a pseudocount value for computing the likelihood of 
	a segment given a tree. The default value is 0 and p can range from 0 to 1.

-A a: alphabet size 

-V: (verbose) print trees corresponding to all segments

-fastaDNA: the input file is a DNA sequence in FASTA format

INPUT FILE FORMAT
	An ASCII file in which the alphabet symbols (integer alphabet indexes from 0 to |A|-1)
	are separated by newline characters, i.e. one symbol per line.
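
	As an illustration of this format, the following Python sketch converts a symbol string into
	one integer index per line. The mapping A=0, C=1, G=2, T=3 and the function name are
	assumptions made for the example. tseg does not prescribe a particular mapping here.

```python
def encode_symbols(sequence, alphabet="ACGT"):
    """Return the sequence as text with one integer index per line.

    The index of each symbol is its position in `alphabet`, so the
    indexes range from 0 to len(alphabet) - 1.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    return "\n".join(str(index[s]) for s in sequence) + "\n"

def write_input_file(path, sequence, alphabet="ACGT"):
    """Write the encoded sequence to `path` as plain ASCII."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(encode_symbols(sequence, alphabet))
```

	A file written this way for a DNA sequence would be passed to tseg together with -A 4.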

RESULTS
	Array containing pairs {partition point, <depth of the corresponding tree>} of the form
	I[k']:[{p1, <d1>}, {p2, <d2>}, ..., {pk', <dk'>}], where k' is the optimal number of partition points
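
	A result line of this form can be read back with a short Python sketch. It assumes that k',
	the partition points and the depths are printed as plain integers, and it accepts the depth
	with or without angle brackets because the literal output is not shown here. The function
	name is hypothetical.

```python
import re

HEADER = re.compile(r"I\[(\d+)\]\s*:")
PAIR = re.compile(r"\{\s*(\d+)\s*,\s*<?\s*(\d+)\s*>?\s*\}")

def parse_result(line):
    """Return a list of (partition_point, tree_depth) tuples."""
    header = HEADER.search(line)
    if header is None:
        raise ValueError("no I[k']: header found")
    body = line[header.end():]
    pairs = [(int(p), int(d)) for p, d in PAIR.findall(body)]
    # The header states how many partition points should follow.
    if len(pairs) != int(header.group(1)):
        raise ValueError("number of pairs does not match k'")
    return pairs
```

	The count check guards against a truncated or wrapped line being parsed silently.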

EXAMPLE
	tseg -M vlmc -S kt -D 3 -K 20 -A 4 -P 0.5 -fastaDNA Y50D4C.3


Any suggestions or bug reports are much appreciated.

Thank you,


Robert Gwadera
gwadera@cis.hut.fi 



