/**********************************************************************
    Copyright (C) 2004 Database Systems Lab, Supercomputer Education and
    Research Centre, Indian Institute of Science, Bangalore, INDIA.
    http://dsl.serc.iisc.ernet.in

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
***********************************************************************/

Persistent Suffix tree Implementation 			Release 1.0

Short Description
-----------------	
	This is a standalone system that provides storage management,
	buffer pool management and algorithms to construct a
	persistent suffix tree over a DNA sequence. The code base
	also provides a simple algorithm for exact searching over an
	already constructed suffix tree, purely to illustrate how the
	constructed tree can subsequently be used.

	The buffering algorithms provided are TOPQ (convertible into
	TOP) and LRU. The implementation supports both array-based and
	linked-list-based representations of suffix tree nodes.

	For further details on the algorithm, buffering techniques
	and performance numbers, refer to the following publication:

	[1] S. Bedathur and J. Haritsa, "Engineering a Fast Online
	    Persistent Suffix tree Construction", ICDE 2004.

Author: Srikanta Bedathur <srikanta@dsl.serc.iisc.ernet.in>
        Database Systems Lab, Supercomputer Education and Research Centre,
        Indian Institute of Science, Bangalore, INDIA.

Compile Instructions
--------------------	
	At the top-level, type "make" and wait.

	The code-base has been compiled and tested under g++ (GCC) 3.2
	20020903 (Red Hat Linux 8.0 3.2-7).

	Dependencies:
	------------
	Except for the use of malloc.[Ch], this code is completely
	self-contained (i.e., it has no significant dependencies on
	any other package/library). These two files are used to
	support easy portability amongst platforms (especially older
	Linux kernels, where no single process can allocate more than
	~1.2 Mb of memory; this is fixed in later kernels).

	Compilation Flags:
	------------------
	(Refer to the top-level Makefile.)
	There are primarily 3 classes of compilation options that
	would be of interest. A description of each class, and of the
	flags in it, follows:
	
	i) COMMONFLAGS
	These flags tell the compiler to perform the necessary
	optimizations, generate debug code, and so on.

	ii) IMPLFLAGS
	Currently the code base supports array-based and linked-list
	based internal node representations, with ARRAY being the
	default option. By adding the LINKEDLIST definition to
	CXXFLAGS, one can switch to linked-list mode.
	
	iii) STATFLAGS
	Including these flags turns statistics collection on; without
	them, no statistics are collected. Turning on statistics
	collection does not significantly reduce performance, but in
	production code one may not want the statistics printed out.
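
	The following is a minimal sketch of how a definition such as
	LINKEDLIST (see IMPLFLAGS above) can select the node
	representation at compile time. Only the LINKEDLIST macro is
	named in this README; the constant and function names below
	are hypothetical and are not taken from the release.

```cpp
#include <string>

// Hypothetical illustration only: these names do not come from the release.
// Compiling with -DLINKEDLIST (e.g. through CXXFLAGS) selects the
// linked-list branch; otherwise the array branch is used.
#ifdef LINKEDLIST
static const bool kUseLinkedList = true;
#else
static const bool kUseLinkedList = false;  // ARRAY is the default
#endif

// Returns the name of the node representation chosen at compile time.
inline std::string nodeRepresentation() {
    return kUseLinkedList ? "linked-list" : "array";
}
```

	Because the choice is made at compile time, switching between
	the two representations requires a rebuild.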

	Since the code has NOT undergone any thorough cleanup, there
	are many compilation flags floating around in the source
	code. Although many of them are self-explanatory (to me!),
	some may not be. Send me mail if there is a compilation flag
	that you think you should know about but cannot figure out.

License
-------
	The code base is distributed under the GPL. Please refer to
	COPYRIGHT.GPL in the top-level directory for details about the
	license.

Directory Contents
------------------
CODE_OVERVIEW.TXT   Brief description of source files in the release
COPYRIGHT.GPL       GNU General Public License
Makefile            The top makefile for the release.
src/                C++ Implementation source files
incl/               C++ Header files
obj/                object file directory
test/               contains sample test DNA sequence and a query file
suffixrc-ar.example rc-file for array representation over sample test 
                    sequence given in test/
suffixrc-ll.example rc-file for linked-list representation over sample test
                    sequence given in test/
suffixrc.template   Template file for rc-files


Structure of the rc-file
------------------------
datafile:<datafilename> ---> expected to be a "pure" sequence file,
			      one long text of characters.

datasize:<length of the sequence being indexed>
internalBasefile:<base filename for internal nodes> ---> will be extended
			      by appending ".0", ".1" etc.
leafBasefile:<base filename for leaf nodes>
internalNodeSize:32
leafNodeSize:12


Note that there is no space between the option tag and ":", nor
between ":" and the option value.

The last two values, viz. internalNodeSize and leafNodeSize, are
needed to ensure that indexes built with different page sizes are not
mixed up during the construction and search phases.
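
As a rough sketch of how a file in this "tag:value" format could be
read and checked, consider the following. It is not the parser shipped
in src/; the names RcOptions, parseRc, nodeSizesMatch and
segmentFileName are hypothetical, as is the choice to reject lines
without a ":".

```cpp
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration; not part of the release.
typedef std::map<std::string, std::string> RcOptions;

// Reads "tag:value" lines into a map. Blank lines are skipped.
// Nothing is trimmed, since the format has no space around ':'.
RcOptions parseRc(std::istream& in) {
    RcOptions opts;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            throw std::runtime_error("missing ':' in rc line: " + line);
        opts[line.substr(0, colon)] = line.substr(colon + 1);
    }
    return opts;
}

// True only if both node sizes are present and equal the expected values,
// so that an index built with other sizes is not opened by mistake.
bool nodeSizesMatch(const RcOptions& opts, int internalSize, int leafSize) {
    RcOptions::const_iterator i = opts.find("internalNodeSize");
    RcOptions::const_iterator l = opts.find("leafNodeSize");
    if (i == opts.end() || l == opts.end()) return false;
    return std::atoi(i->second.c_str()) == internalSize &&
           std::atoi(l->second.c_str()) == leafSize;
}

// Builds the name of the n-th file from a base filename: base + ".n".
std::string segmentFileName(const std::string& base, int n) {
    std::ostringstream out;
    out << base << '.' << n;
    return out.str();
}
```

A caller would open the rc-file with std::ifstream, pass the stream to
parseRc, and refuse to continue when nodeSizesMatch returns false.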

Example suffixrc files for the test DNA sequence given in the test/
directory are provided as suffixrc-ar.example and suffixrc-ll.example.

Reporting problems 
------------------
	Send mail to srikanta@dsl.serc.iisc.ernet.in
	
	Please allow me some time to reply to your mails, and have
	patience.