----------------------------------------------------------------
Protein Structure Prediction Pipeline
Version: 1.0 (beta)

Michael S. Lee
Valmik Desai
Rajkumar Bondugula

Biotechnology HPC Software Applications Institute
Frederick, MD 21702
----------------------------------------------------------------
Installation: See PSPP_install.doc
----------------------------------------------------------------
Instructions
------------

The main programs to run are seq_router.pl (one sequence per run) and
mpi_seq_router.pl (one or more sequences, distributed over MPI hosts).

Usage:   seq_router.pl [options] <single sequence file, FASTA-format>

Options:
         [-seqprop]
           Predict full-sequence 1-D properties (slower).
           These include transmembrane helices, disorder, secondary structure,
           and solvent accessibility. Compute time is dominated by the PSI-BLAST run.

         [-dompred] <F, B, or C>
           Predict domain boundaries using FIEFDom with -dompred F
           Predict domain boundaries using Bayes   with -dompred B
           Predict domain boundaries using both    with -dompred C

         [-domain] <a,b,...> (domain boundaries)
  -or-   [-domain] <a:b,c:d,...> (domain regions)
           Manually delineate domain boundaries or domain regions
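
The two -domain forms can be illustrated with a small parser. This is a
hypothetical sketch, not the pipeline's own code. It assumes 1-based,
inclusive residue numbering and treats each boundary value as the last
residue of a domain, so that boundaries split the chain into consecutive
regions. The function name parse_domain_arg is invented for illustration.

```python
def parse_domain_arg(arg, seq_len):
    """Convert a -domain argument into a list of (start, end) regions.

    Hypothetical helper. Assumes 1-based inclusive numbering and that a
    boundary value marks the last residue of a domain.
    """
    items = [s.strip() for s in arg.split(",") if s.strip()]
    if all(":" in s for s in items):
        # Region form a:b,c:d,... is taken as given.
        regions = []
        for s in items:
            start, end = (int(x) for x in s.split(":"))
            regions.append((start, end))
        return regions
    if any(":" in s for s in items):
        raise ValueError("cannot mix boundaries and regions")
    # Boundary form a,b,... splits the chain into consecutive regions.
    regions, start = [], 1
    for cut in sorted(int(s) for s in items):
        regions.append((start, cut))
        start = cut + 1
    regions.append((start, seq_len))
    return regions
```

For example, under these assumptions "60,120" on a 200-residue chain gives
the regions 1:60, 61:120, and 121:200.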

         [-nomodel] -or- [-nomodels]
           Don't build any structures (annotations only).

         [-cmonly]
           Do only comparative modeling (skip fold recognition and ab initio).

         [-fronly]
           Do only fold recognition (skip comparative modeling and ab initio).

         [-aionly]
           Do only ab initio (skip comparative modeling and fold recognition).

         [-cmthresh] <value from 0 to 100>
           Percent homology threshold for building a comparative model (CM)

         [-maxcm] <max # of comparative models>
           Specify the maximum number of comparative models to generate

         [-frthresh] <value from 0 to 20>
           Z-score threshold for building a fold recognition (FR) model
           See http://compbio.ornl.gov/structure/prospect2/output.html

         [-maxfr] <max # of fold recognition models>
           Specify the maximum number of fold recognition models to generate

         [-maxai] <# of best models>
           Specify the number of "best" ab initio models to output and compare
           against SCOP

         [-ainum] <# of models>
           Specify total number of Rosetta AbInitio decoys to generate

         [-roshom]
           Include homologous proteins in the fragment database used by
           Rosetta AbInitio

         [-noai] 
           Don't run ab initio
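
The stage-selection options above can be read as a mapping from flags to the
stages that run. The precedence used here is an assumption for illustration
and is not taken from seq_router.pl: -nomodel wins over everything, then the
*only flags, and -noai is applied last. The helper stages_to_run is
hypothetical.

```python
def stages_to_run(flags):
    """Return the modeling stages implied by a set of command-line flags.

    Hypothetical helper. The flag precedence is an assumption.
    """
    if flags & {"-nomodel", "-nomodels"}:
        return []  # annotations only, no structures
    if "-cmonly" in flags:
        return ["CM"]
    if "-fronly" in flags:
        return ["FR"]
    if "-aionly" in flags:
        return ["AI"]
    stages = ["CM", "FR", "AI"]
    if "-noai" in flags:
        stages.remove("AI")
    return stages
```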

         [-out] <output directory>
           Specify the directory where all results are written

         [-route]
           Run CM, FR, and AI, in order, until at least one model is found,
           then stop.
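
With -route, the stages form a fall-through: each stage is tried only if
every earlier stage produced no model. The following is a minimal sketch.
build is a hypothetical callback that stands in for the actual CM, FR, or AI
run and returns a (possibly empty) list of models.

```python
from typing import Callable, List, Optional, Tuple


def route(stages: List[str],
          build: Callable[[str], List[str]]) -> Tuple[Optional[str], List[str]]:
    """Run stages in order and stop at the first one that yields a model."""
    for stage in stages:
        models = build(stage)
        if models:  # at least one model found, so later stages are skipped
            return stage, models
    return None, []  # no stage produced a model
```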

=========================
Usage:   mpi_seq_router.pl [options] <single or multiple sequence file, FASTA-format>

         Same options as seq_router.pl, plus one extra option:

         [-hostfile] <filename>

          Specify a file that lists the hosts and the number of CPUs (slots)
          to use on each. Example: host.txt

                   node1 slots=2
                   node5 slots=2
                      .
                      .
                      .
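
A hostfile in this format can be read into a host-to-slot table. The total
of the slots gives the number of processes available. This is a hedged
sketch. The slots= syntax resembles Open MPI hostfiles. However, the "#"
comment handling and the default of one slot for a host listed without
slots= are assumptions of this sketch. parse_hostfile is a hypothetical name.

```python
def parse_hostfile(text):
    """Map each host name to its slot count (hypothetical helper)."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumed: '#' starts a comment
        if not line:
            continue
        fields = line.split()
        name, slots = fields[0], 1  # assumed default of one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts[name] = hosts.get(name, 0) + slots  # repeated hosts accumulate
    return hosts
```

Applied to the example above, this gives {"node1": 2, "node5": 2}, or four
slots in total.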

          Also, mpi_seq_router.pl can accept multiple sequences in the same FASTA file.
          As of this version, the label assigned to each sequence is the first word on the
          header line. For example,
         
          >1SHG SH3-domain protein
        
          "1SHG" will be the protein label. The exception is a header that begins
          with ">gi", in which case the fourth term is used instead.

          Note: Use simple one-word headers or the "gi" format; otherwise, the
          single-sequence filenames produced from the labels could cause
          unpredictable errors.
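
The labeling rule can be sketched as follows. The source does not say how the
fields of a gi header are separated. This sketch assumes the usual NCBI "|"
delimiters. For example, the hypothetical header
>gi|12345|ref|NP_000001.1| example protein has NP_000001.1 as its fourth
term. When fewer than four fields are present, the sketch falls back to the
first word. label_from_header is a hypothetical name.

```python
def label_from_header(header):
    """Derive a protein label from a FASTA header line (hypothetical helper)."""
    body = header.lstrip(">").strip()
    if body.startswith("gi|"):
        fields = body.split("|")  # assumed '|'-delimited gi format
        if len(fields) >= 4 and fields[3].strip():
            return fields[3].strip()  # the fourth term
    return body.split()[0]  # otherwise the first word
```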
 
          For non-MPI runs using seq_router.pl, the label
          is simply the FASTA filename without the .fa or .fasta suffix. 
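
For seq_router.pl, the equivalent rule operates on the file name. The
following is a minimal sketch that uses Python's pathlib. Leaving names with
other suffixes unchanged is an assumption, and label_from_filename is
hypothetical.

```python
from pathlib import Path


def label_from_filename(path):
    """Strip a .fa or .fasta suffix to get the label (hypothetical helper)."""
    p = Path(path)
    if p.suffix in (".fa", ".fasta"):
        return p.stem
    return p.name  # assumed: other names are used as-is
```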
----------------------------------------------------------------
