prss3 man page on DragonFly

PRSS3(1)							      PRSS3(1)

NAME
       prss - test a protein sequence similarity for significance

SYNOPSIS
       prss34  [-Q  -A	-f  # -g # -H -O file -s SMATRIX -w # -Z # -k # -v # ]
       sequence-file-1 sequence-file-2 [ #-of-shuffles ]

       prfx34 [-Q -A -f # -g # -H -O file -s SMATRIX -w # -z 1,3 -Z # -k #  -v
       # ] sequence-file-1 sequence-file-2 [ ktup ] [ #-of-shuffles ]

       prss34(_t)/prfx34(_t) [-AfghksvwzZ] - interactive mode

DESCRIPTION
       prss34  and  prfx34  are	 used  to  evaluate the significance of a pro‐
       tein:protein, DNA:DNA ( prss34 ), or translated-DNA:protein ( prfx34  )
       sequence	 similarity  score  by comparing two sequences and calculating
       optimal similarity scores, and then  repeatedly	shuffling  the	second
       sequence,  and  calculating  optimal similarity scores using the Smith-
       Waterman algorithm. An extreme value distribution is then  fit  to  the
       shuffled-sequence scores.  The characteristic parameters of the extreme
       value distribution are then used to estimate the probability that  each
       of  the	unshuffled  sequence scores would be obtained by chance in one
       sequence, or in a number of sequences equal to the number of  shuffles.
       This  program  is  derived  from rdf2, described by Pearson and Lipman,
       PNAS (1988) 85:2444-2448, and Pearson (Meth. Enz.  183:63-98).  Use  of
       the extreme value distribution for estimating the probabilities of
       similarity scores was described by Altschul and Karlin, PNAS (1990)
       87:2264-2268.   The and expectations calculated by prdf.	 prss34 calcu‐
       lates optimal scores using the same rigorous  Smith-Waterman  algorithm
       (Smith and Waterman, J. Mol. Biol. (1981) 147:195-197) used by the
       ssearch34 program.  prfx34 calculates scores using the FASTX  algorithm
       (Pearson et al. (1997) Genomics 46:24-36).
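
       The fitting step described above can be sketched in Python.  This is
       a minimal illustration, not the code used by prss34: it assumes a
       method-of-moments fit of a Gumbel (type I extreme value) distribution
       to the shuffled-sequence scores, and the names fit_gumbel_moments,
       p_value and simulated_shuffle_scores are hypothetical.  The scores in
       the demo are simulated, not real alignment scores.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) from the sample
    mean and standard deviation of the shuffled-sequence scores."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """Probability of a score >= `score` in a single comparison,
    P = 1 - exp(-exp(-(score - mu) / beta))."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def simulated_shuffle_scores(n, mu, beta, rng):
    """Draw n synthetic scores from a Gumbel distribution (demo only)."""
    out = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        out.append(mu - beta * math.log(-math.log(u)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    shuffled = simulated_shuffle_scores(200, 30.0, 5.0, rng)
    mu, beta = fit_gumbel_moments(shuffled)
    print("mu=%.2f beta=%.2f P(S>=60)=%.3g" % (mu, beta, p_value(60.0, mu, beta)))
```

       With 200 shuffles (the default) the fitted parameters are only
       approximate; more shuffles give a tighter fit at the cost of run time.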

       prss34  and  prfx34  also  allow a more sophisticated shuffling method:
       residues can be shuffled within a local window, so that	the  order  of
       residues	 1-10,	11-20, etc, is destroyed but a residue in the first 10
       is never swapped with a residue outside the first ten, and  so  on  for
       each local window.
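
       The local window shuffle can be sketched as follows.  This is an
       illustration under stated assumptions rather than the prss34
       implementation: the function name window_shuffle is hypothetical, and
       a final window shorter than the window size is assumed to be shuffled
       on its own.

```python
import random


def window_shuffle(seq, window, rng=None):
    """Shuffle residues only within consecutive blocks of `window`
    residues, so no residue leaves its original block."""
    if window < 1:
        raise ValueError("window must be at least 1")
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # in-place shuffle of this block only
        out.extend(block)
    return "".join(out)


if __name__ == "__main__":
    # Arbitrary demo sequence, window of 10 as with "-v 10".
    print(window_shuffle("MKVLAAGIVGLCAKHSEQWYTR", 10, random.Random(0)))
```

       Because each block keeps its own residues, local composition is
       preserved while the order within each block is destroyed.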

EXAMPLES
       (1)    prss34  -v 10 musplfm.aa lcbo.aa

       Compare	the  amino  acid  sequence in the file musplfm.aa with that in
       lcbo.aa, then shuffle lcbo.aa 200 times using a local  shuffle  with  a
       window  of  10.	Report the significance of the unshuffled musplfm/lcbo
       comparison scores with respect to the shuffled scores.

       (2)    prss34 musplfm.aa lcbo.aa 1000

       Compare the amino acid sequence in the file musplfm.aa with that in
       the file lcbo.aa, shuffling lcbo.aa 1000 times.  The number of
       shuffles can also be specified with the -k # option.

       (3)    prfx34 mgstm1.esq xurt8c.aa 2 1000

       Translate the DNA sequence in the mgstm1.esq file in all six frames and
       compare	it  to	the  amino  acid sequence in the file xurt8c.aa, using
       ktup=2 and shuffling xurt8c.aa 1000 times.  Each	 comparison  considers
       the best forward or reverse alignment with frameshifts, using the FASTX
       algorithm (Pearson et al. (1997) Genomics 46:24-36).

       (4)    prss34/prfx34

       Run prss34 or prfx34 in interactive mode.  The program will prompt for
       the file names of the two sequence files and the number of shuffles to
       be used.

OPTIONS
       prss34/prfx34 can be directed to change the scoring matrix, gap
       penalties, and shuffle parameters by entering options on the command
       line (preceded by a `-'). All of the options should precede the file
       names and the number of shuffles.

       -A     Show unshuffled alignment.

       -f #   Penalty for opening a gap (-10 by default for proteins).

       -g #   Penalty for additional residues in a gap (-2 by default for
              proteins).

       -H     Do not display histogram of similarity scores.

       -k #   Number of shuffles (200 is the default).

       -Q -q  "quiet" - do not prompt for filename.

       -O filename
              Send a copy of the results to "filename".

       -s str Specify the scoring matrix.  BLOSUM50 is used by default for
              proteins; +5/-4 is used by default for DNA.  prss34 recognizes
              the same scoring matrices as fasta34, ssearch34, fastx34, etc.,
              e.g. BL50, P250, BL62, BL80, MD10, MD20, and other matrices in
              BLAST1.4 matrix format.

       -v #   Use a local window shuffle with a window size of #.

       -z #   Calculate	 statistical  significance  using  the	 mean/variance
	      (moments) approach used by fasta34/ssearch or from maximum like‐
	      lihood estimates of lambda and K.

       -Z #   Present statistical significance as if a '#' entry database  had
	      been searched (e.g. "-Z 50000" presents statistical significance
	      as if 50,000 sequences had been compared).
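
       The effect of -Z can be illustrated with a short Python sketch.  It
       assumes that the database-size correction is the usual one, in which
       the single-comparison probability is scaled by the number of entries;
       the man page does not state the exact formula prss34 uses, and the
       function names below are hypothetical.

```python
import math


def expected_hits(p_single, db_size):
    """Expected number of chance scores this high among `db_size`
    independent comparisons (E = db_size * p)."""
    return db_size * p_single


def prob_at_least_one(p_single, db_size):
    """Probability of at least one chance score this high among
    `db_size` independent comparisons, 1 - (1 - p)**db_size."""
    if p_single >= 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_single is very small
    return -math.expm1(db_size * math.log1p(-p_single))


if __name__ == "__main__":
    p = 1e-6  # arbitrary single-comparison probability for the demo
    print(expected_hits(p, 50000), prob_at_least_one(p, 50000))
```

       For small probabilities the two quantities are nearly equal, so a
       larger -Z value makes the same score look proportionally less
       significant.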

ENVIRONMENT VARIABLES
       (SMATRIX) the filename of an alternative scoring matrix file.  For
       protein sequences, BLOSUM50 is used by default; PAM250 can be used with
       the command line option -s P250 (or with -s pam250.mat).  BLOSUM62 (-s
       BL62) and PAM120 (-s P120) can be selected in the same way.

SEE ALSO
       ssearch3(1), fasta3(1).

AUTHOR
       Bill Pearson
       wrp@virginia.EDU

				     local			      PRSS3(1)