samtools man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

samtools(1)		     Bioinformatics tools		   samtools(1)

NAME
       samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS
       samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam

       samtools index aln.sorted.bam

       samtools idxstats aln.sorted.bam

       samtools flagstat aln.sorted.bam

       samtools stats aln.sorted.bam

       samtools bedcov aln.sorted.bam

       samtools depth aln.sorted.bam

       samtools view aln.sorted.bam chr2:20,100,000-20,200,000

       samtools merge out.bam in1.bam in2.bam in3.bam

       samtools faidx ref.fasta

       samtools tview aln.sorted.bam ref.fasta

       samtools split merged.bam

       samtools quickcheck in1.bam in2.cram

       samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta

       samtools fixmate in.namesorted.sam out.bam

       samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

       samtools flags PAIRED,UNMAP,MUNMAP

       samtools fastq input.bam > output.fastq

       samtools fasta input.bam > output.fasta

       samtools	 addreplacerg  -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o out‐
       put.bam input.bam

       samtools collate aln.sorted.bam aln.name_collated.bam

       samtools depad input.bam

DESCRIPTION
       Samtools is a set of utilities that manipulate alignments  in  the  BAM
       format. It imports from and exports to the SAM (Sequence Alignment/Map)
       format, does sorting, merging and  indexing,  and  allows  to  retrieve
       reads in any regions swiftly.

       Samtools	 is designed to work on a stream. It regards an input file `-'
       as the standard input (stdin) and an output file `-'  as	 the  standard
       output (stdout). Several commands can thus be combined with Unix pipes.
       Samtools always output warning and error messages to the standard error
       output (stderr).

       Samtools	 is  also able to open a BAM (not SAM) file on a remote FTP or
       HTTP server if the BAM file name starts	with  `ftp://'	or  `http://'.
       Samtools	 checks	 the  current working directory for the index file and
       will download the index upon absence. Samtools does  not	 retrieve  the
       entire alignment file unless it is asked to do so.

COMMANDS AND OPTIONS
       view	 samtools view [options] in.sam|in.bam|in.cram [region...]

		 With  no  options or regions specified, prints all alignments
		 in the specified input alignment file (in SAM, BAM,  or  CRAM
		 format) to standard output in SAM format (with no header).

		 You may specify one or more space-separated region specifica‐
		 tions after the input filename to  restrict  output  to  only
		 those	alignments  which overlap the specified region(s). Use
		 of region specifications  requires  a	coordinate-sorted  and
		 indexed input file (in BAM or CRAM format).

		 The  -b, -C, -1, -u, -h, -H, and -c options change the output
		 format from the default of headerless SAM, and the -o and  -U
		 options set the output file name(s).

		 The  -t and -T options provide additional reference data. One
		 of these two options is required when SAM input does not con‐
		 tain  @SQ  headers,  and  the	-T option is required whenever
		 writing CRAM output.

		 The -L, -r, -R, -q, -l, -m, -f, and  -F  options  filter  the
		 alignments  that will be included in the output to only those
		 alignments that match certain criteria.

		 The -x, -B, and -s options modify the data which is contained
		 in each alignment.

		 Finally,  the	-@  option  can be used to allocate additional
		 threads to be	used  for  compression,	 and  the  -?	option
		 requests a long help message.

       REGIONS:
		 Regions  can  be  specified as: RNAME[:STARTPOS[-ENDPOS]] and
		 all position coordinates are 1-based.

		 Important note: when multiple regions are given, some	align‐
		 ments	may be output multiple times if they overlap more than
		 one of the specified regions.

		 Examples of region specifications:

		 chr1	   Output  all	alignments  mapped  to	the  reference
			   sequence named `chr1' (i.e. @SQ SN:chr1).

		 chr2:1000000
			   The	region	on  chr2  beginning  at	 base position
			   1,000,000 and ending at the end of the chromosome.

		 chr3:1000-2000
			   The 1001bp region on chr3 beginning at  base	 posi‐
			   tion	 1,000	and  ending  at	 base  position	 2,000
			   (including both end positions).

		 '*'	   Output the unmapped reads at the end of  the	 file.
			   (This does not include any unmapped reads placed on
			   a reference sequence alongside their mapped mates.)

		 .	   Output all alignments.  (Mostly unnecessary as  not
			   specifying a region at all has the same effect.)

       OPTIONS:

		 -b	   Output in the BAM format.

		 -C	   Output in the CRAM format (requires -T).

		 -1	   Enable fast BAM compression (implies -b).

		 -u	   Output  uncompressed	 BAM.  This  option saves time
			   spent on compression/decompression and is thus pre‐
			   ferred when the output is piped to another samtools
			   command.

		 -h	   Include the header in the output.

		 -H	   Output the header only.

		 -c	   Instead of printing the alignments, only count them
			   and	print  the  total  number. All filter options,
			   such as -f, -F, and -q, are taken into account.

		 -?	   Output long help and exit immediately.

		 -o FILE   Output to FILE [stdout].

		 -U FILE   Write alignments that are not selected by the vari‐
			   ous	filter	options	 to FILE.  When this option is
			   used, all alignments (or all alignments  intersect‐
			   ing	the  regions  specified) are written to either
			   the output file or this file, but never both.

		 -t FILE   A tab-delimited FILE.  Each line must  contain  the
			   reference  name  in the first column and the length
			   of the reference in the  second  column,  with  one
			   line	 for  each distinct reference.	Any additional
			   fields beyond the second column are	ignored.  This
			   file	 also  defines	the  order  of	the  reference
			   sequences in sorting. If you run:  `samtools	 faidx
			   <ref.fa>',  the  resulting  index file <ref.fa>.fai
			   can be used as this FILE.

		 -T FILE   A FASTA  format  reference  FILE,  optionally  com‐
			   pressed  by	bgzip  and ideally indexed by samtools
			   faidx.  If an index is not  present,	 one  will  be
			   generated for you.

		 -L FILE   Only	 output	 alignments  overlapping the input BED
			   FILE [null].

		 -r STR	   Only output alignments in read group STR [null].

		 -R FILE   Output alignments in read  groups  listed  in  FILE
			   [null].

		 -q INT	   Skip alignments with MAPQ smaller than INT [0].

		 -l STR	   Only output alignments in library STR [null].

		 -m INT	   Only	 output	 alignments with number of CIGAR bases
			   consuming query sequence ≥ INT [0]

		 -f INT	   Only output alignments with all  bits  set  in  INT
			   present in the FLAG field.  INT can be specified in
			   hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or
			   in  octal  by  beginning with `0' (i.e. /^0[0-7]+/)
			   [0].

		 -F INT	   Do not output alignments with any bits set  in  INT
			   present in the FLAG field.  INT can be specified in
			   hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or
			   in  octal  by  beginning with `0' (i.e. /^0[0-7]+/)
			   [0].

		 -x STR	   Read tag to exclude from output (repeatable) [null]

		 -B	   Collapse the backward CIGAR operation.

		 -s FLOAT  Integer part is used to seed the random number gen‐
			   erator  [0].	 Part after the decimal point sets the
			   fraction of templates/pairs to subsample  [no  sub‐
			   sampling].

		 -@ INT	   Number  of  BAM compression threads to use in addi‐
			   tion to main thread [0].

		 -S	   Ignored for compatibility  with  previous  samtools
			   versions.   Previously  this option was required if
			   input was in SAM format, but now the correct format
			   is  automatically  detected	by examining the first
			   few characters of input.

       sort	 samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format]
		 [-n] [-T tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]

		 Sort alignments by leftmost coordinates, or by read name when
		 -n is used.  An appropriate @HD-SO sort order header tag will
		 be added or an existing one updated if necessary.

		 The  sorted  output is written to standard output by default,
		 or to the specified file (out.bam) when  -o  is  used.	  This
		 command  will also create temporary files tmpprefix.%d.bam as
		 needed when the entire alignment data cannot fit into	memory
		 (as controlled via the -m option).

		 Options:

		 -l INT	    Set	 the  desired  compression level for the final
			    output file, ranging from 0	 (uncompressed)	 or  1
			    (fastest  but minimal compression) to 9 (best com‐
			    pression  but  slowest  to	write),	 similarly  to
			    gzip(1)'s compression level setting.

			    If	-l  is not used, the default compression level
			    will apply.

		 -m INT	    Approximately  the	maximum	 required  memory  per
			    thread,  specified either in bytes or with a K, M,
			    or G suffix.  [768 MiB]

		 -n	    Sort by read names (i.e., the QNAME field)	rather
			    than by chromosomal coordinates.

		 -o FILE    Write the final sorted output to FILE, rather than
			    to standard output.

		 -O FORMAT  Write the final output as sam, bam, or cram.

			    By default, samtools  tries	 to  select  a	format
			    based  on  the -o filename extension; if output is
			    to standard output or no format  can  be  deduced,
			    bam is selected.

		 -T PREFIX  Write  temporary  files  to	 PREFIX.nnnn.bam.   By
			    default, any temporary files are written alongside
			    the input file, as in.bam.tmp.nnnn.bam.

		 -@ INT	    Set number of sorting and compression threads.  By
			    default, operation is single-threaded.

		 Historically samtools sort also accepted a less flexible  way
		 of specifying the final and temporary output filenames:

			samtools sort [-f] [-o] in.bam out.prefix

		 This  has now been removed.  The previous out.prefix argument
		 (and -f option, if any) should be changed to  an  appropriate
		 combination of -T PREFIX and -o FILE.	The previous -o option
		 should be removed, as output defaults to standard output.

       index	 samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index]

		 Index a coordinate-sorted BAM or CRAM file  for  fast	random
		 access.  (Note that this does not work with SAM files even if
		 they are bgzip compressed — to index such files, use tabix(1)
		 instead.)

		 This  index is needed when region arguments are used to limit
		 samtools view and similar commands to particular  regions  of
		 interest.

		 If  an output filename is given, the index file will be writ‐
		 ten to out.index.  Otherwise, for a CRAM file aln.cram, index
		 file  aln.cram.crai  will be created; for a BAM file aln.bam,
		 either aln.bam.bai or aln.bam.csi will be created,  depending
		 on the index format selected.

		 Options:

		 -b	 Create	 a  BAI	 index.	 This is currently the default
			 when no format options are used.

		 -c	 Create a CSI index.  By default, the minimum interval
			 size  for the index is 2^14, which is the same as the
			 fixed value used by the BAI format.

		 -m INT	 Create a CSI index, with a minimum interval  size  of
			 2^INT.

       idxstats	 samtools idxstats in.sam|in.bam|in.cram

		 Retrieve  and	print stats in the index file corresponding to
		 the input file.  Before calling idxstats, the input BAM  file
		 must be indexed by samtools index.

		 The output is TAB-delimited with each line consisting of ref‐
		 erence sequence name, sequence length, # mapped reads	and  #
		 unmapped reads. It is written to stdout.

       flagstat	 samtools flagstat in.sam|in.bam|in.cram

		 Does  a  full	pass  through  the input file to calculate and
		 print statistics to stdout.

		 Provides counts for each of 13 categories based primarily  on
		 bit  flags  in the FLAG field. Each category in the output is
		 broken down into QC pass and QC fail, which is	 presented  as
		 "#PASS + #FAIL" followed by a description of the category.

		 The  first row of output gives the total number of reads that
		 are QC pass and fail (according to flag bit 0x200). For exam‐
		 ple:

		   122 + 28 in total (QC-passed reads + QC-failed reads)

		 Which	would  indicate that there are a total of 150 reads in
		 the input file, 122 of which are marked as QC pass and 28  of
		 which are marked as "not passing quality controls"

		 Following  this,  additional  categories  are given for reads
		 which are:

			 secondary
				0x100 bit set

			 supplementary
				0x800 bit set

			 duplicates
				0x400 bit set

			 mapped 0x4 bit not set

			 paired in sequencing
				0x1 bit set

			 read1	both 0x1 and 0x40 bits set

			 read2	both 0x1 and 0x80 bits set

			 properly paired
				both 0x1 and 0x2 bits set and 0x4 bit not set

			 with itself and mate mapped
				0x1 bit set and neither 0x4 nor 0x8 bits set

			 singletons
				both 0x1 and 0x8 bits set and bit 0x4 not set

		 And finally, two rows are given that additionally  filter  on
		 the  reference	 name (RNAME), mate reference name (MRNM), and
		 mapping quality (MAPQ) fields:

			 with mate mapped to a different chr
				0x1 bit set and neither 0x4 nor 0x8  bits  set
				and MRNM not equal to RNAME

			 with mate mapped to a different chr (mapQ>=5)
				0x1  bit  set and neither 0x4 nor 0x8 bits set
				and MRNM not equal to RNAME and MAPQ >= 5

       stats	 samtools stats [options] in.sam|in.bam|in.cram [region...]

		 samtools stats collects statistics from BAM files and outputs
		 in  a	text format.  The output can be visualized graphically
		 using plot-bamstats.

		 Options:

		 -c, --coverage MIN,MAX,STEP
			 Set coverage  distribution  to	 the  specified	 range
			 (MIN, MAX, STEP all given as integers) [1,1000,1]

		 -d, --remove-dups
			 Exclude from statistics reads marked as duplicates

		 -f, --required-flag STR|INT
			 Required flag, 0 for unset. See also `samtools flags`
			 [0]

		 -F, --filtering-flag STR|INT
			 Filtering flag,  0  for  unset.  See  also  `samtools
			 flags` [0]

		 --GC-depth FLOAT
			 the  size  of	GC-depth  bins	(decreasing  bin  size
			 increases memory requirement) [2e4]

		 -h, --help
			 This help message

		 -i, --insert-size INT
			 Maximum insert size [8000]

		 -I, --id STR
			 Include only listed read group or sample name []

		 -l, --read-length INT
			 Include in the statistics only reads with  the	 given
			 read length []

		 -m, --most-inserts FLOAT
			 Report only the main part of inserts [0.99]

		 -P, --split-prefix STR
			 A  path or string prefix to prepend to filenames out‐
			 put when creating categorised statistics  files  with
			 -S/--split.  [input filename]

		 -q, --trim-quality INT
			 The BWA trimming parameter [0]

		 -r, --ref-seq FILE
			 Reference  sequence  (required	 for GC-depth and mis‐
			 matches-per-cycle calculation).  []

		 -S, --split TAG
			 In addition to the complete statistics,  also	output
			 categorised  statistics based on the tagged field TAG
			 (e.g., use --split RG to split into read groups).

			 Categorised statistics are  written  to  files	 named
			 <prefix>_<value>.bamstat, where prefix is as given by
			 --split-prefix (or the input filename by default) and
			 value	has  been  encountered as the specified tagged
			 field's value in one or more alignment records.

		 -t, --target-regions FILE
			 Do stats in these regions  only.  Tab-delimited  file
			 chr,from,to, 1-based, inclusive.  []

		 -x, --sparse
			 Suppress outputting IS rows where there are no inser‐
			 tions.

       bedcov	 samtools bedcov region.bed in1.sam|in1.bam|in1.cram[...]

		 Reports read depth per genomic region, as  specified  in  the
		 supplied BED file.

       depth	 samtools     depth	[options]    [in1.sam|in1.bam|in1.cram
		 [in2.sam|in2.bam|in2.cram] [...]]

		 Computes the depth at each position or region.

		 Options:

		 -a	 Output	 all  positions	 (including  those  with  zero
			 depth)

		 -a -a, -aa
			 Output	 absolutely  all  positions,  including unused
			 reference sequences

		 -b FILE Compute depth at list	of  positions  or  regions  in
			 specified BED FILE.  []

		 -f FILE Use  the  BAM	files specified in the FILE (a file of
			 filenames, one file per line) []

		 -l INT	 Ignore reads shorter than INT

		 -m, -d INT
			 Truncate reported depth at a maximum  of  INT	reads.
			 [8000]

		 -q INT	 Only count reads with base quality greater than INT

		 -Q INT	 Only  count  reads  with mapping quality greater than
			 INT

		 -r CHR:FROM-TO
			 Only report depth in specified region.

       merge	 samtools merge [-nur1f] [-h inh.sam]  [-R  reg]  [-b  <list>]
		 <out.bam> <in1.bam> [<in2.bam> <in3.bam> ... <inN.bam>]

		 Merge	multiple  sorted  alignment  files, producing a single
		 sorted output file that contains all the  input  records  and
		 maintains the existing sort order.

		 If  -h	 is  specified	the @SQ headers of input files will be
		 merged into the specified  header,  otherwise	they  will  be
		 merged	 into  a composite header created from the input head‐
		 ers.  If in the process of merging @SQ lines  for  coordinate
		 sorted	 input	files,	a conflict arises as to the order (for
		 example input1.bam has	 @SQ  for  a,b,c  and  input2.bam  has
		 b,a,c)	 then  the  resulting  output file will need to be re-
		 sorted back into coordinate order.

		 Unless the -c or -p flags are specified then when merging @RG
		 and  @PG records into the output header then any IDs found to
		 be duplicates of existing IDs in the output header will  have
		 a  suffix  appended to them to diffientiate them from similar
		 header records from other files and the read records will  be
		 updated to reflect this.

		 OPTIONS:

		 -1	 Use zlib compression level 1 to compress the output.

		 -b FILE List of input BAM files, one file per line.

		 -f	 Force to overwrite the output file if present.

		 -h FILE Use  the lines of FILE as `@' headers to be copied to
			 out.bam, replacing any header lines that would other‐
			 wise  be  copied  from in1.bam.  (FILE is actually in
			 SAM format, though any alignment records it may  con‐
			 tain are ignored.)

		 -n	 The  input alignments are sorted by read names rather
			 than by chromosomal coordinates

		 -R STR	 Merge files in the specified region indicated by  STR
			 [null]

		 -r	 Attach	 an RG tag to each alignment. The tag value is
			 inferred from file names.

		 -u	 Uncompressed BAM output

		 -c	 When several input files contain @RG headers with the
			 same  ID,  emit  only one of them (namely, the header
			 line from the first file we find that ID in)  to  the
			 merged	 output file.  Combining these similar headers
			 is usually the right thing to do when the files being
			 merged originated from the same file.

			 Without  -c,  all  @RG	 headers  appear in the output
			 file, with random suffices added to their  IDs	 where
			 necessary to differentiate them.

		 -p	 Similarly,  for  each	@PG  ID in the set of files to
			 merge, use the @PG line of the	 first	file  we  find
			 that ID in rather than adding a suffix to differenti‐
			 ate similar IDs.

       faidx	 samtools faidx <ref.fasta> [region1 [...]]

		 Index reference sequence in the FASTA format or extract  sub‐
		 sequence  from	 indexed  reference  sequence. If no region is
		 specified,   faidx   will   index   the   file	  and	create
		 <ref.fasta>.fai  on  the  disk. If regions are specified, the
		 subsequences will be retrieved and printed to stdout  in  the
		 FASTA format.

		 The input file can be compressed in the BGZF format.

		 The  sequences	 in  the  input file should all have different
		 names.	 If they do not, indexing will emit  a	warning	 about
		 duplicate  sequences  and  retrieval will only produce subse‐
		 quences from the first sequence with the duplicated name.

       tview	 samtools  tview  [-p	chr:pos]   [-s	 STR]	[-d   display]
		 <in.sorted.bam> [ref.fasta]

		 Text  alignment viewer (based on the ncurses library). In the
		 viewer, press `?' for help and press `g' to check the	align‐
		 ment	 start	  from	 a   region   in   the	 format	  like
		 `chr10:10,000,000' or `=10,000,000'  when  viewing  the  same
		 reference sequence.

		 Options:

		 -d display    Output as (H)tml or (C)urses or (T)ext

		 -p chr:pos    Go directly to this position

		 -s STR	       Display	only  alignments  from	this sample or
			       read group

       split	 samtools split [options] merged.sam|merged.bam|merged.cram

		 Splits a file by read group.

		 Options:

		 -u FILE1      Put reads with no RG tag or an unrecognised  RG
			       tag into FILE1

		 -u FILE1:FILE2
			       As above, but assigns an RG tag as given in the
			       header of FILE2

		 -f STRING     Output  filename	 format	 string	 (see	below)
			       ["%*_%#.%."]

		 -v	       Verbose output

		 Format string expansions:

			     %%	  %
			     %*	  basename
			     %#	  @RG index
			     %!	  @RG ID
			     %.	  output format filename extension

       quickcheck
		 samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

		 Quickly  check	 that  input files appear to be intact. Checks
		 that beginning of the file contains a valid header (all  for‐
		 mats)	containing at least one target sequence and then seeks
		 to the end of the file and checks that an  end-of-file	 (EOF)
		 is present and intact (BAM only).

		 Data  in  the middle of the file is not read since that would
		 be much more time consuming, so please note that this command
		 will  not detect internal corruption, but is useful for test‐
		 ing that files	 are  not  truncated  before  performing  more
		 intensive tasks on them.

		 This command will exit with a non-zero exit code if any input
		 files don't have a valid header or are missing an EOF	block.
		 Otherwise it will exit successfully (with a zero exit code).

		 Options:

		 -v	 Verbose  output: will additionally print the names of
			 all input files that don't pass the check to  stdout.
			 Multiple  -v  options	will cause additional messages
			 regarding check results to be printed to stderr.

       dict	 samtools dict <ref.fasta|ref.fasta.gz>

		 Create a sequence dictionary file from a fasta file.

		 OPTIONS:

		 -a, --assembly STR
			    Specify the assembly for the AS tag.

		 -H, --no-header
			    Do not print the @HD header line.

		 -o, --output FILE
			    Output to FILE [stdout].

		 -s, --species STR
			    Specify the species for the SP tag.

		 -u, --uri STR
			    Specify the URI for the UR tag.  Defaults  to  the
			    absolute  path  of	ref.fasta  unless reading from
			    stdin.

       fixmate	 samtools fixmate [-rpc] [-O format] in.nameSrt.bam out.bam

		 Fill in mate coordinates, ISIZE and mate related flags from a
		 name-sorted alignment.

		 OPTIONS:

		 -r	    Remove secondary and unmapped reads.

		 -p	    Disable FR proper pair check.

		 -c	    Add template cigar ct tag.

		 -O FORMAT  Write the final output as sam, bam, or cram.

			    By	default,  samtools  tries  to  select a format
			    based on the output filename extension; if	output
			    is to standard output or no format can be deduced,
			    bam is selected.

       mpileup	 samtools mpileup [-EBugp] [-C capQcoef] [-r reg]  [-f	in.fa]
		 [-l list] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

		 Generate  VCF,	 BCF  or pileup for one or multiple BAM files.
		 Alignment records are grouped by sample (SM)  identifiers  in
		 @RG  header  lines.  If  sample  identifiers are absent, each
		 input file is regarded as one sample.

		 In the pileup format (without -u or -g), each line represents
		 a  genomic  position,	consisting of chromosome name, 1-based
		 coordinate, reference base, the number of reads covering  the
		 site, read bases, base qualities and alignment mapping quali‐
		 ties. Information on match, mismatch, indel, strand,  mapping
		 quality  and  start  and end of a read are all encoded at the
		 read base column. At this column, a dot stands for a match to
		 the reference base on the forward strand, a comma for a match
		 on the reverse strand, a '>' or '<'  for  a  reference	 skip,
		 `ACGTN'  for a mismatch on the forward strand and `acgtn' for
		 a mismatch on the reverse strand. A  pattern  `\+[0-9]+[ACGT‐
		 Nacgtn]+' indicates there is an insertion between this refer‐
		 ence position and the next reference position. The length  of
		 the  insertion	 is  given by the integer in the pattern, fol‐
		 lowed	by  the	 inserted  sequence.  Similarly,   a   pattern
		 `-[0-9]+[ACGTNacgtn]+'	 represents a deletion from the refer‐
		 ence. The deleted bases will be presented as `*' in the  fol‐
		 lowing	 lines.	 Also  at  the	read base column, a symbol `^'
		 marks the start of a read. The ASCII of the character follow‐
		 ing  `^'  minus  33  gives  the mapping quality. A symbol `$'
		 marks the end of a read segment.

		 Input Options:

		 -6, --illumina1.3+
			   Assume the quality is in the Illumina  1.3+	encod‐
			   ing.

		 -A, --count-orphans
			   Do  not  skip anomalous read pairs in variant call‐
			   ing.

		 -b, --bam-list FILE
			   List of input BAM files, one file per line [null]

		 -B, --no-BAQ
			   Disable probabilistic realignment for the  computa‐
			   tion	 of  base  alignment quality (BAQ). BAQ is the
			   Phred-scaled probability of a read base being  mis‐
			   aligned.  Applying  this  option  greatly  helps to
			   reduce false SNPs caused by misalignments.

		 -C, --adjust-MQ INT
			   Coefficient for  downgrading	 mapping  quality  for
			   reads containing excessive mismatches. Given a read
			   with a phred-scaled probability q of	 being	gener‐
			   ated	 from  the  mapped  position,  the new mapping
			   quality  is	about  sqrt((INT-q)/INT)*INT.  A  zero
			   value  disables this functionality; if enabled, the
			   recommended value for BWA is 50. [0]

		 -d, --max-depth INT
			   At a position, read maximally INT reads  per	 input
			   BAM. [250]

		 -E, --redo-BAQ
			   Recalculate BAQ on the fly, ignore existing BQ tags

		 -f, --fasta-ref FILE
			   The	faidx-indexed reference file in the FASTA for‐
			   mat. The  file  can	be  optionally	compressed  by
			   bgzip.  [null]

		 -G, --exclude-RG FILE
			   Exclude  reads  from readgroups listed in FILE (one
			   @RG-ID per line)

		 -l, --positions FILE
			   BED or position list	 file  containing  a  list  of
			   regions or sites where pileup or BCF should be gen‐
			   erated. If BED,  positions  are  0-based  half-open
			   [null]

		 -q, -min-MQ INT
			   Minimum mapping quality for an alignment to be used
			   [0]

		 -Q, --min-BQ INT
			   Minimum base quality for a base  to	be  considered
			   [13]

		 -r, --region STR
			   Only	 generate  pileup  in region. Requires the BAM
			   files to be indexed.	 If used in  conjunction  with
			   -l  then  considers	the  intersection  of  the two
			   requests.  STR [all sites]

		 -R, --ignore-RG
			   Ignore RG tags. Treat all reads in one BAM  as  one
			   sample.

		 --rf, --incl-flags STR|INT
			   Required  flags:  skip  reads  with mask bits unset
			   [null]

		 --ff, --excl-flags STR|INT
			   Filter  flags:  skip	 reads	with  mask  bits   set
			   [UNMAP,SECONDARY,QCFAIL,DUP]

		 -x, --ignore-overlaps
			   Disable read-pair overlap detection.

		 Output Options:

		 -o, --output FILE
			   Write pileup or VCF/BCF output to FILE, rather than
			   the default of standard output.

			   (The same short option is used for both --open-prob
			   and	--output.   If -o's argument contains any non-
			   digit characters other than a leading + or -	 sign,
			   it  is  interpreted as --output.  Usually the file‐
			   name extension will take care of this, but to write
			   to  an  entirely  numeric  filename use -o ./123 or
			   --output 123.)

		 -g, --BCF Compute genotype likelihoods and output them in the
			   binary call format (BCF).  As of v1.0, this is BCF2
			   which is incompatible with the BCF1 format produced
			   by previous (0.1.x) versions of samtools.

		 -v, --VCF Compute genotype likelihoods and output them in the
			   variant call format (VCF).	Output	is  bgzip-com‐
			   pressed VCF unless -u option is set.

		 Output Options for mpileup format (without -g or -v):

		 -O, --output-BP
			   Output base positions on reads.

		 -s, --output-MQ
			   Output mapping quality.

		 Output Options for VCF/BCF format (with -g or -v):

		 -D	   Output  per-sample  read depth [DEPRECATED - use -t
			   DP instead]

		 -S	   Output per-sample Phred-scaled strand bias  P-value
			   [DEPRECATED - use -t SP instead]

		 -t, --output-tags LIST
			   Comma-separated  list  of  FORMAT  and INFO tags to
			   output (case-insensitive): AD (Allelic depth,  FOR‐
			   MAT),  INFO/AD  (Total  allelic  depth,  INFO), ADF
			   (Allelic depths on  the  forward  strand,  FORMAT),
			   INFO/ADF  (Total  allelic  depths  on  the  forward
			   strand, INFO), ADR (Allelic depths on  the  reverse
			   strand,  FORMAT), INFO/ADR (Total allelic depths on
			   the reverse strand, INFO), DP (Number of high-qual‐
			   ity	bases, FORMAT), DV (Deprecated in favor of AD;
			   Number of high-quality  non-reference  bases,  FOR‐
			   MAT),  DPR  (Deprecated  in	favor of AD; Number of
			   high-quality bases for each observed	 allele,  FOR‐
			   MAT),  INFO/DPR  (Number  of high-quality bases for
			   each observed allele,  INFO),  DP4  (Deprecated  in
			   favor  of  ADF and ADR; Number of high-quality ref-
			   forward, ref-reverse, alt-forward  and  alt-reverse
			   bases,  FORMAT),  SP	 (Phred-scaled	strand bias P-
			   value, FORMAT) [null]

		 -u, --uncompressed
			   Generate uncompressed VCF/BCF output, which is pre‐
			   ferred for piping.

		 -V	   Output  per-sample  number  of  non-reference reads
			   [DEPRECATED - use -t DV instead]

		 Options for SNP/INDEL Genotype Likelihood Computation (for -g
		 or -v):

		 -e, --ext-prob INT
			   Phred-scaled	 gap extension sequencing error proba‐
			   bility. Reducing INT leads to longer indels. [20]

		 -F, --gap-frac FLOAT
			   Minimum fraction of gapped reads [0.002]

		 -h, --tandem-qual INT
			   Coefficient for modeling homopolymer errors.	 Given
			   an  l-long homopolymer run, the sequencing error of
			   an indel of size s is modeled as INT*s/l.  [100]

		 -I, --skip-indels
			   Do not perform INDEL calling

		 -L, --max-idepth INT
			   Skip INDEL calling if the average per-sample	 depth
			   is above INT.  [250]

		 -m, --min-ireads INT
			   Minimum  number  gapped  reads for indel candidates
			   INT.	 [1]

		 -o, --open-prob INT
			   Phred-scaled gap open sequencing error probability.
			   Reducing INT leads to more indel calls. [40]

			   (The same short option is used for both --open-prob
			   and --output.  When -o's argument contains only  an
			   optional + or - sign followed by the digits 0 to 9,
			   it is interpreted as --open-prob.)

		 -p, --per-sample-mF
			   Apply -m and -F thresholds per sample  to  increase
			   sensitivity	of  calling.   By default both options
			   are applied to reads pooled from all samples.

		 -P, --platforms STR
			   Comma-delimited list of  platforms  (determined  by
			   @RG-PL)  from  which indel candidates are obtained.
			   It is recommended to collect indel candidates  from
			   sequencing  technologies  that have low indel error
			   rate such as ILLUMINA. [all]

       flags	 samtools flags INT|STR[,...]

		 Convert between textual and numeric flag representation.

		 FLAGS:

		   0x1	 PAIRED		 paired-end (or multiple-segment) sequencing technology
		   0x2	 PROPER_PAIR	 each segment properly aligned according to the aligner
		   0x4	 UNMAP		 segment unmapped
		   0x8	 MUNMAP		 next segment in the template unmapped
		  0x10	 REVERSE	 SEQ is reverse complemented
		  0x20	 MREVERSE	 SEQ of the next segment in the template is reverse complemented
		  0x40	 READ1		 the first segment in the template
		  0x80	 READ2		 the last segment in the template
		 0x100	 SECONDARY	 secondary alignment
		 0x200	 QCFAIL		 not passing quality controls
		 0x400	 DUP		 PCR or optical duplicate
		 0x800	 SUPPLEMENTARY	 supplementary alignment

       fastq/a	 samtools fastq [options] in.bam
		 samtools fasta [options] in.bam

		 Converts a BAM or CRAM into  either  FASTQ  or	 FASTA	format
		 depending on the command invoked.

		 OPTIONS:

		 -n	 By  default,  either '/1' or '/2' is added to the end
			 of read names where the  corresponding	 BAM_READ1  or
			 BAM_READ2 flag is set.	 Using -n causes read names to
			 be left as they are.

		 -O	 Use quality values from  OQ  tags  in	preference  to
			 standard quality string if available.

		 -s FILE Write singleton reads in FASTQ format to FILE instead
			 of outputting them.

		 -t	 Copy RG, BC and QT tags to the FASTQ header line,  if
			 they exist.

		 -1 FILE Write	reads  with  the  BAM_READ1  flag  set to FILE
			 instead of outputting them.

		 -2 FILE Write reads with  the	BAM_READ2  flag	 set  to  FILE
			 instead of outputting them.

		 -0 FILE Write reads with both or neither of the BAM_READ1 and
			 BAM_READ2 flags set to	 FILE  instead	of  outputting
			 them.

		 -f INT	 Only  output  alignments  with	 all  bits  set in INT
			 present in the FLAG field.  INT can be	 specified  in
			 hex  by  beginning with `0x' (i.e. /^0x[0-9A-F]+/) or
			 in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

		 -F INT	 Do not output alignments with any  bits  set  in  INT
			 present  in  the FLAG field.  INT can be specified in
			 hex by beginning with `0x' (i.e.  /^0x[0-9A-F]+/)  or
			 in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

       collate	 samtools collate [options] in.sam|in.bam|in.cram [out.prefix]

		 Shuffles  and groups reads together by their names.  A faster
		 alternative to a full query name sort, collate	 ensures  that
		 reads	of  the	 same  name are grouped together in contiguous
		 groups, but doesn't make any guarantees about	the  order  of
		 read names between groups.

		 The output from this command should be suitable for any oper‐
		 ation that requires all reads from the same  template	to  be
		 grouped together.

		 Options:

		 -O	 Output	 to  stdout rather than to files starting with
			 out.prefix

		 -u	 Write uncompressed BAM output

		 -l INT	 Compression level.  [1]

		 -n INT	 Number of temporary files to use.  [64]

       reheader	 samtools reheader [-iP] in.header.sam in.bam

		 Replace  the  header	in   in.bam   with   the   header   in
		 in.header.sam.	  This	command	 is much faster than replacing
		 the header with a BAM→SAM→BAM conversion.

		 By default this command outputs the BAM or CRAM file to stan‐
		 dard  output  (stdout),  but for CRAM format files it has the
		 option to perform an in-place edit, both reading and  writing
		 to  the  same file.  No validity checking is performed on the
		 header, nor that it is suitable to use with the sequence data
		 itself.

		 OPTIONS:

		 -P, --no-PG
			 Do not generate an @PG header line.

		 -i, --in-place
			 Perform  the header edit in-place, if possible.  This
			 only works on CRAM files and only if there is	suffi‐
			 cient	room  to  store the new header.	 The amount of
			 space available will differ for each CRAM file.

       cat	 samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam>
		 [ ... ]

		 Concatenate  BAMs.  The sequence dictionary of each input BAM
		 must be identical, although this command does not check this.
		 This  command	uses a similar trick to reheader which enables
		 fast BAM concatenation.

       rmdup	 samtools rmdup [-sS] <input.srt.bam> <out.bam>

		 Remove potential PCR duplicates: if multiple read pairs  have
		 identical  external  coordinates,  only  retain the pair with
		 highest mapping quality.  In the paired-end mode,  this  com‐
		 mand  ONLY  works  with  FR orientation and requires ISIZE is
		 correctly set. It does not work for unpaired reads (e.g.  two
		 ends mapped to different chromosomes or orphan reads).

		 OPTIONS:

		 -s	 Remove	 duplicates  for single-end reads. By default,
			 the command works for paired-end reads only.

		 -S	 Treat paired-end reads and single-end reads.

       addreplacerg
		 samtools addreplacerg [-r rg line | -R rg ID] [-m  mode]  [-l
		 level] [-o out.bam] <input.bam>

		 Adds or replaces read group tags in a file.

		 OPTIONS:

		 -r STRING
			 Allows	 you to specify a read group line to append to
			 the header and applies it to the reads	 specified  by
			 the  -m  option. If repeated it automatically adds in
			 tabs between invocations.

		 -R STRING
			 Allows you to specify the read group ID of an	exist‐
			 ing @RG line and applies it to the reads specified.

		 -m MODE If  you  choose orphan_only then existing RG tags are
			 not overwritten, if you choose overwrite_all,	exist‐
			 ing  RG  tags	are  overwritten. The default is over‐
			 write_all.

		 -o STRING
			 Write the final output to STRING. The default	is  to
			 write to stdout.

			 By  default,  samtools tries to select a format based
			 on the output filename extension;  if	output	is  to
			 standard  output  or no format can be deduced, bam is
			 selected.

       calmd	 samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

		 Generate the MD tag. If the MD tag is already	present,  this
		 command  will	give a warning if the MD tag generated is dif‐
		 ferent from the existing tag. Output SAM by default.

		 Calmd can also read and write CRAM  files  although  in  most
		 cases	it is pointless as CRAM recalculates MD and NM tags on
		 the fly.  The one exception to this case is where both	 input
		 and  output CRAM files have been / are being created with the
		 no_ref option.

		 OPTIONS:

		 -A	 When used jointly with -r this option overwrites  the
			 original base quality.

		 -e	 Convert  a  the  read base to = if it is identical to
			 the aligned reference base.  Indel  caller  does  not
			 support the = bases at the moment.

		 -u	 Output uncompressed BAM

		 -b	 Output compressed BAM

		 -C INT	 Coefficient  to  cap mapping quality of poorly mapped
			 reads. See the pileup command for details. [0]

		 -r	 Compute the BQ tag (without -A) or cap	 base  quality
			 by BAQ (with -A).

		 -E	 Extended  BAQ	calculation. This option trades speci‐
			 ficity for sensitivity, though the effect is minor.

       targetcut samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0]  [-1
		 em1] [-2 em2] [-f ref] <in.bam>

		 This  command identifies target regions by examining the con‐
		 tinuity of read depth, computes haploid  consensus  sequences
		 of targets and outputs a SAM with each sequence corresponding
		 to a target. When option -f is in use, BAQ will  be  applied.
		 This  command is only designed for cutting fosmid clones from
		 fosmid pool sequencing [Ref. Kitzman et al. (2010)].

       phase	 samtools phase [-AF] [-k len] [-b  prefix]  [-q  minLOD]  [-Q
		 minBaseQ] <in.bam>

		 Call and phase heterozygous SNPs.

		 OPTIONS:

		 -A	 Drop reads with ambiguous phase.

		 -b STR	 Prefix	 of  BAM  output.  When this option is in use,
			 phase-0 reads will be saved  in  file	STR.0.bam  and
			 phase-1 reads in STR.1.bam.  Phase unknown reads will
			 be randomly  allocated	 to  one  of  the  two	files.
			 Chimeric  reads  with	switch errors will be saved in
			 STR.chimeric.bam.  [null]

		 -F	 Do not attempt to fix chimeric reads.

		 -k INT	 Maximum length for local phasing. [13]

		 -q INT	 Minimum Phred-scaled LOD to call a heterozygote. [40]

		 -Q INT	 Minimum base quality to be used in het calling. [13]

       depad	 samtools depad [-SsCu1] [-T ref.fa] [-o output] <in.bam>

		 Converts a BAM aligned against a padded reference  to	a  BAM
		 aligned against the depadded reference.  The padded reference
		 may contain verbatim "*" bases in it, but "*" bases are  also
		 counted  in  the  reference  numbering.   This	 means	that a
		 sequence base-call aligned against a reference "*" is consid‐
		 ered  to be a cigar match ("M" or "X") operator (if the base-
		 call is "A", "C", "G" or "T").	 After depadding the reference
		 "*"  bases  are  deleted and such aligned sequence base-calls
		 become insertions.  Similarly transformations apply for dele‐
		 tions and padding cigar operations.

		 OPTIONS:

		 -S	Ignored	 for compatibility with previous samtools ver‐
			sions.	Previously this option was required  if	 input
			was in SAM format, but now the correct format is auto‐
			matically detected by examining the first few  charac‐
			ters of input.

		 -s	Output in SAM format.  The default is BAM.

		 -C	Output in CRAM format.	The default is BAM.

		 -u	Do  not compress the output.  Applies to either BAM or
			CRAM output format.

		 -1	Enable fastest compression level.  Only works for  BAM
			or CRAM output.

		 -T FILE
			Provides the padded reference file.  Note that without
			this the @SQ line lengths will be  incorrect,  so  for
			most  use  cases  this	option	will  be considered as
			mandatory.

		 -o FILE
			Specifies the output filename.	By default  output  is
			sent to stdout.

       help, --help
		 Display  a  brief usage message listing the samtools commands
		 available.  If the name of a command  is  also	 given,	 e.g.,
		 samtools help view,  the detailed usage message for that par‐
		 ticular command is displayed.

       --version Display the version numbers  and  copyright  information  for
		 samtools and the important libraries used by samtools.

       --version-only
		 Display  the  full samtools version number in a machine-read‐
		 able format.

GLOBAL OPTIONS
       Several long-options are shared between multiple samtools  subcommands:
       --input-fmt,  --input-fmt-options,  --output-fmt, --output-fmt-options,
       and --reference.	 The input format is typically auto-detected so speci‐
       fying  the format is usually unnecessary and the option is included for
       completeness.  Note that not all subcommands have all options.  Consult
       the subcommand help for more details.

       Format  strings	recognised  are	 "sam", "bam" and "cram".  They may be
       followed by a comma separated list of options as key or key=value.  See
       below for examples.

       The   fmt-options   arguments   accept	either	 a  single  option  or
       option=value.  Note that some options only work on  some	 file  formats
       and only on read or write streams.  If value is unspecified for a bool‐
       ean option, the value is assumed to be 1.  The  valid  options  are  as
       follows.

       nthreads=INT
	   Specifies  the  number  of  threads	to  use during encoding and/or
	   decoding.  For BAM this will be encoding only.  In CRAM the threads
	   are dynamically shared between encoder and decoder.

       reference=fasta_file
	   Specifies a FASTA reference file for use in CRAM encoding or decod‐
	   ing.	 It usually is not required for decoding except in the	situa‐
	   tion	 of the MD5 not being obtainable via the REF_PATH or REF_CACHE
	   environment variables.

       decode_md=0|1
	   CRAM decode only; defaults to 1  (on).   CRAM  does	not  typically
	   store MD and NM tags, preferring to generate them on the fly.  This
	   option controls this behaviour.

       ignore_md5=0|1
	   CRAM decode only; defaults to 0 (off).  When enabled, md5  checksum
	   errors  on  the reference sequence and block checksum errors within
	   CRAM are ignored.  Use of this option is strongly discouraged.

       required_fields=bit-field
	   CRAM decode only; specifies which SAM columns need to be populated.
	   By  default	all  fields are used.  Limiting the decode to specific
	   columns can have significant performance gains.  The bit-field is a
	   numerical value constructed from the following table.

				      0x1   SAM_QNAME
				      0x2   SAM_FLAG
				      0x4   SAM_RNAME
				      0x8   SAM_POS
				     0x10   SAM_MAPQ
				     0x20   SAM_CIGAR
				     0x40   SAM_RNEXT
				     0x80   SAM_PNEXT
				    0x100   SAM_TLEN
				    0x200   SAM_SEQ
				    0x400   SAM_QUAL
				    0x800   SAM_AUX
				   0x1000   SAM_RGAUX

       multi_seq_per_slice=0|1
	   CRAM	 encode	 only; defaults to 0 (off).  By default CRAM generates
	   one container per reference sequence, except in the	case  of  many
	   small references (such as a fragmented assembly).

       version=major.minor
	   CRAM	 encode	 only.	Specifies the CRAM version number.  Acceptable
	   values are "2.1" and "3.0".

       seqs_per_slice=INT
	   CRAM encode only; defaults to 10000.

       slices_per_container=INT
	   CRAM encode only; defaults to 1.  The  effect  of  having  multiple
	   slices  per	container  is  to  share  the compression header block
	   between multiple slices.  This is unlikely to have any  significant
	   impact  unless  the	number	of  sequences  per  slice  is reduced.
	   (Together these two	options	 control  the  granularity  of	random
	   access.)

       embed_ref=0|1
	   CRAM	 encode only; defaults to 0 (off).  If 1, this will store por‐
	   tions of the reference sequence in each  slice,  permitting	decode
	   without   having  requiring	an  external  copy  of	the  reference
	   sequence.

       no_ref=0|1
	   CRAM encode only; defaults to 0 (off).  If  1,  sequences  will  be
	   stored  verbatim with no reference encoding.	 This can be useful if
	   no reference is available for the file.

       For example:

	   samtools view --input-fmt-option decode_md=0
	       --output-fmt cram,version=3.0 --output-fmt-option embed_ref
	       --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam

REFERENCE SEQUENCES
       The CRAM format requires use of a reference sequence for	 both  reading
       and writing.

       When  reading  a	 CRAM the @SQ headers are interrogated to identify the
       reference sequence MD5sum (M5: tag) and the  local  reference  sequence
       filename (UR: tag).  Note that http:// and ftp:// based URLs in the UR:
       field are not used, but local fasta filenames (with or without file://)
       can be used.

       To create a CRAM the @SQ headers will also be read to identify the ref‐
       erence sequences, but M5: and UR: tags may not be present. In this case
       the -T and -t options of samtools view may be used to specify the fasta
       or fasta.fai filenames respectively (provided the  .fasta.fai  file  is
       also backed up by a .fasta file).

       The search order to obtain a reference is:

       1. Use any local file specified by the command line options (eg -T).

       2. Look for MD5 via REF_CACHE environment variable.

       3. Look for MD5 in each element of the REF_PATH environment variable.

       4. Look for a local file listed in the UR: header tag.

ENVIRONMENT VARIABLES
       HTS_PATH
	      A	 colon-separated  list	of  directories in which to search for
	      HTSlib plugins.  If $HTS_PATH starts or ends  with  a  colon  or
	      contains	a  double colon (::), the built-in list of directories
	      is searched at that point in the search.

	      If no HTS_PATH variable is defined, the built-in list of	direc‐
	      tories  specified when HTSlib was built is used, which typically
	      includes /usr/local/libexec/htslib and similar directories.

       REF_PATH
	      A colon separated (semi-colon on Windows) list of	 locations  in
	      which  to	 look for sequences identified by their MD5sums.  This
	      can be either a list of directories or URLs. Note that if a  URL
	      is  included  then  the  colon  in  http://  and	ftp:// and the
	      optional port number will be treated as part of the URL and  not
	      a	 PATH field separator.	For URLs, the text %s will be replaced
	      by the MD5sum being read.

	      If  no  REF_PATH	has  been  specified  it   will	  default   to
	      http://www.ebi.ac.uk/ena/cram/md5/%s  and	 if  REF_CACHE is also
	      unset, it will be set to $XDG_CACHE_HOME/hts-ref/%2s/%2s/%s.  If
	      $XDG_CACHE_HOME is unset, $HOME/.cache (or a local system tempo‐
	      rary directory if no home directory is found) will be used simi‐
	      larly.

       REF_CACHE
	      This  can be defined to a single directory housing a local cache
	      of references.  Upon downloading a reference it will  be	stored
	      in  the location pointed to by REF_CACHE.	 When reading a refer‐
	      ence it will be looked for in this  directory  before  searching
	      REF_PATH.	  To  avoid many files being stored in the same direc‐
	      tory, a pathname may be constructed using %nums and %s notation,
	      consuming	  num	characters   of	  the	MD5sum.	  For  example
	      /local/ref_cache/%2s/%2s/%s will create 2 nested	subdirectories
	      with  the	 filenames  in the deepest directory being the last 28
	      characters of the md5sum.

	      The REF_CACHE directory will be searched for  before  attempting
	      to  load	via  the  REF_PATH  search  list.   If	no REF_PATH is
	      defined, both REF_PATH and REF_CACHE will be  automatically  set
	      (see  above),  but if REF_PATH is defined and REF_CACHE not then
	      no local cache is used.

	      To  aid  population  of  the  REF_CACHE	directory   a	script
	      misc/seq_cache_populate.pl is provided in the Samtools distribu‐
	      tion. This takes a fasta file or a directory of fasta files  and
	      generates the MD5sum named files.

EXAMPLES
       o Import SAM to BAM when @SQ lines are present in the header:

	   samtools view -bS aln.sam > aln.bam

	 If @SQ lines are absent:

	   samtools faidx ref.fa
	   samtools view -bt ref.fa.fai aln.sam > aln.bam

	 where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

	   samtools view -C -T ref.fa aln.bam > aln.cram

       o Attach the RG tag while merging sorted alignments:

	   perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
	   samtools merge -rh rg.txt merged.bam ga.bam 454.bam

	 The value in a RG tag is determined by the file name the read is com‐
	 ing from. In this example, in the merged.bam, reads from ga.bam  will
	 be  attached  RG:Z:ga,	 while	reads  from  454.bam  will be attached
	 RG:Z:454.

       o Call SNPs and short INDELs:

	   samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
	   bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf

	 The bcftools filter command marks low quality sites  and  sites  with
	 the  read  depth exceeding a limit, which should be adjusted to about
	 twice the average read depth (bigger  read  depths  usually  indicate
	 problematic regions which are often enriched for artefacts).  One may
	 consider to add -C50 to mpileup if mapping quality  is	 overestimated
	 for  reads containing excessive mismatches. Applying this option usu‐
	 ally helps BWA-short but may not other mappers.

	 Individuals are identified from the SM tags in the @RG header	lines.
	 Individuals  can  be pooled in one alignment file; one individual can
	 also be separated into multiple files. The -P option  specifies  that
	 indel	candidates  should be collected only from read groups with the
	 @RG-PL tag set to ILLUMINA.  Collecting indel candidates  from	 reads
	 sequenced  by an indel-prone technology may affect the performance of
	 indel calling.

       o Generate the consensus sequence for one diploid individual:

	   samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

       o Phase one individual:

	   samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

	 The calmd command  is	used  to  reduce  false	 heterozygotes	around
	 INDELs.

       o Dump BAQ applied alignment for other SNP callers:

	   samtools calmd -bAr aln.bam > aln.baq.bam

	 It  adds  and corrects the NM and MD tags at the same time. The calmd
	 command also comes with the -C option, the same as the one in	pileup
	 and mpileup.  Apply if it helps.

LIMITATIONS
       o Unaligned   words  used  in  bam_import.c,  bam_endian.h,  bam.c  and
	 bam_aux.c.

       o Samtools paired-end rmdup does not  work  for	unpaired  reads	 (e.g.
	 orphan	 reads	or ends mapped to different chromosomes). If this is a
	 concern, please use Picard's MarkDuplicates which  correctly  handles
	 these cases, although a little slower.

AUTHOR
       Heng  Li from the Sanger Institute wrote the original C version of sam‐
       tools.  Bob Handsaker from the Broad  Institute	implemented  the  BGZF
       library.	  James	 Bonfield from the Sanger Institute developed the CRAM
       implementation.	John Marshall  and  Petr  Danecek  contribute  to  the
       source  code and various people from the 1000 Genomes Project have con‐
       tributed to the SAM format specification.

SEE ALSO
       bcftools(1), sam(5), tabix(1)

       Samtools website: <http://www.htslib.org/>
       File  format  specification   of	  SAM/BAM,CRAM,VCF/BCF:	  <http://sam‐
       tools.github.io/hts-specs>
       Samtools latest source: <https://github.com/samtools/samtools>
       HTSlib latest source: <https://github.com/samtools/htslib>
       Bcftools website: <http://samtools.github.io/bcftools>

samtools-1.3		       15 December 2015			   samtools(1)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net