A (Java) toolbox for processing high-throughput sequencing data in the BAM (http://samtools.sourceforge.net) format.
The Picard command-line tools are packaged as a single executable jar file. They require Java 1.6. They can be invoked as follows:
java jvm-args -jar picard.jar PicardCommandName OPTION1=value1 OPTION2=value2...
Most of the commands are designed to run in 2GB of JVM, so the JVM argument -Xmx2g is recommended.
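The invocation pattern above can also be assembled programmatically. The following Python sketch is illustrative only: `build_picard_command` is a hypothetical helper, not part of Picard, and it assumes that options which "may be specified 0 or more times" are passed as repeated KEY=value arguments.

```python
from typing import Dict, List, Sequence, Union

OptionValue = Union[str, int, float, bool, Sequence[str]]


def build_picard_command(
    tool: str,
    options: Dict[str, OptionValue],
    jar: str = "picard.jar",
    jvm_args: Sequence[str] = ("-Xmx2g",),
) -> List[str]:
    """Build: java <jvm-args> -jar <jar> <tool> OPTION1=value1 OPTION2=value2 ..."""
    argv = ["java", *jvm_args, "-jar", jar, tool]
    for key, value in options.items():
        # A list or tuple stands for an option that is given more than once.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"
            argv.append(f"{key}={item}")
    return argv


command = build_picard_command(
    "CollectAlignmentSummaryMetrics",
    {
        "REFERENCE_SEQUENCE": "reference.fasta",
        "INPUT": "input.bam",
        "OUTPUT": "output.txt",
        "METRIC_ACCUMULATION_LEVEL": ["SAMPLE", "LIBRARY"],
        "CREATE_INDEX": False,
    },
)
```

The resulting list can be handed to `subprocess.run(command, check=True)`; building a list rather than a single string avoids shell quoting problems with file names.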
The following options are relevant for most Picard programs:
Option | Description |
---|---|
--help | Displays options specific to this tool. |
--stdhelp | Displays options specific to this tool AND options common to all Picard command line tools. |
--version | Displays program version. |
TMP_DIR (File) | Default value: null. This option may be specified 0 or more times. |
VERBOSITY (LogLevel) | Control verbosity of logging. Default value: INFO. This option can be set to 'null' to clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG} |
QUIET (Boolean) | Whether to suppress job-summary info on System.err. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
VALIDATION_STRINGENCY (ValidationStringency) | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT} |
COMPRESSION_LEVEL (Integer) | Compression level for all compressed files created (e.g. BAM and GELI). Default value: 5. This option can be set to 'null' to clear the default value. |
MAX_RECORDS_IN_RAM (Integer) | When writing SAM files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort a SAM file, and increases the amount of RAM needed. Default value: 500000. This option can be set to 'null' to clear the default value. |
CREATE_INDEX (Boolean) | Whether to create a BAM index when writing a coordinate-sorted BAM file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
CREATE_MD5_FILE (Boolean) | Whether to create an MD5 digest for any BAM or FASTQ files created. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
REFERENCE_SEQUENCE (File) | Reference sequence file. Default value: null. |
GA4GH_CLIENT_SECRETS (String) | Google Genomics API client_secrets.json file path. Default value: client_secrets.json. This option can be set to 'null' to clear the default value. |
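The trade-off described for MAX_RECORDS_IN_RAM (fewer spill files versus more memory) is the usual behaviour of an external merge sort. The Python sketch below is a generic illustration of that technique, not Picard's implementation; `external_sort` and its helpers are hypothetical names, and records are assumed to be newline-free strings that sort in the desired order.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Optional


def _spill(sorted_run: List[str], work_dir: str) -> str:
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=work_dir, suffix=".run")
    with os.fdopen(fd, "w") as handle:
        handle.writelines(record + "\n" for record in sorted_run)
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a spilled run back from disk, one record at a time."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


def external_sort(
    records: Iterable[str],
    max_records_in_ram: int,
    tmp_dir: Optional[str] = None,
) -> Iterator[str]:
    """Yield records in sorted order, holding at most max_records_in_ram in memory."""
    buffer: List[str] = []
    run_paths: List[str] = []
    with tempfile.TemporaryDirectory(dir=tmp_dir) as work_dir:
        for record in records:
            buffer.append(record)
            if len(buffer) >= max_records_in_ram:
                buffer.sort()
                run_paths.append(_spill(buffer, work_dir))
                buffer = []
        buffer.sort()
        # One open file handle per spilled run: a larger buffer means fewer runs.
        runs = [_read_run(path) for path in run_paths]
        yield from heapq.merge(buffer, *runs)
```

With a buffer of 2 records, five inputs produce two spilled runs plus one in-memory remainder; raising the limit to 5 or more would avoid spilling entirely.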
Adds one or more comments to the header of a specified BAM file. Copies the file with the modified header to a specified output file. Note that a block copying method is used to ensure efficient transfer to the output file. SAM files are not supported.
Option | Description |
---|---|
INPUT (File) | Input BAM file to add a comment to the header Required. |
OUTPUT (File) | Output BAM file to write results Required. |
COMMENT (String) | Comments to add to the BAM file Default value: null. This option may be specified 0 or more times. |
Replaces all read groups in the INPUT file with a single new read group and assigns all reads to this read group in the OUTPUT BAM
Option | Description |
---|---|
INPUT (String) | Input file (bam or sam or a GA4GH url). Required. |
OUTPUT (File) | Output file (bam or sam). Required. |
SORT_ORDER (SortOrder) | Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate} |
RGID (String) | Read Group ID Default value: 1. This option can be set to 'null' to clear the default value. |
RGLB (String) | Read Group Library Required. |
RGPL (String) | Read Group platform (e.g. illumina, solid) Required. |
RGPU (String) | Read Group platform unit (eg. run barcode) Required. |
RGSM (String) | Read Group sample name Required. |
RGCN (String) | Read Group sequencing center name Default value: null. |
RGDS (String) | Read Group description Default value: null. |
RGDT (Iso8601Date) | Read Group run date Default value: null. |
RGPI (Integer) | Read Group predicted insert size Default value: null. |
RGPG (String) | Read Group program group Default value: null. |
RGPM (String) | Read Group platform model Default value: null. |
Create BFQ files from a BAM file for use by the Maq aligner.
Option | Description |
---|---|
INPUT (File) | The BAM file to parse. Required. |
ANALYSIS_DIR (File) | The analysis directory for the binary output file. Required. |
FLOWCELL_BARCODE (String) | Flowcell barcode (e.g. 30PYMAAXX). Required. Cannot be used in conjunction with option(s) OUTPUT_FILE_PREFIX |
LANE (Integer) | Lane number. Default value: null. Cannot be used in conjunction with option(s) OUTPUT_FILE_PREFIX |
OUTPUT_FILE_PREFIX (String) | Prefix for all output files Required. Cannot be used in conjunction with option(s) FLOWCELL_BARCODE (F) LANE (L) |
READS_TO_ALIGN (Integer) | Number of reads to align (null = all). Default value: null. |
READ_CHUNK_SIZE (Integer) | Number of reads to break into individual groups for alignment Default value: 2000000. This option can be set to 'null' to clear the default value. |
PAIRED_RUN (Boolean) | Whether this is a paired-end run. Required. Possible values: {true, false} |
RUN_BARCODE (String) | Deprecated option; use READ_NAME_PREFIX instead. Default value: null. Cannot be used in conjunction with option(s) READ_NAME_PREFIX |
READ_NAME_PREFIX (String) | Prefix to be stripped off the beginning of all read names (to make them short enough to run in Maq) Default value: null. |
INCLUDE_NON_PF_READS (Boolean) | Whether to include non-PF reads Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
CLIP_ADAPTERS (Boolean) | Whether to clip adapters from the reads Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
BASES_TO_WRITE (Integer) | The number of bases from each read to write to the bfq file. If this is non-null, then only the first BASES_TO_WRITE bases from each read will be written. Default value: null. |
Generates BAM index statistics, including the number of aligned and unaligned SAMRecords for each reference sequence, and the number of SAMRecords with no coordinate. Input BAM file must have a corresponding index file.
Option | Description |
---|---|
INPUT (File) | A BAM file to process. Required. |
Converts a BED file to a Picard Interval List.
Option | Description |
---|---|
INPUT (File) | The input BED file Required. |
SEQUENCE_DICTIONARY (File) | The sequence dictionary Required. |
OUTPUT (File) | The output Picard Interval List Required. |
SORT (Boolean) | If true, sort the output interval list before writing it. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
UNIQUE (Boolean) | If true, unique the output interval list by merging overlapping regions, before writing it (implies sort=true). Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
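The effect of SORT and UNIQUE can be sketched in Python as follows. This is a simplified illustration rather than Picard's code: `bed_to_intervals` is a hypothetical function, `sequence_order` stands in for the SEQUENCE_DICTIONARY, and only strictly overlapping regions are merged. It relies on the standard coordinate conventions (BED is 0-based half-open, interval lists are 1-based closed).

```python
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (sequence name, 1-based start, inclusive end)


def bed_to_intervals(
    bed_lines: Iterable[str],
    sequence_order: Sequence[str],
    sort: bool = True,
    unique: bool = True,
) -> List[Interval]:
    """Convert BED records to 1-based closed intervals, optionally sorted and merged."""
    rank = {name: index for index, name in enumerate(sequence_order)}
    intervals: List[Interval] = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED start is 0-based, end is exclusive: shift only the start.
        intervals.append((chrom, int(start) + 1, int(end)))
    if sort or unique:  # merging overlaps requires sorted input
        intervals.sort(key=lambda iv: (rank[iv[0]], iv[1], iv[2]))
    if not unique:
        return intervals
    merged: List[Interval] = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For example, the BED records `chr1 0 100` and `chr1 50 150` become the single interval `chr1:1-150`.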
Generates a BAM index (.bai) file.
Option | Description |
---|---|
INPUT (String) | A BAM file or URL to process. Must be sorted in coordinate order. Required. |
OUTPUT (File) | The BAM index file. Defaults to x.bai if INPUT is x.bam, otherwise INPUT.bai. If INPUT is a URL and OUTPUT is unspecified, defaults to a file in the current directory. Default value: null. |
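The default naming rule for OUTPUT can be written out as a small Python function. This is a sketch of the rule as stated in the table, with `default_bai_path` as a hypothetical name; the handling of URLs (taking the last path component) is an assumption about what "a file in the current directory" means.

```python
import posixpath
from urllib.parse import urlparse


def default_bai_path(input_name: str) -> str:
    """Return the index name used when OUTPUT is not given."""
    if "://" in input_name:
        # Assumption: for a URL, keep only the file name so that the index
        # lands in the current directory.
        input_name = posixpath.basename(urlparse(input_name).path)
    if input_name.endswith(".bam"):
        return input_name[: -len(".bam")] + ".bai"
    return input_name + ".bai"
```

So `x.bam` maps to `x.bai`, while an input without the `.bam` suffix simply gains `.bai`.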
Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file. If a reference sequence is provided, AT/GC dropout metrics will be calculated, and the PER_TARGET_COVERAGE option can be used to output GC and mean coverage information for every target.
Option | Description |
---|---|
BAIT_INTERVALS (File) | An interval list file that contains the locations of the baits used. Default value: null. This option must be specified at least 1 times. |
BAIT_SET_NAME (String) | Bait set name. If not provided it is inferred from the filename of the bait intervals. Default value: null. |
TARGET_INTERVALS (File) | An interval list file that contains the locations of the targets. Default value: null. This option must be specified at least 1 times. |
INPUT (File) | An aligned SAM or BAM file. Required. |
OUTPUT (File) | The output file to write the metrics to. Required. |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
PER_TARGET_COVERAGE (File) | An optional file to output per target coverage information to. Default value: null. |
Cleans the provided SAM/BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
Option | Description |
---|---|
INPUT (File) | Input SAM to be cleaned. Required. |
OUTPUT (File) | Where to write cleaned SAM. Required. |
Produces a file containing summary alignment metrics from a SAM or BAM.
java -jar picard.jar CollectAlignmentSummaryMetrics \
R=reference.fasta \
I=input.bam \
O=output.txt
Option | Description |
---|---|
MAX_INSERT_SIZE (Integer) | Paired end reads above this insert size will be considered chimeric along with inter-chromosomal pairs. Default value: 100000. This option can be set to 'null' to clear the default value. |
ADAPTER_SEQUENCE (String) | List of adapter sequences to use when processing the alignment metrics Default value: [AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG]. This option can be set to 'null' to clear the default value. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
IS_BISULFITE_SEQUENCED (Boolean) | Whether the SAM or BAM file consists of bisulfite sequenced reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
REFERENCE_SEQUENCE (File) | Reference sequence file. Note that while this argument isn't required, without it only a small subset of the metrics will be calculated. Default value: null. |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
Program to chart the nucleotide distribution per cycle in a SAM or BAM file.
Option | Description |
---|---|
CHART_OUTPUT (File) | A file (with .pdf extension) to write the chart to. Required. |
ALIGNED_READS_ONLY (Boolean) | If set to true, calculate the base distribution over aligned reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
PF_READS_ONLY (Boolean) | If set to true calculate the base distribution over PF reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
Tool to collect information about GC bias in the reads in a given BAM file. Computes the number of windows (of size specified by SCAN_WINDOW_SIZE) in the genome at each GC% and counts the number of read starts in each GC bin. What is output and plotted is the "normalized coverage" in each bin, i.e. the number of reads per window normalized to the average number of reads per window across the whole genome.
Option | Description |
---|---|
CHART_OUTPUT (File) | The PDF file to render the chart to. Required. |
SUMMARY_OUTPUT (File) | The text file to write summary metrics to. Required. |
SCAN_WINDOW_SIZE (Integer) | The size of the scanning windows on the reference genome that are used to bin reads. Default value: 100. This option can be set to 'null' to clear the default value. |
MINIMUM_GENOME_FRACTION (Double) | For summary metrics, exclude GC windows that include less than this fraction of the genome. Default value: 1.0E-5. This option can be set to 'null' to clear the default value. |
IS_BISULFITE_SEQUENCED (Boolean) | Whether the SAM or BAM file consists of bisulfite sequenced reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
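The "normalized coverage" calculation described for this tool can be illustrated with a short Python sketch. It follows the description only (windows binned by GC%, read starts counted per bin, then divided by the genome-wide mean reads per window); `gc_normalized_coverage` is a hypothetical function, the reference is a plain string, and read starts are 0-based positions on it.

```python
from collections import Counter
from typing import Dict, Iterable


def gc_normalized_coverage(
    reference: str, read_starts: Iterable[int], window_size: int = 100
) -> Dict[int, float]:
    """Map each GC% bin to reads-per-window divided by the overall mean."""
    if window_size > len(reference):
        raise ValueError("window_size is larger than the reference")
    starts = Counter(read_starts)
    windows_per_gc: Counter = Counter()
    reads_per_gc: Counter = Counter()
    sequence = reference.upper()
    for pos in range(len(sequence) - window_size + 1):
        window = sequence[pos : pos + window_size]
        gc_count = sum(base in "GC" for base in window)
        gc_percent = (100 * gc_count) // window_size
        windows_per_gc[gc_percent] += 1
        # Reads starting where no full window fits are ignored in this sketch.
        reads_per_gc[gc_percent] += starts[pos]
    total_reads = sum(reads_per_gc.values())
    if total_reads == 0:
        return {gc: 0.0 for gc in windows_per_gc}
    mean_reads_per_window = total_reads / sum(windows_per_gc.values())
    return {
        gc: (reads_per_gc[gc] / count) / mean_reads_per_window
        for gc, count in windows_per_gc.items()
    }
```

A value above 1 for a bin means windows with that GC content attract more read starts than the genome-wide average; a value below 1 means they are under-covered.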
Reads a SAM or BAM file and writes a file containing metrics about the statistical distribution of insert size (excluding duplicates) and generates a Histogram plot.
Option | Description |
---|---|
HISTOGRAM_FILE (File) | File to write insert size Histogram chart to. Required. |
DEVIATIONS (Double) | Generate mean, sd and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and sd grossly misleading regarding the real distribution. Default value: 10.0. This option can be set to 'null' to clear the default value. |
HISTOGRAM_WIDTH (Integer) | Explicitly sets the histogram width, overriding automatic truncation of the histogram tail. Also, when calculating mean and standard deviation, only bins <= HISTOGRAM_WIDTH will be included. Default value: null. |
MINIMUM_PCT (Float) | When generating the Histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this percentage of overall reads. (Range: 0 to 1). Default value: 0.05. This option can be set to 'null' to clear the default value. |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
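The trimming rule behind DEVIATIONS (keep values up to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION before computing mean and sd) can be sketched in Python. This is a minimal illustration of the formula in the table, not Picard's code; `trimmed_insert_stats` is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence


def trimmed_insert_stats(
    insert_sizes: Sequence[int], deviations: float = 10.0
) -> Dict[str, float]:
    """Compute mean and sd after discarding values above median + deviations * MAD."""
    median = statistics.median(insert_sizes)
    mad = statistics.median([abs(size - median) for size in insert_sizes])
    cutoff = median + deviations * mad
    kept = [size for size in insert_sizes if size <= cutoff]
    return {
        "median": median,
        "mad": mad,
        "cutoff": cutoff,
        "kept": len(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept) if len(kept) > 1 else 0.0,
    }


stats = trimmed_insert_stats([190, 200, 200, 210, 100000])
```

In this example the single chimeric-looking value of 100000 lies far above the cutoff of 300 and is excluded, so the mean stays at 200 instead of being pulled above 20000.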
Takes an input BAM and reference sequence and runs one or more Picard metrics modules at the same time to cut down on I/O. Currently all programs are run with default options and fixed output extensions, but this may become more flexible in future.
Option | Description |
---|---|
INPUT (File) | Input SAM or BAM file. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Integer) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
OUTPUT (String) | Base name of output files. Required. |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
PROGRAM (Program) | List of metrics programs to apply during the pass through the SAM file. Default value: [CollectAlignmentSummaryMetrics, CollectBaseDistributionByCycle, CollectInsertSizeMetrics, MeanQualityByCycle, QualityScoreDistribution]. This option can be set to 'null' to clear the default value. Possible values: {CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectGcBiasMetrics, RnaSeqMetrics, CollectSequencingArtifactMetrics} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
INTERVALS (File) | An optional list of intervals to restrict analysis to. Default value: null. |
DB_SNP (File) | VCF format dbSNP file, used to exclude regions around known polymorphisms from analysis. Default value: null. |
Calculates a set of metrics specific to Illumina TruSeq Custom Amplicon sequencing from an aligned SAM or BAM file. If a reference sequence is provided, AT/GC dropout metrics will be calculated, and the PER_TARGET_COVERAGE option can be used to output GC and mean coverage information for every target.
Option | Description |
---|---|
AMPLICON_INTERVALS (File) | An interval list file that contains the locations of the baits used. Required. |
CUSTOM_AMPLICON_SET_NAME (String) | Custom amplicon set name. If not provided it is inferred from the filename of the AMPLICON_INTERVALS intervals. Default value: null. |
TARGET_INTERVALS (File) | An interval list file that contains the locations of the targets. Default value: null. This option must be specified at least 1 times. |
INPUT (File) | An aligned SAM or BAM file. Required. |
OUTPUT (File) | The output file to write the metrics to. Required. |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
PER_TARGET_COVERAGE (File) | An optional file to output per target coverage information to. Default value: null. |
Collect metrics about the alignment of RNA to various functional classes of loci in the genome: coding, intronic, UTR, intergenic, ribosomal. Also determines strand-specificity for strand-specific libraries.
Option | Description |
---|---|
REF_FLAT (File) | Gene annotations in refFlat form. Format described here: http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat Required. |
RIBOSOMAL_INTERVALS (File) | Location of rRNA sequences in genome, in interval_list format. If not specified no bases will be identified as being ribosomal. Format described here: http://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/util/IntervalList.html Default value: null. |
STRAND_SPECIFICITY (StrandSpecificity) | For strand-specific library prep. For unpaired reads, use FIRST_READ_TRANSCRIPTION_STRAND if the reads are expected to be on the transcription strand. Required. Possible values: {NONE, FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND} |
MINIMUM_LENGTH (Integer) | When calculating coverage based values (e.g. CV of coverage) only use transcripts of this length or greater. Default value: 500. This option can be set to 'null' to clear the default value. |
CHART_OUTPUT (File) | The PDF file to write out a plot of normalized position vs. coverage. Default value: null. |
IGNORE_SEQUENCE (String) | If a read maps to a sequence specified with this option, all the bases in the read are counted as ignored bases. These reads are not counted as Default value: null. This option may be specified 0 or more times. |
RRNA_FRAGMENT_PERCENTAGE (Double) | The fraction of a fragment's length that must overlap one of the ribosomal intervals for the read or read pair to be considered rRNA. Default value: 0.8. This option can be set to 'null' to clear the default value. |
METRIC_ACCUMULATION_LEVEL (MetricAccumulationLevel) | The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments.
Option | Description |
---|---|
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | Output metrics file. Required. |
REFERENCE_SEQUENCE (File) | The reference sequence fasta aligned to. Required. |
MINIMUM_MAPPING_QUALITY (Integer) | Minimum mapping quality for a read to contribute coverage. Default value: 20. This option can be set to 'null' to clear the default value. |
MINIMUM_BASE_QUALITY (Integer) | Minimum base quality for a base to contribute coverage. Default value: 20. This option can be set to 'null' to clear the default value. |
COVERAGE_CAP (Integer) | Treat bases with coverage exceeding this value as if they had coverage at this value. Default value: 250. This option can be set to 'null' to clear the default value. |
STOP_AFTER (Long) | For debugging purposes, stop after processing this many genomic bases. Default value: -1. This option can be set to 'null' to clear the default value. |
INCLUDE_BQ_HISTOGRAM (Boolean) | Determines whether to include the base quality histogram in the metrics file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
COUNT_UNPAIRED (Boolean) | If true, count unpaired reads, and paired reads with one end unmapped Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
USAGE: CompareSAMS
Read fasta or fasta.gz containing reference sequences, and write as a SAM or BAM file with only sequence dictionary.
Option | Description |
---|---|
REFERENCE (File) | Input reference fasta or fasta.gz Required. |
OUTPUT (File) | Output SAM or BAM file containing only the sequence dictionary Required. |
GENOME_ASSEMBLY (String) | Put into AS field of sequence dictionary entry if supplied Default value: null. |
URI (String) | Put into UR field of sequence dictionary entry. If not supplied, input reference file is used Default value: null. |
SPECIES (String) | Put into SP field of sequence dictionary entry Default value: null. |
TRUNCATE_NAMES_AT_WHITESPACE (Boolean) | Make sequence name the first word from the > line in the fasta file. By default the entire contents of the > line is used, excluding leading and trailing whitespace. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
NUM_SEQUENCES (Integer) | Stop after writing this many sequences. For testing. Default value: 2147483647. This option can be set to 'null' to clear the default value. |
Randomly down-sample a SAM or BAM file to retain only a subset of the reads in the file. All reads for a template are kept or discarded as a unit, with the goal of retaining reads from PROBABILITY * input templates. While this will usually result in approximately PROBABILITY * input reads being retained also, for very small values of PROBABILITY this may not be the case. A number of different downsampling strategies are supported using the STRATEGY option:
ConstantMemory: Downsamples a stream or file of SAMRecords using a hash-projection strategy such that it can run in constant memory. The downsampling is stochastic, and therefore the actual retained proportion will vary around the requested proportion. Due to working in fixed memory this strategy is good for large inputs, and due to the stochastic nature the accuracy of this strategy is highest with a high number of output records, and diminishes at low output volumes.
HighAccuracy: Attempts (but does not guarantee) to provide accuracy up to a specified limit. Accuracy is defined as emitting a proportion of reads as close to the requested proportion as possible. In order to do so this strategy requires memory that is proportional to the number of template names in the incoming stream of reads, and will thus require large amounts of memory when running on large input files.
Chained: Attempts to provide a compromise strategy that offers some of the advantages of both the ConstantMemory and HighAccuracy strategies. Uses a ConstantMemory strategy to downsample the incoming stream to approximately the desired proportion, and then a HighAccuracy strategy to finish. Works in a single pass, and will provide accuracy close to (but often not as good as) HighAccuracy while requiring memory proportional to the set of reads emitted from the ConstantMemory strategy to the HighAccuracy strategy. Works well when downsampling large inputs to small proportions (e.g. downsampling hundreds of millions of reads and retaining only 2%). Should be accurate 99.9% of the time when the input contains >= 50,000 templates (read names). For smaller inputs, HighAccuracy is recommended instead.
Option | Description |
---|---|
INPUT (File) | The input SAM or BAM file to downsample. Required. |
OUTPUT (File) | The output, downsampled, SAM or BAM file to write. Required. |
STRATEGY (Strategy) | The downsampling strategy to use. See usage for discussion. Default value: ConstantMemory. This option can be set to 'null' to clear the default value. Possible values: {HighAccuracy, ConstantMemory, Chained} |
RANDOM_SEED (Integer) | Random seed to use if reproducibility is desired. Setting to null will cause multiple invocations to produce different results. Default value: 1. This option can be set to 'null' to clear the default value. |
PROBABILITY (Double) | The probability of keeping any individual read, between 0 and 1. Default value: 1.0. This option can be set to 'null' to clear the default value. |
ACCURACY (Double) | The accuracy that the downsampler should try to achieve if the selected strategy supports it. Note that accuracy is never guaranteed, but some strategies will attempt to provide accuracy within the requested bounds. Higher accuracy will generally require more memory. Default value: 1.0E-4. This option can be set to 'null' to clear the default value. |
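The idea behind a hash-projection strategy, as used by ConstantMemory, can be sketched in Python: each read name is hashed to a number in [0, 1) and kept when that number falls below PROBABILITY, so all reads of a template share one decision and no per-template state is stored. This is an illustration under stated assumptions, not Picard's implementation: the hash function (MD5 here), the way the seed is mixed in, and the names `keep_template` and `downsample` are all hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Tuple, TypeVar

Payload = TypeVar("Payload")


def keep_template(read_name: str, probability: float, seed: int = 1) -> bool:
    """Project the read name onto [0, 1) and keep it if below probability."""
    digest = hashlib.md5(f"{seed}:{read_name}".encode("utf-8")).digest()
    projection = int.from_bytes(digest[:8], "big") / 2**64
    return projection < probability


def downsample(
    records: Iterable[Tuple[str, Payload]], probability: float, seed: int = 1
) -> Iterator[Tuple[str, Payload]]:
    """Stream (read_name, payload) records, keeping whole templates only."""
    for name, payload in records:
        if keep_template(name, probability, seed):
            yield name, payload
```

Because the decision depends only on the read name and the seed, repeated runs with the same seed give identical output, and the retained proportion fluctuates around the requested value, with smaller relative fluctuation as the number of templates grows.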
Determine the barcode for each read in an Illumina lane.
For each tile, a file is written to the basecalls directory of the form s_
Option | Description |
---|---|
BASECALLS_DIR (File) | The Illumina basecalls directory. Required. |
OUTPUT_DIR (File) | Where to write _barcode.txt files. By default, these are written to BASECALLS_DIR. Default value: null. |
LANE (Integer) | Lane number. Required. |
READ_STRUCTURE (String) | A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Required. |
BARCODE (String) | Barcode sequence. These must be unique, and all the same length. This cannot be used with reads that have more than one barcode; use BARCODE_FILE in that case. Default value: null. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) BARCODE_FILE |
BARCODE_FILE (File) | Tab-delimited file of barcode sequences, barcode name and, optionally, library name. Barcodes must be unique and all the same length. Column headers must be 'barcode_sequence_1', 'barcode_sequence_2' (optional), 'barcode_name', and 'library_name'. Required. Cannot be used in conjunction with option(s) BARCODE |
METRICS_FILE (File) | Per-barcode and per-lane metrics written to this file. Required. |
MAX_MISMATCHES (Integer) | Maximum mismatches for a barcode to be considered a match. Default value: 1. This option can be set to 'null' to clear the default value. |
MIN_MISMATCH_DELTA (Integer) | Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match. Default value: 1. This option can be set to 'null' to clear the default value. |
MAX_NO_CALLS (Integer) | Maximum allowable number of no-calls in a barcode read before it is considered unmatchable. Default value: 2. This option can be set to 'null' to clear the default value. |
MINIMUM_BASE_QUALITY (Integer) | Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match. Default value: 0. This option can be set to 'null' to clear the default value. |
MINIMUM_QUALITY (Integer) | The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice the value has been observed lower. Default value: 2. This option can be set to 'null' to clear the default value. |
COMPRESS_OUTPUTS (Boolean) | Compress output s_l_t_barcode.txt files using gzip and append a .gz extension to the file names. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
NUM_PROCESSORS (Integer) | Run this many PerTileBarcodeExtractors in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0 then the number of cores used will be the number available on the machine less NUM_PROCESSORS. Default value: 1. This option can be set to 'null' to clear the default value. |
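The matching options above interact with each other, so a small illustration may help. The following is a minimal sketch in Python, not the tool's actual implementation. It assumes that no-calls are written as `N` or `.`, that no-calls are tallied separately rather than counted as mismatches, and that a lone barcode is compared against a notional second-best score equal to the barcode length. The function names are hypothetical.

```python
NO_CALLS = {"N", "."}  # assumption: characters used for no-calls


def count_mismatches(expected, observed, quals, min_base_quality=0):
    """Count positions that differ, or that match but fall below the quality floor."""
    mismatches = 0
    for exp, obs, qual in zip(expected, observed, quals):
        if obs in NO_CALLS:
            continue  # assumption: no-calls are limited by MAX_NO_CALLS instead
        if exp != obs or qual < min_base_quality:
            mismatches += 1
    return mismatches


def match_barcode(observed, quals, barcodes, max_mismatches=1,
                  min_mismatch_delta=1, max_no_calls=2, min_base_quality=0):
    """Return the matching barcode, or None if the read is unmatchable."""
    no_calls = sum(1 for base in observed if base in NO_CALLS)
    if no_calls > max_no_calls:
        return None
    scored = sorted(
        (count_mismatches(bc, observed, quals, min_base_quality), bc)
        for bc in barcodes
    )
    best_mismatches, best = scored[0]
    # Assumption: with a single barcode, treat the runner-up as a total mismatch.
    second_mismatches = scored[1][0] if len(scored) > 1 else len(observed)
    if (best_mismatches <= max_mismatches
            and second_mismatches - best_mismatches >= min_mismatch_delta):
        return best
    return None
```

With the defaults, a read one base away from exactly one barcode is assigned to it, while a read with three no-calls is rejected regardless of how well the remaining bases match.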
Attempts to estimate library complexity from the sequence of read pairs alone. It does so by sorting all reads by the first N bases (5 by default) of each read and then comparing reads whose first N bases are identical to each other for duplicates. Reads are considered to be duplicates if they match each other with no gaps and an overall mismatch rate less than or equal to MAX_DIFF_RATE (0.03 by default). Reads of poor quality are filtered out to provide a more accurate estimate: the filtering removes reads with any no-calls in the first N bases or with a mean base quality lower than MIN_MEAN_QUALITY across either the first or second read. Unpaired reads are ignored in this computation. The algorithm attempts to detect optical duplicates separately from PCR duplicates and excludes these from the calculation of library size. Also, since there is no alignment to screen out technical reads, one further filter is applied to the data. After examining all reads, a histogram is built of [#reads in duplicate set -> #of duplicate sets]; all bins that contain exactly one duplicate set are then removed from the histogram as outliers before library size is estimated.
Option | Description |
---|---|
INPUT (File) | One or more files to combine and estimate library complexity from. Reads can be mapped or unmapped. Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | Output file to write per-library metrics to. Required. |
MIN_IDENTICAL_BASES (Integer) | The minimum number of bases at the starts of reads that must be identical for reads to be grouped together for duplicate detection. In effect total_reads / 4^max_id_bases reads will be compared at a time, so lower numbers will produce more accurate results but consume exponentially more memory and CPU. Default value: 5. This option can be set to 'null' to clear the default value. |
MAX_DIFF_RATE (Double) | The maximum rate of differences between two reads to call them identical. Default value: 0.03. This option can be set to 'null' to clear the default value. |
MIN_MEAN_QUALITY (Integer) | The minimum mean quality of the bases in a read pair for the read to be analyzed. Reads with lower average quality are filtered out and not considered in any calculations. Default value: 20. This option can be set to 'null' to clear the default value. |
MAX_GROUP_RATIO (Integer) | Do not process self-similar groups that are this many times over the mean expected group size. For example, if the input contains 10 million read pairs and MIN_IDENTICAL_BASES is set to 5, then the mean expected group size would be approximately 10 reads. Default value: 500. This option can be set to 'null' to clear the default value. |
BARCODE_TAG (String) | Barcode SAM tag (ex. BC for 10X Genomics) Default value: null. |
READ_ONE_BARCODE_TAG (String) | Read one barcode SAM tag (ex. BX for 10X Genomics) Default value: null. |
READ_TWO_BARCODE_TAG (String) | Read two barcode SAM tag (ex. BX for 10X Genomics) Default value: null. |
READ_NAME_REGEX (String) | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on colon character. For 5 element names, the 3rd, 4th and 5th elements are assumed to be tile, x and y values. For 7 element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be tile, x and y values. Default value: [a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. This option can be set to 'null' to clear the default value. |
OPTICAL_DUPLICATE_PIXEL_DISTANCE (Integer) | The maximum offset between two duplicate clusters in order to consider them optical duplicates. This should usually be set to some fairly small number (e.g. 5-10 pixels) unless using later versions of the Illumina pipeline that multiply pixel values by 10, in which case 50-100 is more normal. Default value: 100. This option can be set to 'null' to clear the default value. |
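The grouping arithmetic, the duplicate test, and the read-name parsing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code. In particular, the group-size formula assumes the grouping key spans the first N bases of both reads in a pair (4^(2N) possible keys), which is the reading that reproduces the "approximately 10 reads" figure given for MAX_GROUP_RATIO. The function names are hypothetical.

```python
import re

# Default pattern as listed for READ_NAME_REGEX above.
DEFAULT_READ_NAME_REGEX = r"[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*"


def expected_group_size(read_pairs, min_identical_bases=5):
    """Mean number of pairs sharing a key; assumes the key covers both reads."""
    return read_pairs / 4 ** (2 * min_identical_bases)


def within_diff_rate(seq_a, seq_b, max_diff_rate=0.03):
    """Ungapped comparison: mismatch rate must not exceed MAX_DIFF_RATE."""
    length = min(len(seq_a), len(seq_b))
    if length == 0:
        return False
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / length <= max_diff_rate


def parse_read_name(name, pattern=DEFAULT_READ_NAME_REGEX):
    """Return (tile, x, y), or None if the pattern does not match the whole name."""
    match = re.fullmatch(pattern, name)
    if match is None:
        return None
    tile, x, y = (int(group) for group in match.groups())
    return tile, x, y
```

`re.fullmatch` is used because the regular expression must match the entire read name. For 10 million pairs and N = 5, `expected_group_size` gives a value between 9 and 10, so with MAX_GROUP_RATIO at its default of 500 a group would have to be several thousand pairs in size before being skipped.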
Extracts read sequences and qualities from the input fastq file and writes them into the output file in unaligned BAM format. Input files can be in GZip format (end in .gz).
Option | Description |
---|---|
FASTQ (File) | Input fastq file (optionally gzipped) for single end data, or first read in paired end data. Required. |
FASTQ2 (File) | Input fastq file (optionally gzipped) for the second read of paired end data. Default value: null. |
USE_SEQUENTIAL_FASTQS (Boolean) | Use sequential fastq files with the suffix <prefix>_###.fastq or <prefix>_###.fastq.gz. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
QUALITY_FORMAT (FastqQualityFormat) | A value describing how the quality values are encoded in the fastq. Either Solexa for pre-pipeline 1.3 style scores (solexa scaling + 66), Illumina for pipeline 1.3 and above (phred scaling + 64) or Standard for phred scaled scores with a character shift of 33. If this value is not specified, the quality format will be detected automatically. Default value: null. Possible values: {Solexa, Illumina, Standard} |
OUTPUT (File) | Output SAM/BAM file. Required. |
READ_GROUP_NAME (String) | Read group name Default value: A. This option can be set to 'null' to clear the default value. |
SAMPLE_NAME (String) | Sample name to insert into the read group header Required. |
LIBRARY_NAME (String) | The library name to place into the LB attribute in the read group header Default value: null. |
PLATFORM_UNIT (String) | The platform unit (often run_barcode.lane) to insert into the read group header Default value: null. |
PLATFORM (String) | The platform type (e.g. illumina, solid) to insert into the read group header Default value: null. |
SEQUENCING_CENTER (String) | The sequencing center from which the data originated Default value: null. |
PREDICTED_INSERT_SIZE (Integer) | Predicted median insert size, to insert into the read group header Default value: null. |
PROGRAM_GROUP (String) | Program group to insert into the read group header. Default value: null. |
PLATFORM_MODEL (String) | Platform model to insert into the group header (free-form text providing further details of the platform/technology used) Default value: null. |
COMMENT (String) | Comment(s) to include in the merged output file's header. Default value: null. This option may be specified 0 or more times. |
DESCRIPTION (String) | Inserted into the read group header Default value: null. |
RUN_DATE (Iso8601Date) | Date the run was produced, to insert into the read group header Default value: null. |
SORT_ORDER (SortOrder) | The sort order for the output sam/bam file. Default value: queryname. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate, duplicate} |
MIN_Q (Integer) | Minimum quality allowed in the input fastq. An exception will be thrown if a quality is less than this value. Default value: 0. This option can be set to 'null' to clear the default value. |
MAX_Q (Integer) | Maximum quality allowed in the input fastq. An exception will be thrown if a quality is greater than this value. Default value: 93. This option can be set to 'null' to clear the default value. |
STRIP_UNPAIRED_MATE_NUMBER (Boolean) | If true and this is an unpaired fastq, any occurrence of '/1' will be removed from the end of a read name. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ALLOW_AND_IGNORE_EMPTY_LINES (Boolean) | Allow (and ignore) empty lines Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
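To make QUALITY_FORMAT, MIN_Q and MAX_Q concrete, here is a minimal Python sketch of decoding a fastq quality string. It covers only the two phred-scaled encodings described above (Standard with a character shift of 33, Illumina with a shift of 64); the Solexa scaling is deliberately left out. The function name and the error message are hypothetical.

```python
# Character offsets for the phred-scaled encodings described above.
OFFSETS = {"Standard": 33, "Illumina": 64}


def decode_qualities(quality_string, quality_format="Standard", min_q=0, max_q=93):
    """Convert a fastq quality string to integer scores, enforcing MIN_Q and MAX_Q."""
    offset = OFFSETS[quality_format]
    scores = [ord(char) - offset for char in quality_string]
    for score in scores:
        if score < min_q or score > max_q:
            raise ValueError(f"quality {score} outside [{min_q}, {max_q}]")
    return scores
```

The default MAX_Q of 93 corresponds to `~`, the highest printable ASCII character, under the Standard shift of 33. Decoding Standard-encoded data with the Illumina offset tends to produce negative scores, which the MIN_Q check would reject.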
Provides a large, configurable, FIFO buffer that can be used to buffer input and output streams between programs with a buffer size that is larger than that offered by native unix FIFOs (usually 64k).
Option | Description |
---|---|
BUFFER_SIZE (Integer) | The size of the memory buffer in bytes. Default value: 536870912. This option can be set to 'null' to clear the default value. |
IO_SIZE (Integer) | The size, in bytes, to read/write atomically to the input and output streams. Default value: 65536. This option can be set to 'null' to clear the default value. |
DEBUG_FREQUENCY (Integer) | How frequently, in seconds, to report debugging statistics. Set to zero for never. Default value: 0. This option can be set to 'null' to clear the default value. |
NAME (String) | Name to use for Fifo in debugging statements. Default value: null. |
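The buffering idea can be sketched with a bounded queue between a reader thread and a writer. This is a simplified Python illustration, not the tool's implementation: it assumes the buffer is held as whole chunks of IO_SIZE bytes, so its capacity is BUFFER_SIZE divided by IO_SIZE. The function name `fifo_copy` is hypothetical.

```python
import queue
import threading


def fifo_copy(src, dst, buffer_size=536870912, io_size=65536):
    """Copy src to dst through a bounded in-memory buffer; returns bytes copied."""
    max_chunks = max(1, buffer_size // io_size)
    chunks = queue.Queue(maxsize=max_chunks)

    def reader():
        # Fill the buffer; put() blocks while the buffer is full.
        while True:
            chunk = src.read(io_size)
            chunks.put(chunk)  # an empty chunk marks end of input
            if not chunk:
                break

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    while True:
        chunk = chunks.get()  # blocks while the buffer is empty
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    thread.join()
    return total
```

Called with binary streams such as `sys.stdin.buffer` and `sys.stdout.buffer`, the upstream program can keep writing until the buffer fills, even when the downstream program reads in bursts.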
Produces a new SAM or BAM file by including or excluding aligned reads or a list of reads names supplied in the READ_LIST_FILE from the INPUT SAM or BAM file.
Option | Description |
---|---|
INPUT (File) | The SAM or BAM file that will be filtered. Required. |
FILTER (Filter) | Filter. Required. Possible values: {includeAligned [OUTPUT SAM/BAM will contain aligned reads only. INPUT SAM/BAM must be in queryname SortOrder. (Note that *both* first and second of paired reads must be aligned to be included in the OUTPUT SAM or BAM)], excludeAligned [OUTPUT SAM/BAM will contain un-mapped reads only. INPUT SAM/BAM must be in queryname SortOrder. (Note that *both* first and second of pair must be aligned to be excluded from the OUTPUT SAM or BAM)], includeReadList [OUTPUT SAM/BAM will contain reads that are supplied in the READ_LIST_FILE file], excludeReadList [OUTPUT bam will contain reads that are *not* supplied in the READ_LIST_FILE file]} |
READ_LIST_FILE (File) | Read List File containing reads that will be included or excluded from the OUTPUT SAM or BAM file. Default value: null. |
SORT_ORDER (SortOrder) | SortOrder of the OUTPUT SAM or BAM file, otherwise use the SortOrder of the INPUT file. Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate} |
WRITE_READS_FILES (Boolean) | Create .reads files (for debugging purposes) Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
OUTPUT (File) | SAM or BAM file to which the filtered reads are written. Required. |
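The four FILTER values reduce to two small predicates. The Python sketch below illustrates them on plain dictionaries; the record layout (`name` and `aligned` keys) and the function names are hypothetical and stand in for real SAM records.

```python
def filter_by_read_list(records, read_names, include=True):
    """includeReadList (include=True) or excludeReadList (include=False)."""
    names = set(read_names)
    return [rec for rec in records if (rec["name"] in names) == include]


def filter_aligned_pairs(pairs, include=True):
    """includeAligned (include=True) or excludeAligned (include=False).

    A pair counts as aligned only when both the first and second read are
    aligned, so a half-aligned pair is dropped by includeAligned and kept by
    excludeAligned.
    """
    kept = []
    for first, second in pairs:
        both_aligned = first["aligned"] and second["aligned"]
        if both_aligned == include:
            kept.extend([first, second])
    return kept
```

Iterating pair by pair is only possible when mates are adjacent, which is consistent with the queryname sort order requirement for the two alignment-based filters.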
Applies one or more hard filters to a VCF file to filter out genotypes and variants.
Option | Description |
---|---|
INPUT (File) | The INPUT VCF or BCF file. Required. |
OUTPUT (File) | The output VCF or BCF. Required. |
MIN_AB (Double) | The minimum allele balance acceptable before filtering a site. Allele balance is calculated for heterozygotes as the number of bases supporting the least-represented allele over the total number of base observations. Different heterozygote genotypes at the same locus are measured independently. The locus is filtered if any allele balance is below the limit. Default value: 0.0. This option can be set to 'null' to clear the default value. |
MIN_DP (Integer) | The minimum sequencing depth supporting a genotype before the genotype will be filtered out. Default value: 0. This option can be set to 'null' to clear the default value. |
MIN_GQ (Integer) | The minimum genotype quality that must be achieved for a sample otherwise the genotype will be filtered out. Default value: 0. This option can be set to 'null' to clear the default value. |
MAX_FS (Double) | The maximum phred scaled fisher strand value before a site will be filtered out. Default value: 1.7976931348623157E308. This option can be set to 'null' to clear the default value. |
MIN_QD (Double) | The minimum QD value to accept or otherwise filter out the variant. Default value: 0.0. This option can be set to 'null' to clear the default value. |
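As an illustration of how these thresholds behave, the Python sketch below applies them to plain numbers. It is a simplified reading of the option descriptions, not the tool's code: the filter labels and function names are hypothetical, and `het_allele_depths` is assumed to hold, for each heterozygous genotype at the site, the base counts of its two alleles.

```python
import sys

# The default MAX_FS shown above is the largest finite double.
MAX_FS_DEFAULT = sys.float_info.max


def allele_balance(allele_depths):
    """Least-represented allele count over total observations; None if no data."""
    total = sum(allele_depths)
    if total == 0:
        return None
    return min(allele_depths) / total


def site_filters(fs, qd, het_allele_depths, min_ab=0.0,
                 max_fs=MAX_FS_DEFAULT, min_qd=0.0):
    """Return hypothetical labels for the site-level filters that would fire."""
    reasons = []
    balances = [allele_balance(depths) for depths in het_allele_depths]
    if any(ab is not None and ab < min_ab for ab in balances):
        reasons.append("AlleleBalance")
    if fs > max_fs:
        reasons.append("StrandBias")
    if qd < min_qd:
        reasons.append("LowQD")
    return reasons


def genotype_filters(dp, gq, min_dp=0, min_gq=0):
    """Return hypothetical labels for the genotype-level filters that would fire."""
    reasons = []
    if dp < min_dp:
        reasons.append("LowDP")
    if gq < min_gq:
        reasons.append("LowGQ")
    return reasons
```

With every option left at its default, nothing is filtered: no allele balance is below 0.0, no finite FS exceeds the largest double, and no non-negative DP, GQ or QD falls below 0.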
Ensure that all mate-pair information is in sync between each read and its mate pair. If no OUTPUT file is supplied then the output is written to a temporary file and then copied over the INPUT file. Reads marked with the secondary alignment flag are written to the output file unchanged.
Option | Description |
---|---|
INPUT (File) | The input file to fix. Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | The output file to write to. If no output file is supplied, the input file is overwritten. Default value: null. |
SORT_ORDER (SortOrder) | Optional sort order if the OUTPUT file should be sorted differently than the INPUT file. Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate} |
ASSUME_SORTED (Boolean) | If true, assume that the input file is queryname sorted, even if the header says otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ADD_MATE_CIGAR (Boolean) | Adds the mate CIGAR tag (MC) if true, does not if false. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
Concatenates one or more BAM files together as efficiently as possible. Assumes that the list of BAM files provided as INPUT are in the order that they should be concatenated and simply concatenates the bodies of the BAM files while retaining the header from the first file. Operates via copying of the gzip blocks directly for speed but also supports generation of an MD5 on the output and indexing of the output BAM file. Only BAM files are supported; SAM files are not.
Option | Description |
---|---|
INPUT (File) | One or more BAM files or text files containing lists of BAM files one per line. Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | The output BAM file to write. Required. |
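Because INPUT accepts both BAM files and text files that list BAM files, the effective input order is the flattened list. A minimal Python sketch of that flattening follows. How the tool itself tells the two kinds of file apart is not stated above, so the `.bam` extension test is an assumption, and `expand_inputs` is a hypothetical name.

```python
def expand_inputs(paths):
    """Flatten INPUT values into an ordered list of BAM paths."""
    bams = []
    for path in paths:
        if path.endswith(".bam"):  # assumption: BAMs are recognised by extension
            bams.append(path)
        else:
            # Treat anything else as a text file with one BAM path per line.
            with open(path) as handle:
                bams.extend(line.strip() for line in handle if line.strip())
    return bams
```

Order matters here: the files are concatenated exactly in the resulting order, and only the header of the first file is retained.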
Gathers multiple VCF files from a scatter operation into a single VCF file. Input files must be supplied in genomic order and must not have events at overlapping positions.
Option | Description |
---|---|
INPUT (File) | Input VCF file(s). Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | Output VCF file. Required. |
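The ordering requirement can be checked up front from the first and last record of each input. The Python sketch below is illustrative only: it assumes each input is summarised as a hypothetical tuple `(first_contig, first_pos, last_contig, last_pos)` and that `contig_order` lists contig names in reference order.

```python
def inputs_in_gather_order(spans, contig_order):
    """True if inputs are in genomic order with no shared or overlapping positions."""
    rank = {name: index for index, name in enumerate(contig_order)}
    previous_end = None
    for first_contig, first_pos, last_contig, last_pos in spans:
        start = (rank[first_contig], first_pos)
        end = (rank[last_contig], last_pos)
        if end < start:
            return False  # the file itself is not in genomic order
        if previous_end is not None and start <= previous_end:
            return False  # starts at or before the previous file's last event
        previous_end = end
    return True
```

Comparing `(contig rank, position)` tuples handles inputs that cross contig boundaries, which is common when a scatter splits the genome into roughly equal pieces.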
Calculates the concordance between genotype data for two samples in two different VCFs, one being considered the truth (or reference) and the other being considered the call. The concordance is broken into separate results sections for SNPs and indels. Summary and detailed statistics are reported. Note that for any pair of variants to compare, only the alleles for the samples under interrogation are considered, and MNP, Symbolic, and Mixed classes of variants are not included.
Option | Description |
---|---|
TRUTH_VCF (File) | The VCF containing the truth sample Required. |
CALL_VCF (File) | The VCF containing the call sample Required. |
OUTPUT (File) | Basename for the two metrics files that are to be written. Resulting files will be <OUTPUT>.genotype_concordance_summary_metrics and <OUTPUT>.genotype_concordance_detail_metrics. Required. |
TRUTH_SAMPLE (String) | The name of the truth sample within the truth VCF Required. |
CALL_SAMPLE (String) | The name of the call sample within the call VCF Required. |
INTERVALS (File) | One or more interval list files that will be used to limit the genotype concordance. Note - if intervals are specified, the VCF files must be indexed. Default value: null. This option may be specified 0 or more times. |
INTERSECT_INTERVALS (Boolean) | If true, multiple interval lists will be intersected. If false multiple lists will be unioned. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MIN_GQ (Integer) | Genotypes below this genotype quality will have genotypes classified as LowGq. Default value: 0. This option can be set to 'null' to clear the default value. |
MIN_DP (Integer) | Genotypes below this depth will have genotypes classified as LowDp. Default value: 0. This option can be set to 'null' to clear the default value. |
OUTPUT_ALL_ROWS (Boolean) | If true, output all rows in detailed statistics even when count == 0. When false only output rows with non-zero counts. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
USE_VCF_INDEX (Boolean) | If true, use the VCF index, else iterate over the entire VCF. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MISSING_SITES_HOM_REF (Boolean) | Default is false, which follows the GA4GH Scheme. If true, missing sites in the truth set will be treated as HOM_REF sites and sites missing in both the truth and call sets will be true negatives. Useful when hom ref sites are left out of the truth set. This flag can only be used with a high confidence interval list. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
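The difference between INTERSECT_INTERVALS=true and false is easiest to see on a small example. The Python sketch below works on a single contig with inclusive `(start, end)` tuples and assumes each input list is internally non-overlapping. It illustrates the set semantics only; the function names are hypothetical, and whether the tool also merges merely adjacent intervals is not stated above, so this sketch merges overlapping ones only.

```python
def intersect(list_a, list_b):
    """Regions covered by both lists (two-pointer sweep over sorted intervals)."""
    a, b = sorted(list_a), sorted(list_b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def union(list_a, list_b):
    """Regions covered by either list, with overlapping intervals merged."""
    merged = []
    for start, end in sorted(list_a + list_b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For the lists `[(1, 100), (200, 300)]` and `[(50, 250)]`, intersecting restricts the comparison to positions 50-100 and 200-250, while the union covers 1-300.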
Generate fastq file(s) from data in an Illumina basecalls output directory.
Separate fastq file(s) are created for each template read, and for each barcode read, in the basecalls.
Template fastqs have extensions like .
Option | Description |
---|---|
BASECALLS_DIR (File) | The basecalls directory. Required. |
BARCODES_DIR (File) | The barcodes directory with _barcode.txt files (generated by ExtractIlluminaBarcodes). If not set, use BASECALLS_DIR. Default value: null. |
LANE (Integer) | Lane number. Required. |
OUTPUT_PREFIX (File) | The prefix for output fastqs. Extensions as described above are appended. Use this option for a non-barcoded run, or for a barcoded run in which it is not desired to demultiplex reads into separate files by barcode. Required. Cannot be used in conjunction with option(s) MULTIPLEX_PARAMS |
RUN_BARCODE (String) | The barcode of the run. Prefixed to read names. Required. |
MACHINE_NAME (String) | The name of the machine on which the run was sequenced; required if emitting Casava1.8-style read name headers Default value: null. |
FLOWCELL_BARCODE (String) | The barcode of the flowcell that was sequenced; required if emitting Casava1.8-style read name headers Default value: null. |
READ_STRUCTURE (String) | A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Required. |
MULTIPLEX_PARAMS (File) | Tab-separated file for creating all output fastqs demultiplexed by barcode for a lane with a single IlluminaBasecallsToFastq invocation. The columns are OUTPUT_PREFIX, and BARCODE_1, BARCODE_2 ... BARCODE_X where X = number of barcodes per cluster (optional). The row with BARCODE_1 set to 'N' is used to specify an output_prefix for no barcode match. Required. Cannot be used in conjunction with option(s) OUTPUT_PREFIX (O) |
ADAPTERS_TO_CHECK (IlluminaAdapterPair) | Which adapters to look for in the read. Default value: [INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM]. This option can be set to 'null' to clear the default value. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
NUM_PROCESSORS (Integer) | The number of threads to run in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0, then the number of cores used will be the number available on the machine less NUM_PROCESSORS. Default value: 0. This option can be set to 'null' to clear the default value. |
FIRST_TILE (Integer) | If set, this is the first tile to be processed (used for debugging). Note that tiles are not processed in numerical order. Default value: null. |
TILE_LIMIT (Integer) | If set, process no more than this many tiles (used for debugging). Default value: null. |
APPLY_EAMSS_FILTER (Boolean) | Apply EAMSS filtering to identify inappropriately quality scored bases towards the ends of reads and convert their quality scores to Q2. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
FORCE_GC (Boolean) | If true, call System.gc() periodically. This is useful in cases in which the -Xmx value passed is larger than the available memory. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MAX_READS_IN_RAM_PER_TILE (Integer) | Configure SortingCollections to store this many records before spilling to disk. For an indexed run, each SortingCollection gets this value/number of indices. Default value: 1200000. This option can be set to 'null' to clear the default value. |
MINIMUM_QUALITY (Integer) | The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice lower values have been observed. Default value: 2. This option can be set to 'null' to clear the default value. |
INCLUDE_NON_PF_READS (Boolean) | Whether to include non-PF reads Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
IGNORE_UNEXPECTED_BARCODES (Boolean) | Whether to ignore reads whose barcodes are not found in MULTIPLEX_PARAMS. Useful when outputting fastqs for only a subset of the barcodes in a lane. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
READ_NAME_FORMAT (ReadNameFormat) | The read name header formatting to emit. Casava1.8 formatting has additional information beyond Illumina, including: the passing-filter flag value for the read, the flowcell name, and the sequencer name. Default value: CASAVA_1_8. This option can be set to 'null' to clear the default value. Possible values: {CASAVA_1_8, ILLUMINA} |
COMPRESS_OUTPUTS (Boolean) | Compress output FASTQ files using gzip and append a .gz extension to the file names. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
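The READ_STRUCTURE notation is compact enough that a short parser shows exactly how a cluster is cut up. The following Python sketch parses a string such as "36T8B8S28T" and splits a cluster's bases accordingly. It is an illustration of the notation as described above, and the function names are hypothetical.

```python
import re


def parse_read_structure(read_structure):
    """Turn '36T8B8S28T' into [(36, 'T'), (8, 'B'), (8, 'S'), (28, 'T')]."""
    parts = re.findall(r"(\d+)([BTS])", read_structure)
    # Reject strings with leftover characters that the pattern skipped over.
    if not parts or "".join(n + kind for n, kind in parts) != read_structure:
        raise ValueError(f"invalid read structure: {read_structure!r}")
    return [(int(n), kind) for n, kind in parts]


def split_cluster(bases, read_structure):
    """Split one cluster's bases into (kind, sequence) segments, in cycle order."""
    segments = []
    position = 0
    for length, kind in parse_read_structure(read_structure):
        segments.append((kind, bases[position:position + length]))
        position += length
    if position != len(bases):
        raise ValueError("read structure does not cover the cluster exactly")
    return segments
```

For the 80-base example above, `split_cluster` yields four segments of 36, 8, 8 and 28 bases. Dropping the segment whose kind is `S` leaves two template reads and one barcode read.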
Generate a SAM or BAM file from data in an Illumina basecalls output directory
Option | Description |
---|---|
BASECALLS_DIR (File) | The basecalls directory. Required. |
BARCODES_DIR (File) | The barcodes directory with _barcode.txt files (generated by ExtractIlluminaBarcodes). If not set, use BASECALLS_DIR. Default value: null. |
LANE (Integer) | Lane number. Required. |
OUTPUT (File) | Deprecated (use LIBRARY_PARAMS). The output SAM or BAM file. Format is determined by extension. Required. Cannot be used in conjunction with option(s) BARCODE_PARAMS LIBRARY_PARAMS |
RUN_BARCODE (String) | The barcode of the run. Prefixed to read names. Required. |
SAMPLE_ALIAS (String) | Deprecated (use LIBRARY_PARAMS). The name of the sequenced sample. Required. Cannot be used in conjunction with option(s) BARCODE_PARAMS LIBRARY_PARAMS |
READ_GROUP_ID (String) | ID used to link RG header record with RG tag in SAM record. If these are unique in SAM files that get merged, merge performance is better. If not specified, READ_GROUP_ID will be set to <first 5 chars of RUN_BARCODE>.<LANE>. Default value: null. |
LIBRARY_NAME (String) | Deprecated (use LIBRARY_PARAMS). The name of the sequenced library. Default value: null. Cannot be used in conjunction with option(s) BARCODE_PARAMS LIBRARY_PARAMS |
SEQUENCING_CENTER (String) | The name of the sequencing center that produced the reads. Used to set the RG.CN tag. Default value: BI. This option can be set to 'null' to clear the default value. |
RUN_START_DATE (Date) | The start date of the run. Default value: null. |
PLATFORM (String) | The name of the sequencing technology that produced the read. Default value: illumina. This option can be set to 'null' to clear the default value. |
READ_STRUCTURE (String) | A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Required. |
BARCODE_PARAMS (File) | Deprecated (use LIBRARY_PARAMS). Tab-separated file for creating all output BAMs for a barcoded run with a single IlluminaBasecallsToSam invocation. Columns are BARCODE, OUTPUT, SAMPLE_ALIAS, and LIBRARY_NAME. The row with BARCODE=N is used to specify a file for no barcode match. Required. Cannot be used in conjunction with option(s) SAMPLE_ALIAS (ALIAS) LIBRARY_NAME (LIB) OUTPUT (O) LIBRARY_PARAMS |
LIBRARY_PARAMS (File) | Tab-separated file for creating all output BAMs for a lane with a single IlluminaBasecallsToSam invocation. The columns are OUTPUT, SAMPLE_ALIAS, and LIBRARY_NAME, BARCODE_1, BARCODE_2 ... BARCODE_X where X = number of barcodes per cluster (optional). The row with BARCODE_1 set to 'N' is used to specify a file for no barcode match. You may also provide any 2 letter RG header attributes (excluding PU, CN, PL, and DT) as columns in this file and the values for those columns will be inserted into the RG tag for the BAM file created for a given row. Required. Cannot be used in conjunction with option(s) SAMPLE_ALIAS (ALIAS) LIBRARY_NAME (LIB) BARCODE_PARAMS OUTPUT (O) |
ADAPTERS_TO_CHECK (IlluminaAdapterPair) | Which adapters to look for in the read. Default value: [INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM]. This option can be set to 'null' to clear the default value. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
NUM_PROCESSORS (Integer) | The number of threads to run in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0, then the number of cores used will be the number available on the machine less NUM_PROCESSORS. Default value: 0. This option can be set to 'null' to clear the default value. |
FIRST_TILE (Integer) | If set, this is the first tile to be processed (used for debugging). Note that tiles are not processed in numerical order. Default value: null. |
TILE_LIMIT (Integer) | If set, process no more than this many tiles (used for debugging). Default value: null. |
FORCE_GC (Boolean) | If true, call System.gc() periodically. This is useful in cases in which the -Xmx value passed is larger than the available memory. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
APPLY_EAMSS_FILTER (Boolean) | Apply EAMSS filtering to identify inappropriately quality scored bases towards the ends of reads and convert their quality scores to Q2. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MAX_READS_IN_RAM_PER_TILE (Integer) | Configure SortingCollections to store this many records before spilling to disk. For an indexed run, each SortingCollection gets this value/number of indices. Default value: 1200000. This option can be set to 'null' to clear the default value. |
MINIMUM_QUALITY (Integer) | The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice lower values have been observed. Default value: 2. This option can be set to 'null' to clear the default value. |
INCLUDE_NON_PF_READS (Boolean) | Whether to include non-PF reads Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
IGNORE_UNEXPECTED_BARCODES (Boolean) | Whether to ignore reads whose barcodes are not found in LIBRARY_PARAMS. Useful when outputting BAMs for only a subset of the barcodes in a lane. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
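Two of the defaults above are rules rather than fixed values: the thread count derived from NUM_PROCESSORS and the fallback READ_GROUP_ID. The Python sketch below spells both out. It assumes that a negative NUM_PROCESSORS leaves that many cores unused (so -2 on an 8-core machine gives 6), which is one reading of "the number available on the machine less NUM_PROCESSORS". The function names are hypothetical.

```python
import os


def resolve_num_processors(num_processors, available=None):
    """Map a NUM_PROCESSORS value to the number of threads to run."""
    if available is None:
        available = os.cpu_count() or 1
    if num_processors == 0:
        return available  # use every core
    if num_processors < 0:
        # Assumption: a negative value means "all cores except this many".
        return available + num_processors
    return num_processors


def default_read_group_id(run_barcode, lane):
    """Fallback READ_GROUP_ID: first 5 characters of RUN_BARCODE, a dot, then LANE."""
    return f"{run_barcode[:5]}.{lane}"
```

Because the fallback ID keeps only five characters of the run barcode, two runs whose barcodes share a prefix would receive the same ID for the same lane. Setting READ_GROUP_ID explicitly avoids that when files are later merged.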
Check that the files to provide the data specified by DATA_TYPES are available, exist, and are reasonably sized for every tile/cycle. Reasonably sized means non-zero sized for files that exist per tile and equal size for binary files that exist per cycle/per tile. CheckIlluminaDirectory DOES NOT check that the individual records in a file are well-formed.
Option | Description |
---|---|
BASECALLS_DIR (File) | The basecalls output directory. Required. |
DATA_TYPES (IlluminaDataType) | The data types that should be checked for each tile/cycle. If no values are provided then the data types checked are those required by IlluminaBaseCallsToSam (which is a superset of those used in ExtractIlluminaBarcodes). These data types vary slightly depending on whether or not the run is barcoded so READ_STRUCTURE should be the same as that which will be passed to IlluminaBasecallsToSam. If this option is left unspecified then both ExtractIlluminaBarcodes and IlluminaBaseCallsToSam should complete successfully UNLESS the individual records of the files themselves are spurious. Default value: null. Possible values: {Position, BaseCalls, QualityScores, PF, Barcodes} This option may be specified 0 or more times. |
READ_STRUCTURE (String) | A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Note: If you want to check whether or not a future IlluminaBasecallsToSam or ExtractIlluminaBarcodes run will fail then be sure to use the exact same READ_STRUCTURE that you would pass to these programs for this run. Required. |
LANES (Integer) | The number of the lane(s) to check. Default value: null. This option must be specified at least once. |
TILE_NUMBERS (Integer) | The number(s) of the tile(s) to check. Default value: null. This option may be specified 0 or more times. |
FAKE_FILES (Boolean) | A flag to determine whether or not to create fake versions of the missing files. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
LINK_LOCS (Boolean) | A flag to create symlinks to the loc file for the X Ten for each tile. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
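
The READ_STRUCTURE format described in the table above can be illustrated with a small parser. This is a minimal sketch, not Picard's implementation: `parse_read_structure` and `output_reads` are hypothetical helper names, and the validation rules (positive cycle counts, only B/T/S types) are assumptions drawn from the option description.

```python
import re

# Hypothetical helpers, not part of Picard.
# One integer/character pair: a cycle count followed by B, T, or S.
_PAIR = re.compile(r"(\d+)([BTS])")


def parse_read_structure(read_structure):
    """Split a READ_STRUCTURE string into (cycle_count, cycle_type) pairs."""
    pairs = []
    pos = 0
    while pos < len(read_structure):
        match = _PAIR.match(read_structure, pos)
        if match is None:
            raise ValueError(f"Bad read structure at offset {pos}: {read_structure!r}")
        count = int(match.group(1))
        if count <= 0:
            raise ValueError("Cycle counts must be positive")
        pairs.append((count, match.group(2)))
        pos = match.end()
    if not pairs:
        raise ValueError("Read structure is empty")
    return pairs


def output_reads(pairs):
    """Drop skipped (S) reads, which are not written to the output."""
    return [(count, kind) for count, kind in pairs if kind != "S"]


pairs = parse_read_structure("36T8B8S28T")
print(pairs)                         # four reads: template, barcode, skip, template
print(sum(n for n, _ in pairs))      # total cycles per cluster
print(output_reads(pairs))           # reads that would reach the SAM/BAM output
```

For the example from the table, the four pairs add up to the 80 cycles of the cluster, and only three of the four reads survive once the skipped cycles are dropped.
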
General tool for manipulating interval lists, including sorting, merging, padding, uniqueifying, and other set-theoretic operations. Default operation if given one or more inputs is to merge and sort them. Other options are controlled by arguments.
Option | Description |
---|---|
INPUT (File) | One or more interval lists. If multiple interval lists are provided the output is the result of merging the inputs. Supported formats are interval_list and VCF. Default value: null. This option must be specified at least 1 times. |
OUTPUT (File) | The output interval list file to write (if SCATTER_COUNT is 1) or the directory into which to write the scattered interval sub-directories (if SCATTER_COUNT > 1) Default value: null. |
PADDING (Integer) | The amount to pad each end of the intervals by before other operations are undertaken. Negative numbers are allowed and indicate intervals should be shrunk. Resulting intervals < 0 bases long will be removed. Padding is applied to the interval lists before the ACTION is performed. Default value: 0. This option can be set to 'null' to clear the default value. |
UNIQUE (Boolean) | If true, merge overlapping and adjacent intervals to create a list of unique intervals. Implies SORT=true Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
SORT (Boolean) | If true, sort the resulting interval list by coordinate. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ACTION (Action) | Action to take on inputs. Default value: CONCAT. This option can be set to 'null' to clear the default value. Possible values: {CONCAT (The concatenation of all the INPUTs, no sorting or merging of overlapping/abutting intervals implied. Will result in an unsorted list unless requested otherwise.) UNION (Like CONCAT but with UNIQUE and SORT implied, the result being the set-wise union of all INPUTs.) INTERSECT (The sorted, uniqued set of all loci that are contained in all of the INPUTs.) SUBTRACT (Subtracts SECOND_INPUT from INPUT. The resulting loci are those in INPUT that are not in SECOND_INPUT.) SYMDIFF (Find loci that are in INPUT or SECOND_INPUT but are not in both.)} |
SECOND_INPUT (File) | Second set of intervals for SUBTRACT and SYMDIFF operations. Default value: null. This option may be specified 0 or more times. |
COMMENT (String) | One or more lines of comment to add to the header of the output file. Default value: null. This option may be specified 0 or more times. |
SCATTER_COUNT (Integer) | The number of files into which to scatter the resulting list by locus; in some situations, fewer intervals may be emitted. Note - if > 1, the resultant scattered intervals will be sorted and uniqued. The sort will be inverted if the INVERT flag is set. Default value: 1. This option can be set to 'null' to clear the default value. |
INCLUDE_FILTERED (Boolean) | Whether to include filtered variants in the vcf when generating an interval list from vcf Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
BREAK_BANDS_AT_MULTIPLES_OF (Integer) | If set to a positive value will create a new interval list with the original intervals broken up at integer multiples of this value. Set to 0 to NOT break up intervals Default value: 0. This option can be set to 'null' to clear the default value. |
SUBDIVISION_MODE (Mode) | Do not subdivide Default value: INTERVAL_SUBDIVISION. This option can be set to 'null' to clear the default value. Possible values: {INTERVAL_SUBDIVISION, BALANCING_WITHOUT_INTERVAL_SUBDIVISION, BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW} |
INVERT (Boolean) | Produce the inverse list Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
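
The PADDING and UNIQUE options in the table above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: intervals are `(contig, start, end)` tuples with 1-based inclusive coordinates, `pad` and `unique` are hypothetical helpers, contigs are ordered by name rather than by a sequence dictionary, and no clamping to contig bounds is attempted. The sketch drops any interval whose end falls before its start after padding, which is one reading of "intervals < 0 bases long will be removed".

```python
# Hypothetical helpers, not part of Picard.
# Intervals are (contig, start, end) with 1-based inclusive coordinates.

def pad(intervals, padding):
    """Grow (or, for negative padding, shrink) each end of every interval."""
    padded = []
    for contig, start, end in intervals:
        new_start, new_end = start - padding, end + padding
        # Assumption: intervals that no longer contain any base are dropped.
        if new_end >= new_start:
            padded.append((contig, new_start, new_end))
    return padded


def unique(intervals):
    """Sort, then merge overlapping and adjacent intervals (UNIQUE implies SORT)."""
    merged = []
    # Assumption: sort contigs by name; the real tool uses dictionary order.
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


intervals = [("chr1", 100, 200), ("chr1", 201, 300), ("chr1", 500, 600), ("chr2", 50, 60)]
print(unique(intervals))            # the two abutting chr1 intervals collapse into one
print(unique(pad(intervals, 10)))   # padding is applied before merging
print(pad([("chr2", 50, 60)], -6))  # shrinking past zero length removes the interval
```

Padding is applied first so that intervals which only touch after growing are still merged, matching the order stated for PADDING.
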
Lifts a VCF over from one genome build to another using UCSC liftover. The output file will be sorted and indexed. Records may be rejected because they cannot be lifted over or because post-liftover the reference allele mismatches the target genome build. Rejected records will be emitted with filters to the REJECT file, on the source genome.
Option | Description |
---|---|
INPUT (File) | The input VCF/BCF file to be lifted over. Required. |
OUTPUT (File) | The output location to write the lifted over VCF/BCF to. Required. |
CHAIN (File) | The liftover chain file. See https://genome.ucsc.edu/goldenPath/help/chain.html for a description of chain files. See http://hgdownload.soe.ucsc.edu/downloads.html#terms for where to download chain files. Required. |
REJECT (File) | File to which to write rejected records. Required. |
REFERENCE_SEQUENCE (File) | The reference sequence (fasta) for the TARGET genome build. The fasta file must have an accompanying sequence dictionary (.dict file). Required. |
Reads a VCF/VCF.gz/BCF and removes all genotype information from it while retaining all site level information, including annotations based on genotypes (e.g. AN, AF). Output can be any supported variant format including .vcf, .vcf.gz or .bcf.
Option | Description |
---|---|
INPUT (File) | Input VCF or BCF Required. |
OUTPUT (File) | Output VCF or BCF to emit without per-sample info. Required. |
SAMPLE (String) | Optionally one or more samples to retain when building the 'sites-only' VCF. Default value: null. This option may be specified 0 or more times. |
Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules. All records are then written to the output file with the duplicate records flagged.
Option | Description |
---|---|
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP (Integer) | This option is obsolete. ReadEnds will always be spilled to disk. Default value: 50000. This option can be set to 'null' to clear the default value. |
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP (Integer) | Maximum number of file handles to keep open when spilling read ends to disk. Set this number a little lower than the per-process maximum number of files that may be open. This number can be found by executing the 'ulimit -n' command on a Unix system. Default value: 8000. This option can be set to 'null' to clear the default value. |
SORTING_COLLECTION_SIZE_RATIO (Double) | This number, plus the maximum RAM available to the JVM, determine the memory footprint used by some of the sorting collections. If you are running out of memory, try reducing this number. Default value: 0.25. This option can be set to 'null' to clear the default value. |
BARCODE_TAG (String) | Barcode SAM tag (ex. BC for 10X Genomics) Default value: null. |
READ_ONE_BARCODE_TAG (String) | Read one barcode SAM tag (ex. BX for 10X Genomics) Default value: null. |
READ_TWO_BARCODE_TAG (String) | Read two barcode SAM tag (ex. BX for 10X Genomics) Default value: null. |
INPUT (String) | One or more input SAM or BAM files to analyze. Must be coordinate sorted. Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | The output file to write marked records to Required. |
METRICS_FILE (File) | File to write duplication metrics to Required. |
PROGRAM_RECORD_ID (String) | The program record ID for the @PG record(s) created by this program. Set to null to disable PG record creation. This string may have a suffix appended to avoid collision with other program record IDs. Default value: MarkDuplicates. This option can be set to 'null' to clear the default value. |
PROGRAM_GROUP_VERSION (String) | Value of VN tag of PG record to be created. If not specified, the version will be detected automatically. Default value: null. |
PROGRAM_GROUP_COMMAND_LINE (String) | Value of CL tag of PG record to be created. If not supplied the command line will be detected automatically. Default value: null. |
PROGRAM_GROUP_NAME (String) | Value of PN tag of PG record to be created. Default value: MarkDuplicates. This option can be set to 'null' to clear the default value. |
COMMENT (String) | Comment(s) to include in the output file's header. Default value: null. This option may be specified 0 or more times. |
REMOVE_DUPLICATES (Boolean) | If true do not write duplicates to the output file instead of writing them with appropriate flags set. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ASSUME_SORTED (Boolean) | If true, assume that the input file is coordinate sorted even if the header says otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
DUPLICATE_SCORING_STRATEGY (ScoringStrategy) | The scoring strategy for choosing the non-duplicate among candidates. Default value: SUM_OF_BASE_QUALITIES. This option can be set to 'null' to clear the default value. Possible values: {SUM_OF_BASE_QUALITIES, TOTAL_MAPPED_REFERENCE_LENGTH} |
READ_NAME_REGEX (String) | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on colon character. For 5 element names, the 3rd, 4th and 5th elements are assumed to be tile, x and y values. For 7 element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be tile, x and y values. Default value: [a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. This option can be set to 'null' to clear the default value. |
OPTICAL_DUPLICATE_PIXEL_DISTANCE (Integer) | The maximum offset between two duplicate clusters in order to consider them optical duplicates. This should usually be set to some fairly small number (e.g. 5-10 pixels) unless using later versions of the Illumina pipeline that multiply pixel values by 10, in which case 50-100 is more normal. Default value: 100. This option can be set to 'null' to clear the default value. |
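
The read-name parsing described for READ_NAME_REGEX, and the distance test behind OPTICAL_DUPLICATE_PIXEL_DISTANCE, can be sketched as below. The helper names and the example read names are hypothetical. Comparing the x and y offsets separately against the threshold is an assumption; the table only speaks of a "maximum offset" between clusters.

```python
import re

# The default pattern as listed in the table above.
DEFAULT_READ_NAME_REGEX = r"[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*"


def location_from_regex(read_name, pattern=DEFAULT_READ_NAME_REGEX):
    """Return (tile, x, y) from three capture groups; the whole name must match."""
    match = re.fullmatch(pattern, read_name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def location_from_colon_split(read_name):
    """Colon-split shortcut described for the default regex."""
    fields = read_name.split(":")
    if len(fields) == 5:
        picked = fields[2:5]   # 3rd, 4th and 5th elements
    elif len(fields) == 7:
        picked = fields[4:7]   # 5th, 6th and 7th elements (CASAVA 1.8)
    else:
        return None
    return tuple(int(field) for field in picked)


def within_optical_distance(a, b, max_distance=100):
    """Assumption: same tile, and x and y each differ by at most max_distance."""
    return (a[0] == b[0]
            and abs(a[1] - b[1]) <= max_distance
            and abs(a[2] - b[2]) <= max_distance)


first = location_from_regex("RUN1:3:1101:2000:3000")
second = location_from_colon_split("M01:42:FLOWCELL:1:1101:2000:3050")
print(first, second, within_optical_distance(first, second))
```
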
Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules. All records are then written to the output file with the duplicate records flagged.
Option | Description |
---|---|
MINIMUM_DISTANCE (Integer) | The minimum distance to buffer records to account for clipping on the 5' end of the records. Set this number to -1 to use twice the first read's read length (or 100, whichever is smaller). Default value: -1. This option can be set to 'null' to clear the default value. |
SKIP_PAIRS_WITH_NO_MATE_CIGAR (Boolean) | Skip record pairs with no mate cigar and include them in the output. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
BLOCK_SIZE (Integer) | The block size for use in the coordinate-sorted record buffer. Default value: 100000. This option can be set to 'null' to clear the default value. |
INPUT (String) | One or more input SAM or BAM files to analyze. Must be coordinate sorted. Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | The output file to write marked records to Required. |
METRICS_FILE (File) | File to write duplication metrics to Required. |
PROGRAM_RECORD_ID (String) | The program record ID for the @PG record(s) created by this program. Set to null to disable PG record creation. This string may have a suffix appended to avoid collision with other program record IDs. Default value: MarkDuplicates. This option can be set to 'null' to clear the default value. |
PROGRAM_GROUP_VERSION (String) | Value of VN tag of PG record to be created. If not specified, the version will be detected automatically. Default value: null. |
PROGRAM_GROUP_COMMAND_LINE (String) | Value of CL tag of PG record to be created. If not supplied the command line will be detected automatically. Default value: null. |
PROGRAM_GROUP_NAME (String) | Value of PN tag of PG record to be created. Default value: MarkDuplicatesWithMateCigar. This option can be set to 'null' to clear the default value. |
COMMENT (String) | Comment(s) to include in the output file's header. Default value: null. This option may be specified 0 or more times. |
REMOVE_DUPLICATES (Boolean) | If true do not write duplicates to the output file instead of writing them with appropriate flags set. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ASSUME_SORTED (Boolean) | If true, assume that the input file is coordinate sorted even if the header says otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
DUPLICATE_SCORING_STRATEGY (ScoringStrategy) | The scoring strategy for choosing the non-duplicate among candidates. Default value: TOTAL_MAPPED_REFERENCE_LENGTH. This option can be set to 'null' to clear the default value. Possible values: {SUM_OF_BASE_QUALITIES, TOTAL_MAPPED_REFERENCE_LENGTH} |
READ_NAME_REGEX (String) | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on colon character. For 5 element names, the 3rd, 4th and 5th elements are assumed to be tile, x and y values. For 7 element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be tile, x and y values. Default value: [a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. This option can be set to 'null' to clear the default value. |
OPTICAL_DUPLICATE_PIXEL_DISTANCE (Integer) | The maximum offset between two duplicate clusters in order to consider them optical duplicates. This should usually be set to some fairly small number (e.g. 5-10 pixels) unless using later versions of the Illumina pipeline that multiply pixel values by 10, in which case 50-100 is more normal. Default value: 100. This option can be set to 'null' to clear the default value. |
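
The MINIMUM_DISTANCE rule in the table above (a value of -1 means "twice the first read's length, or 100, whichever is smaller") amounts to a one-line calculation. The function name below is hypothetical and the sketch only restates the rule as documented.

```python
def effective_minimum_distance(minimum_distance, first_read_length):
    """Resolve MINIMUM_DISTANCE; -1 selects the read-length-based default."""
    if minimum_distance == -1:
        return min(2 * first_read_length, 100)
    return minimum_distance


print(effective_minimum_distance(-1, 36))    # short reads: twice the read length
print(effective_minimum_distance(-1, 101))   # longer reads: capped at 100
print(effective_minimum_distance(250, 101))  # an explicit value is used as given
```
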
Program to generate a data table and pdf chart of mean base quality by cycle from a SAM or BAM file. Works best on a single lane/run of data, but can be applied to merged BAMs. Uses R to generate chart output.
Option | Description |
---|---|
CHART_OUTPUT (File) | A file (with .pdf extension) to write the chart to. Required. |
ALIGNED_READS_ONLY (Boolean) | If set to true, calculate mean quality over aligned reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
PF_READS_ONLY (Boolean) | If set to true calculate mean quality over PF reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
Merges alignment data from a SAM or BAM file with additional data stored in an unmapped BAM file and produces a third SAM or BAM file of aligned and unaligned reads. The purpose is to use information from the unmapped BAM to fix up aligner output, so that the resulting file is valid for use by other Picard programs. For simple BAM file merges, use MergeSamFiles. NOTE that MergeBamAlignment expects to find a sequence dictionary in the same directory as REFERENCE_SEQUENCE and expects it to have the same base name as the reference fasta except with the extension '.dict'
Option | Description |
---|---|
UNMAPPED_BAM (File) | Original SAM or BAM file of unmapped reads, which must be in queryname order. Required. |
ALIGNED_BAM (File) | SAM or BAM file(s) with alignment data. Default value: null. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) READ1_ALIGNED_BAM (R1_ALIGNED) READ2_ALIGNED_BAM (R2_ALIGNED) |
READ1_ALIGNED_BAM (File) | SAM or BAM file(s) with alignment data from the first read of a pair. Default value: null. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) ALIGNED_BAM (ALIGNED) |
READ2_ALIGNED_BAM (File) | SAM or BAM file(s) with alignment data from the second read of a pair. Default value: null. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) ALIGNED_BAM (ALIGNED) |
OUTPUT (File) | Merged SAM or BAM file to write to. Required. |
REFERENCE_SEQUENCE (File) | Path to the fasta file for the reference sequence. Required. |
PROGRAM_RECORD_ID (String) | The program group ID of the aligner (if not supplied by the aligned file). Default value: null. |
PROGRAM_GROUP_VERSION (String) | The version of the program group (if not supplied by the aligned file). Default value: null. |
PROGRAM_GROUP_COMMAND_LINE (String) | The command line of the program group (if not supplied by the aligned file). Default value: null. |
PROGRAM_GROUP_NAME (String) | The name of the program group (if not supplied by the aligned file). Default value: null. |
PAIRED_RUN (Boolean) | This argument is ignored and will be removed. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
JUMP_SIZE (Integer) | The expected jump size (required if this is a jumping library). Deprecated. Use EXPECTED_ORIENTATIONS instead. Default value: null. Cannot be used in conjunction with option(s) EXPECTED_ORIENTATIONS (ORIENTATIONS) |
CLIP_ADAPTERS (Boolean) | Whether to clip adapters where identified. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
IS_BISULFITE_SEQUENCE (Boolean) | Whether the lane is bisulfite sequence (used when calculating the NM tag). Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ALIGNED_READS_ONLY (Boolean) | Whether to output only aligned reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MAX_INSERTIONS_OR_DELETIONS (Integer) | The maximum number of insertions or deletions permitted for an alignment to be included. Alignments with more than this many insertions or deletions will be ignored. Set to -1 to allow any number of insertions or deletions. Default value: 1. This option can be set to 'null' to clear the default value. |
ATTRIBUTES_TO_RETAIN (String) | Reserved alignment attributes (tags starting with X, Y, or Z) that should be brought over from the alignment data when merging. Default value: null. This option may be specified 0 or more times. |
ATTRIBUTES_TO_REMOVE (String) | Attributes from the alignment record that should be removed when merging. This overrides ATTRIBUTES_TO_RETAIN if they share common tags. Default value: null. This option may be specified 0 or more times. |
READ1_TRIM (Integer) | The number of bases trimmed from the beginning of read 1 prior to alignment Default value: 0. This option can be set to 'null' to clear the default value. |
READ2_TRIM (Integer) | The number of bases trimmed from the beginning of read 2 prior to alignment Default value: 0. This option can be set to 'null' to clear the default value. |
EXPECTED_ORIENTATIONS (PairOrientation) | The expected orientation of proper read pairs. Replaces JUMP_SIZE. Default value: null. Possible values: {FR, RF, TANDEM} This option may be specified 0 or more times. Cannot be used in conjunction with option(s) JUMP_SIZE (JUMP) |
ALIGNER_PROPER_PAIR_FLAGS (Boolean) | Use the aligner's idea of what a proper pair is rather than computing in this program. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
SORT_ORDER (SortOrder) | The order in which the merged reads should be output. Default value: coordinate. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate, duplicate} |
PRIMARY_ALIGNMENT_STRATEGY (PrimaryAlignmentStrategy) | Strategy for selecting primary alignment when the aligner has provided more than one alignment for a pair or fragment, and none are marked as primary, more than one is marked as primary, or the primary alignment is filtered out for some reason. BestMapq expects that multiple alignments will be correlated with HI tag, and prefers the pair of alignments with the largest MAPQ, in the absence of a primary selected by the aligner. EarliestFragment prefers the alignment which maps the earliest base in the read. Note that EarliestFragment may not be used for paired reads. BestEndMapq is appropriate for cases in which the aligner is not pair-aware, and does not output the HI tag. It simply picks the alignment for each end with the highest MAPQ, and makes those alignments primary, regardless of whether the two alignments make sense together. MostDistant is also for a non-pair-aware aligner, and picks the alignment pair with the largest insert size. If all alignments would be chimeric, it picks the alignments for each end with the best MAPQ. For all algorithms, ties are resolved arbitrarily. Default value: BestMapq. This option can be set to 'null' to clear the default value. Possible values: {BestMapq, EarliestFragment, BestEndMapq, MostDistant} |
CLIP_OVERLAPPING_READS (Boolean) | For paired reads, soft clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INCLUDE_SECONDARY_ALIGNMENTS (Boolean) | If false, do not write secondary alignments to output. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ADD_MATE_CIGAR (Boolean) | Adds the mate CIGAR tag (MC) if true, does not if false. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
UNMAP_CONTAMINANT_READS (Boolean) | Detect reads originating from foreign organisms (e.g. bacterial DNA in a non-bacterial sample), and unmap + label those reads accordingly. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MIN_UNCLIPPED_BASES (Integer) | If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases or else the read will be marked as contaminant. Default value: 32. This option can be set to 'null' to clear the default value. |
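
Of the PRIMARY_ALIGNMENT_STRATEGY values in the table above, BestEndMapq is the simplest to illustrate: each end is treated independently and the alignment with the highest MAPQ wins. The sketch below is a toy model, not Picard's code. The `Alignment` record and its fields are hypothetical, and keeping the first alignment seen on a tie is just one way of resolving ties "arbitrarily".

```python
from collections import namedtuple

# Hypothetical record for one candidate alignment of one end of a pair.
Alignment = namedtuple("Alignment", ["end", "contig", "position", "mapq"])


def best_end_mapq(alignments):
    """Pick, per end, the alignment with the highest MAPQ.

    The two winners are chosen independently, so they need not form a
    sensible pair. Ties keep the first alignment seen (an arbitrary choice).
    """
    best = {}
    for alignment in alignments:
        current = best.get(alignment.end)
        if current is None or alignment.mapq > current.mapq:
            best[alignment.end] = alignment
    return best


candidates = [
    Alignment(end=1, contig="chr1", position=1000, mapq=20),
    Alignment(end=1, contig="chr7", position=5000, mapq=37),
    Alignment(end=2, contig="chr1", position=1200, mapq=60),
    Alignment(end=2, contig="chr7", position=5200, mapq=10),
]
primary = best_end_mapq(candidates)
print(primary[1])  # end 1 goes to chr7
print(primary[2])  # end 2 goes to chr1, even though the pair is now split
```
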
Merges multiple SAM/BAM files into one file.
Option | Description |
---|---|
INPUT (File) | SAM or BAM input file Default value: null. This option must be specified at least 1 times. |
OUTPUT (File) | SAM or BAM file to write merged result to Required. |
SORT_ORDER (SortOrder) | Sort order of output file Default value: coordinate. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate, duplicate} |
ASSUME_SORTED (Boolean) | If true, assume that the input files are in the same sort order as the requested output sort order, even if their headers say otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MERGE_SEQUENCE_DICTIONARIES (Boolean) | Merge the sequence dictionaries Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
USE_THREADING (Boolean) | Option to create a background thread to encode, compress and write to disk the output file. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
COMMENT (String) | Comment(s) to include in the merged output file's header. Default value: null. This option may be specified 0 or more times. |
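
Because INPUT may be given more than once, a merge command simply repeats the option. The sketch below assembles such a command following the general invocation form given at the top of this page (`java jvm-args -jar picard.jar PicardCommandName OPTION1=value1 ...`). `picard_command` is a hypothetical helper, the BAM file names are placeholders, and the command is only built and printed, not executed.

```python
def picard_command(tool, options, jar="picard.jar", jvm_args=("-Xmx2g",)):
    """Build an argument list for a Picard tool.

    options is a list of (name, value) pairs so that an option such as
    INPUT can appear several times.
    """
    command = ["java", *jvm_args, "-jar", jar, tool]
    for name, value in options:
        command.append(f"{name}={value}")
    return command


command = picard_command(
    "MergeSamFiles",
    [
        ("INPUT", "lane1.bam"),   # placeholder file names
        ("INPUT", "lane2.bam"),
        ("OUTPUT", "merged.bam"),
        ("SORT_ORDER", "coordinate"),
    ],
)
print(" ".join(command))
```

The resulting list could be handed to `subprocess.run` in an environment where Java and the Picard jar are available.
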
Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.
Option | Description |
---|---|
INPUT (File) | VCF or BCF input files. File format is determined by file extension. Default value: null. This option must be specified at least 1 times. |
OUTPUT (File) | The merged VCF or BCF file. File format is determined by file extension. Required. |
SEQUENCE_DICTIONARY (File) | The index sequence dictionary to use instead of the sequence dictionary in the input file Default value: null. |
Takes any file that conforms to the fasta format and normalizes it so that all lines of sequence except the last line per named sequence are of the same length.
Option | Description |
---|---|
INPUT (File) | The input fasta file to normalize. Required. |
OUTPUT (File) | The output fasta file to write. Required. |
LINE_LENGTH (Integer) | The line length to be used for the output fasta file. Default value: 100. This option can be set to 'null' to clear the default value. |
TRUNCATE_SEQUENCE_NAMES_AT_WHITESPACE (Boolean) | Truncate sequence names at first whitespace. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
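
The normalization controlled by LINE_LENGTH and TRUNCATE_SEQUENCE_NAMES_AT_WHITESPACE in the table above can be sketched in a few lines. This is an in-memory illustration with a hypothetical `normalize_fasta` function, not the tool's implementation; skipping blank lines is an assumption.

```python
def normalize_fasta(lines, line_length=100, truncate_names_at_whitespace=False):
    """Re-wrap FASTA sequence lines so all but the last line of each
    named sequence have exactly line_length bases."""
    output = []
    chunks = []  # pieces of the sequence currently being collected

    def flush():
        sequence = "".join(chunks)
        for start in range(0, len(sequence), line_length):
            output.append(sequence[start:start + line_length])
        chunks.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            flush()  # finish the previous sequence before starting a new one
            name = line[1:]
            if truncate_names_at_whitespace:
                parts = name.split(None, 1)
                name = parts[0] if parts else ""
            output.append(">" + name)
        else:
            chunks.append(line)
    flush()
    return output


records = [">chr1 test contig", "ACGTAC", "GT", ">chr2", "AC"]
for line in normalize_fasta(records, line_length=4, truncate_names_at_whitespace=True):
    print(line)
```
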
Extracts one or more intervals described in an interval_list file from a given reference sequence and writes them out in FASTA format. Requires a fasta index file to be present.
Option | Description |
---|---|
INTERVAL_LIST (File) | Interval list describing intervals to be extracted from the reference sequence. Required. |
REFERENCE_SEQUENCE (File) | Reference sequence file. Required. |
OUTPUT (File) | Output fasta file. Required. |
LINE_LENGTH (Integer) | Maximum line length for sequence data. Default value: 80. This option can be set to 'null' to clear the default value. |
Program to chart quality score distributions in a SAM or BAM file.
Option | Description |
---|---|
CHART_OUTPUT (File) | A file (with .pdf extension) to write the chart to. Required. |
ALIGNED_READS_ONLY (Boolean) | If set to true calculate mean quality over aligned reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
PF_READS_ONLY (Boolean) | If set to true calculate mean quality over PF reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INCLUDE_NO_CALLS (Boolean) | If set to true, include quality for no-call bases in the distribution. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INPUT (File) | Input SAM or BAM file. Required. |
OUTPUT (File) | File to write the output to. Required. |
ASSUME_SORTED (Boolean) | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STOP_AFTER (Long) | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value. |
Not to be confused with SortSam which sorts a SAM or BAM file with a valid sequence dictionary, ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs. Reads mapped to contigs absent in the new reference are dropped. Runs substantially faster if the input is an indexed BAM file.
Option | Description |
---|---|
INPUT (File) | Input file (bam or sam) to extract reads from. Required. |
OUTPUT (File) | Output file (bam or sam) to write extracted reads to. Required. |
REFERENCE (File) | Reference sequence to reorder reads to match. A sequence dictionary corresponding to the reference fasta is required. Create one with CreateSequenceDictionary.jar. Required. |
ALLOW_INCOMPLETE_DICT_CONCORDANCE (Boolean) | If true, then allows only a partial overlap of the BAM contigs with the new reference sequence contigs. By default, this tool requires a corresponding contig in the new reference for each read contig Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ALLOW_CONTIG_LENGTH_DISCORDANCE (Boolean) | If true, then permits mapping from a read contig to a new reference contig with the same name but a different length. Highly dangerous, only use if you know what you are doing. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
Replace the SAMFileHeader in a SAM file with the given header. Validation is minimal. It is up to the user to ensure that all the elements referred to in the SAMRecords are present in the new header. Sort order of the two input files must be the same.
Option | Description |
---|---|
INPUT (File) | SAM file from which SAMRecords will be read. Required. |
HEADER (File) | SAM file from which SAMFileHeader will be read. Required. |
OUTPUT (File) | SAMFileHeader from HEADER file will be written to this file, followed by SAMRecords from INPUT file Required. |
Reverts SAM or BAM files to a previous state by removing certain types of information and/or substituting in the original quality scores when available.
Option | Description |
---|---|
INPUT (File) | The input SAM/BAM file to revert the state of. Required. |
OUTPUT (File) | The output SAM/BAM file to create. Required. |
SORT_ORDER (SortOrder) | The sort order to create the reverted output file with. Default value: queryname. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate, duplicate} |
RESTORE_ORIGINAL_QUALITIES (Boolean) | True to restore original qualities from the OQ field to the QUAL field if available. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
REMOVE_DUPLICATE_INFORMATION (Boolean) | Remove duplicate read flags from all reads. Note that if this is true and REMOVE_ALIGNMENT_INFORMATION==false, the output may have the unusual but sometimes desirable trait of having unmapped reads that are marked as duplicates. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
REMOVE_ALIGNMENT_INFORMATION (Boolean) | Remove all alignment information from the file. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ATTRIBUTE_TO_CLEAR (String) | When removing alignment information, the set of optional tags to remove. Default value: [NM, UQ, PG, MD, MQ, SA, MC, AS]. This option can be set to 'null' to clear the default value. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
SANITIZE (Boolean) | WARNING: This option is potentially destructive. If enabled will discard reads in order to produce a consistent output BAM. Reads discarded include (but are not limited to) paired reads with missing mates, duplicated records, records with mismatches in length of bases and qualities. This option can only be enabled if the output sort order is queryname and will always cause sorting to occur. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MAX_DISCARD_FRACTION (Double) | If SANITIZE=true and more than MAX_DISCARD_FRACTION of the reads are discarded due to sanitization, then the program will exit with an Exception instead of exiting cleanly. Output BAM will still be valid. Default value: 0.01. This option can be set to 'null' to clear the default value. |
SAMPLE_ALIAS (String) | The sample alias to use in the reverted output file. This will override the existing sample alias in the file and is used only if all the read groups in the input file have the same sample alias. Default value: null. |
LIBRARY_NAME (String) | The library name to use in the reverted output file. This will override the existing library name in the file and is used only if all the read groups in the input file have the same library name. Default value: null. |
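To make the interplay of RESTORE_ORIGINAL_QUALITIES, ATTRIBUTE_TO_CLEAR and MAX_DISCARD_FRACTION more concrete, here is a small sketch under stated assumptions. Reads are modeled as a quality string plus a dict of optional tags; the function names are hypothetical; and dropping the OQ tag after copying it into QUAL is an assumption of this sketch, not something the table states.

```python
# Default list taken from the ATTRIBUTE_TO_CLEAR row above.
DEFAULT_ATTRIBUTES_TO_CLEAR = ("NM", "UQ", "PG", "MD", "MQ", "SA", "MC", "AS")


def revert_read(qual, tags, restore_original_qualities=True,
                attributes_to_clear=DEFAULT_ATTRIBUTES_TO_CLEAR):
    """Return (qual, tags) after restoring OQ and clearing alignment tags."""
    tags = dict(tags)  # work on a copy so the caller's dict is untouched
    if restore_original_qualities and "OQ" in tags:
        qual = tags.pop("OQ")  # assumption: OQ is removed once restored
    for name in attributes_to_clear:
        tags.pop(name, None)
    return qual, tags


def check_discard_fraction(total_reads, discarded_reads, max_discard_fraction=0.01):
    """Raise if more than max_discard_fraction of the reads were discarded."""
    fraction = discarded_reads / total_reads if total_reads else 0.0
    if fraction > max_discard_fraction:
        raise RuntimeError(
            f"Discarded {fraction:.2%} of reads, above the limit of {max_discard_fraction:.2%}"
        )
    return fraction


qual, tags = revert_read("####", {"OQ": "IIII", "NM": 1, "RG": "rg1"})
print(qual, tags)
print(check_discard_fraction(1000, 5))
```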
Reverts the original base qualities and adds the mate cigar tag to read-group BAMs.
Option | Description |
---|---|
INPUT (File) | The input SAM/BAM file to revert the state of. Required. |
OUTPUT (File) | The output SAM/BAM file to create. Required. |
SORT_ORDER (SortOrder) | The sort order to create the reverted output file with. By default, the sort order will be the same as the input. Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate} |
RESTORE_ORIGINAL_QUALITIES (Boolean) | True to restore original qualities from the OQ field to the QUAL field if available. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MAX_RECORDS_TO_EXAMINE (Integer) | The maximum number of records to examine to determine if we can exit early and not output, given that there are no original base qualities (if we are to restore) and mate cigars exist. Set to 0 to never skip the file. Default value: 10000. This option can be set to 'null' to clear the default value. |
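The "mate cigar tag" mentioned above is the SAM `MC` tag, which holds the CIGAR string of a read's mate. A minimal sketch of adding it to a pair follows. The record layout (a dict with `cigar` and `tags`) and the function name are hypothetical, and treating a CIGAR of `*` as "mate has no alignment, so no MC tag" is an assumption of this sketch.

```python
def add_mate_cigar(read1, read2):
    """Set each read's MC tag to its mate's CIGAR string, in place."""
    for read, mate in ((read1, read2), (read2, read1)):
        if mate["cigar"] != "*":
            read["tags"]["MC"] = mate["cigar"]
        else:
            # No CIGAR on the mate: make sure no stale MC tag remains.
            read["tags"].pop("MC", None)


first = {"cigar": "50M", "tags": {}}
second = {"cigar": "10S40M", "tags": {}}
add_mate_cigar(first, second)
print(first["tags"], second["tags"])
```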
Convert a BAM file to a SAM file, or SAM to BAM. Input and output formats are determined by file extension.
Option | Description |
---|---|
INPUT (File) | The BAM or SAM file to parse. Required. |
OUTPUT (File) | The BAM or SAM output file. Required. |
Extracts read sequences and qualities from the input SAM/BAM file and writes them into the output file in Sanger fastq format. In the RC mode (default is True), if the read is aligned and the alignment is to the reverse strand on the genome, the read's sequence from the input SAM file will be reverse-complemented prior to writing it to fastq in order to correctly restore the original read sequence as it was generated by the sequencer.
Option | Description |
---|---|
INPUT (File) | Input SAM/BAM file to extract reads from. Required. |
FASTQ (File) | Output fastq file (single-end fastq or, if paired, first end of the pair fastq). Required. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG) |
SECOND_END_FASTQ (File) | Output fastq file (if paired, second end of the pair fastq). Default value: null. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG) |
UNPAIRED_FASTQ (File) | Output fastq file for unpaired reads; may only be provided in paired-fastq mode. Default value: null. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG) |
OUTPUT_PER_RG (Boolean) | Output a fastq file per read group (two fastq files per read group if the group is paired). Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} Cannot be used in conjunction with option(s) SECOND_END_FASTQ (F2) UNPAIRED_FASTQ (FU) FASTQ (F) |
RG_TAG (String) | The read group tag (PU or ID) to be used to output a fastq file per read group. Default value: PU. This option can be set to 'null' to clear the default value. |
OUTPUT_DIR (File) | Directory in which to output the fastq file(s). Used only when OUTPUT_PER_RG is true. Default value: null. |
RE_REVERSE (Boolean) | Re-reverse bases and qualities of reads with negative strand flag set before writing them to fastq. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INTERLEAVE (Boolean) | Will generate an interleaved fastq if paired; each line will have /1 or /2 to describe which end it came from. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INCLUDE_NON_PF_READS (Boolean) | Include non-PF reads from the SAM file into the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
CLIPPING_ATTRIBUTE (String) | The attribute that stores the position at which the SAM record should be clipped. Default value: null. |
CLIPPING_ACTION (String) | The action that should be taken with clipped reads: 'X' means the reads and qualities should be trimmed at the clipped position; 'N' means the bases should be changed to Ns in the clipped region; and any integer means that the base qualities should be set to that value in the clipped region. Default value: null. |
READ1_TRIM (Integer) | The number of bases to trim from the beginning of read 1. Default value: 0. This option can be set to 'null' to clear the default value. |
READ1_MAX_BASES_TO_WRITE (Integer) | The maximum number of bases to write from read 1 after trimming. If there are fewer than this many bases left after trimming, all will be written. If this value is null then all bases left after trimming will be written. Default value: null. |
READ2_TRIM (Integer) | The number of bases to trim from the beginning of read 2. Default value: 0. This option can be set to 'null' to clear the default value. |
READ2_MAX_BASES_TO_WRITE (Integer) | The maximum number of bases to write from read 2 after trimming. If there are fewer than this many bases left after trimming, all will be written. If this value is null then all bases left after trimming will be written. Default value: null. |
INCLUDE_NON_PRIMARY_ALIGNMENTS (Boolean) | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
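The effect of RE_REVERSE together with the READ1_TRIM / READ1_MAX_BASES_TO_WRITE style options can be sketched as follows. This is an illustration rather than SamToFastq's code: the function name is hypothetical, and applying the trim after the reverse-complement (so that bases are removed from the start of the read as sequenced) is an assumption. The negative-strand flag is bit 0x10 of the SAM FLAG field.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

REVERSE_STRAND = 0x10  # SAM flag bit: read aligned to the reverse strand


def fastq_bases_and_quals(seq, qual, flag, re_reverse=True, trim=0,
                          max_bases_to_write=None):
    """Return (bases, qualities) as they would be written to the fastq."""
    if re_reverse and flag & REVERSE_STRAND:
        # Restore the sequencer's orientation: reverse-complement the bases
        # and reverse the qualities so they stay aligned with the bases.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    seq, qual = seq[trim:], qual[trim:]
    if max_bases_to_write is not None:
        seq, qual = seq[:max_bases_to_write], qual[:max_bases_to_write]
    return seq, qual


print(fastq_bases_and_quals("AACG", "ABCD", flag=0x10))
print(fastq_bases_and_quals("AACG", "ABCD", flag=0, trim=1, max_bases_to_write=2))
```

If fewer bases than `max_bases_to_write` remain after trimming, the slice simply returns all of them, which matches the behavior described in the table.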
Sorts the input SAM or BAM. Input and output formats are determined by file extension.
Option | Description |
---|---|
INPUT (File) | The BAM or SAM file to sort. Required. |
OUTPUT (File) | The sorted BAM or SAM output file. Required. |
SORT_ORDER (SortOrder) | Sort order of output file. Required. Possible values: {unsorted, queryname, coordinate, duplicate} |
Sorts one or more VCF files according to the order of the contigs in the header/sequence dictionary and then by coordinate. Can accept an external sequence dictionary. If no external dictionary is supplied, multiple inputs' headers must have the same sequence dictionaries. Multiple inputs must have the same sample names (in order).
Option | Description |
---|---|
INPUT (File) | Input VCF(s) to be sorted. Multiple inputs must have the same sample names (in order). Default value: null. This option may be specified 0 or more times. |
OUTPUT (File) | Output VCF to be written. Required. |
SEQUENCE_DICTIONARY (File) | Default value: null. |
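Sorting "by contig order in the dictionary, then by coordinate" differs from a plain alphabetical sort (for example, `chr10` would sort before `chr2` alphabetically). A minimal sketch of the ordering, assuming records are `(contig, position)` tuples and the dictionary is a list of contig names; the function name is hypothetical:

```python
def sort_vcf_records(records, contig_order):
    """Sort (contig, position, ...) tuples by dictionary order, then position."""
    rank = {name: index for index, name in enumerate(contig_order)}
    # A contig missing from the dictionary raises KeyError here.
    return sorted(records, key=lambda record: (rank[record[0]], record[1]))


dictionary = ["chr1", "chr2", "chr10"]
records = [("chr10", 5), ("chr2", 7), ("chr1", 9), ("chr1", 3)]
print(sort_vcf_records(records, dictionary))
```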
Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.
Option | Description |
---|---|
INPUT (File) | Input VCF. Required. |
OUTPUT (File) | Output VCF to be written. Required. |
SEQUENCE_DICTIONARY (File) | A Sequence Dictionary (can be read from one of the following file types: SAM, BAM, VCF, BCF, Interval List, Fasta, or Dict). Required. |
Convert a VCF file to a BCF file, or BCF to VCF. Input and output formats are determined by file extension.
Option | Description |
---|---|
INPUT (File) | The BCF or VCF input file. The file format is determined by file extension. Required. |
OUTPUT (File) | The BCF or VCF output file. The file format is determined by file extension. Required. |
REQUIRE_INDEX (Boolean) | Fail if an index is not available for the input VCF/BCF. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
Reads a SAM or BAM file and rewrites it with new adapter-trimming tags, clearing any existing adapter-trimming tags (XT:i:). Only works for unaligned files in query-name order. Note: This is a utility program and will not be run in the pipeline.
Option | Description |
---|---|
INPUT (File) | Required. |
OUTPUT (File) | If output is not specified, just the metrics are generated. Default value: null. |
METRICS (File) | Histogram showing how many reads had each count of bases_clipped. Required. |
MIN_MATCH_BASES_SE (Integer) | The minimum number of bases to match over when clipping single-end reads. Default value: 12. This option can be set to 'null' to clear the default value. |
MIN_MATCH_BASES_PE (Integer) | The minimum number of bases to match over (per-read) when clipping paired-end reads. Default value: 6. This option can be set to 'null' to clear the default value. |
MAX_ERROR_RATE_SE (Double) | The maximum mismatch error rate to tolerate when clipping single-end reads. Default value: 0.1. This option can be set to 'null' to clear the default value. |
MAX_ERROR_RATE_PE (Double) | The maximum mismatch error rate to tolerate when clipping paired-end reads. Default value: 0.1. This option can be set to 'null' to clear the default value. |
PAIRED_RUN (Boolean) | DEPRECATED. Whether this is a paired-end run. No longer used. Default value: null. Possible values: {true, false} |
ADAPTERS (IlluminaAdapterPair) | Which adapter sequences to attempt to identify and clip. Default value: [INDEXED, DUAL_INDEXED, PAIRED_END]. This option can be set to 'null' to clear the default value. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list. |
FIVE_PRIME_ADAPTER (String) | For specifying adapters other than standard Illumina Default value: null. |
THREE_PRIME_ADAPTER (String) | For specifying adapters other than standard Illumina Default value: null. |
ADAPTER_TRUNCATION_LENGTH (Integer) | Adapters are truncated to this length to speed adapter matching. Set to a large number to effectively disable truncation. Default value: 30. This option can be set to 'null' to clear the default value. |
PRUNE_ADAPTER_LIST_AFTER_THIS_MANY_ADAPTERS_SEEN (Integer) | If looking for multiple adapter sequences, then after having seen this many adapters, shorten the list of sequences. Keep the adapters that were found most frequently in the input so far. Set to -1 if the input has a heterogeneous mix of adapters so shortening is undesirable. Default value: 100. This option can be set to 'null' to clear the default value. |
NUM_ADAPTERS_TO_KEEP (Integer) | If pruning the adapter list, keep only this many adapter sequences when pruning the list (plus any adapters that were tied with the adapters being kept). Default value: 1. This option can be set to 'null' to clear the default value. |
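How a minimum match length and a maximum error rate combine when looking for an adapter can be illustrated with a deliberately simplified search. This is not Picard's matching algorithm: the function name and the test sequences are hypothetical, only single-end style matching is shown, and the defaults are copied from MIN_MATCH_BASES_SE and MAX_ERROR_RATE_SE above.

```python
def find_adapter_start(read, adapter, min_match_bases=12, max_error_rate=0.1):
    """Return the 0-based index where the adapter appears to start, or None.

    At each candidate start, the tail of the read is compared with the head
    of the adapter. A hit needs at least `min_match_bases` compared bases and
    a mismatch fraction no greater than `max_error_rate`.
    """
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        if overlap == 0 or overlap < min_match_bases:
            continue
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches / overlap <= max_error_rate:
            return start
    return None


adapter = "ACGACGACGACG"            # made-up adapter sequence
read = "TTTTTTTTTT" + "ACGACGTCGACG"  # insert followed by adapter with one mismatch
start = find_adapter_start(read, adapter)
print(start, read[:start] if start is not None else read)
```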
Splits an input VCF or BCF file into two VCF files, one for indel records and one for SNPs. The headers of the two output files will be identical. An index file is created and a sequence dictionary is required by default.
Option | Description |
---|---|
INPUT (File) | The VCF or BCF input file. Required. |
SNP_OUTPUT (File) | The VCF or BCF file to which SNP records should be written. The file format is determined by file extension. Required. |
INDEL_OUTPUT (File) | The VCF or BCF file to which indel records should be written. The file format is determined by file extension. Required. |
SEQUENCE_DICTIONARY (File) | The index sequence dictionary to use instead of the sequence dictionaries in the input files. Default value: null. |
STRICT (Boolean) | If true, an exception will be thrown if an event type other than SNP or indel is encountered. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
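The routing of records into the SNP and indel outputs, and the role of STRICT, can be sketched with a simplified classifier. The rules below (allele lengths only, symbolic alleles treated as "other") are an assumption made for illustration and are coarser than a real VCF library's variant typing; the function names are hypothetical.

```python
def classify_variant(ref, alts):
    """Classify a record as 'SNP', 'INDEL' or 'OTHER' from its allele lengths."""
    if not alts or any(alt.startswith("<") or alt == "*" for alt in alts):
        return "OTHER"  # symbolic or spanning-deletion alleles
    if len(ref) == 1 and all(len(alt) == 1 for alt in alts):
        return "SNP"
    if all(len(alt) != len(ref) for alt in alts):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions or mixed records


def route_record(ref, alts, strict=True):
    """Return the output ('SNP' or 'INDEL') for a record, or None to skip it."""
    kind = classify_variant(ref, alts)
    if kind == "OTHER":
        if strict:
            raise ValueError(f"Unsupported event type for REF={ref} ALT={alts}")
        return None
    return kind


print(route_record("A", ["G"]))
print(route_record("AT", ["A"]))
print(route_record("AT", ["GC"], strict=False))
```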
Read a SAM or BAM file and report on its validity.
Option | Description |
---|---|
INPUT (File) | Input SAM/BAM file. Required. |
OUTPUT (File) | Output file or standard out if missing. Default value: null. |
MODE (Mode) | Mode of output. Default value: VERBOSE. This option can be set to 'null' to clear the default value. Possible values: {VERBOSE, SUMMARY} |
IGNORE (Type) | List of validation error types to ignore. Default value: null. Possible values: {INVALID_QUALITY_FORMAT, INVALID_FLAG_PROPER_PAIR, INVALID_FLAG_MATE_UNMAPPED, MISMATCH_FLAG_MATE_UNMAPPED, INVALID_FLAG_MATE_NEG_STRAND, MISMATCH_FLAG_MATE_NEG_STRAND, INVALID_FLAG_FIRST_OF_PAIR, INVALID_FLAG_SECOND_OF_PAIR, PAIRED_READ_NOT_MARKED_AS_FIRST_OR_SECOND, INVALID_FLAG_NOT_PRIM_ALIGNMENT, INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT, INVALID_FLAG_READ_UNMAPPED, INVALID_INSERT_SIZE, INVALID_MAPPING_QUALITY, INVALID_CIGAR, ADJACENT_INDEL_IN_CIGAR, INVALID_MATE_REF_INDEX, MISMATCH_MATE_REF_INDEX, INVALID_REFERENCE_INDEX, INVALID_ALIGNMENT_START, MISMATCH_MATE_ALIGNMENT_START, MATE_FIELD_MISMATCH, INVALID_TAG_NM, MISSING_TAG_NM, MISSING_HEADER, MISSING_SEQUENCE_DICTIONARY, MISSING_READ_GROUP, RECORD_OUT_OF_ORDER, READ_GROUP_NOT_FOUND, RECORD_MISSING_READ_GROUP, INVALID_INDEXING_BIN, MISSING_VERSION_NUMBER, INVALID_VERSION_NUMBER, TRUNCATED_FILE, MISMATCH_READ_LENGTH_AND_QUALS_LENGTH, EMPTY_READ, CIGAR_MAPS_OFF_REFERENCE, MISMATCH_READ_LENGTH_AND_E2_LENGTH, MISMATCH_READ_LENGTH_AND_U2_LENGTH, E2_BASE_EQUALS_PRIMARY_BASE, BAM_FILE_MISSING_TERMINATOR_BLOCK, UNRECOGNIZED_HEADER_TYPE, POORLY_FORMATTED_HEADER_TAG, HEADER_TAG_MULTIPLY_DEFINED, HEADER_RECORD_MISSING_REQUIRED_TAG, INVALID_DATE_STRING, TAG_VALUE_TOO_LARGE, INVALID_INDEX_FILE_POINTER, INVALID_PREDICTED_MEDIAN_INSERT_SIZE, DUPLICATE_READ_GROUP_ID, MISSING_PLATFORM_VALUE, INVALID_PLATFORM_VALUE, DUPLICATE_PROGRAM_GROUP_ID, MATE_NOT_FOUND, MATES_ARE_SAME_END, MISMATCH_MATE_CIGAR_STRING, MATE_CIGAR_STRING_INVALID_PRESENCE} This option may be specified 0 or more times. |
MAX_OUTPUT (Integer) | The maximum number of lines output in verbose mode. Default value: 100. This option can be set to 'null' to clear the default value. |
IGNORE_WARNINGS (Boolean) | If true, only report errors and ignore warnings. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
VALIDATE_INDEX (Boolean) | If true and input is a BAM file with an index file, also validates the index. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
IS_BISULFITE_SEQUENCED (Boolean) | Whether the SAM or BAM file consists of bisulfite sequenced reads. If so, C->T is not counted as an error in computing the value of the NM tag. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
MAX_OPEN_TEMP_FILES (Integer) | Relevant for a coordinate-sorted file containing read pairs only. Maximum number of file handles to keep open when spilling mate info to disk. Set this number a little lower than the per-process maximum number of files that may be open. This number can be found by executing the 'ulimit -n' command on a Unix system. Default value: 8000. This option can be set to 'null' to clear the default value. |
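The relationship between MODE, IGNORE and MAX_OUTPUT can be sketched as a small reporting function. Everything about the output layout here is hypothetical and is not the tool's actual report format. The sketch also assumes that ignored types are filtered out before the MAX_OUTPUT limit is applied.

```python
from collections import Counter


def build_report(findings, mode="VERBOSE", ignore=(), max_output=100):
    """Turn (error_type, message) pairs into report lines.

    VERBOSE lists individual findings up to `max_output` lines;
    SUMMARY lists one count per error type.
    """
    kept = [(kind, message) for kind, message in findings if kind not in ignore]
    if mode == "SUMMARY":
        counts = Counter(kind for kind, _ in kept)
        return [f"{kind}\t{count}" for kind, count in sorted(counts.items())]
    return [f"{kind}: {message}" for kind, message in kept[:max_output]]


findings = [
    ("MISSING_TAG_NM", "read r1"),
    ("INVALID_CIGAR", "read r2"),
    ("MISSING_TAG_NM", "read r3"),
]
print(build_report(findings, mode="SUMMARY"))
print(build_report(findings, ignore=("MISSING_TAG_NM",)))
```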
Prints a SAM or BAM file to the screen.
Option | Description |
---|---|
INPUT (String) | The SAM or BAM file or GA4GH url to view. Required. |
ALIGNMENT_STATUS (AlignmentStatus) | Print out all reads, just the aligned reads or just the unaligned reads. Default value: All. This option can be set to 'null' to clear the default value. Possible values: {Aligned, Unaligned, All} |
PF_STATUS (PfStatus) | Print out all reads, just the PF reads or just the non-PF reads. Default value: All. This option can be set to 'null' to clear the default value. Possible values: {PF, NonPF, All} |
HEADER_ONLY (Boolean) | Print the SAM header only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
RECORDS_ONLY (Boolean) | Print the alignment records only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
INTERVAL_LIST (File) | An intervals file used to restrict what records are output. Default value: null. |
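ALIGNMENT_STATUS and PF_STATUS both come down to bits in the SAM FLAG field: 0x4 marks an unmapped read and 0x200 marks a read that failed quality controls (non-PF). A minimal sketch of the filtering decision, with a hypothetical function name and the option values taken from the table above:

```python
READ_UNMAPPED = 0x4    # SAM flag bit: segment unmapped
READ_FAILS_QC = 0x200  # SAM flag bit: not passing quality controls


def keep_read(flag, alignment_status="All", pf_status="All"):
    """Return True if a read with this FLAG passes both filters."""
    aligned = not (flag & READ_UNMAPPED)
    passes_filter = not (flag & READ_FAILS_QC)
    if alignment_status == "Aligned" and not aligned:
        return False
    if alignment_status == "Unaligned" and aligned:
        return False
    if pf_status == "PF" and not passes_filter:
        return False
    if pf_status == "NonPF" and passes_filter:
        return False
    return True


flags = [0, 4, 512, 516]
print([f for f in flags if keep_read(f, alignment_status="Aligned", pf_status="PF")])
```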
Converts a VCF or BCF file to a Picard Interval List.
Option | Description |
---|---|
INPUT (File) | The BCF or VCF input file. The file format is determined by file extension. Required. |
OUTPUT (File) | The output Picard Interval List. Required. |
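As a rough picture of the conversion, each variant record becomes one interval line. The sketch below rests on two assumptions: that an interval spans the record's REF allele (VCF positions are 1-based, so the end is `POS + len(REF) - 1`), and that an interval line consists of tab-separated contig, start, end, strand and name. The function name is hypothetical, and the header that a real interval list carries is left out.

```python
def vcf_record_to_interval(chrom, pos, ref, name="."):
    """Format one VCF record as a tab-separated interval line (1-based, inclusive)."""
    end = pos + len(ref) - 1
    return f"{chrom}\t{pos}\t{end}\t+\t{name}"


print(vcf_record_to_interval("chr1", 100, "ACG", "rs1"))
print(vcf_record_to_interval("chr2", 42, "T"))
```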