VirusDetect with own genome


This tool runs the VirusDetect pipeline, which performs virus identification using sRNA sequencing data. Given a FASTQ file, it performs de novo assembly and reference-guided assembly by aligning sRNA reads to the reference database of known viruses. The assembled contigs are then compared to the reference virus sequences for virus identification.

A more detailed description of the VirusDetect pipeline can be found on the VirusDetect home page.

If possible, sequences originating from the genome of the host organism should be removed from the input data. This can be done by mapping the query sequences against the host genome and selecting only those reads that do not match it.
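
The selection step described above can be sketched in Python. This is only an illustration under stated assumptions, not the code that Chipster or VirusDetect runs: it assumes the reads have already been aligned to the host genome and written as single-end SAM text, and the function name unmapped_reads_to_fastq and the example records are hypothetical. It keeps only the reads whose SAM flag has the 0x4 (unmapped) bit set.

```python
UNMAPPED = 0x4  # SAM flag bit: the read did not align to the reference


def unmapped_reads_to_fastq(sam_lines):
    """Yield FASTQ records for reads that did not map to the host genome."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Made-up example: read1 maps to the host, read2 does not.
example_sam = [
    "@SQ\tSN:host_chr1\tLN:1000\n",
    "read1\t0\thost_chr1\t10\t37\t5M\t*\t0\t0\tACGTA\tIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII\n",
]
filtered = list(unmapped_reads_to_fastq(example_sam))
```

In this example only read2 is kept, because it is the only record flagged as unmapped.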

If the host genome is not available, or if it is available in Chipster, you should use the VirusDetect tool instead of this tool.

If the genome of the host organism is not available in Chipster, but you have the host genome as a FASTA-formatted sequence file, you can use this tool to automatically calculate the required BWA indexes and filter host sequences from your input data. Note that calculating BWA indexes for a large genome may take several hours.
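
For readers who want to see what the indexing step amounts to outside Chipster, the following is a minimal sketch that wraps the bwa index command. The helper names bwa_index_command and build_index are hypothetical, the sketch assumes that bwa is installed and on PATH, and it does not claim to reproduce the exact options this tool uses.

```python
import shutil
import subprocess


def bwa_index_command(genome_fasta, prefix=None):
    """Build the argument list for `bwa index`."""
    cmd = ["bwa", "index"]
    if prefix is not None:
        cmd += ["-p", prefix]  # -p sets the prefix of the index files
    cmd.append(genome_fasta)
    return cmd


def build_index(genome_fasta, prefix=None):
    """Run `bwa index`; for a large genome this can take several hours."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa was not found on PATH")
    subprocess.run(bwa_index_command(genome_fasta, prefix), check=True)
```

Calling build_index("host_genome.fa") would write the index files next to the FASTA file; the file name here is only a placeholder.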

When the VirusDetect analysis is finished, the BWA indexes of the host genome are returned as a single tar-formatted archive file. This archive can be used as the input file, instead of the FASTA-formatted genome, for subsequent VirusDetect and BWA jobs, so that you don't have to repeat the time-consuming indexing process.
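
Because the host genome can be supplied either as a FASTA file or as a tar archive of indexes, it can be useful to check which kind of file you have before submitting a job. The sketch below shows one way to do this with Python's standard library. The function name host_input_kind is hypothetical, and the check is an assumption about how the two file types could be told apart, not a description of what Chipster does internally.

```python
import tarfile


def host_input_kind(path):
    """Return 'tar' for a tar archive, 'fasta' for a FASTA file, else 'unknown'."""
    if tarfile.is_tarfile(path):
        return "tar"
    # A FASTA file starts with a '>' header line.
    with open(path, "rb") as handle:
        first = handle.read(1)
    return "fasta" if first == b">" else "unknown"
```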

Input Files

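This tool requires two input files: a FASTQ file containing the sRNA reads, and the host genome, given either as a FASTA-formatted sequence file or as a tar-formatted archive of BWA indexes returned by an earlier run of this tool.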

VirusDetect produces a large number of report files. The output-related options select which data is returned.

By default, VirusDetect returns the following files:

If the parameter Return matching reference sequences is turned on, the following files are also returned:

Note: If you select both the blastn_matching_references.fa + .fai and blastn_matches.bam + .bai files (or the corresponding blastx files), you can use the Genome Browser to visualize the BLAST results. In the Genome Browser, blastn_matching_references.fa should be assigned as the genome. Each reference virus sequence is then listed in the Chromosome pull-down menu.
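
The .fai file that accompanies blastn_matching_references.fa is a plain-text, tab-separated FASTA index in which each line starts with a sequence name followed by its length. The following sketch lists those names, which correspond to the entries you would expect to see as chromosomes in the Genome Browser. The function name read_fai and the example lines are made up for illustration.

```python
def read_fai(fai_lines):
    """Return (name, length) pairs from the lines of a .fai FASTA index."""
    sequences = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        sequences.append((fields[0], int(fields[1])))
    return sequences


# Made-up index lines: name, length, offset, bases per line, bytes per line.
example_fai = [
    "ref_virus_1\t6400\t13\t60\t61\n",
    "ref_virus_2\t2750\t6533\t60\t61\n",
]
references = read_fai(example_fai)
```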

If the parameter Return results in one archive file is selected, all the output files are stored in a single tar-formatted output file. This feature is useful if you run VirusDetect on several input files at the same time. The tar-formatted output file can be expanded with the tool Extract .tar.gz file.
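
If you download the archive and want to check its contents outside Chipster, a short Python sketch such as the one below can list the files it holds. The function name list_archive is hypothetical, and the sketch makes no assumption about which file names the archive contains.

```python
import tarfile


def list_archive(path):
    """Return the sorted names of the regular files stored in a tar archive."""
    # "r:*" lets tarfile detect gzip, bzip2 or xz compression transparently.
    with tarfile.open(path, "r:*") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```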