Copy number aberrations / Import from CanGEM

Description

Load a microarray data set from the CanGEM database, perform background correction and normalization, and append the chromosomal locations of the microarray probes.

Parameters

Accession number (...) [empty]
Username (...) [empty]
Password (...) [empty]
Session ID (...) [empty]
Agilent 2-color filtering (yes, no) [yes]
Background treatment (none, subtract, normexp, rma) [normexp]
Background offset (0, 50) [50]
Intra-array normalization (none, median, loess) [loess]
Inter-array normalization (none, quantile, scale) [none]
Affymetrix normalization (gcrma, rma, mas5)
Human genome build to use (GRCh37/hg19, NCBI36/hg18, NCBI35/hg17, NCBI34/hg16) [GRCh37/hg19]

Details

Imports data from the CanGEM database. The given accession number can be that of an experiment, a series, a data set, or a single microarray result. The original data files are downloaded, preprocessed, and combined with chromosomal position information for the probes.

For password-protected data, there are two ways to authenticate. The first is to use your username and password. The downside is that these will be saved in Chipster session and/or workflow files. To avoid this, a second approach is provided: log in on the CanGEM web page, find the session ID in the lower-right corner of the page, copy it to the clipboard (do not include the text "session ID: ", only the 32-character string after it), and paste it into the Session ID parameter of this tool. Chipster can then access the files from the database for as long as you stay logged in in your web browser. After you log out, or the session expires (after 24 minutes), Chipster can no longer access password-protected data in CanGEM.
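The copy-and-paste step is easy to get wrong, so here is a minimal Python sketch of how a pasted value could be checked before it is entered as the Session ID parameter. The function name clean_session_id is hypothetical and not part of Chipster or CanGEM. The only facts taken from the text above are the "session ID: " label and the 32-character length; which characters the ID may contain is not stated, so the sketch only rejects whitespace.

```python
PREFIX = "session ID:"        # label shown on the CanGEM page, per the text above
SESSION_ID_LENGTH = 32        # length stated in the text above


def clean_session_id(pasted: str) -> str:
    """Return the bare session ID from text copied off the CanGEM page.

    Hypothetical helper: strips the label if it was copied along with
    the ID and checks that exactly 32 non-whitespace characters remain.
    """
    text = pasted.strip()
    # Drop the label if present (compared case-insensitively).
    if text.lower().startswith(PREFIX.lower()):
        text = text[len(PREFIX):].strip()
    if len(text) != SESSION_ID_LENGTH or any(ch.isspace() for ch in text):
        raise ValueError(
            f"expected a {SESSION_ID_LENGTH}-character session ID, "
            f"got {len(text)} characters"
        )
    return text
```

A value that passes this check can still be rejected by CanGEM if you have logged out or the session has expired.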

For normalization, the settings are similar to those in the separate Affymetrix and Agilent 1- and 2-color normalization tools, which can be consulted for more details. The only difference is that, because Agilent miRNA arrays can contain multiple probes for a single miRNA, these measurements are summarized with the RMA algorithm to yield a single value per miRNA. The recommended settings for aCGH arrays are background treatment normexp, background offset 50, intra-array normalization loess, and inter-array normalization none; for miRNA arrays they are none, 0, none, and quantile, in the same order.
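As a quick reference, the recommended values can be written out per parameter. The sketch below is only an illustration in Python: the dictionary keys and the function name recommended_settings are made up for this example and are not Chipster parameter identifiers. The values themselves are the ones recommended above, matched to the parameters in the order they are listed in the Parameters section.

```python
# Recommended settings from the text above, keyed by illustrative names
# (hypothetical keys, not actual Chipster parameter identifiers).
RECOMMENDED = {
    "aCGH": {
        "background_treatment": "normexp",
        "background_offset": 50,
        "intra_array_normalization": "loess",
        "inter_array_normalization": "none",
    },
    "miRNA": {
        "background_treatment": "none",
        "background_offset": 0,
        "intra_array_normalization": "none",
        "inter_array_normalization": "quantile",
    },
}


def recommended_settings(array_type: str) -> dict:
    """Return a copy of the recommended settings for 'aCGH' or 'miRNA'."""
    if array_type not in RECOMMENDED:
        raise KeyError(f"no recommendation for array type {array_type!r}")
    return dict(RECOMMENDED[array_type])
```

Note that the aCGH recommendation matches the tool's default parameter values, whereas all four parameters need to be changed for miRNA arrays.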

The optional filtering step for Agilent 2-color arrays requires the following fields to be 0 (for both red and green channels): ControlType, IsBGNonUnifOL, IsBGPopnOL, IsFeatNonUnifOL, IsFeatPopnOL, IsManualFlag, IsSaturated and SurrogateUsed, and the following to equal 1 (for both channels): IsFound, IsPosAndSignif and IsWellAboveBG.
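The filtering rule can be sketched as a simple predicate. This is a minimal Python illustration, not the tool's actual implementation. It assumes a hypothetical in-memory representation in which each feature is a dictionary with a "red" and a "green" entry, each mapping the field names listed above to integer values; the real column layout of the input files may differ.

```python
# Field names taken from the text above.
MUST_BE_ZERO = (
    "ControlType", "IsBGNonUnifOL", "IsBGPopnOL", "IsFeatNonUnifOL",
    "IsFeatPopnOL", "IsManualFlag", "IsSaturated", "SurrogateUsed",
)
MUST_BE_ONE = ("IsFound", "IsPosAndSignif", "IsWellAboveBG")
CHANNELS = ("red", "green")  # assumed keys of the hypothetical feature dict


def passes_filter(feature: dict) -> bool:
    """Return True if a feature meets the criteria in both channels."""
    for channel in CHANNELS:
        flags = feature[channel]
        # Every quality flag in this group must be 0.
        if any(flags[name] != 0 for name in MUST_BE_ZERO):
            return False
        # Every detection flag in this group must be 1.
        if any(flags[name] != 1 for name in MUST_BE_ONE):
            return False
    return True
```

A feature is kept only if every condition holds in both channels, so a single flagged value in either channel is enough to remove it.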

The normalized data is automatically paired with chromosomal positions for the probes, according to the selected build of the human genome. The positions have been obtained through a megablast analysis of the probe sequences as described in the referenced paper, except for Agilent miRNA arrays where the positions of the miRNA genes have been retrieved from Ensembl. For Affymetrix arrays, the sequences used are the target sequences used to design individual probes for a probe set.
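Conceptually, the pairing step is a lookup of each probe in a position table for the selected genome build. The Python sketch below illustrates the idea under stated assumptions: the function name annotate, the dictionary layouts, and the output column names are all hypothetical, and keeping probes without a known position (with empty position fields) is an assumption rather than documented behavior of the tool.

```python
def annotate(normalized: dict, positions: dict) -> list:
    """Pair normalized values with probe positions.

    normalized: probe ID -> normalized value
    positions:  probe ID -> (chromosome, start, end) for one genome build
    Probes missing from `positions` are kept with None in the position
    fields (an assumption made for this sketch).
    """
    rows = []
    for probe_id, value in normalized.items():
        chrom, start, end = positions.get(probe_id, (None, None, None))
        rows.append({
            "probe": probe_id,
            "chromosome": chrom,
            "start": start,
            "end": end,
            "value": value,
        })
    return rows
```

For example, with made-up probe IDs and coordinates, annotate({"probe_1": 0.25}, {"probe_1": ("1", 1000, 1059)}) returns one row that carries both the value and the position of probe_1.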

Output

A normalized data set and a phenodata file.

References

CanGEM:
Scheinin et al. (2008) CanGEM: mining gene copy number changes in cancer. Nucleic Acids Res 36: D830

For Affymetrix and Agilent normalization references, please see the respective tools.