Using the chromosomal locations of the data points in a copy number data set, this tool converts the data from probe/bin-based to gene-based. The probes/bins used to determine the copy number call for a given gene are chosen as follows. If any probes/bins overlap the position of the gene, they are used. If none overlap, the last preceding and the first following probe/bin are used.
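The selection rule can be illustrated with a minimal sketch. It is not the tool's actual implementation. The function name select_features and the (start, end, call) tuple layout are hypothetical, and all features are assumed to lie on the same chromosome as the gene.

```python
def select_features(gene_start, gene_end, features):
    """Pick the probes/bins used for one gene.

    features: list of (start, end, call) tuples on the gene's chromosome
    (a hypothetical layout chosen for this sketch).
    """
    # Rule 1: use every feature that overlaps the gene.
    overlapping = [f for f in features
                   if f[0] <= gene_end and f[1] >= gene_start]
    if overlapping:
        return overlapping

    # Rule 2: no overlap, so fall back to the flanking features.
    preceding = [f for f in features if f[1] < gene_start]
    following = [f for f in features if f[0] > gene_end]
    flanking = []
    if preceding:
        # Last preceding feature: the one ending closest to the gene start.
        flanking.append(max(preceding, key=lambda f: f[1]))
    if following:
        # First following feature: the one starting closest to the gene end.
        flanking.append(min(following, key=lambda f: f[0]))
    return flanking


bins = [(100, 200, "loss"), (300, 400, "loss"), (900, 1000, "gain")]
print(select_features(150, 350, bins))  # gene overlaps the first two bins
print(select_features(500, 800, bins))  # gene falls in a gap between bins
```

A gene near the end of a chromosome may have only one flanking feature; in this sketch it is then used alone.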
The list of human genes is retrieved from Bioconductor, and the input data are converted from probe/bin-based to gene-based. For a given gene, the features (probes/bins) to use are determined as described in the CanGEM paper (Scheinin et al., 2008). Briefly, if any features overlap the position of the gene, those are used. If not, the last preceding feature and the first following feature are used. With the majority method, the gene is marked as a loss (or a gain) if more than 50% of the selected probes/bins show a loss (or a gain). With the unambiguous method, all of the selected probes/bins must give the same signal; otherwise the gene is marked as having normal copy number.
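The two methods for combining the selected probes/bins into one gene-level call can be sketched as follows. This is an illustration only, not the tool's code. The function name gene_call and the coding of calls as -1 (loss), 0 (normal) and 1 (gain) are assumptions made for the example.

```python
def gene_call(calls, method="majority"):
    """Combine probe/bin calls into a single call for one gene.

    calls: list of ints, assumed coding -1 = loss, 0 = normal, 1 = gain.
    """
    if not calls:
        return 0  # no features available: treated as normal in this sketch

    if method == "majority":
        # Loss (or gain) only if strictly more than 50% of features agree.
        for state in (-1, 1):
            if sum(c == state for c in calls) > len(calls) / 2:
                return state
        return 0

    if method == "unambiguous":
        # All features must give the same signal, otherwise normal.
        first = calls[0]
        return first if all(c == first for c in calls) else 0

    raise ValueError("method must be 'majority' or 'unambiguous'")


print(gene_call([-1, -1, 0], "majority"))     # two of three bins show a loss
print(gene_call([-1, -1, 0], "unambiguous"))  # bins disagree
```

Note that with exactly two features giving different signals, neither reaches more than 50%, so both methods report normal copy number.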
The input data set converted from probe/bin-based to gene-based measurements.
Scheinin et al. (2008) CanGEM: mining gene copy number changes in cancer. Nucleic Acids Res 36: D830-D835
Flicek et al. (2010) Ensembl's 10th year. Nucleic Acids Res 38: D557-D562