Methods

Compiling the data

Microarray sample files (GSM files) were downloaded from the NCBI GEO database. Individual GSM files were assigned to GSE series, log-scaled values were converted to a linear scale, and low-level responders were dropped. EF profiles were then generated based on the ratio of each individual condition to the average across the series. Expression data from five Affymetrix GeneChip platforms corresponding to three species were collected. These were: all samples from two Human array platforms, the Human Genome U133 Array Set HG-U133A (GPL96) and the U133 Plus 2.0 Array (GPL570); all samples from the Mouse Genome 430 2.0 Array (GPL1261) chip; and all samples from two Rat chips, the Rat Genome U34 Array (GPL85) and the Rat Genome 230 2.0 Array (GPL1355). The database thus totals 106,101 samples.

Of course, this can always be extended to include more platforms from the same species and/or other species.

Non-redundant database

The individual GSM sample file expression values were transformed into EF values corresponding to the expression relative to the series mean. Expression values that had been logarithmically transformed were transformed back to a linear scale, and low expression values were dropped; that is, they were set to zero and do not contribute to the fold profile. We found that the results were relatively insensitive to the cut-off value, and we set it to 10% of the average expression value. All sample expression profiles within a series were scaled to the [...], where s_k is the expression level of the kth probe set. For the database to be searchable with cross-platform response profiles and gene lists, it has to be rewritten as a database of expression profiles over non-redundant gene lists.
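The transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `fold_profiles`, the base-2 logarithm, the choice of each sample's own mean as the "average expression value" for the 10% cut-off, and computing the series mean before low values are dropped are all assumptions that the text does not settle.

```python
from statistics import mean


def fold_profiles(series, log2_scaled=False, cutoff_fraction=0.10):
    """Turn the samples of one series into fold profiles.

    series: dict mapping a sample id to a list of probe-set expression
            values, with every list in the same probe-set order.
    """
    # Undo a logarithmic transform (base 2 is an assumption).
    linear = {
        sample_id: [2.0 ** v for v in values] if log2_scaled else list(values)
        for sample_id, values in series.items()
    }

    # Mean of each probe set across all samples of the series.
    n_probes = len(next(iter(linear.values())))
    series_mean = [
        mean(values[k] for values in linear.values()) for k in range(n_probes)
    ]

    profiles = {}
    for sample_id, values in linear.items():
        # Assumed cut-off: 10% of this sample's average expression value.
        threshold = cutoff_fraction * mean(values)
        profile = []
        for k, value in enumerate(values):
            if value < threshold or series_mean[k] <= 0:
                # Dropped values are set to zero, as described in the text.
                profile.append(0.0)
            else:
                # Expression relative to the series mean.
                profile.append(value / series_mean[k])
        profiles[sample_id] = profile
    return profiles


# Hypothetical two-sample series with three probe sets.
example = {"GSM_a": [100.0, 200.0, 1.0], "GSM_b": [300.0, 200.0, 3.0]}
print(fold_profiles(example))
```

In this toy series the third probe set falls below the cut-off in both samples, so it is zeroed in both profiles while the other two probe sets are expressed as ratios to the series mean.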

The EF profiles across the probe sets were therefore mapped onto expression profiles for a non-redundant gene list. In general, each gene is represented by multiple probe sets. For each platform we generated the EF statistics for each probe set across the totality of samples. The probe set with the most robust response across the samples was chosen to represent the gene; explicitly, this was the probe set with the highest root mean square deviation from zero. The number of genes defined on each platform was as follows: [...] chip with 6,341 genes. The database totals 106,101 samples and is searchable on a reasonably fast desktop PC in 10 minutes per query.
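The probe set selection rule can be illustrated with a minimal sketch. The function names, the probe set identifiers, and the probe-to-gene mapping below are hypothetical; the only part taken from the text is the criterion itself, namely keeping, for each gene, the probe set whose EF values have the largest root mean square deviation from zero across samples.

```python
from math import sqrt


def rms_from_zero(values):
    """Root mean square deviation of a list of values from zero."""
    return sqrt(sum(v * v for v in values) / len(values))


def representative_probes(ef_by_probe, probe_to_gene):
    """Pick one probe set per gene: the one with the highest RMS deviation.

    ef_by_probe:   dict mapping probe set id -> EF values across all samples
    probe_to_gene: dict mapping probe set id -> gene symbol (hypothetical
                   annotation; unannotated probe sets are skipped)
    """
    best = {}  # gene -> (probe set id, RMS score)
    for probe, values in ef_by_probe.items():
        gene = probe_to_gene.get(probe)
        if gene is None or not values:
            continue
        score = rms_from_zero(values)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe, score)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical data: two probe sets for GENE1, one for GENE2.
ef = {"probe_1": [0.1, -0.1], "probe_2": [1.0, -2.0], "probe_3": [0.5, 0.5]}
annotation = {"probe_1": "GENE1", "probe_2": "GENE1", "probe_3": "GENE2"}
print(representative_probes(ef, annotation))
```

Because the selection is made once per platform over all samples, the resulting gene-to-probe-set map can be stored and reused when the database is rewritten over the non-redundant gene list.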

Searching the database

The query profile is a statistically thresholded non-redundant list of genes and associated fold values. Statistical significance is assigned to a fold change based on a simple Student's t-test between multiple control and treatment sample expression values. The query profile is compared to each profile in the database by means of a simple Pearson regression analysis, with a correlation coefficient r.
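A minimal sketch of the comparison step follows, assuming that both the query and each database profile are stored as mappings from gene symbol to fold value and that the correlation is computed over the genes they share. The function names, the data layout, and the ranking by descending r are assumptions for illustration; the statistical thresholding of the query by t-test is not shown.

```python
from math import sqrt


def pearson_r(query, profile):
    """Pearson correlation over the genes present in both mappings.

    Returns None when fewer than two genes are shared or when either
    set of values has zero variance.
    """
    genes = [g for g in query if g in profile]
    n = len(genes)
    if n < 2:
        return None
    x = [query[g] for g in genes]
    y = [profile[g] for g in genes]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def search(query, database):
    """Score every database profile against the query, best match first."""
    hits = []
    for sample_id, profile in database.items():
        r = pearson_r(query, profile)
        if r is not None:
            hits.append((sample_id, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical query and three database profiles.
query = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0}
database = {
    "sample_1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0},
    "sample_2": {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 1.0},
    "sample_3": {"GENE_A": 1.0},
}
for sample_id, r in search(query, database):
    print(sample_id, round(r, 3))
```

Profiles that share too few genes with the query are skipped rather than scored, which keeps cross-platform searches from reporting correlations based on a single gene.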
