Markus Eszlinger1, Małgorzata Wiench1, Barbara Jarząb, Knut Krohn, Martin Beck, Jürgen Läuter, Elżbieta Gubała, Krzysztof Fujarewicz, Andrzej Świerniak and Ralf Paschke
III. Medical Department (M.E., K.K., R.P.), University of Leipzig, 04103 Leipzig, Germany; Department of Nuclear Medicine and Endocrine Oncology (M.W., B.J., E.G.), Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice Branch, 44-100 Gliwice, Poland; Interdisciplinary Center for Clinical Research Leipzig (K.K.), 04103 Leipzig, Germany; Interdisciplinary Center for Bioinformatics Leipzig (M.B.), 04107 Leipzig, Germany; Institute of Biometrics and Medical Informatics (J.L.), University of Magdeburg, 39114 Magdeburg, Germany; and Institute of Automatic Control (K.F., A.S.), Silesian University of Technology, 44-100 Gliwice, Poland
Address all correspondence and requests for reprints to: Ralf Paschke, M.D., III. Medical Department, University of Leipzig, Philipp-Rosenthal-Strasse 27, D-04103 Leipzig, Germany. E-mail: pasr{at}medizin.uni-leipzig.de.
| Abstract |
|---|
Objective: The ability to compare data sets derived from different Affymetrix GeneChip generations and the influence of intra- and interindividual comparisons of gene expression data were evaluated to build multigene classifiers of benign thyroid nodules, to verify a previously proposed papillary thyroid carcinoma (PTC) classifier, and to look for molecular pathways essential for PTC oncogenesis.
Methods: Gene expression profile data sets from autonomously functioning and cold thyroid nodules and from PTC were analyzed by support vector machines. GenMAPP analysis was used for PTC data analysis to examine the expression patterns of biologically relevant gene sets.
Results: Only intraindividual reference samples allowed the identification of subtle changes in the expression patterns of relevant signaling cascades, such as the MAPK pathway in PTC. Using an artificial intelligence approach, the autonomously functioning and cold thyroid nodule multigene classifiers were derived and evaluated by cross-comparisons.
Conclusion: We recommend defining classifiers within one generation of gene chips and subsequently checking them across different array generations. Using this approach, we have demonstrated the specificity of a previously reported PTC classifier on an independent collection of benign tumors. Moreover, we propose multigene classifiers for different types of benign thyroid nodules.
| Introduction |
|---|
Recently, we investigated mRNA expression profiles of AFTNs (5) and CTNs (9), each in comparison with their normal surrounding tissue (ST), using the Affymetrix GeneChip U95Av2. In these studies, we focused on the identification of gene sets (e.g. signaling cascades) that differentiate between nodular tissue and normal ST and identified an inactivation of the TGF-β signaling cascade in AFTNs (5) and an increased expression of cell cycle-associated genes in CTNs (9). In another study, Jarzab et al. (10) analyzed the gene expression profiles of PTCs in comparison with benign/normal tissue obtained from either the same patients (intraindividual comparisons) or from other individuals (interindividual comparisons) using Affymetrix U133A GeneChips, with the goal of generating a diagnostic multigene classifier distinguishing between the PTCs and benign/normal tissues.
Because of the increasing number of oligonucleotide chip analyses of different thyroid pathologies, the question of data comparability is emerging. In this study, we therefore combined our data sets of AFTNs, CTNs, PTCs, and normal STs. This approach served both to identify biologically significant pathways in PTC and to confirm the specificity of the multigene PTC classifier. It required investigating the comparability of different GeneChip generations and evaluating the influence of intra- and interindividual comparisons on gene expression profiling results. Moreover, we used our previously described algorithms to propose two support vector machine (SVM)-based classifiers for AFTNs and CTNs as the first application of this method to benign thyroid diseases, and we performed cross-comparisons to validate their efficacy.
| Patients and Methods |
|---|
Data analysis
Data preprocessing. Affymetrix GeneChip data were scaled to normalize data for interarray comparison using Affymetrix MAS 5.0 software.
To compare the expression data of AFTNs and CTNs (GeneChip U95A, 12,625 probe sets) with the PTC expression data (GeneChip U133A, 22,283 probe sets), we used the Affymetrix Best Match Comparison Spreadsheet Human Genome U95 to Human Genome U133, which yielded an intersection of 10,507 probe sets (www.affymetrix.com/support/technical/comparison_spreadsheets.affx).
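In computational terms, the cross-generation matching restricts both expression matrices to probe sets that are linked by a best-match table. The following is only a minimal sketch under stated assumptions: the two-column mapping, the toy probe set IDs, and the helper name `match_probe_sets` are hypothetical and do not reproduce the layout of the Affymetrix spreadsheet.

```python
def match_probe_sets(best_match, u95_data, u133_data):
    """Return rows present on both chips, keyed by the U95 probe set ID.

    best_match: dict mapping a U95 probe set ID to its U133 best match
                (hypothetical layout, not the Affymetrix file format).
    u95_data / u133_data: dicts mapping probe set ID -> list of signals.
    """
    combined = {}
    for u95_id, u133_id in best_match.items():
        if u95_id in u95_data and u133_id in u133_data:
            # Concatenate the samples of both chip generations for this gene.
            combined[u95_id] = u95_data[u95_id] + u133_data[u133_id]
    return combined


# Hypothetical toy IDs and signals, for illustration only.
best_match = {"a_at": "A_at", "b_at": "B_at", "c_at": "C_at"}
u95 = {"a_at": [1.0, 2.0], "b_at": [3.0, 4.0]}
u133 = {"A_at": [5.0], "C_at": [6.0]}
matched = match_probe_sets(best_match, u95, u133)
```

Only probe sets with a partner on both platforms survive, which is why the intersection is smaller than either chip.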
Principal component analysis (PCA).
First, the expression data matrix of 10,507 probe sets from the 140 GeneChips was subjected to PCA to view the overall trend of the data set. PCA allows the identification of key variables in a multidimensional data set, which explain the differences between the experiments in the best way. When assuming m experiments, each with n genes, the aim of the PCA is to reduce the dimensionality of the data matrix by the identification of r < n new variables. These r principal components explain the variance of the original n variables as well as possible, while they are uncorrelated and orthogonal. This reduction of dimensionality allows an improved data visualization and analysis.
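The idea of PCA can be illustrated with a short sketch that extracts the first principal component by power iteration. This is not the software used in the study; the function name `first_principal_component`, the random start vector, and the toy matrix are assumptions made for illustration.

```python
import math
import random


def first_principal_component(data, iterations=200, seed=0):
    """Return (loadings, scores) of the first principal component.

    data: list of m experiments, each a list of n gene values.
    """
    m, n = len(data), len(data[0])
    # Center every gene (column) at zero mean.
    means = [sum(row[j] for row in data) / m for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in data]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Power iteration on X^T X without forming the n x n matrix.
        scores = [sum(c * w for c, w in zip(row, v)) for row in centered]
        w_new = [sum(centered[i][j] * scores[i] for i in range(m))
                 for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w_new))
        if norm == 0.0:  # no variance left in this direction
            break
        v = [x / norm for x in w_new]
    scores = [sum(c * w for c, w in zip(row, v)) for row in centered]
    return v, scores


# Toy data: four experiments, two genes, the second gene always twice the first.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loadings, scores = first_principal_component(toy)
```

In the toy matrix all variance lies along one direction, so a single component (r = 1) describes the two original variables completely. The sign of the loadings is arbitrary.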
To detect differentially regulated genes and gene sets, different methods of data analysis were used.
Empirical filtering. We selected genes that had a signal log ratio greater than 0.585 or lower than -0.585 in at least 85% of the comparisons (i.e. these genes are characterized by a minimum 1.5-fold change).
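The empirical filter can be sketched as follows. The threshold follows from log2(1.5), which is about 0.585; the function name `empirical_filter` and the return labels are illustrative assumptions, and the signal log ratios are taken to be base-2 values per paired comparison.

```python
import math

THRESHOLD = math.log2(1.5)  # about 0.585, i.e. a 1.5-fold change


def empirical_filter(signal_log_ratios, min_fraction=0.85):
    """Label one gene 'up', 'down', or None from its log2 ratios."""
    n = len(signal_log_ratios)
    up = sum(1 for r in signal_log_ratios if r > THRESHOLD)
    down = sum(1 for r in signal_log_ratios if r < -THRESHOLD)
    if up / n >= min_fraction:
        return "up"
    if down / n >= min_fraction:
        return "down"
    return None
```

For example, a gene that exceeds the threshold in nine of ten comparisons passes the filter, whereas eight of ten (80%) is not sufficient.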
Application of the Westfall-Young strategy to genes.
To avoid a high rate of false positive results in this multitude of tests for the different genes, we applied so-called multiple test procedures. After a logarithmic transformation of the data, we computed adjusted P values for each gene according to the Westfall-Young procedure, which embeds univariate F tests into a permutation procedure (17). This procedure keeps the family-wise error rate α in the strong sense, i.e. the set of all selected genes contains a false positive gene with probability α at most.
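A permutation-based adjustment of this kind can be sketched as below. The sketch assumes the single-step maxT variant with a one-way F statistic and label permutation for unpaired groups; the text does not state which variant was used, and paired data would require a different permutation scheme. The function names are hypothetical.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene (at least two groups)."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def westfall_young_maxt(matrix, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted P values; matrix is genes x samples (log scale)."""
    rng = random.Random(seed)
    observed = [f_statistic(row, labels) for row in matrix]
    exceed = [0] * len(matrix)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # The maximum over all genes is what protects the whole family of tests.
        max_f = max(f_statistic(row, shuffled) for row in matrix)
        for i, f_obs in enumerate(observed):
            if max_f >= f_obs:
                exceed[i] += 1
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Toy data: gene 0 differs strongly between the groups, gene 1 does not.
labels = [0] * 5 + [1] * 5
matrix = [
    [0.0, 0.1, 0.2, 0.1, 0.0, 5.0, 5.1, 5.2, 5.1, 5.0],
    [1.0, 2.0, 1.5, 2.5, 1.2, 1.1, 2.1, 1.4, 2.4, 1.3],
]
adjusted = westfall_young_maxt(matrix, labels)
```

Because each observed statistic is compared with the permutation distribution of the maximum over all genes, a gene is selected only if its statistic is extreme relative to the whole family of tests.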
GenMAPP. In contrast to statistical filters and pattern-finding algorithms (e.g. hierarchical clustering), GenMAPP (Gene Microarray Pathway Profiler, www.GenMAPP.org) analyzes the gene expression changes in the context of known biological pathways (18). Gene expression data, including P values, were imported into GenMAPP in a comma-separated value file format. GenMAPP converts the expression data into a data set that can then be viewed on any microarray pathway profile with any number of color-coding criteria set. The following color-coding criteria were defined. Significantly (P < 0.05) up-regulated genes are colored red and significantly down-regulated genes are colored green.
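The color-coding rule described here is simple enough to state as a function. This is only an illustration of the criteria and not part of GenMAPP, which defines such criteria in its own interface; the function name `genmapp_color` and the use of the sign of the signal log ratio to decide the direction of regulation are assumptions.

```python
def genmapp_color(p_value, signal_log_ratio, alpha=0.05):
    """Map one gene to the display color defined by the criteria above."""
    if p_value < alpha and signal_log_ratio > 0:
        return "red"      # significantly up-regulated
    if p_value < alpha and signal_log_ratio < 0:
        return "green"    # significantly down-regulated
    return "none"         # no criterion met
```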
Recursive feature replacement (RFR). RFR (19, 20) is an iterative method based on the SVM technique (21) with the goal of finding an optimal gene subset in a leave-one-out cross-validation approach. In an iterative manner, it looks for gene sets, not for single genes, to evaluate their classification quality and to select the best one. RFR is similar to the standard recursive feature elimination algorithm (22) and uses recursive feature elimination to find starting gene sets. Recently, RFR has been shown to give a superior quality of classification compared with recursive feature elimination (10).
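The core of such a procedure, scoring whole gene sets by leave-one-out cross-validation and keeping the best one, can be sketched as follows. This is not the RFR algorithm itself: a nearest-centroid classifier is swapped in for the SVM to keep the example self-contained, the candidate gene sets are given rather than generated by replacement steps, and all function names and toy values are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (not an SVM): pick the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(samples, labels, gene_subset):
    """Leave-one-out accuracy using only the genes (column indices) given."""
    reduced = [[s[g] for g in gene_subset] for s in samples]
    correct = 0
    for i in range(len(reduced)):
        # Train on all samples except sample i, then test on sample i.
        train_x = reduced[:i] + reduced[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, reduced[i]) == labels[i]:
            correct += 1
    return correct / len(reduced)


def best_gene_set(samples, labels, candidate_sets):
    """Return the candidate gene set with the highest leave-one-out accuracy."""
    return max(candidate_sets, key=lambda gs: loo_accuracy(samples, labels, gs))


# Toy data: gene 0 separates the classes, genes 1 and 2 do not.
samples = [[1.0, 5.0, 3.0], [1.2, 3.0, 5.0], [0.9, 4.0, 4.0],
           [3.0, 4.1, 3.1], [3.2, 5.1, 4.9], [2.9, 3.1, 4.1]]
labels = ["nodule"] * 3 + ["normal"] * 3
chosen = best_gene_set(samples, labels, [[1, 2], [0]])
```

The gene set is judged as a whole by its classification quality, which is why genes with small individual changes can still be retained if they improve the set.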
| Results |
|---|
First, we investigated the influence of the GeneChip generation on the expression patterns of different thyroid tissue entities. For this purpose, we compared the mRNA expression profiles of AFTNs, CTNs, and PTCs hybridized to two generations of Affymetrix GeneChips (AFTNs and CTNs, U95Av2; PTCs, U95A/U133A). The expression data matrix derived from 140 microarrays altogether was subjected to PCA to analyze the overall trend within the data set. Much stronger differences between the expression patterns of PTC samples hybridized to different GeneChip generations than between the expression patterns of the different tissue entities were observed (Fig. 1). Therefore, a direct combination of data obtained by different GeneChip generations is not possible.
Three different PTC data sets were used for this comparison: two data sets of paired samples, Huang's data set comprising eight PTCs with their normal STs hybridized to GeneChips U95A (PTC-P-U95) (1) and Jarzab's initial data set comprising 16 PTCs and their STs hybridized to GeneChips U133A (PTC-P-U133) (10), and one unpaired data set, Jarzab's validation set comprising seven PTCs and 11 unpaired benign samples, also hybridized to GeneChips U133A (PTC-U-U133) (10). Each data set was independently analyzed using our empirical filters and the statistical analysis of individual probe sets according to Westfall-Young. Subsequently, the results of these two algorithms in the three different data sets were compared (Fig. 2). Using the filter algorithm, we found 60 up-regulated probe sets in the PTC-P-U95 data set with an overlap of 39 probe sets to the PTC-P-U133 data set, which comprised 115 up-regulated probe sets. The analysis of the PTC-U-U133 data set according to the filter algorithm revealed 121 up-regulated probe sets, but only one gene (glutaminyl cyclase) was also found in the other two data sets. Similar results were observed in the group of down-regulated genes (Fig. 2). Sixteen down-regulated probe sets were found in the PTC-P-U95 data set with an overlap of five probe sets (two probe sets representing trefoil factor 3 intestinal, two probe sets representing dermatopontin, and one cellular retinoic acid binding protein 1) with the PTC-P-U133 data set, which comprised 79 down-regulated probe sets, whereas we found 52 down-regulated probe sets in the PTC-U-U133 data set without any overlap to the two paired data sets.
GenMAPP analysis of PTCs
GenMAPP software was used to visualize and analyze the GeneChip data on microarray pathway profiles. Significant changes within the MAPK signaling pathway could be identified when the PTC-P-U95 and the PTC-P-U133 data sets were analyzed (Fig. 3). In both cases, we found a significantly increased expression of H-RAS, R-RAS, microtubule-associated protein 2, and RAS GTPase-activating protein 1 and a significantly decreased expression of the protooncogene c-fos in the PTCs in comparison with their normal STs. Moreover, we found a significant up-regulation of MEK kinase 1, MAP kinase kinase 1, MAP kinase 1, and the tyrosine kinase Elk-1 in the PTC-P-U133 data set. Furthermore, this data set is characterized by a significantly down-regulated expression of N-RAS and of an additional probe set representing MAP kinase kinase 1 in the PTCs. GenMAPP analysis did not reveal significant changes in the MAPK signaling pathway when applied to the unpaired PTC data set (PTC-U-U133).
The aim of this step was to verify the molecular classifier differentiating PTC and normal thyroid tissue, which consists of 20 genes selected by the RFR technique (19, 20) proposed recently (10). We intended to validate its specificity on a large collection of benign thyroid tissues: AFTNs, CTNs, and their normal STs. After normalization, the probe sets of the AFTN-P-U95 and the CTN-P-U95 data set were matched to the probe sets proposed by the RFR-20 molecular classifier (10), and a classification of all data sets was performed (Fig. 4). Under the conditions used in this study, our approach again correctly classified all PTCs and STs from the PTC-P-U95 data set. Moreover, all benign tissues (AFTNs, CTNs, and their normal STs) were classified as normal/benign and were assigned classification function values very similar to those of the normal STs of the PTC-P-U95 data set.
Multigene classifiers for AFTNs and CTNs were specified by the RFR algorithm. The procedures used in SVM analysis were the same as those used to build the PTC classifier (10), and two classifiers consisting of 20 genes were derived. Both classifiers were compared for their ability to classify other thyroid tissues, first within the same GeneChip generation (AFTN classifier on CTN-P-U95 data set, CTN classifier on AFTN-P-U95 data set, both on PTC-P-U95), then with U133A chips on the combined data set, including our 23 PTCs and the 51 PTCs made available by Giordano et al. (2) (PTC-U133 data set). The genes included in the AFTN classifier are shown in Table 1, and the cross-classification result is shown in Fig. 5, A–D. No PTC was classified as AFTN, and all but two normal tissues were properly classified as non-AFTN. Simultaneously, eight (36.4%) of 22 CTNs were recognized by the classifier as showing the attributes characteristic of AFTNs.
| Discussion |
|---|
Comparison of expression data derived from different GeneChip generations
Before comparing the mRNA expression patterns of AFTNs, CTNs, and PTCs, which were hybridized to different generations of GeneChips, we asked how differences in oligo DNA and chip design influence the expression patterns of the different entities of thyroid tissues. Despite the use of the Affymetrix Best Match Comparison Spreadsheet, we observed much stronger differences between the expression patterns of the PTC samples hybridized to different GeneChip generations than between the expression patterns of the different tissue entities. These differences in signal intensity are most likely attributable to the degree of dissimilarity of probe sets, the expression level of the corresponding transcript (23, 24), and technical differences such as different scanner settings. Therefore, to be able to compare our previous results (5, 9, 10) with the present re- and meta-analysis, different entities were analyzed separately. Only afterward were the results compared between the different thyroid pathologies.
Comparison of paired and unpaired data sets
In some reports, intraindividual comparisons of diseased tissue with normal (surrounding) tissue were performed, whereas in other reports, pathologic and normal tissues from different individuals were compared. To address the relevance of paired (intraindividual) vs. unpaired data sets, we compared the results of empirical filters and Westfall-Young analysis among the PTC-P-U133 (10), PTC-P-U95 (1), and PTC-U-U133 (10) data sets (Fig. 2). The results of these comparisons clearly indicate the higher quality of paired data sets; despite the different GeneChip generations that were used in the PTC-P-U133 and PTC-P-U95 data sets, these two paired data sets share more similarities in gene expression profiles than the two U133A data sets (PTC-P-U133 and PTC-U-U133), which were generated in the same laboratory under identical conditions and scanned on the same GeneChip scanner. Especially in studies whose objective is the identification of subtle differences to elucidate the etiology of a specific pathology or to identify signaling cascades that are affected in this pathology, the definition of the reference tissue seems to be a crucial step for the analysis of the gene expression profiles. This conclusion is further supported by the results of the GenMAPP analysis of the PTC samples. Although both paired data sets show a significantly changed expression pattern of the MAPK cascade in the PTCs (Fig. 3), the unpaired data set provides no indications for this alteration (data not shown).
Reanalysis of PTCs for GenMAPP
In PTCs, several mutations and chromosomal rearrangements of genes coding for effectors along the MAPK pathway have been shown to be essential for the transformation of thyroid epithelial cells (25). In about 70% of all PTCs, activating mutations or chromosomal rearrangements of BRAF, RET, or RAS have been identified (8, 26, 27, 28). Interestingly, even in the study by Giordano et al. (2), who specified gene expression signatures of PTC initiating mutations, a differential expression of MAPK cascade-associated genes was not very distinct (1, 6, 7, 8, 10, 11, 12, 14, 15). Notwithstanding their interpretation, which related RET/PTC-induced cancers to the activation of the PI3K pathway, it should be mentioned that the signal transduction of the MAPK cascade occurs mainly by posttranslational modifications (e.g. phosphorylation) and might be poorly visible on the level of mRNA. However, in contrast to the data of Huang et al. (1) and Jarzab et al. (10), who did not find any of the MAPK cascade-associated genes to be differentially regulated, our GenMAPP analysis revealed a significantly increased expression pattern of various genes of the MAPK cascade (Fig. 3). Furthermore, the Westfall-Young analysis applied to these gene sets (9) also revealed a significantly changed MAPK cascade signaling in the PTC-P-U133 data set (P = 0.0016).
Verification of the SVM-based multigene classifier of PTCs on a large collection of benign thyroid tissues
We reassessed the specificity of the previously published molecular classifier consisting of 20 genes selected by RFR in the PTC-P-U133 data set (10) using a far larger collection of benign thyroid tumors analyzed by the U95 platform. Despite the matching performed, which diminished the number of evaluable genes, our gene set again correctly classified all PTCs from the PTC-P-U95 data set and all but two from the PTC-U-U133A data set. The remaining two misclassified samples gave values of the classifying function close to zero, and we therefore introduced a region in which samples cannot be assessed (see Fig. 4). All of the benign tissues from the U95 platform were assigned the proper classification function values and diagnosed as benign.
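A region of non-assessable samples around zero can be expressed as a simple decision rule on the value of the classifying function. The sketch below is illustrative only: the width of the region (`margin`), the convention that positive values indicate PTC, and the function name are assumptions that are not specified in the text.

```python
def classify_with_reject(decision_value, margin=0.5):
    """Turn a classification function value into a label with a reject region."""
    if abs(decision_value) < margin:
        # Too close to the decision boundary to be assessed reliably.
        return "not assessable"
    return "PTC" if decision_value > 0 else "benign/normal"
```

With such a rule, a sample whose function value lies near zero is reported as not assessable rather than being forced into one of the two classes.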
SVM-based classifiers for benign thyroid nodules
Twenty-gene classifiers for AFTNs and CTNs were specified from the respective U95 data sets. During their creation, leave-one-out cross-validation was applied. This method allows for validation of the obtained gene sets even if independent validation sets are unavailable (2). An additional level of evaluation could be obtained by applying the AFTN classifier to the CTN and PTC data sets and by applying the CTN classifier to the AFTN and PTC data sets.
The AFTN classifier was specific for the benign tissues analyzed, including benign/normal thyroid tissues from Jarzab's PTC data set as well as from the other published data sets (1, 2). The AFTN signature was not found in any of the PTCs analyzed, and we relate this finding to the low risk of malignancy in AFTNs. However, the AFTN classifier also recognized a subset of samples within CTNs. This should not be regarded as a failure of the SVM-based analysis. The classifier was built based on the differences between AFTNs and normal thyroid tissue, both of which demonstrate iodine uptake and thyroid hormone secretion. Therefore, the list of genes included in the AFTN classifier (Table 1) was built from genes participating in other processes, some of which might be commonly affected in all benign tumors. In the selected gene set, we found cell adhesion and extracellular matrix-related genes (SDC2, COL9A3, TNC, and NID), genes belonging to the wide group of growth process/apoptosis/immune response, two metallothioneins, which also participate in proliferation/apoptosis and immune processes, and genes involved in signaling. One gene was transport-related (ABCA8), and four were attributed to metabolism regulation. Decreased expression of APOD and SDC2 as well as up-regulation of MT1F, SIAT1, and COL9A3 were indicated in our previous study (5). Increased expression of MT1F and COL9A3 was also reported in benign tumors by Finley et al. (11). Moreover, some of these genes may also be changed in thyroid cancer. APOD was indicated as a significantly down-regulated gene in oncocytomas and in FTC by Baris et al. (29) and Aldred et al. (3), respectively, and tenascin C was found to be overexpressed in thyroid carcinomas by Finley et al. (11).
Considering the CTN classifier, in agreement with our previous study showing the involvement of cell cycle genes in the gene expression profile of CTNs (9), mainly genes related to proliferation and growth processes were included (Table 2). Using previous approaches on the same data set, we found an overexpression of histone H2A family O and L members, and the first of them was included in the CTN classifier in the present study, together with a gene of the same family (histone H2BE). Another gene previously selected by us is GPM6A, a plasma membrane protein, down-regulated in CTNs. The CTN classifier also contains some genes considered by others as cancer specific, among them FGFR1, found by Chevillard et al. (14) to be down-regulated in the follicular variant of PTC in comparison with the classic variant; TLE4, found by Aldred et al. (3) to be down-regulated in FTC; and TUSC3 (N33), overexpressed in PTC (1). GATA3 (involved in immune response) is decreased in CTNs, whereas it is up-regulated in papillary vs. oncocytic thyroid cancer (29).
In contrast to the AFTN signature, the CTN multigene signature was found in PTCs (Fig. 5). The reason why the CTN classifier positively scored PTCs could be the partly dedifferentiated and proliferating phenotype of both entities. Moreover, the signature is recognizable even in some (one of three) AFTNs. Like CTNs (30, 31), but less prominently, they also show increased proliferation.
We did not verify the single genes included in both classifiers because the SVM-based methods do not select single genes. They look for gene sets with the criterion of their best global classification ability and may include genes with minor change in expression pattern that provide complementary information and increase the classification quality of the whole gene set. It would be reasonable to validate both classifiers on independently collected hot thyroid nodules and CTNs; however, such data sets are not yet available.
In conclusion, although recent studies using other approaches show a high reproducibility even across microarray platforms (32, 33), for thyroid tissues we could show that a comparison of gene expression data generated on different GeneChip generations, which were preprocessed by Affymetrix MAS 5.0 software, reveals a high reproducibility only when gene sets are selected within one study. Furthermore, our results point out another critical point for meta-analysis: the quality of the reference samples. A precise selection of intraindividual reference samples is mandatory to identify subtle changes in the expression patterns of signaling cascades, as we have shown in the case of the MAPK cascade in the PTCs. Moreover, we verify the SVM-based PTC classifier, published previously, as highly specific and propose similar multigene classifiers for AFTNs and CTNs.
| Acknowledgments |
|---|
| Footnotes |
|---|
Part of the work of K.F. was done during his visit at Rice University (Houston, TX).
Disclosure of potential conflicts of interests: M.E., M.W., B.J., K.K., M.B., J.L., E.G., K.F., A.S., and R.P. have nothing to declare.
First Published Online January 11, 2006
1 M.E. and M.W. contributed equally to this work.
Abbreviations: AFTN, Autonomously functioning thyroid nodule; CTN, cold thyroid nodule; FTC, follicular thyroid carcinoma; PCA, principal component analysis; PTC, papillary thyroid carcinoma; RFR, recursive feature replacement; ST, surrounding tissue; SVM, support vector machine.
Received July 20, 2005.
Accepted January 3, 2006.
| References |
|---|
) fusion oncogene. Oncogene 24:1467–1476