The Journal of Clinical Endocrinology & Metabolism Vol. 85, No. 5 1923-1927
Copyright © 2000 by The Endocrine Society
Serial Analysis of Gene Expression as a Tool to Assess the Human Thyroid Expression Profile and to Identify Novel Thyroidal Genes1
E. Pauws,
J. C. Moreno,
M. Tijssen,
F. Baas,
J. J. M. de Vijlder and
C. Ris-Stalpers
Laboratory of Pediatric Endocrinology (E.P., J.C.M., M.T.,
J.J.M.d.V., C.R.-S.) and Neurozintuigen Laboratory (F.B.), Academic
Medical Center, University of Amsterdam, 1100 DE Amsterdam, The
Netherlands
Address correspondence and requests for reprints to: E. Pauws, Department of Pediatric Endocrinology, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands. E-mail:
E.Pauws{at}AMC.UVA.NL
Abstract
The assessment of the expression profile of normal human thyroid tissue
using serial analysis of gene expression (SAGE) generated a collection
of 10,994 sequence tags. Each tag represented a messenger RNA
transcript, and, in total, 6099 different tags could be distinguished.
The presence and abundance of thyroid-specific transcripts showed that
the overall expression profile was that of a normal thyroid cell. The
expression levels of several transcripts were confirmed by Northern blot
analysis. Seventy percent of the different tags could not be attributed
to a known human gene and may therefore correspond to novel genes,
possibly involved in thyroid function. The tag sequences generated by
the SAGE technique can be used to further characterize these novel
genes. In this way, application of the SAGE technique to thyroid tissue
gives insight into the expression profile of a normal thyroid gland and
provides the information needed to characterize novel genes involved in
thyroid pathology, such as congenital hypothyroidism and thyroid
neoplasia.
Introduction
THE FUNCTIONAL and biochemical features of
a specific cell type are determined by its particular profile of gene
expression. Most studies on gene expression focus on one or more
previously identified genes of interest, an approach that greatly
underestimates the complexity of molecular mechanisms. Serial analysis
of gene expression (SAGE) is a recently developed technique that
provides an expression profile, also called the transcriptome, which
describes the genes expressed and their relative abundance in the
tissue or cell type studied (1). The SAGE technology is based on two
main principles (2). First, a short sequence tag (10 bp) is generated
that contains sufficient information to specifically identify a
messenger RNA (mRNA) transcript, provided that the tag is derived from a
defined location within this transcript. Second, the concatenation of
many sequence tags into a single large DNA molecule allows
high-throughput sequencing. The transcriptome is generated by
identifying the gene corresponding to each tag and determining the
relative abundance of each individual tag. The sensitivity of the
method is limited only by the total number of tags analyzed, which
determines the minimal expression level that can be detected. Data from
a SAGE library can be used for several purposes. Comparison of SAGE
profiles from various physiological or disease states provides insight
into the molecular and cellular background of such states by revealing
up- or down-regulation of certain transcripts (3, 4, 5). SAGE tags that
have no match in the current set of known human genes (NoMatch tags) can
be used to identify the corresponding uncharacterized genes, using the
tag sequences generated by the SAGE technique as a starting point. In
contrast, other techniques either quantify only a limited number of
previously identified genes at a time (Northern blotting, RNase
protection, RT-PCR) or enable characterization of unidentified mRNA
transcripts without providing direct information about their abundance
(cDNA subtraction, differential display). Although several genes
involved in thyroid development and function have been identified
(6, 7, 8), more remain to be elucidated, since not all cDNAs
corresponding to proteins known to be involved in thyroid
hormonogenesis have been cloned. Cases of primary congenital
hypothyroidism (CH) are known in which the mutated gene has been
identified and linked to the patient's phenotype. Examples are the
thyroid peroxidase (TPO), thyroglobulin (TG), iodide symporter (NIS),
and TSH receptor (TSH-R) genes (8, 9) and the genes for the thyroid
transcription factors PAX8, TTF1, and TTF2 (10, 11, 12). More recently,
the Pendrin (PDS) gene, coding for a chloride/iodide transporter
associated with Pendred syndrome, was identified as a gene mutated in
cases of congenital deafness associated with a mild thyroid
organification defect (13). There are,
however, still unresolved cases of CH in which currently unidentified
genes must be involved. The thyroid NADPH oxidase responsible for
H₂O₂ generation (14) and the dehalogenase enzyme(s) are two of
the most obvious candidates. In thyroid neoplasia, little is known
about diagnostic and/or prognostic factors related to the genetic
mechanisms underlying the tumor pathology. To address these fields of
interest, we constructed and analyzed a SAGE library from a normal
human thyroid gland as a starting point for the identification of novel
thyroid-specific genes involved in thyroid disease. We studied the
genes most prominently expressed in the thyroid and their relative
abundance, with special attention to genes involved in thyroid function.
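Comparing SAGE profiles between physiological or disease states, as described above, amounts to comparing the count of each tag between libraries that usually differ in size. The sketch below illustrates this and is not the method used in this study. It assumes a Pearson chi-square test on a 2×2 table (this tag vs. all other tags, library A vs. library B), one of several statistics that can be used for SAGE comparisons. The function name `compare_tag` and the 0.5 pseudocount used for the fold change are hypothetical choices.

```python
import math
from typing import Tuple

def compare_tag(count_a: int, total_a: int,
                count_b: int, total_b: int) -> Tuple[float, float, float]:
    """Compare one tag between SAGE libraries A and B.

    Returns (fold_change, chi2, p_value). The fold change compares the tag's
    proportion in A with that in B (0.5 pseudocount added to each count);
    chi2 is Pearson's statistic for the 2x2 table
    [[count_a, total_a - count_a], [count_b, total_b - count_b]],
    without continuity correction.
    """
    if count_a + count_b == 0:
        raise ValueError("tag absent from both libraries")
    observed = [[count_a, total_a - count_a], [count_b, total_b - count_b]]
    row_sums = [total_a, total_b]
    grand = total_a + total_b
    col_sums = [count_a + count_b, grand - (count_a + count_b)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    fold = ((count_a + 0.5) / total_a) / ((count_b + 0.5) / total_b)
    return fold, chi2, p_value

# Hypothetical counts: a tag seen 50 times in 10,000 tags vs. 5 times in 10,000 tags.
print(compare_tag(50, 10_000, 5, 10_000))
```

In this hypothetical example the fold change is about 9.2 and the p-value is far below 0.05. With libraries of roughly 10,000 tags, tags counted only a few times rarely reach significance, and the chi-square approximation itself becomes unreliable at such small counts.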
Material and Methods
Tissue and RNA extraction
Thyroid gland tissue was obtained from a single individual
without thyroid pathology after resection at a routine autopsy. Tissue
was immediately frozen in liquid nitrogen. Informed consent to use this
material for scientific research was obtained. After homogenization,
total RNA was extracted using Trizol (Gibco BRL), and mRNA was isolated
using the PolyATtract mRNA Isolation System III (Promega).
Construction of SAGE library
The library was constructed using 5 µg thyroid mRNA,
essentially following SAGE Protocol 1.0c by Velculescu et
al. (2). Additional information, including a graphical presentation
of the SAGE technique, can be found at URL www.sagenet.org. mRNA was
converted to double-stranded complementary DNA (cDNA) using the cDNA
Synthesis System kit (Life Technologies, Inc., Gaithersburg, MD) with a
biotinylated oligo(dT)18 primer. The cDNA was digested with NlaIII, and
the 3' cDNA fragments were isolated using Streptavidin Dynabeads M-280
(Dynal, Oslo, Norway) and divided into two equal pools. Each pool was
ligated to a different SAGE linker and subsequently digested with BsmFI
to release the tags. Tags were then blunted using T4 DNA
polymerase, and the pools were combined and ligated to form ditags.
Ditags were amplified by PCR using the attached linkers as priming
sites, and the PCR products were digested with NlaIII to release the
linkers from the ditags. Ditags (20–24 bp) were isolated from a
polyacrylamide gel and concatenated by self-ligation. Concatemers with
an average length of 500 bp were cloned into the SphI site of pZero
(Invitrogen, Groningen, The Netherlands). The ligation mixture was
transformed into Top10F' electrocompetent cells (Invitrogen), and
clones with inserts were selected by colony PCR with M13 vector-located
primers.
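After NlaIII digestion, only the biotinylated 3' fragment of each cDNA is retained. The tag for a transcript can therefore be predicted in silico as the 10 bp immediately downstream of the 3'-most CATG site upstream of the oligo-dT priming site. The sketch below makes the following assumptions:

- the mRNA is given as a sense-strand sequence;
- variability in BsmFI cleavage is ignored;
- `predict_tag` and its `priming_site` parameter are hypothetical names.

Passing the position of an internal poly(A) stretch shows how internal priming, as described for PDS in this study, would shift the tag to an upstream NlaIII site.

```python
from typing import Optional

ANCHOR = "CATG"  # NlaIII recognition site (anchoring enzyme)
TAG_LEN = 10     # tag length used in this library

def predict_tag(mrna: str, priming_site: Optional[int] = None) -> Optional[str]:
    """Predict the SAGE tag of an mRNA given as a sense-strand sequence (5'->3').

    priming_site is the index where oligo-dT priming starts; by default the end
    of the sequence (the poly(A) tail). Only a CATG lying entirely upstream of
    this point ends up on the streptavidin-captured 3' fragment, and the
    3'-most such site defines the tag.
    """
    seq = mrna.upper()
    end = len(seq) if priming_site is None else priming_site
    site = seq.rfind(ANCHOR, 0, end)
    if site == -1:
        return None  # no NlaIII site upstream of the priming site: no tag
    start = site + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None

# Hypothetical toy transcript containing an internal poly(A) stretch.
toy = ("GGC" + "CATG" + "AAACCCGGGT" + "TTAG" + "A" * 15
       + "CC" + "CATG" + "TTTGGGCCCA" + "GC")
print(predict_tag(toy))                      # tag from the 3'-most CATG
print(predict_tag(toy, toy.find("A" * 15)))  # tag if priming starts at the internal A-stretch
```

For this toy sequence, normal priming gives the tag following the second CATG. Priming at the internal A-stretch instead gives the tag following the first CATG, which is the kind of alternative tag discussed for PDS.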
Sequencing of SAGE library
SAGE clones were sequenced with the Dyenamic Direct cycle
sequencing kit using the ET-T7 primer (Amersham Pharmacia, Uppsala,
Sweden). Samples were run on an ABI 377XL automated sequencer
(Perkin-Elmer Corp., Norwalk, CT) and analyzed using Sequence Analysis
3.0 software.
Tag abundance and identification
SAGE data were analyzed using the specialized UNIX software
USAGE1.5, developed in our institute, for extraction of single tags from
sequence data and their subsequent identification against the EMBL
human gene database (February 1999). To further study tag
identification and expression, the NCBI/CGAP SAGEmap program was used
at URL www.ncbi.nlm.nih.gov/SAGE/.
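The internals of USAGE are not described here. Purely as an illustration, the sketch below extracts single tags from concatemer reads and counts them, under these assumptions:

- ditags lie between consecutive CATG sites and are 20–24 bp long, as in Methods;
- the second tag of each ditag is read as the reverse complement of the last 10 bp;
- repeated ditags, which are likely PCR duplicates, are counted once (a common convention in SAGE software, assumed here);
- tags identical to the two linker-derived tags reported in Results are discarded.

All names are hypothetical.

```python
from collections import Counter
from typing import Iterable

ANCHOR = "CATG"
TAG_LEN = 10
MIN_DITAG, MAX_DITAG = 20, 24                 # ditag length range (without CATG)
LINKER_TAGS = {"TCCCTATTAA", "TCCCCGTACA"}    # linker-derived tags (Results)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_tags(reads: Iterable[str]) -> Counter:
    """Count tags in concatemer reads, skipping duplicate ditags and linker tags."""
    tags: Counter = Counter()
    seen_ditags = set()
    for read in reads:
        # The first and last pieces are vector or partial ditags and are dropped.
        for ditag in read.upper().split(ANCHOR)[1:-1]:
            if not MIN_DITAG <= len(ditag) <= MAX_DITAG:
                continue
            key = min(ditag, revcomp(ditag))  # same ditag read in either orientation
            if key in seen_ditags:
                continue
            seen_ditags.add(key)
            for tag in (ditag[:TAG_LEN], revcomp(ditag[-TAG_LEN:])):
                if tag not in LINKER_TAGS and "N" not in tag:
                    tags[tag] += 1
    return tags

# Hypothetical read: two ditags, the second carrying a linker-derived tag.
d1 = "ACGTACGTAA" + "TT" + revcomp("GGGAAATTTC")
d2 = "TCCCTATTAA" + revcomp("AAAACCCCGG")
read = "GG" + ANCHOR + d1 + ANCHOR + d2 + ANCHOR + "TT"
print(count_tags([read, read]))  # the duplicated read adds no extra counts
```

Skipping repeated ditags guards against PCR amplification bias: two identical ditags are very unlikely to arise independently in a library of this size.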
Northern hybridizations
RNA gels were prepared using the glyoxal/NaPi method (15),
electrophoresing 10 µg of total RNA. Capillary blotting was performed
overnight in 20× SSC, followed by ultraviolet cross-linking (1.2
J/m²) and baking (80 °C). Hybridizations were performed
following the Church and Gilbert (16) protocol at 65 °C overnight, and
blots were exposed for 16 h and analyzed using PhosphorImager 2.0
(PE Applied Biosystems, Foster City, CA).
Results
The sequencing of 10,994 tags from a human thyroid SAGE library
resulted in an expression profile of 6099 unique mRNA transcripts. Of
these 6099 transcripts, the majority (4813 tags) were present only
once (0.01% of the total library), indicating that the bulk of
expressed genes is present at a low basal level. Only 98 genes were
represented by more than 10 tags, of which 9 were expressed at a very
high level (>50 tags), including the TG transcript (Table 1). The
percentage of identification was higher for tags scored more than 10
times (85–100%) than for tags scored 10 times or less (25–63%). In
total, 30% of the different tags could be attributed to a known gene
transcript. In general, the high-abundance class included several
ribosomal and mitochondrial transcripts. The tags corresponding to TG
and TPO mRNA are also present among the 50 most highly expressed genes
(Table 2).
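The abundance classes used above (tags seen once, more than 10 times, and more than 50 times) follow directly from the tag counts. The sketch below assumes the counts are held in a `collections.Counter` keyed by tag sequence. The function name and dictionary keys are hypothetical. Note that it counts tags, whereas the text groups tags by gene.

```python
from collections import Counter

def abundance_summary(tag_counts: Counter) -> dict:
    """Summarize a SAGE library by abundance class (one count per unique tag)."""
    counts = list(tag_counts.values())
    total = sum(counts)
    if total == 0:
        raise ValueError("empty library")
    return {
        "total_tags": total,
        "unique_tags": len(counts),
        "seen_once": sum(1 for c in counts if c == 1),
        "more_than_10": sum(1 for c in counts if c > 10),
        "more_than_50": sum(1 for c in counts if c > 50),
        # Percentage of the library represented by a single tag.
        "percent_per_tag": 100.0 / total,
    }

# In a library of 10,994 tags, one tag is about 0.009% of the total,
# which the text rounds to 0.01%.
print(round(100.0 / 10_994, 4))
```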
Furthermore, seven NoMatch tags, putatively corresponding to novel, as
yet unidentified transcripts, are present in this list. Further
analysis of the SAGE data showed that NoMatch tags become more
prominent in the low range of expression (Table 1). Starting with the
80 NoMatch tags observed five or more times, we screened GenBank
databases extensively to try to identify these transcripts. Using the
human Expressed Sequence Tag (EST) database, about 20 NoMatch
transcripts could be identified as known genes. Other NoMatches could be
excluded as artifacts derived from the linkers used in the construction
of the library (linker 1 tag: TCCCTATTAA; linker 2 tag: TCCCCGTACA).
The remaining NoMatches form an interesting group of sequences,
possibly corresponding to novel genes; some of the most highly
expressed NoMatches are listed in Table 2.

Focusing on genes important for thyroid function, we summarized the
SAGE expression data of several thyroid-specific genes in Table 3.
Apart from the expected tags corresponding to TG, TPO, PAX8, TSH-R, PDS,
TTF1, and iodothyronine deiodinase types 1 and 2, we found two
alternative tags probably corresponding to TG transcripts. Tags
corresponding to expected transcripts such as TTF2 or NIS could not be
detected. The actual TG SAGE tag is located near the end of the TG mRNA,
spanning the polyadenylation site. The two additional tags were scored
as NoMatches with no homology to any known sequence (including ESTs)
other than TG. We therefore attribute them to alternative TG SAGE tags
resulting from alternative polyadenylation. Adding up all tags
corresponding to TG gives a total TG expression level of 289 tags, or
2.6% of the total mRNA pool.
TPO scored 24 tags (0.26%), an expression level 10-fold lower than
that of TG. The PDS transcripts could only be identified after we
noticed the presence of an internal polyA-stretch in the 3'
untranslated region (3' UTR). The actual PDS tag flanked the last NlaIII
site before this internal polyA-stretch.

The validity of the abundance data in our SAGE library was checked by
Northern blot analysis. RNA was isolated from several normal thyroid
tissues, including the one used for the SAGE library; liver RNA was used
as a control. Probes for TPO, NIS, PDS, ELF1, and
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were hybridized to
this blot. Signal intensities were compared after correcting for RNA
loading with a 28S ribosomal probe. As shown in Fig. 1, the relative
abundances of TPO, ELF1, GAPDH, and PDS are similar to those observed in
the SAGE library. The expression in thyroid RNA TH4 (the RNA used for
the SAGE library) was similar to that in five other normal thyroids.
Thyroid-specific mRNAs were absent from liver RNA. Because no SAGE tag
corresponding to the NIS transcript was found, we checked for NIS
expression in the same Northern analysis. NIS was present in the
thyroid RNA used for the SAGE library, at an estimated expression level
comparable to those of TPO and GAPDH.
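The loading correction described above can be written as a simple normalization. Each probe signal is divided by the 28S signal of the same lane, and each lane is then compared with the lane used for the SAGE library (TH4). The dictionary layout, the function names, and all signal values in the example are hypothetical; they are not measurements from Fig. 1.

```python
from typing import Dict

Lane = Dict[str, float]  # probe name -> phosphorimager signal (arbitrary units)

def correct_for_loading(lane: Lane, loading_probe: str = "28S") -> Lane:
    """Divide every probe signal by the loading-control (28S rRNA) signal of the lane."""
    loading = lane[loading_probe]
    if loading <= 0:
        raise ValueError("loading-control signal must be positive")
    return {probe: value / loading
            for probe, value in lane.items() if probe != loading_probe}

def relative_to_reference(lanes: Dict[str, Lane], reference: str) -> Dict[str, Lane]:
    """Express each lane's loading-corrected signals relative to a reference lane."""
    corrected = {name: correct_for_loading(lane) for name, lane in lanes.items()}
    ref = corrected[reference]
    return {name: {probe: value / ref[probe] for probe, value in lane.items()}
            for name, lane in corrected.items()}

# Hypothetical signals for two thyroid lanes.
lanes = {
    "TH4": {"28S": 2.0, "TPO": 400.0, "GAPDH": 800.0},
    "TH5": {"28S": 1.0, "TPO": 220.0, "GAPDH": 380.0},
}
rel = relative_to_reference(lanes, "TH4")
print(rel["TH5"])  # TPO about 1.1 and GAPDH about 0.95 relative to TH4
```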

Figure 1. Expressed genes in the thyroid SAGE library as
determined by Northern blot. The expression of several
(thyroid-specific) genes (right) in a group of normal
thyroid RNAs is shown. The first lane (TH4) shows the expression levels
in the RNA used for the SAGE library. On the left, the SAGE abundance
out of 10,994 total tags is indicated. The rightmost lane
is liver control RNA.
Discussion
The SAGE library constructed in this study from a normal human
thyroid gland shows the expected expression profile of the cell type
studied. The expression of several housekeeping genes and mitochondrial
transcripts was similar to that observed in a recent SAGE study (17).
We, as well as other laboratories, checked the validity of SAGE data
extensively. Quantitatively, we demonstrated that the relative abundance
of genes within the SAGE expression profile was similar to that seen in
Northern blot hybridizations. We also showed that the expression in
this particular normal thyroid was representative.

Qualitatively, we examined whether all expressed genes were present in
the SAGE library, focusing on the absence of NIS and TTF2. The absence
of NIS from our thyroid SAGE library can be explained by the absence of
the full-length NIS mRNA sequence from the GenBank database. Compare the
2.2-kb NIS cDNA sequence (accession no. D87920) with the published mRNA
length (4.5 kb on Northern blot) (18): around 2 kb of downstream 3' UTR
sequence is missing, including the polyadenylation signal and
polyA-tail. Also, no 3' EST sequences corresponding to NIS mRNA could
be found in human EST databases. The SAGE tag corresponding to this
transcript probably lies within this missing 2-kb 3' UTR sequence. The
reason is that the average distance from tag to polyA-tail is 256 bp,
which corresponds to the expected spacing of the 4-bp NlaIII
recognition site CATG in random sequence (4⁴ = 256 bp). Table 2 shows
several NoMatch tags in the expected range of expression that could
belong to the NIS mRNA transcript. Northern analysis of the thyroid RNA
used for the SAGE library showed NIS mRNA expression at an abundance
level slightly lower than that of TPO.

TTF2 could not be detected, although its full-length mRNA sequence is
known. The transcription factors TTF1 (3 tags) and PAX8 (7 tags) were
present at relatively low abundance in the library. The absence of TTF2
may therefore reflect the detection limit of this particular SAGE
library.

The difficulties in identifying PDS in the SAGE library arose from the
presence of an internal polyA-stretch in the PDS 3' UTR. In the initial
cDNA synthesis, the internal polyA-stretch was probably used as a
priming site, resulting in this alternative SAGE tag. The tag at the
actual last NlaIII site, situated between the polyA-stretch and the
polyA-tail, could not be found in the SAGE library. This indicates that
the internal polyA-stretch is preferentially used in oligo-dT priming.

Transcripts specific for C cells (calcitonin, ret), which are also
present in thyroid gland tissue, could not be found. This is probably
because C cells make up too small a fraction of the thyroid (<1%) (19)
for their mRNAs to be detected in the total RNA pool from thyroid
tissue. Ferritin H is one of the most highly expressed messengers in
the library (Table 2). Its high expression in thyroid cells was
described previously for FRTL-5 cells (20), and its abundance in the
SAGE library confirms this finding.

In this library, a single tag corresponded to an expression level of
0.01% (1 in 10,994 total tags). Assume that the SAGE expression profile
represents the total mRNA pool of the thyroid cell and that a cell
contains about 300,000 mRNA molecules (21). We then estimate that
detection was limited to mRNAs present at roughly 30 copies per cell or
more (300,000/10,994 ≈ 27). The sensitivity of a SAGE library can be
increased by sequencing more clones and thus analyzing more tags.

Besides the transcripts corresponding to known genes, we are especially
interested in the NoMatch tags, which do not correspond to any known
gene. Seventy percent of the different tags in the library are
NoMatches. This figure is, however, somewhat inflated, because 85% of
the NoMatches are in the group of tags scored once. In the high range of
expression (≥5 tags), the proportion of NoMatch tags is considerably
lower (80 of 281, 28%) (Table 1). This indicates that highly expressed
transcripts are better characterized than those expressed at lower
levels. In the large group of single tags, we expect some sequencing
artifacts (3, 17), which also helps to explain the high percentage of
NoMatches in this group. Artifacts such as the alternative TG polyA
sites and the linker-derived tags of the SAGE technique have to be ruled
out. Before any NoMatch tag sequence is used in an experiment, it is, in
our view, imperative to screen extensively for the possibility that the
tag corresponds to a known gene whose 3' UTR has not been sequenced.
This can partly be addressed by screening the human EST database. Such
screening also makes it possible to use an unidentified 3' EST
corresponding to a NoMatch tag as a probe sequence in further
experiments. SAGE has become a promising technique in molecular
genetics, generating large amounts of permanently stored data. The data
generated from this SAGE library bring us closer to identifying the
complete thyroid transcriptome. SAGE libraries are made in different
scientific institutions around the world according to the same
principle and protocol, so it is even possible to compare data between
laboratories. Comparing libraries from normal vs. pathological tissues
reveals differentially expressed tags corresponding to known and unknown
genes possibly involved in the pathology of the disease. We intend to
use this possibility in the future by constructing new SAGE libraries
from diseased thyroid tissues. The NoMatch tags that currently cannot be
assigned to a known human gene can be screened again once the Human
Genome Project and similar projects have finished characterizing the
complete human genome. In the meantime, these NoMatch sequences provide
valuable information to start characterizing novel genes using other
techniques.
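Rescreening NoMatch tags against newly available sequences, such as 3' ESTs or transcripts from completed genome projects, can be automated. The approach is to list every CATG-anchored 10-bp tag in each sequence and flag whether it comes from the 3'-most NlaIII site. A hit at an internal site does not by itself identify the gene. As with the alternative TG tags and the PDS tag, it may reflect alternative polyadenylation or internal priming, and it needs experimental confirmation. The function names and the example sequence identifier below are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

ANCHOR, TAG_LEN = "CATG", 10

def candidate_tags(seq: str) -> List[Tuple[str, bool]]:
    """All 10-bp tags following a CATG site (5'->3'); True marks the 3'-most site."""
    seq = seq.upper()
    sites = []
    pos = seq.find(ANCHOR)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(ANCHOR, pos + 1)
    result = []
    for site in sites:
        start = site + len(ANCHOR)
        tag = seq[start:start + TAG_LEN]
        if len(tag) == TAG_LEN:  # skip sites too close to the sequence end
            result.append((tag, site == sites[-1]))
    return result

def rescreen(nomatch_tags: Iterable[str],
             sequences: Dict[str, str]) -> Dict[str, List[Tuple[str, bool]]]:
    """Map each NoMatch tag to (sequence_id, is_3prime_most_site) hits."""
    wanted = set(nomatch_tags)
    hits: Dict[str, List[Tuple[str, bool]]] = {tag: [] for tag in wanted}
    for seq_id, seq in sequences.items():
        for tag, is_last in candidate_tags(seq):
            if tag in wanted:
                hits[tag].append((seq_id, is_last))
    return hits

# Hypothetical EST containing two NlaIII sites.
est = "GG" + "CATG" + "AAACCCGGGT" + "TT" + "CATG" + "TTTGGGCCCA" + "G"
print(rescreen(["AAACCCGGGT", "GGGGGGGGGG"], {"est_1": est}))
```

In this toy case the NoMatch tag matches an internal site (flag `False`). Such a hit would point to an alternative polyadenylation or priming site rather than to the canonical tag of the transcript.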
We conclude that the SAGE library from human thyroid offers an
extensive expression profile of both previously identified and
unidentified genes that can be used to elucidate novel genes involved
in thyroid hormonogenesis.
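Two arguments in the Discussion rest on simple arithmetic: the detection-limit estimate and the placement of the NIS tag in the missing 3' UTR. The sketch below reproduces both. Two simplifying assumptions are added for illustration and were not analyses performed in this study:

- a Poisson sampling model, in which each sequenced tag is drawn independently from the mRNA pool;
- a random-sequence model for CATG occurrence, with overlaps ignored.

The constants are the values given in the text. The function names are hypothetical.

```python
import math

TOTAL_TAGS = 10_994      # tags sequenced in this library
MRNA_PER_CELL = 300_000  # approximate number of mRNA molecules per cell (21)

def copies_per_cell_per_tag(total_tags: int = TOTAL_TAGS,
                            mrna_per_cell: int = MRNA_PER_CELL) -> float:
    """mRNA copies per cell represented by a single tag."""
    return mrna_per_cell / total_tags

def detection_probability(copies_per_cell: float,
                          total_tags: int = TOTAL_TAGS,
                          mrna_per_cell: int = MRNA_PER_CELL) -> float:
    """P(at least one tag observed) under the Poisson sampling assumption."""
    expected_tags = total_tags * copies_per_cell / mrna_per_cell
    return 1.0 - math.exp(-expected_tags)

def prob_no_site(length_bp: int, site_len: int = 4) -> float:
    """P(no given 4-bp site, e.g. CATG, in random sequence); overlaps ignored."""
    return (1.0 - 0.25 ** site_len) ** length_bp

print(f"copies per cell per tag: {copies_per_cell_per_tag():.1f}")
print(f"P(detect 30 copies/cell): {detection_probability(30):.2f}")
print(f"P(detect 30 copies/cell, 50,000 tags): "
      f"{detection_probability(30, total_tags=50_000):.3f}")
print(f"P(no CATG in 2 kb): {prob_no_site(2000):.4f}")
```

Under these assumptions:

- a single tag corresponds to about 27 copies per cell;
- a transcript at 30 copies per cell would be detected only about two times in three;
- sequencing 50,000 tags would raise that to over 99%;
- a 2-kb stretch of random sequence lacks a CATG site with a probability of only about 0.0004, which is consistent with placing the NIS tag in the missing 3' UTR.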
Acknowledgments
We thank A. H. C. van Kampen and M. van der Mee
(Bioinformatics Laboratory, AMC) for their work in designing the SAGE
analysis software USAGE used in this study.
Footnotes
1 The Dr. Ludgardine Bouwman foundation and the Stichting
Kindergeneeskundig Kankeronderzoek (SKK) financially supported this
study. This study was supported in part by an ESPE Research Fellowship,
sponsored by Novo Nordisk A/S.
Received October 20, 1999.
Revised December 7, 1999.
Accepted December 20, 1999.
References
1. Bertelsen AH, Velculescu VE. 1998 High-throughput gene expression analysis using SAGE. Drug Discovery Today. 3:152–159.
2. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. 1995 Serial analysis of gene expression. Science. 270:484–487.
3. Zhang L, Zhou W, Velculescu VE, et al. 1997 Gene expression profiles in normal and cancer cells. Science. 276:1268–1272.
4. Velculescu VE, Zhang L, Zhou W, et al. 1997 Characterization of the yeast transcriptome. Cell. 88:243–251.
5. Madden SL, Galella EA, Zhu J, Bertelsen AH, Beaudry GA. 1997 SAGE transcript profiles for p53-dependent growth regulation. Oncogene. 15:1079–1085.
6. Dai G, Levy O, Carrasco N. 1996 Cloning and characterization of the thyroid iodide transporter. Nature. 379:458–460.
7. Scott DA, Wang R, Kreman TM, Sheffield VC, Karniski LP. 1999 The Pendred syndrome gene encodes a chloride-iodide transport protein. Nat Genet. 21:440–443.
8. Taurog A. 1991 Hormone synthesis. In: Braverman LE, Utiger RD, eds. The thyroid. Philadelphia: JB Lippincott; 51–98.
9. Bikker H, Baas F, de Vijlder JJM. 1997 Molecular analysis of mutated thyroid peroxidase detected in patients with total iodide organification defects. J Clin Endocrinol Metab. 82:649–653.
10. Clifton-Bligh RJ, Wentworth JM, Heinz P, et al. 1998 Mutation of the gene encoding human TTF-2 associated with thyroid agenesis, cleft palate and choanal atresia. Nat Genet. 19:399–401.
11. Macchia PE, Lapi P, Krude H, et al. 1998 PAX8 mutations associated with congenital hypothyroidism caused by thyroid dysgenesis. Nat Genet. 19:83–86.
12. DeVriendt K, VanHol C, Matthijs G, DeZegher F. 1998 Deletion of thyroid transcription factor-1 gene in an infant with neonatal thyroid dysfunction and respiratory failure. N Engl J Med. 338:1317–1318.
13. VanHauwe P, Everett LA, Coucke P, et al. 1998 Two frequent missense mutations in Pendred syndrome. Hum Mol Genet. 7:1099–1104.
14. Leseney A-M, Deme D, Legue O, et al. 1999 Biochemical characterization of a Ca²⁺/NAD(P)H-dependent H₂O₂ generator in human thyroid tissue. Biochimie. 81:373–380.
15. Ausubel FM, Brent R, Kingston RE, et al. 1999 Chanda VB, ed. Current protocols in molecular biology. John Wiley & Sons, Inc.
16. Church GM, Gilbert W. 1984 Genomic sequencing. Proc Natl Acad Sci USA. 81:1991–1995.
17. Welle S, Bhatt K, Thornton CA. 1999 Inventory of high-abundance mRNAs in skeletal muscle of normal men. Genome Res. 9:506–513.
18. Smanik PA, Liu Q, Furminger TL, et al. 1996 Cloning of the human iodide symporter. Biochem Biophys Res Commun. 226:339–345.
19. Kalina M, Pearse AGE. 1971 Ultrastructural localization of calcitonin in C-cells of dog thyroid; an immunocytochemical study. Histochemie. 26:1.
20. Cox F, Gestautas J, Rapoport B. 1988 Molecular cloning of cDNA corresponding to mRNA species whose steady state levels in the thyroid are enhanced by thyrotropin. Homology of one of these sequences with ferritin H. J Biol Chem. 263:7060–7067.
21. Hastie ND, Bishop JO. 1976 The expression of three abundance classes of messenger RNA in mouse tissues. Cell. 9:761–774.