From the Department of Cell Biology and Human Anatomy, University of California School of Medicine, Davis, California.
| Abstract |
|---|
METHODS. The primary sequence and gene structure of CP49 from a third vertebrate order was determined from a combination of cDNA and genomic sequencing. Protein product was characterized by SDS-PAGE and Western blot analysis. Consensus features and phylogenetic relationships were identified by multiple alignment. Coiled-coil analysis was conducted to define central rod domains.
RESULTS. Trout CP49 is unique among CP49s in having a 39-amino-acid tail domain and shows both unique sequence and allelic variation at the LNDR motif. Comparison of consensus sequences identified unprecedented divergence between CP49s and other type I cytokeratins, including a shortened central rod domain that is conserved among CP49s, but distinct from type I cytokeratins.
CONCLUSIONS. The considerable differences that have emerged between the consensus features of the type I cytokeratins and the CP49s suggest that the beaded filament serves a significantly different function from intermediate filaments in other epithelia and that type I cytokeratins may have limited utility as a model for studies on lens beaded filaments. These differences, in concert with consensus features identified among CP49s, suggest sites that are probably critical to CP49 function in the lens fiber cell.
| Introduction |
|---|
Determination of primary sequence and gene structure has established that both beaded filament (BF) proteins are part of the intermediate filament (IF) family of proteins.2 13 14 15 16 17 18 19 20 Although the cytoplasmic IF proteins vary considerably in both size and primary sequence, they have historically been unified into a gene family on the basis of several properties: conservation of gene structure; a common domain structure consisting of variable head and tail domains flanking a central rod domain that is conserved in size and in predicted subdomain structure; a low level of overall sequence identity, including strong conservation at two short motifs found at the beginning and end of the central rod domain; and the ability to assemble into 10-nm IFs.
Although the BF proteins are clearly a part of the IF family, they have also been noteworthy for the degree to which they lack features that are otherwise highly conserved among all cytoplasmic IF proteins, features that have been hypothesized or experimentally established to be critical for assembly. The BF proteins that have been sequenced thus far have been limited to mammalian and avian examples. To widen the scope of comparison among CP49s and to help distinguish features that are conserved from those that are species-specific variants, we sequenced a CP49 from a third vertebrate order, fish. The data reported herein establish that some of the features of avian-mammalian CP49s that are noteworthy for their variance from other type I cytokeratins are not necessarily conserved among all CP49s. Multiple alignment of the CP49s defines features and residues that have been strongly conserved and highlights the unusual degree to which the CP49s have diverged from the remainder of the type I cytokeratins. Such information identifies residues and properties that are conserved, presumably because of functional importance, and provides a basis for identifying features that adapt the type I cytokeratin CP49 to its function in the lens fiber cell.
| Materials and Methods |
|---|
Determination of trout CP49 sequence was initiated by PCR, using degenerate primers designed from conserved regions identified by alignment of CP49s: 5'-TAYGARAAYGARCARCCNTT-3' and 5'-YTCNATNTCRTGCCARTG-3'. PCR with these primers yielded a 500-bp product, which permitted extension by 3'-rapid amplification of cDNA ends (RACE), using gene-specific and oligo dT primers, yielding an additional 460 bp. 5'-RACE was also conducted, using gene-specific primers and oligo dC, after deoxyguanosine triphosphate (dGTP) tailing of reverse-transcribed trout lens RNA.
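As an illustration of what the degenerate primers above encode, the following minimal sketch counts how many distinct oligonucleotides each primer represents under the standard IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base). The function names are hypothetical and the lookup table covers only the codes that occur in these two primers; it is not a description of how the primers were designed.

```python
from itertools import product

# Standard IUPAC nucleotide codes; only the codes present in the two
# primers quoted in the text are included (assumption: no others needed).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

FORWARD = "TAYGARAAYGARCARCCNTT"
REVERSE = "YTCNATNTCRTGCCARTG"


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer)):
        yield "".join(combo)


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(primer), "nt, degeneracy", degeneracy(primer))
```

Both primers work out to a 128-fold degenerate pool, which is the kind of figure one would check before ordering such a primer mix.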
Trout genomic DNA was isolated from trout liver with a kit (DNeasy; Qiagen, Chatsworth, CA). Ambiguity in the cDNA sequence for nucleotide 296 (Fig. 1) was resolved by PCR amplification of genomic DNA isolated from six different individuals.
SDS-PAGE and Immunoblot Analysis
Decapsulated trout lenses were homogenized in 50 mM Tris, 5 mM
EDTA, and a cocktail of protease inhibitors (Complete Mini; Roche
Biochemicals, Indianapolis, IN) and fractionated into buffer-soluble
and buffer-insoluble fractions by centrifugation at 50,000g
for 30 minutes. The buffer-insoluble fraction was solubilized for
SDS-PAGE, resolved on 12.5% SDS-polyacrylamide gels, and transferred
to polyvinylidene fluoride (PVDF) membrane (Immobilon P; Millipore,
Bedford, MA). Immunoblots were probed with rabbit antiserum raised
against recombinant mouse CP49, at 1:2000 dilution. Primary antibody
was visualized with alkaline phosphatase-conjugated goat anti-rabbit
antibody and nitroblue tetrazolium chloride
(NBT)/5-bromo-4-chloro-3-indolyl phosphate (BCIP) substrate-chromogen.
Sequence Analysis
Multiple alignments were conducted at http://www.toulouse.inra.fr/multalin.html (provided in the public domain by Institut National de la Recherche Agronomique, Toulouse, France), using the method of Corpet et al.22 Paircoil analysis was conducted at http://nightingale.lcs.mit.edu/cgi-bin/score (provided in the public domain by Massachusetts Institute of Technology, Cambridge, MA), using the method of Berger et al.23 To permit direct alignment and comparison of the resultant Paircoil graphs, all Paircoil analyses were conducted on uniformly sized fragments of IF proteins. Fragments consisted of the central rod domain, plus 15 residues of the head domain (counting from the conserved L of the LNDR motif), and 4 residues of the tail domain (8 residues past the conserved Y of the TYRKLLEGE motif). These residues are identified in Figure 3 by a symbol below the residue.
| Results |
|---|
The deduced trout CP49 amino acid sequence was used to probe the protein databases and showed the highest degree of similarity to the existing human, bovine, murine, and chicken CP49s, with sequence identities ranging from 49% to 52%. This level of identity is typical of that seen between IF homologues from orders this distant over the evolutionary spectrum. The best non-CP49 match was the type I cytokeratin 18 at 32% identity, followed by several dozen different type I cytokeratins at comparable levels of identity. These results are similar to those achieved when other CP49s were used to probe protein sequence databases.14
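The identity percentages quoted above come from database searches whose exact scoring is not described here. As a hedged sketch of the underlying quantity, the snippet below computes percent identity for two pre-aligned sequences, counting only columns in which neither sequence has a gap. The function name and the choice of denominator are assumptions (other conventions divide by alignment length or by the shorter sequence), and the example input is the pair of rod-end motif variants that appear in this article rather than full-length proteins.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two variants of the rod-end motif mentioned in this article.
print(round(percent_identity("TYRRLLEGE", "TYRKLLEGE"), 1))
```

The two motifs differ at one of nine positions, so the call reports 8/9 identity.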
Trout CP49 Gene Structure
Gene structure is generally well conserved among IF proteins, particularly within the segment of the gene that encodes the rod domain.24 25 26 27 28 The number and location of introns are strongly conserved within an IF class, mirroring the subgrouping of IF proteins that is achieved through primary sequence analysis. Thus, IF type can be determined by gene structure as well as by primary sequence similarity.29 Figure 2 shows the nucleotide sequence at the intron-exon boundaries for introns that we have identified in the trout CP49 gene, along with the approximate size of the introns, estimated by PCR-agarose gel electrophoresis. We show that introns C, E, F, and G, which are common to type I, II, and III IF proteins, are present in trout CP49. These introns are identical in location, and in the phase of the triplet codon that they interrupt, to those shown for human CP49 and for type I cytokeratins in general. Intron H, which is generally present in type I, II, and III IF genes, and in the human CP49 gene, is absent from trout. Notably, intron H is also absent from the chicken CP49 gene,30 as well as the type I cytokeratin K19 gene.31
We identified an intron in the tail domain of the trout CP49 gene as well (located in Fig. 1 at nucleotide 1215).
CP49 Consensus Features
To identify CP49 consensus features, we conducted a multiple alignment of all CP49s sequenced to date (Fig. 3). The most striking difference between trout and other CP49s was the presence of a 39-amino-acid tail domain in trout CP49. The presence of this tail domain leads to an increase in molecular weight of approximately 4.2 kDa and a corresponding shift in mobility in SDS-PAGE/immunoblot analysis. This shift can be seen in Figure 4, which presents an immunoblot of trout lens (lane A) and bovine lens (lane B) probed with rabbit antiserum raised against recombinant mouse CP49. This blot also confirms the strong immunologic relationship among the trout, bovine, and murine CP49s.
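The mass contribution of the 39-residue tail can be checked with back-of-the-envelope arithmetic. The sketch below assumes the commonly used approximation of about 110 Da per amino acid residue; that figure is an assumption introduced here, not a value from this study, and the true mass depends on the actual tail sequence.

```python
# Assumption: a generic average residue mass of ~110 Da, a common rule of
# thumb for proteins; the actual tail composition would shift this.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimated_mass_kda(n_residues: int,
                       avg_residue_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Rough mass in kDa of a peptide segment of n_residues amino acids."""
    return n_residues * avg_residue_da / 1000.0


tail_kda = estimated_mass_kda(39)
# Average residue mass implied by the ~4.2 kDa figure quoted in the text.
implied_da = 4.2 * 1000.0 / 39

print(f"estimated tail mass: {tail_kda:.2f} kDa")
print(f"implied average residue mass: {implied_da:.1f} Da")
```

The rule-of-thumb estimate lands within about 0.1 kDa of the reported value, which is consistent with the stated shift of approximately 4.2 kDa.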
Two short motifs, located at the beginning and end of the rod domain, are among the best conserved regions of IF proteins. These motifs are unusually sensitive to mutations and are considered critical to IF assembly.32 33 34 35 36 37 38 The consensus sequences for these two motifs in type I cytokeratins are LNDR and TYRRLLEGE. The former of these two is by far the best conserved. The CP49 homologues to these motifs are highlighted in Figure 3 (starting at residues 154 and 460, respectively). Alignment of the LNDR region from 10 human type I cytokeratins and a sampling of cytokeratins from several vertebrate orders establishes that the LNDR sequence is 100% conserved among the non-CP49 type I cytokeratins (though exceptions are likely to be found). In contrast, the CP49s show variation from the type I consensus at three of four residues in this motif. Moreover, this motif is not well conserved, even among the CP49s, with three permutations identified in the five species sequenced to date. Only the first (L) and fourth (C) residues are conserved among CP49s in these sequences. We noted allelic variations in this motif even within the population of trout we examined, with both LNSC and LNNC identified.
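The comparison described above can be made concrete with a small sketch that reports which positions of a motif differ from the type I consensus and which positions are invariant across a set of variants. Only the two trout alleles named in the text (LNSC and LNNC) are used as input, because the third CP49 permutation is not spelled out in this section; the function names are hypothetical.

```python
TYPE_I_CONSENSUS = "LNDR"
TROUT_ALLELES = ["LNSC", "LNNC"]  # the two alleles reported in the text


def differing_positions(reference: str, variant: str) -> list:
    """1-based positions at which variant differs from reference."""
    if len(reference) != len(variant):
        raise ValueError("motifs must have equal length")
    return [i for i, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]


def invariant_positions(variants: list) -> dict:
    """Map 1-based position -> residue for columns identical in all variants."""
    conserved = {}
    for i, column in enumerate(zip(*variants), start=1):
        if len(set(column)) == 1:
            conserved[i] = column[0]
    return conserved


for allele in TROUT_ALLELES:
    print(allele, "differs from", TYPE_I_CONSENSUS, "at",
          differing_positions(TYPE_I_CONSENSUS, allele))
print("invariant among trout alleles:", invariant_positions(TROUT_ALLELES))
```

With only the trout alleles as input, position 2 also appears invariant; per the text, adding the remaining CP49 permutation would leave only the first (L) and fourth (C) positions conserved.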
Also indicated in Figure 3 are the sites where mutations in human CP49 have been implicated as cataractogenic (marked in Fig. 3 at amino acids 282 and 343). In both cases the residue is 100% conserved among the CP49s.
Domain Structure
Paircoil analysis is a means of predicting whether a given primary sequence is likely to engage in the formation of a coiled-coil dimer. The coiled-coil is a dimer formed by the pairing of two stretches of α-helix. The dimer, in turn, exhibits a gentle supercoiling, or "coiling of the coils." The initial dimerization appears to require the presence of predominantly hydrophobic residues along one edge of each helix (at the 1 and 4 positions of the heptads that comprise the helix). The algorithm is based on empiric data derived from the protein crystal database, where such dimers have been demonstrated. Such analysis predicts the presence of a central rod domain in all cytoplasmic IF proteins. This domain is predicted to be rich in helical regions (coils) characterized by the heptad repeat pattern that predicts coiled-coil interactions. The coils are separated by short regions (linkers) that are not predicted to be α-helical. The overall size of the rod domain has been strongly conserved among IF proteins, a feature considered critical to the initial pairing of IF proteins as a coiled-coil dimer.39 We therefore used Paircoil analysis to characterize the predicted rod domain in the CP49s and compared this with the prediction generated for several type I cytokeratins.
Figure 5 presents a graphic representation of eight human type I cytokeratins, and five CP49s. The probability of coiled-coil formation is scored on the y-axis, from 0 to 1, with 1 the maximal probability and 0.5 the cutoff. The amino acid residue is numbered on the x-axis. To permit visual comparison of overall rod domain size and the distribution of coil and noncoil subdomains, we analyzed the same sized fragment from each protein, a fragment that includes the rod domain, plus a small amount of flanking sequence (described in the Materials and Methods section).
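Reading predicted coil subdomains off such a profile amounts to finding contiguous runs of residues whose score exceeds the cutoff. The following is a minimal sketch of that step only; it does not implement the Paircoil scoring itself, the function name is hypothetical, and the score list is made-up toy data rather than output for any real protein.

```python
def regions_above(scores, cutoff=0.5):
    """Return 1-based inclusive (start, end) runs where score > cutoff."""
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = position  # a new run begins here
        elif start is not None:
            regions.append((start, position - 1))  # run ended at previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the last residue
    return regions


# Hypothetical per-residue probabilities, for illustration only.
toy_profile = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.2, 0.6, 0.95, 0.9]
print(regions_above(toy_profile))  # [(3, 5), (8, 10)]
```

Raising or lowering `cutoff` changes where each run starts and ends, which is why comparisons between proteins are most meaningful when the same cutoff and the same fragment boundaries are used throughout.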
Figures 5h through 5l show the five CP49s. These, too, show features that emerged as generally common to the CP49s: The first region to exceed the 0.5 default cutoff in CP49s does not occur until a point that is equivalent to the second half of rod 1b in the type I cytokeratins. Thus, by this criterion, CP49s have no predicted rod domain 1a, although the corresponding region shows a conserved, lower probability. Rod domain 1b starts later and is shorter overall than its counterpart in type I cytokeratins.
Rod domain 2 in the CP49s begins and ends at a site equivalent to that for rod 2 in the cytokeratins, but exhibits different features: The strength of the signal in CP49 rod 2 is generally lower, with human as an exception; and the CP49 rod 2 is subdivided by a gap in signal strength. This gap occurs where the heptad repeat pattern "stutters" (shifts phase). This stutter is demarcated by asterisks over residues 417-426 in Figure 3. The presence of a stutter in the heptad repeat pattern is a well-conserved feature of rod domain 2 in IF proteins and aligns exactly with the interruption noted here. Thus, CP49s appear to have conserved this feature of IF proteins.
Although variation exists in the Paircoil profiles of individual proteins, particularly in the more distant type I cytokeratins, the Paircoil analysis suggests conservation of patterns, regardless of where the cutoff value may be set. The most notable difference between CP49s and the other type I cytokeratins is that the CP49s are all predicted to have a shorter central rod domain. This observation is particularly interesting, because similar analysis of CP49's assembly partner, CP115, also showed a shorter rod domain.2 15 This analysis, theoretical until structural data can be generated, suggests that the CP49s have diverged from the majority of type I cytokeratins in secondary structure as well.
It has been postulated that the large number of IF proteins found in a given species have been derived from a single ancestral gene by divergent evolution,24 an assumption based on the gene structure and sequence conservation seen among most IFs. Insight can be gained into the IF family tree by establishing percent accepted mutation (PAM) distances between IF proteins within a species as though they were homologues from different species. This analysis integrates the degree of sequence divergence between a given protein and all others in the comparison group and sums the process for each of the proteins within the group. Figure 6 shows such an analysis of 25 human IF proteins, including CP49 and CP115. The historic clustering of cytoplasmic IF proteins into types I through IV based on primary sequence of the central rod domain, tissue distribution, and gene structure is reiterated in such an analysis. The type I cytokeratins (K9-20), type II cytokeratins (K1-8), type III IF proteins (GFAP, DES, VIM, PER), and type IV neurofilament proteins (NFH, NFM, NFL) form distinct clusters. The two BF proteins, CP49 and CP115 (filensin), stand out as the most distant members of the human IF family.
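The exact procedure used to derive PAM distances is not given in this section. As a hedged illustration of how an observed fraction of differing residues can be converted into an estimated number of accepted substitutions, the sketch below uses Kimura's empirical correction for protein sequences, d = -ln(1 - p - 0.2p²), expressed in PAM units as 100 × d. This is a swapped-in, widely used approximation and should not be read as the method behind Figure 6; the function names are hypothetical, and the identities used as input are the database-search values quoted earlier in the Results.

```python
import math


def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical distance (substitutions per site) from the
    observed fraction p of differing aligned residues."""
    inner = 1.0 - p - 0.2 * p * p
    if not 0.0 <= p <= 1.0 or inner <= 0.0:
        # The correction is undefined for very divergent pairs (p above ~0.85).
        raise ValueError("p is outside the range where the correction applies")
    return -math.log(inner)


def pam_units(p: float) -> float:
    """Distance expressed as accepted point mutations per 100 residues."""
    return 100.0 * kimura_protein_distance(p)


# Identities quoted in the Results: 52% and 49% (other CP49s), 32% (K18).
for identity in (0.52, 0.49, 0.32):
    print(f"identity {identity:.2f} -> about {pam_units(1.0 - identity):.0f} PAM")
```

Because the correction grows steeply as identity falls, a modest drop in percent identity can correspond to a much larger estimated evolutionary distance.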
| Discussion |
|---|
In this report, we extend the analysis of the CP49s beyond the mammalian and avian sequences that have been reported thus far. The data presented herein establish that some of the features hypothesized to be unique to the CP49s are conserved in a species from a third vertebrate order. This suggests that such features have resulted from strong selective pressure and are important to biological function in the lens fiber cells. Conversely, some of the features considered unique to the CP49s are less well conserved, being absent from the trout CP49. By expanding the database of CP49 sequence information, we contribute to identification of those residues, motifs, and properties that have been retained across a wider phylogenetic spectrum and help discriminate these from variations that may be species specific.
One of the more interesting differences between the CP49s and other type I cytokeratins occurs at the "LNDR" motif found near the beginning of the central rod domain in all IF proteins. This motif is extremely well conserved, not only among the several type I cytokeratins in humans, but also in type I cytokeratins from several vertebrate orders. The importance suggested by its strong conservation is confirmed by the sensitivity of this motif to disease-causing mutations. A large body of elegant work has established that a mutation in the LNDR motif that causes an R→C substitution at the fourth residue, for example, is the cause of one form of the human skin-blistering disorder epidermolysis bullosa simplex (EBS).37 38 40 This substitution in some way compromises the capacity of epidermal IFs to provide the necessary resistance to mechanical trauma, resulting in a separation of epidermal layers and blistering. This has been confirmed by experimental introduction of these same mutations in mice. Yet in the CP49s, a C at that same fourth position is a conserved feature of all the CP49s thus far sequenced, including the trout. Further, this motif as a whole shows a relatively high degree of variability, even within the CP49s, with three permutations reported in the five species that have been sequenced. We note that allelic variation occurs even within the population of trout included in this study. The variability demonstrated at this motif suggests that it experiences weaker selective pressure than its counterpart in the other type I cytokeratins, implying that it plays a less important role in the biology of this protein.
It is interesting to speculate that the variability seen in the CP49 homologue of the LNDR motif is related to the changes seen in the predicted central rod domain of CP49. Paircoil analysis shown in Figure 5 suggests that the central rod domain of CP49 not only begins well after the LNDR motif but is also shorter in overall size than that seen in other IF proteins. If the LNDR motif represents a start point for anchoring the formation of a coiled-coil dimer, then the shifting of this start point farther downstream in CP49 may lower the selective pressure on the LNDR homologue in the CP49s, resulting in the emergence of sequence variability. This leads to the hypothesis that a CP49 should then show a high degree of sequence conservation at the beginning of its foreshortened rod domain. In fact, this is the case. One of the longest runs of absolutely conserved amino acids among the CP49s occurs at the very beginning of what is predicted by Paircoil analysis to be the rod domain (amino acids 239-250 in CP49 Conserved, Fig. 3).
The difference between CP49s and the other type I cytokeratins at the LNDR motif is of particular interest because of the established importance of the motif in human disease. However, we noted similar divergence in the CP49s at other residues that are highly conserved among type I cytokeratins. To identify residues that are likely to be critical to type I cytokeratins, we conducted multiple alignment of all human type I cytokeratins, plus representatives of type I cytokeratins from several different vertebrate orders (Fig. 3). We identified 43 residues that were 100% conserved in this population, implying functional importance. This suggests that any newly identified type I cytokeratin would have a very high probability of exhibiting the same residues at all or most of these sites. The CP49s are identical at only 24 of the 43 residues (56%), further reinforcing the size of the gap that exists between the CP49s and the remainder of the type I cytokeratins.
The mammalian and avian CP49s were noteworthy for the absence of the tail domain, a feature that appeared to be unique among IF proteins and conserved among the CP49s.14 17 This would suggest that a tail domain is either not important to the function of CP49 or perhaps even a detriment to it. However, the trout CP49 exhibits a 39-amino-acid tail, comparable in size, but not sequence, to that typical of type I cytokeratins.
In determining the location of introns for the trout CP49 gene, we noted that the trout CP49 gene does not have intron H (Fig. 2). This intron is commonly found at the very end of the rod domain among type I, II, and III IF proteins and in human CP49 as well. The absence of intron H in the trout CP49 gene raises the possibility that the trout CP49 acquired a tail domain through a mutation that eliminated a splice site. In the absence of selective pressure against the tail domain, this feature may have persisted. It is worth noting that the chicken CP49 also has no intron H,30 but also that the chicken CP49 has no tail domain.
Alignment of the CP49s shows a strong run of absolutely conserved sequence in the head domain of CP49s (RRALGISSVFLQGLRS, starting at amino acid residue 102 in Fig. 3). This region stands out not only because this sequence is so well conserved in CP49s, but also because there is no comparably conserved region in the type I cytokeratins, suggesting that this region of the head domain has assumed particular importance in CP49.
Among the members of an IF type, such as the type I cytokeratins, the level of sequence identity, the conservation of rod domain gene structure, and the similarity in properties are generally quite strong. The CP49s are clearly an exception to this tendency. CP49s do, in fact, have a rod domain gene structure that is similar to that of most type I cytokeratins14 and that is distinct from all other IF types. They are thus clearly type I cytokeratins. However, beyond this, the relationship to the type I cytokeratins weakens considerably, suggesting that the CP49s have experienced a dramatically different set of selective pressures, resulting in considerable change in both primary sequence and secondary structure. CP49s are closer in primary sequence identity to the type I cytokeratins than to other IF types, but only marginally so. Similarly, the motifs and residues that are either absolutely or at least extremely well conserved among the type I cytokeratins have undergone extensive divergence in the CP49s. The same contrast emerges for features such as secondary structure. The ultimate question, of course, is how these differences adapt the CP49 function to the unique biology of the lens fiber cell. To answer this question will undoubtedly require loss-of-function/gain-of-function mutational studies that target areas of the CP49s identified as strongly conserved through analyses such as that presented in this report.
It may be predicted that the production of a BF instead of an intermediate filament would require substantial changes in primary and secondary structure. Such a hypothesis is appealing because it rationalizes the existence of two distinct categories of IF: the classic 8- to 11-nm IFs and the BFs. In such a case, the features and sequence motifs that are conserved among the IF proteins would be relevant to assembly into 10-nm IFs, whereas those divergences cataloged in the BF proteins would explain their alternative assembly outcome. However, Goulielmos et al.41 and Carter et al.42 have reported that the BF proteins assemble in vitro into classic 10-nm filaments. Thus, the rather dramatic changes seen in the BF proteins do not eliminate their capacity to assemble into 10-nm filaments. This is a striking observation given the extreme degree to which CP49, as well as its assembly partner CP115, has diverged from the IF consensus.
That CP49 and CP115 form a heteropolymer only adds to the complexity of the story. In this vein, it is worth noting that one of CP115's most unusual features is a shortened central rod domain.2 15 Multiple alignment showed that CP115 has homologues of the LNDR and TYRKLLEGE motifs that demarcate the beginning and end of the rod domain, but that these motifs are 28 amino acids closer together in CP115 than in any other member of the IF family.15 Paircoil analysis of CP115 confirms that the predicted rod domain is shorter as well and is more in line with that predicted for CP49. This leads to the hypothesis that CP49 and CP115 form heterodimers with central rod domains that are matched in size. This hypothesis is consistent with yeast two-hybrid data that support the formation of a CP49-CP115 heterodimer.15 Although such predictions must be confirmed by experimental data, they prompt consideration of the need for mutual evolution of these two assembly partners.
Neither IF nor BF proteins have been crystallized; thus, their structure and the filaments they form are understood in only the sketchiest of terms. It will be most interesting to finally understand exactly what fiber cell-specific functions are permitted by the unusual divergence seen in the BF proteins. Of equal interest will be the determination of how point mutations in structural proteins such as these are causative in some forms of inherited human cataract.11 12
| Footnotes |
|---|
Submitted for publication August 1, 2001; accepted September 5, 2001.
Commercial relationships policy: N.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding Author: Paul FitzGerald, Department of Cell Biology and Human Anatomy, University of California School of Medicine, One Shields Avenue, Davis, CA 95616; pgfitzgerald{at}ucdavis.edu.