ID   POL_HTL3P               Reviewed;        1440 AA.
AC   Q4U0X6;
DT   28-NOV-2006, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 4.
DT   03-AUG-2022, entry version 106.
DE   RecName: Full=Gag-Pro-Pol polyprotein;
DE   AltName: Full=Pr160Gag-Pro-Pol;
DE   Contains:
DE     RecName: Full=Matrix protein p19;
DE              Short=MA;
DE   Contains:
DE     RecName: Full=Capsid protein p24;
DE              Short=CA;
DE   Contains:
DE     RecName: Full=Nucleocapsid protein p15-pro;
DE              Short=NC';
DE              Short=NC-pro;
DE   Contains:
DE     RecName: Full=Protease;
DE              Short=PR;
DE              EC=3.4.23.-;
DE   Contains:
DE     RecName: Full=p1;
DE   Contains:
DE     RecName: Full=Reverse transcriptase/ribonuclease H;
DE              Short=RT;
DE              EC=2.7.7.49;
DE              EC=2.7.7.7;
DE              EC=3.1.26.4;
DE   Contains:
DE     RecName: Full=Integrase;
DE              Short=IN;
DE              EC=2.7.7.- {ECO:0000250|UniProtKB:P03363};
DE              EC=3.1.-.- {ECO:0000250|UniProtKB:P03363};
GN   Name=gag-pro-pol;
OS   Human T-cell leukemia virus 3 (strain Pyl43) (HTLV-3).
OC   Viruses; Riboviria; Pararnavirae; Artverviricota; Revtraviricetes;
OC   Ortervirales; Retroviridae; Orthoretrovirinae; Deltaretrovirus.
OX   NCBI_TaxID=406769;
OH   NCBI_TaxID=9606; Homo sapiens (Human).
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX   PubMed=16973592; DOI=10.1128/jvi.00799-06;
RA   Calattini S., Chevalier S.A., Duprez R., Afonso P., Froment A., Gessain A.,
RA   Mahieux R.;
RT   "Human T-cell lymphotropic virus type 3: complete nucleotide sequence and
RT   characterization of the human tax3 protein.";
RL   J. Virol. 80:9876-9888(2006).
CC   -!- FUNCTION: Matrix protein p19 targets Gag, Gag-Pro and Gag-Pro-Pol
CC       polyproteins to the plasma membrane via a multipartite membrane-binding
CC       signal that includes its myristoylated N-terminus. Also mediates
CC       nuclear localization of the preintegration complex (By similarity).
CC       {ECO:0000250}.
CC   -!- FUNCTION: Capsid protein p24 forms the conical core of the virus that
CC       encapsulates the genomic RNA-nucleocapsid complex. {ECO:0000250}.
CC   -!- FUNCTION: Nucleocapsid protein p15 is involved in the packaging and
CC       encapsidation of two copies of the genome. {ECO:0000250}.
CC   -!- FUNCTION: The aspartyl protease mediates proteolytic cleavages of Gag,
CC       Gag-Pro and Gag-Pro-Pol polyproteins during or shortly after the
CC       release of the virion from the plasma membrane. Cleavages take place as
CC       an ordered, step-wise cascade to yield mature proteins. This process is
CC       called maturation. Displays maximal activity during the budding process
CC       just prior to particle release from the cell. Hydrolyzes host EIF4GI in
CC       order to shut off the capped cellular mRNA translation. The resulting
CC       inhibition of cellular protein synthesis serves to ensure maximal viral
CC       gene expression and to evade host immune response (By similarity).
CC       {ECO:0000250}.
CC   -!- FUNCTION: Reverse transcriptase (RT) is a multifunctional enzyme that
CC       converts the viral RNA genome into dsDNA in the cytoplasm, shortly
CC       after virus entry into the cell. This enzyme displays a DNA polymerase
CC       activity that can copy either DNA or RNA templates, and a ribonuclease
CC       H (RNase H) activity that cleaves the RNA strand of RNA-DNA
CC       heteroduplexes in a partially processive 3' to 5' endonucleolytic mode.
CC       Conversion of viral genomic RNA into dsDNA requires many steps. A tRNA-
CC       Pro binds to the primer-binding site (PBS) situated at the 5'-end of
CC       the viral RNA. RT uses the 3' end of the tRNA primer to perform a short
CC       round of RNA-dependent minus-strand DNA synthesis. The reading proceeds
CC       through the U5 region and ends after the repeated (R) region which is
CC       present at both ends of viral RNA. The portion of the RNA-DNA
CC       heteroduplex is digested by the RNase H, resulting in a ssDNA product
CC       attached to the tRNA primer. This ssDNA/tRNA hybridizes with the
CC       identical R region situated at the 3' end of viral RNA. This template
CC       exchange, known as minus-strand DNA strong stop transfer, can be either
CC       intra- or intermolecular. RT uses the 3' end of this newly synthesized
CC       short ssDNA to perform the RNA-dependent minus-strand DNA synthesis of
CC       the whole template. RNase H digests the RNA template except for a
CC       polypurine tract (PPT) situated at the 5' end of the genome. It is not
CC       clear if both polymerase and RNase H activities are simultaneous. RNase
CC       H probably can proceed both in a polymerase-dependent (RNA cut into
CC       small fragments by the same RT performing DNA synthesis) and a
CC       polymerase-independent mode (cleavage of remaining RNA fragments by
CC       free RTs). Secondly, RT performs DNA-directed plus-strand DNA synthesis
CC       using the PPT that has not been removed by RNase H as primer. PPT and
CC       tRNA primers are then removed by RNase H. The 3' and 5' ssDNA PBS
CC       regions hybridize to form a circular dsDNA intermediate. Strand
CC       displacement synthesis by RT to the PBS and PPT ends produces a blunt
CC       ended, linear dsDNA copy of the viral genome that includes long
CC       terminal repeats (LTRs) at both ends (By similarity). {ECO:0000250}.
CC   -!- FUNCTION: Integrase catalyzes viral DNA integration into the host
CC       chromosome, by performing a series of DNA cutting and joining
CC       reactions. This enzyme activity takes place after virion entry into a
CC       cell and reverse transcription of the RNA genome into dsDNA. The first
CC       step in the integration process is 3' processing. This step requires a
CC       complex comprising the viral genome, matrix protein, and integrase.
CC       This complex is called the pre-integration complex (PIC). The integrase
CC       protein removes 2 nucleotides from each 3' end of the viral DNA,
CC       leaving recessed 3' ends with free hydroxyl groups. In the second step,
CC       the PIC accesses cell chromosomes during cell division. In the third
CC       step, termed strand transfer, the integrase protein joins the
CC       previously processed 3' ends to the 5'-ends of strands of target
CC       cellular DNA at the site of integration. The 5'-ends are produced by
CC       integrase-catalyzed staggered cuts, 5 bp apart. A Y-shaped, gapped,
CC       recombination intermediate results, with the 5'-ends of the viral DNA
CC       strands and the 3' ends of target DNA strands remaining unjoined,
CC       flanking a gap of 5 bp. The last step is viral DNA integration into the
CC       host chromosome. This involves host DNA repair synthesis in which the
CC       5 bp gaps between the unjoined strands (see above) are filled in and
CC       then ligated (By similarity). {ECO:0000250}.
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=Endonucleolytic cleavage to 5'-phosphomonoester.; EC=3.1.26.4;
CC         Evidence={ECO:0000255|PROSITE-ProRule:PRU00408};
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=a 2'-deoxyribonucleoside 5'-triphosphate + DNA(n) =
CC         diphosphate + DNA(n+1); Xref=Rhea:RHEA:22508, Rhea:RHEA-COMP:17339,
CC         Rhea:RHEA-COMP:17340, ChEBI:CHEBI:33019, ChEBI:CHEBI:61560,
CC         ChEBI:CHEBI:173112; EC=2.7.7.49; Evidence={ECO:0000255|PROSITE-
CC         ProRule:PRU00405};
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=a 2'-deoxyribonucleoside 5'-triphosphate + DNA(n) =
CC         diphosphate + DNA(n+1); Xref=Rhea:RHEA:22508, Rhea:RHEA-COMP:17339,
CC         Rhea:RHEA-COMP:17340, ChEBI:CHEBI:33019, ChEBI:CHEBI:61560,
CC         ChEBI:CHEBI:173112; EC=2.7.7.7; Evidence={ECO:0000255|PROSITE-
CC         ProRule:PRU00405};
CC   -!- COFACTOR:
CC       Name=Mg(2+); Xref=ChEBI:CHEBI:18420; Evidence={ECO:0000250};
CC       Note=Binds 2 magnesium ions for reverse transcriptase polymerase
CC       activity. {ECO:0000250};
CC   -!- COFACTOR:
CC       Name=Mg(2+); Xref=ChEBI:CHEBI:18420; Evidence={ECO:0000250};
CC       Note=Binds 2 magnesium ions for ribonuclease H (RNase H) activity.
CC       {ECO:0000250};
CC   -!- SUBUNIT: Interacts with human TSG101. This interaction is essential for
CC       budding and release of viral particles (By similarity). {ECO:0000250}.
CC   -!- SUBCELLULAR LOCATION: [Matrix protein p19]: Virion {ECO:0000305}.
CC   -!- SUBCELLULAR LOCATION: [Capsid protein p24]: Virion {ECO:0000305}.
CC   -!- SUBCELLULAR LOCATION: [Nucleocapsid protein p15-pro]: Virion
CC       {ECO:0000305}.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Ribosomal frameshifting; Named isoforms=3;
CC         Comment=This strategy of translation probably allows the virus to
CC         modulate the quantity of each viral protein.;
CC       Name=Gag-Pol polyprotein;
CC         IsoId=Q4U0X6-1; Sequence=Displayed;
CC       Name=Gag-Pro polyprotein;
CC         IsoId=Q09SZ9-1; Sequence=External;
CC       Name=Gag polyprotein;
CC         IsoId=Q09T00-1; Sequence=External;
CC   -!- DOMAIN: Late-budding domains (L domains) are short sequence motifs
CC       essential for viral particle release. They can occur individually or in
CC       close proximity within structural proteins. They interact with cellular
CC       sorting proteins of the multivesicular body (MVB) pathway. Most of
CC       these proteins are class E vacuolar protein sorting factors belonging
CC       to ESCRT-I, ESCRT-II or ESCRT-III complexes. Matrix protein p19
CC       contains two L domains: a PTAP/PSAP motif which interacts with the UEV
CC       domain of TSG101, and a PPXY motif which binds to the WW domains of
CC       HECT (homologous to E6-AP C-terminus) E3 ubiquitin ligases (By
CC       similarity). {ECO:0000250}.
CC   -!- DOMAIN: The capsid protein N-terminus seems to be involved in Gag-Gag
CC       interactions. {ECO:0000250}.
CC   -!- PTM: Specific enzymatic cleavages by the viral protease yield mature
CC       proteins. The polyprotein is cleaved during and after budding; this
CC       process is termed maturation. The protease is autoproteolytically
CC       processed at its N- and C-termini (By similarity). {ECO:0000250}.
CC   -!- MISCELLANEOUS: The reverse transcriptase is an error-prone enzyme that
CC       lacks a proof-reading function. The high mutation rate is a direct
CC       consequence of this characteristic. RT also displays frequent template
CC       switching leading to a high recombination rate. Recombination mostly
CC       occurs between homologous regions of the two copackaged RNA genomes. If
CC       these two RNA molecules derive from different viral strains, reverse
CC       transcription will give rise to highly recombinant proviral DNAs (By
CC       similarity). {ECO:0000250}.
CC   -!- MISCELLANEOUS: [Isoform Gag-Pol polyprotein]: Produced by -1 ribosomal
CC       frameshifting at the gag-pol genes boundary.
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   ---------------------------------------------------------------------------
DR   EMBL; DQ462191; AAY34569.2; ALT_SEQ; Genomic_DNA.
DR   SMR; Q4U0X6; -.
DR   Proteomes; UP000007684; Genome.
DR   GO; GO:0019013; C:viral nucleocapsid; IEA:UniProtKB-KW.
DR   GO; GO:0004190; F:aspartic-type endopeptidase activity; IEA:UniProtKB-KW.
DR   GO; GO:0003677; F:DNA binding; IEA:UniProtKB-KW.
DR   GO; GO:0003887; F:DNA-directed DNA polymerase activity; IEA:UniProtKB-EC.
DR   GO; GO:0003964; F:RNA-directed DNA polymerase activity; IEA:UniProtKB-KW.
DR   GO; GO:0004523; F:RNA-DNA hybrid ribonuclease activity; IEA:UniProtKB-EC.
DR   GO; GO:0005198; F:structural molecule activity; IEA:InterPro.
DR   GO; GO:0008270; F:zinc ion binding; IEA:InterPro.
DR   GO; GO:0015074; P:DNA integration; IEA:UniProtKB-KW.
DR   GO; GO:0006310; P:DNA recombination; IEA:UniProtKB-KW.
DR   GO; GO:0075713; P:establishment of integrated proviral latency; IEA:UniProtKB-KW.
DR   GO; GO:0006508; P:proteolysis; IEA:UniProtKB-KW.
DR   GO; GO:0039657; P:suppression by virus of host gene expression; IEA:UniProtKB-KW.
DR   GO; GO:0046718; P:viral entry into host cell; IEA:UniProtKB-KW.
DR   GO; GO:0044826; P:viral genome integration into host DNA; IEA:UniProtKB-KW.
DR   Gene3D; 1.10.1200.30; -; 1.
DR   Gene3D; 1.10.375.10; -; 1.
DR   Gene3D; 2.40.70.10; -; 1.
DR   Gene3D; 3.30.420.10; -; 2.
DR   Gene3D; 3.30.70.270; -; 2.
DR   InterPro; IPR001969; Aspartic_peptidase_AS.
DR   InterPro; IPR003139; D_retro_matrix.
DR   InterPro; IPR043502; DNA/RNA_pol_sf.
DR   InterPro; IPR045345; Gag_p24_C.
DR   InterPro; IPR000721; Gag_p24_N.
DR   InterPro; IPR001037; Integrase_C_retrovir.
DR   InterPro; IPR001584; Integrase_cat-core.
DR   InterPro; IPR003308; Integrase_Zn-bd_dom_N.
DR   InterPro; IPR001995; Peptidase_A2_cat.
DR   InterPro; IPR021109; Peptidase_aspartic_dom_sf.
DR   InterPro; IPR018061; Retropepsins.
DR   InterPro; IPR008916; Retrov_capsid_C.
DR   InterPro; IPR008919; Retrov_capsid_N.
DR   InterPro; IPR010999; Retrovr_matrix.
DR   InterPro; IPR043128; Rev_trsase/Diguanyl_cyclase.
DR   InterPro; IPR012337; RNaseH-like_sf.
DR   InterPro; IPR002156; RNaseH_domain.
DR   InterPro; IPR036397; RNaseH_sf.
DR   InterPro; IPR000477; RT_dom.
DR   InterPro; IPR001878; Znf_CCHC.
DR   InterPro; IPR036875; Znf_CCHC_sf.
DR   Pfam; PF02228; Gag_p19; 1.
DR   Pfam; PF00607; Gag_p24; 1.
DR   Pfam; PF19317; Gag_p24_C; 1.
DR   Pfam; PF00552; IN_DBD_C; 1.
DR   Pfam; PF02022; Integrase_Zn; 1.
DR   Pfam; PF00075; RNase_H; 1.
DR   Pfam; PF00665; rve; 1.
DR   Pfam; PF00077; RVP; 1.
DR   Pfam; PF00078; RVT_1; 1.
DR   Pfam; PF00098; zf-CCHC; 1.
DR   SMART; SM00343; ZnF_C2HC; 2.
DR   SUPFAM; SSF47836; SSF47836; 1.
DR   SUPFAM; SSF47943; SSF47943; 1.
DR   SUPFAM; SSF50630; SSF50630; 1.
DR   SUPFAM; SSF53098; SSF53098; 1.
DR   SUPFAM; SSF56672; SSF56672; 1.
DR   SUPFAM; SSF57756; SSF57756; 1.
DR   PROSITE; PS50175; ASP_PROT_RETROV; 1.
DR   PROSITE; PS00141; ASP_PROTEASE; 1.
DR   PROSITE; PS50994; INTEGRASE; 1.
DR   PROSITE; PS51027; INTEGRASE_DBD; 1.
DR   PROSITE; PS50879; RNASE_H_1; 1.
DR   PROSITE; PS50878; RT_POL; 1.
DR   PROSITE; PS50158; ZF_CCHC; 1.
PE   3: Inferred from homology;
KW   Aspartyl protease; Capsid protein; DNA integration; DNA recombination;
KW   DNA-binding; Endonuclease;
KW   Eukaryotic host gene expression shutoff by virus;
KW   Eukaryotic host translation shutoff by virus;
KW   Host gene expression shutoff by virus; Host-virus interaction; Hydrolase;
KW   Lipoprotein; Magnesium; Metal-binding; Multifunctional enzyme; Myristate;
KW   Nuclease; Nucleotidyltransferase; Protease; Repeat;
KW   Ribosomal frameshifting; RNA-directed DNA polymerase; Transferase;
KW   Viral genome integration; Viral nucleoprotein; Virion;
KW   Virus entry into host cell; Zinc; Zinc-finger.
FT   INIT_MET        1
FT                   /note="Removed; by host"
FT                   /evidence="ECO:0000250"
FT   CHAIN           2..1440
FT                   /note="Gag-Pro-Pol polyprotein"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260472"
FT   CHAIN           2..123
FT                   /note="Matrix protein p19"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260473"
FT   CHAIN           124..337
FT                   /note="Capsid protein p24"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260474"
FT   CHAIN           338..430
FT                   /note="Nucleocapsid protein p15-pro"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260475"
FT   CHAIN           431..553
FT                   /note="Protease"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260476"
FT   PEPTIDE         554..561
FT                   /note="p1"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260477"
FT   CHAIN           562..1145
FT                   /note="Reverse transcriptase/ribonuclease H"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260478"
FT   CHAIN           1146..1440
FT                   /note="Integrase"
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000260479"
FT   DOMAIN          457..535
FT                   /note="Peptidase A2"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00275"
FT   DOMAIN          593..783
FT                   /note="Reverse transcriptase"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00405"
FT   DOMAIN          1010..1143
FT                   /note="RNase H type-1"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00408"
FT   DOMAIN          1197..1366
FT                   /note="Integrase catalytic"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00457"
FT   ZN_FING         349..366
FT                   /note="CCHC-type 1"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00047"
FT   ZN_FING         372..389
FT                   /note="CCHC-type 2"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00047"
FT   DNA_BIND        1371..1420
FT                   /note="Integrase-type"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU00506"
FT   REGION          93..117
FT                   /note="Disordered"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   MOTIF           98..101
FT                   /note="PTAP/PSAP motif"
FT   MOTIF           109..112
FT                   /note="PPXY motif"
FT   COMPBIAS        95..117
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   ACT_SITE        462
FT                   /note="For protease activity; shared with dimeric partner"
FT                   /evidence="ECO:0000255|PROSITE-ProRule:PRU10094"
FT   BINDING         659
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="1"
FT                   /ligand_note="catalytic; for reverse transcriptase
FT                   activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         734
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="1"
FT                   /ligand_note="catalytic; for reverse transcriptase
FT                   activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         735
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="1"
FT                   /ligand_note="catalytic; for reverse transcriptase
FT                   activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         1019
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="2"
FT                   /ligand_note="catalytic; for RNase H activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         1052
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="2"
FT                   /ligand_note="catalytic; for RNase H activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         1074
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="2"
FT                   /ligand_note="catalytic; for RNase H activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         1135
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="2"
FT                   /ligand_note="catalytic; for RNase H activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         1208
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="3"
FT                   /ligand_note="catalytic; for integrase activity"
FT                   /evidence="ECO:0000250"
FT   BINDING         1265
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /ligand_label="3"
FT                   /ligand_note="catalytic; for integrase activity"
FT                   /evidence="ECO:0000250"
FT   SITE            123..124
FT                   /note="Cleavage; by viral protease"
FT                   /evidence="ECO:0000250"
FT   SITE            337..338
FT                   /note="Cleavage; by viral protease"
FT                   /evidence="ECO:0000250"
FT   SITE            430..431
FT                   /note="Cleavage; by viral protease"
FT                   /evidence="ECO:0000250"
FT   SITE            553..554
FT                   /note="Cleavage; by viral protease"
FT                   /evidence="ECO:0000250"
FT   SITE            561..562
FT                   /note="Cleavage; by viral protease"
FT                   /evidence="ECO:0000250"
FT   SITE            1145..1146
FT                   /note="Cleavage; by viral protease"
FT                   /evidence="ECO:0000250"
FT   LIPID           2
FT                   /note="N-myristoyl glycine; by host"
FT                   /evidence="ECO:0000250"
SQ   SEQUENCE   1440 AA;  159918 MW;  4548C129A2C689EC CRC64;
     MGKTYSSPVN PIPKAPKGLA IHHWLNFLQA AYRLQPGPSE FDFHQLRKFL KLAIKTPVWL
     NPINYSVLAR LIPKNYPGRV HEIVAILIQE TPAREAPPSA PPADDPQKPP PYPEHAQVEP
     QCLPVLHPHG APATHRPWQM KDLQAIKQEV SSSAPGSPQF MQTVRLAVQQ FDPTAKDLHD
     LLQYLCSSLV ASLHHQQLET LIAQAETQGI TGYNPLAGPL RVQANNPNQQ GLRREYQNLW
     LSAFSALPGN TKDPTWAAIL QGPEEPFCSF VERLNVALDN GLPEGTPKDP ILRSLAYSNA
     NKECQKLLQA RGQTNSPLGE MLRACQTWTP RDKNKILMIQ PKKTPPPNQP CFRCGQAGHW
     SRDCKQPRPP PGPCPLCQDP AHWKQDCPQL KADTKGSEDL LLDLPCEASH VRERKNLLRG
     GGLTSPRTIL PLIPLSQQRQ PILHVQVSFS NTSPVGVQAL LDTGADITVL PAYLCPPDSN
     LQDTTVLGAG GPSTSKFKIL PRPVHIHLPF RKQPVTLTSC LIDTNDQWTI LGRDALQQCQ
     SSLYLADQPS SVLPVQTPKL IGLEHLPPPP EVSQFPLNPE RLQALTDLVS RALEAKHIEP
     YQGPGNNPIF PVKKPNGKWR FIHDLRATNS LTRDLASPSP GPPDLTSLPQ DLPHLRTIDL
     TDAFFQIPLP AVFQPYFAFT LPQPNNHGPG TRYSWRVLPQ GFKNSPTLFE QQLSHILAPV
     RKAFPNSLII QYMDDILLAS PALRELTALT DKVTNALTKE GLPMSLEKTQ ATPGSIHFLG
     QVISPDCITY ETLPSIHVKS IWSLAELQSM LGELQWVSKG TPVLRSSLHQ LYLALRGHRD
     PRDTIELTST QVQALKTIQK ALALNCRSRL VSQLPILALI ILRPTGTTAV LFQTKQKWPL
     VWLHTPHPAT SLRPWGQLLA NAIITLDKYS LQHYGQICKS FHHNISNQAL TYYLHTSDQS
     SVAILLQHSH RFHNLGAQPS GPWRSLLQVP QIFQNIDVLR PPFIISPVVI DHAPCLFSDG
     ATSKAAFILW DKQVIHQQVL PLPSTCSAQA GELFGLLAGL QKSKPWPALN IFLDSKFLIG
     HLRRMALGAF LGPSTQCDLH ARLFPLLQGK TVYVHHVRSH TLLQDPISRL NEATDALMLA
     PLLPLNPTTL HQITHCNPHA LRNHGATASE AHAIVQACHT CKVINPQGRL PQGYIRRGHA
     PNVIWQGDVT HLHYKRYKYC LLVWVDTYSG VVSVSCRRKE TGSDCVVSLL AAISILGKPH
     SINTDNGTAY LSQEFQQFCS SLSIKHSTHV PYNPTSSGLV ERTNGILKTL ISKYLLDNHH
     LPLETAISKS LWTINHLNVL PSCQKTRWQL HQAQPLPSIP ENTLPPRASP KWYYYKIPGL
     TNPRWSGPVQ SLKEAAGAAL IPVGGSHLWI PWRLLKRGIC PRPESNAVAD PETKDHQLHG