ID CO1A2_MAMAE Reviewed; 1040 AA.
AC P85154;
DT 12-JUN-2007, integrated into UniProtKB/Swiss-Prot.
DT 22-FEB-2012, sequence version 3.
DT 25-MAY-2022, entry version 40.
DE RecName: Full=Collagen alpha-2(I) chain;
DE AltName: Full=Alpha-2 type I collagen;
OS Mammut americanum (American mastodon).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Afrotheria; Proboscidea; Elephantidae; Mammut.
OX NCBI_TaxID=39053;
RN [1]
RP PROTEIN SEQUENCE.
RC TISSUE=Bone;
RX PubMed=22021854; DOI=10.1126/science.1207663;
RA Waters M.R., Stafford T.W. Jr., McDonald H.G., Gustafson C., Rasmussen M.,
RA Cappellini E., Olsen J.V., Szklarczyk D., Jensen L.J., Gilbert M.T.,
RA Willerslev E.;
RT "Pre-Clovis mastodon hunting 13,800 years ago at the Manis site,
RT Washington.";
RL Science 334:351-353(2011).
RN [2]
RP PROTEIN SEQUENCE OF 38-86; 120-185; 249-419; 428-455; 465-532; 572-698;
RP 716-767; 782-917 AND 975-1001.
RX PubMed=19407199; DOI=10.1126/science.1165069;
RA Schweitzer M.H., Zheng W., Organ C.L., Avci R., Suo Z., Freimark L.M.,
RA Lebleu V.S., Duncan M.B., Vander Heiden M.G., Neveu J.M., Lane W.S.,
RA Cottrell J.S., Horner J.R., Cantley L.C., Kalluri R., Asara J.M.;
RT "Biomolecular characterization and protein sequences of the Campanian
RT hadrosaur B. canadensis.";
RL Science 324:626-631(2009).
RN [3]
RP PROTEIN SEQUENCE OF 156-185; 276-302; 398-419; 465-509; 597-629; 699-736;
RP 752-767; 804-827; 896-917 AND 975-985, IDENTIFICATION BY MASS SPECTROMETRY,
RP AND HYDROXYLATION AT PRO-176; PRO-182; PRO-284; PRO-293; PRO-410; PRO-473;
RP PRO-479; PRO-491; PRO-497; PRO-503; PRO-599; PRO-605; PRO-704; PRO-710;
RP PRO-728; PRO-731; PRO-761; PRO-806; PRO-815; PRO-824 AND PRO-902.
RC TISSUE=Bone;
RA Asara J.M.;
RL Submitted (SEP-2007) to UniProtKB.
CC -!- FUNCTION: Type I collagen is a member of group I collagen
CC (fibril-forming collagen). {ECO:0000305}.
CC -!- SUBUNIT: Trimers of one alpha-2(I) and two alpha-1(I) chains.
CC {ECO:0000305}.
CC -!- SUBCELLULAR LOCATION: Secreted, extracellular space, extracellular
CC matrix {ECO:0000250}.
CC -!- TISSUE SPECIFICITY: Forms the fibrils of tendons, ligaments and bones.
CC In bones, the fibrils are mineralized with calcium hydroxyapatite.
CC {ECO:0000305}.
CC -!- PTM: Prolines at the third position of the tripeptide repeating unit
CC (G-X-Y) are hydroxylated in some or all of the chains.
CC {ECO:0000269|Ref.3, ECO:0000305}.
CC -!- MISCELLANEOUS: This protein sequence was reconstructed from bones that
CC are 13,800 years old and 160,000 to 600,000 years old. The tryptic
CC peptides required multiple purification steps to eliminate contaminants
CC and to increase the concentration of peptide material.
CC -!- SIMILARITY: Belongs to the fibrillar collagen family. {ECO:0000255}.
CC ---------------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC ---------------------------------------------------------------------------
DR AlphaFoldDB; P85154; -.
DR PRIDE; P85154; -.
DR GO; GO:0005581; C:collagen trimer; IEA:UniProtKB-KW.
DR GO; GO:0005576; C:extracellular region; IEA:UniProtKB-KW.
DR InterPro; IPR008160; Collagen.
DR Pfam; PF01391; Collagen; 9.
PE 1: Evidence at protein level;
KW Collagen; Direct protein sequencing; Extinct organism protein;
KW Extracellular matrix; Glycoprotein; Hydroxylation;
KW Pyrrolidone carboxylic acid; Repeat; Secreted.
FT CHAIN 1..1040
FT /note="Collagen alpha-2(I) chain"
FT /id="PRO_0000291375"
FT DOMAIN 12..70
FT /note="Collagen-like 1"
FT DOMAIN 69..126
FT /note="Collagen-like 2"
FT DOMAIN 337..387
FT /note="Collagen-like 3"
FT DOMAIN 390..448
FT /note="Collagen-like 4"
FT DOMAIN 438..493
FT /note="Collagen-like 5"
FT DOMAIN 525..582
FT /note="Collagen-like 6"
FT DOMAIN 636..694
FT /note="Collagen-like 7"
FT DOMAIN 822..873
FT /note="Collagen-like 8"
FT DOMAIN 969..1026
FT /note="Collagen-like 9"
FT REGION 1..228
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 244..1040
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 16..44
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 169..183
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1009..1025
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT MOD_RES 1
FT /note="Pyrrolidone carboxylic acid"
FT /evidence="ECO:0000250|UniProtKB:P02465"
FT MOD_RES 5
FT /note="Allysine"
FT /evidence="ECO:0000250|UniProtKB:P08123"
FT MOD_RES 98
FT /note="5-hydroxylysine; alternate"
FT /evidence="ECO:0000250|UniProtKB:P08123"
FT MOD_RES 176
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 182
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 284
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 293
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 410
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 473
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 479
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 491
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 497
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 503
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 599
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 605
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 704
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 710
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 728
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 731
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 761
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 806
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 815
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 824
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT MOD_RES 902
FT /note="4-hydroxyproline"
FT /evidence="ECO:0000269|Ref.3"
FT CARBOHYD 98
FT /note="O-linked (Gal...) hydroxylysine; alternate"
FT /evidence="ECO:0000250|UniProtKB:P08123"
FT UNSURE 9
FT /note="L or I"
FT UNSURE 16
FT /note="L or I"
FT UNSURE 94
FT /note="L or I"
FT UNSURE 100
FT /note="I or L"
FT UNSURE 106
FT /note="L or I"
FT UNSURE 109
FT /note="L or I"
FT UNSURE 134
FT /note="I or L"
FT UNSURE 139
FT /note="L or I"
FT UNSURE 170
FT /note="I or L"
FT UNSURE 188
FT /note="I or L"
FT UNSURE 208
FT /note="L or I"
FT UNSURE 226
FT /note="L or I"
FT UNSURE 235
FT /note="L or I"
FT UNSURE 244
FT /note="L or I"
FT UNSURE 250
FT /note="I or L"
FT UNSURE 265
FT /note="I or L"
FT UNSURE 319
FT /note="L or I"
FT UNSURE 328
FT /note="L or I"
FT UNSURE 373
FT /note="L or I"
FT UNSURE 391
FT /note="L or I"
FT UNSURE 394
FT /note="I or L"
FT UNSURE 401
FT /note="I or L"
FT UNSURE 413
FT /note="I or L"
FT UNSURE 436
FT /note="L or I"
FT UNSURE 457
FT /note="L or I"
FT UNSURE 478
FT /note="L or I"
FT UNSURE 496
FT /note="I or L"
FT UNSURE 502
FT /note="L or I"
FT UNSURE 527
FT /note="I or L"
FT UNSURE 562
FT /note="L or I"
FT UNSURE 571
FT /note="I or L"
FT UNSURE 583
FT /note="L or I"
FT UNSURE 724
FT /note="I or L"
FT UNSURE 739
FT /note="L or I"
FT UNSURE 787
FT /note="I or L"
FT UNSURE 788
FT /note="L or I"
FT UNSURE 793
FT /note="I or L"
FT UNSURE 794
FT /note="L or I"
FT UNSURE 796
FT /note="L or I"
FT UNSURE 805
FT /note="L or I"
FT UNSURE 818
FT /note="L or I"
FT UNSURE 820
FT /note="I or L"
FT UNSURE 862
FT /note="L or I"
FT UNSURE 890
FT /note="L or I"
FT UNSURE 940
FT /note="L or I"
FT UNSURE 949
FT /note="L or I"
FT UNSURE 952
FT /note="L or I"
FT UNSURE 955
FT /note="L or I"
FT CONFLICT 169
FT /note="P -> L (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 170
FT /note="I -> N (in Ref. 3; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 175
FT /note="P -> S (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 179..184
FT /note="PGAPGP -> SGSPGL (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 251
FT /note="P -> D (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 265
FT /note="I -> L (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 304..305
FT /note="PN -> SS (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 319
FT /note="L -> W (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 496
FT /note="I -> L (in Ref. 3; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 619
FT /note="P -> A (in Ref. 3; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 622
FT /note="S -> P (in Ref. 3; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 722
FT /note="A -> S (in Ref. 3; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 751
FT /note="R -> N (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 811
FT /note="A -> S (in Ref. 3; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 814..815
FT /note="EP -> GL (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
FT CONFLICT 853
FT /note="S -> N (in Ref. 2; AA sequence)"
FT /evidence="ECO:0000305"
SQ SEQUENCE 1040 AA; 93184 MW; 362B97BFA3246C30 CRC64;
QYDAKGVGLG PGPMGLMGPR GPPGATGPPG SPGFQGPPGE PGEPGQTGPA GSRGPAGPPG
KAGEDGHPGK PGRPGERGVV GPQGARGFPG TPGLPGFKGI RGHNGLDGLK GQPGAPGVKG
EPGAPGENGT PGQIGARGLP GERGRVGGPG PAGARGSDGS VGPVGPAGPI GSAGPPGFPG
APGPKGEIGP VGNPGPSGPA GPRGEAGLPG VSGPVGPPGN PGANGLAGAK GAAGLPGVAG
APGLPGPRGI PGPVGAAGAT GARGIVGEPG PAGSKGESGS KGEPGSAGPQ GPPGPSGEEG
KRGPNGEAGS AGPAGPPGLR GGPGSRGLPG ADGRAGVMGP PGSRGASGPA GVRGPSGDSG
RPGEPGVMGP RGLPGSPGNV GPAGKEGPAG LPGIDGRPGP IGPAGARGEP GNIGFPGPKG
PAGDPGKNGD KGHAGLAGPR GAPGPDGNNG AQGPPGLQGV QGGKGEQGPA GPPGFQGLPG
PSGTAGEAGK PGERGIPGEF GLPGPAGPRG ERGPPGQSGA AGPTGPIGSR GPSGPPGPDG
NKGEPGVVGA PGTAGPSGPV GLPGERGAAG IPGGKGEKGE TGLRGDTGNT GRDGARGAPG
AVGAPGPAGA TGDRGEAGPA GSAGPAGPRG SPGERGEVGP AGPNGFAGPA GAAGQAGAKG
ERGTKGPKGE NGPVGPTGPV GAAGPAGPNG PPGPAGSRGD GGPPGATGFP GAAGRTGPPG
PAGITGPPGP PGAAGKEGLR GPRGDQGPVG RTGETGASGP PGFAGEKGSS GEPGTAGPPG
APGPQGILGP PGILGLPGSR GERGLPGVAG AVGEPGPLGI AGPPGARGPP GAVGSPGVNG
APGEAGRDGN PGSDGPPGRD GLPGHKGERG YPGNAGPVGT AGAPGPQGPL GPAGKHGNRG
EPGPAGSVGP VGAVGPRGPS GPQGARGDKG EAGDKGPRGL PGFKGHNGLQ GLPGLAGQHG
DQGSPGSVGP AGPRGPAGPS GPVGKDGRPG HAGAVGPAGV RGSQGSQGPS GPPGPPGPPG
PPGPSGGGYD FGYDGDFYRA