C6orf47
C6ORF47
Gene
C6orf47 | |
---|---|
Identifiers | |
Aliases | C6orf47, D6S53E, G4, NG34, chromosome 6 open reading frame 47 |
External IDs | MGI: 90673; HomoloGene: 75155; GeneCards: C6orf47; OMA: C6orf47 orthologs |
General Information
In humans, chromosome 6 open reading frame 47 (C6ORF47) is a single-exon gene that spans 2481 nucleotides and encodes a 294-amino-acid protein.[5][6]
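As a rough consistency check, this assumes two things: the 294 residues exclude the stop codon, and the transcript covers the full 2481-nucleotide span. Under those assumptions the coding sequence accounts for roughly a third of the gene:

```latex
(294 + 1)\ \text{codons} \times 3\ \text{nt/codon} = 885\ \text{nt}, \qquad
\frac{885}{2481} \approx 0.357, \qquad
2481 - 885 = 1596\ \text{nt (untranslated regions)}
```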
Location
In humans, this gene is located on the minus strand at 6p21.33.[7]
Gene Expression
Tissue expression data show that human C6ORF47 is expressed ubiquitously, with higher expression in the colon, urinary bladder, ovary, and pancreas.[7] NCBI GEO Profiles likewise show ubiquitous C6ORF47 RNA expression. Levels are low in most tissues and high in a few, such as the salivary gland and cerebellum.[9]
Research by Pontus Boström et al. examined C6ORF47 mRNA expression in microarray data from macrophages of four healthy donors. The study asked whether hypoxia influences lipid accumulation in macrophages. The answer would help show whether the lipid-loaded macrophages in atherosclerotic lesions arise because of hypoxic regions within those lesions. Human macrophages exposed to hypoxia for 24 hours formed more cytosolic lipid droplets and accumulated more triglyceride. These results suggest that hypoxic regions in atherosclerotic lesions could contribute to the formation of lipid-loaded macrophages and to triglyceride accumulation.[8] In this dataset, C6ORF47 expression is almost six times higher under normoxic conditions than under hypoxic conditions. Because expression falls under hypoxia, C6ORF47 is unlikely to contribute to lipid accumulation. It may also be unnecessary for the essential processes that cells maintain under hypoxic stress.[10]
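Microarray expression differences such as this roughly six-fold drop are commonly reported as a log2 fold change. The sketch below shows that arithmetic only. The function names are hypothetical and the intensities are placeholders chosen to give a six-fold ratio, not values from the GEO profile.

```python
import math


def fold_change(reference, condition):
    """Ratio of expression in the reference condition to the test condition (hypothetical helper)."""
    if reference <= 0 or condition <= 0:
        raise ValueError("expression values must be positive")
    return reference / condition


def log2_fold_change(reference, condition):
    """log2 of the fold change; positive when the reference condition is higher."""
    return math.log2(fold_change(reference, condition))


# Placeholder normalized intensities, NOT taken from the dataset
normoxia, hypoxia = 600.0, 100.0
print(fold_change(normoxia, hypoxia))                 # 6.0
print(round(log2_fold_change(normoxia, hypoxia), 2))  # 2.58
```

A log2 scale makes increases and decreases symmetric around zero: a six-fold drop and a six-fold rise differ only in sign.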
Transcription Factors
The table below lists transcription factors predicted to bind the promoter region, defined here as the 5' UTR plus 500 nucleotides upstream. Bioline[11] software was used to obtain the double-stranded DNA sequence. The UCSC Genome Browser[12] was used to identify the transcription factors and their binding sites.
Transcription Factor | Generalized Function |
---|---|
KLF17, Krüppel-like factor 17 | Regulates gene expression, influencing cell differentiation and development. |
PROX1, Prospero homeobox 1 | Regulates lymphatic development, cell differentiation, and organogenesis processes. |
WT1, Wilms' tumor 1 | Regulates kidney development, cell growth, and tissue differentiation processes. |
GATA1, GATA binding protein 1 | Controls red blood cell development and regulates hematopoiesis processes. |
THRB, Thyroid hormone receptor beta | Regulates thyroid hormone signaling, influencing metabolism and growth regulation. |
ZNF454, Zinc Finger Protein 454 | Regulates gene expression, potentially influencing cell differentiation and development. |
SP9, Specificity Protein 9 | Regulates cartilage development and skeletal patterning during embryogenesis. |
EGR3, Early Growth Response 3 | Regulates gene expression involved in neuronal activity and immune response. |
SOX4, SRY-box transcription factor 4 | Regulates cell fate, development, and differentiation in multiple tissues. |
EBF1, Early B-cell Factor 1 | Regulates B cell differentiation and immune system development. |
ZNF669, Zinc Finger Protein 669 | Regulates gene expression, potentially involved in development and differentiation. |
KLF1, Krüppel-like factor 1 | Regulates red blood cell development and hemoglobin expression. |
STAT3, Signal Transducer and Activator of Transcription 3 | Regulates immune response, cell survival, and inflammation processes. |
ZIC3, Zinc Finger of the Cerebellum 3 | Regulates brain and heart development, influencing neuronal patterning and function. |
NHLH2, Nescient Helix-Loop-Helix 2 | Regulates neural differentiation and development, influencing nervous system patterning. |
ZNF454, Zinc Finger Protein 454 | Involved in transcriptional regulation, potentially affecting gene expression and development. |
EBF2, Early B-cell Factor 2 | Regulates adipocyte differentiation and energy metabolism, influencing fat tissue development. |
ZNF42, Zinc Finger Protein 42 | Involved in regulating gene expression and cellular differentiation processes. |
ERF::FIGLA, ETS2 Repressor Factor and Factor of Germline Alpha | Transcription factor complex that regulates ovarian development and folliculogenesis. |
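Predictions like those in the table are commonly made by scoring a promoter sequence against a position weight matrix (PWM) on both DNA strands. The sketch below illustrates that general technique. It is not the exact method behind the UCSC Genome Browser data used here. The "GATA"-like counts, pseudocount, score threshold and function names are all hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(site, pwm):
    """Sum of per-position log-odds scores for a site of the motif's width."""
    return sum(pwm[i][base] for i, base in enumerate(site))


def scan_both_strands(seq, pwm, threshold):
    """Return (1-based start, strand, site, score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]  # same window read on the minus strand
        for strand, site in (("+", window), ("-", reverse)):
            s = score(site, pwm)
            if s >= threshold:
                hits.append((i + 1, strand, site, round(s, 2)))
    return hits


# Hypothetical toy motif: 10 aligned sites per column, all reading G-A-T-A
toy_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_pwm(toy_counts)
print(scan_both_strands("TTGATATT", pwm, threshold=5.0))  # [(3, '+', 'GATA', 7.23)]
```

Scanning the reverse complement matters because a factor can bind either strand of the double-stranded promoter.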
Single-Nucleotide Polymorphisms (SNPs)
SNP | Position | Base Change | Amino Acid Change | Mutation Type | Significance | Clinical Significance |
---|---|---|---|---|---|---|
rs963273525 | Amino acid 1 | T>C | Met>Val | Missense | In start codon (CDS) | N/A |
rs1800736098 | Base pair 8 | C>A | N/A | Transversion | Transcription factor binding region (NHLH2) in the 5' UTR, conserved in all orthologs tested | N/A |
rs1296872402 | Base pair 2425 | T>G | N/A | Transversion | Polyadenylation signal in the 3' UTR, conserved in all orthologs tested | N/A |
The table above lists three SNPs, located in the CDS, the 5' UTR, and the 3' UTR. These SNPs were found using Variation Viewer[13] and were chosen for their locations within the C6ORF47 gene. Variation Viewer showed no pathogenic SNPs, only large deletions that span many genes.
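This sketch assumes the base changes in the table are written on the genomic plus strand. C6ORF47 lies on the minus strand, so a plus-strand T>C reads as A>G on the transcript. That turns the ATG start codon into GTG, which fits the Met>Val entry. The function name below is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def codon_effect(codon, position, genomic_ref, genomic_alt, strand):
    """Apply a plus-strand SNP to one codon of a gene (hypothetical helper).

    codon is written in transcript (sense) orientation; position is 1, 2 or 3
    counted in that orientation. Returns (old amino acid, new amino acid, new codon).
    """
    if strand == "-":
        # A change reported on the plus strand appears as its complement on a minus-strand transcript
        ref, alt = COMPLEMENT[genomic_ref], COMPLEMENT[genomic_alt]
    else:
        ref, alt = genomic_ref, genomic_alt
    if codon[position - 1] != ref:
        raise ValueError(f"reference base {ref} not found at codon position {position}")
    mutant = codon[:position - 1] + alt + codon[position:]
    return CODON_TABLE[codon], CODON_TABLE[mutant], mutant


# Start codon of a minus-strand gene with a plus-strand T>C at its first (sense) base
print(codon_effect("ATG", 1, "T", "C", "-"))  # ('M', 'V', 'GTG')
```

Under the same assumption, a plus-strand T>C in a plus-strand gene would instead give ACG (Thr). The strand is what reconciles the table's base change with Met>Val.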
Protein
Basic Information
[edit]- EMBL-EBI-SAPS[14] found the human C6ORF47 protein to have a isoelectric point of 5.95.
- C6ORF47 protein was shown to be slightly more abundant than half of the proteins present in the human body.[15]
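An isoelectric point is the pH at which a protein's net charge is zero. A minimal sketch of the calculation does two things. It sums Henderson-Hasselbalch charges for the termini and ionizable side chains, then bisects on pH. The pKa values are illustrative textbook-style assumptions; SAPS and other tools use their own tables, so their results will differ somewhat.

```python
# Illustrative pKa values (assumptions, not the table SAPS uses)
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH using Henderson-Hasselbalch fractions."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the pH where the net charge crosses zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# A lone glycine has only the two termini, so its pI is their midpoint
print(round(isoelectric_point("G"), 2))  # 5.5
```

A value of 5.95 means the protein carries a net negative charge at physiological pH.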
Family
C6ORF47 is grouped with the genes of the major histocompatibility complex (MHC), a region of about 3.6 megabases on the short arm of chromosome 6 at 6p21.3.[16] The classical MHC molecules bind peptide fragments derived from pathogens and display them on the cell surface for recognition by T cells.[17] C6ORF47, however, is encoded within the MHC class III region.[18] MHC class III proteins are poorly defined structurally and functionally, and the class III region also contains genes for cytokines and heat shock proteins. Genes encoded in the telomeric part of the MHC class III region appear to be involved in specific and global inflammatory responses.[19]
Primary
Human C6ORF47 mRNA encodes a 294-amino-acid protein. SAPS showed that, compared with other human proteins, C6ORF47 is enriched in leucine, proline, and glycine.[14] It is also depleted in isoleucine (significantly) and in valine, tyrosine, threonine, phenylalanine, and asparagine. Motif Scan identified a basic leucine zipper, a series of leucine residues spaced seven amino acids apart, that is conserved in mammalian orthologs of C6ORF47.[20]
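A leucine zipper is a periodic pattern, with leucines at positions i, i+7, i+14 and so on. The simple scan below looks for that periodicity and resembles the PROSITE-style rule L-x(6)-L-x(6)-L-x(6)-L. Motif Scan uses its own motif definitions, though. The four-leucine threshold, function name and toy sequence are assumptions.

```python
def heptad_leucine_runs(seq, min_repeats=4):
    """Find runs of leucines spaced exactly seven residues apart (hypothetical helper).

    Returns (1-based start, number of leucines) for runs of at least min_repeats.
    """
    seq = seq.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        if start >= 7 and seq[start - 7] == "L":
            continue  # this leucine continues a run that began seven residues earlier
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start + 1, count))
    return runs


# Made-up sequence with leucines at positions 3, 10, 17 and 24
toy = "MS" + ("LAEKAEK" * 4) + "GG"
print(heptad_leucine_runs(toy))  # [(3, 4)]
```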
Secondary
PredictProtein[21] predicted that the secondary structure of the human C6ORF47 protein was 35.4% helix, 2.4% strand, and 62.2% loop.
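Secondary-structure percentages like these are simple proportions of residues assigned to helix (H), strand (E) or loop (L). For a 294-residue protein, the reported values correspond to about 104 helix, 7 strand and 183 loop residues. The H/E/L string below is constructed to match those counts; it is not PredictProtein's actual per-residue output, and the function name is hypothetical.

```python
from collections import Counter


def state_percentages(states):
    """Percent of residues in each secondary-structure state, rounded to one decimal."""
    counts = Counter(states)
    return {s: round(100 * counts[s] / len(states), 1) for s in "HEL"}


# Constructed string: 104 helix + 7 strand + 183 loop = 294 residues
example = "H" * 104 + "E" * 7 + "L" * 183
print(state_percentages(example))  # {'H': 35.4, 'E': 2.4, 'L': 62.2}
```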
Tertiary
The PSORT II prediction tool[23] predicted three transmembrane segments in the human C6ORF47 protein, at amino acids 182-198, 222-238, and 246-262.
Except for the platypus, all of the mammalian orthologs examined have transmembrane regions at similar positions in their amino acid sequences (see the table below for all mammalian orthologs used).
Most other C6ORF47 orthologs are much shorter than the mammalian sequences. As a result, their predicted cleavage site is usually slightly higher, and the positions of their transmembrane segments vary with sequence length. One or two transmembrane segments were predicted in the reptile orthologs, one of the two amphibian orthologs, and one fish ortholog. Even so, three transmembrane segments remain by far the most common arrangement among orthologs.
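PSORT II uses its own transmembrane prediction method. The sketch below swaps in the classic Kyte-Doolittle hydropathy plot to show how hydrophobic membrane-spanning stretches of about this length can be flagged (each predicted segment above is 17 residues). It averages hydropathy over a 19-residue window and treats windows averaging above 1.6 as candidate transmembrane segments. It will not reproduce PSORT II's boundaries, and the function names and toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """(1-based window-center position, mean hydropathy) for every full window."""
    seq = seq.upper()
    half = window // 2
    profile = []
    for i in range(len(seq) - window + 1):
        avg = sum(KD[aa] for aa in seq[i:i + window]) / window
        profile.append((i + half + 1, avg))
    return profile


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merge consecutive window centers whose mean hydropathy exceeds the threshold."""
    segments, start, prev = [], None, None
    for center, avg in hydropathy_profile(seq, window):
        if avg > threshold:
            if start is None:
                start = center
            prev = center
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


# Made-up sequence: a 20-residue leucine stretch (positions 21-40) between lysines
toy = "K" * 20 + "L" * 20 + "K" * 20
print(predict_tm_segments(toy))  # [(25, 36)]
```

The reported range marks window centers, so it is narrower than the hydrophobic stretch itself. Dedicated predictors refine these boundaries.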
PSORT II[23] predicts that the C6ORF47 protein localizes to the endoplasmic reticulum (55.6%). DeepLoc[24] also predicts endoplasmic reticulum localization, with a probability of about 86.12%. DeepLoc also supports the earlier PSORT II and SOSUI predictions that human C6ORF47 is a transmembrane protein (93.6% probability).
Post-Translational Modifications
Phosphorylation of amino acids 34, 35, 71, and 90 in the human C6ORF47 protein has been experimentally confirmed, according to NCBI.[6] Sites 34 and 35 are predicted to be phosphorylated by casein kinase II.[20]
Endoplasmic reticulum (ER) retention signals keep a protein in the ER, aiding its proper folding, quality control, and trafficking.
SUMOylation attaches SUMO proteins to target proteins, regulating nuclear transport, transcription, DNA repair, and protein stability. SUMOylation sites were predicted at amino acids 75, 114, and 147.[25]
O-linked β-N-acetylglucosamine (O-GlcNAc) dynamically modifies serine and threonine residues, regulating signaling, transcription, and protein-protein interactions. An O-GlcNAc site was predicted at amino acid 60.[26]
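Motif-based PTM predictions can be illustrated with simple consensus patterns. The sketch uses the PROSITE casein kinase II pattern [ST]-x(2)-[DE] and the classic SUMO consensus ψKxE, where ψ stands for a large hydrophobic residue (taken here as V, I, L, M or F). Motif Scan and GPS-SUMO use more elaborate scoring than a bare regular expression. The function name and toy sequence are hypothetical.

```python
import re

# Casein kinase II site [ST]-x(2)-[DE]; the lookahead lets matches overlap
CK2_PATTERN = re.compile(r"(?=([ST])..[DE])")
# SUMO consensus psi-K-x-E, assuming psi is one of V, I, L, M, F
SUMO_PATTERN = re.compile(r"(?=[VILMF](K).E)")


def motif_positions(seq, pattern):
    """1-based positions of the captured residue (S/T for CK2, K for SUMO)."""
    return [m.start(1) + 1 for m in pattern.finditer(seq.upper())]


toy = "MASAADVKAEGSSEEE"  # made-up sequence
print(motif_positions(toy, CK2_PATTERN))   # [3, 12, 13]
print(motif_positions(toy, SUMO_PATTERN))  # [8]
```

Such patterns flag only candidate sites. Residues 34 and 35 being phosphorylated in practice rests on the experimental evidence cited above, not on the motif alone.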
Interactions
FGFR3: An interaction between C6ORF47 and FGFR3 was detected by a two-hybrid assay, with an average detection confidence of medium. The interaction is listed in the BioGRID interaction database. It comes from a large-scale dataset added in August 2022, in which each interaction was scored individually and against all other interactions globally.[7][28]
Fibroblast growth factor receptor 3 (FGFR3) belongs to the fibroblast growth factor receptor family, whose members share similar structures and functions. FGFR3 spans the cell membrane, with one end inside the cell and the other projecting from the outer surface of the cell.[29] It plays an important role in cartilage development in the growth plate. FGFR3 is a tyrosine-protein kinase that acts as a cell-surface receptor for fibroblast growth factors. It plays an essential role in cell proliferation, angiogenesis, differentiation, and apoptosis.[30] FGFR3 interacts with growth factors outside the cell and receives signals that regulate growth and development within the cell.[29]
Homology
Orthologs
The C6ORF47 gene is estimated to have first appeared approximately 563 million years ago (MYA), in lampreys. C6ORF47 was found in ray-finned fish (Actinopterygii), cartilaginous fish, lampreys, and lobe-finned fish (Sarcopterygii) but not in hagfish. This suggests that the gene may have been gained in the lamprey lineage. C6ORF47 is restricted to vertebrates, with no trace of it before vertebrates; its most distant known orthologs are in lampreys (563 MYA). The gene has evolved fairly rapidly, slightly more slowly than fibrinogen alpha but much faster than cytochrome c. Orthologs used for this diagram included human, house mouse, African bush elephant, koala, painted turtle, eastern brown snake, Iberian ribbed newt, West African lungfish, zebrafish (Danio rerio), seven-gill sharpnose shark, and sea lamprey (see the time-calibrated diagram of divergence dates).
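The article does not state how the rate comparison with fibrinogen alpha and cytochrome c was made. One common approach, assumed here, has three steps. First, convert percent identity into a corrected divergence m = -ln(1 - p), which accounts for repeated substitutions at the same site. Second, plot m against divergence time. Third, compare the slopes between genes. The identity values and dates below come from the ortholog table for the species named above. The function names are hypothetical, and identity plateaus for very distant orthologs, which flattens such a fit.

```python
import math


def corrected_divergence(identity_pct):
    """Poisson-corrected distance m = -ln(1 - p), with p the fraction of differing residues."""
    p = 1 - identity_pct / 100
    return -math.log(1 - p)


def rate_through_origin(points):
    """Least-squares slope of m against divergence time (MYA), forced through the origin."""
    num = sum(t * m for t, m in points)
    den = sum(t * t for t, _ in points)
    return num / den


# (date of divergence in MYA, % identity to human) from the ortholog table
table = [(87, 75.9), (99, 74.5), (160, 54.0), (319, 23.4), (319, 27.4),
         (352, 27.1), (408, 27.7), (429, 22.5), (462, 25.5), (563, 22.7)]
points = [(t, corrected_divergence(identity)) for t, identity in table]
print(rate_through_origin(points))  # corrected divergence per million years
```

Applying the same procedure to fibrinogen alpha and cytochrome c would give slopes to compare against.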
A global alignment of human C6ORF47 with the seven-gill sharpnose shark ortholog showed two large gaps, at human amino acids 44-62 and 153-173. These gaps were present in all vertebrate lineages examined until rodents and rabbits. A second global alignment, of human C6ORF47 with the Pacific pocket mouse (a rodent) ortholog, no longer shows these gaps. This suggests that the corresponding segments may have been inserted into the protein in mammals. Notably, the Pacific pocket mouse sequence is one of the least closely related rodent sequences in the ortholog table, yet its alignment with human C6ORF47 still lacks both large gaps.[31]
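Gap positions such as 44-62 and 153-173 are given in human coordinates. To find them, walk along the pairwise alignment and count only human residues. A minimal sketch, assuming the two gapped sequences from a global aligner such as EMBOSS Needle are already available as equal-length strings. The function name and toy alignment are hypothetical.

```python
def gap_runs(aligned_ref, aligned_other):
    """1-based ranges of reference residues that face a gap in the other sequence."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    runs, ref_pos, start, prev = [], 0, None, None
    for r, o in zip(aligned_ref, aligned_other):
        if r != "-":
            ref_pos += 1  # advance only on reference residues
        if r != "-" and o == "-":
            if start is None:
                start = ref_pos
            prev = ref_pos
        elif r != "-" and start is not None:
            runs.append((start, prev))
            start = None
    if start is not None:
        runs.append((start, prev))
    return runs


# Toy alignment: human-like reference on top, ortholog with two gaps below
print(gap_runs("MKTALVQ", "M--ALV-"))  # [(2, 3), (7, 7)]
```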
Ortholog Table for C6ORF47 Protein
Class | Genus and Species | Common Name | Taxonomic Order | Date of Divergence (MYA) | Accession #[32] | Sequence Length (aa) | Identity (%) | Similarity (%) | Gaps (%) |
---|---|---|---|---|---|---|---|---|---|
Mammals | Homo sapiens | Humans | Primates | 0 | NP_067007 | 294 | 100 | 100 | 0 |
| Perognathus longimembris pacificus | Pacific Pocket Mouse | Rodentia | 87 | XP_048204128 | 293 | 79.3 | 84 | 0.3 |
| Mus musculus | House Mouse | Rodentia | 87 | NP_258438 | 293 | 75.9 | 81.3 | 0.3 |
| Loxodonta africana | African Bush Elephant | Proboscidea | 99 | XP_003422325 | 297 | 74.5 | 81.5 | 1.7 |
| Phascolarctos cinereus | Koala | Diprotodontia | 160 | XP_020829739 | 302 | 54 | 63.8 | 10.8 |
| Vombatus ursinus | Common Wombat | Diprotodontia | 160 | XP_027732497 | 300 | 55 | 66.1 | 6.5 |
| Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_028911230 | 241 | 42.3 | 50.2 | 31.2 |
Reptiles | Chrysemys picta bellii | Painted Turtle | Testudines | 319 | XP_005289373 | 199 | 23.4 | 29.4 | 52 |
| Terrapene triunguis | Three-toed Box Turtle | Testudines | 319 | XP_024079724 | 174 | 20.4 | 25.1 | 59.9 |
| Anolis sagrei | Brown Anole | Squamata | 319 | XP_060615449 | 217 | 25.9 | 34.5 | 36.7 |
| Pseudonaja textilis | Eastern Brown Snake | Squamata | 319 | XP_026575869 | 212 | 27.4 | 36.6 | 38.9 |
Amphibians | Xenopus laevis | African Clawed Frog | Anura | 352 | XP_018088740 | 224 | 21.4 | 30.3 | 39.6 |
| Pleurodeles waltl | Iberian Ribbed Newt | Urodela | 352 | KAJ1134448 | 268 | 27.1 | 32.9 | 36.2 |
Fish | Protopterus annectens | West African Lungfish | Lepidosireniformes | 408 | XP_043939206 | 289 | 27.7 | 39.2 | 24.4 |
| Misgurnus anguillicaudatus | Pond Loach | Cypriniformes | 429 | XP_055075080 | 302 | 25.4 | 39 | 27.7 |
| Cirrhinus molitorella | Mud Carp | Cypriniformes | 429 | KAK2887169 | 311 | 24 | 38 | 23.1 |
| Danio rerio | Zebrafish | Cypriniformes | 429 | NP_001410332 | 315 | 22.5 | 35.7 | 28.9 |
| Carcharodon carcharias | Great White Shark | Lamniformes | 462 | XP_041069364 | 250 | 22.5 | 30.4 | 40.9 |
| Heptranchias perlo | Seven-gill Sharpnose Shark | Hexanchiformes | 462 | XP_067830079 | 249 | 25.5 | 37.6 | 27.1 |
| Lethenteron reissneri | Far Eastern Brook Lamprey | Petromyzontiformes | 563 | XP_061406601 | 217 | 22.1 | 29.5 | 36.2 |
| Petromyzon marinus | Sea Lamprey | Petromyzontiformes | 563 | XP_032814877 | 215 | 22.7 | 30.4 | 35.3 |
The table above lists 20 orthologs of the human C6ORF47 protein. It includes a few orthologs from each major vertebrate class except birds (Aves): Agnatha, Chondrichthyes, Osteichthyes, Amphibia, Reptilia, and Mammalia. Orthologs were drawn from across vertebrates because the C6ORF47 gene is conserved throughout that group. Identity, similarity, and gaps are given for each ortholog's protein sequence compared with the human C6ORF47 protein.
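Identity and gap percentages come from counting columns in a pairwise alignment. The sketch below assumes two things. The denominator is the full alignment length, gaps included, which is the convention EMBOSS Needle reports use. The two gapped sequences are available as equal-length strings. Similarity is omitted because it depends on a substitution matrix. The function name and toy alignment are hypothetical.

```python
def alignment_stats(a, b):
    """Percent identity and percent gaps over the full alignment length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    length = len(a)
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return {
        "length": length,
        "identity_pct": round(100 * identical / length, 1),
        "gaps_pct": round(100 * gaps / length, 1),
    }


print(alignment_stats("MKT-LV", "MRTALV"))
# {'length': 6, 'identity_pct': 66.7, 'gaps_pct': 16.7}
```

Because gaps count toward the length, a shorter ortholog such as the painted turtle sequence (199 aa) shows both low identity and a high gap percentage against the 294-residue human protein.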
Abbreviation (most recent to oldest divergence) | Common Name |
---|---|
Hsa | Humans |
Mum | House Mouse |
Phc | Koala |
Ora | Platypus |
Heb (319 MYA) | Bynoe's Gecko |
Ans | Brown Anole |
Pst | Eastern Brown Snake |
Xel | African Clawed Frog |
Plw | Iberian Ribbed Newt |
Pra | West African Lungfish |
Mia | Pond Loach |
Cim | Mud Carp |
Dar | Zebrafish |
Cac | Great White Shark |
Hst | Pond Loach |
Ebl | Far Eastern Brook Lamprey |
Sel | Sea Lamprey |
Paralogs
No paralogs of C6ORF47 were found in humans.[7][33]
Conserved Regions
The promoter region contains many stretches of nucleotides that are conserved across mammalian orthologs, including the following transcription factor binding sites:
- at least one SP9 site (just upstream of the 5' UTR);
- NHLH2 and ERF::FIGLA (just after the transcription start site);
- ZNF454 (about 20 nucleotides further downstream);
- EBF1 and EBF2 (about 330 base pairs downstream of the transcription start site);
- NR5A2, ZNF423, and STAT3 (all about 120 base pairs downstream of the EBF sites);
- ZNF42 (overlapping the start of the coding sequence).
Multiple sequence alignments of C6ORF47 orthologs show many conserved amino acids in the C-terminal part of the protein and much less conservation in the N-terminal part. This is likely because the N-terminal part contains a large disordered region.
The 3' UTR contains nine conserved sites, all of which are listed in the table below.
Conserved sites in the 3' UTR
miRNA | Position in the UTR | Seed Match |
---|---|---|
hsa-miR-125b-5p | 85-92 | 8mer |
hsa-miR-4319 | 85-92 | 8mer |
hsa-miR-125a-5p | 85-92 | 8mer |
hsa-miR-138-5p | 204-210 | 7mer-m8 |
hsa-miR-24-3p | 438-445 | 8mer |
hsa-miR-137 | 677-684 | 8mer |
hsa-miR-325-3p | 679-685 | 7mer-1A |
hsa-miR-140-5p | 714-720 | 7mer-1A |
hsa-miR-142-3p.1 | 716-722 | 7mer-1A |
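The seed-match types in the table follow the usual miRNA target-site definitions:
- an 8mer pairs with miRNA nucleotides 2-8 and has an A opposite nucleotide 1;
- a 7mer-m8 pairs with nucleotides 2-8 only;
- a 7mer-A1 (written 7mer-1A in the table) pairs with nucleotides 2-7 and has the A.

The sketch below classifies sites on that basis only and does not assess conservation. The function names are hypothetical, and the 11-nucleotide miRNA in the example is a toy sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(mirna, utr):
    """Return (1-based start, 1-based end, site type) for 8mer, 7mer-m8 and 7mer-A1 sites."""
    mirna, utr = to_rna(mirna), to_rna(utr)
    core = reverse_complement(mirna[1:7])  # target bases pairing with miRNA nucleotides 2-7
    pair8 = COMPLEMENT[mirna[7]]           # target base pairing with miRNA nucleotide 8
    hits = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)            # 0-based exclusive end of the 6-nt core
        m8 = start > 0 and utr[start - 1] == pair8
        a1 = end < len(utr) and utr[end] == "A"
        if m8 and a1:
            hits.append((start, end + 1, "8mer"))
        elif m8:
            hits.append((start, end, "7mer-m8"))
        elif a1:
            hits.append((start + 1, end + 1, "7mer-A1"))
        start = utr.find(core, start + 1)
    return hits


toy_mirna = "UCCCUGAGACC"  # toy sequence; seed (nt 2-8) is CCCUGAG
print(seed_sites(toy_mirna, "GGGCUCAGGGAUUU"))  # [(4, 11, '8mer')]
```

The site lies 5'-to-3' on the mRNA as the reverse complement of the seed, which is why the A opposite miRNA nucleotide 1 sits at the 3' end of the site.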
References
- ^ a b c GRCh38: Ensembl release 89: ENSG00000203623, ENSG00000235360, ENSG00000228177, ENSG00000228435, ENSG00000204439, ENSG00000226103, ENSG00000226531 – Ensembl, May 2017
- ^ a b c GRCm38: Ensembl release 89: ENSMUSG00000043311 – Ensembl, May 2017
- ^ "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
- ^ "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
- ^ "Homo sapiens chromosome 6 open reading frame 47 (C6orf47), mRNA". NCBI. 2024-04-04.
- ^ a b c "uncharacterized protein C6orf47 [Homo sapiens]". NCBI Protein. NCBI. Retrieved 26 September 2024.
- ^ a b c d "C6orf47 Gene - Chromosome 6 Open Reading Frame 47". GeneCards: The Human Gene Database. Weizmann Institute of Science. Retrieved 26 September 2024.
- ^ "GDS1096 / 204968_at". www.ncbi.nlm.nih.gov. Retrieved 2024-12-15.
- ^ "GDS1096 / 204968_at". www.ncbi.nlm.nih.gov. Retrieved 2024-12-05.
- ^ Boström, Pontus; Magnusson, Björn; Svensson, Per-Arne; Wiklund, Olov; Borén, Jan; Carlsson, Lena M. S.; Ståhlman, Marcus; Olofsson, Sven-Olof; Hultén, Lillemor Mattsson (August 2006). "Hypoxia converts human macrophages into triglyceride-loaded foam cells". Arteriosclerosis, Thrombosis, and Vascular Biology. 26 (8): 1871–1876. doi:10.1161/01.ATV.0000229665.78997.0b. ISSN 1524-4636. PMID 16741148.
- ^ "Six-Frame Translation". www.bioline.com. Retrieved 2024-12-05.
- ^ "UCSC Genome Browser Home". genome.ucsc.edu. Retrieved 2024-12-05.
- ^ "Variation Viewer". www.ncbi.nlm.nih.gov. Retrieved 2024-12-13.
- ^ a b "SAPS". www.ebi.ac.uk. Retrieved 2024-12-05.
- ^ "PaxDb: Protein Abundance Database". pax-db.org. Retrieved 2024-12-14.
- ^ Mungall, A. J.; Palmer, S. A.; Sims, S. K.; Edwards, C. A.; Ashurst, J. L.; Wilming, L.; Jones, M. C.; Horton, R.; Hunt, S. E.; Scott, C. E.; Gilbert, J. G. R.; Clamp, M. E.; Bethel, G.; Milne, S.; Ainscough, R. (October 2003). "The DNA sequence and analysis of human chromosome 6". Nature. 425 (6960): 805–811. Bibcode:2003Natur.425..805M. doi:10.1038/nature02055. ISSN 1476-4687. PMID 14574404.
- ^ Charles A Janeway, Jr; Travers, Paul; Walport, Mark; Shlomchik, Mark J. (2001), "The major histocompatibility complex and its functions", Immunobiology: The Immune System in Health and Disease. 5th edition, Garland Science, retrieved 2024-10-16
- ^ Lehner, Ben; Semple, Jennifer I; Brown, Stephanie E; Counsell, Damian; Campbell, R. Duncan; Sanderson, Christopher M (2004-01-01). "Analysis of a high-throughput yeast two-hybrid system and its use to predict the function of intracellular proteins encoded within the human MHC class III region". Genomics. 83 (1): 153–167. doi:10.1016/S0888-7543(03)00235-0. ISSN 0888-7543. PMID 14667819.
- ^ Gruen, J R; Weissman, S M (2001-08-01). "Human MHC class III and IV genes and disease associations". Frontiers in Bioscience. 6: D960–72. doi:10.2741/gruen. ISSN 1093-9946. PMID 11487469.
- ^ a b "Motif Scan". myhits.sib.swiss. Retrieved 2024-12-05.
- ^ "PredictProtein - Protein Sequence Analysis, Prediction of Structural and Functional Features". predictprotein.org. Retrieved 2024-12-05.
- ^ "I-TASSER results". seq2fun.dcmb.med.umich.edu. Retrieved 2024-12-13.
- ^ a b "PSORT II Prediction". psort.hgc.jp. Retrieved 2024-12-05.
- ^ "DeepLoc 2.1 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-12-05.
- ^ "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interacting Motifs". sumo.biocuckoo.cn. Retrieved 2024-12-14.
- ^ "YinOYang 1.2 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-12-14.
- ^ "IBS 2.0: Illustrator for Biological Sequences". ibs.renlab.org. Retrieved 2024-12-04.
- ^ "STRING: functional protein association networks". string-db.org. Retrieved 2024-12-05.
- ^ a b "FGFR3 gene: MedlinePlus Genetics". medlineplus.gov. Retrieved 2024-12-13.
- ^ "STRING: functional protein association networks". string-db.org. Retrieved 2024-12-10.
- ^ "Emboss Needle". www.ebi.ac.uk. Retrieved 2024-12-15.
- ^ "uncharacterized protein C6orf47 [Homo sapiens]". NCBI Protein. NCBI. Retrieved 26 September 2024.
- ^ "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2024-10-16.