Talk:Human genome

From Wikipedia, the free encyclopedia
Former good article: Human genome was one of the Natural sciences good articles, but it has been removed from the list. There are suggestions below for improving the article to meet the good article criteria. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.
Article milestones
Date | Process | Result
October 1, 2006 | Good article nominee | Listed
September 24, 2009 | Good article reassessment | Delisted
Current status: Delisted good article

quality

According to the paper linked below, written by top experts and published in a top journal (i.e., highly authoritative), there are many gaps (unsequenced regions) in the human genome. In my opinion, the lack of attention paid to these gaps is somewhat misleading for the general public. For example, when scientists use the word "complete," it means, per the dictionary, that the genome has no gaps and no missing sequence, yet this is empirically false. http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html

Number of genes

We don't know the exact number of genes in the human genome. There are at least 19,000 protein-coding genes and no more than 20,000. The number of noncoding genes is all over the map. There are only a few thousand (at most!) that have been confirmed as real genes but there are many more transcribed regions that some scientists call genes even though they have not been shown to have a function. This is a problem since the definition of a gene requires that it produce a FUNCTIONAL product.

The number of noncoding genes predicted by Ensembl is down to 25,967 in the latest annotation (GRCh38.p13), but most of those are stretches of DNA that produce lncRNAs that have not been shown to have a function.

The issue was raised last year (see Noncoding DNA above) but not resolved. We need to present a consistent view of the number of genes in this article and we need to point out the controversy over the number of noncoding genes. Genome42 (talk) 14:41, 26 June 2023 (UTC)

Chromosome table

The data in the large table on individual chromosomes is ten years old. Not only does it contain outdated information and unfounded speculation but it's also far too complicated and specific for a Wikipedia article.

I think we should remove it. What do others think? Genome42 (talk) 14:49, 26 June 2023 (UTC)

Reorganization

I suggest that we reorganize the entire article to eliminate redundancy and conflicting information. I think we should begin by moving some information out of the introduction and putting it into the main body of the article.

I think the main subsections should be:

          Size of the human genome
          Content of the human genome
          Sequencing the human genome
          Information content (in bits)
          Genomic variation in humans
          Personal genomics
          Human genetic disorders (including gene knockouts)
          Evolution
          Mitochondrial DNA
          Epigenome
Genome42 (talk) 18:12, 6 July 2023 (UTC)

The section on content could have the following subsections:
          Protein-coding genes
          Non-coding genes
          Regulatory sequences
          Centromeres and telomeres
          Other functional elements
          Pseudogenes
          Transposons and viruses
          Junk DNA
If there are no objections, I'll begin by moving the current content into the new subsections. Then we can edit the information to make it more consistent and more readable. Genome42 (talk) 18:17, 6 July 2023 (UTC)

3,054,815,472

3,054,815,472 base pairs? The box claims 3,117,275,501. Neither is, of course, right (or maybe both are). Why not give a little credit to the readers and presume that they can handle a bit of complexity? For example, why not say that the number of base pairs is determined using a specific individual's DNA (or that of a group of specific individuals)? Why not mention that while we're a diploid species, as far as is known we all contain tetraploid and octoploid cells? (This is true, I'm guessing, after a number of cell divisions in the embryo.) Why not say that the reference DNA is normally selected to be typical and avoids individuals with known aneuploidy, euploidy (except as mentioned), and (rare) sex chromosome variations? Why not say that there are errors in counting the base pairs, and that any two independent researchers are NOT likely to come up with exactly the same number, even if the same reference DNA is used? Why not say that sequencing a complete genome is very, very difficult? (I'm not sure that it's ever been fully done; "fully" means counting all base pairs in one lab, by one group of researchers.) Also, why not mention that even a reference person will have a number of mutations in their cells (those with nuclei), so that even "their" genome doesn't fully describe their "complete set"? 98.21.208.178 (talk) 07:13, 28 February 2024 (UTC)

I tried to fix this and started the reorganization that I suggested in July 2023. Genome42 (talk) 23:09, 28 February 2024 (UTC)

Definition of "genome"

The very first sentence of the article defines the human genome as, "The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei ..." Except for the ambiguous use of 'pairs,' this is a correct definition that corresponds to the definition in genome, "... the 'genome' refers to only one copy of each chromosome."

A recent edit introduced the term "haploid genome." This is equivalent to saying "haploid set of one copy of each chromosome" and that makes no sense. The phrase "diploid genome" is even worse since it implies that there are different kinds of genomes; haploid, diploid, and polyploid. There can be haploid, diploid, and polyploid sets of chromosomes but not genomes.

I made the appropriate edits. Genome42 (talk) 16:06, 19 September 2024 (UTC)

This post above (Genome42's) is problematic, and imho so problematic as to be just plain wrong. First, it ignores mitochondrial DNA, but at least it acknowledges this, sorta. (I say sorta because it is false that mtDNA is "A small DNA molecule". It is the set of small DNA molecules found in a person's cells. The number of mtDNA molecules in a cell varies by cell type as well as by inheritance and environmental factors. mtDNA genetics is non-Mendelian, so it makes some sense to consider nuclear DNA and mtDNA separately. The mtDNA rings found in one person's cells are not necessarily identical, so to speak of it as "a" molecule is clearly misleading. I'd say wrong, except that I don't know the frequency of sequence variation; if it's low, then speaking of "a" sequence may be an adequate first approximation, IDK.)

IMHO, the first paragraph of the lead is wrong, mostly by omission. "The" human genome is a REFERENCE SEQUENCE and, last I heard, NOT one found in a single human (i.e., it is a composite). This is a problem since the term is used both for the reference sequence and for the set of sets of (nuclear) DNA found in the cells of the entire population of Homo sapiens. Also, while it's understood that my/your/his/her/their genome *is* each a human genome, it's also generally understood that there will be sequence differences (but I'd guess that the extent of the differences is both under- and overestimated). I strongly disagree that "the genome refers only to one copy of each chromosome". It's well known that understanding gene coding, especially protein coding, requires knowledge of both gene copies, maternal and paternal. The effect of a gene on a person's phenotype depends on both copies. Depending on how this article is written, it may be necessary to distinguish between the reference genome (there's more than one, of course) and an individual's genome. IDK. It seems like it'd be confusing to mix the several meanings of the term, especially when a statement is true for one meaning and false or indeterminate for the other.

The Engineer in me says that any discussion of a physical object should include some measure of its range, as well as the range of uncertainty. It is wrong to say that the Y chromosome contains 62,460,029 base pairs. I'm not sure what the uncertainty is for that *particular* measurement, but I doubt that there's been enough error analysis to believe the error is below 1 bp. Generally, few lab techniques are without significant random error, and it should be generally assumed that even when calibration is frequent, each workflow (machines and technician set) will have non-random (biasing) errors. A little less unwarranted certainty is, imho, appropriate here. 98.19.177.99 (talk) 18:07, 5 February 2025 (UTC)
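
A quick arithmetic check may help with the two totals discussed under "3,054,815,472" above. The sketch below uses only the three figures quoted on this page; the variable names are made up for illustration. It shows that the two totals differ by exactly the Y chromosome length quoted above. That would be consistent with one total including the Y chromosome and the other not, but this reading is an assumption and nothing on this page confirms it.

```python
# Figures copied from the comments on this page; no other data is assumed.
# Variable names are illustrative only.
TOTAL_IN_HEADING = 3_054_815_472  # base pairs, first figure quoted above
TOTAL_IN_BOX = 3_117_275_501      # base pairs, figure attributed to the box
Y_QUOTED = 62_460_029             # Y chromosome length quoted above

# Difference between the two quoted totals.
difference = TOTAL_IN_BOX - TOTAL_IN_HEADING

print(f"difference: {difference:,} bp")                 # difference: 62,460,029 bp
print("equals quoted Y length:", difference == Y_QUOTED)  # equals quoted Y length: True
```

If that reading is right, the two numbers describe different sets of chromosomes rather than disagreeing measurements of the same thing, which is a separate question from the measurement uncertainty raised above.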