Talk:AlphaFold
This is the talk page for discussing improvements to the AlphaFold article. This is not a forum for general discussion of the article's subject.
A fact from AlphaFold appeared on Wikipedia's Main Page in the Did you know column on 8 February 2021.
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to multiple WikiProjects.
Yesterday Deepmind deprecated old weights (only for Multimer)
So AF 2.3.0 is trained on many more proteins than before and has some new hacks. https://github.com/deepmind/alphafold/commit/9b18d6a966b9b08b2095dd77d8414a68d3d31fc9
"We have fine-tuned new AlphaFold-Multimer weights using identical model architecture but a new training cutoff of 2021-09-30. Previously released versions of AlphaFold and AlphaFold-Multimer were trained using PDB structures with a release date before 2018-04-30, a cutoff date chosen to coincide with the start of the 2018 CASP13 assessment. The new training cutoff represents ~30% more data to train AlphaFold and more importantly includes much more data on large protein complexes. The new training cutoff includes 4× the number of electron microscopy structures and in aggregate twice the number of large structures (more than 2,000 residues). Due to the significant increase in the number of large structures, we are also able to increase the size of training crops (subsets of the structure used to train AlphaFold) from 384 to 640 residues. These new AlphaFold-Multimer models are expected to be substantially more accurate on large protein complexes even though we use the same model architecture and training methodology as our previously released AlphaFold-Multimer paper." 109.252.170.50 (talk) 22:17, 12 December 2022 (UTC)
CASP15 presentations: all top-5 winners in non-RNA categories are AF2-based
Ground truth: https://predictioncenter.org/casp15/TARGETS_PDB/ (not all targets, no RNA)
https://predictioncenter.org/casp15/doc/presentations/
Video will also be available later. No (!) AF 2.3 with new weights (some are here: https://github.com/deepmind/alphafold/blob/9b18d6a966b9b08b2095dd77d8414a68d3d31fc9/docs/casp15_predictions.zip)
OpenFold participated too. 109.252.170.50 (talk) 22:25, 12 December 2022 (UTC)
AlphaFold fixes ancient DNA problem by reading proteins
By reading directly from a Genyornis newtoni egg. https://www.pnas.org/doi/10.1073/pnas.2109326119 109.252.170.50 (talk) 22:29, 12 December 2022 (UTC)
"Responses" section
Almost all of the "Responses" section corresponds to a short period of time in late 2020 after AlphaFold 2 was unveiled but before technical details were given and the code made open-source. This is too focused, speculative, and of relatively little interest nowadays. I propose to remove almost all of this section, and to update the "AlphaFold 2, 2020" subsection with recent content. Alenoach (talk) 06:42, 3 May 2024 (UTC)
- Yes, I think this section might be removed, or at least shortened and rewritten. My very best wishes (talk) 21:27, 12 May 2024 (UTC)
- Thanks for the response. By the way, are you sure that the section "Protein folding problem" needed to be removed? My impression is that it was a useful introduction for readers who don't know what protein folding is, and that it explained the "historical" context and the methods that were used before AlphaFold. I don't think this section really creates the confusion that AlphaFold would be simulating the process of protein folding, as suggested in the edit summary. Do you agree? Alenoach (talk) 22:13, 12 May 2024 (UTC)
- OK. I self-reverted. 15:24, 13 May 2024 (UTC)
- I shortened the section. Alenoach (talk) 22:59, 12 May 2024 (UTC)
- Looks good. My very best wishes (talk) 15:24, 13 May 2024 (UTC)
- I think it might still nevertheless be useful to mention that there were these concerns when AF2 was first released (including perhaps the Spiegel quote), but with the release of the code and with the experience of use, those criticisms have largely gone away. At the moment I find the latest version a bit unbalanced in respect of the initial reaction -- citing the most enthusiastic puff pieces, but not those with reservations. I think we actually would make the positivity about AF2 more credible by citing that there were some initial reservations, but those have not lasted. (Apart from any that have?) Jheald (talk) 21:03, 13 May 2024 (UTC)
- Sure, you can add back some content on the reservations, as long as it's interesting and understandable for readers, and not too outdated. Alenoach (talk) 21:15, 13 May 2024 (UTC)
- I agree this section sounds like an advertisement (it should not be), but Alenoach did good work by removing parts that are definitely outdated after the just-released AF-3 and the two latest versions of AF-Multimer (can be found in colabfold), which are significantly better than the first Multimer version (one with frequent overlaps of atoms used here).
- This is complicated, and should be considered separately for monomeric proteins, protein complexes and complexes with ligands:
- Monomeric structures. One central issue was nicely illustrated by Figure 1 in this article ("The good, the bad and the ugly"), i.e. 30% of the sequence is predicted with a low confidence score and should be discarded (a "dark matter") - fig. 2 in the same paper; note these are monomeric structures. Even though the paper was published in 2021, this is still the main issue of AF (and probably also of proteins themselves) - for monomers and complexes.
- Complexes. That issue is worse for complexes, since they are typically determined with lower "protein-protein" scores ("ipTM" scores, see here). Actually, the ipTM scores are usually in the low or medium range, and the precision for large complexes is mediocre, meaning that different sets of residues from two subunits interact in the experimental (correct) structures and in the structures modeled by AFM, as one can judge by calculating the well-known DockQ score [1]. Some people are trying to generate as many divergent models as possible using different AFM versions (e.g. ptm, v2 and v3 from Colabfold) and select the best of them using available experimental data, including experimental structures of partial complexes.
- Complexes with non-protein ligands. There is currently a single article in Nature by the authors of AF-3; there are no independent assessments. My very best wishes (talk) 22:04, 13 May 2024 (UTC)
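To make the low-confidence point about monomeric structures concrete, here is a minimal sketch of how one might measure the fraction of a model that falls below a confidence cutoff. It relies on the fact that AlphaFold model files store the per-residue pLDDT score in the B-factor column of the PDB ATOM records. The function names, the helper that builds toy ATOM lines, and the cutoff of 70 are assumptions chosen for illustration; they are not taken from the paper discussed above.

```python
def make_atom_line(serial, res_seq, plddt, name="CA", res_name="ALA", chain="A"):
    """Hypothetical helper: build one fixed-width PDB ATOM record for a toy model."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:3s} {chain:1s}{res_seq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def low_confidence_fraction(pdb_text, threshold=70.0):
    """Fraction of residues whose CA-atom pLDDT is below `threshold`.

    Assumes an AlphaFold-style file where the B-factor field
    (columns 61-66 of each ATOM record) holds the pLDDT score.
    The default threshold of 70 is an illustrative assumption.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16; one CA atom per residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found")
    low = sum(1 for s in scores if s < threshold)
    return low / len(scores)


# Toy model with four residues; two of them fall below the cutoff.
toy_model = "\n".join(
    make_atom_line(i + 1, i + 1, p) for i, p in enumerate([95.0, 80.0, 60.0, 40.0])
)
fraction = low_confidence_fraction(toy_model)
```

On a real model one would read the file contents and pass them in instead of the toy string. Whether a low-scoring region should be discarded still depends on the protein in question.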
- P.S. Right now there are many new publications beyond CASPs that assess various AF versions for predicting mutations, multiple conformations, complexes, etc. But this would take a lot of time to include in the page. Overall, AF can produce a lot of interesting structures that look real, but one must verify each specific model very carefully through massive use of all available experimental data on mutations, structures, functions, complexes, etc. See Hallucination (artificial intelligence); AF may have the same issue [2]. My very best wishes (talk) 19:16, 13 May 2024 (UTC)
How AF2 works?
Another thing that would be good to see updated is the discussion on how AF2 works.
The present text in the article was written just after AF2 was first unveiled, when details were limited. Over time AF2 became much better understood, and by the time the source code was released the system was well enough understood that the code was considered to be substantially as expected. IMO indicating how AF2 achieved what it did is really important, but our article doesn't really reflect the understanding that developed; in some regards what we currently give is at best unhelpful, at worst substantially misleading.
Alas, I don't have any time I can put into this at the moment, but IMO this is another section that could do with a substantial review / rework / rewrite (with sources). Thx, Jheald (talk) 19:11, 14 May 2024 (UTC)
- I see, you are talking about section AlphaFold#AlphaFold_2,_2020. I do not think this is misleading, but you are very welcome to improve it. I just checked AF-3. The server is easy to use and very fast (they have made the Colabfold Google labs server defunct). There is no dramatic improvement compared to the latest AF2 version for large complexes, only small improvements in some cases. Cases like a homodimer of [3] are still a mess. The set of ligands is ridiculously insufficient; the set of PTMs is better, although also rather incomplete. Possibly a breakthrough for RNA and DNA complexes, but I did not check those. My very best wishes (talk) 15:34, 20 May 2024 (UTC)
- Honestly, you seem to have much more expertise on this domain than most of us, so it's a bit hard to really make sense of your comments. I have the same impression as Jheald that this part needs to be updated, but I also don't feel knowledgeable enough. If you are motivated, feel free to modify the article directly, while keeping it relatively easy to understand for readers (mostly well-educated outsiders I guess). No obligation of course, and thanks for your work. Alenoach (talk) 18:28, 26 May 2024 (UTC)
- To avoid WP:OR, we need to borrow/summarize a simplified explanation from sources, such as [4], [5] (first few paragraphs), [6]. But ultimately, this is just a "black box"; a user will not have the slightest idea why exactly such and such a structure has been generated. The result does depend on the quality of the input, such as the multiple sequence alignment (MSA) (because the correlations in MSAs play a role), and on the existence of similar structures in the PDB, which affect the parameters obtained during training of the model. Perhaps I will add something later. My very best wishes (talk) 21:19, 27 May 2024 (UTC)
- P.S. It does provide significant improvement for protein complexes in cases when some proteins in a complex (such as Tyrosine-protein kinase Lck) have important PTMs, such as lipidation. One must include such PTMs when modeling them with AF3. Overall, this is a fantastic modeling tool, but one that requires significant biological expertise to interpret the results of calculations, verification of every model through comparison with experimental data, sampling (e.g. by running calculations with AF2), sometimes modeling of protein complexes by pieces (e.g. complexes of large transmembrane Tyr kinases), etc. My very best wishes (talk) 16:00, 22 May 2024 (UTC)
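As a rough illustration of what "correlations in MSAs" means in the comment above, the sketch below computes the mutual information between two columns of a toy alignment. This is a classical way to quantify covariation between positions; it is not a description of what AlphaFold does internally, since AF2 learns from the MSA through its network rather than from an explicit statistic like this. The function name and the toy alignment are assumptions made for the example.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. Frequencies are
    plain counts with no pseudocounts or sequence weighting, which is a
    simplification made for this sketch.
    """
    n = len(msa)
    if n == 0:
        raise ValueError("empty alignment")
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: column 0 is fully conserved, columns 1 and 2 covary (K with E, R with D).
toy_msa = ["AKE", "AKE", "ARD", "ARD"]
covarying = column_mutual_information(toy_msa, 1, 2)
conserved = column_mutual_information(toy_msa, 0, 1)
```

A shallow or poorly aligned MSA gives noisy estimates of this kind of signal, which is one intuitive reason why input quality matters for the result.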