Wikipedia:Featured article candidates/Cross-site leaks/archive1
- The following is an archived discussion of a featured article nomination. Please do not modify it. Subsequent comments should be made on the article's talk page or in Wikipedia talk:Featured article candidates. No further edits should be made to this page.
The article was archived by Gog the Mild via FACBot (talk) 26 March 2024 [1].
- Nominator(s): Sohom (talk) 00:24, 15 February 2024 (UTC)
Say you clicked on that sketchy link that you shouldn't have clicked on, what's the worst that could happen? This article seeks to answer that exact question by providing a technical introduction to an age-old attack that has recently drawn some interest in the academic web security community.
A product of 4 months of almost-continual effort, this article has received an extensive GA review from RoySmith and has subsequently been peer reviewed by TechnoSquirrel69. This is my first time nominating an article for the featured star, and I would love to hear any feedback comments that y'all might have -- Sohom (talk) 00:24, 15 February 2024 (UTC)
JimKillock: Support
Extended content
I think it is great to be making articles like this of a good standard. I am sure it is well researched and accurate given the review you have done. However, technical matters like this are very hard to make accessible to an average reader, and I have to say, I really struggle reading this, although I consider that I have a basic lay knowledge of how some of these things might fit together. That said, it also seems a particularly challenging topic to convey in simple terms. The introductory (lead) section is what really matters here. If this can explain the basic concept well enough, then the other sections may be comprehensible. You might want to see if you can try explaining it in reply here, in an over simplified manner, to see if that gives a guide to the edits needed to make this sufficiently readable by a general reader. Hope that helps. Jim Killock (talk) 20:00, 17 February 2024 (UTC)
Supporting on the basis of an edited simplified introduction, but I recommend that other non-technical editors have a read and make their own assessment and give advice. This is a difficult topic to ensure there is a basic, top-level version accessible enough for WP's general audience. --Jim Killock (talk) 18:11, 10 March 2024 (UTC)
Coordinator note
This has been open for more than three weeks and has yet to pick up a support. Unless it attracts considerable movement towards a consensus to promote over the next three or four days I am afraid that it is liable to be archived. Gog the Mild (talk) 21:11, 8 March 2024 (UTC)
- @Gog the Mild Any suggestions on where I might be able to attract more reviewers? This article is more on the technical side, and a lack of regular reviewers would be expected, since the subject matter (CS/Privacy) is pretty different from most regular FACs (Not that that's a bad thing) :) Sohom (talk) 23:46, 8 March 2024 (UTC)
- @Sohom Datta: I've listed the article at Wikipedia:WikiCup/Reviews needed. Like I mentioned to you earlier today, I'll see if I can come around for another review myself this weekend. Maybe you could also try reaching out to some of the devs on the Wikimedia Discord? —TechnoSquirrel69 (sigh) 01:47, 9 March 2024 (UTC)
- I've notified the Computer Science and semi-active Computer Security Wikiprojects (I should have done this earlier). I don't think the devs on Wikimedia Discord are my best bet, but I'll reach out and see what I can do :) Sohom (talk) 23:45, 9 March 2024 (UTC)
- My boilerplate advice is:
Reviewers are more happy to review articles from people whose name they see on other reviews (although I should say there is definitely no quid pro quo system on FAC). Reviewers are a scarce resource at FAC, unfortunately, and the more you put into the process, the more you are likely to get out. Personally, when browsing the list for an article to review, I am more likely to select one by an editor whom I recognise as a frequent reviewer. Critically reviewing other people's work may also have a beneficial impact on your own writing and your understanding of the FAC process.
Sometimes placing a polite neutrally phrased request on the talk pages of a few of the more frequent reviewers helps. Or on the talk pages of relevant Wikiprojects. Or of editors you know are interested in the topic of the nomination. Or who have contributed at PR, or assessed at GAN, or edited the article. Sometimes one struggles to get reviews because potential reviewers have read the article and decided that it requires too much work to get up to FA standard. I am not saying this is the case here - I have not read the article - just noting a frequent issue. Gog the Mild (talk) 20:06, 11 March 2024 (UTC)
Comments from TechnoSquirrel69
A little late, but saving a spot here. I'm not able to get to the review this weekend, but I'll try to get back here with some comments as soon as time permits. —TechnoSquirrel69 (sigh) 21:05, 10 March 2024 (UTC)
@Sohom Datta: It seems that time did not permit, unfortunately. However, I'm officially back; have a review! Citation numbers from this revision.
Media review
- File:XS-Leaks Attack Steps.svg is freely licensed and tagged accordingly.
- File:Histogram of cross-site leaks cache timing attack example.png is freely licensed and tagged accordingly.
- Code blocks included in the article are taken from Van Goethem et al., which is freely licensed. The text is appropriately attributed.
- Media review passed.
Other comments
- So I think the entirety of footnote 4 is unnecessary. The attribution for the code is already given in the references, and if it contains an error of some sort (which I'm not pretending to understand, just assuming), then it should be silently corrected. This kind of clarification is useful for editors, but I'd expect it more in an HTML comment than in the actual article.
- There are a bunch of citations where you've duplicated part of the URL in the |website= parameter. It's much more useful and consistent to identify the name of the website or entity and link to the article about it if possible. For example, in citation 2: |website=developer.mozilla.org → |publisher=[[MDN Web Docs]]. Also, use italics only when referring to the name of a publication (The Daily Swig) and not just for every website (Medium), similarly to how the name would be treated in running prose.
- Done, lmk if I missed any
- In citation 15: lose the underscore.
- Done
- In citation 28: Add the author, remove "Cybersecurity news and views".
- Done
- There are a few other citations also missing authors.
- Should be fixed, lmk if I missed any
I'm doing a more detailed review of the prose, which I'll add here once I'm done. Let me know if you have any questions in the meantime! —TechnoSquirrel69 (sigh) 21:59, 21 March 2024 (UTC)
@Sohom Datta: Alright, we talked in much more detail earlier, but I'll just summarize my feedback here for the record.
- A reader needs additional context for the subject matter, which takes the form of explaining how the system is supposed to work before getting into how to exploit it.
- Don't dumb down the content, try to abstract it so readers can understand the general concepts without needing specific knowledge of all the moving parts. When you do have to get into technical territory, make sure to contextualize it in plainer terms.
- The diagram in § Background is confusing and borderline illegible. The article is probably not well served by it, so consider options to replace it or remove it altogether.
—TechnoSquirrel69 (sigh) 04:04, 22 March 2024 (UTC)
Comments from Mike Christie
I'm not a security expert but I do have some technology background so I'll see if I can provide some useful comments.
- The first thing that strikes me is that the lead is too long for the size of the article -- it's almost 25% of the length of the body. I would cut at least a quarter of it; it only needs to summarize and point to the content in the body.
- Most of the bulk of the lede comes from a detailed, simplified example of how an attack is performed (which was something JimKillock wanted). Would it make sense to merge that with the Mechanism section?
- Per your comment below, I think moving some of the detail to the "Mechanism" section would work well. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- let me know if the current version is any better. Sohom (talk) 00:32, 25 March 2024 (UTC)
- You link to information leakage twice in the lead.
- Fixed
- Still there -- you link from "leak information" in the first paragraph and "information leakage" in the third; I was going to remove the second link but realized you might prefer to keep that one. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- Looking at File:XS-Leaks Attack Steps.svg, I suggest adding a little more to the caption explaining the sequence. Perhaps add "Here the attacker can deduce that the victim is logged in to the vulnerable site".
- I've expanded some of the captions.
- That does help. How about using the "upright" param to increase the size of the image a bit? It's not readable without clicking through on most screen sizes. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- Just making sure I understand the mechanics: client-side Javascript sends a request to victim.leak, which replies; the body of that http response is hidden from attack.leak by the browser, but the http header of the response is returned to the Javascript's execution context, which means it can be forwarded to attack.leak. Is that correct?
- Mostly, the HTTP header is not returned/read by the attacker, but some of the effects of the Content-Disposition header can be observed by attack.leak. (https://xsleaks.dev/docs/attacks/navigations/#download-trigger gives a nice overview of the attack).
- I can see the types of attack are so numerous that it's not feasible for you to list every single one. However, the detection of downloads seems like it is a clear enough example you might consider adding a mention of it to the "Other techniques" paragraph. And is what you say about the Content-Disposition header a generally true statement for most of the attacks? If so it seems like that's a technical detail that ought to be mentioned. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- I've elected to remove references to the download attack in the new diagram per the new feedback.
- Have there been any known instances of this attack in the wild?
- None that have been documented in RS.
- It's hard to source a negative but I think we should say this if we can source it. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- I've not been able to find any sources that prove the negative; the closest that RS come to describing in-the-wild instances is Terjanq's and Luan Herrara's attacks. :( Sohom (talk) 00:32, 25 March 2024 (UTC)
- In the last paragraph of the lead, "traditionally", "modern" and "more recently" imply a time frame; can we put dates on these? "Until the 2010s" and "since about 2020", or whatever the sources would support. Otherwise the language is going to date relatively quickly.
- "Cross-site leaks allow attackers to break this cross-origin barrier, which is inherent in web app contexts": The previous sentence described the barrier as preventing arbitrary execution, so I think "break" is too strong here -- really it's a read-only breach. How about "Cross-site leaks allow attackers to obtain information despite [or through] this cross-origin barrier". And what does "which is inherent in web app contexts" add that hasn't been said in the previous sentences?
- Removed the last part, and reworded the rest
- "To perform a cross-site leak, the attacker must identify and include at least one state-dependent URL in the victim app." This makes it sound as if the attacker is including something in victim.leak; what I think you mean is "To perform a cross-site leak, the attacker must identify at least one state-dependent URL in the victim app for use in the attack app".
- Done
- "To demonstrate ... is taken": suggest "The following example of ... demonstrates a common scenario of ..." -- I think the "is taken" wording sounds a bit strained.
- Done
- Just out of curiosity, and to see if I understand the mechanism properly, if the attacker used an icon loaded from their own network, wouldn't that give them more specific information than timing a CDN icon return to see if it was cached?
- Yes, that would definitely give the attacker more information (and make the attack easier), but in this case, the assumption we are making is that the attacker cannot tamper with the content of the victim website, just make requests to it
- Right -- I'd misunderstood the mechanism. Rereading I don't think more is needed; I just misread it. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- Suggest expanding "iff" and unlinking it; no need to abbreviate to that level.
- Done
- "but used an amplification technique in which the input was crafted to extensively grow the size of the responses, leading to a proportional growth in the time taken to generate the responses, thus increasing the attack's accuracy". What would we lose if this was shortened to "but used a technique in which the input was crafted to grow the size of the responses, leading to a proportional growth in the time taken to generate the responses, thus increasing the attack's accuracy"?
- Done
- "Since 2020, there has been some interest among the academic security community to standardize these attacks." Suggest "Since 2020, there has been some interest among the academic security community in standardizing the classification of these attacks".
- Done
- You might consider changing to {{Use Oxford spelling}} instead of {{Use British English}}, since you're using "-ize" endings.
- Done
- "... for which there is no established, uniform classification. These attacks are typically categorized by ...": seems contradictory.
- I guess I want to emphasize "established" and "uniform" in the previous sentence.
- "As of 2021, researchers have identified over 38 leak techniques that target components of the browser, and new techniques are discovered due to ongoing changes in web platform APIs": I'm not sure what the second half of this is saying. Does it refer to discoveries that post-date the 2021 list of 38 techniques? Or is it a general statement about how new techniques can appear?
- It's a general statement on how new techniques appear.
- Could we make this "As of 2021, researchers have identified over 38 leak techniques that target components of the browser. New techniques are typically [or often] discovered due to ongoing changes in web platform APIs"? Assuming the source supports this? Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- "timing attacks could infer cross-origin execution times across embedded contexts": what does "across embedded contexts" mean?
- "Embedded contexts" would be mostly iframes (and other more obscure framing techniques)
- OK -- I think that's fine as is; I'm not a web developer but I think anyone familiar with the field would have no trouble with this. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- "showed the Performance API could leak": "Performance API" needs a link or a footnoted explanation; I assume it's one of Chrome's APIs but that should be clearer.
- Added
- "In contrast, if the handler onerror is triggered with a specific error event, the attacker can use that information to distinguish between HTTP content types, status codes and media-type errors": again just checking my understanding -- wouldn't this information already be available in the http status code?
- Yes it would, but the browser would not allow cross-origin pages to access the http status codes
- OK -- this is the same question as above about the Content-Disposition header; I hadn't understood exactly what information is allowed to be seen by the browser, and was assuming some aspects of the status were directly visible. I think in the mechanism section some statement that incorporates what you've told me in answer to these two questions would be helpful. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- If the sources give enough information, what could the "global limits" reveal? And is this section different from the last sentence of "Timing attacks" which talks about a pool party attack? (And is it "pool party" or "pool-party"? You have both.)
- This section is not different from the last sentence; these attacks have been categorized by Knittel as both timing and as part of the new "global limits" type. The paper discussing pool-party attacks uses the "pool-party" convention, I'll stick with that.
- I think it would be helpful if the reader knew in the global limits section that the previously mentioned pool-party attack was an example of this type of attack. Perhaps in the timing attacks section add something like "this is an example of a global limits attack"? Or the reverse: in the later section mention the earlier timing attack as an example. Mike Christie (talk - contribs - library) 11:02, 12 March 2024 (UTC)
- "an attacker could leak whether or not a Cross-Origin-Opener-Policy header was set": can we say what this would reveal to the attacker?
- So, the presence or absence of a header doesn't reveal much on its own. However, it's a mechanism to tell two responses apart. Ditto for the one above.
- Suggest linking "stateless" to stateless protocol.
- Done
- "By disallowing the embedding of the website in untrusted contexts, the malicious app can no longer ...": needs rephrasing; as written this says it's the malicious app that is doing the disallowing.
- Rephrased
- Am I right in thinking that the Fetch metadata headers do nothing by themselves, but require the targeted app to take action depending on their content? So they enable a defence but are not in themselves a defence?
- Yep, they enable a defence but they are not defences in themselves (it allows for disallowing specific "risky" requests)
-- Mike Christie (talk - contribs - library) 13:16, 11 March 2024 (UTC)
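As a rough illustration of the Fetch metadata point above (the headers only enable a defence; the targeted app has to act on them), here is a minimal sketch of a server-side check. It is not taken from the article or its sources: the `FetchMetadata` type and the `allowRequest` function are hypothetical names, and the policy shown is only one plausible choice, loosely modelled on commonly published "resource isolation" policies.

```typescript
// Hypothetical representation of the Fetch metadata request headers
// that a browser may attach to a request.
type FetchMetadata = {
  site?: string; // value of Sec-Fetch-Site, if the browser sent it
  mode?: string; // value of Sec-Fetch-Mode
  dest?: string; // value of Sec-Fetch-Dest
};

// Hypothetical policy: returns true if the server should serve the request.
// The headers do nothing by themselves; the app must reject "risky" requests.
function allowRequest(method: string, meta: FetchMetadata): boolean {
  // Browsers that do not send Fetch metadata: allow, for compatibility.
  if (meta.site === undefined) {
    return true;
  }
  // Same-origin, same-site and direct user-initiated requests are allowed.
  if (meta.site === "same-origin" || meta.site === "same-site" || meta.site === "none") {
    return true;
  }
  // Cross-site: allow only simple top-level style navigations (ordinary links),
  // and never loads destined for <object> or <embed>.
  if (meta.mode === "navigate" && method === "GET" && meta.dest !== "object" && meta.dest !== "embed") {
    return true;
  }
  // Everything else (e.g. a cross-site image or script load) is rejected.
  return false;
}
```

Under this assumed policy, a cross-site request made as an image load would be refused before any state-dependent response is generated, which is the sense in which the headers "allow for disallowing" specific requests.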
- @Mike Christie Thank you so much for the review. I've implemented most of the feedback and left a few inline explanations.
- I'm a bit confused regarding the lede (a lot of the bulk comes from implementing User:JimKillock's (courtesy ping) suggestions regarding simplified overview of the topic for general readers). I wonder if moving some of the example related content into the "mechanism" section would be a good idea :) Sohom (talk) 04:55, 12 March 2024 (UTC)
- Hi both, take a read of
- WP:TECHNICAL: It is especially important to make the lead section understandable using plain language, and it is often helpful to begin with more common and accessible subtopics, then proceed to those requiring advanced knowledge or addressing niche specialties.
- WP:EXPLAINLEAD: For these reasons, the lead should provide an understandable overview of the article. While the lead is intended to mention all key aspects of the topic in some way, accessibility can be improved by only summarizing the topic in the lead and placing the technical details in the body of the article. … In general, the lead should not assume that the reader is well acquainted with the subject of the article. Terminology in the lead section should be understandable on sight to general readers whenever this can be done in a way that still adequately summarizes the article, and should not depend on a link to another article.
- WP:ONEDOWN: A general technique for increasing accessibility is to consider the typical level where the topic is studied (for example, secondary, undergraduate, or postgraduate). … The lead section should be particularly understandable, but the advice to write one level down can be applied to the entire article, increasing the overall accessibility. Writing one level down also supports our goal to provide a tertiary source on the topic, which readers can use before they begin to read other sources about it. For example, a long-winded mathematical proof of some result is unlikely to be read by either a general reader or an expert, but a short summary of the proof and its most important points may convey a sense to a general reader without reducing the usefulness to an expert reader.
- I think a simple lead and then layering the basic description afterwards would fit the above from the WP:MOS, but you would need to take care that the lead itself remains comprehensible to an "average non-technical reader". Jim Killock (talk) 07:48, 12 March 2024 (UTC)
Arbitrary break: Sohom
@JimKillock, Mike Christie, and TechnoSquirrel69: (also @Joereddington: who left some comments at WikiProject Computer Security :) I've rewritten the lede and the background. I've elected to remove the detailed description of the attack from the lede (the example and description have been moved to the mechanism section) and instead provide a brief overview of the salient aspects of the attack. The background section has been expanded to provide some context on why an attacker might want to perform the attack and explain the impact of the same-origin policy in a better way (it also goes into detail about the ideal way everything should work). The confusing drawing in the background+mechanism section has been replaced with a much better and simplified diagram that does not include references to the download identification attack. (after a lot of feedback from TechnoSquirrel69) Sohom (talk) 00:32, 25 March 2024 (UTC)
- Thanks - I think the shortened lead approach can work well here, and entirely agree with it being moved; however, the current lead contains a lot of unexplained concepts which are of course broken down in the background you wrote. If this wasn't Wikipedia, I would add a sentence to guide the uninitiated to hold on (eg, "a simple explanation of the process is provided below"). Also if this wasn't Wikipedia I would suggest removing more of the potentially confusing and not fully expanded concepts in order to ensure the reader doesn't feel they've lost the thread and stopped.
- Given all that may break the rules, I would aim for a very simple overview up front along the lines of: In a cross-site attack, the user is duped into visiting a malicious website that asks the user's browser to get information from another web service, like a search engine, without the user knowing about it. Because the other web service was "asked" by the user's web browser, it complies with the request. The malicious website can then learn something about the user's relationship with the web service, through things like the length of time it takes for a request to come back, or the amount of information the web service gives to the user. While the malicious website cannot read the information from the web service directly, as it is collected by the user in their web browser, the malicious website can make accurate inferences that reveal specific facts about the user.
- You could even incorporate here or perhaps in the background section: For example, the attacker could ask your browser to search a web-based email service. The attacker would then pick two queries (say "dog" and "ggdkjsvkjfdsgfdjkgjfdsdj"). They know that "ggdkjsvkjfdsgfdjkgjfdsdj" will always return an empty response. Given this, the attacker will then observe the difference when your browser gets an empty response versus a non-empty response. Once they do that, they are able to make the two queries, and if both responses are empty, they know that you don't own dogs, else they now know that you own or talk about dogs. Jim Killock (talk) 04:58, 25 March 2024 (UTC)
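For readers following the thread, the inference step in the two-query example above can be sketched roughly as follows. This is only an illustrative sketch, not something taken from the article or its sources: it assumes response time is the observable signal, and the names `median` and `probeLooksNonEmpty`, as well as the threshold value, are hypothetical.

```typescript
// Median of a list of timing samples (in milliseconds).
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical decision rule: compare timings for the query of interest
// (e.g. "dog") against timings for a query known to return nothing
// (e.g. "ggdkjsvkjfdsgfdjkgjfdsdj"). If the probe is slower than the baseline
// by more than thresholdMs, guess that the probe returned a non-empty response.
function probeLooksNonEmpty(baselineMs: number[], probeMs: number[], thresholdMs: number): boolean {
  return median(probeMs) - median(baselineMs) > thresholdMs;
}
```

For instance, `probeLooksNonEmpty([10, 11, 12], [29, 30, 31], 5)` compares medians of 11 ms and 30 ms and so guesses "non-empty". The attacker never reads either response; the guess rests entirely on the observable difference between the two.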
- I've tried to simplify the language of the first portion of the lede and incorporate some of your suggestions. Let me know if it works now. Regarding the rest, I have reservations about including the exact example (which I had outlined previously). However, I've incorporated a part of the text in the Mechanism section. Sohom (talk) 16:44, 25 March 2024 (UTC)
- I understand the reluctance, and I've no wish to keep pushing my own view here. But I would ask you if you can honestly say that per WP:EXPLAINLEAD and WP:TECHNICAL that the "lead section [is] understandable using plain language", consistently does "not assume that the reader is well acquainted with the subject" and that "Terminology in the lead section [is] understandable on sight to general readers".
- The recommended tool hemingwayapp is giving the lead a rating of "Grade 14 Poor. Aim for 9." and says "11 of 28 sentences are very hard to read".
- Personally I think it is possible to make the introductory remarks simpler, which was my aim in writing a few lines to show how it could be approached. And when asked casually, you have yourself given me very good and impressively clear simple explanations. Jim Killock (talk) 20:59, 25 March 2024 (UTC)
Coordinator note 2
This has been open for nearly six weeks and has attracted a lot of comments but only declarations of support. It currently feels more like a PR than a FAC. There still seems a way to go to achieve any consensus to promote so I am to put it to bed now and ask that further work take place away from FAC with discussion on the article talk page, or possibly PR. I anticipate seeing it back here soon, although the usual two-week wait applies. You can of course again ping the reviewers to comment at the next FAC. Cheers. Gog the Mild (talk) 12:33, 26 March 2024 (UTC)
- Closing note: This candidate has been archived, but there may be a delay in bot processing of the close. Please see WP:FAC/ar, and leave the {{featured article candidates}} template in place on the talk page until the bot goes through. Gog the Mild (talk) 12:33, 26 March 2024 (UTC)
- The above discussion is preserved as an archive. Please do not modify it. No further edits should be made to this page.