Wikipedia talk:Search engine indexing (proposal)
Archive: NOINDEX of noticeboards
Proposals
This page is designed to capture the current state of affairs, and to allow refinement of a proposal that was started at this Village Pump (policy) thread. Jehochman Talk 16:26, 5 January 2009 (UTC)
- I've updated and rounded this out, it now reflects better the actual state of affairs, and the VP proposal. FT2 (Talk | email) 18:45, 5 January 2009 (UTC)
- I have made a couple of rough FAQs, but obviously on the 2nd of them you and others are more expert. FT2 (Talk | email) 19:05, 5 January 2009 (UTC)
- Keep going and I will add my thoughts along the way. It will be best to explain the pros and cons and seek to work out a solution that addresses as many concerns as possible. On major articles, you are correct that external linkage dominates. I am more concerned about the large number of poorly developed pages that struggle to rank. Jehochman Talk 19:48, 5 January 2009 (UTC)
- I am unconvinced that internal linkages count for much. The fact that a Wikipedia talk page has 200 links to WP:SOMEPAGE or some article, will surely be heavily discounted almost to zero, since Wikipedia is well known to be massively interconnected. It'll be the incoming links that determine ranking, even for less common topics. Do you want to add below a brief review of "Background info on Google ranking of Wikipedia mainspace" in the meantime, as a summary? FT2 (Talk | email) 00:02, 6 January 2009 (UTC)
- A common misconception is that external links are more significant than internal links. The reason Wikipedia pages rank well is that we have a small set of pages that draw a ton of links, and then we effectively distribute that link juice to a large number of pages that get little "linklove". This proposal may impact the distribution and cause a lot of those lesser pages to drop in rankings, if it is implemented poorly. In general, it is best to let Google decide what pages should appear. A useful thing for you to do in advance would be to examine the traffic stats and see how much traffic is sent to the various namespaces. You'd probably find that very little traffic is being sent to the pages you want to NOINDEX, and that Google is already taking care of this without the need to invest copious amounts of volunteer time or without the need to inconvenience the majority of people who prefer to use Google instead of our internal site search. I believe we should generally let Google index, except when we have a clear reason not to allow them. As I've said, places like RFA, RFCU, SSP, Noticeboards, Userspace and other venues where editors are the focus rather than content may be worth noindexing. However, I think it will be bad to noindex discussions about encyclopedia content. People have a legitimate interest in looking up discussions about a topic just as they may wish to search for the topic itself. Jehochman Talk 02:44, 6 January 2009 (UTC)
Ranking
Ranking is raised by some as a question. Jehochman posts as follows:
- "One concern is that de-indexing talk pages may result in PageRank drops throughout Wikipedia because Talk pages are linked from every article. It may be necessary to use NOFOLLOW on any links that point to robots excluded pages. This may also improve the efficiency of spidering the site depending on the method used to exclude the pages."
- Response - Although a non-expert I stand by two thoughts - 1/ rankings are not our "goal", and 2/ Google is sophisticated; if we despider discussion pages the significance of content pages is unlikely to be much affected.
- "Another concern is that de-indexing large numbers of pages may create an incentive for parasitic marketers to copy de-indexed content from Wikipedia to set up mirror sites embedded with ads. In this case objectionable content may still find its way to public view. There are already many mirrors of Wikipedia in existence. Removing our content from the search index will tend to boost the rankings of mirror sites and send more traffic to those sites and less to Wikipedia."
- Response - See above. "We have to make this stuff available otherwise other people will" doesn't work for wide swathes of projectspace already; it's unlikely to be an issue here. If other sites do spider non-content, then that's fine; the issue is it is not under the Wikipedia banner, nor with the Wikipedia standing; nor is it as likely to get the same visibility or implied standing as it would on Wikipedia in most cases; nor if it is a sudden issue is it likely to be quite as big an issue. Net benefit. FT2 (Talk | email) 00:50, 6 January 2009 (UTC)
Educating users
Jehochman also posted - "It is also concerning that many unsophisticated users prefer Google search and do not understand how to use Wikipedia's internal search feature. De-indexing large amounts of pages may make Wikipedia less accessible to the ordinary user."
- Response - Users need educating that internal search exists. Blast it in the banner for a month, put it on the noticeboards, write it in Signpost, "From <DATE>, Google will no longer be indexing talk pages and user space. Please use internal search for these instead. Help on internal search can be found <HERE>." Things do change, and for sure, users who wish to search communal discussions will have to click internal search and use that instead. That's going to be a given.
- Since it handles similar format to Google and Google advanced search this shouldn't be difficult; it's a matter of education. New users will learn when they edit, "this is how to search for non-content and discussion threads". They'll find it is actually better, and handles useful things like "sections" and "namespaces" that Google doesn't. Non-editors seeking content will find it fine with Google. Non-editors seeking non-content will either need to use internal search, or accept that the site norm is, we aren't spidering it externally.
- Last, never forget our "ordinary user" is after content, not editor discussions. That is our "user", our "reader" and the person we do this for, and apparently, pretty much 99% are not editors. FT2 (Talk | email) 00:50, 6 January 2009 (UTC)
- Why do you presume to know what users are after? Let them search as they like, using whatever tool they like. This whole proposal is very paternalistic. "We know better than the user." No we don't. Jehochman Talk 02:48, 6 January 2009 (UTC)
- We probably don't want to be paternalistic... but we DO want to counter the "But I want Google" comments. We want people to realise that our internal search is up to the job and getting better (but we also want to encourage people to report issues so it can get even better still). How to do that without being paternalistic? Dunno but it needs saying. ++Lar: t/c 03:53, 6 January 2009 (UTC)
- What to spider is the issue, and this proposal sets out why certain changes may be desirable for the project. Those changes require, inevitably, a switch from external to internal search by people seeking material within editorial chat and userspace. That in turn requires 1/ education that it is possible, and 2/ a change in the tool they reach for to do so. People do this all the time, and since the usage of both searches is very similar the bridge is a low investment and non-prohibitive one.
- Anyone who uses Google simply, can enter a similar query in internal search; anyone who uses Google advanced search can either use a similar search on internal search or has evidenced they can comprehend more advanced searches (possibly with brief help). The results of course will be intuitive to any Google user, and Google will continue to index all mainspace and other "content". FT2 (Talk | email) 04:15, 6 January 2009 (UTC)
- Creating a monopoly for the internal search function is not the way to convince people to use it. (See also Microsoft. Contrast: Cluetrain Manifesto.) People should be allowed to use whatever search tool they prefer. What if somebody wants to find out about "wikipedia blocking process" and they want results from Wikipedia and other sites? The proposal by FT2 seems like it would enforce the "power cabal's" view. Only "officially sanctioned" pages would appear in Google search results. Dissenting views would be stifled completely. The wrongfulness of this proposal is stunning. Jehochman Talk 14:02, 6 January 2009 (UTC)
Internal search is much better now; for example see here and here. These search boxes do the trick for those noticeboards. Leverage what we have which works fine. rootology (C)(T) 14:23, 6 January 2009 (UTC)
- (edit conflict) If "wikipedia blocking process" is part of a policy, guideline, encyclopedic content, a projectspace page that users decide (via consensus) to spider, and so on, then it'll be found. If it's 15 talk pages covering "User:Andrew Doe abusing wikipedia blocking process", it'll need searching for internally. FT2 (Talk | email) 14:45, 6 January 2009 (UTC)
- If it is there and can be found, why require people to use our search tool rather than their favorite search tool? If info is harmful and unchecked or inaccurate, then it should be removed. We have two different philosophies here which cannot be reconciled. We both agree that bad content should be removed. I suggest removing bad content when found and otherwise allowing the site to be spidered, except for pages where there is a demonstrated risk of bad content appearing. You suggest de-spidering most pages and only allowing checked pages to appear when proven that the risk of bad content is low. Both are potentially valid strategies, but there are trade offs. My plan places a greater value on transparency, convenience and freedom of choice. Your plan places a greater value on centralized control and risk avoidance. Jehochman Talk 15:35, 6 January 2009 (UTC)
Revisited
Forgive me if this has already been mentioned, there is an awful lot of discussion to read through, and I'm not even sure it is technically possible. Can certain pages outside of the article namespace be made to show in other search engines only if accompanied by the keyword "wikipedia"? For example, under this proposal, if I type in "peer review" to Yahoo!, I get back only the article peer review. But if I type in "peer review wikipedia" I get back both the article and WP:Peer review. As it stands right now, I type in "peer review" and get back both, even though I have no use for Wikipedia's peer review (no offense, I just wanted a plain ol' article from any website). Just a thought, might be technically impossible though. --64.85.223.233 (talk) 07:52, 2 March 2009 (UTC)
Template: namespace?
Is not considered in any of the proposals thus far. For the sake of completeness, it probably should be. Thoughts? Happy‑melon 21:49, 6 January 2009 (UTC)
- I think it shouldn't be indexed. There's no reason why page fragments should be turning up in Google search results. --Carnildo (talk) 00:17, 7 January 2009 (UTC)
- Agree. FT2 (Talk | email) 00:39, 7 January 2009 (UTC)
- Absolutely correct. On most websites I build there is a /templates directory for pieces of pages that are reused. These are never indexed, because they are duplicative. One thing we need to do is always include the name of a template in a comment in the code generated by the template. It is maddening when somebody subst's a template and I cannot figure out which one they were using. Jehochman Talk 00:39, 7 January 2009 (UTC)
File space override: why?
To me, File: (which used to be images:) is interchangeable with the Main space; it's all encyclopedia content. What is the reasoning for allowing exclusions on File: ? I can see the rationale on Portal space--some of them may be internal portals, for project stuff; and for Categories--some may conflict with BLP in some fashion but may still have value for the encyclopedia to a degree, but should be devalued on SEO. But for images, and public domain audio/video? rootology (C)(T) 22:28, 6 January 2009 (UTC)
- Images like these, I guess, we have innumerable images that were uploaded purely to illustrate technical discussions and bug reports, that are of no encyclopedic use whatsoever. Happy‑melon 22:37, 6 January 2009 (UTC)
- Yeah, that's a tiny fraction of our hundreds of thousands (or more) media files on en.wp however. My concern is that someone--anyone--would or will try to leverage such a NOINDEX option vs. "prurient" images, based on whatever standard individuals may try to have. Another factor is fair use--we allow it. Will someone try to push to NOINDEX fair use media (something I would not object to)? What is the acceptable scope of NOINDEX media? rootology (C)(T) 22:42, 6 January 2009 (UTC)
- I tried that already. Digging around for the discussion. MBisanz talk 22:45, 6 January 2009 (UTC)
- The fact that it's a tiny fraction of the space explains why it should be indexed by default, but not why we wouldn't want to have an override. It seems elementary good housekeeping to be as clear as possible. The situation is somewhat analogous to the Category: space, where there are maintenance structures intermixed with content cats. I'm not sure where I'd stand on the non-free issue (something to think about), but it would be silly to deprive ourselves of the option. I'd be more concerned about the images-of-Mohammed and images-of-body-parts ulcers leaking over into a NOINDEX dispute, but that's both rather trivial and easily resolved by the processes we already have. Happy‑melon 23:39, 6 January 2009 (UTC)
Would this mean that images wouldn't show up on google image search? JoshuaZ (talk) 04:52, 8 January 2009 (UTC)
- It would mean that the image pages themselves would not show up on the search (when those pages have been manually noindexed). Images would still show up whenever they are used in a page that is indexed; so all the images that are used in articles would still appear. So if you searched for "BMW logo", the page File:BMW logo.svg would not show up, but the article BMW (where the logo is used) would.
- Incidentally, I think this might have been overegged as a problem: there actually aren't many of our images that show up in Google Image search ([1]). Happy‑melon 11:38, 8 January 2009 (UTC)
- And they would all show up by default. The only real concern with this is defining the conditions when they can be exempted. rootology (C)(T) 14:36, 8 January 2009 (UTC)
"When" File space can be NOINDEXed
Thanks for the prior responses. To be honest, I have no problem with it, and I'm sure many others won't, provided it is firmly detailed when something in File: space can be set NOINDEX. I will be against this one facet of it absolutely unless there is no dangerous or nonsensical way for anyone to IAR, bluster, or shoehorn random files and media (our "customer facing content") off of search engines without a) consensus b) meeting defined qualifications--nothing arbitrary. Can we do this? I think the proposal is bulletproof otherwise. rootology (C)(T) 14:39, 8 January 2009 (UTC)
- Option 1 is to put the all-encompassing "when there is consensus to do so" panacea into the proposal and sort it out later. Option 2 is to try and work something out now. The downside of option 2 is that it will fragment the discussion into little camps of people who want Image:Foo or File:Bar to be noindexed for very-important-reason XYZ. Downside of Option 1 is that some people have so little faith in consensus that they're incapable of agreeing to something in principle unless they already know what consensus has decided... Happy‑melon 15:46, 8 January 2009 (UTC)
- There's a very simple test you can do: any image used in an article must be indexed. Anything else may be noindexed, based on consensus. --Carnildo (talk) 23:49, 8 January 2009 (UTC)
- That's a really sensible take on it, assuming that the non-free aspect of it will likely never get consensus. Anything in main space really has to be indexed, with no user (including Jimmy) having any power to supersede it. rootology (C)(T) 16:21, 19 January 2009 (UTC)
Possible file: criteria?
A good way to do this is to think of common criteria that will capture most media (ideally allowing INDEX/NOINDEX to be applied within some existing template), then handle the hopefully few exceptions by consensus. I'm not an expert, but possibly these are criteria which might categorize most of our locally-hosted media reasonably well:
- If it's a free image used in an article - always spider
- If it's a non-free image, or PD issues (PD-USA but not PD elsewhere) used in an article - up for debate (should we spider it separately from its article?) - provisional "leave as is, unchanged, separate decision if needed for this"
- If it's used in a policy or guideline - always spider (same as the policy or guideline itself)
- Voice recordings of a spidered page - Always spider (voice recordings of Wikipedia pages are likely free, but check)
- Unrestored versions of media that are used in a spidered page, if and only if the media is public domain in the United States but not elsewhere (otherwise host on Commons) - always spider
- Featured pictures and other featured media (hosted locally even if on Commons) - always spider
- Material that doesn't meet the above but hasn't got copyright "issues" and is a candidate for moving to Commons, until the move is done or it's decided it won't be moved - spider if desired
- If it's not used in a spidered page, it probably shouldn't be spidered
- Other?
I'm thinking most images outside these criteria will be either on Commons (hence outside this proposal) or used in userspace only (if it's not used in a spidered page it probably shouldn't be spidered). As a basic starting point - any good? If not, leave it unchanged rather than hold up the rest for this one. FT2 (Talk | email) 00:36, 9 January 2009 (UTC)
- I think these are good, but possibly a bit prescriptive. It's probably easiest to be as general as possible: treat them as the most general possible categories. At the highest level we have "free" and "non-free"; if the consensus to noindex nonfree content goes through, this is a very easy distinction: all non-free content is noindexed. Of the free content, some is 'used' (by which I take "is linked directly from") in articles and some is not: I think that saying "all free media used in articles should be indexed" would not be too controversial. The remaining stuff is a mixture of unused encyclopedic free content (that should really be on commons), maintenance stuff like WikiProject logos and the bug reports I linked earlier, and other random junk. I think it's probably safest to say "this material should be noindexed unless there is consensus otherwise". Any encyclopedic material in those dregs shouldn't be there, it should be on Commons instead, so we don't have too much to worry about there. Obviously if those 'dregs' turn out to contain five thousand encyclopedic images, we'll have to rethink. But I think a 'flowchart' approach to this is probably better than trying to cover individual cases. Happy‑melon 10:32, 9 January 2009 (UTC)
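The "flowchart" approach suggested above could be sketched roughly as follows. This is only an illustration in Python, not anything MediaWiki implements: the `MediaFile` fields, the `should_index` name, and the `noindex_nonfree` switch are all hypothetical, and the order of the checks is just one possible reading of the thread (non-free policy first, then the "used in an article" test, then consensus for whatever remains).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaFile:
    """Hypothetical record for a locally hosted file; field names are assumptions."""
    title: str
    is_free: bool
    used_in_article: bool
    # Outcome of a consensus discussion, if one was held:
    # True = index, False = noindex, None = no discussion yet.
    consensus: Optional[bool] = None


def should_index(media: MediaFile, noindex_nonfree: bool = False) -> bool:
    """Return True if the file page should be left open to search engines."""
    # Step 1: if the community has agreed to noindex all non-free media,
    # that decision applies before anything else (assumed ordering).
    if not media.is_free and noindex_nonfree:
        return False
    # Step 2: anything used in an article is always indexed, with no override.
    if media.used_in_article:
        return True
    # Step 3: everything else is noindexed unless consensus says otherwise.
    return bool(media.consensus)
```

Under this reading a maintenance screenshot that is not used in any article stays out of search results by default, while a free image used in an article cannot be noindexed by any individual decision.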
Talk namespace
I don't believe this is unspidered by Google, unless it was done very recently. The problem is the search indicated is mis-formed (bad syntax). Try this experiment - pick a talk page with posts back to 2007 or earlier, pick a distinctive phrase from that page, and search for it (in quote marks) on wiki.riteme.site. Example from Talk:Science
- "world is destroyed by technology" site:wiki.riteme.site
- Google search: link
- Result: "This is Google's cache of http://wiki.riteme.site/?title=Talk:Science. It is a snapshot of the page as it appeared on 20 Sep 2008..."
As well, other search engines may not do this. To avoid confusion I'm removing the note until confirmed. FT2 (Talk | email) 23:38, 8 January 2009 (UTC)
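For anyone wanting to repeat the experiment with other talk pages, the query can be assembled mechanically. The sketch below is an illustration only: the helper names `phrase_query` and `search_url` are made up for this example, and it assumes the search engine accepts the query in a `q` parameter on `https://www.google.com/search`. It builds the URL but does not fetch anything.

```python
from urllib.parse import urlencode


def phrase_query(phrase: str, site: str) -> str:
    """Build an exact-phrase query restricted to one site."""
    # Quote marks force an exact-phrase match; site: limits results to one domain.
    return f'"{phrase}" site:{site}'


def search_url(phrase: str, site: str) -> str:
    """Return a search URL for the query (assumed endpoint and parameter name)."""
    return "https://www.google.com/search?" + urlencode({"q": phrase_query(phrase, site)})


if __name__ == "__main__":
    print(phrase_query("world is destroyed by technology", "wiki.riteme.site"))
    print(search_url("world is destroyed by technology", "wiki.riteme.site"))
```

If the phrase comes back with a cached copy of the talk page, that page is evidently being spidered; an empty result is weaker evidence, since the phrase may simply not have been crawled yet.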
Strongly support noindexing of "user" and "user talk"
Discussion moved to Wikipedia:Requests_for_comment/User_page_indexing.
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
I'm not sure about other namespaces (in fact I think that "Wikipedia:" and "Help:" should still be indexed), but I definitely think that the "User:" and "User talk:" namespaces should be labelled as "not indexed".
The thing is that quite a lot of our users use their real names as their account names; and since Google and other search engines usually rank all Wikipedia-related pages quite high, this creates a small but significant BLP problem.
There are several scenarios that can lead to the problem. First, the user pages can easily be vandalized. Now, of course, users are supposed to watch their own pages; but there are several reasons why this may not happen. A user may have retired. Or a user might take a wiki-break, or simply not pay attention to what happens on their user page. And, if a google search for that user's (real-life) name returns the vandalized Wikipedia user page as one of the top results, that would be extremely embarrassing and unpleasant for the user.
Alternatively, of course, a user might get a genuine (or even mistaken) warning, which again would come up on a Google search page. Believe me it can be extremely unpleasant if a Google search for one's name returns something like "This user is a suspected sockpuppet of ...", or "This user has been blocked indefinitely ...", or something equally negative, as one of the top results. In fact, not such a long time ago there was a problem with an (established) user who complained when his name had appeared on a Wikipedia page titled "BADUSERS" [or something like that].
It will also (hopefully) greatly reduce the temptation for users to create vanity biographies/COI pages/POV-fork pages in userspace; if those are not indexed by main search engines, there would be no real point in that effort.
So, to sum up, the fact that user pages are indexed by Google, combined with the fact that many users choose their real-life names for their accounts, creates a small but definite BLP problem. Of course, you may counter that it is every user's personal choice whether or not to use their real name, and that users are personally responsible for the content of their user and talk pages; but the fact is that many people (especially new users) are unaware that everything they do on Wikipedia will remain forever there, and will remain forever accessible from the search engines, and will in fact be among the top results for searches for their names. And so, in the interest of preserving peace and avoiding this kind of problem, I strongly suggest that marking all "User:" and "User talk:" pages as "not indexed" is a good idea. -- Ekjon Lok (talk) 01:30, 9 January 2009 (UTC)
- This is a good idea for a whole lot of reasons, but I don't think BLP applies to users (someone correct me if I'm wrong, but let's not distract this into a mega BLP debate). rootology (C)(T) 13:46, 9 January 2009 (UTC)
- I don't see why it wouldn't assuming that the user has positively identified himself as a real person. Since it's evidence in a direction we're leaning in anyway, it's not something we need be absolutely certain about. If it is a legitimate BLP issue, then we're making a good-faith attempt to improve our position in that area by noindexing User: space. If not, then the other valid reasons for noindexing still stand. Happy‑melon 14:24, 9 January 2009 (UTC)
- I agree with this part of the proposal (though I disagree with other parts). Wikipedia is not a hosting service. The reasons against indexing userspace are even stronger than the general principle that we should allow indexing of most of our content. I believe that noindexing userspace and selectively noindexing a few specific areas of project space would solve the vast majority of concerns while minimizing the inconvenience to users who like to use search engines to find pages in Wikipedia. Jehochman Talk 14:21, 9 January 2009 (UTC)
- Can someone explain to me why we shouldn't let users index if they so choose? As I understand the status quo, userspace is not indexed but users can choose to index it. If a user is comfortable enough with what is in their userspace that they don't mind it being searchable why should we stop them? JoshuaZ (talk) 18:45, 9 January 2009 (UTC)
- I think the answer is a question: How does it benefit our articles or our millions of readers to have User/User talk indexed? rootology (C)(T) 18:54, 9 January 2009 (UTC)
- Increased transparency. We lose nothing from keeping them searchable. JoshuaZ (talk) 19:00, 9 January 2009 (UTC)
- Transparency for who, though? I'm trying to think of a reason you would be googling users, and can't think of any. Since anyone can arbitrarily {{NOINDEX}} any page under their username they want--I do that on all my sandboxes and nearly all my sub pages already, since I don't want dumps of articles searchable--what transparency benefit is there? The only reason I can think of that people would arguably want SEO magic on sub pages is to have the content on their user/user talk appear on search engines. But what's the benefit of that? For example, User:Rootology/Sandbox 7. That's a dump from a DRV that's on my to-do list to expand, and then bring back to DRV. I have enough sources to honestly get it to GA-level without any serious effort. But as a deleted article, it should not be popping up on Google, so I keep the NOINDEX up on my sandbox navbox template at User:Rootology/Sandboxes. If it's a question of transparency for finding stuff, we already have the prefix URL trick we can do that is better than Google since it lists every sub page automagically. Like so: Special:PrefixIndex/User:Rootology rootology (C)(T) 19:09, 9 January 2009 (UTC)
- If we will continue to allow users to use {{NOINDEX}} then I'm fine with that. JoshuaZ (talk) 19:17, 9 January 2009 (UTC)
- No, that's the backwards way. It's better to have them use an {{INDEX}} situation instead, since there is no benefit to be had from passively putting all this content out there for casual readers who are unlikely to know how to do things like a PREFIX or contribution history search. And, using an INDEX (which will be much more limited in scope by its very nature) will allow people to review what is being flagged for search engine pickup. If something inappropriate is picked up then (like an AFD'd or DRV'd article) anyone can per policy simply pull the INDEX template off that page then. rootology (C)(T) 19:48, 9 January 2009 (UTC)
- Ok. You've convinced me. JoshuaZ (talk) 20:15, 11 January 2009 (UTC)
A variation of this discussion was recently opened at Wikipedia talk:What Wikipedia is not#a question of policy concerning WP:SOAP and google indexing of user pages. The dominant concern there was not the specific user or user talk pages but the ability to abuse the permissiveness that we give to user sub-pages. That is, a user can create a "draft" in a sub-page on practically any non-notable or fringe topic and gain the benefits of Wikipedia's credibility without the oversight or review of content that is in the mainspace.
If that is a valid concern (and the evidence suggests that it should be), then in my opinion not only should User and User Talk pages be defaulted to NOINDEX, but the ability to override it with the INDEX function should probably be disabled. While it's theoretically possible to review for inappropriate use of INDEX, I don't think that our editors have that much free time. I don't see sufficient benefit to indexing of the userspace to justify the review effort. Rossami (talk) 23:51, 24 June 2009 (UTC)
- addendum - Just saw the argument below that allowing indexing is good for those selected Foundation members who need to publicize their contact information. That's such a limited number of people, I think we can find a work-around that doesn't continue to expose the project to continued user sub-page abuse. Rossami (talk) 23:56, 24 June 2009 (UTC)
- I've strongly supported NOINDEXing pages by default and providing an INDEX tag for those who want it (I did create the {{INDEX}} tag). It just makes common sense that we want to push our content above other pages we happen to have. MBisanz talk 00:04, 25 June 2009 (UTC)
Comment: A related discussion is also occurring at Wikipedia:Village pump (policy)#Mandatory / Automatic NOINDEX of user space pages. —Preceding unsigned comment added by Rossami (talk • contribs) 12:53, Jun 25, 2009
- Strongly support - as long as people are allowed to keep spam, NPOV violations, biased essays, etc., in their userpages and subpages, they can and do gain important mindspace in Google and elsewhere. NOINDEX is long overdue in this matter. --Orange Mike | Talk 14:13, 25 June 2009 (UTC)
- Strongly oppose any new no-indexing ventures, and strongly support re-indexing most of what we've no-indexed. Let Google do what it was designed to do. –xenotalk 14:34, 25 June 2009 (UTC)
- Also oppose per xeno above. R. Baley (talk) 18:09, 25 June 2009 (UTC)
- Strongly Support default no-indexing of User Pages. Google was designed to search within the limits set by each site. Wikipedia should have google index those pages which Wikipedia wants representing the Wikipedia encyclopedia, not the personal pages of each user that edits on Wikipedia. If certain users need to be indexed, then let them justify why they need to be in google, and have them use the {{INDEX}} which can be easily monitored.--stmrlbs|talk 18:16, 25 June 2009 (UTC)
- RFC on WP:Soap, Google, User Pages - with google example of what can happen
- Mandatory / Automatic NOINDEX of user space pages
- Strong support. I was under the impression that _noindex_ was the norm here. When did that get changed, or didn't it ever happen? Userspace shouldn't be indexed by Google. -- Brangifer (talk) 05:32, 26 June 2009 (UTC)
- I see that I was mistaken. According to this thread, it is external links that "have rel=nofollow set, there will be little to no SEO benefit", which is perfectly fine. I still support that userspace not be indexed. -- Brangifer (talk) 05:36, 26 June 2009 (UTC)
- Strong support I am fine with allowing selective {{INDEX}} as this then creates a place where we can monitor it. Unomi (talk) 09:40, 26 June 2009 (UTC)
- Strongly Support noindex. I hate that this discussion is happening in 3 different places. Anyway, as long as we are going to let people keep promotional "fake articles" in userspace, such as User:JRC3 it should not be indexed. It's bad enough that many sites will mirror userspace and then allow it to be indexed. Gigs (talk) 14:11, 26 June 2009 (UTC)
- Oppose vehemently. This removes from the informational commons vital background context. Issues with individual pages should be dealt with on an individual basis. This proposal is another ominous retreat from the principles of the Wikipedia project. If in doubt, always defer to a more open system and trust that it will self-regulate against harm. Skomorokh 15:56, 26 June 2009 (UTC)
- Support. I see no reason why user pages or user talk pages SHOULD be indexed... if there is a strong reason why a page should be indexed, then we could allow {{INDEX}} on a case-by-case basis. 70.71.22.45 (talk) 17:05, 26 June 2009 (UTC)
- Support per Rootology, et al. Give users the option to add {{INDEX}}, though, and they can be judged on their appropriateness on a case-by-case basis. -- əʌləʍʇ əuo-ʎʇuəʍʇ ssnɔsıp 17:35, 26 June 2009 (UTC)
Discussion moved to Wikipedia:Requests_for_comment/User_page_indexing.
Strengthen language
I broadly support this proposal, as I commented on the village pump; however, I would like the language to be strengthened a little in two areas. On Categories, instead of saying "all other categories (i.e. content categories) remain Indexed by default." it should be along the lines of "all other categories (i.e. content categories) cannot be overridden and shall remain indexed." On Files, similarly, it should be clear that free media used in mainspace cannot be overridden and will remain indexed. This clearly seems to be the intent of the proposal from my read, but it should be made clear so that we do not have to keep opposing proposal after proposal for this or that category/image to be noindexed. Davewild (talk) 18:44, 9 January 2009 (UTC)
- The issue is that to say they "cannot" be overridden is incorrect; adding a NOINDEX tag to a free image will despider them just as effectively as a non-free image. Saying "cannot" implies a technical safeguard that does not in fact exist. What it could be strengthened to, however, is "should not": this is as prescriptive as we need to go, and is as prescriptive as any other policy or guideline. I envisage that this page will be part technical proposals, which will be discussed, decided upon, and then reduced to a short explanation of the situation, and part a policy/guideline governing how INDEX/NOINDEX is applied to the namespaces where overrides are enabled. Happy‑melon 21:42, 9 January 2009 (UTC)
- Agree with you, that "should not" is a much better form of words than mine as you say. I've been bold and changed the category wording. Davewild (talk) 21:58, 9 January 2009 (UTC)
Projectspace
Jehochman posted -
- Concurring opinion by Jehochman:
- I oppose default noindexing of the Wikipedia, Talk, Wikipedia talk, Portal talk, and Category talk name spaces. We should default to indexing, and selectively noindex any such pages that excessively risk harm to the reputations of living people. My reasons are that people should be free to use whatever search tool they like to search Wikipedia, and indexing of discussions and processes improves the transparency of Wikipedia's operations. I support the other aspects of this proposal. Jehochman Talk 14:36, 9 January 2009 (UTC)
I've moved this here for now, as it is much more a discussion point, and it's considerably too conditional to be useful on the proposal page. It's also a point raised here too, and doesn't need (at draft stage) raising both on the page and its talk page. Either a comment for discussion, or a view for the talk page, right now, and putting it here will help whichever it is. FT2 (Talk | email) 03:44, 10 January 2009 (UTC)
- I've restored it to the project page. You do not WP:OWN this page. You do not get to post favorable views there, and move other views off to a less visible corner. Jehochman Talk 03:47, 10 January 2009 (UTC)
- No. It's because a (brief) indication of some endorses may help others. You'll notice the brevity of this (self included). A long post as above is inappropriate:
- FT2 - 0 words
- Lar - 0 words
- Daniel - 0 words
- Durova - 21 words (and even then a bit too much)
- Coren - 22 words (and even then a bit too much)
- Jehochman - 83 words (!!)
- If you want to add yourself, write it as "partial endorse - see talk page [LINK]"; the partial-ness is fine, it's the length that isn't. A few sigs, not a few essays. FT2 (Talk | email) 04:04, 10 January 2009 (UTC)
- Fixed, I think. See below. FT2 (Talk | email) 04:33, 10 January 2009 (UTC)
Jehochman, thanks for posting an objection here. I copy below my earlier posting at the village pump about NOINDEXing projectspace:
- This is a tool for the Ministry of Truth. Though no cabals exist now, one may exist in the future, that chooses to make a future Essjay an unperson. Shutting out robots would help such a breakdown of the wiki way.
- During the IWF Virgin Killer incident, a UK Google user would have been misled into believing that Wikipedia had no public discussion about the album cover, and may not have found Wikipedia's own search engine. Readers should be able to use the search engine of their choice, to search the public discussions of the encyclopedia community, as well as the encyclopedia itself.
- --Hroðulf (or Hrothulf) (Talk) 17:30, 12 January 2009 (UTC)
Endorsers
I am moving these comments here to treat the matter fairly. If my view cannot be posted on the project page, neither should these views be posted. Nobody WP:OWNs this page.
- Proposed/endorsed by:
- FT2 (Talk | email) 01:09, 6 January 2009 (UTC)
- ++Lar: t/c 05:22, 6 January 2009 (UTC)
- Daniel (talk) 07:17, 6 January 2009 (UTC)
- Slight inconvenience in searching non-content namespaces is outweighed by reduction in potential damage to users who edit under real names. DurovaCharge! 00:59, 9 January 2009 (UTC)
- Concur, and indeed even discussion of living persons as oftentimes happens on noticeboards and other project space pages is also thus protected. — Coren (talk) 03:15, 10 January 2009 (UTC)
Done. Jehochman Talk 03:55, 10 January 2009 (UTC)
- See below and your talk page, this may solve it. FT2 (Talk | email) 04:33, 10 January 2009 (UTC)
User comments moved from proposal
The following users added comments to their endorsement. With permission of Coren and Durova, their extended comments have been moved to the talk page.
- Durova - "Slight inconvenience in searching non-content namespaces is outweighed by reduction in potential damage to users who edit under real names."
- Coren - "Concur, and indeed even discussion of living persons as oftentimes happens on noticeboards and other project space pages is also thus protected."
- Jehochman (partial support only) - "I oppose default noindexing of the Wikipedia, Talk, Wikipedia talk, Portal talk, and Category talk name spaces. We should default to indexing, and selectively noindex any such pages that excessively risk harm to the reputations of living people. My reasons are that people should be free to use whatever search tool they like to search Wikipedia, and indexing of discussions and processes improves the transparency of Wikipedia's operations. I support the other aspects of this proposal."
I have not (yet) added Jehochman's signature back, but left it to him, but it's clear he will likely wish to be added, possibly with a note that it is a partial endorsement only, and likewise a link to comments. His comment is copied above, so it can be compared on exact equal footing with others who made comments. See extended note at User talk:Jehochman#Endorsement for details.
FT2 (Talk | email) 04:33, 10 January 2009 (UTC)
- Why do we need user comments, endorsements, opposition, etc, on the main page at all? Surely that's what this talk page is for. The project page should be a clear and definitive statement of what is being proposed, which can be discussed and/or torn apart on the talk page at leisure; putting user comments, endorsements, and arguments, on the project page itself is an invitation for the sort of dispute to be found above. Happy‑melon 10:21, 10 January 2009 (UTC)
- I agree. Let's create sections here for people to endorse, discuss, or oppose the proposal. Jehochman Talk 14:38, 10 January 2009 (UTC)
Plan A
Proposed/endorsed by:
- FT2 (Talk | email) 01:09, 6 January 2009 (UTC)
- ++Lar: t/c 05:22, 6 January 2009 (UTC)
- Daniel (talk) 07:17, 6 January 2009 (UTC)
- DurovaCharge! 00:59, 9 January 2009 (UTC) (See user's comments on talk page)
- Coren (talk) 03:15, 10 January 2009 (UTC) (See user's comments on talk page)
Plan B
I generally like the idea of having a policy on search engine indexing, but would prefer the following scheme:
Namespace | Default state | Override allowed? | Notes |
---|---|---|---|
Mainspace | Indexed | No | |
User: | Noindexed | No | Currently NoIndexed, may be overridden. |
Wikipedia: | Indexed | Yes | |
File: | Indexed | Yes | Some content (non-encyclopedic material such as bug reports, internal project logos, etc.) may be noindexed on a consensus basis. A discussion of NOINDEXing non-free media is likely to take place, separately to this proposal. |
MediaWiki: | Noindexed | No | |
Template: | Noindexed | No | |
Help: | Indexed | No | |
Category: | Indexed | Yes | 'Maintenance' categories will be manually NOINDEXed, all other categories (i.e. content categories) should not be overridden and shall remain Indexed. |
Portal: | Indexed | Yes | |
All Talk namespaces (Talk:, Wikipedia talk:, File talk:, etc.) where the related namespace is indexed | Indexed | Yes | |
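To make the "default state" and "override allowed" columns concrete, the table can be read as a small lookup plus one rule. The following is a rough Python sketch of that reading, not how MediaWiki itself is configured. The names `Policy`, `PLAN_B`, `policy_for` and `is_indexed` are hypothetical, and the sketch raises an error for cases the table does not cover (for example talk pages of a noindexed namespace).

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    indexed_by_default: bool
    override_allowed: bool


# Plan B rows, keyed by namespace name ("" stands for mainspace).
PLAN_B = {
    "": Policy(True, False),
    "User": Policy(False, False),
    "Wikipedia": Policy(True, True),
    "File": Policy(True, True),
    "MediaWiki": Policy(False, False),
    "Template": Policy(False, False),
    "Help": Policy(True, False),
    "Category": Policy(True, True),
    "Portal": Policy(True, True),
}


def policy_for(namespace: str) -> Policy:
    """Look up the Plan B policy for a namespace, including the talk-page row."""
    if namespace in PLAN_B:
        return PLAN_B[namespace]
    if namespace == "Talk" or namespace.endswith(" talk"):
        subject = "" if namespace == "Talk" else namespace[: -len(" talk")]
        # Last table row: talk pages of an indexed namespace are indexed, overridable.
        if subject in PLAN_B and PLAN_B[subject].indexed_by_default:
            return Policy(True, True)
    raise ValueError(f"Plan B table does not specify namespace {namespace!r}")


def is_indexed(namespace: str, tag: Optional[str] = None) -> bool:
    """Decide indexing for a page, where `tag` is "INDEX", "NOINDEX" or None."""
    policy = policy_for(namespace)
    if policy.override_allowed and tag in ("INDEX", "NOINDEX"):
        return tag == "INDEX"
    return policy.indexed_by_default  # tag ignored where overrides are disabled
```

Under this reading, `is_indexed("User", "INDEX")` is false because Plan B as tabled disables the override in userspace, which is exactly the point the comments below go on to dispute.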
- Endorsements
- Jehochman Talk 14:43, 10 January 2009 (UTC)
- Strongly prefer to Plan A for reasons noted on the VP poll. We should accept that part of Wikipedia's challenge to the world is that we can manage a user generated encyclopedia without a clearly defined hierarchy. The execution of that, warts and all, needs to be left scrutable to the world (of course NOINDEXING it still allows viewing, but a good deal of queries on the subject will come from searches). Protonk (talk) 18:21, 20 January 2009 (UTC)
- Comments
- I would have thought that noindexing talk namespaces was the least controversial aspect of this proposal... Happy‑melon 15:47, 10 January 2009 (UTC)
- Google will ignore talk pages if they think they are not useful. My feeling is that we should index everything, except those things demonstrated to be inappropriate for indexing. Talk pages are generally innocuous. Therefore, they should be indexed. The fact that an issue was discussed at Wikipedia but not added to an article, for instance because it was not reliable information, may be of interest to our readers. I would like as much transparency as possible in our editorial processes. Jehochman Talk 15:56, 10 January 2009 (UTC)
- Google isn't going to ignore anything for that reason; they (possibly, we're no longer sure) noindexed the Talk: namespace because it was dumping massive quantities of junk at the top of their search results. That the world's largest search engine thought that we were landing them with enough unencyclopedic content to be a problem indicates that there is something we should be concerned about here. It seems that you fundamentally disagree with the rest of us on which direction we should be approaching this from, and of course that's fine, so I guess we'll just have to agree to disagree. I don't believe that we should leave the decision of what's "useful" to google; we should be making a proactive decision on our own as to what we want to advertise at the very top of pretty much any search you care to think of. Happy‑melon 20:46, 11 January 2009 (UTC)
- Two issues: I'd be more willing to support this if one could override user space to allow indexing there. I also see absolutely no reason not to index template space. JoshuaZ (talk) 20:41, 11 January 2009 (UTC)
- When would we ever have content in the User: namespace that we could be proud of but which was not appropriate for one of the indexed namespaces? I think the reasons given in the discussion above are convincing that we shouldn't be indexing Template:, certainly by default, although I wouldn't have a heart-attack if we made it overrideable. Again, what content would be in the Template: namespace that we want to showcase, but which isn't suitable for an indexed namespace? Happy‑melon 20:46, 11 January 2009 (UTC)
- The way I see it the fundamental argument allowing searches in general is that it increases transparency which is something this project cares a lot about. If a user wishes to increase transparency by indexing their user section I see no reason not to. I agree that in general templates will not need to be indexed but I see no reason to insist that that never occur. JoshuaZ (talk) 20:53, 11 January 2009 (UTC)
Content / User | User benign: Index | User benign: Noindex | User malignant: Index | User malignant: Noindex |
---|---|---|---|---|
Content benign | 2 | -1 | -1 | 0 |
Content malignant | -1 | 0 | -2 | 0 |
- Think about it as a matrix: we can either allow overrides or not, and the userpage that the user tries to index can be either benign or harmful to our image, and the user themselves can be either acting in good faith, or be trying to take advantage of Wikipedia's awesome PageRank score for their own ends (be that advertising, SEO, spam, whatever). Think of the possible upsides and downsides in each case: I've tried to put it in a table. If you agree with my somewhat arbitrary numbers, game theory says you should go for 'noindex' unless the probability of the user/content not being benign is really really small. Of course that's just a rather random example, but the point is that userspace is where we have the least control through policy and policing of what's present there, and the least to gain from showcasing that area to the world. Let's be honest, it's not the most savoury of places. Happy‑melon 21:15, 11 January 2009 (UTC)
- SEO'ing is at this point a non-issue since all external links are nofollow. I think you drastically overestimate the likelihood of malicious content. One way to solve this would be to have the INDEX template automatically include userspace where it appears in a category of indexed userpages. People can then keep track of how they are used. The probability of actual damage is unlikely since random users trying to take advantage of Wikipedia will likely have no knowledge of how to index pages anyways (look at for example the extreme cluelessness of the vast majority of people trying to use Wikipedia as a webhost). JoshuaZ (talk) 21:21, 11 January 2009 (UTC)
- I guess I'm not fundamentally opposed to the idea, as long as we keep a close eye on what's actually being indexed from that namespace. I've just filed a bug (T18979) for a tracking category a la Category:Hidden categories, which we can use as a check to ensure that all the pages using __NOINDEX__ actually use {{NOINDEX}}, to which we can add more fancy category tracking. As long as there's an easily-accessible list of what's going out to Google, I can see that there is minimal risk of us playing up something undesirable. So shall we set Template: and User: to 'noindex, overrideable'? Happy‑melon 22:37, 11 January 2009 (UTC)
- That would be my preference. JoshuaZ (talk) 22:40, 11 January 2009 (UTC)
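The payoff table earlier in this thread can be worked through numerically. The Python sketch below copies the four pairs of numbers from that table and computes the expected payoff of allowing indexing versus forcing noindex. It assumes that "user is benign" and "content is benign" are independent events with probabilities supplied by the caller; that independence assumption, and the names `PAYOFFS` and `expected_payoffs`, are illustrative and not part of the original comment.

```python
# Payoffs from the table, keyed by (user_is_benign, content_is_benign),
# with values (payoff_if_indexed, payoff_if_noindexed).
PAYOFFS = {
    (True, True): (2, -1),
    (True, False): (-1, 0),
    (False, True): (-1, 0),
    (False, False): (-2, 0),
}


def expected_payoffs(p_user_benign, p_content_benign):
    """Return (E[index], E[noindex]), assuming user and content are independent."""
    e_index = 0.0
    e_noindex = 0.0
    for (user_ok, content_ok), (idx, noidx) in PAYOFFS.items():
        p_user = p_user_benign if user_ok else 1.0 - p_user_benign
        p_content = p_content_benign if content_ok else 1.0 - p_content_benign
        weight = p_user * p_content  # probability of this cell of the matrix
        e_index += weight * idx
        e_noindex += weight * noidx
    return e_index, e_noindex


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.9, 0.99):
        print(p, expected_payoffs(p, p))
```

With both probabilities set to 0.5 the sketch gives -0.5 for indexing and -0.25 for noindexing. Where the two options cross over depends entirely on the (admittedly arbitrary) payoff numbers and on the probabilities one is willing to assume.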
Why are the mediawiki and template spaces being NOINDEXed in plan b? Protonk (talk) 18:22, 20 January 2009 (UTC)
- Because they don't contain content that we want to present as our 'finished product'; they contain material that doesn't make sense to outsiders except in the context of our other content. See #Template: namespace? above for some more discussion over the Template: namespace. The MediaWiki: namespace also has issues with actively undesirable content, as pages such as the spam, title and image blacklists contain unsavoury content, but cannot be easily marked with __NOINDEX__ as the contents of the page A) have wide-ranging impact and B) are not actively parsed. Using the NOINDEX/INDEX tags is particularly problematic in the MediaWiki namespace as they could easily end up being inserted onto every page or something similar, and since it's the very bottom of the heap in terms of reader-facing-ness, it makes every sense to noindex it by default. Happy‑melon 22:58, 20 January 2009 (UTC)
Override for userpages
From a comment I left at the village pump:
There are definitely userpages that need indexing, for example userpages from WMF officials since they provide specific contact information, and to a lesser extent OTRS volunteers and other similar positions. I also don't see strong reasons not to allow users in good-standing to index their pages. The INDEX magic word is searchable so this can be controlled. Cenarium (Talk) 01:09, 12 January 2009 (UTC)
- That's basically what is discussed in this section. rootology (C)(T) 01:12, 12 January 2009 (UTC)
- Well, this is my position: noindex userspace by default, but allow local override. Cenarium (Talk) 03:30, 13 January 2009 (UTC)
- That's the consensus on that section. :) rootology (C)(T) 14:53, 13 January 2009 (UTC)
- I wouldn't be so quick to proclaim that. Gigs (talk) 14:16, 26 June 2009 (UTC)
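Several comments above rely on INDEX overrides being easy to find and monitor. As a rough illustration of what such a check might look like on raw wikitext, the Python sketch below scans a page's source for the `__INDEX__` / `__NOINDEX__` magic words and the `{{INDEX}}` / `{{NOINDEX}}` templates. The function name and regular expressions are assumptions made for this example; a real audit would more likely use a tracking category or the site search.

```python
import re

# Behaviour switches written directly in the page source.
MAGIC_WORD = re.compile(r"__(NOINDEX|INDEX)__")

# Template calls such as {{NOINDEX}} or {{INDEX|...}}; deliberately simplified.
TEMPLATE = re.compile(r"\{\{\s*(NOINDEX|INDEX)\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)


def find_index_directives(wikitext):
    """Return a list of (kind, directive) pairs found in `wikitext`.

    Hypothetical helper: magic words are reported first, then templates.
    """
    found = []
    for match in MAGIC_WORD.finditer(wikitext):
        found.append(("magic word", match.group(1)))
    for match in TEMPLATE.finditer(wikitext):
        found.append(("template", match.group(1).upper()))
    return found
```

A page that uses the bare magic word rather than the template would show up as a "magic word" hit, which is the case the proposed tracking category is meant to catch.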
Portal: and Help:
We're currently proposing to treat these namespaces differently, although they both contain pure reader-facing content. I don't really see why this should be the case. My position is probably that we don't need NOINDEX in the portal namespace, so it should be disabled. Thoughts? Happy‑melon 18:05, 18 January 2009 (UTC)
Is there a consensus?
Do we have something like a consensus for this policy as it currently stands? JoshuaZ (talk) 01:45, 9 February 2009 (UTC)
- Probably not, but mainly through lack of participation. We need to finish cleaning up the proposal, then do a media blitz to get some community participation. Happy‑melon 08:47, 9 February 2009 (UTC)
- Personally I find this misguided. Partly cos the vast number of talk pages must help the page-rank of the rest of Wikipedia but also most discussion forums on the web are indexed. Is there any reason why some other website could not put all the Wikipedia talk pages up as indexed content? --BozMo talk 19:46, 25 February 2009 (UTC)
- Agree with Happy melon - there's no consensus here. Where was this publicized? I just found this page from an off-hand comment at one of the pumps. –xenotalk 14:38, 25 June 2009 (UTC)
User pages
The proposal as written suggests that user pages are currently not indexed but user talk pages are. This appears to be the opposite of what's going on in practice. Nil Einne (talk) 11:22, 4 May 2009 (UTC)
Just a note
I think the high WP: page search rankings are also responsible for all the usual problems at Wikipedia:Local Embassy if that hasn't been mentioned already. I'd propose just noindexing that page immediately. Sillyfolkboy (talk) (edits) 17:23, 7 May 2009 (UTC)
Any idea why non-Latin terms don't index?
Any idea why non-Latin alphabet terms don't seem to index in Google? If I search for a string of text in an article like "Dìbǐlìsī - 第比利斯 (simplified characters) (Chinese), Gürƶex - Гуьржех (Chechen), Guržeğe - ГуржегӀе (Ingush)", the Wikipedia page doesn't come up but the mirrors do. Sometimes it will show the non-Latin term as "missing" (like this) and the mirrors come up again. What gives? (Please ping me if you know.) — AjaxSmack 00:50, 4 May 2018 (UTC)
Why does an article not appear in Google Search?
Does anyone know why an article does not appear in Google Search? Happygirl1976 (talk) 14:19, 18 June 2020 (UTC)
Approved article but not indexed on Google
Cri6 was accepted by member(s) of WikiProject Articles for creation on 11 November, but I've noticed today that it's not indexed on Google while Talk:Cri6 is indexed! Could anyone please index it? Many thanks. Artinnit (talk) 19:57, 18 November 2020 (UTC)