Wikipedia:Computer-generated content
The following is a draft working towards a proposal for adoption as a Wikipedia policy. It is still in development and under discussion, and has not yet begun the process of gathering consensus for adoption; it must not be taken to represent consensus. References or links to this page should therefore not describe it as a policy or guideline, nor even as a proposal.
This page in a nutshell: Computer algorithms are increasingly used to augment human-generated material. Wikipedia editors are fully responsible for ensuring the quality of all content submitted by such means and the veracity of any claims made about it.
Various types of computer algorithms, neural networks, machine learning, and approaches sometimes termed artificial intelligence are used to assist humans in generating creative material.
Each editor of Wikipedia is responsible for ensuring that everything they submit to Wikipedia meets its policies, guidelines, standards and best practices, and that any and all facts and claims are accurate.
Content issues
Any computer tool used to generate content should be applied with care, since such tools can produce material that is biased or non-verifiable, that constitutes original research, that violates copyrights or plagiarizes, or that does not comply with other policies or guidelines.
Behavioral issues
Persistently introducing computer-generated content in violation of Wikipedia policies and guidelines will be considered disruptive editing and can lead to a block from further editing.
Examples include using large language models (LLMs) or other algorithms to create promotional or hoax drafts that look superficially sound but do not hold up under closer inspection.
Examples
Language models and algorithms
Computer algorithms for processing and predicting human language have a long history of use. More recently, language models have been used in many applications, including machine translation, autocorrect, autocomplete, incremental search, natural language generation, part-of-speech tagging, parsing, optical character recognition (OCR), handwriting recognition, grammar induction, and information retrieval.
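As a simplified illustration of what "predicting human language" means in practice, the sketch below implements a toy bigram model over an invented miniature corpus: it picks the next word by counting which word most often follows the current one. Modern neural language models are vastly more sophisticated, but they rest on the same idea of predicting the next token from context.

```python
# A minimal sketch of statistical language modelling: count which word
# follows which in a (toy, invented) corpus, then predict the most
# frequent continuation. Illustrative only; not a production model.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count bigram frequencies: which word tends to follow each word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often observed after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "cat" (seen twice after "the", vs. "mat" once)
```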
As with any content submitted, the editor must carefully scrutinize any part of the content that has been generated by a computer to ensure that it is accurate. The use of such tools on Wikipedia is often discouraged because of the poor quality of the output from currently available technology. Using large language models (e.g. ChatGPT) to create articles would most likely result in various kinds of erroneous material being submitted unless every single word were carefully scrutinized; the same can be said of machine translation. Because similar technology is pervasive in everyday tools, it is not possible to ban it entirely from Wikipedia, but editors should always be aware of anything they did not directly input themselves, and should avoid relying on computers as a substitute for their own creativity and mental processes where possible.
Image processing and generation
Traditionally, image processing algorithms were designed by humans, but machine learning (often called artificial intelligence) has recently become common everywhere from smartphone cameras to image editing software. Editors should be aware that any kind of processing applied to an image can distort the information it conveys. If image restoration is done, it should be disclosed and described on the image file's description page and in the captions where the image is used in articles. Any other changes that materially affect the appearance of an image should in general be disclosed to the reader along with the presentation of the image.
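As a simplified illustration of how even routine processing alters the underlying data, the sketch below applies a naive 3x3 box blur to a tiny invented grayscale array (the pixel values and the filter are illustrative assumptions, not a real photograph or a production algorithm): the original pixel values are averaged away and cannot be recovered from the result.

```python
# Simplified sketch: even a routine blur rewrites the underlying pixel
# data. The 4x4 "image" is an invented toy example, and the 3x3 box
# blur is the simplest possible smoothing filter.
import numpy as np

image = np.array([
    [10, 10, 10, 10],
    [10, 200, 200, 10],  # a bright feature in the centre
    [10, 200, 200, 10],
    [10, 10, 10, 10],
], dtype=float)

blurred = image.copy()
for y in range(1, image.shape[0] - 1):
    for x in range(1, image.shape[1] - 1):
        # Replace each interior pixel with the mean of its 3x3 neighbourhood.
        blurred[y, x] = image[y - 1:y + 2, x - 1:x + 2].mean()

print(image[1, 1], "->", blurred[1, 1])  # 200.0 -> ~94.4: the original value is gone
```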
The use of deep learning models (e.g. DALL-E) to generate entire images, fill in substantial missing parts of them, or extrapolate from them is not an accepted practice on Wikipedia at this time.