User:PNG recompression
This user account is a bot operated by A proofreader (talk). It is used to make repetitive automated or semi-automated edits that would be extremely tedious to do manually, in accordance with the bot policy. This bot does not yet have the approval of the community, or approval has been withdrawn or expired, and therefore shouldn't be making edits that appear to be unassisted except in the operator's or its own user and user talk space. Administrators: if this bot is making edits that appear to be unassisted to pages not in the operator's or its own userspace, please block it.

| This user is a bot (talk · contribs) | |
|---|---|
| Operator | A proofreader (t · c) |
| Author | idem |
| Approved? | No |
| Flagged? | No |
| Task(s) | Losslessly recompress PNG images |
| Edit rate | As fast as possible (will be changed to 1 per 10 seconds) |
| Edit period(s) | Once every month |
| Automatic or manual? | Automated |
| Programming language(s) | Java |
| Exclusion compliant? | Yes |
| Source code published? | Here (will be moved to Wikipedia after adjustments) |
| Emergency shutoff-compliant? | Not yet |
PNG recompression is a bot that will go through the bot approval process; its sole purpose is to losslessly recompress all PNG images on the English Wikipedia using the open-source tools OptiPNG, advdef and advpng.
Results expected
On average, a PNG image recompressed by this bot is expected to shrink by about 15% of its size. If an image has already been recompressed, the bot will not re-upload it.
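The recompression step itself is a plain chain of the three tools named in the introduction. Below is a minimal sketch, assuming the optipng, advpng and advdef binaries are available on the PATH; the option values shown are illustrative rather than the bot's exact settings, and the 10% threshold mirrors the re-upload rule described under Server load expected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecompressSketch {
    /** Runs one external tool on the file and waits for it to finish. */
    static void run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("Tool failed: " + String.join(" ", command));
        }
    }

    /**
     * Losslessly recompresses the PNG in place and reports whether the result
     * is worth re-uploading (more than 10% smaller than the original).
     */
    static boolean recompress(Path png) throws IOException, InterruptedException {
        long before = Files.size(png);
        // Tool options are assumptions; the bot's real invocation may differ.
        run("optipng", "-o7", png.toString());
        run("advpng", "-z", "-4", png.toString());
        run("advdef", "-z", "-4", png.toString());
        long after = Files.size(png);
        return after < before * 0.9; // only re-upload if over 10% is saved
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Worth re-uploading: " + recompress(Path.of(args[0])));
    }
}
```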
Caveats
As currently written, the bot uses OptiPNG, which strips all ancillary chunks from PNG images; this may destroy the meaning of an image used on certain pages (a sketch of a check for such images follows this list). For example:
- example images on the article about Portable Network Graphics that are used to demonstrate ancillary chunks;
- example images on the article about gamma correction that contain gAMA chunks;
- example images on the article about the pixel aspect ratio that contain pHYs chunks;
- example images on the articles about ICC profiles, color spaces, chromaticity, and the white point, which may contain iCCP and cHRM chunks;
- example images whose metadata is important to demonstrate the use of metadata in PNG images, in the form of tEXt, iTXt and zTXt chunks.
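One way to avoid these cases would be to scan a PNG's chunk list before recompressing it and skip any file that carries one of the chunks listed above. The following is a rough sketch of such a check, not part of the bot's current code; the set of chunk names is taken only from the examples above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

public class AncillaryChunkCheck {
    // Ancillary chunks mentioned above whose loss could change an image's meaning.
    static final Set<String> RISKY =
            Set.of("gAMA", "pHYs", "iCCP", "cHRM", "tEXt", "iTXt", "zTXt");

    /** Returns true if the PNG contains any of the chunks listed above. */
    static boolean hasRiskyChunks(Path png) throws IOException {
        try (InputStream raw = Files.newInputStream(png);
             DataInputStream in = new DataInputStream(raw)) {
            in.skipNBytes(8); // 8-byte PNG signature
            while (true) {
                int length = in.readInt();              // chunk data length (big-endian)
                byte[] type = new byte[4];
                in.readFully(type);                     // 4-byte chunk type
                String name = new String(type, StandardCharsets.US_ASCII);
                if (RISKY.contains(name)) {
                    return true;
                }
                if (name.equals("IEND")) {
                    return false;                       // end of the PNG stream
                }
                in.skipNBytes(length + 4L);             // skip chunk data and CRC
            }
        }
    }
}
```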
OptiPNG additionally removes the color data of fully transparent pixels, which may destroy the meaning of certain images, for example
- File:Ipu.png, which is meant to depict an Invisible Pink Unicorn: its pixel data is pink even though the alpha channel makes every pixel fully transparent.
Beyond these cases, recompression may still remove the meaning of other images, for example
- images whose purpose is to show PNG files that have not been recompressed, if any exist.
Server load expected
The initial run will read all PNG images from the wiki using the MediaWiki API; however, several measures will limit the load (a sketch of the download step follows this list):
- it will abort connections that turn out not to be serving PNG files after reading the first 8 bytes (the PNG signature);
- it will skip files whose size is under 8 KB rather than reading them unnecessarily;
- because PNG data is already compressed and would not benefit, gzip transfer compression will not be requested;
- it will only re-upload images if the upload would save over 10% of the original image's size.
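Below is a sketch of the download step under these constraints, assuming java.net.http is used to fetch files; the PNG signature comparison and the 8 KB cutoff follow the list above, while the URL handling and error handling are simplified.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;

public class DownloadSketch {
    static final byte[] PNG_SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    /** Returns the file bytes, or null if the file should be skipped. */
    static byte[] fetchPng(HttpClient client, String url, long reportedSize)
            throws IOException, InterruptedException {
        if (reportedSize < 8 * 1024) {
            return null; // under 8 KB: not read at all
        }
        // No Accept-Encoding header is set, so gzip compression is never requested.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream in = response.body()) {
            byte[] header = in.readNBytes(8);
            if (!Arrays.equals(header, PNG_SIGNATURE)) {
                return null; // not a PNG: stop without reading the rest of the body
            }
            byte[] rest = in.readAllBytes();
            byte[] whole = new byte[header.length + rest.length];
            System.arraycopy(header, 0, whole, 0, header.length);
            System.arraycopy(rest, 0, whole, header.length, rest.length);
            return whole;
        }
    }
}
```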
Future runs will be able to skip many downloads, recompression passes and uploads (a sketch of the SHA-1 shortcut follows this list):
- it can avoid reading a file if its last revision, as indicated by Special:Allpages, was made by this user;
- it can avoid reading a file if the SHA-1 hash of its last revision, as indicated by the MediaWiki API, matches the SHA-1 hash of the last revision it has seen, even if it was not re-uploaded to the wiki because it did not save enough bytes;
- it can avoid reading a file if the timestamp of its last revision, as indicated by Special:Allpages, matches the timestamp of the last revision it has seen, even if it was not re-uploaded to the wiki because it did not save enough bytes.
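The SHA-1 shortcut could look roughly like the sketch below, assuming the standard MediaWiki imageinfo query (action=query&prop=imageinfo&iiprop=sha1); the map of previously seen hashes is a hypothetical local store, and the JSON is read with a simple regular expression rather than a proper parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Sha1SkipCheck {
    static final Pattern SHA1_FIELD = Pattern.compile("\"sha1\"\\s*:\\s*\"([0-9a-f]{40})\"");

    /** Returns true if the file can be skipped because its latest revision is already known. */
    static boolean canSkip(HttpClient client, String apiUrl, String fileTitle,
                           Map<String, String> knownHashes)
            throws IOException, InterruptedException {
        String url = apiUrl + "?action=query&prop=imageinfo&iiprop=sha1&format=json&titles="
                + URLEncoder.encode(fileTitle, StandardCharsets.UTF_8);
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());
        Matcher m = SHA1_FIELD.matcher(response.body());
        if (!m.find()) {
            return false; // no hash available: fall back to downloading the file
        }
        // Skip when the wiki's hash matches the last revision this bot has already seen,
        // whether or not that revision was ultimately re-uploaded.
        return m.group(1).equals(knownHashes.get(fileTitle));
    }
}
```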
As this bot is expected to create an additional revision for about half of the PNG images on this wiki, disk usage on the Wikimedia server farm may become a concern.
During the uploads, SHA-1 hashes will be recalculated and some database operations will take place, which may place load on the CPU and disk.
As this bot invalidates caches, forcing browsers to re-download images they had already cached, and a viewer can be expected to fetch a few full-sized images per visit, bandwidth on the Wikimedia server farm may become a concern for a short while. This spike will be spread more or less evenly over time, because not all PNG images are re-uploaded at once.
Source code
For the time being, the source code for PNG recompression is hosted on an external wiki, on which it is currently running. Please see here for the initial code. Also see PNGOptimisationBot (t · c).
Adjustments to be made
- Change the upload rate to 1 per 10 seconds.
- Use the maxlag parameter, requiring a maximum database replication lag of 3 seconds (see the sketch after this list).
- Possibly adjust the ancillary chunks removed by the tools, replacing OptiPNG with a tool that can preserve the chunks.
- Add the ability to disable the bot by posting a message to its talk page.
- Post the source code after modifications on the English Wikipedia.
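Below is a sketch of how the edit-rate and maxlag adjustments could be wired into the bot's request helper; the 10-second spacing and maxlag=3 are the values listed above, while the retry handling is a simplified version of the usual MediaWiki convention of waiting and retrying when a maxlag error is returned.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ThrottledApiClient {
    private final HttpClient client = HttpClient.newHttpClient();
    private long lastUploadMillis = 0;

    /** Sends an API request with maxlag=3, retrying while the servers report too much lag. */
    String call(String apiUrl, String query) throws IOException, InterruptedException {
        String url = apiUrl + "?" + query + "&maxlag=3&format=json";
        while (true) {
            HttpResponse<String> response = client.send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());
            // Simplified check for the maxlag error response; a real client would
            // parse the JSON and honor the Retry-After header.
            if (!response.body().contains("\"code\":\"maxlag\"")) {
                return response.body();
            }
            Thread.sleep(5_000); // replication lag too high: wait and retry
        }
    }

    /** Ensures at least 10 seconds pass between consecutive uploads. */
    void waitForUploadSlot() throws InterruptedException {
        long elapsed = System.currentTimeMillis() - lastUploadMillis;
        if (elapsed < 10_000) {
            Thread.sleep(10_000 - elapsed);
        }
        lastUploadMillis = System.currentTimeMillis();
    }
}
```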