Commons:Village pump/Proposals
This page is used for proposals relating to the operations, technical issues, and policies of Wikimedia Commons; it is distinguished from the main Village pump, which handles community-wide discussion of all kinds. The page may also be used to advertise significant discussions taking place elsewhere, such as on the talk page of a Commons policy. Recent sections with no replies for 30 days and sections tagged with {{Section resolved|1=--~~~~}} may be archived; for old discussions, see the archives; the latest archive is Commons:Village pump/Proposals/Archive/2025/02.
- One of Wikimedia Commons’ basic principles is: "Only free content is allowed." Please do not ask why unfree material is not allowed on Wikimedia Commons or suggest that allowing it would be a good thing.
- Have you read the FAQ?
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 5 days and sections whose most recent comment is older than 30 days.
RfC: Changes to the public domain license options in the Upload Wizard menu
An editor has requested comment from other editors for this discussion. If you have an opinion regarding this issue, feel free to comment below.
Should any default options be added or removed from the menu in the Upload Wizard's step in which a user is asked to choose which license option applies to a work not under copyright? Sdkb talk 20:19, 19 December 2024 (UTC)
Background
The WMF has been (at least ostensibly) collaborating with us during its Upload Wizard improvements project. As part of this work, we have the opportunity to reexamine the step that occurs after a user uploads a work that they declare is someone else's work but not protected by copyright law. They are then presented with several default options corresponding to public domain license tags, or a field to write in a custom tag:
It is unclear why these are the specific options presented; I do not know of the original discussion in which they were chosen. This RfC seeks to determine whether we should add or remove any of these options. I have added one proposal, but feel free to create subsections for others (using the format "Add license name" or "Remove license name" and specifying the proposed menu text). Sdkb talk 20:19, 19 December 2024 (UTC)
Add PD-textlogo
Should {{PD-textlogo}} be added, using the menu text "Logo image consisting only of simple geometric shapes or text"? Sdkb talk 20:19, 19 December 2024 (UTC)
Support. Many organizations on Wikipedia that have simple logos do not have them uploaded to Commons and used in the article. Currently, the only way to upload such images is to choose the "enter a different license in wikitext format" option and enter "{{PD-textlogo}}" manually. Very few beginner (or even intermediate) editors will be able to navigate this process successfully, and even for experienced editors it is cumbersome. PD-textlogo is one of the most common license tags used on Commons uploads — there are more than 200,000 files that use it. As such, it ought to appear in the list. This would make it easier to upload simple logo images, benefiting Commons and the projects that use it. Sdkb talk 20:19, 19 December 2024 (UTC)
- Addressing two potential concerns. First, Sannita wrote, "the team is worried about making available too many options and confusing uploaders". I agree with the overall principle that we should not add so many options that users are overwhelmed, but I don't think we're at that point yet. Also, if we're concerned about only presenting the minimum number of relevant options, we could use metadata to help customize which ones are presented to a user for a given file (e.g. a .svg file is much more likely to be a logo than a .jpg file with metadata indicating it is a recently taken photograph).
- Second, there is always the risk that users upload more complex logos above the TOO. We can link to commons:TOO to provide help/explanation, and if we find that too many users are doing this for moderators to handle, we could introduce a confirmation dialogue or other further safeguards. But we should not use the difficulty of the process to try to curb undesirable uploads any more than we should block newcomers from editing just because of the risk they'll vandalize; our filters need to be targeted enough that they don't block legitimate uploads just as much as bad ones. Sdkb talk 20:19, 19 December 2024 (UTC)
- "we could use metadata" I'd be very careful with that. The way people use media changes all the time, so I'm wary of making the software's behavior depend on something like that. Extracting metadata, or checking whether a file is audio, video, or an image, is one thing, but saying 'jpg is likely not a logo and svg and png might be logos' and then steering the user in a direction based on something so likely to be untrue is another. —TheDJ (talk • contribs) 10:52, 6 January 2025 (UTC)
Oppose. Determining whether a logo is sufficiently simple for PD-textlogo is nontrivial, and the license is already frequently misapplied. Making it available as a first-class option would likely make that much worse. Omphalographer (talk) 02:57, 20 December 2024 (UTC)
Comment only if this will result in it being uploaded but tagged for review. - Jmabel ! talk 07:14, 20 December 2024 (UTC)
- That should definitely be possible to implement. Sdkb talk 15:13, 20 December 2024 (UTC)
Support, assuming there's some kind of review involved. Otherwise Oppose, but I don't see why it wouldn't be possible to implement a review tag or something. --Adamant1 (talk) 19:10, 20 December 2024 (UTC)
Support for experienced users only. Sjoerd de Bruin (talk) 20:20, 22 December 2024 (UTC)
Oppose per Omphalographer. {{PD-textlogo}} can be used when a logo is sufficiently simple in the majority of countries per COM:Copyright rules (first in the USA, and in both countries per COM:TOO); this is my opinion (via Google Translate). AbchyZa22 (talk) 11:02, 25 December 2024 (UTC)
Oppose in any case. We have enough backlogs and don't need another thing to review. --Krd 09:57, 3 January 2025 (UTC)
- How about we just disable uploads entirely to eliminate the backlogs once and for all?[Sarcasm] The entire point of Commons is to create a repository of media, and that project necessarily will entail some level of work. Reflexively opposing due to that work without even attempting (at least in your posted rationale) to weigh that cost against the added value of the potential contributions is about as stark an illustration of the anti-newcomer bias at Commons as I can conceive. Sdkb talk 21:36, 3 January 2025 (UTC)
Oppose. I think the template is often misapplied, so I do not want to encourage its use. There are many odd cases. Paper textures do not matter. Shading does not matter. An image with just a few polygons can be copyrighted. Glrx (talk) 19:47, 6 January 2025 (UTC)
Support adding this to the upload wizard, basically per Sdkb (including the first two sentences of their response to Krd). Indifferent to whether there should be a review process: on one hand, it'd be another backlog that will basically grow without bound; on the other, it could be nice for the reviewed ones. —Mdaniels5757 (talk • contribs) 23:57, 6 January 2025 (UTC)
Support New users who upload logos de facto always use wrong tags such as CC-BY-4.0-own work. Go to bot-created lists such as User:Josve05a/Logos or cats like Category:Unidentified logos: almost all logos uploaded by new users have such invalid licensing, all of which has to be reviewed and fixed at some point. People will upload logos that are too complex/nonfree etc. regardless of this option, but adding the option might increase the chance that they familiarize themselves with the requirements for uploading logos and apply the correct tag. ~TheImaCow (talk) 21:12, 22 February 2025 (UTC)
- Note that {{PD-textlogo}} should probably be applied together with {{TM}} (possibly restricted by trademark) ~TheImaCow (talk) 21:40, 28 February 2025 (UTC)
General discussion
Courtesy pinging @Sannita (WMF), the WMF community liaison for the Upload Wizard improvements project. Sdkb talk 20:19, 19 December 2024 (UTC)
- Thanks for the ping. Quick note: I will be on vacation from tomorrow until January 1, so I will probably not be able to answer until 2025 starts, if needed. I'll catch up once I have a working connection again, but be aware that new changes to code will need to wait until at least mid-January. Sannita (WMF) (talk) 22:02, 19 December 2024 (UTC)
- Can we please add a warning message for PDF uploads in general? This is currently enforced by abuse filter, and is the second most common report at Commons talk:Abuse filter. And if they use pd-textlogo or PD-simple (or any AI tag), it should add a tracking category that is searched by User:GogologoBot. All the Best -- Chuck Talk 23:21, 19 December 2024 (UTC)
- Yes, please. Even with the abuse filter in place, the vast majority of PDF uploads by new users are accidental, copyright violations, and/or out of scope. There are only a few appropriate use cases for the format, and they tend to be uploaded by a very small number of experienced users. Omphalographer (talk) 03:11, 20 December 2024 (UTC)
Comment: the current version of the MediaWiki Upload Wizard contains the words "To ensure the works you upload are copyright-free, please provide the following information.", but Creative Commons (CC) isn't "copyright-free"; it is a free copyright ©️ license, not a copyright-free license. I'm sure that Sannita is keeping an eye on this, so I didn't ping him. It should read along the lines of "To ensure the works you upload are free to use and share, please provide the following information.". --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:19, 24 December 2024 (UTC)
- @Donald Trung: Sannita (WMF) presents as male, and uses pronouns he/him/his. Please don't make such assumptions about pronouns. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 14:02, 24 December 2024 (UTC)
- My bad, I've corrected it above. For whatever reason I thought that he was a German woman; I remember seeing the profile of someone on that team and probably confused them in my head. I just clicked on their user page and saw that it's an Italian man. Hopefully he won't feel offended by this mistake. Just saw that he's a fellow Whovian. The rest of the comment remains unaltered, as I think that the wording misrepresents "free" as "copyright-free", which are separate concepts. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 14:09, 24 December 2024 (UTC)
- (Hello, I'm back in office) Not offended at all, it happens sometimes on Italian Wikipedia too. Words and names ending in -a are usually feminine in Italian, with some exceptions like my name and my nickname that both end in -a, but are masculine. :) Sannita (WMF) (talk) 13:15, 2 January 2025 (UTC)
- Wiki markup: {{gender:Sannita (WMF)|male|female|unknown}} → male. Glrx (talk) 03:07, 3 January 2025 (UTC)
Unarchiving, as unclosed. Sdkb talk 07:50, 6 February 2025 (UTC)
Category naming for proper names
There are currently multiple CfD disputes on the naming of categories for proper names (Commons:Categories for discussion/2024/12/Category:FC Bayern Munich and Commons:Categories for discussion/2024/12/Category:Polonia Warszawa). The problem is caused by an unclear guideline. At COM:CAT the guideline says: "Category names should generally be in English. However, there are exceptions such as some proper names, biological taxa and names for which the non-English name is most commonly used in the English language". The first problem is that some people do not notice that there is no comma before the "for" and think that the condition applies to all cases. This might also be caused by some incorrect translations. The other problem is the word "some", as there are no defined conditions for when this does and does not apply. I think we have four options:
- Translate all proper names
- Translate proper names when English version is commonly used (enwiki uses a translated name)
- Do not translate proper names but transcribe non Latin alphabets
- Always use the original proper name
Redirects can exist anyway. The question of what to do with locations that have multiple official local names in multilingual regions is a different topic, to be discussed after there is a decision on the main question. GPSLeo (talk) 11:40, 28 December 2024 (UTC)
- I don't think it's a bad thing that the rule gives room for case-by-case decisions. The discussions about this are very long, but it's rarely about a real problem with finding or organising content. So my personal rule would be something like ‘If it's understandable to an English speaker, is part of a subtree curated by other users on an ongoing basis, and you otherwise have no engagement in that subtree, don't suggest a move just because of a principle that makes no difference here.’ Rudolph Buch (talk) 14:37, 28 December 2024 (UTC)
- 100%. That should be the standard. People are too weak when it comes to dealing with obviously disingenuous behavior or enforcing any kind of standards on here though. But 99% of the time this is only a problem because someone wants to use category names as their personal nationalist project. It's just that no one is willing to put their foot down by telling the person that's not what categories are for. Otherwise this would be a nonissue. But the guideline should be clear that category names shouldn't be in the "native language" if it doesn't follow the category tree and/or is only being done for personal, nationalistic reasons. --Adamant1 (talk) 18:40, 28 December 2024 (UTC)
- I think that at least in most cases the right answer is something like #2 except:
- I wouldn't always trust en-wiki to get it right, especially on topics where only one or two editors have ever been involved there, and we might well have broader and more knowledgeable involvement here.
- Non-Latin alphabets should be transliterated.
- The thing is, of course, that is exactly the one that frequently requires judgement calls, so we are back where we started.
- Aside: in my experience, some nationalities (e.g. German) have a fair number of people who will resist the use of an English translation no matter how common, while others (e.g. Romanian) will "overtranslate". On the latter, as an American who has spent some time in Romania, I'm always amazed when I see Romanians opt for English translations for things where I've always heard English-speakers use the Romanian (e.g. "Roman Square" for "Piața Romana"; to my ear, it is like calling the composer Giuseppe Verdi "Joseph Green"). - Jmabel ! talk 18:59, 28 December 2024 (UTC)
- I've made the sentence in COM:CAT, quoted by OP, into a list, to remove ambiguity. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:56, 12 January 2025 (UTC)
Oppose all four suggested solutions because each of them would be too disruptive. Several outcomes are possible: top-down regulations to be mass-executed by a handful of editors who could instead do more meaningful work, while a large group of editors raises up in protests against a consensus that they didn't participate in (see recently: "Historical images of..."), is one possible outcome. Another possibility is a toothless rule that is generally ignored in practice, except when it can be wielded to cudgel others.
The current rule for proper names is sufficient and remains flexible enough to be handled by contributors of all kinds. My arguments against each of the four general ideas: Solutions 1 and 2 are hardheaded English WP supremacy and should be discarded right away; Commons is a multilingual community project. Solution 3 sounds best at first, but it is again inflexible: Non-Latin category names exist by the thousands - Cyrillic and East Asian publications most prominently (Category:武州豊嶋郡江戸庄図; note how the Japanese title uses kanji), and it's not as if transliterated-to-Pinyin Chinese is easier to understand for non-writers. That means (imo) that on the lowest levels in the category tree, native names should be allowed in whatever script, as long as the generic parent categories like "Category:Books from Russia about military" that would be used to navigate the cat-tree are still in English. Regarding solution #4: "Always proper names" could be interpreted by some to raise language barriers against foreign editors on a much higher level: I prefer to find Chinese provinces under English category names like "Anhui" and "Guangdong", not as Category:北京. I prefer Arabic personal names transliterated (by whatever method, even), and so on. --Enyavar (talk) 22:10, 27 January 2025 (UTC)
A large group of editors raises up in protests against a consensus that they didn't participate in (see recently: "Historical images of...")
@Enyavar: Two users threw a tantrum about it four years later. There weren't, and still aren't, large objections to the outcome of the CfD though. The main issue is that large changes to existing category systems like these just don't scale. It's impossible to police or otherwise deal with thousands of categories. With "Historical images of" categories specifically, they are being created at a much quicker pace than they are being cleaned up, and it's impossible to get people to stop creating them. There's a fundamental issue on here where it's impossible to clean up or otherwise deal with bad categorization systems once they are created and reach a certain threshold of usage. Realistically no one from Germany is going to police how other German users name categories. They just demagogue and otherwise throw fits about it anytime someone tries to standardize things. --Adamant1 (talk) 06:32, 23 February 2025 (UTC)
RfC: Should Commons ban AI-generated images?
An editor has requested comment from other editors for this discussion. If you have an opinion regarding this issue, feel free to comment below.
Should Commons policy change to disallow the uploading of AI-generated images from programs such as DALL-E, Midjourney, Grok, etc. per Commons:Fair use?
Background
AI-generated images are a big thing lately and I think we need to address the elephant in the room: they have unclear copyright implications. We do know that in the US, AI-generated images are not copyrighted because they have no human author, but they are still very likely considered derivative works of existing works.
AI generators use existing images and texts in their datasets and draw from those works to generate derivatives. There is no debate about that, that is how they work. There are multiple ongoing lawsuits against AI generator companies for copyright violation. According to this Washington Post article, the main defense of AI generation rests on the question of if these derivative works qualify as fair use. If they are fair use, they may be legal. If they are not fair use, they may be illegal copyright violations.
However, as far as Commons is concerned, either ruling would make AI images go against Commons policy. Per Commons:Fair use, fair use media files are not allowed on Commons. Obviously, copyright violations are not allowed either. This means that under either of the two possible legal decisions about AI images, they cannot be used on Commons. There is no possible scenario where AI-generated images are not considered derivative in some way of copyrighted works; it's just a matter of whether it's fair use or not. As such, I think that AI-generated images should be explicitly disallowed in Commons policy.
Discussion
Should Commons explicitly disallow the uploading of AI-generated images (and by proxy, should all existing files be deleted)? Please discuss below. Di (they-them) (talk) 05:00, 3 January 2025 (UTC)
- Enough. It is a great waste of time to have the same discussion over and over and over. I find it absurd to think that most AI creations are going to be considered derivative works. The AI programs may fail that test, but what they produce clearly isn't. Why don't we wait until something new has happened in the legal sphere before we start this discussion all over?--Prosfilaes (talk) 06:21, 3 January 2025 (UTC)
Oppose. No, it shouldn't, and they are not derivative works; if they are uploaded by the person who prompted them, they are also not fair use but PD (or maybe CC BY). They are not derived from millions of images, just as images you draw are not "derived" from public works you previously saw (like movies, public exhibitions, and online art) that inspired or at least influenced you.
- "There is no debate about that, that is how they work." False.
- "the main defense of AI generation rests on the question of if these derivative works qualify as fair use." Also false. Prototyperspective (talk) 09:52, 3 January 2025 (UTC)
- Most AI-generated images, unless the AI is explicitly told to imitate a certain work, are not "derivative works" in the sense of copyright, because the AI does a thing similar to humans when they create new works: Humans have knowledge of a lot of pre-existing works and create new works that are inspired by them. AI, too, "learns" for example what the characteristics of Impressionist art are through the input of a lot of Impressionist paintings, and is then able to create a new image in Impressionist style, without that image being a derivative work of any specific work where copyright regulations would apply - apart from the fact, of course, that in this specific example, most of the original works from the Impressionist period are public domain by now anyway. The latter would also be an argument against the proposal: Even if it were the case that AI creates nothing but "derivative works" in the sense of copyright, derivative works of public domain original art would still be absolutely fine, so this would be no argument for completely banning AI images. Having said all that, I think that we should handle the upload of AI images restrictively, allow them only selectively, and Commons:AI-generated media could be a bit stricter. But a blanket ban wouldn't be a reasonable approach, I think. Gestumblindi (talk) 11:12, 3 January 2025 (UTC)
- We want images for a given purpose. It's a user who uploads such an image. He is responsible for his work. We shouldn't care how much assistance he had in the creation process. But I'd appreciate an agreement on banning photorealistic images designed to deceive the viewer. AI empowers users to create images of public (prominent) people and have these people appear more heroic, evil, clean, dirty, important or whatever than they are. But we have this problem with Photoshop already. I don't want such images in Wikimedia even if most people know a given image to be a hoax (such as those of Evil Bert from Sesame Street). Vollbracht (talk) 01:42, 4 January 2025 (UTC)
- This discussion isn't about deception or usefulness of the images, it's about them being derivative works. Di (they-them) (talk) 02:12, 4 January 2025 (UTC)
- You got the answer on "derivative works" already. I can't see a legal difference between a photoshopped image and an image altered by AI or a legal difference between a paintbrush artwork and an AI generated "artwork". Still as Germans say: "Kunst kommt von können." (Art comes from artistic abilities.) It's not worth more than the work that went into it. If you spend no more than 5 min. "manpower" in defining what the AI shall generate, you shouldn't expect to have created something worthy of any copyright protection or anything new in comparison to an underlying work of art. We don't need more rules on this. When deriving something keep the copyright in mind - no matter what tool you use. Vollbracht (talk) 03:34, 4 January 2025 (UTC)
- Look at other free-upload platforms and you get to the inevitable conclusion that AI uploads will ultimately overwhelm Commons by legal issues or sheer volume. Because people. But with no new legal impulses and no cry for action from tech Commons, I see no need for a new discussion at this point. Alexpl (talk) 05:59, 4 January 2025 (UTC)
- As I understand it, there are three aspects of an AI image:
- The creations caused by the computer algorithm. Probably not copyrighted anywhere because an algorithm is not an animal.
- An AI prompt, entered by a human. This potentially exceeds the threshold of originality, in which case the AI output probably is a derivative work of the prompt. Maybe we need a licence of the AI prompt from the person who wrote it, unless the prompt itself is provided and determined to be below the threshold of originality.
- Sometimes an AI image or text is a derivative work of an unknown work which the AI software found somewhere on the Internet. Here it might be better to assume good faith and only delete if an underlying work is found. --Stefan2 (talk) 11:54, 7 January 2025 (UTC)
- Re 2: note that short quotes can also be put onto Wikipedia (which is CCBY-SA) and Wikiquote. Moreover, that applies to the prompt, but media files can also be uploaded without the input prompt attached. In any case, if the prompt engineer licenses the image under CCBY or PD then it can be uploaded, and I only upload these kinds of AI images, even though others may also be PD. Re 3: that depends on the prompt; if you're tailoring the prompt in some specific way so it produces an image like that, then it may create an image looking very similar. E.g. if you prompt "La Vie, 1903 painting by Pablo Picasso, in the style of Pablo Picasso, the life", it's likely to produce an image looking like the original. I also don't think it would be good to assume that active contributors would do so without disclosing it. Prototyperspective (talk) 12:16, 7 January 2025 (UTC)
- If you ask for "La Vie, 1903 painting by Pablo Picasso, in the style of Pablo Picasso, the life", then you are very likely to get a derivative work.
- If you ask for "a picture of a cat", then there is no problem with #2, but you have no way of knowing how the AI tool produced the picture, so you are maybe in violation of #3 (you'll find out if the copyright holder sues you). --Stefan2 (talk) 12:53, 7 January 2025 (UTC)
Oppose Whatever the details of AI artwork and derivatives are, there's a serious lack of people checking for copyright violations to begin with, and anyone who tries to follow any kind of standards when it comes to AI artwork just gets cry-bullied, threatened, and/or sanctioned for supposedly causing drama. So there's really no point in banning it, or even moderating it in any way whatsoever, to begin with. The more important thing is properly labeling it as such and not letting people pass AI artwork off on here as legitimate, historically accurate images. The only other alternative would be for the WMF to take a stance on it one way or another, but I don't really see that happening. There's nothing that can or will be done about all the AI slop on here until then though. --Adamant1 (talk) 06:51, 9 January 2025 (UTC)
- Conditional Support. I do not support an outright and total ban of any and all AI-generated imagery (in short: AI file) on Commons; that's going too far. But I would support a strict enforcement and a strict interpretation of our scope policy in regards to such imagery. By that, I mean the following.
- I support the concept that any upload of AI-generated imagery has to be justified by the existence and demonstration of a concise and legitimate use case on Wikimedia projects before the data is uploaded to Commons. If any AI file is not used, then it's blanketly out of scope. Reasoning: Most Wikimedia projects have a rule of only holding verifiable information. AI files have a fundamental issue with this requirement of verifiability, as the LLMs (large language models) used do not allow for a correlation between input and output. This is exemplified by the inability of the LLM creators to remove the results of rights-infringing training data from the processing algorithms; they can only tweak the output to forbid the LLM from outputting infringing material like song or journalistic texts.
- I support a complete ban of AI-generated imagery that depicts real-life celebrities or historical personages. For celebrities, the training data is most likely made of copyrighted imagery, at least partly. For historical personages, AI files will likely deceive a viewer or reader into believing that the AI file is historically accurate. Such a result, deceiving, is against our project scope, see COM:EDUSE.
- I support the notion of using AI files to illustrate concepts that fall within the purview of e.g. social sciences. I could very well see a good use case to illustrate e.g. poverty, homelessness, sexuality topics and other potentially contentious themes at the discretion of the writing Wikipedian. AI files may offer the advantage that most likely no personality rights will be affected by the depiction. For this use case, AI files would have to strictly satisfy our COM:Redundant policy: as soon as there is an actual human-made media file, a photograph, movie or sound recording that actually fulfils the same purpose as the AI file, then the AI file is blanketly out of scope.
- I am aware that these opinions are quite strict and more against AI generated imagery. That's due to my background thoughts about the known limitations of generative software and a currently unclear IP right situation about the training data and the output of these LLM. I lack the imagination on how AI files could currently serve to improve the mission of disseminating knowledge, save for some limited use cases. Regards, Grand-Duc (talk) 19:07, 9 January 2025 (UTC) PS. For further reference: Commons:Deletion requests/Files in Category:AI-generated portraits.
- Re 1.: some people complain that people upload images without use case, other people complain when people add useful media they created themselves to articles – it's impossible to make it right. Moreover, Commons isn't just there as a hosting site for Wikipedia & Co but also a standalone site. Your point about LLM is good and I agree but this discussion is not about LLMs but AI media creation tools.
- Re 2.: paintings are also inaccurate. Images made or modified with AI (or made with AI and then edited with eg Photoshop) are not necessarily inaccurate. I'm also very concerned about the known limitations of generative software but that doesn't really support your points and doesn't support that Commons should censor images produced with a novel toolset. Prototyperspective (talk) 19:34, 9 January 2025 (UTC)
- All the AI media creation tools, be it Midjourney, Grok, Dall-E or the plethora of other offerings, are based upon LLMs. So any discussion about current "AI media creation tools" is the same as discussing the implications of LLMs in practice, IP law and society. And yes, Commons also wants to serve other sites and usages (like school homework for my son; it did so in the past and will do in the future). But as anybody may employ generative AI, there is no need to use Commons to endorse any and all potential uses. As I tried to demonstrate, AI files are only seldom useful to disseminate knowledge, see Commons:Project scope.
- Paintings are often idealized, yes, introducing inaccuracies. But in that case, the work is vouched for by a human artist, who employed his creativity and the knowledge gained over his life to produce a given result. These actions cannot be duplicated at the moment by generative AI, only imitated. And while most educated humans will recognize a painting as the creation of a fellow human that will certainly contain inaccuracies, the stories about "alternative facts", news bubbles, deepfakes etc. show that generative AI products are often not recognized as such and are taken at face value. Regards, Grand-Duc (talk) 19:56, 9 January 2025 (UTC)
- No, those are not the same implication. You however got closer to understanding the concept and basics of prompt engineering which is about getting the result you intend or imagined despite all the flaws LLMs have.
People have developed all sorts of techniques and tricks to make these tools produce the images they have in mind at the quality they'd like. If you think people ask AI generator tools to illustrate a subject by just providing the concept's name, like "Cosmic distance ladder", and then assume it produces an accurate, good image of it, you'd be wrong. Moreover, most AI images look like digital art, not photos, and are generally labelled as such. Prototyperspective (talk) 22:05, 9 January 2025 (UTC)
- Oppose per Prosfilaes: it is not at all guaranteed that we're in a Hobson's choice here. Some AI images may well be bad, but banning them all just in case is ridiculous. --GRuban (talk) 21:53, 9 January 2025 (UTC)
Support with the possible exception of images that are themselves notable. Blythwood (talk) 20:22, 11 January 2025 (UTC)
- Mostly support, but not for the reasons proposed. While I don't disagree with the argument that AI-generated content could potentially be considered a derivative work, this argument isn't currently supported by law, and I don't think that's likely to change in the near future. However, very few AI-generated images have realistic educational use. Generated images of real people, places, or objects are inherently inferior to actual photos of those subjects, and often contain misleading inaccuracies; images of speculative topics tend towards clichés (like holograms and blue glowing robots for predictions of the future, or ethnic stereotypes for historical subjects); and AI-generated diagrams and abstract illustrations are inevitably meaningless gibberish. The vast majority of AI-generated images inserted into Wikimedia projects are rejected by editors on those projects and subsequently deleted from Commons; those that aren't removed tend to be more the result of indifference (especially on low activity projects) than because they actually provide substantial educational value. Omphalographer (talk) 20:15, 20 January 2025 (UTC)
Oppose - No evidence there is actual legal risk. The U.S. Copyright Office has declared many times now that A.I.-generated images are not copyrighted unless they are clearly derivative works. Any case of an image being a derivative work needs to be handled on a case-by-case basis, just like any other artwork. If Commons is actually concerned about copyright issues with derivative works, we need to delete about a thousand cosplay images first. (No, I'm not saying that all cosplay images are copyrighted derivative works, but a lot of them are.) Nosferattus (talk) 02:02, 21 January 2025 (UTC)
Oppose if a replacement sister project is not established.
Support if a sister project like meta:Wikimedia Commons AI is introduced and AI images can be moved to that project. As I already stated in 2024, I believe that AI-generated images and human-made images should be kept separate in order to protect, preserve, defend and encourage the integrity of purely human creativity. S. Perquin (talk) 20:33, 22 January 2025 (UTC)
Oppose A blanket ban because "If they are fair use, they may be legal. If they are not fair use, they may be illegal copyright violations." is not accurate: as courts have already found in the United States, intellectual property such as a copyright can only be applied to a human person intentionally making a creative work, not a software process or an elephant or a hurricane in a paint store. Individual AI-generated works may well be copyright infractions, but that would be for the same reasons as if a human person made a work that was influenced by existing copyrighted works, such as being virtually identical to the source work. I am not a lawyer and nothing I write here or anywhere should be taken as any kind of proper financial, legal, or medical advice. —Justin (koavf)❤T☮C☺M☯ 22:54, 22 January 2025 (UTC)
- A blanket ban for copyright reasons would likely encompass a number of uses that would not violate copyright. If this is unwarranted, there may be smaller categories that are more reasonable to consider per Commons:PRECAUTIONARY. For example, AI images of living individuals we do not have a free copy of might be one area this could apply. I recall there was an AI image of a en:Brinicle discussed previously and deleted, which very obviously resembled the BBC footage of a brinicle, likely because that was the first ever footage of this phenomenon and remains part of a very limited set, but it's hard to work that into a general prohibition. CMD (talk) 02:28, 23 January 2025 (UTC)
- @Koavf: One issue with AI-generated images is that we don't usually have a way of knowing the country of origin, and they are currently copyrighted in the United Kingdom, although they are PD in the United States. The issue is that policy requires something not be copyrighted in the country of origin, not just in the United States. That's not to say AI-generated images can't just be nominated for deletion on a "per image" basis when (or if) it's determined that said image was created in a country where they are copyrighted, but that goes against Commons:PRECAUTIONARY, and no other images get a free pass from it in the way that AI-generated artwork seems to. I.e., some people have made the argument that there doesn't need to be a source for AI-generated images "because AI", which clearly goes against the guidelines. Not that I think there should be a blanket ban either, but there should at least be more scrutiny of where AI artwork on here originates from, and enforcement of Commons:PRECAUTIONARY when it isn't clear. --Adamant1 (talk) 08:13, 23 January 2025 (UTC)
Oppose nothing new provided here. Until there is a broad legal consensus they’re copyright violations, they’re legal. And we can’t ban an entire medium just because there’s a lot of justifiable controversy around it— how are we supposed to illustrate DALL-E itself in that case? Dronebogus (talk) 09:08, 23 January 2025 (UTC)
- Like Grand-Duc, Omphalographer and others said, I think it's better to argue from a COM:SCOPE standpoint. I think it's worthwhile to add illustrative examples to the said policy or to a subpage of it, if necessary - examples where an AI image is unlikely to be in scope, perhaps along with other similar materials like amateur artworks. --whym (talk) 08:56, 25 January 2025 (UTC)
Oppose per Blythwood. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 11:39, 25 January 2025 (UTC)
Oppose a blanket ban as per the OP; illustrations made by AI are potentially useful. However, if we want to keep them, AI creations have to adhere to Commons:Scope and should be judged more critically than other content. AI images that are just created and uploaded for no educational purpose should get deleted, especially if someone makes them en masse. AI images intended for misinformation should also lead to user bans (on repeated offenses after fair warnings). Best, --Enyavar (talk) 22:23, 27 January 2025 (UTC)
Support ban against AS (Artificial Stupidity), alternatively strict limitations:
- only users with autopatrol right may upload AI/AS-generated files
- upload at most 2 AI/AS-generated files per 24 hours
- Mass uploading of nonsense has become a big problem and waste of resources. Taylor 49 (talk) 00:03, 5 February 2025 (UTC)
- A skilled smart person aware of the issues & limitations can still use stupid tools to produce good results. I see neither mass uploading nor a big problem so far with AI images. Prototyperspective (talk) 00:11, 5 February 2025 (UTC)
Comment Really? Commons:Deletion requests/Files uploades by User:DenisMironov1 Taylor 49 (talk) 00:28, 5 February 2025 (UTC)
- In rare cases people upload a medium-sized batch of AI images. The file title already indicates it's AI-generated. People DR'd it and it's gone.
No issue at all as far as I can see, and less problematic than people uploading 100 20 MB photos of the same mundane subject for which there are already >200 varied pics, or 100 separate scans of book pages instead of a PDF file. There aren't many cases of people uploading mid or large numbers of low-quality AI files, and it's easily dealt with. I've been checking recent AI uploads for a long time and on average it was only a few files every few days, so it's moving slowly and is easy to organize. Prototyperspective (talk) 18:09, 5 February 2025 (UTC)
Comment I'd like to add some info to the discussion. Recently, Meta Inc. (the owner of Facebook) got caught torrenting from a shadow library, in this case Anna's Archive: https://torrentfreak.com/meta-torrented-over-81-tb-of-data-through-annas-archive-despite-few-seeders-250206 for the purpose of training their models. So it's IMHO not far-fetched to say that any and all AI-generated media is akin to the fruit of the poisonous tree: unless proven good, it may be sensible to assume that they are likely a copyright violation. Regards, Grand-Duc (talk) 18:07, 10 February 2025 (UTC)
- If you ever watched some pirated films or read a few downloaded books of a genre from which you learn or get inspiration from you can still produce a film or image of the same genre that is not a copyright violation. Prototyperspective (talk) 19:11, 10 February 2025 (UTC)
- Yes, but "inspiration" is, in the same vein as curiosity and creativity, a purely human behavioural trait (well, simians, Corvidae, Psittacidae and several Odontoceti show these too). No machine is able to replicate that, so machine-processed copyvio data remain a copyvio. Regards, Grand-Duc (talk) 19:32, 10 February 2025 (UTC)
- Yes, machines / software are different from humans. That doesn't change anything about the point made. Machines can machine-learn from copyright-restricted content and then produce free content as much as humans can biological-learn from copyright-restricted content and then produce free content if you prefer me to be more precise with the terminology. Prototyperspective (talk) 19:40, 10 February 2025 (UTC)
Major damage to Wikimedia Commons
As far as I can see, major damage has successively been done to Wikimedia Commons over the last few years by chopping up categories about people into individual "by year" categories, making it
- virtually impossible to find the best image to use for a certain purpose, and
- virtually impossible to avoid uploading duplicates, since searching/matching images has become virtually impossible.
Here is a perfect example. I have a really good, rare picture of her, but I'll be damned if I'm willing to wade through all the "by year" categories to try to see if Commons already has it. The user who uploaded this didn't even bother to place it in a personal category. Why should they, with all the work required to try to find the category at all and fit the image in there?
I am not objecting to the existence of categories "by year". Searching is the problem.
What, if anything, can be done about this mess, which is steadily getting worse all the time? Could some kind of bot fix it?
I really feel that this is urgent now and cannot be ignored any longer. The project has become worth much, much less through the problem described. Or have I missed/misunderstood something here? SergeWoodzing (talk) 10:00, 23 January 2025 (UTC)
- This is a duplicate discussion of Commons:Administrators' noticeboard#URGENT! Major damage to this project. CMD (talk) 12:22, 23 January 2025 (UTC)
- Yes, but the user was told there to bring it here. - Jmabel ! talk 17:28, 23 January 2025 (UTC)
- Contemporary VIPs produce a ton of images. Sorting them by year makes sense; otherwise you would have to deal with hundreds of files in one category. As for Wikipedia: go to the most recent usable photo of "Sophie" and use that. And if it is not the most flattering... well, that's life. Alexpl (talk) 12:33, 23 January 2025 (UTC)
- Uh, I don't think that's really what we ought to do. I've tried for many years always to use the best possible images. --SergeWoodzing (talk) 17:52, 23 January 2025 (UTC)
- The fear feels a bit exaggerated to me, but (if you're looking for images of Donald Trump) you can, for example, enter deepcategory:"Donald Trump by year" in the regular search, et voilà! You can see many Donald images at once without looking into each subcat :) (Also see COM:Search for tags and flags) --PantheraLeo1359531 😺 (talk) 12:51, 23 January 2025 (UTC)
- Splitting images by year is often counterproductive, but that's necessary when there are a lot of them for one person. Yann (talk) 15:45, 23 January 2025 (UTC)
- See Help:Gadget-DeepcatSearch and TinEye image reverse search among other things. Prototyperspective (talk) 16:04, 23 January 2025 (UTC)
Please! I have not suggested that images should not (also) be sorted by year, so there is no need to defend that kind of sorting. I've asked for a search remedy & will now try the tips we've been given here. Thank you for them! --SergeWoodzing (talk) 17:52, 23 January 2025 (UTC)
- @SergeWoodzing: Another solution I've seen is addition of a flat category for all of a topic's files, to achieve your purpose but still allow for the granularity that others like to achieve. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 18:31, 23 January 2025 (UTC)
- I agree with the general issue brought up by SergeWoodzing. "By year" in many cases only makes sense if a large number of files cannot be meaningfully sorted in another way. Non-recurring events, certainly. But things that undergo little change from year to year do not necessarily need atomized categories, and I have also noticed that by-year categories have been steadily on the rise since 2018. Not always for the better. The following examples are not what literally exists right now, but it would be easy to find real cases just like them.
- I have recently encountered more and more "books about <topic> by year", and that means that either a very broad topic like "biology" is split up by year (which later makes it much harder to split the topic by "botany" vs. "zoology", or "by continent", "by language", or to meaningfully search the publications) or that a very narrow topic like "American-Mexican War" gets splintered into single-file categories. How can two books about the same war be categorically different because one was published in December 1857 and the other in March 1858?
- "Maps by year" are my pet peeve. Nearly all maps before the 21st century had many-years-long production processes. The further back we go in time, the less the publication year of an (old) map matters, the categories should rather differentiate the location and topic, not the day/month/year of publication. An old city plan of Chennai, an old topo map of Rajasthan, and an old geological map of Bengal all from the same year, have so little in common that it is pointless to primarily group them under "<year> maps of India". (I have been vocal about this several times here on the pump already, and got some support too).
- "Tigers by year": I think everyone should see the absurdity. Photos of tigers should be grouped by location (zoo, country) or by growth stage (juvenile, adult...), not whether they were photographed in 1998 vs. 2015.
- Location by year. Without exif/metadata, one photo of Taj Mahal looks like the other, with no telling that one was created 2008 and the other 2021. It makes much more sense to primarily sort them by architectural elements (main building, gates, interior...) than by year. Thankfully, this is done too, but not always. And sure, with some events even two years make a huge difference, like Verdun 1913 and Verdun 1915; but many "non-changing" locations are fine without a by-year split-up.
- "Person by year" is the OP topic already. Actually, in my opinion this makes mostly sense as long as MANY files exist. If there are just 25 files of a celebrity, please do NOT split that collection up into 12 by-year subcategories. Doing so is a case of well-intentioned obstruction, as access just gets harder with no further benefit.
- Much more could be said. So while I am cautiously supportive of several important use cases of by-year categories, atomization has to stop. --Enyavar (talk) 02:12, 24 January 2025 (UTC)
- While I also find it annoying to search through "by year" categories, there's merit in them.
- Books about <topic> by year: Theories change and new discoveries are made. From a historical perspective, it is interesting what biologists from the 19th century thought and assumed about biological processes and how those assumptions and thoughts differ from those in the 20th century. There is merit in stuff like "history of medicine" and similarly there's merit in stuff like "history of biology".
- Maps by year: the year helps to determine the copyright status of a map (because not all take that into consideration when uploading files). The year puts the map in context, like right now with the war in Ukraine: maps of Ukraine produced by Russia in 2024 will look different from those produced in 2010, and they will also look different from those produced by Ukraine in 2024 despite being from the same year. Such maps show how borders have changed throughout the years (be it a factual change or be it because someone thinks that they have changed).
- Tigers by year: This may seem ridiculous now, but you have to keep in mind that everything will become history one day. Wouldn't it be cool if we had photos of mammoths by year? If we had them, we'd be able to determine precisely when their population started to decline, in which area first and in which area last, and eventually we'd be able to tell in exactly what year the last mammoth was photographed. The "tigers by year" category only seems ridiculous if one assumes that things are not going to change over the years. I mean, what if tigers start to change evolutionarily by growing saber-teeth again? Wouldn't you want to know when and where exactly the change started to occur? Don't you think that future generations would want to know what tigers looked like in 1950 vs 2000 vs 2050? You might think that such a change would take very long, but a look at the German Shepherd dog says otherwise: the dog that was first entered into the breed registry in 1895 was quite different from the one we now know as a German Shepherd (see [1][2][3]). It would even make sense to have a category for "tigers in zoos by year" because it would document how conditions regarding animal welfare have changed over the years.
- Location by year: Things do change. There are repairs, there are implementations of new regulations (such as ramps for people in wheelchairs etc.), there is deterioration of condition, etc. A building can be gone in no time, even without there being some major events causing the change but just through neglect (see this building in 2023, 2023 vs. 2024).
- I think it would be helpful if files were automatically added to hidden time-related (and maybe rough location-related) categories based on EXIF data (if available) instead of having to add this information manually and retrospectively[4]. It would save time for the uploader and it would help to focus more on content-related categories when categorizing a file, yet still keep time-related categories accessible for those interested in them. Nakonana (talk) 17:20, 23 February 2025 (UTC)
- There are a lot of hypotheticals in your response, and I cannot see practical usability for most.
- Books: Category:1894 books about geography. Not helpful, because geography has so many sub-topics. The structured data approach would be much more helpful here, because you could search for "1890s books" & "Alps" & "books about geography", without having to click through each single by-year category. And you might think that Category:Books about World War I is rather specific already, right? The problem is that once we subdivided them by year (or language), we can no longer comfortably subdivide them by topic as well. 1919 book=1920 book=1926 book (books about the history of specific regiments in WW1); 1928 book=1919 book=1917 book (personal war testimonials/memoirs); 1919 book=1920 book=1923 book (books about naval warfare in WW1). By contrast, the by-year subcategories about WW1 books are fully arbitrary, depending how quickly an author produced their own book.
- Maps: Just a 'lil map of an American Civil War battlefield, be my guest and identify which year the map belongs to and copyright status. Photos are snapshots of reality, made in a second. Maps are not like that at all. Yes, context is important, but structured data can handle that just as well. Once you determine "category:1907 maps of Paris" you claim that this file is categorically just as different from "1908 maps of Paris" as it is from "2019 maps of Paris" - the situation is roughly the same as with the WW1 books above.
- evolution of sabretooth tigers by zoo by year? Because humans will selectively breed them for sabretooth traits like they bred German Shepherds? Oh-kay. Sure. I am aware that this is just an example, this could be any organism. Categories are the wrong tool to document such changes, though. And by-year categories are even wronger.
- Location by year. Eh. Yes sure, the Eiffeltower undergoes major architectural rearrangements every year, but we have all those by-year categories because we have so many photos of it; and not because we try to document all those changes in its construction. For most locations that are not overflowed by mass tourism, the structured data approach is best.
- Categories should be there to group "files about the same subject". If the same subject is "Press Conference in Brussels, 2014 July 13th" with several photographs uploaded: yes, the presser category belongs in "July 2014 in Brussels", dated not just by year but even by month. By contrast, if there are in total three photos of a statue in the Brussels suburbs, these three should be grouped together primarily; the years in which each was taken are of secondary importance. And we do NOT need "1999 photographs of <some statue in Brussels-suburb>" vs. "2005 photographs of <some statue in Brussels-suburb>" vs. "2019 photographs of <some statue in Brussels-suburb>" for three files. --Enyavar (talk) 12:02, 24 February 2025 (UTC)
Comment I made a proposal a few months ago to confine "by year" categories to images that show a meaningful distinction by year, for instance a yearly event where there's actually a difference between the years. Whereas, say, images of tigers per Enyavar's example aren't worth organizing by year because there's no meaningful difference between a tiger in 2015 and one in 2016. Anyway, there seemed to be general support for the proposal at the time.
- The problem is that there's no actual way to enforce it, because people will ignore consensus, recreate categories, and attack anyone who disagrees with them. It's made worse by the fact that admins on here seem to have no will or ability to impose any kind of standards. They just cater to people doing things their own way regardless of consensus, as long as the person throws a big enough tantrum about it. There are plenty of proposals, CfDs, village pump and talk page discussions, etc. that should already regulate how these types of categories are used. They just aren't ever imposed to any meaningful degree because of all the [struck: limp wristed] weak pandering to people who use Commons as their own personal project. --Adamant1 (talk) 06:47, 24 January 2025 (UTC)
- So should they ban you promptly for using a homophobic slur ("limp wristed"), or should they just let you continue going on your way ignoring consensus?--Prosfilaes (talk) 08:16, 24 January 2025 (UTC)
- @Prosfilaes: I didn't actually know it was a homophobic slur. I just thought it meant weak. I struck it out though. Thanks for letting me know. Not that I was saying anyone should be banned for ignoring the consensus, but if people intentionally use homophobic slurs then yes, they should be banned for it. With this, though, it's more about the bending over backwards to accommodate people who don't care about or follow the consensus than it is about sanctioning anyone over it. People should just be ignored and the consensus should be followed anyway. There's no reason whatsoever that it has to involve banning people. Just don't pander to people using Commons as their own personal project. It's not that difficult. --Adamant1 (talk) 09:28, 24 January 2025 (UTC)
- For this topic at least, I don't think I have seen actual attacks against other users, thankfully. SergeWoodzing has used some strong condemnations of the status quo in general on Commons, but I do not perceive his statement as an attack against some users. Now, Adamant points out the problem, which is that we seemingly don't have a guideline or even policy on which topics may be organized by year and which ones should rather not get a by-year categorization. I'm almost sure that people are creating by-year categories out of the best intentions, and mostly because they are boldly imitating the "best practice" of other users, ignorant of some consensus that may or may not have been formed among a dozen users in the village pump. Which means that such users have to be talked out of the idea individually once they start by-year categories for an unsuitable topic. --Enyavar (talk) 16:33, 24 January 2025 (UTC)
- There's certainly an aspect to this where people indiscriminately create by-year categories because other people do. But it still comes down to a lack of will and/or mechanisms to enforce standards. You can ask the person doing it to stop, but they can just ignore you and continue. Then what? No one faces repercussions for ignoring the consensus by continuing to create the categories. I've certainly never seen any, and I've been involved in plenty of conversations about overcategorization. The person usually just demagogues or outright ignores the issue and continues doing it. --Adamant1 (talk) 16:54, 24 January 2025 (UTC)
- Tbf I didn’t really know it was either. I wouldn’t even call it a “slur”— more of a general insult with homophobic connotations, like “sissy” or “pansy”. Dronebogus (talk) 17:57, 27 January 2025 (UTC)
Support resolving the concerns reported by user "SergeWoodzing". Such "alternative overcategorization" by year, or even worse by state, makes useful files hard to find. Not for "Contemporary VIP:s" brewing gazillions of useless images, but for relevant topics, especially if the total number of files is low. Taylor 49 (talk) 00:18, 5 February 2025 (UTC)
- The fix for this problem is to lobby the Wikimedia Foundation to finish implementing structured data on Commons, then use that instead of our antiquated category system. — Rhododendrites talk | 22:51, 23 February 2025 (UTC)
- The category system is in no way antiquated. Structured "depicts" data is currently a barely populated, totally unused time sink, and all of it can be captured by the categories. Display categories in a different way maybe (more like many other websites which have tags) or improve how categories can be queried, searched and qualified. Prototyperspective (talk) 00:41, 24 February 2025 (UTC)
- I have to agree with Rhododendrites. Categories on Commons have become completely useless (mostly thanks to "by year" categories). Structured data is the best way forward. Nosferattus (talk) 00:57, 24 February 2025 (UTC)
- If the reason you think these are "useless" is because of "by year" categories then that just shows how incredibly irrational and weak your argumentation would be. It makes no sense and is actively harming Commons based on some strange SD ideology that exists because people try hard to bury their head in the sand. The solution for this minor issue is simple: a) use other/by subject subcategories instead or b) use search methods like deepcategory:"Donald Trump by year" which can be made more accessible via a cat-search-box. Prototyperspective (talk) 01:43, 24 February 2025 (UTC)
- c) finally implementing date sorting & range-filters including reading the data in the already-existent Summary template. phab:T329961#10041982
- Wouldn't mind if the data in these templates would be synced/copied ~at once into structured data if that's the easiest/best way to implement it but categories are still best for subjects and would just be combined with such metadata in structured data (e.g. sort by or specify year on the category page). Prototyperspective (talk) 01:54, 24 February 2025 (UTC)
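A rough sketch, in Python and purely for illustration, of what a deepcategory-style lookup such as deepcategory:"Donald Trump by year" does: walk down from a category through its subcategories, up to a depth limit, and collect the files found along the way. The `subcats` and `files` dictionaries are a hypothetical in-memory stand-in for the category graph, and the default depth of 5 is an arbitrary choice here; this is not how the search backend actually implements the keyword.

```python
from collections import deque


def deep_category_files(
    root: str,
    subcats: dict[str, list[str]],
    files: dict[str, list[str]],
    max_depth: int = 5,
) -> list[str]:
    """Collect files in `root` and its subcategories, at most `max_depth` levels down.

    Breadth-first, so files from shallower categories are listed first.
    Each category and file is visited once, which also protects against
    category cycles.
    """
    seen_cats = {root}
    seen_files: set[str] = set()
    result: list[str] = []
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        for name in files.get(cat, []):
            if name not in seen_files:
                seen_files.add(name)
                result.append(name)
        if depth < max_depth:
            for sub in subcats.get(cat, []):
                if sub not in seen_cats:
                    seen_cats.add(sub)
                    queue.append((sub, depth + 1))
    return result
```

The depth limit is what keeps such a search from wandering into loosely related branches of the tree, which is the trade-off discussed further down in this thread.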
- Structured Data is not an "ideology", it is just a good way to have files organized flexibly and without a rigid category structure. I argue that we need both categories and SD.
- "By year" categories are often but not always handled terribly. I have no problem with "Category:1919 in Paris (10 C, 57 F)": it is a big place, and that amply filled category allows you to browse. I also have no problem with "1919 elections in Brazil (1 F)" because it is part of a whole scheme about a very specific topic. I do have a problem with "Category:1919 in Farmton-upon-Runlet, Ruralshire (2 F)", especially if that is the only by-year category from there until "Category:2013 in Farmton-upon-Runlet, Ruralshire (6 F)".
- That last example from above shows the antiquated approach of using categories to build elaborate trees around singular files. "Category:<Church> in the 20th century" which has "Category:<Church> in the 1920s" which has "Category:<Church> in 1921". That last category could also be the single child category of "Category:<Town> in 1921" (for a town like this with a total of 3 photos in the whole 20th century). There is also "Churches in <country> photographed in 1921". Now that category should be handled by a structured data query, not rigidly coded into each file. The structured data would add a tag of "date=1921"; and if it later turns out that the 1921 photo was actually taken in 1905 already, only one little thing needs to get changed. --Enyavar (talk) 10:06, 24 February 2025 (UTC)
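To illustrate the difference Enyavar describes, here is a minimal Python sketch in which "churches in a country photographed in a given year" is a query over per-file fields rather than a category coded into each file. `FileRecord` and its fields are hypothetical stand-ins for structured data statements (depicts, location, date); correcting a date is a single field change, and every query picks it up.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical per-file metadata, standing in for structured data
    # statements such as "depicts", a location and a date.
    title: str
    depicts: str
    country: str
    year: int


def find_files(records: list[FileRecord], depicts: str, country: str,
               year_from: int, year_to: int) -> list[str]:
    """Titles matching a subject and a country within an inclusive year range."""
    return [
        r.title
        for r in records
        if r.depicts == depicts
        and r.country == country
        and year_from <= r.year <= year_to
    ]
```

If a "1921" photo turns out to date from 1905, only its `year` changes; there is no chain of year, decade and century categories to move it out of.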
- The primary problem with the hierarchical category system (alongside the language problem) is that it creates false statements. For example, we have a mountain in a nature reserve with a communication tower on the summit. The tower is not part of the nature reserve. The tower is categorized under the mountain and the mountain under the nature reserve. Then we get photos of the interior of the tower when we look for photos of the nature reserve. This could be solved by building parallel category structures and only linking them as "see also". But this is far too complicated for people who do not work with the category system in the particular topic every day. GPSLeo (talk) 10:45, 24 February 2025 (UTC)
- This thread is only about the "by year" category subtree, though? --Enyavar (talk) 12:05, 24 February 2025 (UTC)
- I think not really. But the same problem applies to by-year categories also. GPSLeo (talk) 12:29, 24 February 2025 (UTC)
- I agree that this is a problem but it's not a problem with the categories and isn't about statements (at least mostly; or entirely as of now). This is a problem if you use deepcat – e.g. using the Help:Gadget-DeepcatSearch – on the category about the nature reserves or even the grandparent cat about all nature reserves. However, I don't think your example is very good: in that case the tower including its interiors is actually part of the nature reserve and it would be fine to have these photos show up, they just shouldn't be all over the page and preferably more near the bottom of the results. It's still a problem that they show up in the cat "Nature" etc since "Nature reserves" are their subcat. In any case, here is how this can be addressed (and this problem warrants a separate thread):
- Tools for deepcat can be made so that one can easily adjust the depth and have them exclude special categories (they could exclude some by default and have premade filters one can readily switch on): I've described this at phab:T376440#10354943
- Some tool(s) are needed for contributors to more easily spot and fix miscategorizations. FastCCI is such a tool, but most of the time when I try to use it, it fails to load, and more importantly, that is not its primary or dedicated purpose but rather an ancillary feature. The tool would function like so: first the user spots some image they think doesn't belong in the deepcategory results – for example, I used this on a photo showing some road sign or something somewhere underneath Category:Microscopic images relating to biology. Then they use that tool to quickly see how that image relates to the given category (the one I just linked), which would show the category path from the image to the given category (the shortest one, or all of them if there are multiple paths). Then the user identifies the miscategorization and fixes it, which can involve adding a category see-also, creating a new category, or simply removing a categorization. One can't expect categorization to be good if people don't have such tools. English Wikipedia actually has the same problem, though I guess to a lesser extent, where e.g. articles about people, films and events were in the category tree about novels iirc. Please see the FastCCI talk page section "Not loading, showing off-topic images, and proposal for a forked gadget that shows the cat-path why a file is in cat". This is important, and the earlier it gets done the better: the more accurate the categorization tree and the more useful the deepcat results will be.
- Prototyperspective (talk) 12:33, 24 February 2025 (UTC)
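A minimal sketch of the "why is this file in this category?" lookup described above: a breadth-first search upward from the file's own categories through their parent categories, returning the shortest chain that reaches the target. The `parents` mapping is a hypothetical in-memory stand-in; a real gadget would have to fetch parent categories from the wiki, which is not shown. The category names in the test are made up, loosely following GPSLeo's tower example.

```python
from collections import deque
from typing import Optional


def category_path(file_cats: list[str], parents: dict[str, list[str]],
                  target: str) -> Optional[list[str]]:
    """Shortest chain from one of the file's categories up to `target`.

    Breadth-first search over parent-category links. Returns
    [direct category, ..., target], or None if `target` is not an
    ancestor of any of the file's categories.
    """
    queue = deque([[cat] for cat in file_cats])
    seen = set(file_cats)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for parent in parents.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None
```

Because the search is breadth-first, the first chain found is a shortest one; listing all chains would need a different traversal that does not stop at the first hit.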
- "The category system is in no way antiquated. Structured depicts data is currently a barely populated totally-unused time sink where all of it can be captured by the categories" It's web 1.0 technology that's been replaced everywhere else on the internet. Manually creating semantic categories with no logic other than a basic hierarchy and piling them together. Never mind the people by year, how about A of B in C in D type categories, where the [English-only] prepositions might be "in" or might be "of", and A, B, C, and D might be repositioned. Structured data makes all of that a non-issue. You're talking about depicts, but depicts was just supposed to be the first of many data points. Structured data being "barely populated" doesn't mean it's a worse system; it means it's not fully implemented or adopted. The only good argument I've ever seen for categories over structured data is "we already have it, and volunteers have put a ton of time into creating it, and they'll be unhappy/demotivated if we lose any of it". It's a totally valid point, but has nothing to do with it being a superior system... the strength of the argument comes from the value of volunteers. Similar argument, I guess, for consideration of economic devastation in coal-mining communities when there's a concerted push towards renewable energy. IMO the priority should be lobbying to finish implementing SDC, and then focus on bots that can translate category data into structured data so as not to lose that volunteer time. — Rhododendrites talk | 21:24, 24 February 2025 (UTC)
- The last time I checked, the semantic web was dead on arrival. At the end of the day most people don't know or care about structured data. There should really be a more modern, widely used (i.e. usable by most people) way to organize images. Be that tags or some other system, but it seems like a pointless time sink to fully implement SDC when most people can't or don't want to use it anyway, purely because categories suck. I don't think fixing the issues with SDC on here is going to magically create a market for it over categories either. It's just a convoluted, half-baked system at the core that there's no actual enthusiasm for. A solution looking for a problem, if you will. --Adamant1 (talk) 22:29, 24 February 2025 (UTC)
- It's semantic web technology/methodology, and even if it weren't, that something isn't the shiniest new concept doesn't mean it's antiquated. You can define how these categories relate a) via how the categories are titled and b) via the Wikidata items of the categories, where their relation can also be defined. Similar methods and systems have not been replaced on other websites; that claim is false. They generally have highly visible (not at the bottom or even hidden) tags right beneath or above the image.
- Just on top of this refutation of your points, "Structured data makes all of that a non-issue" is plain false. People don't take a minute to tag a picture in some sophisticated SD way; they only add plain depicts and either mark it as prominent or not, and even that isn't done for >95% of files and isn't done sufficiently for the remaining ~5% (probably much less). Again, SD for metadata that is mostly in the EXIF data wouldn't be an issue. Reread what I wrote earlier about SD: it's not a better system, which is also why it's barely used (read and written to) and why, when it is written, it's just copied from the categories. The categories are currently better, for example because they have category pages and a category tree relation, which is absent here, and when you click an SD tag you just land on some Wikidata item. Maybe something that syncs categories to the SD wherever matching WD items exist would be useful, but it shouldn't replace that which is currently useful, and I don't see why many resources should be spent on something that is largely redundant and, importantly, not used at all except for some niche experimental tool used by virtually nobody that shows images based on SD depicts, which could also be done (with much better results) by querying the Commons categories or using the Commons search (which needs improvements). There is no benefit, no need (I agree with "A solution looking for a problem if you will" when it comes to depicts SD), and lots of drawbacks. Prototyperspective (talk) 23:31, 24 February 2025 (UTC)
- Prototype, Adamant describes a notorious problem with SD: there are no standards for which attributes should be applied to a file, and there is no easy way to apply them; as a result SD is used in the most haphazardous (I love that word, it comes up so rarely. =) way possible, and also basically not used in total. The SD editor we implemented here on Commons is truly a disaster, and this proposal was my only point on the wishlist for the Community Call this year. I have no idea what became of the suggestion, besides the promise it would be brought up; but there is no protocol or transcript. So who knows. Maybe @Sannita (WMF): can provide/link some insight into what has happened since? I'm highly curious.
- Anyway, so yes, SD implementation is bad currently; but our category tree is mostly maintained manually, usually by combining several "SD terms" in one category name, and then searching for random files that fit the description, and place them in the category.
- If I may compare Commons to a library: we don't use those pesky new-fangled book catalogs! They would allow indexing and consistent searching, but it's soooooo much work to compile them! Drag each book to the desk, input lots of info into a computer that the book already has printed in it anyway, and then bring it back where it came from. No, we just place each new book on the shelves which are all thematically organized and subdivided, and then we trust all readers to navigate the shelves. / Now, with the catalog, you could just assign barcodes to all the books, then search for them in the catalog and retrieve them easily, even with fully unordered shelves. / The optimal approach is however to catalog the books and also use the thematic shelves. And most public libraries do that, despite the double work. Shelves (i.e. categories) are great in general, except the overly specific - like "Tigers in European zoos in 2023" or "1944 diaries of women in World War II". Oh yes, there are the files that hit these two spots, but it's not a superior way to organize the library. --Enyavar (talk) 16:32, 25 February 2025 (UTC)
- No, that is false. 1. He did not describe that as the problem in that comment at least. 2. That is not the problem at all.
- Also, it's good that SD is not used at all, since volunteer time is valuable and adding redundant SD is a large waste of time; if anything it should be disincentivized, since we have many better things for volunteers to do than writing metadata that is of no use to anyone, for 0.1% of files, when it is already present on the file page. For example: working on readily implementable things that could feasibly double or quadruple Wikipedia readership, improving MediaSearch, or adding content to Wikipedia. Those roundabout explanations for why SD would be useful or even needed are not convincing and are based on pipe dreams – SD won't magically allow the items to be tagged. If you want that to be done, it needs work on some bots or simply what I have proposed here. I don't see a reply from you there. SD will take "soooooo" much work, while the easy-to-use, familiar category system makes things quick and easy with tools like Cat-a-lot, and there are several further ways to have things categorized more reliably, one of which is reading the fields of the Summary template, which can be used to set categories. If structured data is used for subjects, then make it read-only for users and automatically infer the tags from the categories. "Tigers in European zoos in 2023" would automatically get the SD "tiger" and year "2023" instead of some redundant double work. In addition, people could use a deepcat view mode in e.g. the Tigers category where the most useful images are sorted near the top (once again MediaSearch improvements) or use the date-range filter based on the date in the File information template (again see the phab issue).
I think for tigers most people would not use such an overspecific cat, and people should not move the file only into a year subcat but also into a subject-specific one; wherever the scope of the cat is what one is interested in, one could then use something like a successor of the deepcat gadget or FastCCI to have a modern, non-overspecific view to scroll through. By-year categories are just a minor aspect of Commons; the solution would be to have year categories set (semi-)automatically based on the date in the file info Summary template. Prototyperspective (talk) 00:28, 26 February 2025 (UTC)
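A rough sketch of setting a year category (semi-)automatically from the file description: pull the `date` parameter out of the {{Information}} template in the Summary section and take the first four-digit year (1000 to 2099) in it. Assumptions: the date is written as plain text (e.g. `2023-05-14` or `14 May 1921`); a value wrapped in another template, such as a circa template, is deliberately left for a human, so the function returns None. The "<subject> in <year>" naming pattern and both function names are hypothetical.

```python
import re
from typing import Optional

# The value of a "|date=" parameter, up to the next "|", "}" or newline.
DATE_FIELD = re.compile(r"\|\s*date\s*=\s*([^|}\n]*)", re.IGNORECASE)
# A plausible four-digit year (1000-2099).
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def year_from_wikitext(wikitext: str) -> Optional[int]:
    """First four-digit year in the date field, or None if there is none."""
    field = DATE_FIELD.search(wikitext)
    if field is None:
        return None
    year = YEAR.search(field.group(1))
    return int(year.group(1)) if year else None


def year_category(subject: str, wikitext: str) -> Optional[str]:
    """Suggest "Category:<subject> in <year>", or None if no year was found."""
    year = year_from_wikitext(wikitext)
    return None if year is None else f"Category:{subject} in {year}"
```

Returning a suggestion rather than editing the page keeps a person in the loop, which matches the "(semi-)automatically" wording above.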
- People tend to act like the WMF and/or developers on here are just incompetent slackers, but I imagine that if SDC was really the be-all and end-all people treat it as, they would be putting the time and resources into making it viable. This isn't rocket science. They must know it's a hot mess that no one wants. Tangentially, something that I think would solve some of this is better and/or more description templates for specific types of media. As it currently is, for something like a postcard we are forced to use the general description template or the one for photographs. Both leave out fields that are important to postcards. It would make searching for them a lot easier if the publisher, photographer, location, and subject were included in the file description, in specific fields for them. Then the categories for organizing postcards based on those criteria could be axed. There are a couple of thousand categories for postcards by publisher right now that are probably pointless, but the publisher isn't included in descriptions in a lot of instances, so categorizing postcards by specific publishers is really the only option there is, barring people writing better descriptions or using creator templates. Creator templates are another thing, like see-alsos, that are way underused on here BTW, but there are already solutions to the problem SDC is meant to solve. People just need to use them. SDC is redundant at best, if not totally worthless at worst. --Adamant1 (talk) 10:56, 27 February 2025 (UTC)
- "Then the categories for organizing postcards based on those criteria could be axed": I don't see how that follows; instead one could use the functionality mentioned earlier to read the fields of the file Information template and set the cat (semi-)automatically. Prototyperspective (talk) 13:49, 27 February 2025 (UTC)
- @Prototyperspective: There are postcard publishers that we're never going to have more than a few images of postcards for, and/or where the publisher's information just isn't available for whatever reason. Plus a ton of them are just two-letter initials, if even that. Not to mention a lot of publishers have multiple ways their name is printed on the postcards. Like, there are at least a couple of publishers where I knew they published the same postcards because of how the stamp box is designed, but a category like Category:Brown square stamp box (postcard publisher) isn't practical. It's not practical to create multiple "by publisher" categories for the same publisher based on whether they used their initials, full name, etc. on particular postcards. But something like a note about it in the description would be fine. There's also a huge problem with people categorizing images of postcards by publisher but not by subject or location. The thing is, an image of a building should be categorized in a category specifically for that building. It just doesn't happen though. Hundreds of images get dumped in a category like Category:Postcards published by Detroit Publishing Co. and still aren't put in subject-based categories years later. So "by publisher" isn't a practical, scalable way to categorize postcards IMO. Categories are really only good for one or two criteria, and in instances where those criteria are totally unambiguous. --Adamant1 (talk) 17:45, 27 February 2025 (UTC)
Comment I have been frustrated about categories for years. It is a pain to have to look through 100 categories with 1-5 photos in each category to find a good photo. Personally I think 100-200 photos in a category are more suitable because it is very fast to find a photo in a category. If there are 1,000 files in a category sorting in subcategories is a good idea. At least I know now that deepcategory can help a bit. --MGA73 (talk) 19:24, 24 February 2025 (UTC)
- @GPSLeo: What you refer to as "The primary problem" is only a problem if you assume that the subcategory relation is always an is-a relation. And, yes (pinging @Enyavar), there is a bit of "ideology" involved: not in the idea of having SD so much as in the idea that SD is inherently superior to categories. This is almost always based on an argument comparing, on the one hand, what would happen if SD were used in something close to an ideal manner to, on the other, the actually existing state of categories.
- And these two issues (failure to tag the nature of a particular category inheritance, ideological preference interfering with synthesizing the two systems) come together. Since the advent of SD, WMF has (as far as I can tell) devoted exactly no resources to improving categories. I don't think the neglect has been consciously driven by ideology, but ideology rarely functions on a conscious basis.
- Leaving maintenance categories aside, and sticking to topical categories, the two biggest deficiencies in categories are, as noted, (1) lack of support for multiple languages and (2) lack of ability to express the relationship between a category and its parent categories. (Conversely, the largest disadvantage of structured data is that (3) users access it through an entirely different mechanism than the wikitext with which virtually all experienced Wikimedians are familiar.)
- I believe all three of these problems are completely addressable. I won't go into details here, and I'm about to be traveling the next 5 weeks so this isn't a time for me to flesh this out, but if I were a developer trying to address these, here's the general directions I'd be looking at. If someone wants me to flesh this out, hit me up in early April when I should be more available again. Some of these involve integrating wikitext and SDC, some do not.
- 1) lack of support for multiple languages in categories. If a category has an associated Wikidata item (associated via Commons category (P373) in Wikidata, or possibly just with the interwiki link for the item; I'm going to skip that latter alternative throughout the rest of this and stick to P373), then a bot should be perfectly capable of copying all of the item names, in whatever language, including aliases, into a templated structure on the Commons category page. This would only need to be updated when the Wikidata item changes or the P373 value changes.
- 2) lack of ability to express the relationship between a category and its parent categories. I see several ways to address this; from an information-theoretic point they are very similar, and a mix-and-match via a bot is possible.
- 2a) A template-driven structure in the category to express its relation to each parent category. Probably optional, with the default to presume an is-a (instance of (P31)) relationship. It might be simplest to name these precisely from the P-codes of the relevant Wikidata items, ideally with a tool to make that more human-readable (see (3) below). Again, wherever Wikidata is aware of the category via P373, a bot could do an enormous amount of this work.
- 2b) An extension to "Category" in MediaWiki that would allow an equivalent of 2a to go directly into the [[Category]] element (e.g. a markup like [[Category:Seattle|P=276]] in Category:Buildings in Seattle to say it is located in Seattle). However, I think this is less good than 2a because (1) it would require a change to MediaWiki rather than just to tools and (2) it is not good at expressing things like "this category is an intersection of year YYYY and place PLACE".
- 2c) I could imagine something more dynamic that does all of the calculation of Wikidata relationships among categories on the fly when needed. However, I would have to guess that is more computationally intensive, so a less good idea.
- 3) users access SDC through an entirely different mechanism than the more familiar wikitext. I believe this could be solved by a serialization/deserialization approach (serialization: SDC => wikitext; deserialization: wikitext => SDC). Presumably the relevant wikitext would be in the form of templates. Admittedly, the serialization side of this is much easier, so it might make sense to think of it first as a read-only mechanism. However, I could imagine a slow, steady implementation of the deserialization side to let more and more SDC content be edited through the wikitext editor. And, of course, something parallel could be done for less technically-inclined users, using the same approach the WYSIWYG editor on Wikipedia has taken to turn several templates into forms. - Jmabel ! talk 20:10, 24 February 2025 (UTC)
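A small Python sketch of how idea 2a could be used once relations are recorded: each parent link carries a Wikidata-style property (a missing one defaults to P31, as suggested in 2a), and a deepcat-style walk only follows the relations the searcher chooses. With GPSLeo's example, following only P31 (instance of) and P361 (part of) reaches the mountain but not the tower that is merely located (P276) on it. The edge-list format, the category names and the choice of which properties to follow are assumptions for illustration; this is not an existing tool.

```python
from collections import deque
from typing import Optional

# Each edge: (child category, parent category, relation). None means the
# relation was not stated, and the default is-a relation is presumed.
Edge = tuple[str, str, Optional[str]]
DEFAULT_RELATION = "P31"


def deep_subcategories(root: str, edges: list[Edge], follow: set[str]) -> list[str]:
    """Subcategories of `root` reachable only through relations in `follow`."""
    children: dict[str, list[tuple[str, str]]] = {}
    for child, parent, rel in edges:
        children.setdefault(parent, []).append((child, rel or DEFAULT_RELATION))
    seen = {root}
    order: list[str] = []
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child, rel in children.get(cat, []):
            if rel in follow and child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

The same category graph serves both readings: someone who does want everything physically on the mountain can add P276 to `follow`, so no parallel "see also" structure is needed.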
- To a certain extent, what you are saying sounds like reinventing Semantic MediaWiki. But hey, maybe that is a better model for Commons. SDC kind of just grafted Wikidata onto Commons, but Commons does not have the same requirements as Wikidata. I've only recently tried playing with SDC. So far I think it's really cool, but as a newbie to SDC there are a lot of ambiguities as to how the data is supposed to be (is depicts anything in the picture or just the main things?). From what I've seen so far, though, I think the big problem with SDC is UI. It's hidden away in a separate tab. Interesting and uninteresting data are mixed together (why oh why do we have a bot manually adding properties for the SHA1 of files? That makes no sense. Why is something like a checksum displayed with the same prominence as the creator?). More to the point, though, there is no way to browse the results (except externally). How can we expect users to care about entering metadata if they can never see the results? With a category, you click on it and get to see other things in that category. Not so with structured data. SDC is never going to succeed as a way to catalog Commons if we don't implement the browse-through-the-catalog part. Bawolff (talk) 05:07, 1 March 2025 (UTC)
Should a "bot cleanup kit" exist?
In the last 6 months, Commons has had 2 bots with extended issues in which they created tens of thousands of invalid edits.
Both times, I cleaned up the mess with massrollback and either account creator or a bot account. But it's a less-than-ideal solution, as I was hitting 2,000 edits per minute (EPM) while performing rollbacks, and these were not marked as bot edits. So my question is:
Should we create a tool/script/playbook for doing bot cleanups? I understand bot owners are responsible for the edits made by their bots, but having dedicated tools to rapidly handle 75 thousand rollbacks without causing 5 minutes of database lag would be nice. I have frequently been asked why this can't be done slowly. The problem is that if, for any reason, an affected page is edited, any error introduced by the bot can't be fixed easily, often requiring manual correction. All the Best -- Chuck Talk 18:30, 24 January 2025 (UTC)
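The core of the problem described above can be sketched in a few lines of Python: MediaWiki rollback only reverts the consecutive latest edits of a page's last editor, so pages where the bot is still the last editor can be rolled back mechanically, while pages edited since then need manual review. `PageState` and `partition_for_cleanup` are hypothetical names; fetching each page's latest editor (e.g. via the API) is assumed and not shown.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    # Hypothetical summary of one affected page; how it is fetched
    # (e.g. from the MediaWiki API) is not shown.
    title: str
    last_editor: str


def partition_for_cleanup(pages: list[PageState],
                          bot_user: str) -> tuple[list[str], list[str]]:
    """Split affected pages into (can be rolled back, needs manual review).

    Rollback only undoes the consecutive latest edits of the page's last
    editor, so it applies cleanly only where the bot is still the last editor.
    """
    rollback: list[str] = []
    manual: list[str] = []
    for page in pages:
        if page.last_editor == bot_user:
            rollback.append(page.title)
        else:
            manual.append(page.title)
    return rollback, manual
```

The longer a cleanup waits, the more pages drift from the first list into the second, which is why doing it slowly is costly.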
- We should require that bot operators be able to clean up mistakes they made with their bot. In the bot request they would have to confirm that they can also revert the edits they made with the bot. If they cannot guarantee this, the bot cannot be approved. GPSLeo (talk) 19:00, 24 January 2025 (UTC)
- What GPSLeo said. Krd 04:23, 25 January 2025 (UTC)
- @Krd, Seeing as you have handled most of the bot requests from the last 5 years, when does this check happen? And if a bot does mess up and make a bunch of junk edits, should we really be using the same bot to fix it? (unless we have a standardised script to do it, I don't think that's an option) All the Best -- Chuck Talk 04:50, 25 January 2025 (UTC)
- There is no check, but it speaks for itself that if a bot operator messes up on a large scale, they are responsible to at least help to clean it up, whatever it takes. Everybody who is running a real bot is able to do that. Perhaps we should be more hesitant about AWB or JS gadget "bots" in the future. In order to understand the actual size of the problem, perhaps the 2 mentioned cases should be analyzed regarding what exactly went wrong. Krd 05:24, 25 January 2025 (UTC)
- @AntiCompositeNumber was looking into the Flickr goof last I heard. As to WLKbot, nothing went wrong per se, but rather the operator @Kim Bach jumped the gun on implementing something, creating 3k categories that still need cleanup. Also, these were both full bots, not script/AWB bots. Their edits were fixed by a script. Also, @MolecularPilot updated the script, allowing for rate limiting and marking bot edits (haven't tested the second part yet): User:MolecularPilot/massrollback.js. Even if we on Commons have our ducks in a row, once the tool exists and is documented, it can be used movement-wide, having a much larger impact. All the Best -- Chuck Talk 21:42, 25 January 2025 (UTC)
- Hi! The new version of massrollback supports ratelimiting (you tell it the max number of rollbacks to make in a minute) but it doesn't support marking them as bot edits if you're flagged. I'm working on this part now! :) MolecularPilot (talk) 23:03, 25 January 2025 (UTC)
- Actually, it already did mark bot edits if you're flagged; I forgot that I coded that part. So, yeah! :) MolecularPilot (talk) 08:42, 26 January 2025 (UTC)
- Is it sufficient to use the existing mw:Manual:Pywikibot/revertbot.py? If not, what is missing? whym (talk) 12:03, 1 February 2025 (UTC)
- Not all bots use pywikibot. And also, I'm not so sure a bot that just screwed up should be doing the cleanup. All the Best -- Chuck Talk 20:32, 1 February 2025 (UTC)
- With the script linked above, you can specify the target user account whose recent edits are to be reverted. The target account doesn't need to be a Pywikibot bot, nor even a bot. whym (talk) 01:53, 2 February 2025 (UTC)
- I didn't know that existed. I have user:chuckbot kicking around with pywikibot and a working backend, so I might do a bot request for that. All the Best -- Chuck Talk 01:57, 2 February 2025 (UTC)
Expanding an explanation on the De-adminship policy
Make Commons:Civility, Commons:Harassment and Commons:No personal attacks a policy
- @The Squirrel Conspiracy Do you think it is good to close this so fast? There are many comments mentioning that the pages need some adaptation for Commons. If they are now policy, every change that is more than very minor would require separate community confirmation. GPSLeo (talk) 08:45, 1 February 2025 (UTC)
- That's fair, but if the proposed changes are uncontroversial, it should be easy to get a consensus for them, and if they are controversial, they shouldn't be in the policies in the first place.
- Candidly, I'd like to think that after 15 years, I have a good sense for what does and doesn't get done on this project, and I suspect that no one is going to step up and rewrite the policies regardless of whether the discussion stays open for two weeks or not. Happy to be proven wrong, but there are lots of gaps in Commons' bureaucracy and infrastructure that have never been fixed.
- If you want to revert the close, go ahead though. The Squirrel Conspiracy (talk) 10:34, 1 February 2025 (UTC)
- As for the procedure, I would at least wait until one weekend is over, to include people who only find volunteer time on weekends. While I agree with the observation that there have been policy gaps for a long time, I think the long time span also means that it wouldn't hurt to spend a few more weeks, or at the very least, the proposed 2 weeks. Using Template:Centralized_discussion or even MediaWiki:Sitenotice wouldn't be unreasonable for this, considering most users don't frequent COM:VPP but are affected. Sitenotice might be more suited for the final decision, though. whym (talk) 02:03, 2 February 2025 (UTC)
- @GPSLeo: Do you think it was premature to promote those pages? I do. Whether we should un-promote them might be a different matter (unless The Squirrel Conspiracy voluntarily undoes the changes), though. One remedy might be to recognize that they have community consensus at a general level, and that they are adapted policies that might still have rough edges in specifics. --whym (talk) 10:32, 7 February 2025 (UTC)
- I think it was too early, but I would just keep the discussions going in the current way and in a few weeks hold a final vote to get clear consensus on the final versions. GPSLeo (talk) 18:17, 14 February 2025 (UTC)
- I think Commons:No personal attacks talks too much about articles and article talk pages, which we don't have in general. And I don't know what would be the Commons counterpart to regular disagreements on Wikipedia talk pages. whym (talk) 11:58, 1 February 2025 (UTC)
- Good point! I'd love to see more thought on this. Jerimee (talk) 05:12, 1 March 2025 (UTC)
- See also an ongoing discussion in Commons_talk:Civility#Pre-policy_debates (started a few days ago). whym (talk) 02:18, 2 February 2025 (UTC)
- Similarly, I've started a discussion about making our new outing rules more Commons-specific over on Commons talk:Harassment. --bjh21 (talk) 09:40, 5 February 2025 (UTC)
- In the past, before these stringent rules even became "policy", I already saw users clashing at least weekly because of linguistic and cultural misunderstandings.
- How many sysops remember to give the benefit of the doubt before wielding this weapon against minority users?
- LOL.--RoyZuo (talk) 14:02, 5 February 2025 (UTC)
@The Squirrel Conspiracy and Matrix: FYI, the promotion of Commons:Harassment to policy has been undone. Nosferattus (talk) 14:22, 6 February 2025 (UTC)
- To be clear, it was undone by Matrix. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 14:37, 6 February 2025 (UTC)
- Sorry, I didn't see this discussion; I have reverted my reversion. —Matrix(!) {user - talk? - useless contributions} 18:46, 6 February 2025 (UTC)
- I think this indicates that we could have advertised more before voting, for example by putting a notice on the proposed policy page and its talk page. (Not the fault of the proposer - they specifically said voting was to be done later, not immediately.) whym (talk) 11:39, 7 February 2025 (UTC)
- Is there some way we could specifically mark these as draft policy? - Jmabel ! talk 21:15, 7 February 2025 (UTC)
- There's {{Proposed}} which is close and I think includes what you want. —Justin (koavf)❤T☮C☺M☯ 01:11, 8 February 2025 (UTC)
- But does nothing to suggest that it is a largely agreed-upon draft, and we are just hammering out details. - Jmabel ! talk 04:21, 8 February 2025 (UTC)
- Why not use something generic like {{Notice}} to insert a short text describing that? I realize it's repetitive, but on the other hand, it's just 3 pages. whym (talk) 06:12, 23 February 2025 (UTC)
Expanding Template:Official Doctor Who YouTube channel and Template:Official Star Wars Flickr stream
I would like {{Official Doctor Who YouTube channel}} and {{Official Star Wars Flickr stream}} to be expanded to allow the tagging of content from other official Flickr accounts and YouTube channels. Examples of official YouTube channels which could be added include Bravo, Cartoon Network India, FOX Sports, Harry Potter, HBO, MTV Shores, MTV UK, MSNBC, NBC News, NickRewind, Nicktoons, Prime Video AU & NZ, Rooster Teeth Animation, Warner Bros. Games and Warner Music New Zealand. Aside from the aforementioned Star Wars account, I am unable to think of any official Flickr accounts that even seldom publish their uploads under a free license (please let me know of any that do). Thoughts? JohnCWiesenthal (talk) 21:56, 10 February 2025 (UTC)
Possibly easier way to contact local oversighters via Wikimail
Hello,
I recently needed to contact our Commons oversighters. I knew from my home wiki, DE-WP, that it offers an option to contact the German oversight team through Wikimail, using de:User:Oversight-Email. I was a bit disappointed that Commons is a notable exception among the projects that implement such an easy-access method (according to the description on de:Benutzer:TenWhile6/OSRequester, it may be the largest project without one). Could a Wikimail route for contacting oversight be set up, in parallel with the existing mailing list? Regards, Grand-Duc (talk) 12:08, 14 February 2025 (UTC)
Support Major issue with commons oversight. That and adding a T&S role account for wikimail. I think they attached the emergency account, but I'm unsure. All the Best -- Chuck Talk 17:55, 14 February 2025 (UTC)
- I do not think that it is needed to implement this now as we will soon (I think before the end of 2025) have the new meta:Incident Reporting System made exactly to solve this problem. GPSLeo (talk) 18:15, 14 February 2025 (UTC)
- "Soon" and "before the end of 2025" are somewhat contradictory! Worst case, that's more than 9 months in the future. Can somebody knowledgeable please tell me, and whoever else is interested, how much work is needed to set up such a "contact user account"? The less effort it takes, the sooner it should come, even if only for a limited time. It's the same reasoning as a military refurbishing a warship only to put it into reserve status a few months later (as was done with the carrier USS Franklin (CV-13) and several other US Navy ships after V-J Day). The use case is clearly shown and the potential replacement functionality has no clear ETA yet, so waiting for it is not sensible. Regards, Grand-Duc (talk) 19:01, 14 February 2025 (UTC)
- The problem with an account for this is the question of who has access to the account. It would need to be all oversighters, so that they are able to check each other, but everyone who is able to access the account would also be able to change the password and exclude everyone else. Additionally, the account should definitely use 2FA, which makes it hard to keep accessible for all oversighters. The worst scenario would be that someone with access to the account changes the email address unnoticed to intercept reports. We could decide on one oversighter who owns this account, but in the case of problems (loss of rights and not handing over the account, or lost contact/death) we would need to figure out a solution to regain access through the WMF MediaWiki operations team. Because of the potential for serious trouble, I think we can keep it as we have for more than a decade, or for one more year until we have a much better solution. GPSLeo (talk) 19:45, 14 February 2025 (UTC)
- "Potential serious trouble"? Are you hinting that people who sign a confidentiality agreement and identify themselves to the site operator would regularly go postal and make nasty trouble, breaching privacy for whatever reason? What's the basis for the whole adminship, checkusership and trust in licensing, then? Pinging user:Ra'ike and user:Raymond, the first an OS on DE-WP and the second a local OS: do you have any insight into how the German contact user is set up and whether a similar thing is also suitable here and now? Regards, Grand-Duc (talk) 21:14, 14 February 2025 (UTC)
- They don't need account access, they just need email access; someone from the WMF can have the password. This account only needs to forward all emails sent to it from Commons to the oversighters' mailing list. En-wiki can do it for 4 different role accounts; we can do 1. All the Best -- Chuck Talk 22:44, 14 February 2025 (UTC)
- It is not possible to have different mail addresses for wikimail and password reset. Therefore the only thing that could be handled by the WMF could be the second factor. But then setting up the account and logging in would be very complicated. GPSLeo (talk) 05:39, 15 February 2025 (UTC)
- @Grand-Duc Some thoughts from me as a Commons OS, not discussed with my OS colleagues. First, I do not see a big advantage in such a Wikimail: one click to open the Wikimail form versus one click on the current e-mail address to open the mail program. Anyway, technically it is easy: create the OS user account on Commons with the current mailing list e-mail address in its preferences. All mail will be forwarded to our mailing list (not moderated!) and can be handled as usual. One caveat: we do not have a safe place to store the password. Any of us can have it, of course, but when oversighters change, there is no guarantee that the password will be passed on.
- I will ask my OS colleagues today. Raymond (talk) 08:49, 15 February 2025 (UTC)
- Couldn't the foundation have the password? All the Best -- Chuck Talk 23:46, 15 February 2025 (UTC)
- The password doesn't matter much because anyone with access to the mailing list can reset it. AntiCompositeNumber (talk) 03:23, 28 February 2025 (UTC)
- The IRS does not currently support oversight requests, if you select the option for "doxxing" it tells you to report it on a public page like it does everything other than threats of harm. The future plans for the tool are unclear, and I have no further information at this point. AntiCompositeNumber (talk) 03:25, 28 February 2025 (UTC)
Support unless this is somehow tremendously more difficult than I can imagine it to be. - Jmabel ! talk 19:44, 14 February 2025 (UTC)
Support --Adamant1 (talk) 03:46, 15 February 2025 (UTC)
Hello, currently both Template:From YouTube and Template:YouTube CC-BY categorize files into Category:Media from YouTube, which contains over 200,000 files, and user @Trade asked to diffuse the category. By using the #switch parser function, we can change the templates to categorize by file type into Category:Videos from YouTube or Category:Screenshots of YouTube videos where possible.
Here are the modified templates: User:999real/From YouTube and User:999real/YouTube CC, with an example usage at File:Lil Zane 2024.jpg. Template:YouTube CC-BY is currently protected so that only template editors and admins can edit it, so I can't change it myself. REAL 💬 ⬆ 15:42, 23 February 2025 (UTC)
Support --Trade (talk) 21:39, 23 February 2025 (UTC)
Add an outcome of LicenseReview
Add an outcome "indeterminable / review impossible" for Template:LicenseReview.
Reason:
For example, File:Jordan protest in front of police2.PNG claims to be made by VOA, but because the YouTube video is gone, it's impossible to verify. Nonetheless, the claim appears trustworthy, so it doesn't seem appropriate to either pass or fail it. A sensible thing to do would be to simply remove Template:LicenseReview.
But simply removing it will not prevent certain users from slapping the template on it again.
As such, add an outcome, termed "indeterminable" or "review impossible", to signify that users have tried to review the file but could not succeed because source is gone, but there is no significant doubt about the authenticity of the claim, so the file is tolerated. RoyZuo (talk) 10:40, 25 February 2025 (UTC)
Support - Jmabel ! talk 21:14, 25 February 2025 (UTC)
Support per Commons:Village_pump#Category:License_review_needed and the links mentioned there. --MGA73 (talk) 07:49, 28 February 2025 (UTC)
Support Christian Ferrer (talk) 08:44, 28 February 2025 (UTC)
Support --Yann (talk) 09:24, 28 February 2025 (UTC)
Support --Grand-Duc (talk) 13:25, 28 February 2025 (UTC)
Comment Your example of File:Jordan protest in front of police2.PNG is actually one on which I, as a reviewer myself, found a roundabout way to pass the review. Fortunately, VOA maintains a directory of contributors, including the editor named as author in the example file. Using Google, I found the listing of Elizabeth Arrott's contributions on the VOA site, which includes this: https://www.voanews.com/a/jordanian-protests-call-for-revolution-toppling-of-king/1547601.html . The date and general appearance of the images there match our upload, hence I passed the review. Regards, Grand-Duc (talk) 13:25, 28 February 2025 (UTC)
Upgrade Commons:Overwriting existing files to be a policy
Consent query
In theory, photographs licensed on Commons are expected to have the informed consent of the people depicted in the photo: Resolution:Images_of_identifiable_people (2011)
In practice, this is rarely even asserted, let alone documented or enforced. There are fewer than 20,000 files with consent assertions: Consent tracking
I propose that this might be improved by adding {{consent|query}} to photos that meet some set of criteria. Perhaps those where:
- An individual is the prominent focus of the photo
- An individual is not posing or making eye contact
- An individual is in a state of undress ("Nudes, underwear or swimsuit shots")
If nothing else, adding the consent query tag will increase awareness that informed consent is an expectation. I assume that this is one of the reasons the consent query template was established.
What is a responsible and productive way to go about this? Jerimee (talk) 22:41, 25 February 2025 (UTC)
- Are you saying that any one of these criteria triggers a need for consent, or only the combination of all three?
- Even if the latter, I can immediately think of photos I've taken that meet all three of the criteria you have mentioned and which certainly do not require more explicit consent (examples are necessarily NSFW because of criterion 3):
- File:2017 Fremont Solstice Parade - cyclists prepare 020.jpg, File:Fremont Solstice Parade 2010 - 92.jpg, File:2017 Fremont Solstice Parade - cyclists prepare 057.jpg. The people who are the prominent focus of these are all naked or nearly so, are not making eye contact, and there is no doubt in the U.S. that if they are doing this in the completely public situation of being in a parade, no further overt consent is needed. Indeed, it would be seen as very odd to ask for more formal consent, to the point where they would probably be very suspicious of the motivation of anyone who asked for that consent (especially because, presuming the photographer was clothed, it would be very awkward for that photographer to expect the people in the parade to pay attention to them). - Jmabel ! talk 03:12, 26 February 2025 (UTC)
- I'm not sure exactly; I appreciate input. I understand why photos of that activity may not require (additional) consent. I appreciate that example.
- I expect the criteria will need to be refined based on responses and what is learned by doing more of this. For my part, I know I will often get it wrong and apply the tag where it isn't much needed, especially at first. I will need to learn and improve.
- One thing that I find confusing/tricky is "What constitutes public space?" I feel like this is highly circumstantial and varies considerably from culture to culture etc.
- I feel strongly that people who take photos in the context of cultures they do not belong to... I feel like that is more problematic than people taking photos of cultures they belong to or are culturally fluent in. (I've phrased this poorly!) Jerimee (talk) 18:45, 26 February 2025 (UTC)
- As reference: Commons:Deletion requests/File:Kochendes Paar in einer Küche 2017-01-15.jpg and Commons:Deletion requests/File:Little-girl-570864 1280.jpg, which both touch on your very subject. I support your notion of processing and documenting consent, though. Regards, Grand-Duc (talk) 19:57, 26 February 2025 (UTC)
- I'm no expert, but those look to me like photos where the subjects are clearly posing/posed, especially the first one. This seems visibly apparent to me; I don't exactly know why... They seem professional, deliberate, staged/not candid, purposefully arranged, and studio lit. So excellent examples of what we are not trying to flag. Jerimee (talk) 06:02, 27 February 2025 (UTC)
- I don’t like the overt focus on perceived nudity. Nudists or Himba women probably do not care that they are being photographed in what western society considers a “state of undress”, and even in fairly conservative western nations most people would not consider photographing someone in a swimsuit a violation of privacy on a public beach. Dronebogus (talk) 15:38, 28 February 2025 (UTC)
- Whether you like it or not, nudity is great. Lack of consent is not, especially when the unconsented media is globally licensed such that the depicted person has no control over how their likeness is used.
- Nudists and Himba women do not need you to speak for them. Unless of course you are a Himba woman in which case your comment is entirely appropriate and appreciated. Unsourced hypothetical assertions about groups of people made by people who do not belong to those groups aren't especially helpful.
- Who goes to a beach and intentionally photographs strangers without their consent? Nudists, since you mentioned them, explicitly do not do this. 1 2 3 4
- This discussion is not meant to be a forum for criticizing our existing consensus guidelines. When you say you "don't like the overt focus", was that meant to suggest that consent queries be focused elsewhere? If so, where would that be? How can we improve our work together? Jerimee (talk) 05:05, 1 March 2025 (UTC)