Commons:Village pump/Technical

From Wikimedia Commons, the free media repository

Shortcuts: COM:VP/T • COM:VPT

Welcome to the Village pump technical section

This page is used for technical questions relating to the tools, gadgets, or other technical issues about Commons; it is distinguished from the main Village pump, which handles community-wide discussion of all kinds. The page may also be used to advertise significant discussions taking place elsewhere, such as on the talk page of a Commons policy. Recent sections with no replies for 30 days and sections tagged with {{Section resolved|1=--~~~~}} may be archived; for old discussions, see the archives; recent archives: /Archive/2024/07 /Archive/2024/08.

Please note
 
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day and sections whose most recent comment is older than 30 days.

Protection level

File:Kalocsaizsuzsa.jpg is autopatrol protected so why is there (protectedpagetext: editprotected, edit) system message on it? Is the Wikibase part of the page indeed under full (sysop) protection? --Geohakkeri (talk) 21:40, 12 June 2024 (UTC)[reply]

That is weird. The text This page is currently protected, and can be edited only by administrators. comes from Template:Protectedpagetext/PageProtected, but the expected text is at Template:Protectedpagetext/PageAutopatrolProtected. Both of these are transcluded by MediaWiki:Protectedpagetext depending on its first parameter $1. Here's the wikitext:
{{#switch: {{{1|$1}}}
 | editprotected = {{Protectedpagetext/PageProtected}} <!-- Fully protected -->
 | templateeditor = {{Protectedpagetext/PageTemplateProtected}} <!-- Template protected -->
 | editautopatrolprotected = {{Protectedpagetext/PageAutopatrolProtected}} <!-- editautopatrolprotected -->
 | #default = {{Protectedpagetext/PageSemiProtected}} <!-- Semi-protected -->
}}
Per mw:Manual:Interface/Protectedpagetext: $1 - the raw name of the right which is needed to edit the page. Special:ExpandTemplates for page File:Kalocsaizsuzsa.jpg and wikitext {{PROTECTIONLEVEL:edit}} gives editautopatrolprotected, as expected, but "protection level" and "name of the right" might not be the same. —⁠andrybak (talk) 20:08, 15 June 2024 (UTC)[reply]
editautopatrolprotected was added to MediaWiki:Protectedpagetext in Special:Diff/853065284 by User:GPSLeo, who is also the author of Template:Protectedpagetext/PageAutopatrolProtected. Perhaps they can check what went wrong. —⁠andrybak (talk) 20:15, 15 June 2024 (UTC)[reply]
This is the relevant code, I guess. There are editprotected and editsemiprotected hardcoded as the only options there. --Geohakkeri (talk) 20:51, 15 June 2024 (UTC)[reply]
So, if MediaWiki:Protectedpagetext depended on {{PROTECTIONLEVEL:edit}} rather than the proper parameter, it would be a quick fix at least. --Geohakkeri (talk) 21:11, 15 June 2024 (UTC)
Hmm. For reference, English Wikipedia's en:MediaWiki:Protectedpagetext has a similar #switch, with protect, editprotected, templateeditor, and extendedconfirmed.
Searching the code of MediaWiki,[1] I also found mentions of Protectedpagetext in PermissionManager.php,[2] which passes as the first parameter $1 either the string protect or a variable $right, which comes from function getRestrictions of RestrictionStore. My knowledge of PHP is limited, but I'd guess that possible values for restrictions come from $wgRestrictionLevels, hence templateeditor and editautopatrolprotected in Commons' version and templateeditor and extendedconfirmed in enwiki's version. —⁠andrybak (talk) 22:15, 15 June 2024 (UTC)[reply]
{{MediaWiki:Protectedpagetext|{{PROTECTIONLEVEL:edit}}}} on File:Kalocsaizsuzsa.jpg would display Template:Protectedpagetext/PageAutopatrolProtected.
I wonder whether it works correctly on enwiki: w:Special:WhatLinksHere/Template:Protected_page_text/extendedconfirmed has no uses.
https://commons.wikimedia.org/wiki/File:Kalocsaizsuzsa.jpg?uselang=qqx shows (protectedpagetext: editprotected, edit)
Not sure what {{CASCADINGSOURCES}} is meant to do.
Maybe we could insert a switch based on Protectionlevel after "|editprotected =". Enhancing999 (talk) 15:15, 30 June 2024 (UTC)[reply]
I tried to look into this at test.wikipedia.org, but the settings there are different.
Here what is displayed on a file description page comes from javascript var "wbmiProtectionMsg". This might be filled with the actual protection level of the javascript source. Enhancing999 (talk) 12:16, 1 July 2024 (UTC)[reply]
I think CASCADINGSOURCES means sources of cascade protection. Alfa-ketosav (talk) 16:17, 19 July 2024 (UTC)[reply]
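
The #switch quoted earlier in this thread can be modelled as a plain lookup table. The sketch below is only an illustration in Python, not MediaWiki code: the function name is made up, and the two inputs are the values reported in this thread (the $1 seen with ?uselang=qqx, and the output of {{PROTECTIONLEVEL:edit}}).

```python
# Model of the #switch in MediaWiki:Protectedpagetext as quoted in this thread.
SUBTEMPLATES = {
    "editprotected": "Protectedpagetext/PageProtected",
    "templateeditor": "Protectedpagetext/PageTemplateProtected",
    "editautopatrolprotected": "Protectedpagetext/PageAutopatrolProtected",
}
DEFAULT = "Protectedpagetext/PageSemiProtected"  # the #default branch


def protected_page_text(first_param: str) -> str:
    """Return the subtemplate the #switch would transclude for a given $1."""
    return SUBTEMPLATES.get(first_param, DEFAULT)


# $1 as reported for the file page via ?uselang=qqx:
print(protected_page_text("editprotected"))
# Value reported for {{PROTECTIONLEVEL:edit}} on the same page:
print(protected_page_text("editautopatrolprotected"))
```

If this model is right, the switch itself behaves as intended, and the unexpected message comes from the value passed as $1 rather than from the wikitext.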

Footnotes

  1. git grep -i protectedpagetext -- '.' ':^languages/'
  2. note the manual mapping of sysop and autoconfirmed needed for backwards compatibility

Interface administrator requests at MediaWiki talk:Gadget-Cat-a-lot.js

There are several edit requests for interface administrators at MediaWiki talk:Gadget-Cat-a-lot.js. The following edit requests have diffs with proposals. In order of importance:

  1. Bug fix: MediaWiki talk:Gadget-Cat-a-lot.js/Archive 3#Minor edit unmarking feature not working (Special thanks to User:Miraclepine for reporting the bug.)
  2. Localization fix: MediaWiki talk:Gadget-Cat-a-lot.js/Archive 3#Mobile-frontend-return-to-page
  3. UI tweak: MediaWiki talk:Gadget-Cat-a-lot.js/Archive 3#Please add link to Help:Gadget-Cat-a-lot in the box

The page MediaWiki talk:Gadget-Cat-a-lot.js already has instances of {{Edit request}}. Because of it, these new requests won't show up in watchlists of those watching Category:Commons protected edit requests for interface administrators. Hence this additional message at Village pump. —⁠andrybak (talk) 16:31, 15 June 2024 (UTC)[reply]

Lucas Werkmeister, as the most recently active interface administrator with recent edits in Gadgets, could you please take a look? —⁠andrybak (talk) 19:46, 16 June 2024 (UTC)[reply]
Did two of them, leaving the third one open for feedback for a moment. And yeah, the watchlist issue is a general problem with the current edit request system – MediaWiki talk:Copyupload-allowed-domains also suffers from it from time to time. Lucas Werkmeister (talk) 21:03, 16 June 2024 (UTC)[reply]
Thank you! I've struck out the completed requests above. —⁠andrybak (talk) 21:20, 16 June 2024 (UTC)[reply]
Third one also done, and I’ll see if I can deal with Valerio’s edit request too, to get this out of the category. Lucas Werkmeister (talk) 20:03, 19 June 2024 (UTC)[reply]
I've disabled Valerio's request. Nardog proposed a bugfix two days ago in MediaWiki talk:Gadget-Cat-a-lot.js § Random unexpected failures at enwiki. —⁠andrybak (talk) 00:01, 25 June 2024 (UTC)[reply]
Updated the links to the archived sections. Struck out the third request, which was implemented in Special:Diff/885487790. —⁠andrybak (talk) 19:24, 27 June 2024 (UTC)[reply]
Section MediaWiki talk:Gadget-Cat-a-lot.js#Random unexpected failures at enwiki has a patch, which is already tested. Could an interface administrator please take a look? —⁠andrybak (talk) 20:24, 8 July 2024 (UTC)[reply]
User:AntiCompositeNumber or User:Mike Peel, could you please take a look at the edit request by Nardog: MediaWiki talk:Gadget-Cat-a-lot.js#Random unexpected failures at enwiki? —⁠andrybak (talk) 13:28, 13 July 2024 (UTC)[reply]

Upload functions used by various tools

Just wondering, is there a technical difference in the backend between the following ways:

Some observations:

  • I'd expect #1 and #2 to be the same, but somehow uploads are less likely to fail if one creates the file description page first and then uses the "upload" link there (#2).
  • The documentation for #4 mentions the API. Presumably this is the same one used by #5. The test I did with #4 seemed to work better than #3 usually does.

If asked to rank the reliability of these tools, I'd say #5/#4, #2, #1, #3. Enhancing999 (talk) 15:29, 24 June 2024 (UTC)

Still curious about this. Maybe #2 works better than #1 as it doesn't involve creating the page. Enhancing999 (talk) 11:40, 24 July 2024 (UTC)[reply]
  • 1 and 2 are exactly the same. But indeed, if there is no page yet, more operations are involved. And if you upload a new version of the same file, even more operations are involved (such as archiving the old file), and all of them need to succeed.
  • 3 uses chunked uploading, which is a lot more complex than 1 and 2 but can also support much larger files.
  • 4 uses JS to upload to the API. This is another entry point into 1/2, but behind the entry point it works identically.
  • 5 uses the same APIs as 4 and 3 (and can do both chunked and non-chunked uploads).
The backend is not the full story, however. Each frontend/entry point has to implement multiple 'recovery' procedures that may improve the reliability of uploading. Session expiration, dropped connections, token refreshing, etc. can all be handled by each entry point (or not). —TheDJ (talkcontribs) 13:38, 24 July 2024 (UTC)
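
To illustrate why chunked uploading has more steps that can fail, here is a minimal sketch of how a client might split a file into chunks. It does no network access; the chunk size is an arbitrary assumption, and the function name is invented for this example. Each (offset, length) pair would correspond to one separate upload request, and every one of them has to succeed.

```python
from typing import Iterator, Tuple


def chunk_offsets(file_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover the file without gaps or overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


# Hypothetical example: a 10-byte file sent in 4-byte chunks needs three requests.
for offset, length in chunk_offsets(10, 4):
    print(f"send bytes {offset}..{offset + length - 1}")
```

A single-request upload has one point of failure; with n chunks there are n requests plus a final step that assembles them, which is where the per-tool recovery procedures matter.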

It appears that the most recent version of this file (which, according to the talk page is a 4K restored version of the film) was not uploaded properly and cannot be played: "No compatible source was found for this media." Can someone please fix this? Johnj1995 (talk) 03:16, 4 July 2024 (UTC)[reply]

@Johnj1995: Hmm, the raw webm file seems to work fine, but it won't play in the Media player. I would suggest filing a Phabricator ticket about it. You may need to revert it to the previous version for now. Nosferattus (talk) 22:32, 7 July 2024 (UTC)[reply]
@Nosferattus: Per the uploader's comment on a featured media nomination for another film that cannot be played, the error is related to this Phabricator ticket: https://phabricator.wikimedia.org/T357215 Johnj1995 (talk) 03:27, 8 July 2024 (UTC)[reply]

SVG rendering on election maps

I just uploaded a series of new maps for Icelandic parliamentary elections. Although the files are very similar, some text elements render inconsistently. The circles should contain abbreviations of the district names, but these appear only in the 2021 map. In front of the party names there are boxes with the letters used to identify the parties; these sometimes don't show up. I have no idea why this happens. The font used is DejaVu Sans, which should work fine with Wikimedia. Bjarki S (talk) 09:41, 4 July 2024 (UTC)

I have identified the problem. For whatever reason, Inkscape left the coordinates (x and y) of the missing tspan elements at 0. I'm fixing this manually in the XML editor. Bjarki S (talk) 10:16, 4 July 2024 (UTC)
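
For anyone who hits the same symptom, a check along these lines could list the affected elements before fixing them by hand. This is a minimal sketch using only the Python standard library; the sample SVG and its ids are invented for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def tspans_at_origin(svg_text: str) -> list:
    """Return the ids of <tspan> elements whose x and y are both 0."""
    root = ET.fromstring(svg_text)
    suspicious = []
    for tspan in root.iter(f"{{{SVG_NS}}}tspan"):
        x, y = tspan.get("x"), tspan.get("y")
        if x is None or y is None:
            continue  # position is inherited from the parent <text>
        try:
            at_origin = float(x) == 0.0 and float(y) == 0.0
        except ValueError:
            continue  # coordinate lists such as x="1 2 3" are skipped
        if at_origin:
            suspicious.append(tspan.get("id", "(no id)"))
    return suspicious


# Invented sample: one label is positioned, the other collapsed to 0,0.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <text><tspan id="ok" x="120" y="45">NV</tspan></text>
  <text><tspan id="lost" x="0" y="0">SV</tspan></text>
</svg>"""
print(tspans_at_origin(sample))
```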

Occupation "greek-catholic priest" instead of "politician" in Wikidata Infobox

Is it just me or is the Wikidata Infobox at Category:Iriana saying that Iriana's occupation is "greek-catholic priest" instead of "politician"? I checked the Wikidata entry on her and on "politician" and it says correctly "politician". Where is the "greek-catholic priest" coming from? (note: I'm accessing the page via mobile browser. I've checked mobile view and desktop view on mobile browser but the infobox display is the same.) Nakonana (talk) 20:12, 4 July 2024 (UTC)[reply]

Just checked infoboxes of other politicians on Commons and they all list "greek-catholic priest" as occupation instead of "politician". Nakonana (talk) 20:14, 4 July 2024 (UTC)[reply]
Maybe mention it at Template talk:Wikidata Infobox. Seems to come from [1]. Enhancing999 (talk) 08:12, 5 July 2024 (UTC)[reply]
As this is already reverted, purging the page to clear the cache should solve this. GPSLeo (talk) 08:32, 5 July 2024 (UTC)
Would you kindly do so? Enhancing999 (talk) 08:33, 5 July 2024 (UTC)[reply]
Up to 147559 category pages are concerned: [2], but it seems to be better now. Enhancing999 (talk) 05:58, 6 July 2024 (UTC)[reply]
Looks like it got fixed now. Nakonana (talk) 11:42, 6 July 2024 (UTC)[reply]
A manual purge of Category:Iriana does not seem to do the trick. Nakonana (talk) 15:56, 5 July 2024 (UTC)[reply]

Harvest coord from metadata

somehow coord of File:Ccmhj.jpg from an iphone 14 pro was not detected by commons. a bot to check metadata and fill the coords into sdc would be nice. RZuo (talk) 08:57, 6 July 2024 (UTC)[reply]
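
One small building block such a bot would need is converting EXIF-style GPS values (degrees, minutes, seconds plus a hemisphere letter) into signed decimal degrees. The sketch below shows only that conversion; reading the metadata and writing to SDC are not covered, and the function name is made up.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds and a hemisphere letter to signed decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal notation.
    return -value if ref in ("S", "W") else value


# Invented coordinates, not taken from the file mentioned above.
print(dms_to_decimal(22, 30, 0, "N"), dms_to_decimal(114, 15, 0, "W"))
```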

Annotations not showing

It seems I was able to add image notes in the past here but now I am unable to see them or the add note button - File:Coleman_Bangalore_entomologists.jpg - any way to turn on the annotation button which shows up on other images? Shyamal L. 11:10, 6 July 2024 (UTC)[reply]

Hi Shyamal, I am not sure if your problem is related but Fix the Image Annotator may be relevant. Commander Keane (talk) 05:59, 8 July 2024 (UTC)[reply]
@Commander Keane: Added my support. Jeez, never knew we could be that helpless in the open source world. Shyamal L. 06:02, 8 July 2024 (UTC)[reply]
@Shyamal: I think voting has closed for that RfC. I support a technical needs survey that is always open to suggestions and voting on Commons though. Commander Keane (talk) 06:09, 8 July 2024 (UTC)

Automatic categorization of subtitles needs to be renamed

If a video (e.g. File:1952. Аленький цветочек.webm) has Slovene subtitles (e.g. TimedText:1952._Аленький_цветочек.webm.sl.srt), then it is categorized in Category:Files with closed captioning in Slovenian, but the main category (and English Wikipedia article, for what it's worth) are called "Slovene", not "Slovenian", cf. Category:Slovene language. —Justin (koavf)TCM 05:50, 8 July 2024 (UTC)[reply]

✓ Done Special:Diff/894319705 --Geohakkeri (talk) 06:21, 8 July 2024 (UTC)[reply]
hvala. —Justin (koavf)TCM 16:22, 8 July 2024 (UTC)[reply]

Tech News: 2024-28

MediaWiki message delivery 21:28, 8 July 2024 (UTC)[reply]

Help needed from admins speaking javascript

I am working on a backlog of {{Edit request}}s. I can handle most file, template and Lua requests but I do not speak javascript. Can an admin help with requests at Category:Commons_protected_edit_requests_for_interface_administrators? Jarekt (talk) 17:23, 9 July 2024 (UTC)[reply]

A gadget to mute audio of a video with one click

Is there any gadget/tool/proposal for such a button on pages for videos that have audio?

I think many files in Category:Videos featuring unidentified music need their audio muted and one example case of a video that (as far as I can see) needs to be muted is File:Beijing to Shanghai by train timelapse.webm.

It would be very cumbersome if one first needs to download a large video, modify it somehow (which most users can't readily, don't bother doing, or would take them long), and then reupload as a new version before tagging the page with {{Overwritten revdel}} which probably even most active users don't know about (and adding Category:Videos without audio).

Instead, it should be just a click that makes the server run some ffmpeg command to remove the audio or similar. I don't know if this has been proposed somewhere if it doesn't yet exist. Prototyperspective (talk) 22:14, 12 July 2024 (UTC)[reply]
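
As a rough idea of what "some ffmpeg command" could look like, the sketch below only builds the argument list and prints it; it does not run anything. It relies on ffmpeg's -an option (no audio) and -c:v copy (keep the video stream without re-encoding). The file names are placeholders, and how a server-side tool would actually invoke it is an open question.

```python
import shlex


def mute_command(source: str, target: str) -> list:
    """Build an ffmpeg argument list that copies the video stream and drops the audio."""
    return ["ffmpeg", "-i", source, "-c:v", "copy", "-an", target]


# Placeholder file names; print the command instead of executing it.
print(shlex.join(mute_command("input.webm", "muted.webm")))
```

Because the video stream is copied rather than re-encoded, such a step should be cheap compared with a full transcode.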

you imported the example video.
if you were not sure that the music is free, then you should have imported only the video using v2c! RZuo (talk) 05:43, 13 July 2024 (UTC)[reply]
Yes, I noticed it only afterwards and this made me wonder about such a button; your comment is not helpful. Prototyperspective (talk) 10:22, 13 July 2024 (UTC)[reply]
But why do you trust that the copyright statement at the source is correct for the video but not for the audio? GPSLeo (talk) 12:26, 13 July 2024 (UTC)[reply]
Because it was self-recorded by the youtuber who set this license? Also not helpful and offtopic. Prototyperspective (talk) 12:28, 13 July 2024 (UTC)[reply]

DelReqHandler broken for April requests?

It seems that something broke the DelReqHandler tool on Commons:Deletion requests/2024/04, the usual links for closing requests don't appear there, any idea how to fix? Gestumblindi (talk) 11:18, 14 July 2024 (UTC)[reply]

DelReqHandler links appear for requests from April 18 and newer, but not for older April requests. I suppose something around April 18 went wrong? Gestumblindi (talk) 18:59, 15 July 2024 (UTC)[reply]
Issue still persists. Gestumblindi (talk) 09:19, 24 July 2024 (UTC)[reply]

New technical problem with generation of SVG preview images

The preview images of File:MitigationOptions costs potentials IPCCAR6WGIII rotated-de.svg are broken. They used to be rendered and shown correctly. Since the graphic hasn't changed since March 2023, it appears something in the SVG renderer is broken. Does anyone know what happened? --DeWikiMan (talk) 14:56, 14 July 2024 (UTC)

Possibly the use of fill:currentColor and stroke:currentColor. WMF supports SVG 1.1. File claims to be SVG 1.1 (which uses a subset of CSS 2), but currentColor is from CSS 3. The value is supposed to select the current value of the color property. GNUPlot is not emitting SVG 1.1. The WMF renderer was changed (April 2024?) to a version that is a few years behind the latest release. Maybe a more recent version of librsvg (the WMF renderer) supports the property. Glrx (talk) 01:40, 15 July 2024 (UTC)[reply]
Thank you Glrx for the suggestion.
I substituted all occurrences of "currentColor" with "black". The SVG 1.1 validator basically says that it is correct now (except for the RDF metadata and Inkscape elements; see validator.nu). I also tried saving it as "plain SVG" from Inkscape. I uploaded both to Commons. Neither helped.
I ran rsvg-convert (version 2.52.5) on it and it gave a "rendering error: InvalidMatrix", whatever that means...
Do you have any further suggestions? I'd really appreciate it.
--DeWikiMan (talk) 17:58, 15 July 2024 (UTC)[reply]
Creating an "optimized SVG" from Inkscape did the trick. I don't know exactly why. I believe the problem could be related to this librsvg problem [3]. Probably one of the transform matrices was not invertible. In such a case, the librsvg version now used on Commons possibly no longer ignores the transform, but fails and stops rendering.
--DeWikiMan (talk) 19:16, 15 July 2024 (UTC)[reply]
@DeWikiMan: Looks like you found the answer 30 minutes later. Glrx (talk) 19:21, 15 July 2024 (UTC)[reply]
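
If the non-invertible matrix theory is right, a quick scan like the one below could flag suspect transforms before uploading. It is a heuristic sketch only: it looks at explicit matrix(a b c d e f) transforms and tests whether the determinant a*d - b*c is (nearly) zero, and it does not catch other degenerate cases such as scale(0). The tolerance is an arbitrary assumption.

```python
import re

MATRIX_RE = re.compile(r"matrix\(([^)]*)\)")


def non_invertible_matrices(svg_text: str, eps: float = 1e-12) -> list:
    """Return matrix(...) transforms whose 2x2 linear part has a zero determinant."""
    found = []
    for match in MATRIX_RE.finditer(svg_text):
        # SVG allows commas and/or whitespace between the six numbers.
        parts = [p for p in re.split(r"[\s,]+", match.group(1).strip()) if p]
        if len(parts) != 6:
            continue
        try:
            a, b, c, d, _e, _f = (float(p) for p in parts)
        except ValueError:
            continue
        if abs(a * d - b * c) < eps:
            found.append(match.group(0))
    return found


# Invented sample: the first transform collapses everything onto a line.
sample = '<g transform="matrix(1,0,0,0,5,5)"/><g transform="matrix(1 0 0 1 0 0)"/>'
print(non_invertible_matrices(sample))
```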

MediaWiki internal error

Accidentally set the license tag to {{|cc-by-sa-4.0-sikander}} instead of {{cc-by-sa-4.0-sikander}} on File:LCBO strike - Market street - 20240713C.jpg and got this error:
MediaWiki internal error.
Original exception: [a752adf9-f969-4f5e-b251-829dc2d1186e] 2024-07-14 20:57:43: Fatal exception of type "Wikimedia\Rdbms\DBUnexpectedError"
Exception caught inside exception handler.
Set $wgShowExceptionDetails = true; at the bottom of LocalSettings.php to show detailed debugging information.

Should I report this somewhere other than here? Regards // sikander { talk } 🦖 21:03, 14 July 2024 (UTC)[reply]

@PantheraLeo1359531: No, not happening now. Got that error a few times when updating the files but after a few minutes it started working fine. // sikander { talk } 🦖 16:54, 15 July 2024 (UTC)[reply]
Good, I assume it was only a short temporary error ;) --PantheraLeo1359531 😺 (talk) 18:18, 15 July 2024 (UTC)

Tech News: 2024-29

MediaWiki message delivery 01:28, 16 July 2024 (UTC)[reply]

Query to find dates of DR items

Hi, the first list of files in the DR Commons:Deletion requests/Professional wrestling magazines has copyright issues depending on the date; if published after ~23 October 1987 then they are likely to be deleted. Many of the files only have year in the filename (and some are missing year in the filename), but most seem to have a specific date on the file itself - e.g. File:Ric Flair, circa Spring 1987 (cropped).jpg has 1987 in filename but a date of 1 March 1987 on the file. Is it possible for someone to run a query or join to bring the date from each file into the DR, so that the closer can identify which fall before 23 October 1987 and are eligible for deletion (depending on the DR decision)? Consigned (talk) 13:06, 17 July 2024 (UTC)[reply]

✓ Done, thank you Geohakkeri! Consigned (talk) 16:52, 17 July 2024 (UTC)[reply]

Problems with PDF Preview

Hello, I noticed that for the past few hours the PDF preview function has been buggy. I've been uploading slides for some time today and noticed that I could see neither the thumbnail nor the preview on file pages. I checked with my friends, one on the same network as me and another working from a different location, and also with my phone on different networks. All of them reported the same problem. Is this a known bug, or is it a problem with my files? Thank you. Hisyam Athaya (WMID) (talk) 09:25, 18 July 2024 (UTC)

It seems to be a known problem. @Sannita (WMF): is there a plan to fix it? Enhancing999 (talk) 17:48, 22 July 2024 (UTC)[reply]
The preview works again. I asked other people who work a lot with Commons and they confirmed this is a known problem. Hisyam Athaya (WMID) (talk) 02:19, 23 July 2024 (UTC)
It still needs to be fixed. Enhancing999 (talk) 07:18, 23 July 2024 (UTC)[reply]
AFAIK there are several tickets on Phabricator on the topic, so it is a known bug. I don't know which team has it, though, and I'm afraid the priority is not high on this. I'll try to investigate this. Sannita (WMF) (talk) 15:34, 23 July 2024 (UTC)[reply]

Hosting of free fonts in Commons

Because the following RfC has technical aspects, I thought it would be a good idea to crosspost a link to it here at the technical village pump: Commons:Requests for comment/Hosting of free fonts in Commons. Pardon me if you find this unsuitable or think it is already visible enough. Thanks 😊 −Ebrahimtalk 12:48, 18 July 2024 (UTC)

Good idea, thank you for bringing this up :) --PantheraLeo1359531 😺 (talk) 07:25, 19 July 2024 (UTC)[reply]

SVG abruptly not displaying

At some point within the last month File:LGBTCannabis_white.svg stopped displaying, and it's unclear to me why as this file was uploaded in 2020 and hasn't recently been changed other than being added to an additional category. Clicking 'Original file' gives a 'XML Parsing Error: prefix not bound to a namespace' error, while clicking any of the resolution PNG previews gives a 429 error. This error doesn't seem to be something on my end as I asked someone else on a different computer from a different internet connection to take a look and they confirmed that it's broken for them as well. Apologies if this is a known issue that's being worked on or something a-la graphs extension - I don't frequent Commons. Waxworker (talk) 05:08, 19 July 2024 (UTC)[reply]

@Waxworker: File:LGBTCannabis_white.svg is not a valid XML file. The Commons SVG renderer librsvg was recently upgraded from 2.44 to 2.50, which uses a stricter XML parser, resulting in this error. The fix here is to add an xmlns:sodipodi namespace declaration or just remove the sodipodi:nodetypes attribute. Other files affected by this:
Dexxor (talk) 09:49, 19 July 2024 (UTC)[reply]
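
A rough way to spot this class of problem before uploading is to compare the prefixes a file uses with the prefixes it declares. The sketch below is a regex-based heuristic, not a real XML parser (a strict parser would simply reject such a file), so it can be fooled by unusual content; the function name is made up.

```python
import re

PREFIX = r"[A-Za-z_][\w.-]*"


def unbound_prefixes(svg_text: str) -> set:
    """Return namespace prefixes that are used but never declared via xmlns:prefix."""
    declared = set(re.findall(rf"xmlns:({PREFIX})\s*=", svg_text))
    # Prefixes on attribute names, e.g. sodipodi:nodetypes="..."
    in_attrs = re.findall(rf"\s({PREFIX}):{PREFIX}\s*=", svg_text)
    # Prefixes on element names, e.g. <sodipodi:namedview ...>
    in_tags = re.findall(rf"</?({PREFIX}):", svg_text)
    used = set(in_attrs) | set(in_tags)
    # "xml" and "xmlns" are predefined and never need a declaration.
    return used - declared - {"xml", "xmlns"}


broken = '<svg xmlns="http://www.w3.org/2000/svg"><path sodipodi:nodetypes="cc" d="M0 0L1 1"/></svg>'
print(unbound_prefixes(broken))
```

An empty result does not prove the file is valid XML; it only means this particular "prefix not bound to a namespace" pattern was not found.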

The link on this template showing the copyright notice does not work; perhaps it is outdated. I mean the link in the sentence "The text of permission is available here." The current link on "here" is https://mosreg.ru/about/, which is not accessible. The correct link would be https://mk.mosreg.ru/o-sayte . Could a template expert please correct the link? I am not familiar with all the template technicalities. Regards, Ellywa (talk) 08:44, 19 July 2024 (UTC)

OCR to auto-categorize maps / charts by year shown

Is there any gadget/tool for optical character recognition (OCR) of files on Wikimedia Commons?

If there is no such thing it would be really great if somebody could give it a try, it could be very useful.


I'd like to categorize Our World in Data maps by the year of the data into Category:Maps of the world by year as well as OWID charts by the latest data point into Category:Charts by year of latest data.

This is useful for many reasons such as making things in the image explicit as metadata, making things queryable (for example combining cats using petscan), statistics, search (see the search box), better enabling people to find the latest version for some data, better WMC search engine results, and (probably most importantly) updating outdated/old datagraphics that are in use (GLAMorgan can be used for that).

The issue there is that there are really many OWID files (which should now all be in the OWID category) and there may be even far more once people upload "image stacks" for the OWID Gadget if that is the way used to display more interactive OWID data (which I oppose as suboptimal).

One could go through the former manually which also has the advantage that many of these are missing one or a few other categories but the second one really has too many items to do that manually and again more OWID datagraphics keep getting uploaded and this isn't only about OWID datagraphics (there's also other cats one could scan).

See also my related comment here that is about machine vision on WMC more generally or automated species identification: …open letter…#Image recognition software for categorisers.

In my example usecase, an OCR Commons tool could for example OCR read all numbers in a file (files of the petscan results) and then (if it found one or a plausible one) set the category for the latest year that is ≤ current year. Prototyperspective (talk) 11:43, 19 July 2024 (UTC)[reply]

For Category:Images by text that could be helpful too. Ideally one could choose
  • a word, group of words, or category tree
  • define a maximum number of words or characters that should be on an image (sample: less than 5 words). This to avoid doing OCR on lengthy texts.
Then confirm suggestions made by OCR. Enhancing999 (talk) 12:21, 19 July 2024 (UTC)[reply]
SVG file to OCR
I do not know about gadgets.
There is an OCR tool.
See https://ocr.wmcloud.org/ for direct interface and API documentation.
It will work with PNG files but not SVG files (which can be converted to PNG and then OCR'd).
One can get the URL for a PNG rendering of an SVG file. Here's a conversion that is 887 pixels wide
Here's a Polish OCR run on that PNG:
So the Polish text is (converting Unicode code points to Unicode)
  • Typ ściągający
  • Typ naciskający
  • Typ obustronny
But why OCR an SVG file? The PetScan query shows SVG files that have text elements.
With JavaScript, read the SVG file with the Fetch API, grab the text elements with getElementsByTagNameNS(nsSVG, "text"), ask for the .textContent of each text element, and then search that string for the years or terms you want.
I do not know about the rest of the task.
Glrx (talk) 14:57, 19 July 2024 (UTC)[reply]
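
The same idea (read the text elements instead of running OCR) can be sketched outside the browser too. The example below uses Python's standard library rather than JavaScript; joining itertext() plays the role of .textContent. The sample SVG is invented, and fetching the file from Commons is left out.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def svg_text_strings(svg_text: str) -> list:
    """Return the text content of every <text> element, including nested <tspan>s."""
    root = ET.fromstring(svg_text)
    strings = []
    for text_el in root.iter(f"{{{SVG_NS}}}text"):
        content = "".join(text_el.itertext()).strip()
        if content:  # skip empty or whitespace-only elements
            strings.append(content)
    return strings


# Invented sample with a nested tspan and an empty text element.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <text x="1" y="2">Sample title, <tspan>1996</tspan></text>
  <text> </text>
</svg>"""
print(svg_text_strings(sample))
```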
Wow great so around 70% of this already exists! Thanks a lot for this info. Now it basically only needs a way to make it scan files in petscan results.
SVG files always have a PNG file linked beneath them so they don't need to be converted again.
However, SVG files already contain the text as plain text, so rather than OCRing them it would be better if the text contained in them was read somehow. However, that (which you also described in your bottom paragraph) is not needed here:
I tested it like so with a PNG render underneath File:Death-rate-smoking,1996.svg and it worked very well.
If there was a tool where one can e.g. enter a petscan ID and it makes these requests the other thing needed would be
  1. the small code that checks for the latest plausible year-number (and either in the first few lines / title or not in the same line as Data source)
  2. a bot that adds the categories to the files accordingly.
Is there a developer here who is interested in building these three missing parts assuming they don't also exist already? Prototyperspective (talk) 15:37, 19 July 2024 (UTC)[reply]
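
The first missing part (picking the latest plausible year from OCR output) could look roughly like this. It is a sketch under assumptions: years are taken to be four-digit numbers from 1500 to 2099, lines mentioning "Data source" are skipped as suggested above, and anything later than the current year is ignored. The function name and the sample text are made up.

```python
import re
from datetime import date
from typing import Optional

# Assumption: a "year" is a standalone four-digit number between 1500 and 2099.
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def latest_plausible_year(text: str, current_year: Optional[int] = None) -> Optional[int]:
    """Return the latest year <= current_year found outside 'Data source' lines."""
    if current_year is None:
        current_year = date.today().year
    years = []
    for line in text.splitlines():
        if "data source" in line.lower():
            continue  # the attribution line often carries a publication year
        years.extend(int(y) for y in YEAR_RE.findall(line))
    candidates = [y for y in years if y <= current_year]
    return max(candidates) if candidates else None


# Invented OCR output for illustration.
ocr_text = "Death rate, 1990 to 2019\nData source: Example Org (2024)\nProjection to 2050"
print(latest_plausible_year(ocr_text, current_year=2024))
```

A bot would still need human confirmation for ambiguous cases, for example charts whose title year differs from the latest data point.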
https://ocr.wmcloud.org/ interesting tool. Quite surprising what OCR on photos actually gives. I tried:
Both found "rue des lauriers", but the first also a motto and the second part of sticker from a key service on the pole ;)
Maybe OCR could be added automatically on upload and stored somehow to be searchable. Possibly, as structured data so it's editable. Enhancing999 (talk) 10:49, 22 July 2024 (UTC)[reply]
About SVG: ideally the text would be rendered separately on the file description page. Maybe that's something that can be added through Lua directly on Template:Information. Enhancing999 (talk) 17:46, 22 July 2024 (UTC)
I added a request for that at Template_talk:Information#Output_SVG_text. Enhancing999 (talk) 10:18, 29 July 2024 (UTC)[reply]

Characters Not Entering Properly

I have been having a strange error where certain characters will not enter properly when editing Wikimedia Commons. For example, typing two left brackets ("[[") converts both of them into a "ʽ". The same happens in reverse for two right brackets. However, it only happens when typing them sequentially. In other words, if I type one, then move the caret to the left and type the second one they remain brackets. Similarly, copy-pasting them from somewhere else also doesn't cause any issues. In another case, typing an asterisk ("*") results in what is apparently a diaeresis (It won't reproduce ). This only happens on Wikimedia Commons and not Wikipedia or any computer program. However, it does occur on both my userpage and this post. Any idea what is causing this and how to fix it? –Noha307 (talk) 17:33, 22 July 2024 (UTC)[reply]

@Noha307: Focus the Commons search bar. If you see a little keyboard icon as depicted, click it and select “Disable input tools.” --Geohakkeri (talk) 19:13, 22 July 2024 (UTC)[reply]
Hey, that fixed it! Thank you! Noha307 (talk) 21:53, 22 July 2024 (UTC)[reply]

Tech News: 2024-30

MediaWiki message delivery 00:01, 23 July 2024 (UTC)[reply]

PD template error: author "I, John Doe"

Hi, I found an irritating error in an early PD template (2007-2008) and assume there are more than 23K instances of its faulty use (check).

The template creates a set of three lines and adds an "I" to the author's name in two of them. Obviously derived from "I, John Doe, the copyright holder", it gives the author's name as "I, John Doe". I am not exactly sure where this error lives (I can see that it is on the pages now, inside the PD|self template). I find it very irritating and rather disrespectful towards the creators to misspell their names this way by adding a random "I". Does anyone have a good approach to fix these instances and check the template? Thanks. I saw this in the Dutch translation template. Peli (talk) 23:08, 23 July 2024 (UTC)

This is a good candidate for Commons:Bots/Work requests. I went ahead and made a request to fix this issue here. —CalendulaAsteraceae (talkcontribs) 06:53, 24 July 2024 (UTC)
Thanks, great move. But I'd like to add that the 'list' is just a kind of educated guess, created by a certain search key; I was not able to check the text in all or in a significant number of the real pages. The search was confirmed only by looking at a very small number of pages on the first page of the results. Peli (talk) 07:13, 24 July 2024 (UTC)
That's why I asked for a specific find-and-replace in the bot request. It might miss some pages, but it won't have false positives, and should get a lot of the problematic pages, making it easier to do manual review of the remaining ones. —CalendulaAsteraceae (talkcontribs) 20:59, 24 July 2024 (UTC)[reply]
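
To show what "a specific find-and-replace" with no false positives might mean in practice, here is a purely hypothetical sketch. The wikitext pattern (an author parameter whose value starts with "I, ") is an assumption made for illustration; the actual pattern used in the bot request may be different.

```python
import re

# Hypothetical pattern: a line such as "|Author=I, John Doe".
# Only an author parameter is touched, so "I, " elsewhere is left alone.
AUTHOR_RE = re.compile(r"^(\|\s*[Aa]uthor\s*=\s*)I, (?=\S)", re.MULTILINE)


def fix_author(wikitext: str) -> tuple:
    """Return (new_wikitext, number_of_replacements)."""
    return AUTHOR_RE.subn(r"\1", wikitext)


print(fix_author("|Author=I, John Doe\n|Source=Own work"))
```

Pages that do not match the narrow pattern are returned unchanged with a count of 0, which makes them easy to collect for manual review.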

VRT process

I was reading through the VRT process and I am confused. Do we have to upload the media before obtaining permission? And if the author denies permission or does not reply, what should be done?
–– KEmel49talk,Uploads 18:48, 24 July 2024 (UTC)[reply]

I usually ask beforehand whether the author would grant permission, then upload the media and ask the author to send the permission to the respective WMC email address --PantheraLeo1359531 😺 (talk) 08:30, 25 July 2024 (UTC)[reply]

User category

Kind regards. I recently created my own user category (Category:Files by User:NoonIcarus). Will the category be populated automatically? Or can the process be automated? Many thanks in advance. NoonIcarus (talk) 23:54, 26 July 2024 (UTC)[reply]

COM:Cameroon

Does anyone know why the level-2 section headings "==Not protected==" and "==Public domain and folklore: not free==" aren't being properly displayed in COM:Cameroon#General rules? -- Marchjuly (talk) 09:28, 27 July 2024 (UTC)[reply]

I had the same problem: [8] fixed it. Not sure what it actually is though. Enhancing999 (talk) 10:55, 27 July 2024 (UTC)[reply]

The XML in the uploaded file could not be parsed

Hello! I wanted to create a map. I got a free base layer in PNG, opened Inkscape and imported the PNG file. After that I added several lines and symbols and saved the result as SVG. When I try to upload the result to Commons, I see "The XML in the uploaded file could not be parsed". One hypothesis is that the problem is the embedded PNG layer, but, as far as I remember, there are SVG files on Commons that contain raster layers. The file size is 12 MB. Microsoft Edge opens the file normally. What causes the upload error? The file can be downloaded for checking. Perhaps there is a web service that can repair the structure of the document if it is broken? But I'm not sure that the file is broken: it is simple (a raster layer, a few lines and symbols) and not huge. Dinamik (talk) 09:44, 27 July 2024 (UTC)[reply]

We do not allow uploads of SVGs with images inside of them. This is often misused and it creates potential security problems because our file scanners do not work on those embedded images. —TheDJ (talkcontribs) 08:07, 28 July 2024 (UTC)[reply]
Has this limitation always existed on Commons? I believe that, for example, the first versions of this file have an embedded base layer. Dinamik (talk) 09:56, 28 July 2024 (UTC)[reply]
Probably not, see Category:Fake SVG. Enhancing999 (talk) 10:16, 28 July 2024 (UTC)[reply]
Commons has always allowed files to have embedded bitmaps, but those bitmaps must use the data: scheme. Files with external URLs are now blocked from uploading. Furthermore, the Commons rasterizer will not fetch external URLs, so such a base layer would no longer display. All the versions of the St. Petersburg map display, so there would not be an external URL. Glrx (talk) 22:57, 28 July 2024 (UTC)[reply]
The file is over 10 MB. At one point, SVG uploads were limited to 10 MB, but I do not believe that is still the case.
The file is mostly an embedded PNG. Following that, there are some path and flowRoot elements. The path elements should be OK, but the flowRoot is not supported. It was described in an SVG 1.2 draft, but that draft was not accepted. The element does not exist in the SVG 2.0 spec.
WMF supports SVG 1.1. Even if you could upload the file, it would not display as you would expect.
I do not see a reason for the XML error. W3's validator finds 67 errors, but they only involve normal Inkscape, sodipodi, and RDF extensions or the bogus flowRoot elements.
Glrx (talk) 23:15, 28 July 2024 (UTC)[reply]
Running rsvg-convert (latest version, 2.58) on that SVG gives an error without the --unlimited option, which is described as "The XML parser has some guards designed to mitigate large CPU or memory consumption in the face of malicious documents. It may also refuse to resolve data: URIs used to embed image data in SVG documents." Dexxor (talk) 07:17, 29 July 2024 (UTC)[reply]
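For anyone who wants to check a file locally before uploading, here is a minimal Python sketch of the two points raised above: embedded bitmaps should use the data: scheme, and flowRoot is not part of SVG 1.1. The function name check_svg and the sample document are made up for illustration. It only inspects the parsed tree with the standard library; it does not reproduce the Commons upload checks and will not by itself explain the "could not be parsed" message.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def check_svg(svg_text):
    """Return a list of human-readable problems found in an SVG string.

    Hypothetical helper: flags flowRoot elements and image elements whose
    href does not use the data: scheme.
    """
    root = ET.fromstring(svg_text)
    problems = []
    for elem in root.iter():
        if elem.tag == f"{{{SVG_NS}}}flowRoot":
            problems.append("flowRoot element (not part of SVG 1.1)")
        elif elem.tag == f"{{{SVG_NS}}}image":
            # Inkscape normally writes xlink:href; a plain href is also checked.
            href = elem.get(f"{{{XLINK_NS}}}href") or elem.get("href") or ""
            if not href.startswith("data:"):
                problems.append(f"image with non-data: href: {href}")
    return problems


# Made-up sample: one embedded bitmap, one external bitmap, one flowRoot.
sample = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <image xlink:href="data:image/png;base64,AAAA"/>
  <image xlink:href="http://example.org/base.png"/>
  <flowRoot><flowPara>label</flowPara></flowRoot>
</svg>"""

for problem in check_svg(sample):
    print(problem)
```

In Inkscape, converting flowed text to regular text before saving should avoid the flowRoot elements, though that is a separate issue from the parse error.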

Skip people in search results

Any idea how to filter search results for photos that are not of persons? Is it currently possible or what would need to be added to make it possible? Enhancing999 (talk) 11:30, 27 July 2024 (UTC)[reply]

Yes: append -deepcategory:"People" or a similar category more specific to your search like "People climbing" (concrete example). It doesn't work with the two examples, or with any other category that has more than just a few subcats. Changing that requires a code fix, tracked at phab:T369808. Prototyperspective (talk) 11:52, 27 July 2024 (UTC)[reply]
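As a rough illustration of the suggestion above, here is a small Python sketch that appends the exclusion to a search and builds a query URL for the MediaWiki search API (action=query, list=search, file namespace 6). The helper names are hypothetical, and the sketch only builds the URL; whether the search actually succeeds still depends on the deepcategory limits just mentioned.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_search_query(terms, excluded_category):
    """Append a negated deepcategory filter to the search terms."""
    return f'{terms} -deepcategory:"{excluded_category}"'


def build_search_url(terms, excluded_category):
    """Build an API URL that searches the File namespace (6)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": build_search_query(terms, excluded_category),
        "srnamespace": 6,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(build_search_query("climbing", "People climbing"))
print(build_search_url("climbing", "People climbing"))
```

The same query string can be pasted directly into the search box on Commons.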

Thanks, but that assumes that the images already have a people category. Also, I doubt deepcat will ever be changed to include all subcategories of Category:People.

This is similar for other files from that Flickrstream. Enhancing999 (talk) 12:30, 27 July 2024 (UTC)[reply]

Yes, it doesn't work another way and they should be in that cat. Are you asking about machine vision filters? That would be more than difficult to add. It is not about that particular cat but how well that search operator works and it doesn't scan the whole cat tree anew, it uses some cached data or could do so if it currently does scan things anew for every search request. This is what the categories are for and the user should not be required to do categorization first, that's another issue. I don't know what the point of your question is, how do you think this could be possible if not as described or similar (such as excluding terms commonly in the file descriptions of images of people)? Prototyperspective (talk) 13:35, 27 July 2024 (UTC)[reply]
The point is to find images of buildings and cityscape included in these searches/Flickrstreams (and skip all politicians, I'm not interested in).
There was some AI done on images that added suggestions to every image. One could just skip all those images where the suggestion is people/faces or similar. Enhancing999 (talk) 13:42, 27 July 2024 (UTC)[reply]
Basically it is that images with suggestions for containing people (I thought people not faces), need to be located in the People category. For example with a subcategory "Images likely depicting people to check". Prototyperspective (talk) 14:04, 27 July 2024 (UTC)[reply]
BTW, your phab ticket seems to be repurposed to add the missing error message on MediaSearch, not to make deepcategory:"People" possible. Enhancing999 (talk) 14:10, 27 July 2024 (UTC)[reply]
No, that's a misunderstanding then: it's not about showing the error message also in MediaSearch but about getting deepcat to work reliably always (except for newly-created categories). Also, to add to my prior point, a subcat like "People by activity" may be more appropriate, and the "Images likely depicting people to check" doesn't need to be in the People cat; one could just add a second deepcat search operator phrase. Other than that I don't think there's a feasible way, given that not even any other image search engines have such features and WMC is unlikely to be able to be the first to offer machine vision supported image search. Prototyperspective (talk) 16:44, 27 July 2024 (UTC)[reply]
My oldish phone can do some of it, so Commons should be able to offer it as well. Search engines for the general public tend to have some other constraints: like always provide the same results and not output anything problematic.
Commons was almost there a while back .. so it shouldn't be too complicated to make it work.
BTW let's be optimistic about your ticket, but in any case, I don't think it would solve my use case. Enhancing999 (talk) 22:56, 27 July 2024 (UTC)[reply]
Ok good point, still that is not a public Web or Website search engine. I don't think it would be very useful but maybe I'm wrong or it would be simple to add. I guess you could check if there is a readily available open source package for this that could be used and check if there is a related phab ticket and if not propose it somewhere. Also keep in mind that WMC has far more files than your phone (however maybe that only means the initial scan takes a bit longer). I think generally it suffices to just change the search terms so either some things are excluded or it's more specific to what you're searching for such as searching "animals climbing" instead of just "climbing" or going to the subcategories about e.g. "buildings". Prototyperspective (talk) 10:45, 28 July 2024 (UTC)[reply]
Why wouldn't it be useful to be able to search images by what is actually visible?
Your other suggestions imply that the file description text includes that information or that the file was already categorized, but File:Comemoração da Independência do Brasil (48700486098).jpg is somewhat representative in that not being the case. The entire point of the search is to find images and add more detailed categories. Enhancing999 (talk) 10:31, 29 July 2024 (UTC)[reply]
I didn't say that, I wrote "very useful" that means that it's about the magnitude/degree.
The other things were just alternatives that don't necessarily always work or work for all files depending on what you intend to do which you didn't specify and can vary.
Adding/integrating machine vision would be useful; see this. Prototyperspective (talk) 11:00, 29 July 2024 (UTC)[reply]

Notice of licencing template redirection

Hello, per Template talk:GPLv3, GPLv3 will soon be redirected to GPLv3 only. It currently has no transclusions and has been deprecated for two weeks. Considering the potential legal implications, I want to proceed with caution. Are there any tools that still have the GPLv3 template hardcoded inside? —Matrix(!) {user - talk? - uselesscontributions} 06:55, 28 July 2024 (UTC)[reply]

Orphaned talk pages after format conversion

Consider the following pages:

File:Blue dot 7px.gif (Deleted and redirected after file format conversion) File talk:Blue dot 7px.gif (Orphaned talk page)
File:Blue dot 7px.png (Live page) File talk:Blue dot 7px.png (Non-existent until just now)

I suspect many more orphaned talk pages exist like this one. Until just now, there was no way for an editor looking at File:Blue dot 7px.png to see that there were relevant discussions at File talk:Blue dot 7px.gif. I fixed the problem by moving File talk:Blue dot 7px.gif to File talk:Blue dot 7px.png. I propose that all such pages leftover from file format conversions similarly be moved to match the name of the page in the new format. This seems like a bot task, something like this:

For all pages in the File talk namespace:

  1. Skip if the page is a redirect
  2. Skip if {{SUBJECTPAGENAME}} is not a redirect.
  3. Skip if {{PAGENAME}} doesn't end in ".gif"
  4. Skip if {{SUBJECTPAGENAME}} doesn't redirect to SUBJECTPAGENAME.sub(/\.gif$/, ".png")
  5. Log and skip if a page already exists named PAGENAME.sub(/\.gif$/, ".png")
  6. Move to PAGENAME.sub(/\.gif$/, ".png")

An analogous procedure could be followed for other file format conversions.
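The six steps above can be sketched in Python as a dry run. This is only an illustration under stated assumptions, not a working bot: the wiki is modelled as a plain dict mapping each existing page title to its redirect target (or None if the page is not a redirect), and the function name plan_moves is made up. A real bot would read that information through a client library such as Pywikibot and perform the moves itself.

```python
import re

GIF_SUFFIX = re.compile(r"\.gif$")
TALK_PREFIX = "File talk:"
FILE_PREFIX = "File:"


def plan_moves(pages):
    """Return (moves, conflicts) for orphaned .gif talk pages.

    pages: dict of existing page title -> redirect target, or None when
    the page is not a redirect (a hypothetical in-memory model).
    """
    moves, conflicts = [], []
    for title, target in sorted(pages.items()):
        if not title.startswith(TALK_PREFIX):
            continue
        if target is not None:  # step 1: the talk page itself is a redirect
            continue
        subject = FILE_PREFIX + title[len(TALK_PREFIX):]
        subject_target = pages.get(subject)
        if subject_target is None:  # step 2: subject missing or not a redirect
            continue
        if not GIF_SUFFIX.search(title):  # step 3: not a .gif talk page
            continue
        if subject_target != GIF_SUFFIX.sub(".png", subject):  # step 4
            continue
        new_title = GIF_SUFFIX.sub(".png", title)
        if new_title in pages:  # step 5: destination exists, log and skip
            conflicts.append((title, new_title))
            continue
        moves.append((title, new_title))  # step 6: safe to move
    return moves, conflicts


# The situation described above, before the manual move.
example = {
    "File:Blue dot 7px.gif": "File:Blue dot 7px.png",
    "File:Blue dot 7px.png": None,
    "File talk:Blue dot 7px.gif": None,
}
moves, conflicts = plan_moves(example)
print(moves)
print(conflicts)
```

Keeping the planning separate from the actual moves means the list can be reviewed before anything is changed on the wiki.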

Questions:

  1. Is there consensus for these moves? Should a talk page persist through file format conversions and associated renaming? To me, the answer is clearly yes, as I regard these as essentially versions of the same file even if the two files co-existed on Commons at one point. Any discussion on the old file format is highly likely to be relevant to the converted file.
  2. Would someone volunteer to write a script to perform this task?

Daask (talk) 16:08, 29 July 2024 (UTC)[reply]

Hi, We should have redirects from one file extension to another. This is a source for problems. Yann (talk) 20:47, 29 July 2024 (UTC)[reply]

Tech News: 2024-31

MediaWiki message delivery 23:07, 29 July 2024 (UTC)[reply]

It looks like something went wrong in Nasrumikailkabira. This is technically a gallery page, but should be the talk page of User:Nasrumikailkabira. Can that please be fixed? JopkeB (talk) 05:31, 31 July 2024 (UTC)[reply]

✓ Done fixed. Also move protected the user talk page to autopatrollers to prevent this from happening again and gave warning. —Matrix(!) {user - talk? - uselesscontributions} 05:36, 31 July 2024 (UTC)[reply]