Wikipedia:Link rot/URL change requests

Source: Wikipedia, the free encyclopedia.

This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These include WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.

Old nextbestpicture.com links

Hello, please change all links (in the main namespace) of the form http://www.nextbestpicture.com/2/post/2020/12/the-2020-indiana-film-journalists-association-ifja-winners.html to https://nextbestpicture.com/the-2020-indiana-film-journalists-association-ifja-winners/ (i.e. everything between the first slash after the domain name and the last one in the link should be removed, the ".html" should be replaced with a slash, and HTTP should be changed to HTTPS). Lots of these links seem to be marked as dead by InternetArchiveBot, including at Clarke Peters (where I noticed this and fixed it manually) and On the Rocks (film). Thanks! Graham87 (talk) 07:06, 12 December 2023 (UTC)[reply]
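The rewrite rule requested above can be sketched as follows. This is a minimal illustration, not the bot's actual code; the function name `migrate_nextbestpicture` is hypothetical, and the sketch does not check whether the resulting page is live.

```python
from typing import Optional
from urllib.parse import urlsplit


def migrate_nextbestpicture(url: str) -> Optional[str]:
    """Rewrite an old nextbestpicture.com ".html" link to the new scheme.

    Returns None when the URL does not match the old pattern.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host != "nextbestpicture.com" or not parts.path.endswith(".html"):
        return None
    # Keep only the text after the last slash, without the ".html" suffix.
    slug = parts.path.rsplit("/", 1)[-1][: -len(".html")]
    if not slug:
        return None
    # New scheme: HTTPS, no "www.", slug followed by a trailing slash.
    return f"https://nextbestpicture.com/{slug}/"
```

Applied to the example in the request, this returns https://nextbestpicture.com/the-2020-indiana-film-journalists-association-ifja-winners/.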

No problem I'll get to it, thanks. Anything marked dead will be restored to live, if it tests live. I'll keep the old archive URL in place, unless you want to delete it, or, replace it with an archive to the new URL. -- GreenC 04:19, 13 December 2023 (UTC)[reply]

Graham87: here you go Special:Diff/1186100009/1190645424. Good find. It edited over 500 pages and fixed many cites. It was difficult: they use a bot blocker, which is why the Wayback Machine and IABot had trouble. I had a solution for it and was able to verify that the new links work; in a few cases it required an archive URL. -- GreenC 02:43, 19 December 2023 (UTC)

The formatting of exoplanet.eu catalog entries has changed recently, so that all entries now have a numeric ID (e.g. 1261 for Kepler-62f). The previous format (which had the planet name alone) still soft-redirects to the correct target, but older links using a previous format need to be corrected by hand. –LaundryPizza03 (d) 01:29, 15 December 2023 (UTC)[reply]

User:LaundryPizza03: Is there an example of an old link, and its corresponding new link? -- GreenC 04:08, 15 December 2023 (UTC)[reply]

@GreenC: In this example, the former URL was https://exoplanet.eu/catalog/kepler-62_f/, and is now https://exoplanet.eu/catalog/kepler_62_f--1261/. –LaundryPizza03 (d) 04:10, 15 December 2023 (UTC)[reply]
I'd suggest consulting Linksearch for example pages, and examples of the older format that is now a hard 404. 55 Cancri b is an example; the URL http://exoplanet.eu/planet.php?p1=55+Cnc&p2=b is linked; the old URL format had https://exoplanet.eu/catalog/55_cnc_b/, and the current DB page for this planet is at https://exoplanet.eu/catalog/55_cnc_b--25/. Note that host stars are no longer directly accessible in the database; information about them can be accessed through the entries about their planets.
–LaundryPizza03 (d) 04:18, 15 December 2023 (UTC)

I see "kepler-62" (dash) is now "kepler_62" (underscore). It might be possible to convert ?p1=55+Cnc&p2=b to 55_cnc_b, then load the page https://exoplanet.eu/catalog/55_cnc_b/ and extract the new URL from the HTML. As you suggest, I'll take a look at the linksearch and see how homogeneous the links are. I won't get to this immediately. -- GreenC 04:35, 15 December 2023 (UTC)
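A rough sketch of the first step described above, turning the planet.php query string into an intermediate catalog URL. The function name is hypothetical, and the sketch assumes that only whitespace becomes an underscore; how dashes in names should be handled is not settled in this thread. Fetching the page and extracting the final URL with the numeric ID is left out.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def legacy_planet_url_to_catalog(url: str) -> Optional[str]:
    """Build the intermediate /catalog/ URL from a planet.php?p1=...&p2=... link."""
    parts = urlsplit(url)
    if not parts.path.endswith("/planet.php"):
        return None
    query = parse_qs(parts.query)  # "+" in the query decodes to a space
    if "p1" not in query or "p2" not in query:
        return None
    name = f"{query['p1'][0]} {query['p2'][0]}".strip().lower()
    # Assumption: runs of whitespace map to a single underscore.
    slug = re.sub(r"\s+", "_", name)
    return f"https://exoplanet.eu/catalog/{slug}/"
```

For the 55 Cancri b link quoted above, this yields https://exoplanet.eu/catalog/55_cnc_b/, which would then need to be loaded to find the current URL.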

User:LaundryPizza03: Seeing a lot of links like this. I added an archive URL because the source link is dead. I'd prefer to convert them to the new /catalog url scheme, but there is no way to link to a star, only planets, like this. Am I missing something? What do you recommend for URLs with star.php?st= -- GreenC 19:02, 21 December 2023 (UTC)[reply]

The only thing I can figure out, on the Catalog page https://exoplanet.eu/catalog enter star_name="HD 5319" then click "Apply filter" it brings up a list of planets. However, there is no way to link to this search result. Only a person manually entering the star name can find it, there is no API or mechanism for automated use. -- GreenC 19:21, 21 December 2023 (UTC)[reply]
@GreenC: I'd suggest deleting all of those links. You can still convert the older-format planet links as you described. –LaundryPizza03 (d) 05:25, 22 December 2023 (UTC)[reply]
For example the many exoplanet.eu star links in List of exoplanets discovered by the Kepler space telescope: 1–500 which look useful to verify data. Someone might object, why the cites are being deleted, since the archive URLs work and verify. -- GreenC 06:46, 22 December 2023 (UTC)[reply]
Try obtaining archives for the links that aren't already archived. –LaundryPizza03 (d) 07:21, 22 December 2023 (UTC)[reply]
Yes the bot will add archives for dead links: Special:Diff/1143718768/1191219614. I am going slow because there are errors in the data showing up in the logs that require manual fixes. For example this planet Special:Diff/1168566545/1191217938 has been renamed, but the article name still had the old name. Similar example Special:Diff/1188022306/1191211568. Or syntax errors, Special:Diff/1188040396/1191199379. -- GreenC 17:07, 22 December 2023 (UTC)[reply]
User:LaundryPizza03 - this iteration is done. It edited 694 pages, out of 705 checked. It converted the star system links to archive URLs. The planet links are mostly converted. I noticed late in the process that it wasn't converting planet links that already had an archive URL and were otherwise dead links; they need manual checking. Probably something changed with the planet, like its name or existence. It should be possible to find most of them in the catalog with some time and searching.
Also, I was unaware of {{Cite EPE}}. Over time, individual pages at the site will stop working, and the standard link rot tools won't detect or fix them, when the links are abstracted behind a custom external link template. I suppose it's possible the template could be useful if the entire site changes structure, but most likely the data in the template won't be sufficient to accommodate the new URL scheme. Thus at best the template makes adding a link a little quicker, and more uniform looking, but at the cost of increased link rot and challenges down the road when the URL scheme changes. I've always thought standard cite templates are the best way to go because there are so many tools that support them. -- GreenC 02:51, 23 December 2023 (UTC)[reply]

International Meteorological Organization

Hello. I notice that after clicking on this IMO link, it says the website moved to a new url and the old one will be available until this month. Looking through the IMO links on Wikipedia, some formats can be swapped over already:

There are other ones that aren't in these three categories and that I don't see on the new website. Here are some examples. I was wondering if the old public.wmo.int links could be changed to the new wmo.int links where possible, and the broken public.wmo.int links with no new URL could be archived. There are 436 links to go through. Thanks! MrLinkinPark333 (talk) 00:29, 17 December 2023 (UTC)

Fortunately you found this in time. I'll prioritize it. If the public-old site goes offline it will be a lot harder to migrate. -- GreenC 01:34, 17 December 2023 (UTC)[reply]

MrLinkinPark333: Here is what I did: migrate links where possible, as you discovered above like with Press releases, simply by changing the URL. This method only worked for some, the new site doesn't have all the pages from the old site. Thus, anything it couldn't find at the new site, it converted to public-old.wmo.int to bypass the information page that says the link is doomed. Then it saved a copy of the public-old.wmo.int link to the Wayback Machine. Then it added those Wayback links into the citation as archive URLs with url-status of dead (soon dead). I think this method saved the most content from imminent destruction. At some point later, once the new site is working, I can make more changes if you see ways to convert the public-old.wmo.int links to the new site at wmo.int. There are 195 public-old links in 160 articles. -- GreenC 19:14, 18 December 2023 (UTC)[reply]
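The decision flow described above might look roughly like this. It is only a sketch: `new_url_for` and `is_live` are hypothetical caller-supplied functions (the actual old-to-new mapping rules are not spelled out in this thread), and saving to the Wayback Machine is represented only by the returned action label.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def to_public_old(url: str) -> str:
    """Point a public.wmo.int link at public-old.wmo.int, keeping path and query."""
    parts = urlsplit(url)
    if parts.netloc.lower() == "public.wmo.int":
        parts = parts._replace(netloc="public-old.wmo.int")
    return parts.geturl()


def plan_wmo_link(
    url: str,
    new_url_for: Callable[[str], Optional[str]],  # hypothetical mapping to wmo.int
    is_live: Callable[[str], bool],               # hypothetical liveness check
) -> Tuple[str, str]:
    """Return ("migrate", new_url) or ("archive", public_old_url)."""
    candidate = new_url_for(url)
    if candidate and is_live(candidate):
        return ("migrate", candidate)
    # No working page on the new site: archive the public-old copy instead.
    return ("archive", to_public_old(url))
```

Passing the checks in as functions keeps the planning step testable without network access.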

That works. I can always revisit the links later to see if any can be swapped over. Thanks! MrLinkinPark333 (talk) 19:19, 18 December 2023 (UTC)[reply]

Phineas F. Bresee

Further reading Corbett, C.T. (1958) Our Pioneer Nazarenes. Kansas City, MO.: Nazarene Publishing House. [2][permanent dead link]

This can be corrected by linking to one of the following: https://whdl.org/en/browse/resources/6629 https://nmi.whdl.org/en/browse/resources/6629 https://apnts.whdl.org/en/browse/resources/6629

Thanks! 174.127.124.132 (talk) 07:22, 17 December 2023 (UTC)[reply]

 Done! In the future, the best place to suggest an improvement for a single article (e.g. Phineas F. Bresee) is the article's talk page (e.g. Talk:Phineas F. Bresee). This page is to request an improvement for hundreds or thousands of articles with the same issue. Thanks! GoingBatty (talk) 01:27, 18 December 2023 (UTC)[reply]

Sub-site of a blacklisted website has changed URL

The sub-site "inventors.█████.com" ("about" censored because of wiki filter) now appears to be "thoughtco.com", with references/external links either linking to the same article on the new site, or simply don't work. Apparently there are 150+ articles using the inventors URL (1), & what looks like 500+ external link search results (2), although a significant portion are on talk pages. Silverleaf81 (talk) 09:28, 17 December 2023 (UTC)[reply]

User:Silverleaf81, the site is tricky. They've been excluded from the Wayback Machine link. There are some at Archive.today. However compare that link with the new one at thoughtco, notice the content drift, they've made changes to the content at Thoughtco. So the conservative course is convert them to archive URLs so the original citation verifies. The problem is there may not be complete coverage at archive.today, and the replacement link at thoughtco may not verify the cited fact.
What I can try, convert to archive.today, where possible. When not, leave it alone. Wherever it redirects, that is where it goes, and it will be up to someone to manually figure out if the new page verifies or not. Possibly some year in the future, the Wayback exclusion will be lifted, and those archives become available again. -- GreenC 04:09, 23 December 2023 (UTC)[reply]

User:Silverleaf81: This is done. It got most of them. It added 341 archive.today URLs. A list of about 50 questionables is at Wikipedia:Link_rot/cases/inventors.about.com but not all of them are legitimately a problem. -- GreenC 02:24, 26 December 2023 (UTC)[reply]

runeberg.org finally on https

My website runeberg.org just recently moved from http: to https: so it would be nice if someone could update the remaining 11,000 links accordingly. This is not urgent, as everything works fine with automatic redirects, but it would be nice. Thank you. -- LA2 (talk) 22:57, 17 December 2023 (UTC)[reply]

User:LA2: OK no problem. I got a lot of requests here at the same time other things came up elsewhere. I will get to this with some time, it is the right place/tool to request for this kind of work. I'll ping you when completed. -- GreenC 17:53, 18 December 2023 (UTC)[reply]
User:LA2: runeberg.org (http or https) existed in 6,769 articles. The bot converted each link to https and checked that it returned status 200. Any that didn't got a {{dead link}} tag; the rest are converted to https. There were some typos and non-working links to Google Translate that I fixed manually. List of http runeberg.org links -- GreenC 20:31, 26 December 2023 (UTC)
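The convert-then-verify step could be sketched like this. The name `upgrade_to_https` and the caller-supplied `get_status` function are assumptions for illustration; the bot's real implementation is not shown in this thread.

```python
from typing import Callable, Tuple


def upgrade_to_https(url: str, get_status: Callable[[str], int]) -> Tuple[str, bool]:
    """Return (https_url, needs_dead_link_tag).

    get_status is a hypothetical function returning the HTTP status code
    for a URL; it is injected so the logic can be tested offline.
    """
    if url.startswith("http://"):
        https_url = "https://" + url[len("http://"):]
    else:
        https_url = url
    # Anything other than status 200 is flagged for a {{dead link}} tag.
    return (https_url, get_status(https_url) != 200)
```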
Great! Thanks! --LA2 (talk) 22:20, 27 December 2023 (UTC)[reply]

www.nwt.org is for sale and references to it need attention

It seems that the Episcopal Diocese of North West Texas used the URL www.nwt.org for information about the candidates. That site is now for sale. References to that site, such as at https://en.wikipedia.org/wiki/Scott_Mayer_(bishop) should be corrected/removed. Fr Kevin PJ Coffey, SCP (talk) 16:45, 18 December 2023 (UTC)[reply]

As a three-letter domain, it will probably sell. I added it to the list of domain to be usurped. Special:Diff/1186090244/1190575904 -- GreenC 17:49, 18 December 2023 (UTC)[reply]

Yahoo! Groups

I found many broken links to Yahoo! Groups. Can we find archived copies of these pages? Jarble (talk) 18:19, 18 December 2023 (UTC)[reply]

Looked at a small number through archive.org and seem to have login requirements so may suck a lot of time for little gain. Neils51 (talk) 08:36, 22 December 2023 (UTC)[reply]
Yes some of the hardest objects: soft-404 within soft-404. Like a URL that redirects to a home page (www.yahoo.com) is soft-404 #1. This forces retrieving an archive URL but this also is a soft-404, because it contains a login screen. The solution is to find a different archive provider that has/had the ability to login when making the capture (archive.today) and to build extra soft-404 detection at the second layer specific to the site. This is what I am doing now with good success, but it's taking a while to do discovery what a soft-404 looks like since Yahoo has varieties. -- GreenC 06:19, 27 December 2023 (UTC)[reply]
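A simplified sketch of the two-layer check described above. The marker strings in `LOGIN_MARKERS` are hypothetical placeholders; real detection would need site-specific patterns found through the discovery work mentioned in the comment.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit

# Hypothetical markers; actual login screens vary and need per-site discovery.
LOGIN_MARKERS = ("sign in", "log in")


def redirected_to_homepage(requested_url: str, final_url: str) -> bool:
    """Layer 1: a deep link that ends up at a bare home page is a soft-404."""
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    return (
        requested.path.strip("/") != ""
        and final.path.strip("/") == ""
        and not final.query
    )


def looks_like_login_wall(html: str, markers: Sequence[str] = LOGIN_MARKERS) -> bool:
    """Layer 2: an archived capture showing a login screen is also a soft-404."""
    text = html.lower()
    return any(marker in text for marker in markers)


def choose_action(requested_url: str, final_url: str, archive_html: Optional[str]) -> str:
    if not redirected_to_homepage(requested_url, final_url):
        return "keep live link"
    if archive_html is not None and not looks_like_login_wall(archive_html):
        return "use archive"
    return "tag dead link"
```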

Jarble: The bot added 1,474 new archive URLs. I limited it to only adding archive.today because it has the best coverage for this site, Wayback had trouble making good saves due to logins and cookies. There were 115 it couldn't find and added a {{dead link}}. Also added the archives to IABot's database so these updates will propagate to over 300 other wikis. -- GreenC 04:48, 28 December 2023 (UTC)[reply]

ATSDR migrations

Many links from http://www.atsdr.cdc.gov have been migrated to https://atsdr.cdc.gov or https://wwwn.cdc.gov, which has broken a lot of links. Some automated attempts to archive the pages have resulted in archives of 404 errors at this page. I noticed this on Health effects of radon, and unfortunately the IDs on a lot of these pages ("ToxFAQs") have no relation to the new, identical pages on the HTTPS websites. Additionally, some articles like Peninsula Extension refer to Public Health Assessments, which need to be found in an archived page since the files have been deleted and are only available by email request. Reconrabbit (talk|edits) 18:38, 19 December 2023 (UTC)[reply]

Quick note: it looks like many of the .pdf links are still intact, but .htm / .html links need to be archived. Not priority since this has been the case for at least 5 years Reconrabbit (talk|edits) 22:15, 20 December 2023 (UTC)[reply]

User:Reconrabbit: I can see why this has gone unaddressed for so long: it's complicated. I can't promise everything is perfect, but almost everything that is dead now has an archive URL. They use JavaScript redirects, which gave bots trouble, thus the bad archive URLs. I checked the existing archive URLs for soft-404s; this is imperfect, but it did find and replace a few: Special:Diff/1190591816/1192546009. I fixed a few of the ToxFAQ links by manually looking them up: Special:Diff/1189670705/1192547048. But most were simply archived: Special:Diff/1121144402/1192546200. If you want to create a map of old -> new, the bot can use that to make changes on-wiki.

The http links existed in about 350 articles. The bot edited 211 pages. I think the difference is the links were already archived, or working such as the PDFs. It added 141 new archive URLs. And it made 127 redirect moves: Special:Diff/1154065478/1192545155 Hope that helps. -- GreenC 00:05, 30 December 2023 (UTC)[reply]

Thank you. It looks like a good number of the redirect moves don't go directly to the toxin in question, but that's fine, since it directs someone right to the ToxFAQs homepage with an alphabetical directory; shouldn't be too much of an ask for a reader to find the appropriate page from there. Reconrabbit 01:17, 30 December 2023 (UTC)[reply]
Yes those cases are not so bad. It's the ones that have tfacts##.html that would benefit from a mapping of old to new like Special:Diff/1121144402/1192546200 is not so great, but this is good Special:Diff/1189670705/1192547048 where I manually found the new link and programmed it into the bot. It was just too time consuming. If you want to map the tfact's I'll add them to the bot. A list of 31 old URLs, the index page for the new URLs. Can look it up based on the context of the cite eg. the first one in the article Benzene would look up "Benzene" at the index page and that is the new URL. -- GreenC 02:01, 30 December 2023 (UTC)[reply]
I tried out a method on a couple of the links, and found that it seems to work for pretty much every one: Replacing /tfactsXX.html with /toxfaqs/tfactsXX.pdf provides a contemporary PDF file for the item in question in every instance I tried. Ex: the archived link for Benzene, the live PDF Benzene ToxFAQ. Reconrabbit 02:22, 30 December 2023 (UTC)[reply]
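The substitution described above can be written as a single regular expression. This is a sketch only: the function name is hypothetical, the example URL in the usage note is illustrative, and the scheme and host are left untouched because the thread does not say which host the PDFs should use.

```python
import re
from typing import Optional

# Matches a trailing "/tfactsNN.html" path segment.
TFACTS_HTML = re.compile(r"/tfacts(\d+)\.html$")


def tfacts_html_to_pdf(url: str) -> Optional[str]:
    """Replace /tfactsXX.html with /toxfaqs/tfactsXX.pdf, or return None."""
    new_url, count = TFACTS_HTML.subn(r"/toxfaqs/tfacts\1.pdf", url)
    return new_url if count else None
```

For instance, an illustrative input ending in /tfacts3.html would come back ending in /toxfaqs/tfacts3.pdf; the result should still be checked for a live response before being written to an article.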
Excellent discovery. Converting: Special:Diff/1192545947/1192569315 -- GreenC 02:43, 30 December 2023 (UTC)[reply]

Gospel Music Hall of Fame

Hello. The old url for the Gospel Music Hall of Fame looks to be usurped. The new URL was working at least until September 2023. Not sure which solution is better: 1) convert the old link to new links and use archive URLs 2) use archived URLs for both old and new links. Luckily, with the two URLS there's less than 100 links to work through. Thanks! MrLinkinPark333 (talk) 19:57, 19 December 2023 (UTC)[reply]

Update: the new url is working today. Taking a look at the URLs, some of them are easier to change over than others:
  • /site/ with name: this to that
  • /speaker-lineup/ with name can be converted like /site/:
  • Any with ID numbers would need manual converting. I.e. this to that
  • Any with years could be manually converted to individual bios. E.g. this to that. However, 2000 is used in 2 articles.
  • There's other exceptions where either the new URL is blank or the URL is in a slightly different order. I think an archived copy of Bartlett's old URL would be more useful as his article at Eugene Monroe Bartlett is referencing more than his year of induction.

Would this work or is there a simpler solution? Thanks! --MrLinkinPark333 (talk) 02:50, 29 December 2023 (UTC)

User:MrLinkinPark333 - gmahalloffame.org is in 30 mainspace articles. I can convert where possible using the two rules you found, and for the manual ones, I'll change to archive URLs. If you want to manually repair them, I'll provide the list of the articles/URLs which were converted to archive URLs. It will also check for the string "Biography coming soon" and treat those pages as dead. And I'll check what else might come up in the logs like soft404 redirects to the home page. -- GreenC 17:12, 31 December 2023 (UTC)[reply]
Since it's a small list, I could fix whatever didn't get converted over. Thanks! MrLinkinPark333 (talk) 18:38, 31 December 2023 (UTC)[reply]
The bot only edited 15 pages. You can check two places: Special:Contributions/GreenC_bot (ending at Dolly Parton) and Search of gmahalloffame.org. Of the edits, most were adding archive URLs. The pages it didn't edit mostly already had archive URLs, and with no available replacement page there was nothing it could do. -- GreenC 19:58, 31 December 2023 (UTC)
Thank you for the quick reply! MrLinkinPark333 (talk) 21:11, 31 December 2023 (UTC)[reply]

Ilta-Sanomat

Around 346 articles (full list including the ones that already use archived URLs) have URLs to the Finnish newspaper Ilta-Sanomat's website http://www.iltasanomat.fi/ that now redirects to the main page https://www.is.fi/

It seems that URLs that have an ID starting with the numbers 200000 can be fixed by simply changing "iltasanomat" to "is", e.g.:

(I also changed to HTTPS in those examples)

But the URLs with IDs starting with the number 1 or URLs with completely different patterns can't be fixed by changing "iltasanomat" to "is", e.g.: ("Sivua ei löydy" is Finnish for "Page not found")


So, what would be the optimal way of fixing these?

  • A) Setting an archive link to all of them?
  • B) Changing the ones starting with "200000" from "iltasanomat" to "is" and setting an archive link to others?
  • C) ...?

Also, there are thousands of articles with the same issue on fi.wikipedia, so helping that project too would be much appreciated. 85.76.13.79 (talk) 15:12, 20 December 2023 (UTC)[reply]
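Option B could be sketched as below. The example links from this request did not survive, so the URL shape used here is an assumption: the sketch treats the first run of six or more digits in the path as the article ID. The function name and the rule for locating the ID are hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: the article ID is the first run of 6+ digits in the path.
ID_RUN = re.compile(r"\d{6,}")


def plan_iltasanomat_link(url: str) -> Optional[Tuple[str, str]]:
    """Return ("migrate", new_url), ("archive", url), or None for other domains."""
    parts = urlsplit(url)
    if not parts.netloc.lower().endswith("iltasanomat.fi"):
        return None
    match = ID_RUN.search(parts.path)
    if match and match.group().startswith("200000"):
        # Option B: swap the domain to is.fi and upgrade to HTTPS.
        migrated = parts._replace(scheme="https", netloc="www.is.fi")
        return ("migrate", migrated.geturl())
    # Everything else has no known new location: use an archive link.
    return ("archive", url)
```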

I checked around for redirect information such as in the Wayback Machine or in headers and can't find anything, so there is no map how to move the non-20000 links. The 20000 links can be moved. Thus, solution "B" for enwiki. For fiwiki, unfortunately my bot is not configured to work with Finnish citation templates. However I can change the entire domain to "permadead" in the IABot settings, this will inform IABot to convert every iltasanomat.fi link on 300+ wikis to an archive URL. -- GreenC 20:54, 1 January 2024 (UTC)[reply]
Ok, plan B for en-wiki and changing iltasanomat.fi to permadead for other wikis sounds good. Thank you in advance. (Original poster). 2001:14BA:9C98:7100:C993:D281:D619:D802 (talk) 15:48, 2 January 2024 (UTC)[reply]
Results: 487 pages contain the domain. Checked each and made changes in 378 pages (some already had archive URLs). Converted 163 URLs of the -20000 type, added 320 new archive URLs, added 12 {{dead link}}, changed 12 |url-status=live to dead. Uploaded results (archive URLs) to IABot, and changed the domain to "permadead" so it will propagate on other wikis. IABot has recorded over 6,000 unique URLs. -- GreenC 20:20, 2 January 2024 (UTC)[reply]

A Note About this Forum

This forum is getting a lot of requests recently. The requests can take a lot of work, 1-7 days each depending on the complexity: custom programming, data discovery, running test cases, qualifying results, designing algorithms, waiting for the bot to run (slow due to networking), etc. Furthermore, my time to do this work is limited! If you make a request, and time goes by, that is why. I wish there was a way to boilerplate it, and I have generalized the code as much as possible, but ultimately this work is bespoke and artistic in nature due to the endless variety of conditions at remote sites. I try to respond to requests in chronological order, except when a site needs to be triaged due to imminent outage, has an extremely large footprint, or can be addressed quickly; in those cases I might respond before some others. -- GreenC 20:10, 20 December 2023 (UTC)

No worries! Take all the time you need :) MrLinkinPark333 (talk) 00:54, 22 December 2023 (UTC)[reply]
I know that recruitment is a difficult task but I really wish areas of technical maintenance like this weren't so often left to 1-3 editors. Thank you for your work, and don't rush things too much. Mach61 (talk) 22:18, 23 December 2023 (UTC)[reply]

www.smallsrecords.com

The

Draft:Chris Byars once I get off my school laptop (which blocks IA). Cheers, Mach61 (talk) 22:14, 23 December 2023 (UTC)

NVM only a few pages link to it Mach61 (talk) 22:20, 23 December 2023 (UTC)[reply]

IPA Fonts

According to this archived link, the IPA fonts were transferred from IPA to the Character Information Technology Promotion Council, who now host the fonts on their website. Citation 14 should link to https://moji.or.jp/mojikiban/font/ and Citations 13 and 22 (which is a dead link) should be https://moji.or.jp/ipafont/.

(Apologies if this is the wrong place for this. I'm new to editing and I didn't want to mess up the citation.) Ichneumonidae (talk) 18:25, 26 December 2023 (UTC)[reply]

Sorry, I should have said this is about the article List of CJK fonts! Ichneumonidae (talk) 18:26, 26 December 2023 (UTC)[reply]
 Done: This page is for requesting changes that might affect hundreds or thousands of pages - you can check if the URL that changed is on a lot of pages by using Special:LinkSearch. If it's only affecting one article (I just checked, and it looks like these specific dead links are only present on List of CJK fonts and Mona (font)), the best place to suggest improvements is on the Talk page for that article. Thanks. Reconrabbit (talk|edits) 19:16, 26 December 2023 (UTC)[reply]

Space Launch Report

The website www.spacelaunchreport.com was cited extensively in many spaceflight articles and now has been usurped by an adware site of some sort. Could all of these links please be archived? Example link http://www.spacelaunchreport.com/falcon9ft.html#f9stglog from List of Falcon 9 first-stage boosters. Ergzay (talk) 10:02, 27 December 2023 (UTC)[reply]

As a further note, to ensure this isn't a waste of anyone's time. When searching for how many pages use the link I hit the error "A warning has occurred while searching: The regex search timed out, so only partial results are available. Try simplifying your regular expression to get complete results." so this should be a very good candidate for mass replacement. Ergzay (talk) 01:57, 28 December 2023 (UTC)[reply]
WP:JUDI. I process the domains in batches. It is added to the queue: Special:Diff/1190914504/1192203117. When regex searching, the recommended method is: insource:spacelaunchreport insource:/spacelaunchreport.com/. The first insource does a broad non-regex search; the second insource does a regex search within the results of the first search only. Since regex is so expensive, this narrows the search before doing the regex. -- GreenC 05:01, 28 December 2023 (UTC)
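For anyone building such queries for several domains, a tiny helper might look like this. The function name is hypothetical, and the sketch assumes a simple domain such as name.tld, using the label before the TLD as the plain-text keyword. The dot is left unescaped to match the query quoted above; in a regex it matches any character, which only makes the search slightly broader.

```python
def build_insource_query(domain: str) -> str:
    """Build a two-stage search: cheap keyword filter first, then the regex."""
    domain = domain.lower()
    labels = domain.split(".")
    # Assumption: for "name.tld" the keyword is the label before the TLD.
    keyword = labels[-2] if len(labels) >= 2 else labels[0]
    return f"insource:{keyword} insource:/{domain}/"
```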

bird-stamps.org

Domain bird-stamps.org has been usurped and redirects to the home page. Link search shows about 275 articles with such links; a relative handful of these have been updated with archive links. Fabrickator (talk) 08:51, 31 December 2023 (UTC)

A WP:JUDI gambling site. Added to queue: Special:Diff/1193111754/1193243552 -- GreenC 20:26, 2 January 2024 (UTC)

Memória Globo

Most Memória Globo links are dead (like https://memoriaglobo.globo.com/programas/entretenimento/novelas/zaza.htm), there are more on Portuguese Wikipedia. Notrealname1234 (talk) 18:06, 31 December 2023 (UTC)[reply]

User:Notrealname1234: There are some working URLs, e.g. [1]. I'll check each; I can't set them all dead. Portuguese Wikipedia has its own archive bots and archive provider; it's one of a few sites where IABot is unable to run, and my bot can't run anywhere but Enwiki. -- GreenC 20:34, 2 January 2024 (UTC)
It's done. Edited 144 pages, added 243 archive URLs, 7 {{dead link}}, moved 114 URLs to a new URL (redirects), updated IABot. -- GreenC 01:33, 3 January 2024 (UTC)[reply]

www.amjbot.org

We have hundreds of links to URLs like http://www.amjbot.org/content/96/3/668.full, which just serve an HTTP 404 in response. They can simply be removed, if they're in the URL parameter of a citation template with a DOI (which leads to the real current location of the current publisher's version). Nemo 16:36, 1 January 2024 (UTC)[reply]

User:Nemo_bis:
-- GreenC 17:48, 3 January 2024 (UTC)[reply]
Nice! Thanks, Nemo 16:27, 4 January 2024 (UTC)[reply]

ir.uiowa.edu

This repository was retired and its contents went in various directions, including pubs.lib.uiowa.edu and scholarworks.wmich.edu. The domain currently serves TLS errors, while at some point it seemed to redirect all requests to an unrelated frontpage. URLs can be replaced where an OA copy is available, but as a first step it's ok to just remove all links in cite journal templates where a DOI is present. Nemo 13:34, 6 January 2024 (UTC)[reply]

User:Nemo_bis is "OA" -> "IA"? Otherwise I don't know what OA means. If it is IA, the example diff [2] shows the migration of ir.uiowa.edu -> pubs.lib.uiowa.edu .. are you suggesting using IA snapshots to find the redirect? Unfortunately it doesn't look like IA saved the correct redirect information. [3] Is there some place else to obtain the new URL? -- GreenC 17:32, 6 January 2024 (UTC)[reply]
No, OA as in open access. Citation bot will add the OA links later if the broken links are removed. I was only asking about the removal, sorry. Nemo 15:51, 7 January 2024 (UTC)

User:Nemo_bis, there are 418 pages with the domain. For all cite journal with a doi: A) In 132 citations removed the URL Special:Diff/1137009702/1194866745. B) In another 84 there was a working redirect migrated Special:Diff/1184196199/1194866750. For everything else not a cite journal with a doi: C) Added 198 archive URLs Special:Diff/1186059609/1194876255. Migrated 54 redirects same as B). And D) added 8 {{dead link}} Special:Diff/1173723334/1194876457. -- GreenC 05:39, 11 January 2024 (UTC)[reply]

Outstanding! I thought figuring out the redirects would be too much work (some go to a Primo frontpage). Nemo 21:21, 12 January 2024 (UTC)[reply]
I can usually catch those that redirect to the same place, by the nature of the same destination URL showing up multiple times in the logs, during a trial-run. I add a trap for them in the code to treat those redirects as dead links, and rerun it again. Almost every domain has this problem, to some degree. It's hard to fully automate but I have as much as possible. -- GreenC 22:27, 12 January 2024 (UTC)[reply]
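The trap described above, spotting destinations that many different source URLs redirect to, can be sketched with a simple grouping pass over the trial-run log. The log format (pairs of source and destination URLs), the function names, and the `min_sources` threshold are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_shared_redirect_targets(
    redirect_log: Iterable[Tuple[str, str]], min_sources: int = 3
) -> Set[str]:
    """Return destinations reached from at least min_sources distinct source URLs."""
    sources_by_target: Dict[str, Set[str]] = defaultdict(set)
    for source, target in redirect_log:
        if source != target:
            sources_by_target[target].add(source)
    return {t for t, s in sources_by_target.items() if len(s) >= min_sources}


def classify_redirects(
    redirect_log: Iterable[Tuple[str, str]], traps: Set[str]
) -> Dict[str, str]:
    """Treat redirects into a trap as dead links; migrate the rest."""
    return {
        source: ("dead" if target in traps else "migrate")
        for source, target in redirect_log
    }
```

A front page that dozens of unrelated URLs redirect to shows up in `find_shared_redirect_targets`, and a rerun with those traps classifies the affected links as dead instead of migrating them.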
Cool! Makes sense. Nemo 07:25, 14 January 2024 (UTC)[reply]
Is there any way to get a list of where these changes were made? I have been correcting all the links as I have time. None of them should be dead and all have live content somewhere, most should be using a DOI (which I have been adding) 1920wr (talk) 16:38, 17 January 2024 (UTC)[reply]
1920wr, Yes. I could provide a list of the article names for set C), but it would miss pre-existing archive URLs. It's probably better to find them with this search: 196 articles. For set D) that's hard to search for, rather, here are the 8 the bot added a {{dead link}}: Victor L. Littig,Jonathan Blum (writer, born 1967),John Herriott,R. Douglas Hurt,Second plague pandemic,Mayors of Sioux City, Iowa,List of school districts in Iowa,Christopher B. Krebs .. good luck with this project it would be great to see them converted to cite journal with DOI, a major improvement for this domain. If you think there is something I can help with bot let me know. -- GreenC 21:15, 17 January 2024 (UTC)[reply]

ebooks.adelaide.edu.au (404)

460 pages. "eBooks@Adelaide has now officially closed", January 7, 2020. There is no copy or replacement site. Prior to 2014 it was http://etext.library.adelaide.edu.au (same paths).

  1. If path contains ".html" then convert to an archive URL
  2. If path contains 4 elements and ends in "/" eg. http://ebooks.adelaide.edu.au/k/kant/immanuel/k16p/ then add "complete.html" and convert to archive URL ie. http://ebooks.adelaide.edu.au/k/kant/immanuel/k16p/complete.html -> https://web.archive.org/web/20110309070433/http://ebooks.adelaide.edu.au/k/kant/immanuel/k16p/complete.html
  3. If path contains 3 elements and ends in "/" eg. http://ebooks.adelaide.edu.au/m/mill/john_stuart/ convert to archive URL
  4. Exceptions to rule 2 & 3 are Plutarch, Voltaire, etc.. eg. https://ebooks.adelaide.edu.au/p/plutarch/symposiacs/ .. check logs for other exceptions
  5. Optionally where no archive exists, either remove URL from citation or nuke citation if an external link section.
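Rules 1 to 4 above could be sketched as a function that returns the URL to look up at the archive, or None when the link needs manual review. The function name is hypothetical, and `EXCEPTION_AUTHORS` contains only the two names given in rule 4; the "etc." there means the real list is longer and would come from the logs.

```python
from typing import Optional
from urllib.parse import urlsplit

# Only the names mentioned in rule 4; the full list is not known here.
EXCEPTION_AUTHORS = {"plutarch", "voltaire"}


def archive_lookup_url(url: str) -> Optional[str]:
    """Return the URL to request from the archive, or None for manual review."""
    parts = urlsplit(url)
    path = parts.path
    elements = [p for p in path.split("/") if p]
    if ".html" in path:
        return url  # rule 1: archive as-is
    if not path.endswith("/"):
        return None
    if len(elements) >= 2 and elements[1] in EXCEPTION_AUTHORS:
        return None  # rule 4: exceptions are handled by hand
    if len(elements) == 4:
        # rule 2: point at the single-page full text
        return parts._replace(path=path + "complete.html").geturl()
    if len(elements) == 3:
        return url  # rule 3: author index page, archive as-is
    return None
```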

-- GreenC 18:10, 6 January 2024 (UTC)[reply]

Done, saved all but a handful. The existing links were often not to the full text, the archive version didn't follow the chapter tree so the texts were incomplete. I moved many to the "complete.html" version, which is the entire text on a single page, then converted to the archive.org version of that page. Special:Diff/1061289409/1195386287 .. Also, most are 19th century texts, they could be replaced by Gutenberg etc -- GreenC 04:57, 14 January 2024 (UTC)[reply]

oxfordislamicstudies.com

The domain "oxfordislamicstudies.com", referenced in about 400 articles, is returning the "NET::ERR_CERT_COMMON_NAME_INVALID" error.

It seems that in at least some cases, the current content is available at oxfordreference.com. Other possible places to look would be oxcis.ac.eu or perhaps ox.ac.uk. I really have no idea to what extent archive copies of oxfordislamicstudies.com provide any useful content. Fabrickator (talk) 19:05, 7 January 2024 (UTC)[reply]

In the case of http://www.oxfordislamicstudies.com/article/opr/t125/e2280?_hi=2&_pos=2 (non-working link), the archive copy returns useful content, while the oxfordreference.com link provides too little content to likely be of any use. Fabrickator (talk) 19:27, 7 January 2024 (UTC)[reply]
No archived version available at https://fatcat.wiki/release/lookup?doi=10.1093/acref/9780195165203.001.0001 yet either. Were they all HTML pages only or was there a PDF somewhere? Nemo 20:42, 7 January 2024 (UTC)[reply]
The book itself is archived (example). Nemo 20:43, 7 January 2024 (UTC)[reply]
According to [4]: "Oxford Islamic Studies Online product site has been retired. Content you previously purchased on Oxford Islamic Studies Online has now moved to Oxford Reference, Oxford Handbooks Online, or What Everyone Needs to Know." They are paywall sites and there is no redirect map. The Wayback links will probably be better, worth a try. -- GreenC 05:23, 14 January 2024 (UTC)[reply]

Fabrickator: In 317 articles, I added 413 new archive URLs, 19 {{dead link}}, and changed 106 |url-status=live to dead. -- GreenC 22:30, 15 January 2024 (UTC)[reply]

now Malware: myetymology.com

There are at least fifty uses of "www.myetymology (dot) com" on en.wiki [5], both bare URLs and in Cite templates. This domain seems to have some tricky malware scheme on it: visited via a Chrome browser, it shows a page with the Chrome logo and text saying that you have to verify you're human and should click "Allow". Via a Firefox browser, it puts up a grayed-out dummy page with a white dialog-box-like splash area saying "Before you continue to myetymology.com", blather about security and "download Firefox add-on", and a single button labeled "continue". It does tricky stuff too: when I switched away from the Chrome window to invoke the snip-it utility to capture it, it changed the display so that it showed a duckduckgo search for "!ducky" (a search engine I don't use). The domain has definitely been usurped, is very likely dangerous, and needs to be eradicated from Wikipedia. -- R. S. Shaw (talk) 04:13, 10 January 2024 (UTC)[reply]

WP:JUDI queue for usurpation Special:Diff/1193243552/1195955910 -- GreenC 22:35, 15 January 2024 (UTC)[reply]

The website has undergone a total revamp, including a change of URL from lawfareblog.com to lawfaremedia.org.

OLD: https://www.lawfareblog.com/
NEW: https://www.lawfaremedia.org/

Valjean (talk) (PING me) 16:31, 10 January 2024 (UTC)[reply]

CloudFlare DDoS mitigation blocking the bot, but resolved. About 408 URLs changed Special:Diff/1187555382/1196040749, another 18 moved the archive URLs and modified |url-status= Special:Diff/1177313295/1196042453. regards -- GreenC 04:34, 16 January 2024 (UTC)[reply]
Thanks! -- Valjean (talk) (PING me) 05:23, 16 January 2024 (UTC)[reply]

2002 Winter Olympics torch relay broken archive links

Hello. Both 2002 Winter Olympics and 2002 Winter Olympics torch relay use this archive URL but it does not work. Instead it redirects to the Wayback Machine and has a question mark in the URL. Looking at old archived copies of this link, none of the 2001 and 2002 versions work despite being highlighted in blue. Some of the 2002 archived copies redirect to a blank page. I was wondering why this was the case. Thanks! MrLinkinPark333 (talk) 20:50, 16 January 2024 (UTC)[reply]

I reported it, but cannot guarantee it will get resolved. I looked in various places and ways and cannot find a working replacement for this archive. It's an old site (by Internet standards) and went dead within a few years of creation. Thanks for the report. -- GreenC 22:01, 16 January 2024 (UTC)[reply]
No worries! It does make me wonder if any other archived URLs used on Wikipedia instead redirect to the Wayback Machine and put a question mark into the URL. This is the first time that has happened to me. MrLinkinPark333 (talk) 00:49, 17 January 2024 (UTC)[reply]
There is link rot within the Wayback Machine itself. My bot WaybackMedic was made (and named) for that purpose, but it takes so long now to check every archive URL, due to the volume, it's not feasible to run it that way anymore. When we started in 2015 there were around 600k archive URLs on enwiki, now there are nearly 12 million and adding about 200k a month. -- GreenC 01:35, 17 January 2024 (UTC)[reply]
Ah. I wasn't aware of the issues with the Wayback Machine. Hopefully this is a limited issue. MrLinkinPark333 (talk) 02:25, 17 January 2024 (UTC)[reply]
Yes I believe it's a very small fraction. Of course we don't know what we don't know, cases like this are only knowable by manual discovery. If it was a lot we'd be hearing more complaints. The cases I can detect, it's like 0.0005% error rate. -- GreenC 03:47, 17 January 2024 (UTC)[reply]

Big Cartoon DataBase

Per Wikipedia:Templates for discussion/Log/2024 January 16#Big Cartoon DataBase Template:Bcdb and Template:BCDB title are being deleted, however there are many other non-templated links to that website that aren't working (see for example the second reference at Tod Carter or the external link at Knight-mare Hare). Reporting here as I don't think anything is currently done with these (archived, marked as dead, or removed) Gonnym (talk) 14:04, 23 January 2024 (UTC)[reply]

Gonnym, I see about 1,000 instances of the templates, and another 1,400 links. The site has been "excluded from the Wayback Machine". But, the first one I checked is available at archive.today. There are a number of options:
  • Convert the 1,000 templates to normal square links, then convert those plus the 1,400 to archive.today, where available, or add a {{dead link}} if not. That way if the site is ever un-excluded from the Wayback in future those archives could get added.
  • Nuclear option: completely eliminate all citations and links to this site.
  • Some other combo, like nuking the 1,000 but trying to save the 1,400, and if any of those don't archive then nuke those, etc.
All options are a bit of work. Nuking is not clean: it's semi-automated, and each one has to be visually verified to confirm it didn't mangle things, but I have done it before and the quantity isn't too high. The conversion and archiving is more automated. My suggestion: if you think the site is completely unreliable and should be eliminated even when it has archives, go with the nuclear option; otherwise the first option. -- GreenC 14:40, 23 January 2024 (UTC)[reply]
I have no real opinion here as I hadn't participated in that discussion but I'll ping here others that did. @Snowmanonahoe @TechnoSquirrel69 @WikiPediaAid. Gonnym (talk) 14:47, 23 January 2024 (UTC)[reply]
The site is a wiki... I'm impressed it managed to amass 1400 citations. I say nuke it, because again, it's a wiki. Snowmanonahoe (talk · contribs · typos) 15:57, 23 January 2024 (UTC)[reply]
Thanks for the ping, Gonnym! The links being generated by the template are already being removed by a bot since the TfD closed as delete, so we don't need to worry about those. I would rather not indiscriminately delete the other links in citations, just add the archive URL along with a |url-status=dead if applicable. TechnoSquirrel69 (sigh) 15:00, 23 January 2024 (UTC)[reply]
Sounds like that bot is not only eliminating the template, but also the entire citation to BCD. Sounds like a limitation of the bot, it can only delete templates without the option to convert to square links. That's unfortunate because TfD should concern removing templates, not removing citations, which is more the domain of WP:RSN. This is a common scenario with a mix of templates and links and we end up with this inconsistency. Some cites are completely deleted because of the template, others are kept because they are square links, it's random. Anyway this is not directly related to BCD just observing. I can try to archive what is left no problem. -- GreenC 15:25, 23 January 2024 (UTC)[reply]
I don't think the bot is removing citations, just the links generated by the {{bcdb}} template. All of the cite links should still be around. TechnoSquirrel69 (sigh) 15:45, 23 January 2024 (UTC)[reply]
For now, I'll retain the citations and treat the links as dead. There is no clear consensus to nuke cites entirely. -- GreenC 01:47, 24 January 2024 (UTC)[reply]
Thanks, GreenC! TechnoSquirrel69 (sigh) 23:06, 24 January 2024 (UTC)[reply]
I made the following edits
  • Remove pre-existing Wayback links since they don't work
  • Add archive.today links when available (1,025)
  • Add {{dead link}} for the rest (697)
  • Update iabot.org so changes can propagate to 300+ other language wikis
If in the future the restriction on Wayback is lifted the bots should be able to convert the dead links. -- GreenC 02:59, 25 January 2024 (UTC)[reply]

Gemini, Apollo, Shuttle Mission "Chronology of Wake-up Calls"

This weblink PDF (https://history.nasa.gov/wakeup%20calls.pdf) is used as a secondary source across a large number of articles for the Gemini, Apollo, and especially Space Shuttle missions. It recently got 404'd, but a very recent archived link is available here (https://web.archive.org/web/20231220093919/https://history.nasa.gov/wakeup%20calls.pdf). It would be great if y'all can add this archive link to the queue. SpacePod9 (talk) 00:54, 24 January 2024 (UTC)[reply]

I submitted an IABot job to process the 56 pages where it's located. -- GreenC 01:51, 24 January 2024 (UTC)[reply]
Thanks for the help! SpacePod9 (talk) 03:43, 24 January 2024 (UTC)[reply]

Canoe.ca

It appears that canoe.ca was once a news website that is referenced in quite a few articles, but it has since been usurped by another gambling website. Unfortunately, the new owners have also blocked the Wayback Machine and only some of the pages I've seen are in archive.today. However, some of the links appear to be salvageable by changing "canoe.ca" to "canoe.com" and then going into the Wayback Machine. Is this something that the bots can help with? Thanks! :Jay8g [VTE] 23:32, 27 January 2024 (UTC)[reply]

That was probably a little confusing. There are basically three ways that existing canoe.ca links can be archived:
  • Archive.today might have a direct archive of the canoe.ca URL
  • The Wayback Machine might have an archive of the same page with "canoe.ca" replaced with "canoe.com"
  • Archive.today might have an archive of the same page with "canoe.ca" replaced with "canoe.com"
As far as I can tell, the canoe.ca and canoe.com pages were completely identical, but all of the links I've checked seem to be dead on both domains. Unfortunately, there are over 10,000 of these links according to Special:LinkSearch, which is too much for me to deal with manually. There are also quite a few dead links to canoe.com itself, but at least those aren't usurped and can be found in the Wayback Machine normally. :Jay8g [VTE] 23:45, 27 January 2024 (UTC)[reply]
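
The three lookups listed above could be generated per link along these lines. This is a minimal Python sketch, assuming the path is identical on both domains as described; the function name and the service labels are illustrative only, and no archive service is actually queried.

```python
from urllib.parse import urlsplit, urlunsplit


def canoe_candidates(url):
    """For a canoe.ca URL, list (service, url) pairs worth checking, in the
    order given in the thread. Returns [] for any other host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not (host == "canoe.ca" or host.endswith(".canoe.ca")):
        return []

    # Swap only the top-level domain; subdomain, path and query are kept.
    com_host = host[: -len(".ca")] + ".com"
    com_url = urlunsplit(parts._replace(netloc=com_host))

    return [
        ("archive.today", url),      # direct archive of the canoe.ca URL
        ("wayback", com_url),        # Wayback copy under canoe.com
        ("archive.today", com_url),  # archive.today copy under canoe.com
    ]
```

The Wayback Machine is not tried for the canoe.ca form because the new owners have blocked it there.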
Notes for canoe.ca ie. canoe.com:
Proposal for canoe.ca in five runs of WaybackMedic:
  1. Pass 1a (canoe1): Remove all Wayback links  Done - remove 391 archives
  2. Pass 1b (canoe3 & canoe4): Remove all WebCite links (SSL errors and unstable)  Done - remove 329 archives
  3. Pass 2 (canoe2): Attempt conversion to archive.today. Else add {{dead link}}  Done - add 8,353 archive.today, 633 {{dead link}} (total including existing), change 578 |url-status=live to dead
  4. Pass 3a (canoe5): For canoe.ca with a {{dead link}}: check the API if a Wayback link exists if it were converted to canoe.com - if so, change source link to canoe.com and set to live status and remove {{dead link}}  Done - 157 URLs converted to canoe.com
  5. Pass 3b (canoe6): Check the canoe.com links from Pass 3a for link rot, if so, convert to Wayback or archive.today links  Done - 294 Wayback URLs added to canoe.com URLs in the same set of articles processed during Pass 3a (excess due to pre-existing canoe.com links that were dead)
  6. Pass 3c (canoe7): Make a list of citations with {{dead link}}  Done 406 cites listed at Wikipedia:Link rot/cases/canoe.ca
  7. Pass 4 (judi14a and judi14b): Convert canoe.ca to a usurped citation per steps at WP:USURPURL. This will include completely deleting citations that have no archive URL  Done Edited approximately 6,000 pages.
Proposal for canoe.com
  1. Pass 5 (canoecom): Check for dead links and soft-404s as normal  Done Edited 1,132 articles out of 1,953 checked. Added 1,820 archive URLs. Change 371 |url-status=live to dead
----
User:Jay8g per above proposal. Each pass of the bot has different settings enabled. When done in this order, it should work. The "Pass 3" might result in a lot of deleted citations, I'll let you know before running that one. This will require at least 4 runs of the bot of 6k pages each, plus some manual steps it will take a while. -- GreenC 01:39, 28 January 2024 (UTC)[reply]
That all sounds good to me! Thanks! :Jay8g [VTE] 04:01, 28 January 2024 (UTC)[reply]
I just thought of one issue with pass 4: Because canoe.ca was a news aggregator, some of the citations that currently link to it can be found on other, unrelated websites. For example, the reference in Dwayne Johnson (the first link that comes up for me in the 6,148-page search) points to http://www.canoe.ca/SlamWrestlingArchive/feb24_rocky.html on canoe.ca, but the same article can be found at https://slamwrestling.net/index.php/1998/02/24/a-piece-of-the-rock/ on Slam Wrestling's own website. That exact article is also available using the Wayback Machine with canoe.com, but if it was not available there, replacing it with the slamwrestling.net URL would be better than deleting it. Of course, there's no way to do that without manual work, and anything that's just a bare URL is gone for good.
I will be interested to see how many canoe.ca links are left after steps 1-4, to see whether it makes sense to remove those links entirely or try to find the same articles posted elsewhere first. I'm not sure if this is a situation that has come up before with usurped URLs like this or what the standard practice is. :Jay8g [VTE] 04:18, 28 January 2024 (UTC)[reply]
For the rocky example, there is no map to know where the canoe.ca link should go. And since canoe.ca is now a usurped vice site we are supposed to hide it from view. And if no archive is available, delete it. Let's wait and see how many there are after Pass 3. One solution is rather than delete the entire cite, convert to {{citation}} which doesn't require a URL, convert the |work= to Slam Wrestling, and remove the canoe.ca URL. This kind of work is laborious because there are so many permutations of citation templates and argument combinations people use it's not consistent. Also the square and bare links that don't use templates. -- GreenC 16:54, 28 January 2024 (UTC)[reply]
Yes, there's no automatic way to fix that. I'm also not sure how many of the links would even be able to be manually fixed, since some might not be able to be easily found on other domains. I agree with waiting to see what is left after the bot tries to find archive links to see if it's worth me trying to fix the leftovers manually. :Jay8g [VTE] 22:05, 28 January 2024 (UTC)[reply]
User:Jay8g: Here are the remaining 406 citations with {{dead link}}: Wikipedia:Link rot/cases/canoe.ca .. there are over 11,000 in total on enwiki so the archival success rate was about 96%, which is very good. Something still needs to be done with the 406. One option is to nuke the citation, which is the only choice for square links. Another is to convert to {{cite news}} and remove the |url=; this is normally done when the cite can be found offline, like microfiche of a newspaper. Of course, there is manual work, where anything is possible. In the meantime, I'll start processing the rest of the canoe.com links; many appear inoperable. -- GreenC 14:36, 30 January 2024 (UTC)[reply]
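
As a quick check of the quoted rate, a short Python calculation. The total is taken as exactly 11,000, which is an assumption: the comment only says "over 11,000", so the true rate is slightly higher.

```python
def success_rate(total, remaining):
    """Fraction of links that ended up with a working archive or live URL."""
    return (total - remaining) / total


# 11_000 is a lower bound taken from "over 11,000"; 406 is the count of
# citations still tagged {{dead link}}.
rate = success_rate(11_000, 406)
print(f"{rate:.1%}")
```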
I spot-checked several of the remaining 406 dead links and was unable to find alternative links for any of them, so I think we should be good to remove the remaining links. Thanks for all your help on this -- I'm impressed by how many links were able to be fixed! :Jay8g [VTE] 21:50, 30 January 2024 (UTC)[reply]
User:Jay8g sounds good. I'll be working on this over the next few days and will post when done. Thanks for bringing this to attention. I've been aware of Canoe, but didn't know it was usurped and excluded from Wayback, that's a new scenario (plus the canoe.com twist). It basically required every feature my bot has and then some, never made so many passes. This was a good learning experience what the bot can do and how. -- GreenC 02:14, 31 January 2024 (UTC)[reply]
As noted above, this is all done finally. -- GreenC 02:34, 5 February 2024 (UTC)[reply]
Most of the content on canoe.ca was Canadian Press stories, and a bunch of them list The Canadian Press as the author, publisher, agency, etc.; the URLs with "-ap" were Associated Press stories. Articles from those agencies should be available in a variety of places. Finding them is the challenge.
The wrestling articles could probably all be found on Slam Wrestling if someone is willing to do the work. I didn't see any equivalent partner sites for other sports or categories.--Jahalive (talk) 02:22, 2 February 2024 (UTC)[reply]
I'm guessing you're not interested in customizing a bot to pull the news agency and date from the URLs of those CP and AP stories.--Jahalive (talk) 00:38, 13 February 2024 (UTC)[reply]
User:Jahalive, your idea is a good one. I'm going to pass because there is more work than I have time for. I want to use the bot and my time where it has the most impact, fixing link rot, that's really the bot's specialty. Your idea could probably be done by other bot writers. Could try BOTREQ or AWBREQ -- GreenC 01:19, 13 February 2024 (UTC)[reply]

Warren Abstract Machine citations

Some citations at Warren Abstract Machine are broken, including this one: http://wambook.sourceforge.net/ 185.151.251.58 (talk) 08:54, 31 January 2024 (UTC)[reply]

I ran IABot on the page but it might take a few tries before the bot decides a link is dead. - GreenC 02:19, 2 February 2024 (UTC)[reply]
It was a soft-404 - I set it dead at iabot.org and reran the bot. -- GreenC 03:50, 13 February 2024 (UTC)[reply]

bibliotecadigital.ciren.cl

This Chilean digital library seems to have reformatted its URLs and is used in numerous articles as a source. Here's a list - it seems like they still host most if not all articles but under different URLs. Jo-Jo Eumerus (talk) 13:52, 31 January 2024 (UTC)[reply]

User:Jo-Jo_Eumerus is there an example of old to new? Most likely if it's not obvious how to change there is nothing we can do other than treat the old links as dead and add archives. -- GreenC 02:15, 2 February 2024 (UTC)[reply]
It seems like they still share the titles: https://bibliotecadigital.ciren.cl/server/api/core/bitstreams/72bd0a55-5f0d-4ea6-98c4-116797dce09e/content becomes https://bibliotecadigital.ciren.cl/items/96666f36-9fc4-4833-8a95-0e85c6fd98ce Jo-Jo Eumerus (talk) 11:13, 3 February 2024 (UTC)[reply]
Jo-Jo Eumerus It looks like https://bibliotecadigital.ciren.cl/server/api/core/bitstreams/72bd0a55-5f0d-4ea6-98c4-116797dce09e/content is working. Maybe they had time to repair it. But most of them are still not working. Without a map of old to new, I suggest only check if they are dead and if so add an archive URL. For example https://bibliotecadigital.ciren.cl/handle/123456789/7049 becomes https://web.archive.org/web/20160629061606/https://bibliotecadigital.ciren.cl/handle/123456789/7049 .. I think the new page would be https://bibliotecadigital.ciren.cl/items/96666f36-9fc4-4833-8a95-0e85c6fd98ce but it looks different.-- GreenC 00:27, 12 February 2024 (UTC)[reply]
Aye, same content but a slightly different looking platform. Jo-Jo Eumerus (talk) 12:22, 12 February 2024 (UTC)[reply]

Jo-Jo Eumerus: The bot ran on 25 pages. It added 10 archive URLs, and 9 {{dead link}}. The pages with {{dead link}}. -- GreenC 04:01, 13 February 2024 (UTC)[reply]

cnnphilippines.com

WT:TAMBAY#Archiving news articles of CNN Philippines. Chlod (say hi!) 17:17, 31 January 2024 (UTC)[reply]

Submitted to IABot. -- GreenC 02:12, 2 February 2024 (UTC)[reply]
I don't know why but IABot missed over 1,000 links so I reran it with WaybackMedic and got the rest. -- GreenC 02:36, 5 February 2024 (UTC)[reply]
Many thanks, @GreenC! Chlod (say hi!) 12:48, 5 February 2024 (UTC)[reply]

themessenger.com

themessenger.com has shut down [6], we have around 186 uses per themessenger.com HTTPS links HTTP links. All of the news articles are now linking to a blank page (e.g. [7]) Hemiauchenia (talk) 19:46, 1 February 2024 (UTC)[reply]

Submitted to IABot. -- GreenC 02:17, 2 February 2024 (UTC)[reply]
User:Hemiauchenia IABot processed this domain, but I had to run it a second time through WaybackMedic. The problem is IABot is missing a lot for reasons I don't understand. Of the 184 articles that contain this domain, after IABot processed it, Medic edited an additional 101 pages adding archive URLs, and converted 43 instances of |url-status=live to dead. -- GreenC 15:29, 13 February 2024 (UTC)[reply]

Wst.tv

Hi, with a heavy heart, the World Snooker Tour has changed its website and changed how all of their links work, and has no real naming convention for most links from wst.tv.

For instance: https://wst.tv/players/jimmy-white/ now is at https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4

News articles and other items have also moved. If there is a smart way for this to be fixed, let me know, but I'm assuming we'd need to archive/mark as dead for the remainder. Lee Vilenski (talkcontribs) 19:39, 2 February 2024 (UTC)[reply]

User:Lee Vilenski I don't see a way to migrate the links without redirect information. If some links have a redirect, the bot will pick it up automatically. Otherwise it will add an archive URL or {{dead link}}. Looks like 379 pages. -- GreenC 05:57, 3 February 2024 (UTC)[reply]
All of the news articles have moved from https://wst.tv/murphy-takes-season-opener/ to https://www.wst.tv/news/2023/july/21/murphy-takes-season-opener/
It's a mess, I certainly don't see a way to fix it. Lee Vilenski (talkcontribs) 09:04, 3 February 2024 (UTC)[reply]
It's surprisingly common how often websites migrate to a new platform, and don't leave redirects. If you want, contact them to ask if they plan to leave redirects and mention Wikipedia as an example. For now I can still add the archives, and if in the future they add redirects, the bot can undo the archives, make it live again and migrate to the new redirected URL. Either way it's basically flipping a switch in the bot. -- GreenC 14:12, 3 February 2024 (UTC)[reply]
Regarding contacting WST: My experience is that they do not respond. It might be better to try to convince their software suppliers to provide redirects. It would appear that there are two companies involved. One is https://urbanzoo.io/ and the other is https://www.imgarena.com/.  Alan  (talk) 12:42, 4 February 2024 (UTC)[reply]
It looks like content was not migrated. For example old site https://wst.tv/white-completes-epic-comeback/ search at the new site: "White Completes Epic Comeback" in the news tab Search with no result. Likewise Google: https://www.google.com/search?client=firefox-b-1-lm&q=%22White+Completes+Epic+Comeback%22+site%3Awst.tv .. looks like a complete resetting of the site and any matches found, like with the /players, could be happenstance. --- GreenC 17:39, 4 February 2024 (UTC)[reply]

I was able to build a preliminary map of the player pages, by headless browsering https://www.wst.tv/players/ and reformatting the HTML into this table, making a best guess on the left column. If the bot encounters a URL in the left column, it will replace with the right column. -- GreenC 17:14, 4 February 2024 (UTC)[reply]
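
The table lookup described above can be sketched in Python as below. This is an illustration under assumptions, not the bot's code: the function names are made up, and ignoring "www.", letter case and a trailing slash when matching is a choice made for this example. The two sample rows are copied from the table in this thread.

```python
from urllib.parse import urlsplit

# Two rows in the "old -- new" format used by the table in this thread.
MAP_TEXT = """\
https://wst.tv/players/mark-allen --  https://www.wst.tv/players/c37aba27-5b12-4fae-8a8b-9e749c7a25f3
https://wst.tv/players/zhang-anda --  https://www.wst.tv/players/0512f55a-faea-48df-a8fc-895fbcaef511
"""


def _key(url):
    """Normalise a URL for matching: drop the scheme, a leading 'www.',
    letter case and any trailing slash (an assumed matching policy)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/").lower()


def load_map(text):
    """Parse 'old -- new' lines into a dict keyed on the normalised old URL."""
    mapping = {}
    for line in text.splitlines():
        old, sep, new = line.partition(" -- ")
        if sep:
            mapping[_key(old.strip())] = new.strip()
    return mapping


def migrate(url, mapping):
    """Return the replacement URL, or None when the table has no entry."""
    return mapping.get(_key(url))
```

A `None` result means the link is not in the table and would fall back to the usual handling, an archive URL or {{dead link}}.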

I think it is much more complex than that. The old site had pages for many more players than are currently included in https://www.wst.tv/players which only has current players. Look at https://web.archive.org/web/20221126125804/https://wst.tv/player_category_taxonomy/other-players/. Most of these are gone completely, and many are referred to in our articles.  Alan  (talk) 10:12, 5 February 2024 (UTC)[reply]
...for instance: if you search in https://www.wst.tv/players for "Davis", you will only get Mark Davis. The old site included Steve Davis, Joe Davis and Fred Davis, who were significant players, apparently now forgotten by WST.  Alan  (talk) 10:27, 5 February 2024 (UTC)[reply]
OK I was afraid of that, it didn't seem like many players. It does appear the old site and content was completely abandoned, and the new site has some overlap but that is happenstance and can't be assumed to contain the same actual content on the page even if a match can be made. They didn't do a site migration. In this case for citation verification purposes the correct action is to treat everything from the old site as a dead link and hope there are archives available. -- GreenC 14:40, 5 February 2024 (UTC)[reply]
That's pretty much what we've been doing. If you look at the List of snooker players you'll see that all the references have working archives.  Alan  (talk) 15:14, 5 February 2024 (UTC)[reply]
Extended content
awk -ilibrary 'BEGIN{f=readfile("snook1.html"); for(i=1;i<=splitn(f,a,i);i++) {j++; if(j == 5) {j = 1; print "https://wst.tv/players/" tolower(fname) "-" tolower(lname) " --  https://www.wst.tv/" subs("href=\"/","",id) }; if(j == 1) {match(a[i], /href=["]\/players\/[^"]+[^"]/, d); id=d[0]}; if(j == 2) {fname=strip(a[i])}; if(j==4){lname=strip(a[i])}  }  }'

https://wst.tv/players/mark-allen --  https://www.wst.tv/players/c37aba27-5b12-4fae-8a8b-9e749c7a25f3
https://wst.tv/players/zhang-anda --  https://www.wst.tv/players/0512f55a-faea-48df-a8fc-895fbcaef511
https://wst.tv/players/muhammad-asif --  https://www.wst.tv/players/3f7a3e33-3889-4c3f-91e3-a6d876c8b999
https://wst.tv/players/john-astley --  https://www.wst.tv/players/49e85842-53d7-4fdb-b69b-4a0db92ff06d
https://wst.tv/players/stuart-bingham --  https://www.wst.tv/players/ac932300-dacb-4e91-803b-99a03fa20853
https://wst.tv/players/luca-brecel --  https://www.wst.tv/players/cd124662-9d97-413c-9609-5051d002ab3b
https://wst.tv/players/jordan-brown --  https://www.wst.tv/players/c49e98bc-101d-419a-81aa-ff2caedb1734
https://wst.tv/players/oliver-brown --  https://www.wst.tv/players/fe7732cc-435e-4ba8-84bf-25f771f0f376
https://wst.tv/players/alfie-burden --  https://www.wst.tv/players/b6350368-74fc-4adf-92c8-ff9126e90541
https://wst.tv/players/ian-burns --  https://www.wst.tv/players/80c5ce19-2c01-48a4-85e4-c0304ac1ea4a
https://wst.tv/players/james-cahill --  https://www.wst.tv/players/4b7b307c-8ec8-4b53-b46e-6817081b95c4
https://wst.tv/players/stuart-carrington --  https://www.wst.tv/players/37a87bd0-792f-46ae-9377-56df3bef9034
https://wst.tv/players/ali-carter --  https://www.wst.tv/players/c796b82d-1040-422d-b27d-9249310b99a3
https://wst.tv/players/ashley-carty --  https://www.wst.tv/players/32dedd2f-0e09-4c03-bed3-679646da516b
https://wst.tv/players/jamie-clarke --  https://www.wst.tv/players/b29c7ae2-4f1c-413c-92bb-01ce78d99b08
https://wst.tv/players/sam-craigie --  https://www.wst.tv/players/edcdfdad-8c65-48fb-94f0-b9b3ac9ad04d
https://wst.tv/players/dominic-dale --  https://www.wst.tv/players/86fd8e51-3964-497c-97c3-729cef44b1f0
https://wst.tv/players/mark-davis --  https://www.wst.tv/players/0398e6dc-dcbf-4ff0-9ff2-7515212bc818
https://wst.tv/players/ryan-day --  https://www.wst.tv/players/5d419487-e341-4301-a4f5-e493a2a78754
https://wst.tv/players/ken-doherty --  https://www.wst.tv/players/e9c5eddd-e493-473e-b688-a3a2ea861800
https://wst.tv/players/scott-donaldson --  https://www.wst.tv/players/ff710b2f-cf05-45d6-840e-e10a7dc9f921
https://wst.tv/players/mostafa-dorgham --  https://www.wst.tv/players/14243478-1def-4ce2-a9a0-80a2858abe32
https://wst.tv/players/graeme-dott --  https://www.wst.tv/players/e0f5c435-470e-4ac3-8406-5ccd39fd475c
https://wst.tv/players/adam-duffy --  https://www.wst.tv/players/2fc33800-aaf8-4e7f-9af0-afc58df79ed2
https://wst.tv/players/ahmed aly-elsayed --  https://www.wst.tv/players/f65d2c9a-513a-458b-9c8b-edfc3aebbce6
https://wst.tv/players/dylan-emery --  https://www.wst.tv/players/0106063a-5a37-47c3-9cbf-67a891012a5e
https://wst.tv/players/reanne-evans --  https://www.wst.tv/players/bc4020ad-76c2-42a4-8994-dd0f756d0b6a
https://wst.tv/players/tom-ford --  https://www.wst.tv/players/69df4145-0b26-4a1e-9afb-c9ae74fa3fd1
https://wst.tv/players/marco-fu --  https://www.wst.tv/players/5012642c-60cc-4ab3-a41b-b152370562eb
https://wst.tv/players/david-gilbert --  https://www.wst.tv/players/9b2532c1-a189-4573-8320-f254d2f9bfde
https://wst.tv/players/martin-gould --  https://www.wst.tv/players/2a0e2004-856c-4f0b-ae3e-54dded6141f8
https://wst.tv/players/david-grace --  https://www.wst.tv/players/ad650d94-b08b-4dc5-9c5f-1653dc909127
https://wst.tv/players/liam-graham --  https://www.wst.tv/players/75baf94d-2c63-42dc-8acb-4e7a5a7bcb09
https://wst.tv/players/xiao-guodong --  https://www.wst.tv/players/c3d39c08-92fd-471b-8901-903a4bd22027
https://wst.tv/players/he-guoqiang --  https://www.wst.tv/players/5587fb4d-8517-4572-918e-65ff83b71d74
https://wst.tv/players/ma-hailong --  https://www.wst.tv/players/a2dbb55d-a612-4aef-9a1c-b9401232eac5
https://wst.tv/players/anthony-hamilton --  https://www.wst.tv/players/a3789843-3f0c-4161-b68a-b770fff83f96
https://wst.tv/players/lyu-haotian --  https://www.wst.tv/players/022c7a82-72c5-4fb5-a748-eb9b249d33fb
https://wst.tv/players/barry-hawkins --  https://www.wst.tv/players/ec561f17-e982-43b3-8807-82fc76adbe75
https://wst.tv/players/louis-heathcote --  https://www.wst.tv/players/e8d25a73-348b-40cd-b4e8-f757250d8900
https://wst.tv/players/stephen-hendry --  https://www.wst.tv/players/8ef2e9be-1769-40e9-8235-a143c9ed5951
https://wst.tv/players/andy-hicks --  https://www.wst.tv/players/66dd278a-0996-41ce-a3c4-3213fda0693c
https://wst.tv/players/john-higgins --  https://www.wst.tv/players/a5eecca1-8302-4739-84fc-6721627baa43
https://wst.tv/players/andrew-higginson --  https://www.wst.tv/players/83deba83-12f0-446d-ab47-e43f5b8ab09e
https://wst.tv/players/liam-highfield --  https://www.wst.tv/players/15860676-6802-4c5d-a06e-ce1356e8cdb7
https://wst.tv/players/aaron-hill --  https://www.wst.tv/players/be51ee14-4b28-4932-8d3d-af8011dc9201
https://wst.tv/players/liu-hongyu --  https://www.wst.tv/players/b614e094-3724-419a-a052-13261ace5b05
https://wst.tv/players/ashley-hugill --  https://www.wst.tv/players/6be559fd-aaac-45af-bd53-5eaa54b22553
https://wst.tv/players/mohamed-ibrahim --  https://www.wst.tv/players/1aa06013-1544-4fd7-b3e7-e8682676acd5
https://wst.tv/players/asjad-iqbal --  https://www.wst.tv/players/b765daf4-6bf6-41e5-b298-50769ed0d841
https://wst.tv/players/himanshu-jain --  https://www.wst.tv/players/218661d8-4ebe-4700-9907-0d0e2af0aeeb
https://wst.tv/players/si-jiahui --  https://www.wst.tv/players/f3c7e0cf-7cb6-405e-9ba1-4d02716a20c3
https://wst.tv/players/jak-jones --  https://www.wst.tv/players/036bc430-6c51-4d63-a366-a6ca218f7f39
https://wst.tv/players/jamie-jones --  https://www.wst.tv/players/a85bdd17-6038-43c8-9cec-d492e4a8a2df
https://wst.tv/players/mark-joyce --  https://www.wst.tv/players/710a2723-9694-4cca-8827-64ee50386179
https://wst.tv/players/jiang-jun --  https://www.wst.tv/players/cf6b1e24-e90e-4420-8290-1c1b0f9ea97e
https://wst.tv/players/ding-junhui --  https://www.wst.tv/players/3ff06750-8c3c-456c-8fac-58209b6f679e
https://wst.tv/players/pang-junxu --  https://www.wst.tv/players/9c842985-9f09-4bd0-aa6a-dafe523b40ee
https://wst.tv/players/anton-kazakov --  https://www.wst.tv/players/cbe2d832-5b47-4b91-bf4e-1e482c875825
https://wst.tv/players/jenson-kendrick --  https://www.wst.tv/players/17e59e8f-42b0-4332-bfaa-452366af8280
https://wst.tv/players/rebecca-kenna --  https://www.wst.tv/players/36672a61-a02f-428b-94a1-d42323bccbb3
https://wst.tv/players/lukas-kleckers --  https://www.wst.tv/players/ccd2b587-4c53-40a5-8b4a-e90b7663ce56
https://wst.tv/players/sanderson-lam --  https://www.wst.tv/players/52ba4e5c-fea6-426c-8ab0-7ca6828d13d5
https://wst.tv/players/rod-lawler --  https://www.wst.tv/players/c9a6633d-a5f9-4302-aacd-c2869fe9259b
https://wst.tv/players/julien-leclercq --  https://www.wst.tv/players/690dc31c-2392-4dd0-8dd9-52e5825cab46
https://wst.tv/players/andy-lee --  https://www.wst.tv/players/d758aa70-d8b1-446a-8284-b2a1ace120bb
https://wst.tv/players/david-lilley --  https://www.wst.tv/players/6757b432-8dc6-4c8d-a345-dac8eb58edf5
https://wst.tv/players/oliver-lines --  https://www.wst.tv/players/c7c75376-75ce-4e4b-ba26-d6c8a098ec9b
https://wst.tv/players/jack-lisowski --  https://www.wst.tv/players/d56f02ab-f2df-41ca-b9a4-24167aded141
https://wst.tv/players/stephen-maguire --  https://www.wst.tv/players/c07238de-bca9-4067-9749-00841bd06d28
https://wst.tv/players/anthony-mcgill --  https://www.wst.tv/players/ac8407bc-1cbf-4642-86a3-1e3cacbaeb62
https://wst.tv/players/ben-mertens --  https://www.wst.tv/players/e9a8f8aa-aa8c-4e64-baa4-3fcfd07ebb26
https://wst.tv/players/hammad-miah --  https://www.wst.tv/players/0ffdae01-5fad-40c8-8b9f-8eb3a942ecac
https://wst.tv/players/robert-milkins --  https://www.wst.tv/players/95eec847-2905-491f-abbe-92ff39038bda
https://wst.tv/players/stan-moody --  https://www.wst.tv/players/a65d6cc8-05fa-4827-8294-a1da17c975f6
https://wst.tv/players/ross-muir --  https://www.wst.tv/players/8051730e-7460-4773-b262-9188f2166f61
https://wst.tv/players/shaun-murphy --  https://www.wst.tv/players/03fe92d3-ad85-434c-bc17-5fe02a496187
https://wst.tv/players/mink-nutcharut --  https://www.wst.tv/players/ae9dffcf-4e09-472a-848e-21bf165f975e
https://wst.tv/players/fergal-o'brien --  https://www.wst.tv/players/cefe88f9-89da-4460-9ed6-6e04ec69cec3
https://wst.tv/players/joe-o'connor --  https://www.wst.tv/players/c2809815-3bd0-41fa-b727-458e22c98070
https://wst.tv/players/martin-o'donnell --  https://www.wst.tv/players/8195961a-a4b7-4ba7-960b-08ab4778dbd3
https://wst.tv/players/sean-o'sullivan --  https://www.wst.tv/players/50da4361-072d-418d-a2a0-721866983d02
https://wst.tv/players/ronnie-o'sullivan --  https://www.wst.tv/players/226c7294-655e-4925-bcde-17330ddfc438
https://wst.tv/players/jackson-page --  https://www.wst.tv/players/19ce247e-1824-4f94-8fe3-c94ce4056802
https://wst.tv/players/andrew-pagett --  https://www.wst.tv/players/d338eb63-5268-427e-a60c-52cb55a56625
https://wst.tv/players/tian-pengfei --  https://www.wst.tv/players/4b168b1a-298b-4c0a-adf6-e3190e36caff
https://wst.tv/players/joe-perry --  https://www.wst.tv/players/a33b80af-7f17-4bb1-8c5d-d36e45eb801c
https://wst.tv/players/andres-petrov --  https://www.wst.tv/players/fc2f8de1-4d6a-40a1-84d2-faea2c5fdb8d
https://wst.tv/players/manasawin-phetmalaikul --  https://www.wst.tv/players/b95907dd-e602-4448-9c78-00c865f4bcd5
https://wst.tv/players/liam-pullen --  https://www.wst.tv/players/44b09a9f-4ded-4b51-80f5-dbd28eb86274
https://wst.tv/players/jimmy-robertson --  https://www.wst.tv/players/4e7f33e8-925d-4442-b8f7-6023cd920d9e
https://wst.tv/players/neil-robertson --  https://www.wst.tv/players/8b83133a-4c15-4275-811e-bdf2cb02702f
https://wst.tv/players/noppon-saengkham --  https://www.wst.tv/players/aaf6c342-11f7-4d03-86b3-1144a4fd92f8
https://wst.tv/players/victor-sarkis --  https://www.wst.tv/players/a91dbb92-a44c-4076-8694-5c08cd40c534
https://wst.tv/players/mark-selby --  https://www.wst.tv/players/ba7831b4-ab75-4435-946a-c6f02e4e2d4b
https://wst.tv/players/matthew-selt --  https://www.wst.tv/players/c1ac359d-8359-405b-9879-74dd9b4a5b2c
https://wst.tv/players/xu-si --  https://www.wst.tv/players/f5586d0e-89f5-434e-8723-65046b1d6fe9
https://wst.tv/players/yuan-sijun --  https://www.wst.tv/players/734865fe-9ee2-4a3e-b4d1-035bf819aff2
https://wst.tv/players/ishpreet-singh chadha --  https://www.wst.tv/players/cc2c8bf7-0c67-4751-9e36-7b86718164b1
https://wst.tv/players/baipat-siripaporn --  https://www.wst.tv/players/53cd277e-28fe-48ed-a0ce-4d5d9745c85f
https://wst.tv/players/elliot-slessor --  https://www.wst.tv/players/b1239913-b987-4bae-a7f6-ff4eb481f503
https://wst.tv/players/matthew-stevens --  https://www.wst.tv/players/af1c65bd-d676-4bfc-8e93-65e34adf93c7
https://wst.tv/players/zak-surety --  https://www.wst.tv/players/24564b03-cfd6-474c-a653-0268241d632f
https://wst.tv/players/allan-taylor --  https://www.wst.tv/players/d1cf990f-e5b8-4584-acce-2bd9b534fcb5
https://wst.tv/players/ryan-thomerson --  https://www.wst.tv/players/1227cfd1-3132-405f-a672-4bdf64538df3
https://wst.tv/players/rory-thor --  https://www.wst.tv/players/9d43b39f-b17f-415f-b779-eebc550cd265
https://wst.tv/players/judd-trump --  https://www.wst.tv/players/e2f3cfe7-6138-4ce6-b1dc-77dcc1d0a65f
https://wst.tv/players/thepchaiya-un-nooh --  https://www.wst.tv/players/67203224-1d66-4c1e-b655-150f4f835aba
https://wst.tv/players/alexander-ursenbacher --  https://www.wst.tv/players/12be0769-d225-4c97-b687-4753e3c1bc26
https://wst.tv/players/hossein-vafaei --  https://www.wst.tv/players/99019ac8-ad6a-4927-9f93-1935ea43ca55
https://wst.tv/players/chris-wakelin --  https://www.wst.tv/players/a1beeb4b-2493-476c-9682-1900eb83c2d5
https://wst.tv/players/ricky-walden --  https://www.wst.tv/players/80b7e0a3-61eb-4a12-b4c4-9d6da83d5b24
https://wst.tv/players/daniel-wells --  https://www.wst.tv/players/a458950b-c644-4f16-b89a-543ccfccc61c
https://wst.tv/players/jimmy-white --  https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4
https://wst.tv/players/michael-white --  https://www.wst.tv/players/9728dd54-b60e-4bf5-9149-cecb93b530ee
https://wst.tv/players/robbie-williams --  https://www.wst.tv/players/8954fbf2-3b42-4af9-981b-333ec1cd8b03
https://wst.tv/players/mark-williams --  https://www.wst.tv/players/6aaddcbb-345c-474a-9069-e7757e155729
https://wst.tv/players/gary-wilson --  https://www.wst.tv/players/e5f4377c-5119-4c0a-9a88-e42eb8e48677
https://wst.tv/players/kyren-wilson --  https://www.wst.tv/players/a8c0d3a6-706b-4bf0-8dce-9cde97fe88c4
https://wst.tv/players/ben-woollaston --  https://www.wst.tv/players/8ad4ff3f-9f92-44ba-a884-6c8a8e0dcf08
https://wst.tv/players/peng-yisong --  https://www.wst.tv/players/78c09fb8-3382-4cb0-a3e8-d0f041f23389
https://wst.tv/players/wu-yize --  https://www.wst.tv/players/d935d534-e696-4292-b773-e9b8efee1ea7
https://wst.tv/players/dean-young --  https://www.wst.tv/players/2354ac0b-0b04-4965-8ae3-1f135713005c
https://wst.tv/players/zhou-yuelong --  https://www.wst.tv/players/960cd1e6-2bb4-4229-aefe-447646412bf2
https://wst.tv/players/cao-yupeng --  https://www.wst.tv/players/3a9eca87-f640-4942-a9a7-74a47f40c562
https://wst.tv/players/long-zehuang --  https://www.wst.tv/players/40859ee8-e438-4062-aa9b-84e4e8e22bac
https://wst.tv/players/fan-zhengyi --  https://www.wst.tv/players/8cbf82f6-c417-421c-ae39-17c8103284cd
  •  Done User:AlH42, the bot is done. It edited 371 articles. Added 1,267 archive URLs. Converted 1,248 cases of |url-status=live to dead. -- GreenC 03:20, 6 February 2024 (UTC)[reply]
Good work! My poor, poor watchlist. Just need to work out what we can do with the remainder. Lee Vilenski (talkcontribs) 08:07, 6 February 2024 (UTC)[reply]
User:AlH42: Not too bad, articles where the bot added a {{dead link}}
-- GreenC 14:48, 6 February 2024 (UTC)[reply]
Thank you. I think we still have a lot to do though. And the WST player template is a problem.  Alan  (talk) 15:10, 6 February 2024 (UTC)[reply]
The bot should have processed every link for the domain in mainspace. It might have missed some rare cases where it has trouble parsing the page. The template space I didn't do. There might be some in File space, I have not checked. Anyway if you think you need more bot help, let me know. -- GreenC 15:44, 6 February 2024 (UTC)[reply]

Google cache

Apparently, the Google cache (webcache.googleusercontent.com) is about to be shut down. There are over 5,000 pages with these links, and many of them appear to already be broken. These should probably be replaced with the original URL and/or proper archive links if available, depending on how they are currently being used. :Jay8g [VTE] 00:59, 5 February 2024 (UTC)[reply]

I'll work on this.  Doing... - if you see this request brought up elsewhere point them here. The links are messy and so are placements within templates it will need some care. -- GreenC 01:29, 5 February 2024 (UTC)[reply]
Would archive.org still have the info? If so we should try to get all of it so it is easily replaceable by regex. Geardona (talk to me?) 15:29, 5 February 2024 (UTC)[reply]
Not all the now-dead original urls have archive.org links, is it possible to put google cache archive links into archive.org to 'save' the pages? Kingsif (talk) 22:47, 8 February 2024 (UTC)[reply]
The bot is more sophisticated than blindly converting to archive.org links. It will take 4 different actions, depending on the status of the source URL (live or dead), and archive availability for 1) the source URL and 2) Google Cache URL (at archive.org). In terms of creating new archive.org pages from the GC page, that would only work if the GC is still working, which in most cases is not true, and when it is true, the source URL is usually live anyway, so there is no reason for either GC or archive.org -- GreenC 17:25, 9 February 2024 (UTC)[reply]
  •  Done - Google Cache is eliminated from Enwiki. It was in about 5,000 pages. It was a significant undertaking for multiple reasons. There are still 834 inside archive.org pages. One of four actions was taken: 1) Original URL is live: simply remove the Google Cache and replace with the original URL 2) Original URL is dead and no archives are available: remove the Google Cache, replace with the original URL and add a {{dead link}} 3) Original URL is dead but has an archive available at another provider 4) Original URL is dead and the Google Cache URL has an archive at another archive provider (the 834 linked above). Option #1 was most common, surprisingly. For anyone wanting to do this elsewhere, I made a tool to convert Google Cache URLs to the original source URL: https://github.com/greencardamom/Googcacheparse -- GreenC 16:19, 11 February 2024 (UTC)[reply]
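For readers on other wikis, the four-way decision described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the names `Action` and `choose_action` are hypothetical, and the order in which options 3 and 4 are tried is an assumption, since the comment above does not say which takes precedence.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the four outcomes listed in the comment above.
    REPLACE_WITH_ORIGINAL = 1            # 1) original URL is live
    ORIGINAL_PLUS_DEAD_LINK = 2          # 2) dead, no archive anywhere
    ORIGINAL_PLUS_ARCHIVE = 3            # 3) dead, archive of the original exists
    ARCHIVE_OF_CACHE_URL = 4             # 4) dead, archive of the Google Cache URL exists


def choose_action(original_live: bool,
                  original_archived: bool,
                  cache_url_archived: bool) -> Action:
    """Pick what to do with one Google Cache link.

    Assumption: an archive of the original URL is preferred over an
    archive of the Google Cache URL when both exist.
    """
    if original_live:
        return Action.REPLACE_WITH_ORIGINAL
    if original_archived:
        return Action.ORIGINAL_PLUS_ARCHIVE
    if cache_url_archived:
        return Action.ARCHIVE_OF_CACHE_URL
    return Action.ORIGINAL_PLUS_DEAD_LINK
```

The liveness and archive checks themselves are left to the caller, since they depend on whatever link checker and archive lookups are available.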
    Thanks again for your work on this! :Jay8g [VTE] 22:32, 11 February 2024 (UTC)[reply]

linguistlist.org

This site is linked to by the linglist parameter in {{Infobox language}}. Snowmanonahoe (talk · contribs · typos) 23:19, 5 February 2024 (UTC)[reply]

User:Snowmanonahoe: I only see it on two pages: https://en.wikipedia.org/wiki/Special:LinkSearch?target=linguistlist.org%2Fmultitree --The site itself looks dead since 2008 or 2009. -- GreenC 00:49, 6 February 2024 (UTC)[reply]
GreenC: try Special:LinkSearch/multitree.org/codes/. Those urls all redirect to linguistlist.org/multitree now. Snowmanonahoe (talk · contribs · typos) 00:58, 6 February 2024 (UTC)[reply]
User:Snowmanonahoe: Ok. There are 75 pages. Compare results at Archive.today with WaybackMachine. I recommend a first pass using Archive.today, and for any not available, a second pass using WaybackMachine. Sound alright? BTW the entire linguistlist.org site looks like it needs review (421 pages). They made a new website and the old inbound links are not working right. The new website links are working. -- GreenC 02:30, 6 February 2024 (UTC)[reply]
I think Kwamikagami should weigh in on this first. Snowmanonahoe (talk · contribs · typos) 03:08, 6 February 2024 (UTC)[reply]
I gave up on getting multitree links to work back when they were basically offline. I didn't know they were up again.
Multitree is generally not a RS. I would avoid using them except for extinct languages where Linglist maintains the description of the ISO code (like Ethnologue does for living languages); for classification trees of various authors (e.g. on our Austroasiatic article); and maybe a couple other things I'm not thinking of, but not as a general reference.
Is there something in particular you wanted me to weigh in on? I'd think we'd want to update the links when we use them, as I can't think of any reason we'd want to preserve or link to old versions of their pages. — kwami (talk) 03:30, 6 February 2024 (UTC)[reply]
I would avoid using them except [some] .. OK my job is to save the dead links by adding an archive URL. It's only about 75 links. You can remove some citations and keep others as you prefer, once the archives are added, so you will be able to see what the content of the page is. -- GreenC 14:54, 6 February 2024 (UTC)[reply]
That should work just fine. No need for you to evaluate the quality of the ref. — kwami (talk) 15:25, 6 February 2024 (UTC)[reply]
For the 75 pages with multitree.org/codes URLs it is a multi-pass run:
  1. Pass 1 (multitree1): Remove existing archive.org links
  2. Pass 2 (multitree2): Add archive.today where available
  3. Pass 3 (multitree3): Add archive.org where available
User:Kwamikagami: 75 pages with multitree.org/codes - they should have either an archive URL or a {{dead link}}, otherwise the bot had trouble parsing the citation. -- GreenC 04:00, 12 February 2024 (UTC)[reply]
Thanks. I'd only reviewed instances called from the info box. Will go thru them over the next few days. Looks like about half should be removed, as they're things that can be cited to RS's. — kwami (talk) 08:11, 12 February 2024 (UTC)[reply]
  • linguistlist.org was also processed (about 450 pages) and many problems were found and repaired: dead links, soft-404s, migrated links, Cloudflare blocks. -- GreenC 20:32, 12 February 2024 (UTC)[reply]
    Thanks for all the work with that. — kwami (talk) 23:26, 12 February 2024 (UTC)[reply]
    Is there a better way to handle the 512 auto-generated refs at Category:Languages with Linglist code? Or would they all have to be done by hand? — kwami (talk) 23:50, 12 February 2024 (UTC)[reply]
It is being generated by Template:Infobox_language/linguistlist. Are most multitree.org/codes URLs dead, or only some? Or not sure? -- GreenC 00:05, 13 February 2024 (UTC)[reply]
It's also in Template:Infobox language/ref and Module:Infobox language. It looks like all of multitree.org is retired. What if we change the template to use a generic archive URL, and hope for the best: Special:Diff/1140877092/1206755696, Special:Diff/996938315/1206753611 and Special:Diff/1114901671/1206760361 - this is a stop-gap solution because archive.org won't have archives for all of the URLs. Ideally multitree.org would be removed from Template:Infobox_language and sub-templates, and individual archive URLs added to replace the auto-generated ones, at the same location where they were auto-generated. Somewhat difficult. -- GreenC 01:45, 13 February 2024 (UTC)[reply]
Yeah, they appear to be defunct. But they are the official ISO repository for descriptions of languages extinct before ca. 1950, equivalent to Ethnologue for recent languages. We really should have a link to the official site. — kwami (talk) 02:40, 13 February 2024 (UTC)[reply]
Maybe it's OK with generic archive URLs at the Infobox layer. If not enough, will need to remove the Infobox support, add the citations individually to each article, and run archive bots to add archive URLs. -- GreenC 03:49, 13 February 2024 (UTC)[reply]

hobbes.nmsu.edu

OS/2 repository going offline in April. Only a few pages on enwiki. [8] -- GreenC 15:32, 6 February 2024 (UTC)[reply]

 Done -- GreenC 16:34, 13 February 2024 (UTC)[reply]

iltalehti.fi

I've noticed that some of the 1,222 Iltalehti URLs are dead but bots don't fix them:

All those pages give the Finnish-language text "Hakemaasi sivua ei valitettavasti löytynyt." (= "Unfortunately, the page you were looking for could not be found."). I tried to Google those URLs' headlines, but I couldn't find new URLs for them, so I think Iltalehti has removed those articles from their website completely. Could a bot go through Iltalehti URLs and set an archive link for the Iltalehti webpages that have that exact text on them? Also, if there's a way to fix these, can it be set that InternetArchiveBot fixes them eventually on other language wikis as well? Like GreenC did a month ago in the discussion above #Ilta-Sanomat to the Ilta-Sanomat URLs. For example, there are 10,070 Iltalehti URLs on fi.wikipedia. Thank you again. 85.76.13.79 (talk) 15:35, 11 February 2024 (UTC)[reply]

I requested IABot to run on Maj-Len Grönholm and it fixed it Special:Diff/1194582882/1206244327. Probably IABot hasn't automatically processed the pages yet. I'll take a look at it though, because I know IABot has gaps in coverage of what it processes. I'll run it through WaybackMedic which will get them all, plus look for soft-404s like that "Unfortunately" string when the pages otherwise return status 200. Whatever it finds, it will update the IABot database, and that should eventually propagate to the rest of the wikis. -- GreenC 16:35, 11 February 2024 (UTC)[reply]
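A soft-404 check of the kind described above could look something like this minimal sketch. It is not WaybackMedic's code; the function name `is_soft_404` and the marker tuple are hypothetical, and the only marker used is the Finnish string quoted earlier in this thread.

```python
# Marker text quoted in the request above; other sites would need their own strings.
SOFT_404_MARKERS = ("Hakemaasi sivua ei valitettavasti löytynyt.",)


def is_soft_404(status_code: int, body: str, markers=SOFT_404_MARKERS) -> bool:
    """Return True when a page answers 200 but its text says 'not found'.

    A real 404 (or any non-200 status) is not a *soft* 404, so it
    returns False here and should be handled by the normal dead-link check.
    """
    if status_code != 200:
        return False
    return any(marker in body for marker in markers)
```

Fetching the page is deliberately left out, so the check can be applied to bodies obtained by any HTTP client.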
Thanks again. One thing I noticed though: If either blogit., m. or plus. preceded iltalehti.fi in the URL, the bot changed the URL to the main page URL https://www.iltalehti.fi. I found 12 edits in question with this search: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Can the bot fix these or do we have to fix these by hand? 85.76.13.79 (talk) 13:00, 14 February 2024 (UTC)[reply]
Oh sorry looks like I missed those, they are soft-404s. If you will manually restore them to the original URL, I can rerun the bot on those pages. It will add an archive URL instead of following the redirect to the homepage. -- GreenC 14:21, 14 February 2024 (UTC)[reply]
Alternatively you can just revert the entire edit by the bot if there is no intervening edit, and the bot will redo the entire page, if that's easier. -- GreenC 14:23, 14 February 2024 (UTC)[reply]
Done. 85.76.13.79 (talk) 20:26, 15 February 2024 (UTC)[reply]
Also done. Special:Diff/1207818208/1207836829 -- GreenC 21:20, 15 February 2024 (UTC)[reply]

Normally I catch these. Output of the "l4s4" script (ie. show redirects with 4 or more cases):

mintbox:[] ./l4s4 
7 -  https://www.iltalehti.fi/politiikka/a/201712072200588364 
4 -  https://www.iltalehti.fi/perhe/a/200612185426589 
4 -  https://www.iltalehti.fi/popstars/a/200701145593138 
4 -  https://www.iltalehti.fi/uutiset/a/2016061121711142 
4 -  https://www.iltalehti.fi/viihdeuutiset/a/201801072200651274 
12 -  https://www.iltalehti.fi 
4 -  https://www.iltalehti.fi/viihde/a/2009073010005660 
8 -  https://www.iltalehti.fi/politiikka/a/201801182200679167 

ie. there were 12 pages with redirects to https://www.iltalehti.fi .. But I forgot to run the script before committing changes to wiki. -- GreenC 21:20, 15 February 2024 (UTC)[reply]
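For anyone without the "l4s4" script, the same idea (list redirect targets that occur 4 or more times) can be sketched as below. This is an assumption-laden stand-in, not the script itself: the function name, the input shape (pairs of source URL and redirect target) and the threshold parameter are all hypothetical.

```python
from collections import Counter


def frequent_redirect_targets(redirects, threshold=4):
    """Return (count, target) pairs for targets hit at least `threshold` times.

    `redirects` is an iterable of (source_url, target_url) pairs. Many
    different sources landing on one target, especially a homepage, is a
    hint that the redirects are soft-404s rather than real migrations.
    """
    counts = Counter(target for _source, target in redirects)
    return [(n, target) for target, n in counts.items() if n >= threshold]
```

Running a check like this before saving edits would flag a target such as the site homepage for manual review.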

newindianexpress.com

Many old links don't redirect to their new ones, like this doesn't take us here. Better to tag the old ones as dead. Kailash29792 (talk) 13:09, 12 February 2024 (UTC)[reply]

 Doing... -- GreenC 17:55, 16 February 2024 (UTC)[reply]
 Done - The domain exists in 15,261 pages. The bot made changes in 8,467 pages. The changes included adding 5,240 new archive URLs, adding 238 {{dead link}} where no archive URL existed, and changing 1,220 |url-status=live to dead, plus a bunch of other misc cleanup work. Changes are also uploaded in IABot so it will propagate to 300+ other wikis. User:Kailash29792 this was a much needed cleanup, thank you for bringing it to attention. -- GreenC 18:13, 17 February 2024 (UTC)[reply]

crossrail.co.uk

All URLs under the crossrail.co.uk domain are now redirecting to https://web.archive.org/web/20221229005042/https://www.crossrail.co.uk/# with subpages just going to the same archive of the main page breaking links. All links therefore need to be marked as dead and pointed to an archive earlier than 29 December 2022. Thryduulf (talk) 12:51, 14 February 2024 (UTC)[reply]

Interesting. Never seen that before (HTML redirect to archive.org for the entire site). I like it. The site appears to be mostly usable via the archive version. Simple solution for general purposes. Well, like you say, we can do better at enwiki. I'll add more specific archive URLs for each page. -- GreenC 18:05, 16 February 2024 (UTC)[reply]

 Done - The bot checked 124 pages that have the domain. It edited 101 pages. Added 161 archive URLs. Converted 51 |url-status=live to dead. Added two {{dead link}}. Updated IABot with information so it propagates to 300+ other wikis. Thryduulf thank you for the notification. -- GreenC 21:05, 18 February 2024 (UTC)[reply]

pomus.net

Sometimes it redirects to a pornsite and sometimes to different fake "I am not a bot" websites. There are many links to it, all of which require url-status=usurped - Altenmann >talk 07:11, 15 February 2024 (UTC)[reply]

WP:JUDI (usurpation) queue Special:Diff/1202023308/1207703597, thank you. -- GreenC 13:46, 15 February 2024 (UTC)[reply]

royin.go.th

Several years ago, the Royal Institute of Thailand changed its name to the Royal Society of Thailand and most (but not all) of the content from its old website, under the domain www.royin.go.th, is now preserved under the subdomain legacy.orst.go.th . Can this be handled by a bot? --Paul_012 (talk) 10:12, 22 February 2024 (UTC)[reply]

58 pages. When I try the first one http://www.royin.go.th/th/knowledge/detail.php?ID=639 it doesn't work at http://legacy.orst.go.th/th/knowledge/detail.php?ID=639 rather wants to redirect to https://www.orst.go.th/?ID=639 however I can't read Thai and don't know if that is a soft-404 or legitimate page. -- GreenC 15:17, 22 February 2024 (UTC)[reply]
It seems links like that one are too old and weren't preserved, and that they constitute more than a small minority. 58 isn't a lot; maybe I can check them manually and replace them with AWB. --Paul_012 (talk) 15:45, 22 February 2024 (UTC)[reply]
Thanks. It would be better if you can. -- GreenC 16:17, 22 February 2024 (UTC)[reply]

Vice Media

Just wanted to flag that per Vice reporters on social media, there are concerns that the Vice Media website is about to be shut down (a la The Messenger (website)). Is there a way to make sure that all articles using it as a source have archive links? Thanks! Sariel Xilo (talk) 21:16, 22 February 2024 (UTC)[reply]

Concerns about the total shuttering have just been picked up by Hollywood Reporter with the top editor unable to confirm if the website will be pulled down. Sariel Xilo (talk) 21:23, 22 February 2024 (UTC)[reply]
Wow that's over 17,000 pages. No worries if they shut it down we'll add archives. Any link added to Wikipedia should be archived at Wayback automatically, and big sites like this are typically crawled entirely. Too bad if true they had a lot of good content. -- GreenC 22:01, 22 February 2024 (UTC)[reply]
Confirmed that they'll stop publishing on Vice.com [9], but as to whether they'll leave the website up as a historic archive like Gawker was historically left or it will be taken down is anyone's guess. Hemiauchenia (talk) 22:19, 22 February 2024 (UTC)[reply]

dcist.com

WAMU has shut down the DCist and if you go to the website, it shows a popup stating it will redirect you to "WAMU.org in 15 seconds" (Washington Post mentions the redirect). It looks like this redirect popup is occurring on both the homepage and all of the articles so DCist links should be marked as dead. Sariel Xilo (talk) 18:36, 23 February 2024 (UTC)[reply]

604 pages -- GreenC 19:46, 23 February 2024 (UTC)[reply]
Sariel Xilo, the domain was set permanent dead by IABot.org in 2017. IABot has gaps in coverage so I rechecked with WaybackMedic and it added about 200 archive links and changes to |url-status=. -- GreenC 20:31, 23 February 2024 (UTC)[reply]
Thanks! Let's hope that's it for a bit with news outlets dying. Sariel Xilo (talk) 20:34, 23 February 2024 (UTC)[reply]
Just a note that this site is now live again, for at least the next year, per WAMU reporting. 19h00s (talk) 22:00, 28 February 2024 (UTC)[reply]
19h00s & Sariel Xilo. This site is a yo-yo. Shut down in 2017. Restored in 2018. Shut down in 2024. Restored in 2024. My bot has a feature "make live". I can do that. -- GreenC 23:27, 28 February 2024 (UTC)[reply]
 Done made live again. -- GreenC 01:49, 29 February 2024 (UTC)[reply]
Thank you! 19h00s (talk) 01:55, 29 February 2024 (UTC)[reply]

theherald.com.au

Formerly the main domain of The Newcastle Herald, it now redirects to smh.com.au, which is a different newspaper. Most links can be resurrected by replacing theherald.com.au with newcastleherald.com.au. About 1500 links can be found in Special:LinkSearch/www.theherald.com.au and almost all can be dealt with by just replacing the domain, and http can be updated to https.

Tim Starling (talk) 01:01, 24 February 2024 (UTC)[reply]
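The domain swap requested above might be sketched like this. It is only an illustration under stated assumptions: the function name `migrate_herald_url` is hypothetical, any subdomain such as "www." is kept as-is, and any port or userinfo in the URL is dropped. It does not check that the migrated URL is actually live.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_DOMAIN = "theherald.com.au"
NEW_DOMAIN = "newcastleherald.com.au"


def migrate_herald_url(url: str) -> str:
    """Rewrite a theherald.com.au URL to newcastleherald.com.au over https.

    URLs on other hosts are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != OLD_DOMAIN and not host.endswith("." + OLD_DOMAIN):
        return url
    # Keep any subdomain prefix (e.g. "www.") and swap only the domain part.
    new_host = host[: -len(OLD_DOMAIN)] + NEW_DOMAIN
    return urlunsplit(("https", new_host, parts.path, parts.query, parts.fragment))
```

Since the request says "almost all" links can be handled this way, the remainder would still need a liveness check after rewriting.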

  •  Done edited nearly 1,000 articles. Migrated the links and/or |url-status=. -- GreenC 04:46, 24 February 2024 (UTC)[reply]

bibalex.org

Deprecate web archive provider http://web.archive.bibalex.org and http://web.petabox.bibalex.org (on-hold pending verification site is permanently down)-- GreenC 15:28, 3 March 2024 (UTC)[reply]

 Done around 250 pages converted to other archive providers or added {{dead link}} -- GreenC 15:00, 16 March 2024 (UTC)[reply]

bhu.ac.in

Erstwhile simple links such as bhu.ac.in/history have been replaced with complex URLs such as bhu.ac.in/Site/Home/1_2_16_Main-Site with no fixed pattern or correlation. Therefore, it is requested that all bhu.ac.in links (except those under bhu.ac.in/Site/***) be marked dead and archived URLs be preferred for them. Thanks, Please feel free to ping/mention -- User4edits (T) 05:33, 16 March 2024 (UTC)[reply]

Also iitbhu.ac.in has a lot of dead links and redirects. Doing those at the same time. -- GreenC 17:00, 16 March 2024 (UTC)[reply]
 Done Checked 132 pages containing bhu.ac.in and iitbhu.ac.in - added 142 new archive URLs, 6 new {{dead link}} templates, and migrated 57 links to a new redirect location (mostly http->https). Everything else appeared to be working or previously archived. Search -- GreenC 17:48, 16 March 2024 (UTC)[reply]

finlex.fi

Finlex.fi URLs aren't dead but for some reason InternetArchiveBot keeps adding archived URLs for them. This was brought up at meta:User_talk:InternetArchiveBot#Finlex.fi_URLs_aren't_dead a month ago: Bot's edits: [10], [11], [12]. Some URLs it tagged as dead but are actually working: [13], [14], [15]. Those finlex.fi URLs that now have both a working URL and an archive URL should be tagged with the |url-status=live tag, and could someone try to tell IABot that Finlex is live? Thanks. 2001:14BA:9C94:9A00:E866:DADA:1085:E3D9 (talk) 09:28, 17 March 2024 (UTC)[reply]

Just noticed that this same issue is being discussed at fi.wikipedia: fi:Wikipedia:Kahvihuone_(tekniikka)#Botti_hakee_arkistosta_kumottuja_lakeja 2001:14BA:9C94:9A00:E866:DADA:1085:E3D9 (talk) 09:41, 17 March 2024 (UTC)[reply]
The site has a "Are you human?" check box (CloudFlare). This is causing the bot to think it's a dead site. I logged into iabot.org and changed the domain to "Subscription" status and that will cause the bot to avoid this domain, it won't set live or dead. My bot WaybackMedic has capabilities to bypass CloudFlare. I can try to process this domain and see what happens. My bot also has a feature "make live" ie. convert a citation from dead to live state. Unfortunately my bot only works on English Wikipedia. I'll let you know what happens. -- GreenC 15:13, 17 March 2024 (UTC)[reply]
Unfortunately, this site has maximum security enabled, none of my tools can get through. It started happening in late January 2024. I don't know what to do because no bot is able to determine if a link is live or dead. And no archive service such as WaybackMachine is able to archive a page. Only humans can get through, and they need to solve a captcha. It might be worthwhile waiting to see if they relax security in the future, since this is a recent development. -- GreenC 00:40, 19 March 2024 (UTC)[reply]

squashinfo.com

www.squashinfo.com is a standard reference in articles about squash players. The problem is that current links mostly have the form www.squashinfo.com/players/12345-playername. This leads to the alphabetical players list on squashinfo instead of to the individual player profile. The solution would be to change "players" to "player" without the s. I just did this for Hannah Chukwu where I changed the respective link from http://www.squashinfo.com/players/13679-hannah-chukwu to http://www.squashinfo.com/player/13679-hannah-chukwu. We currently have several hundred articles about squash players and most of them have squashinfo links, so this may require a bot to fix. Proofreader (talk) 13:31, 17 March 2024 (UTC)[reply]
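The "players" to "player" change described above could be expressed as a single regular expression, roughly as in this sketch. The function name `fix_squashinfo_url` is hypothetical, and the pattern assumes every profile link has the numeric-ID-then-name shape shown in the example; it leaves the bare /players list page untouched.

```python
import re

# Matches profile links like /players/13679-hannah-chukwu (numeric ID, hyphen, name).
_PROFILE = re.compile(
    r"^(https?://(?:www\.)?squashinfo\.com)/players/(\d+-[^/?#\s]+)",
    re.IGNORECASE,
)


def fix_squashinfo_url(url: str) -> str:
    """Rewrite /players/<id>-<name> to /player/<id>-<name>; leave other URLs alone."""
    return _PROFILE.sub(r"\1/player/\2", url)
```

The scheme is preserved here; conversion to https would be a separate step.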

@Proofreader: Looks like about 472 articles. I can do this, and some other cleanup like conversion to https and checking for dead links. Might be a few days before I get to it. -- GreenC 15:21, 17 March 2024 (UTC)[reply]
Thanks a lot. --Proofreader (talk) 15:23, 17 March 2024 (UTC)[reply]

RateTheRef.net

The website RateTheRef.net seems to have been usurped by a Thai gambling site. I don't know how many pages this affects, or whether the old content has been archived, but I figured someone ought to be told. DavidKVT (talk) 21:21, 18 March 2024 (UTC)[reply]