User talk:MBH
A cookie for you!
As a token of appreciation for this edit. vvvt 17:22, 6 June 2012 (UTC)
WebCite
Another idea: many links on Wikipedia have also been archived by archive.org and archive.today, so you could check those archives and compare their page encoding with WebCite's to see whether they match. This is not a perfect method, because the other archive might not exist, might be a soft 404, or might be a snapshot from a different date with different content. But I think it should resolve most of them, as coverage is good.
Here is a script I wrote that queries the Wayback API to see whether a URL is available. On Toolforge, copy-paste the code below into "api.awk" and it should work (also run 'chmod 755 api.awk').
#!/usr/bin/gawk -bE

# The MIT License (MIT)
#
# Copyright (c) 2016-2018 by User:GreenC (at en.wikipedia.org)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

@include "/data/project/botwikiawk/BotWikiAwk/lib/library.awk"

BEGIN {

  Optind = Opterr = 1
  while ((C = getopt(ARGC, ARGV, "hpu:t:c:")) != -1) {
    opts++
    if(C == "u")      # -u <url> URL to check.
      url = verifyval(Optarg)
    if(C == "t")      # -t <timestamp> (optional) Timestamp. Default: "20070101"
      timestamp = verifyval(Optarg)
    if(C == "c")      # -c <closest> (optional) Closest: before|after|either - Default: before
      closest = verifyval(Optarg)
    if(C == "p")      # -p (optional) Print API command not results.
      showapi = 1
    if(C == "h") {
      usage()
      exit
    }
  }

  if(closest !~ /before|after|either/)
    closest = "before"
  if(!isanumber(timestamp) || timestamp == "")
    timestamp = "20070101"

  if( url ~ /error/ || ! opts || url == ""){
    usage()
    exit
  }

  url = urlencodeawk(url)

  command = "wget --header=" shquote("Wayback-Api-Version: 2") " --post-data=" shquote("url=" url "&closest=" closest "&statuscodes=200&statuscodes=203&statuscodes=206&statuscodes=403&tag=&timestamp=" timestamp) " -q -O- " shquote("http://archive.org/wayback/available")
  if(showapi)
    print command
  else
    print sys2var(command)

}

#
# verifyval - verify any command-line argument has valid value. Usage in getopt()
#
function verifyval(val) {
  if(val == "" || substr(val,1,1) ~/^[-]/) {
    stdErr("Command line argument has an empty value when it should have something.")
    exit
  }
  return val
}

function usage() {
  print ""
  print "API - show Wayback API 2 results for a single URL"
  print ""
  print " Usage : api -u <url>"
  print ""
  print " Options:"
  print " -c <closest> - before|after|either - default: before"
  print " -t <timestamp> - default: 20070101"
  print " -p - print the API URL instead of results"
  print ""
}
If you decide to try archive.today, be aware that it has problems. The API for checking whether a page is available is http://archive.today/timemap/<url>
The url portion should not be percent-encoded, except for "#", which should be encoded as "%23". If the returned HTML contains www.henley-putnam.edu/Portals/_default/Skins/henley/images/loading.gif, there was an error retrieving the page. If the content contains "DDoS protection by Cloudflare", the request is being blocked due to rapid queries; the block wears off after a few hours, but it can be a problem.
-- GreenC 16:02, 27 October 2019 (UTC)
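The archive.today notes above can be sketched roughly as follows. This is only an illustration in Python, not code from either participant: the function names `timemap_url` and `classify_response` and the labels "blocked", "error" and "ok" are hypothetical, and the two marker strings are simply the ones quoted in the comment above. No request is sent; fetching the page is left to the caller.

```python
# Sketch only: helper names and return labels are hypothetical.
TIMEMAP_PREFIX = "http://archive.today/timemap/"

# Marker strings taken from the comment above.
ERROR_MARKER = "www.henley-putnam.edu/Portals/_default/Skins/henley/images/loading.gif"
BLOCK_MARKER = "DDoS protection by Cloudflare"


def timemap_url(url: str) -> str:
    """Build the timemap query URL, encoding only '#' as '%23'."""
    return TIMEMAP_PREFIX + url.replace("#", "%23")


def classify_response(body: str) -> str:
    """Classify a response body by the two known failure markers.

    Returns "blocked" (rate-limited by Cloudflare), "error" (retrieval
    failed), or "ok" (neither marker present; this says nothing else
    about the content).
    """
    if BLOCK_MARKER in body:
        return "blocked"
    if ERROR_MARKER in body:
        return "error"
    return "ok"
```

A caller that sees "blocked" would presumably want to pause for a few hours before retrying, since the block is triggered by rapid queries.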
- I have already implemented win1251 checking using the https://github.com/jstedfast/Portable.Text.Encoding library, which can convert a byte stream into win1251. I will update my algorithm and start checking your list soon. MBH (talk) 16:18, 27 October 2019 (UTC)
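For illustration, the idea of comparing the encoding of a WebCite snapshot with a snapshot from another archive might look something like the sketch below. It is an assumption-laden Python example, not MBH's algorithm and not the Portable.Text.Encoding API: `guess_encoding` and `snapshots_agree` are hypothetical names, and the heuristic (strict UTF-8 first, then windows-1251 if the decoded letters are mostly Cyrillic) and its 0.5 threshold are choices made only for this example.

```python
def guess_encoding(raw: bytes, min_cyrillic: float = 0.5) -> str:
    """Guess 'utf-8', 'cp1251', or 'unknown' for a page's raw bytes."""
    try:
        raw.decode("utf-8")          # strict: fails on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        text = raw.decode("cp1251")  # windows-1251
    except UnicodeDecodeError:       # e.g. byte 0x98 is undefined in cp1251
        return "unknown"
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    # Share of letters that fall in the basic Cyrillic Unicode block.
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return "cp1251" if cyrillic / len(letters) >= min_cyrillic else "unknown"


def snapshots_agree(webcite_raw: bytes, other_raw: bytes) -> bool:
    """True when both snapshots get the same, recognised encoding guess."""
    a = guess_encoding(webcite_raw)
    b = guess_encoding(other_raw)
    return a != "unknown" and a == b
```

As noted in the thread, a mismatch is not conclusive on its own, because the other archive's snapshot may be a soft 404 or may have different content from a different date.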
Test page
Hello, you have created
- I made this page to settle a dispute with another user over whether any registered user can create an article on enwiki. It turned out that I'm autoconfirmed, but that user isn't. MBH (talk) 18:58, 9 December 2019 (UTC)
Question concerning the Russian Wikipedia
Hi! I think you said on Russian Wikipedia that your bot patrols edits when a patrolling failure happens. Right now I have several redirects sitting there that nobody is patrolling. (Redirects to gymnasts; see my edits from yesterday.) (I'm writing here because I don't plan to have any dealings in that language edition. I just decided to translate some of my articles there after all, plus a few other things to complete the set. And the patrolling failures keep happening.) --Moscow Connection (talk) 07:35, 14 March 2020 (UTC)
- Moscow Connection, I don't quite understand what you want from me. Please give links to the articles/edits in question. MBH (talk) 07:49, 14 March 2020 (UTC)
- Maybe I mixed something up? I think that a couple of months ago there was a discussion on Russian Wikipedia about a recently arisen technical problem, namely frequent failures in the automatic patrolling of edits by (auto)patrolled users. And you said: "No big deal, my bot goes through every day and patrols such edits."
Here are examples: 1, 2.
Sorry if I have confused you with someone else. And if not, could you perhaps set up the bot to go over redirects as well? --Moscow Connection (talk) 08:08, 14 March 2020 (UTC)
Not a bad joke
About this, just saying that birds are dinosaurs, not separate animals that are descendants of dinosaurs. In cladistics, all animals descended from a clade belong to that clade; thus, humans are bony fish and birds are dinosaurs. You are right to revert it, however, because COMMONNAME takes precedence. Wilhelm Tell DCCXLVI converse | fings wot i hav dun 16:10, 23 June 2021 (UTC)
- In cladistics, but not in common language (and not in the future either, I believe). MBH (talk) 16:14, 23 June 2021 (UTC)
Speedy deletion nomination of Category:Moldovan centenarians
A tag has been placed on
If you think this page should not be deleted for this reason you may contest the nomination by visiting the page and removing the speedy deletion tag. Liz Read! Talk! 19:32, 16 October 2023 (UTC)
ArbCom 2023 Elections voter message
Hello! Voting in the
The
If you wish to participate in the 2023 election, please review