User talk:GreenC/WaybackMedic 2.1

fixdatemismatch and timezones

If I archive a URL, I record the archivedate (defined as "Date when the original URL was archived") as the date I made the archive in my time zone (which usually matches the accessdate). However, due to time zone differences, the archiveurl does not always use my date but may use the previous day's date. The bot should not change my archivedates in that scenario; it should change them only if the dates differ by more than a time zone difference can account for. To do otherwise misrepresents the timeline of events. This diff is an example of the behaviour I don't think is correct. Kerry (talk) 13:03, 19 April 2018 (UTC)
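The scenario described above can be sketched in a few lines. This is only an illustration of the proposed tolerance rule, not how the bot works: the function names, the sample timestamp, and the UTC+10 offset are all hypothetical, and it assumes the archive service records snapshot times in UTC.

```python
from datetime import date, datetime, timedelta, timezone


def local_date(utc_stamp, utc_offset_hours):
    """Return the calendar date an editor at the given UTC offset would see.

    utc_stamp is a 14-digit YYYYMMDDhhmmss string, assumed to be in UTC.
    """
    utc_time = datetime.strptime(utc_stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone).date()


def explainable_by_timezone(editor_date, snapshot_date):
    """True if the two dates differ by at most one day.

    UTC offsets span roughly -12 to +14 hours, so a local date can
    differ from the UTC date by one day at most.
    """
    return abs((editor_date - snapshot_date).days) <= 1


# Hypothetical snapshot taken late in the evening, UTC.
stamp = "20180418230000"
editor_sees = local_date(stamp, 10)   # editor at UTC+10 (hypothetical)
archive_says = local_date(stamp, 0)   # the date embedded in the archive URL

print(editor_sees, archive_says)
print(explainable_by_timezone(editor_sees, archive_says))
```

Under these assumptions the editor's date is one day later than the date in the archive URL, and the one-day tolerance treats the two as consistent.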

Hi @Kerry Raymond: Looking at the archive URL, "https://web.archive.org/web/20171022232630/http:..", the snapshot date is 20171022232630, which corresponds to 2017-10-22 at 23:26:30, thus |archivedate=2017-10-22. The |archivedate= is the date the snapshot exists at the archive service, not the date you added it. -- GreenC 13:16, 19 April 2018 (UTC)
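The reading of the snapshot date described above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function name and the regular expression are assumptions, and it only handles Wayback Machine URLs carrying a full 14-digit timestamp, which it treats as UTC.

```python
import re
from datetime import datetime, timezone

# Captures the 14-digit YYYYMMDDhhmmss snapshot timestamp in a Wayback URL.
WAYBACK_STAMP = re.compile(r"web\.archive\.org/web/(\d{14})")


def snapshot_date(archive_url):
    """Return the snapshot date (YYYY-MM-DD) embedded in a Wayback Machine URL."""
    match = WAYBACK_STAMP.search(archive_url)
    if match is None:
        raise ValueError("no 14-digit snapshot timestamp found in URL")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # Assumption: the timestamp is in UTC, so no time zone conversion is applied.
    return stamp.replace(tzinfo=timezone.utc).date().isoformat()


# The URL is truncated exactly as quoted in the discussion.
print(snapshot_date("https://web.archive.org/web/20171022232630/http:.."))
```

For the URL quoted above this gives 2017-10-22, the value used for |archivedate=.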

That's not what it appears to say in Template:Cite web, where it says "Date when the original URL was archived". For a source like a news report released that day, we could have an archivedate the day before the source was created, which seems nonsensical. Kerry (talk) 13:21, 19 April 2018 (UTC)

I agree the wording is ambiguous. But it's how it works and has always worked: archivedate reflects the date recorded at the web archive service. Not only my bot but every bot since forever has done it this way. There's no reason to record when the editor created the archive (information that is more often than not unknown, since the archive was created by someone else); the reader only needs to know the web archive date so they know how to retrieve the snapshot, which is the only purpose of having a |archivedate=. If you want to discuss further, please post at Help talk:Citation Style 1, which is the main discussion forum for the cite templates. -- GreenC 13:39, 19 April 2018 (UTC)

Follow-up discussion. -- GreenC 15:26, 19 April 2018 (UTC)

[Question (mostly) about] the use of [URLs w/the domain name] "archive dot is"

Please forgive me if I am just ignorant of what the latest status is, on some issue that was maybe controversial (or something) in the past, but maybe it has changed in some way, by now.

Maybe the issues ["if any"] about the use -- [on Wikipedia, at least] -- of the [domain name] << "archive dot is" >> have been resolved, or maybe they have 'evolved', or something. I think that at one time there was something controversial about it ... but even if that is correct, maybe "not so much" today, since ... maybe things have changed, by now.

I noticed that, right after one robot (User:InternetArchiveBot) added the oldest section (and iirc the only section now existing), to the "Talk:" page [at] Talk:Jewish_News_of_Greater_Phoenix -- it also made a corresponding edit, to the article Jewish_News_of_Greater_Phoenix ... and that edit was the one that inserted a "{{dead link|date=April 2017 |bot=InternetArchiveBot |fix-attempted=yes }}" tag, for a certain footnote.

Then I noticed that: the very next edit, [to that article] -- an edit made by User:Green_Cardamom/WaybackMedic_2.1 apparently! -- was: this one, ... which got rid of the "dead link" tag, and instead implemented a solution using an "archiveurl" field.

The field value put into that "archiveurl" field seems to have been https://archive.is/20130126234119/http://www.jewishaz.com/issues/printstory.mv?080118+decades ... and, as far as I know, I think it is still there.

Is it OK now, to use URLs like that? (ones with [the domain name] "archive dot is") -- ? --

Just wondering.

I never felt like I understood completely (100%) what the big deal was (if any) ... if it ever was a big deal (was it?) ... about using URLs containing that domain name, on Wikipedia.

Any comments?

I realize that this topic might be tangential (or even 100% OFF-TOPIC) relative to the main purpose of this "Talk:" page.

Thanks for your patience. --Mike Schwartz (talk) 19:20, 22 February 2019 (UTC)

Hello Mike Schwartz. Well, there was a time when archive.is was banned, but that was later lifted. It's now a respected site. It does have a soft-404 problem (it reports an archive as available when the page is really a 404 or otherwise unusable). The soft-404 rate is over 50%. So my bot specializes in filtering those, plus a final manual check. That is why IABot does not use it: the process can't be fully automated. -- GreenC 21:19, 22 February 2019 (UTC)

OK. Thank you for that kind (and helpful) -- and prompt! -- reply.

BTW, I agree that sometimes it is (or ... it could be) beyond the capability of some robot, -- (unless it is a VERY smart robot!) -- to determine whether a given web page, that can be "fetched" using a certain URL, (a) should count as basically ... a web page that is (virtually) "just as useless" as some kind of a "404", [e.g. for purposes of serving as the target of an "archiveurl" field of e.g. a "cite web" template instance, in a footnote ... where the goal is, to try to convey the content ... that the "original" web page USED to contain, "as of" a certain date in the past] ... vs. whether (b) it is useful enough, that it is NOT "similar to" /slash "in the same category as" ... a "dead link" -type situation.

IMHO, [I think that] you have just answered my question; so ... I intend to consider this matter (the question that was my original reason for creating this section of this "Talk:" page) to be in the status of "CASE CLOSED".

Thank you. --Mike Schwartz (talk) 07:56, 24 February 2019 (UTC)