Wikipedia:Deferred changes/Request for comment 2016

Source: Wikipedia, the free encyclopedia.
The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.
Thank you to all who participated in this discussion. The consensus seems clear. All three proposals are successful.
  • There is consensus to allow edit filters to defer changes either passively or actively in accordance with the edit filter guideline. (Passive deferring places the edit on Special:PendingChanges for human review, but still presents it to readers immediately. Active deferring holds back the edit, showing readers instead the revision prior to the edit, similar to how pending changes protection currently works.)
  • Bots may, on approval, also defer edits passively, and bots with rollback rights may, on approval, also defer edits actively. A frequently cited example of a bot that would benefit from active deferred changes was ClueBot – the community expressed considerable faith in its ability to catch vandalism, and deferred changes would allow the bot to catch edits it suspects may be vandalism but isn’t quite sure enough to revert.
  • The ORES extension is authorized to defer edits both passively and actively under the condition that the thresholds for doing so are decided beforehand by consensus and are higher than what they are currently. Administrators may, at their discretion, increase the thresholds in the event of backlogs.
There were concerns in the threaded discussion sections about the backlog and about biting newcomers. To address the backlog, one suggested solution was to implement deferred changes cautiously and passively: start with a high passive threshold then slowly lower it until an optimal threshold is reached before allowing active deferring. When actively deferring changes, a friendly notification should be presented to the user who made the changes, carefully worded to avoid biting. There was some discussion about creating a separate queue for deferred changes (rather than Special:PendingChanges) and changing the standards for accepting them, but there is no consensus to deviate from the way pending changes are currently reviewed. Respectfully, Mz7 (talk) 23:04, 11 November 2016 (UTC)[reply]

This request for comment concerns deferred changes, a way to defer for review those edits by non-autoconfirmed users that match certain edit filters, are picked up by a bot (e.g. User:ClueBot NG) as warranting attention, or are considered damaging by ORES. Until reviewed, the revision displayed to readers can be chosen to be either the latest revision as usual ('passive' defer) or the revision prior to the user's edits[1] ('active' defer). Deferred edits appear at Special:PendingChanges. Should we request implementation of deferred changes? Specifically, should we allow it for the edit filter, bots and ORES, both passively and actively? 08:12, 14 October 2016 (UTC)

  1. ^ The same revision that rollback would revert to.

Edit filter

Should we allow edit filters to defer edits passively and actively? Use would be governed by the edit filter guideline.

Support (Edit filter)

  1. As proposer. Per rationale at Wikipedia:Deferred changes. Edit filter managers can be trusted to use the new actions adequately. Cenarium (talk) 09:09, 14 October 2016 (UTC)[reply]
  2. Edit filters already flag edits for review, but there is no centralized location for follow-up. (It's also not particularly obvious where to look.) The rationale for active deferral is also sound: the simple heuristics available in the edit filter are often prone to false positives -- this change will increase the preventative power of such filters. MER-C 03:35, 15 October 2016 (UTC)[reply]
  3. This seems like the most logical use for deferred changes and would be an improvement over the existing tagging system. Kaldari (talk) 22:36, 15 October 2016 (UTC)[reply]
  4. I support deferring edits as it prevents the visibility of likely vandalism. It doesn't stop helpful edits. Chris Troutman (talk) 23:11, 15 October 2016 (UTC)[reply]
  5. Sounds like a good idea and a reasonable use for this system. Enterprisey (talk!) 05:44, 16 October 2016 (UTC)[reply]
  6. I remember seeing a post about this at WP:EFN and thought it would be a good idea, and I still believe it will be. -- The Voidwalker Whispers 23:58, 16 October 2016 (UTC)[reply]
  7. I think this is a good idea as proposed. Prevention is important in stopping vandalism. -- talk 05:35, 17 October 2016 (UTC)[reply]
  8. Makes sense, but see comments below. Ivanvector (Talk/Edits) 17:49, 17 October 2016 (UTC)[reply]
  9. This would definitely help in identifying and removing potentially damaging edits, with minimal impact to the site. Gluons12 | 02:44, 18 October 2016 (UTC).[reply]
  10. I made a similar post at EFN, glad to see it got off the ground. Iazyges Consermonor Opus meum 04:58, 18 October 2016 (UTC)[reply]
  11. Arguments made here seem compelling. Carrite (talk) 03:33, 20 October 2016 (UTC)[reply]
  12. Sounds like a very good idea to me - I would go for 'active' defer or possibly 'active' defer for new users and 'passive' defer for confirmed users if possible. KylieTastic (talk) 22:02, 20 October 2016 (UTC)[reply]
  13. Support per above --TerraCodes (talk to me) 22:26, 20 October 2016 (UTC)[reply]
  14. If edit filters can disallow, then edit filters can defer. Esquivalience (talk) 02:13, 21 October 2016 (UTC)[reply]
  15. Well, it's better than just denying the edit, which is annoying especially if an anonymous user tries to add something constructive and it's disallowed. epicgenius - (talk) 12:39, 21 October 2016 (UTC)[reply]
  16. Very good idea especially for the bots. KGirlTrucker81 huh? what I'm been doing 01:42, 22 October 2016 (UTC)[reply]
  17. Excellent scheme, less bitey than bot reversion or edit filter disallow. Guy (Help!) 22:42, 22 October 2016 (UTC)[reply]
  18. Support per above; a little extra reviewing couldn't hurt. — talk | contribs) 00:10, 23 October 2016 (UTC)[reply]
  19. Kevin (aka L235 · t · c) 00:07, 24 October 2016 (UTC)[reply]
  20. This would help Wikipedia keep a greater number of damaging edits out of the public eye. I like the bit about "passive" versus "active" deferrals. (talk) 01:04, 24 October 2016 (UTC)[reply]
  21. Something that would clearly be good for Wikipedia, for both its content and its reputation: the schools I've been to have often criticised the site as having lots of spam. Talk) 08:44, 24 October 2016 (UTC)[reply]
  22. Definitely support - this has the potential to increase edit quality across the site. A lot of bad edits fall through the cracks, and this could help with that. [talktomeididit] 22:42, 24 October 2016 (UTC)[reply]
  23. Support per MER-C. Gestrid (talk) 15:49, 25 October 2016 (UTC)[reply]
  24. I have seen edits tagged as possible vandalism by edit filters that were only reverted after several hours. Deferred changes would make those edits be reverted more quickly and be less visible. Gulumeemee (talk) 09:03, 28 October 2016 (UTC)[reply]
  25. Support a reasonable proposal, I'm assuming this will involve an update to the Special:AbuseFilter form in the "Actions to take when matched" box with an additional check-box. edit: saw Wikipedia:Deferred changes/Implementation — Andy W. (talk) 17:25, 1 November 2016 (UTC) 17:29, 1 November 2016 (UTC)[reply]
  26. I think it's clear that many potentially embarrassing "joke" type edits go unnoticed and sit, sometimes for months, before they're caught. Our detection tools/algorithms have grown sophisticated and accurate enough that I think it's a good idea to give this 'active deferral' a try. I'm already selectively choosing which edits to patrol in the recent changes feed anyway. If I can get a prepped list of the most likely malicious edits, it would greatly improve catch speed and efficiency. -- œ 07:31, 2 November 2016 (UTC)[reply]
  27. Totally support. I'm excited by how this could radically reduce casual vandalism and spamming in articles. It takes away the "fun" of seeing one's vandalism in lights. Gradually, folks will realize this site is no place to get their kicks, but instead a serious encyclopedia. Talk • Work 19:00, 5 November 2016 (UTC)[reply]
  28. This is an excellent idea. I expect this combination of automation and human review to be a powerful and efficient tool. Ozob (talk) 00:51, 6 November 2016 (UTC)[reply]
  29. This may broaden the amount of room we have to handle questionable-but-not-vandalistic edits. Jo-Jo Eumerus (talk, contributions) 09:45, 6 November 2016 (UTC)[reply]
  30. This is a very good idea. --Tryptofish (talk) 23:38, 7 November 2016 (UTC)[reply]
  31. Strongly support. ~ Rob13Talk 23:51, 7 November 2016 (UTC)[reply]
  32. This sounds very useful in combating likely disruption. It's flexible and doesn't overly penalize editors based on their newness, as the worst that can happen is similar to pending changes. NinjaRobotPirate (talk) 04:35, 8 November 2016 (UTC)[reply]
  33. Support: This seems like it would be able to prevent the visibility of likely vandalism. It wouldn't stop helpful edits. - tucoxn\talk 13:26, 11 November 2016 (UTC)[reply]

Oppose (Edit filter)

Discussion (Edit filter)

  • Unless I'm entirely mistaken, details about edit filters are hidden from non-administrators, other than the most basic info (just the edit filter number, I think, I'm not even sure if we can see when an edit is flagged with a filter). Is there some kind of risk of revealing sensitive information to non-admins by doing this? (I assume that there's not, and that the hiding of this information is just a legacy we-can't-trust-anyone sort of thing). Ivanvector (Talk/Edits) 17:51, 17 October 2016 (UTC)[reply]
    @Ivanvector: Actually, most edit filters are publicly visible, but a great many of them are not. The only thing sensitive about edit filters is their rules, which would not be revealed by the filter taking action. The only thing revealed when a private filter takes an action is the description of the filter. -- The Voidwalker Whispers 21:26, 17 October 2016 (UTC)[reply]
    The only filters that are private, AFAICR, are those which are targeted against long term abuse. These are generally, but not always, "deny". All the best: Rich Farmbrough, 17:49, 20 October 2016 (UTC).[reply]

Bots

Should we allow bots to defer edits passively, and bots with rollback rights actively? Each use would require bot approval.

Support (Bots)

  1. As proposer. Per rationale at Wikipedia:Deferred changes. I expect that ClueBot would be able to catch many vandalism edits that are otherwise missed. Cenarium (talk) 09:18, 14 October 2016 (UTC)[reply]
  2. As long as we don't have 100 bots sending an uncountable number of edits to the pending changes backlog, of course. Esquivalience (talk) 01:57, 15 October 2016 (UTC)[reply]
  3. The rationale given at Wikipedia:Deferred changes is sound. Active deferral can also serve as a deterrence mechanism, and can be used in place of the "don't revert twice" model used by some anti-vandalism bots. MER-C 03:32, 15 October 2016 (UTC)[reply]
  4. I support deferring edits as it prevents the visibility of likely vandalism. It doesn't stop helpful edits. Chris Troutman (talk) 23:11, 15 October 2016 (UTC)[reply]
  5. I support this motion as well. The rationale given at talk 05:35, 17 October 2016 (UTC)[reply]
  6. Support per answer to my question below. Seems like another good tool for the chest, and if the bots are going to continue reverting obvious vandalism as they have been, then I'm not concerned about extra reviewing work. Ivanvector (Talk/Edits) 03:07, 18 October 2016 (UTC)[reply]
  7. I support this for ClueBot, as it's proven its accuracy, but I think each new bot should have to go through a trial period of non-active flagging and review before being accepted, plus a relatively easy way to challenge and shut it down. Also, ClueBot could start with 'active' defer (or both 'active' and 'passive' at different confidence levels), but other bots should be 'passive' until proven. Cheers KylieTastic (talk) 22:09, 20 October 2016 (UTC)[reply]
  8. Support what KylieTastic said. --TerraCodes (talk to me) 22:28, 20 October 2016 (UTC)[reply]
  9. Again, it's better than just rejecting a constructive anonymous edit. Also, we have already implemented this in some way for vandalism: if ClueBot does not have high confidence that something is vandalism, it goes to a semi-automatic program (like STiki or Huggle) where human editors can review it. epicgenius - (talk) 12:41, 21 October 2016 (UTC)[reply]
  10. Support per my comments above. KGirlTrucker81 huh? what I'm been doing 01:44, 22 October 2016 (UTC)[reply]
  11. Support, I think. Guy (Help!) 22:43, 22 October 2016 (UTC)[reply]
  12. Support per above; a little extra reviewing couldn't hurt. — talk | contribs) 00:10, 23 October 2016 (UTC)[reply]
  13. Kevin (aka L235 · t · c) 00:07, 24 October 2016 (UTC)[reply]
  14. Absolutely - I've been impressed with the accuracy of bots like ClueBot NG, and the extra edit review seems like a good idea. [talktomeididit] 22:42, 24 October 2016 (UTC)[reply]
  15. Support, as ClueBot NG has already proven time and again that it knows what vandalism looks like. Unlike ORES (which I oppose below), from what I've seen it hardly ever has a false positive. In the months that I've been active on Wikipedia, I've only ever seen one false positive from ClueBot NG. Gestrid (talk) 15:47, 25 October 2016 (UTC)[reply]
  16. Support, I can see this working. -- The Voidwalker Whispers 22:29, 25 October 2016 (UTC)[reply]
  17. Support the option for bots to choose to defer edits for review. — Andy W. (talk) 17:25, 1 November 2016 (UTC)[reply]
  18. Support per KylieTastic. Talk • Work 19:20, 5 November 2016 (UTC)[reply]
  19. Support. Even the best bots are obliged to be overcautious. Allowing them to defer edits passively or actively provides another layer of protection. Ozob (talk) 00:51, 6 November 2016 (UTC)[reply]
  20. Mostly per rationale provided in the previous section. Jo-Jo Eumerus (talk, contributions) 09:45, 6 November 2016 (UTC)[reply]
  21. Support, with the caveats stated by KylieTastic. --Tryptofish (talk) 23:39, 7 November 2016 (UTC)[reply]
  22. Strongly support. ~ Rob13Talk 23:51, 7 November 2016 (UTC)[reply]
  23. Support, I agree with KylieTastic that ClueBot has proven its accuracy. It should be the only bot initially approved for active defer. Each new bot should complete a trial period of non-active flagging and be reviewed before being accepted for active defer. - tucoxn\talk 13:24, 11 November 2016 (UTC)[reply]

Oppose (Bots)

Discussion (Bots)

Whatever is decided here, I think it is important to make sure that edits deferred by a bot with a neural network/machine learning setup (e.g. ClueBot NG) which are later reviewed by a human are automatically added to ClueBot's dataset. I don't know how technically difficult this is to accomplish, but it seems worthwhile if it is reasonably doable.

I'd also caution whoever first implements this to go very slowly at first. Creating a massive backlog of edits that will never be reviewed is not the goal. Tazerdadog (talk) 08:49, 16 October 2016 (UTC)[reply]
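The feedback loop suggested above could be sketched as follows. This is a purely hypothetical illustration: the function and field names are invented and are not part of ClueBot NG's actual training pipeline; the idea is only that each human review of a deferred edit yields one labeled training example.

```python
# Hypothetical sketch: feeding human review outcomes of deferred edits
# back into a bot's training dataset. All names here are invented for
# illustration; ClueBot NG's real interface may differ entirely.

def record_review(dataset: list, rev_id: int, bot_score: float,
                  human_says_vandalism: bool) -> None:
    """Append one human-reviewed deferred edit as a labeled example."""
    dataset.append({
        "rev_id": rev_id,
        "bot_score": bot_score,          # the score that triggered deferral
        "label": human_says_vandalism,   # ground truth from the reviewer
    })

training_data: list = []
record_review(training_data, 123456, 0.87, True)   # reviewer confirmed vandalism
record_review(training_data, 123457, 0.84, False)  # false positive: a useful negative example
```

The false positives are arguably the most valuable entries, since they are exactly the cases the bot currently gets wrong.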

  • I assume that a bot like ClueBot NG would continue to operate as it does now (reverting edits that meet its vandalism thresholds, or however it works) and would only defer edits it is less sure about? Ivanvector (Talk/Edits) 17:56, 17 October 2016 (UTC)[reply]

ORES

Should we allow ORES to defer edits passively and actively? Thresholds would be determined by consensus initially, and administrators may set higher thresholds in case of backlogs.

Support (ORES)

  1. As long as the thresholds can be specified onwiki. ORES is still in beta and the false positive ratio is quite high for the current threshold, see this for the kind of edits it picks up. We would need a much higher threshold. Cenarium (talk) 09:35, 14 October 2016 (UTC)[reply]
  2. I support deferring edits as it prevents the visibility of likely vandalism. It doesn't stop helpful edits. Chris Troutman (talk) 23:11, 15 October 2016 (UTC)[reply]
  3. Weak support, as I'm not sure about the accuracy - however, per comments in the discussion below, it's probably just a matter of getting the correct settings/levels. So support, with a carefully monitored extended trial period and a way it can be shut down/paused if causing issues. Cheers KylieTastic (talk) 22:13, 20 October 2016 (UTC)[reply]
  4. Support per Halfak's comment in the discussion section and the above. --TerraCodes (talk to me) 22:34, 20 October 2016 (UTC)[reply]
  5. Weak support, because it is useful but misses certain cases that are more likely to be vandalism. I'll have to try ORES more, but this would be a weak support from me based on what I've seen so far. epicgenius - (talk) 21:10, 21 October 2016 (UTC)[reply]
  6. Quite honestly, I'm not sure why this is even being debated. ORES's accuracy can be adjusted; heck, ORES's API can be used by flagged bots. This doesn't mandate use; it only allows use – and provides a native on-wiki queue instead of manual review of automated reverts or ad-hoc IRC scoring solutions, like we have now. Kevin (aka L235 · t · c) 00:07, 24 October 2016 (UTC)[reply]
  7. Support, only at a very high threshold. See comments below. Kaldari (talk) 02:54, 26 October 2016 (UTC)[reply]
  8. Provided that the threshold is higher. Esquivalience (talk) 00:59, 3 November 2016 (UTC)[reply]
  9. Support per Halfak (WMF)'s explanation below. Gestrid (talk) 19:30, 5 November 2016 (UTC)[reply]
  10. Support but with others' concerns taken into account. Talk • Work 20:11, 5 November 2016 (UTC)[reply]
  11. Support. Systems like ORES can be calibrated to different levels of sensitivity. If it's overly aggressive, it can simply be set to a different threshold. Ozob (talk) 00:51, 6 November 2016 (UTC)[reply]
  12. Support, if ORES scoring is set to be sufficiently selective to not clog the queue. -- AntiCompositeNumber (Leave a message) 20:06, 6 November 2016 (UTC)[reply]

Oppose (ORES)

Until the vandalism model of ORES improves in accuracy. Currently, it is pretty inaccurate, with a high false positive rate even on its "High" setting. Also, I believe that the model doesn't factor in linguistic features, which is fatal to its accuracy. I believe that until more diffs are included in the machine learning model (perhaps the ones in CBNG), we should hold off on including ORES. Esquivalience (talk) 01:45, 15 October 2016 (UTC)[reply]
The high setting actually flags more edits than the low setting, so will have more false positives. It is high in sensitivity but relatively low in specificity. The setting for deferral would be kept very high in specificity (to avoid false positives) and would have a relatively low sensitivity. Cenarium (talk) 13:10, 20 October 2016 (UTC)[reply]
Well said. Thank you. --Halfak (WMF) (talk) 20:27, 20 October 2016 (UTC)[reply]
ORES is still under active development and is not yet as accurate as bots like ClueBot NG. Deferring edits based on ORES would be premature, IMO. Kaldari (talk) 22:34, 15 October 2016 (UTC)[reply]
If we set our threshold at the level of ClueBot, we would be just as, if not more, accurate. --Halfak (WMF) (talk) 18:12, 19 October 2016 (UTC)[reply]
@Halfak (WMF): I didn't think about that. What threshold would you propose in order to be "at the level of ClueBot"? Kaldari (talk) 19:42, 21 October 2016 (UTC)[reply]
The ORES service reports a threshold for minimizing false positives in its test statistics. E.g. https://ores.wikimedia.org/v2/scores/enwiki/damaging?model_info shows "threshold": 0.959 for recall_at_fpr(max_fpr=0.1). If we wanted to run this as a trial, I'd start there and adjust based on what we learn. --Halfak (WMF) (talk) 15:01, 24 October 2016 (UTC)[reply]
@Halfak (WMF): Thanks for the info. I switched my comments to a support. Kaldari (talk) 02:54, 26 October 2016 (UTC)[reply]
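For readers curious how such a threshold would actually be applied, here is a minimal client-side sketch. Only the 0.959 figure comes from the model_info output cited above; the function name and interface are hypothetical.

```python
# Sketch: applying a high-specificity deferral threshold to an ORES
# "damaging" score. The constant comes from the model_info example
# discussed above; everything else is invented for illustration.

DEFER_THRESHOLD = 0.959  # "threshold" for recall_at_fpr(max_fpr=0.1)

def should_defer(damaging_probability: float,
                 threshold: float = DEFER_THRESHOLD) -> bool:
    """Defer an edit for review only when the model is highly confident.

    A threshold this high keeps specificity high (few false positives)
    at the cost of sensitivity: many damaging edits score below it and
    are simply shown to readers as usual.
    """
    return damaging_probability >= threshold

print(should_defer(0.98))  # True: confident enough to defer
print(should_defer(0.50))  # False: passes through untouched
```

Lowering the threshold catches more vandalism but, as noted in the oppose comments, risks flooding the pending changes queue with false positives.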
ORES isn't accurate enough, in my view, for active deferral. It might be selective enough for passive deferral, but that would probably fill the queue unnecessarily. AntiCompositeNumber (Leave a message) 14:15, 16 October 2016 (UTC)[reply]
Changing to Support AntiCompositeNumber (Leave a message) 20:06, 6 November 2016 (UTC)[reply]
  1. Oppose per points raised above; this may create too much of an unnecessary workload. — talk | contribs) 00:11, 23 October 2016 (UTC)[reply]
    Until ORES is better at predicting vandalism, I wouldn't trust it with something like this quite yet. It has the potential to flood Pending Changes with way too many false positives until it's been more finely tuned. Gestrid (talk) 15:42, 25 October 2016 (UTC)[reply]
    Changing !vote. Gestrid (talk) 19:27, 5 November 2016 (UTC)[reply]
  2. Oppose because ORES is still at "release status: beta" according to its page on Meta. After the beta period is over, I would be able to change my position here. - tucoxn\talk 13:18, 11 November 2016 (UTC)[reply]
    Also, this sub-section doesn't seem to be counting correctly. - tucoxn\talk 13:20, 11 November 2016 (UTC)[reply]
    Fixed numbering -- AntiCompositeNumber (Leave a message) 14:01, 11 November 2016 (UTC)[reply]

Discussion (ORES)

There seems to be some confusion about m:ORES' accuracy level. ClueBot NG only reverts a small amount of vandalism because its thresholds are set extremely strictly. Thresholds are set far less strictly in the ORES Review Tool because it is intended to benefit from human review, and the goal in patrolling is to get all of the vandalism -- including the majority of damaging edits that ClueBot NG misses. The nice thing about ORES is that it allows external developers the possibility of tuning the threshold to their needs. --Halfak (WMF) (talk) 20:12, 19 October 2016 (UTC)[reply]

I don't understand what ORES is. What does it do? How would it help compared to regular recent changes or m:RTRC? epicgenius - (talk) 12:43, 21 October 2016 (UTC)[reply]

Hi epicgenius. See m:ORES for an overview. ORES is a machine classification service. It can flag edits that are likely damaging/vandalism for review. The m:ORES review tool surfaces predictions on Special:RecentChanges and a few other places. The review tool is installed as a beta feature on English Wikipedia. You can enable it in your preferences. The ORES service, however, provides scores that can be used in all sorts of interesting ways (e.g. User:DataflowBot/output/Popular_low_quality_articles_(id-2)). Quality control is just one of the use cases of the system. --Halfak (WMF) (talk) 14:52, 21 October 2016 (UTC)[reply]
@Halfak (WMF): Thank you for the explanation. So, are the classifications automatic from the beginning, or do humans classify the edits in order for the machine to provide suggestions? epicgenius - (talk) 15:42, 21 October 2016 (UTC)[reply]
epicgenius, see m:ORES/What and Wikipedia:Labels/Edit quality. If you want more substantial discussion, see this presentation I gave about the system at the Berkman Center.--Halfak (WMF) (talk) 15:49, 21 October 2016 (UTC)[reply]
Thanks. Anyway, I'll be sure to try it out. I'll respond with a support or oppose later. epicgenius - (talk) 15:50, 21 October 2016 (UTC)[reply]
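To make the scoring service concrete: a client consuming ORES scores might extract the damaging probability for a revision roughly as below. This is a sketch only; the response layout shown is an approximation of the v2 scores format and the sample numbers are invented, so consult the ORES API documentation for the authoritative shape.

```python
# Sketch: pulling the "damaging" probability for one revision out of an
# ORES v2-style scores response. The layout and values here are
# illustrative approximations, not an authoritative API contract.

def damaging_probability(response: dict, wiki: str, rev_id: int) -> float:
    """Return P(damaging) for rev_id from a v2-style scores response."""
    scores = response["scores"][wiki]["damaging"]["scores"]
    return scores[str(rev_id)]["probability"]["true"]

sample_response = {
    "scores": {
        "enwiki": {
            "damaging": {
                "scores": {
                    "123456": {
                        "prediction": False,
                        "probability": {"true": 0.12, "false": 0.88},
                    }
                }
            }
        }
    }
}

print(damaging_probability(sample_response, "enwiki", 123456))  # 0.12
```

A deferral consumer would compare this probability against an agreed threshold rather than using the boolean "prediction" field, which reflects the model's default cutoff.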

Given that there is some dispute about whether ORES is sufficiently specific for this task, with the counter-argument that it can be tuned to decrease positive results, would it be possible for the people behind ORES to provide a specific list of example edits which would be matched (e.g. from some contiguous sample of 10000 edits on enwiki)? ⁓ Hello71 15:35, 25 October 2016 (UTC)[reply]

Actually, now that I think about it, given that the edit filter proposal seems overwhelmingly popular, would it not be possible to feed ORES scores into AbuseFilter so that EFMs could tune it based also on other criteria (e.g. user is not confirmed or whatever)? ⁓ Hello71 15:38, 25 October 2016 (UTC)[reply]

Maybe not such a great idea, since it was pointed out that ORES scoring is slow. Maybe it would be worthwhile to split edit filters into immediate and queued types based on the actions taken, but that's a whole different can of worms. ⁓ Hello71 15:54, 25 October 2016 (UTC)[reply]

General discussion

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.