Wikipedia:Administrators' noticeboard/CXT/Pages to review/Regex test 1

Source: Wikipedia, the free encyclopedia.

Testing regexes per WP:AN#X2-nuke interim period.

May 7 ver #779254187

As of version 779254187 of Wikipedia:Administrators' noticeboard/CXT/Pages to review (21:56, 7 May 2017) I count:

  • 738 <s> tags, 732 </s> tags, and 587 non-User space articles per this pattern:

Keepers

^\|\s*<s>\s*\[\[((?!User).*?)\s*\]\]</s>
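A minimal sketch of how the tag counts and the Keepers match could be reproduced, assuming Python's `re` module and a handful of made-up table rows (the `SAMPLE` text and the helper names `count_tags` and `keepers` are illustrative, not taken from the page or from any script actually used):

```python
import re

# The Keepers pattern from this page; MULTILINE makes ^ anchor at each row.
KEEPER = re.compile(r"^\|\s*<s>\s*\[\[((?!User).*?)\s*\]\]</s>", re.MULTILINE)

# Hypothetical sample rows standing in for the real Pages to review table.
SAMPLE = """\
| <s>[[Alpha Centauri]]</s>
| <s>[[User:Example/Sandbox]]</s>
| [[Beta Pictoris]]
| <s>[[Draft:Gamma Ray]]</s>
| <s>[[Delta Works]] </s>
"""

def count_tags(text):
    """Return (number of <s> tags, number of </s> tags)."""
    return text.count("<s>"), text.count("</s>")

def keepers(text):
    """Return the titles of struck-out links that do not start with 'User'."""
    return KEEPER.findall(text)

print(count_tags(SAMPLE))
print(keepers(SAMPLE))
```

On these rows the Draft-space link is still returned, and the row with a space before `</s>` is skipped because the pattern requires `</s>` directly after `]]`.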

What I learned

  • This pattern does not exclude Draft-space articles and (of course) cannot recognize articles that don't exist (red-links), but maybe there's a magic word the script can use for that?
  • Are we really keeping articles where the title is entirely in non-Latin script (e.g., 三国)?
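The two points above could be checked on the matched titles after the regex has run. The following is only a sketch under assumptions: the helper names are hypothetical, "Latin" is decided by looking for the word LATIN in each character's Unicode name, and nothing here addresses red-links.

```python
import unicodedata

def is_draft(title):
    """True if the title is in the Draft namespace (prefix check only)."""
    return title.startswith("Draft:")

def has_latin_letter(title):
    """True if at least one alphabetic character is a Latin letter."""
    return any(
        ch.isalpha() and "LATIN" in unicodedata.name(ch, "")
        for ch in title
    )

def is_entirely_non_latin(title):
    """True if the title contains letters but none of them are Latin."""
    has_letters = any(ch.isalpha() for ch in title)
    return has_letters and not has_latin_letter(title)

# Hypothetical titles; only the second one comes from this page.
for title in ["Alpha Centauri", "三国", "Draft:Gamma Ray"]:
    print(title, is_draft(title), is_entirely_non_latin(title))
```

Titles with accented Latin letters count as Latin here, and titles made only of digits or punctuation are not flagged.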
587 non-User space articles to keep from Pages to review:

Nukers

^\|\s*\[\[((?!User)(?!Draft).*?)\s*\]\]

What I learned

  • Doesn't yet exclude Template or Template talk spaces; doesn't exclude subpages (titles with slash in them).
  • The lookaheads above need trailing colons, i.e. (?!User:) and (?!Draft:), so that mainspace titles such as Draft board or User story are not wrongly excluded.
  • 587 + 2785 = 3372, which is < 3602. There are a handful of malformed <s> tags, but not enough to account for the discrepancy.
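The points above can be illustrated with a sketch that compares the pattern as written with a tightened variant. The variant is an assumption, not something used on this page: it requires a colon after each namespace name, adds User talk, Template and Template talk, and skips any title containing a slash. The sample rows are made up; the three totals are the figures quoted above.

```python
import re

# Pattern as written above: bare "User" / "Draft" lookaheads.
NUKER_LOOSE = re.compile(r"^\|\s*\[\[((?!User)(?!Draft).*?)\s*\]\]", re.MULTILINE)

# Assumed tightened variant (not from the page): namespace name plus colon,
# extra namespaces excluded, and no slash allowed anywhere in the title.
NUKER_STRICT = re.compile(
    r"^\|\s*\[\[((?!(?:User|User talk|Draft|Template|Template talk):)[^\]/\n]*?)\s*\]\]",
    re.MULTILINE,
)

# Hypothetical sample rows.
SAMPLE = """\
| [[Draft board]]
| [[User story]]
| [[User:Example/Sandbox]]
| [[Draft:Gamma Ray]]
| [[Template:Infobox]]
| [[AC/DC]]
| [[Epsilon Eridani]]
| <s>[[Alpha Centauri]]</s>
"""

loose = NUKER_LOOSE.findall(SAMPLE)
strict = NUKER_STRICT.findall(SAMPLE)
print(loose)   # misses "Draft board" and "User story", keeps the Template row
print(strict)  # keeps the two mainspace titles, drops Template and slash rows

# Arithmetic from the last bullet: how many entries neither pattern explains.
kept, nuked, quoted_total = 587, 2785, 3602
unaccounted = quoted_total - (kept + nuked)
print(unaccounted)
```

Skipping every title with a slash also drops mainspace articles whose names legitimately contain one, so that part of the variant trades one kind of error for another.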
2785 non-User/non-Draft space articles to nuke from Pages to review:

May 14 ver #780319854

Malformed strikeout test

As of version 780319854 of WP:CXT/PTR (09:02, 14 May 2017) I count 43 malformed strikeouts, with an ending </s>-tag not immediately following the double close-brackets ( ]] ) of the linked article title.

These are items matching ^\|\s*<s>\[\[([^]]+)\]\](?!</s>) :
(the pattern ^\|\s*<s>\[\[([^]]+)\]\](?!\s*</s>) , which tolerates whitespace before the closing tag, would be more robust but wasn't used for this try):
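How a negative lookahead after `]]` behaves, and where an optional `\s*` has to sit, can be sketched as follows. This assumes Python's `re` module and made-up rows; the constant names are illustrative. When `\s*` is placed before the lookahead, the engine can backtrack to zero whitespace and the lookahead then succeeds on the space itself, so the sketch also shows a form with `\s*` inside the lookahead.

```python
import re

# Pattern used for the count: </s> must follow ]] immediately.
MALFORMED = re.compile(r"^\|\s*<s>\[\[([^]]+)\]\](?!</s>)", re.MULTILINE)

# \s* placed before the lookahead: backtracking makes it match the same rows.
WS_OUTSIDE = re.compile(r"^\|\s*<s>\[\[([^]]+)\]\]\s*(?!</s>)", re.MULTILINE)

# \s* placed inside the lookahead: whitespace before </s> is tolerated.
WS_INSIDE = re.compile(r"^\|\s*<s>\[\[([^]]+)\]\](?!\s*</s>)", re.MULTILINE)

# Hypothetical sample rows.
SAMPLE = """\
| <s>[[Alpha Centauri]]</s>
| <s>[[Beta Pictoris]] </s>
| <s>[[Gamma Ray]] (see talk)</s>
| <s>[[Delta Works]]
| [[Epsilon Eridani]]
"""

print(MALFORMED.findall(SAMPLE))
print(WS_OUTSIDE.findall(SAMPLE))
print(WS_INSIDE.findall(SAMPLE))
```

Only the last form stops flagging the row where a single space separates `]]` from `</s>`.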

@Mathglot: These should all be fixed now. Tazerdadog (talk) 10:35, 14 May 2017 (UTC)