Wikipedia:Bot owners' noticeboard Information & Wikipedia:Bot owners' noticeboard Links at HealthHaven.com
Shortcuts:
WP:BOWN
WP:BONB
WP:BON

This is a message board for coordinating and discussing bot-related issues on Wikipedia (also including other programs interacting with the MediaWiki software). Although its target audience is bot owners, any user is welcome to leave a message or join the discussion here.

This is not the place for requests for bot approvals or requesting that tasks be done by a bot. It is also not the place for general questions about the MediaWiki software (such as the use of templates), which generally have the best chance of being answered at WP:VPT.





Compliance problems?

I recently used the C# regex code from Template:Bots, but noticed that it returned TRUE for a page that contained {{bots|allow=SineBot}} (my bot is ChzzBot).

I was doing a little independent check, and skipping pages with "{{bot" or "{{nobot" entirely in any case, to process those by hand later.

If my understanding of the compliance is correct, this should actually have been FALSE, ie the bot was not specifically permitted.

I'm not good at regex, hence I recoded it in what might be a bit of a long-winded fashion, which (for a month) you could check in this pastebin.

If someone thinks it's worth changing to that code on the page, suitably tidied up, then feel free.

Cheers,  Chzz  ►  03:32, 16 October 2009 (UTC)

Yes, only the Perl code on that page really handles all the nuances of the definition. I note that your pastebin code has a few issues:
  • It looks for {{nobots|deny=...}} rather than {{bots|deny=...}}
  • It will permit BarBot on {{bots|allow=FooBarBot}} or {{bots|allow=Bar Bot}}.
  • It doesn't implement optout at all.
  • I think it will return "not permitted" for {{bots}} and permitted for {{nobots}}, which is the opposite of the correct behavior.
  • It will also have trouble if the template contains linebreaks in the allow or deny list for some reason, since you forgot the RegexOptions.Singleline to allow ".*" to match newlines.
Also, BTW, the templates are {{bots}} and {{nobots}}; {{bot}} is something different, and {{nobot}} intentionally doesn't exist (because it existing would fool users into thinking it actually worked). Anomie 04:08, 16 October 2009 (UTC)
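For comparison, here is a minimal Python sketch of a check that avoids the pitfalls in Anomie's list: it matches {{bots}} rather than {{nobots}} for parameters, compares whole names so BarBot is not caught by FooBarBot or "Bar Bot", implements optout, and lets the parameter list span line breaks. It assumes the semantics discussed in this thread: {{nobots}} blocks every bot, allow= excludes bots it does not name (with "all"/"none" keywords), deny= excludes the bots it names, and optout= excludes bots leaving the listed message types. The function name `bot_may_edit`, the `message_types` parameter, and the case-insensitive name comparison are assumptions for illustration, not the Template:Bots reference code.

```python
import re

# {{bots}} or {{nobots}}, optionally followed by "|params". re.DOTALL lets
# the parameter list span line breaks; IGNORECASE is a simplification.
TEMPLATE_RE = re.compile(r"\{\{\s*(bots|nobots)\s*(?:\|(.*?))?\}\}",
                         re.DOTALL | re.IGNORECASE)


def _norm(name):
    """Normalise a user name: trim, treat "_" as " ", compare case-insensitively."""
    return name.strip().replace("_", " ").lower()


def _names(value):
    """Split a comma-separated list into whole names, so BarBot != FooBarBot."""
    return {_norm(n) for n in value.split(",") if n.strip()}


def bot_may_edit(text, bot, message_types=()):
    """Return True if `bot` may edit wikitext `text` (assumed semantics)."""
    bot = _norm(bot)
    wanted = {m.lower() for m in message_types}
    for kind, params in TEMPLATE_RE.findall(text):
        if kind.lower() == "nobots":
            return False  # this sketch treats any {{nobots}} as "no bots"
        for param in params.split("|"):
            key, sep, value = param.partition("=")
            if not sep:
                continue  # plain {{bots}} (or an unnamed parameter) does nothing
            key = key.strip().lower()
            names = _names(value)
            if key == "allow" and "all" not in names and bot not in names:
                return False  # allow=none, or an allow list that omits this bot
            if key == "deny" and ("all" in names or bot in names):
                return False
            if key == "optout" and (names & wanted or ("all" in names and wanted)):
                return False  # a message type this bot leaves is opted out
    return True
```

Under these assumptions, `bot_may_edit("{{bots|allow=SineBot}}", "ChzzBot")` is False, which is what Chzz expected.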
I don't claim to be amazing at regular expressions (only learnt it recently). But I had a go making a C# RegEx for this. So how's this code:
 bool DoesAllowBots(string botusername, string pagetext)
 {
     if (Regex.IsMatch(pagetext, "\\{\\{(\\s|)(bots|nobots)(\\s|)(\\|((.)+|)(allow)(\\s|)(=)((.)+|)(" + botusername + "))"))
         return true;
     return !Regex.IsMatch(pagetext, "\\{\\{(\\s|)(bots|nobots)(\\s|)(\\|((.)+|)((optout|deny)(\\s|)(=)(\\s|)(all)|(optout|deny)(\\s|)(=)((.)+|)(" + botusername + ")|(allow(\\s|)=(\\s|)none))|\\}\\})");
 }
It works okay for me, but please point out any errors :). Anybody mind if I replace the C# code on Template:Bots with this? - Kingpin13 (talk) 09:36, 16 October 2009 (UTC)
Better to use the code from AWB: Rjwilmsi 11:35, 16 October 2009 (UTC)
 /// <summary>
 /// checks if a user is allowed to edit this article
 /// using bots and nobots tags
 /// </summary>
 /// <param name="articleText">The wiki text of the article.</param>
 /// <param name="user">Name of this user</param>
 /// <returns>true if you can edit, false otherwise</returns>
 public static bool CheckNoBots(string articleText, string user)
 {
     return
         !Regex.IsMatch(articleText,
                      @"\{\{(nobots|bots\|(allow=none|deny=(?!none).*(" + user.Normalize() +
                      @"|awb|all)|optout=all))\}\}", RegexOptions.IgnoreCase);
 }
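To see why the AWB check lets ChzzBot through a page with {{bots|allow=SineBot}}, and why its deny= test has the same substring weakness Anomie noted, here is a rough Python transliteration of the regex above. It differs from the C# in two places, both assumptions of this sketch: `unicodedata.normalize("NFC", ...)` stands in for .NET's `String.Normalize()` (whose default form is C), and the user name is passed through `re.escape`, which the original does not do. The function name `awb_check_nobots` is hypothetical.

```python
import re
import unicodedata


def awb_check_nobots(article_text, user):
    """Rough transliteration of AWB's CheckNoBots regex; True means "may edit".

    Differences from the C#: the user name is escaped with re.escape
    (the original concatenates it raw), and NFC normalisation stands in
    for .NET's String.Normalize().
    """
    name = re.escape(unicodedata.normalize("NFC", user))
    pattern = (r"\{\{(nobots|bots\|(allow=none|deny=(?!none).*(" + name +
               r"|awb|all)|optout=all))\}\}")
    return not re.search(pattern, article_text, re.IGNORECASE)


# An allow list on its own never blocks anyone under this regex:
print(awb_check_nobots("{{bots|allow=SineBot}}", "ChzzBot"))  # True
# deny= is a substring test, so BarBot is caught by FooBarBot:
print(awb_check_nobots("{{bots|deny=FooBarBot}}", "BarBot"))  # False
```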
Maybe, but my code allows spaces in all the likely places, it allows new lines between the different bots which are disallowed, and it will also let the bot through if the page has {{bots|deny=all|allow=BotName}}. IMO, my code is better :) - Kingpin13 (talk) 12:58, 16 October 2009 (UTC)
  • While all this code talk is nice, getting back to the original post, simply " {{bots|allow=SineBot}} " is not enough to deny any bots anyway. Since all bots are allowed by default, explicitly allowing one with this template doesn't seem to opt out anything. –xenotalk 13:12, 16 October 2009 (UTC)
    Aye, but the reason that Chzz had this problem is because he used the code provided for C# on Template:Bots. Both my code and AWB's code would have allowed ChzzBot even if the page had " {{bots|allow=SineBot}} " on it. So what we're trying to do (I think) is replace the C# code at Template:Bots - Kingpin13 (talk) 13:16, 16 October 2009 (UTC)
    Pls forgive the boorish intrusion by someone who doesn't know what they're talking about =) Teach a non-coder to stick his nose in a coder's party ;p –xenotalk 13:18, 16 October 2009 (UTC)
    Sorry, still confused. By this statement "If my understanding of the compliance is correct, this should actually have been FALSE, ie the bot was not specifically permitted." Chzz seems to think his bot should not edit a page that says bots|allow=SineBot. Any bot should be able to edit that page, in fact, the null statement should be removed. Am I still missing the mark? –xenotalk 13:21, 16 October 2009 (UTC)
    That's interesting, according to the Template:Bots the bots not on the allowed list should be banned. I kinda thought it would depend on whether the bot was opt-in or opt-out. The equivalent of that in my code would be {{bots|deny=all|allow=<bots to allow>}}, which makes more sense to me, but I'd be happy to modify my code accordingly? - Kingpin13 (talk) 13:37, 16 October 2009 (UTC)
    I think it would be easier to make the documentation reflect actual practice of compliant bots rather than trying to swim upstream and ensure they're all updated to reflect this non-intuitive implementation... But again, just my amateur opinion =) –xenotalk 13:44, 16 October 2009 (UTC)
    I think it would be better to make the bots/nobots system less difficult to be compliant with. AFAICT, there are at least 4 different ways to prevent all bots from leaving messages on your talk page. Mr.Z-man 16:07, 16 October 2009 (UTC)
    Not sure if you are agreeing or disagreeing with me =) –xenotalk 17:54, 16 October 2009 (UTC)
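The disagreement above comes down to what a bare allow= list means. Here is a small sketch that puts the two readings side by side. Under the first, an allow list bans every bot it does not name, which is how Template:Bots is described here. Under the second, bots are allowed by default and "only these bots" has to be spelled {{bots|deny=all|allow=...}}, as Kingpin13 suggests. The function `permitted`, its dict-of-parameters input, and the `allow_is_exclusive` flag are hypothetical.

```python
def _names(value):
    """Comma-separated list -> set of whole, lowercased names."""
    return {n.strip().lower() for n in value.split(",") if n.strip()}


def permitted(params, bot, allow_is_exclusive):
    """Decide whether `bot` may edit, given a dict of {{bots}} parameters.

    allow_is_exclusive=True:  an allow list bans every bot it does not name.
    allow_is_exclusive=False: bots are allowed by default; allow= only
                              carves exceptions out of deny=all.
    """
    bot = bot.strip().lower()
    allow = _names(params.get("allow", ""))
    deny = _names(params.get("deny", ""))
    if bot in allow:
        return True
    if "none" in allow:
        return False
    if allow_is_exclusive and allow and "all" not in allow:
        return False  # named on no allow list, so excluded under this reading
    return not ("all" in deny or bot in deny)
```

With `{"allow": "SineBot"}`, ChzzBot is excluded under the first reading and permitted under the second, which is exactly the gap between Chzz's expectation and what the bots actually do.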

Time to redesign bot exclusion?

As is clear above, the current method of bot exclusion is complicated and difficult to be compliant with. Perhaps it's time to redesign the system? So far, it seems the goals are:

  1. Only one way to say "These bots may not edit"
  2. Only one way to say "Only these bots may edit"
  3. Only one way to say "No bots may edit"
  4. Only one way to say "I do not want these types of messages from any bot"
  5. It would be nice if a bot could detect this without having to load the entire page contents.

One possibility:

  • {{nobots}} works as it does now: no bot may edit the page.
  • {{bots}} also works as it does now: it does absolutely nothing.
  • {{nobots|except=FooBot/BarBot}} says "Only FooBot and BarBot may edit". I chose "/" instead of "," because it's possible that a bot's name contains a comma.
  • {{bots|except=FooBot/BarBot}} says "FooBot and BarBot may not edit".
  • {{bots|except=#nosource/#nolicense}} says "No message-leaving bot may leave a 'nosource' or 'nolicense' message". We could do similarly for "#AWB", "#interwiki", and other classes of bots if we want. Individual bots could also recognize "FooBot#bar" to allow exception of only the "bar" feature. Again, "#" is chosen as no bot name may contain that character.
  • The optional "except" must be the only parameter. If any other parameter is present, the bot is free to ignore that instance of the template.
  • If more than one {{nobots|except=...}} and/or {{bots|except=...}} are used on a page, the bot must be listed as an exception in all "nobots" instances and not be listed as an exception in any "bots" instance to be allowed to edit.
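To make the proposed except= syntax concrete, here is a sketch that splits one except value into the three kinds of entry described above: plain bot names, "#class" entries such as "#nosource", and "FooBot#bar" bot-plus-feature pairs. Because bot names cannot contain "#", the first "#" in an entry is unambiguous. The function name `parse_except` and its tuple-of-sets return value are hypothetical; the proposal itself does not define them.

```python
def parse_except(value):
    """Split an except= value on "/" into (bots, classes, features).

    bots:     {"FooBot", ...}           plain bot names
    classes:  {"nosource", ...}         entries written as "#nosource"
    features: {("FooBot", "bar"), ...}  entries written as "FooBot#bar"
    """
    bots, classes, features = set(), set(), set()
    for entry in value.split("/"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate "//" or a trailing "/"
        name, sep, tag = entry.partition("#")
        if not sep:
            bots.add(name)
        elif not name:
            classes.add(tag.strip())
        else:
            features.add((name.strip(), tag.strip()))
    return bots, classes, features
```

For example, `parse_except(" FooBot#bar / BazBot ")` gives `({"BazBot"}, set(), {("FooBot", "bar")})`.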

The detector for this scheme is pretty simple:

 function allowedToEdit($text, $botnames){
     if(preg_match('!{{\s*nobots\s*}}!', $text)) return false;
     if(!preg_match_all('!{{\s*(bots|nobots)\s*\|\s*except\s*=(.*?)}}!s', $text, $m)) return true;
     $re='!/\s*(?:'.implode('|', array_map('preg_quote', $botnames)).')\s*/!';
     for($i=0; $i<count($m[1]); $i++){
         $found = preg_match($re, '/'.$m[2][$i].'/');
         if($found && $m[1][$i] == 'bots') return false;
         if(!$found && $m[1][$i] == 'nobots') return false;
     }
     return true;
 }
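For bot operators not working in PHP, here is a rough Python port of the detector above, kept structurally close to the original. `re.DOTALL` plays the role of the PHP `s` modifier, and `re.escape` that of `preg_quote`. Wrapping the except list in "/" before searching is what makes the match whole-entry, so BarBot is not found in FooBarBot. Passing a class such as "#nosource" in `botnames` alongside the bot's own name is one way to honor the "#" classes. That usage is an assumption; the proposal does not say how classes should be passed.

```python
import re

# A bare {{nobots}} with no parameters.
NOBOTS_RE = re.compile(r"\{\{\s*nobots\s*\}\}")
# {{bots|except=...}} or {{nobots|except=...}}; DOTALL mirrors PHP's "s" flag.
EXCEPT_RE = re.compile(r"\{\{\s*(bots|nobots)\s*\|\s*except\s*=(.*?)\}\}", re.DOTALL)


def allowed_to_edit(text, botnames):
    """Python port of the proposed allowedToEdit() detector."""
    if NOBOTS_RE.search(text):
        return False  # a bare {{nobots}} blocks every bot
    matches = EXCEPT_RE.findall(text)
    if not matches:
        return True
    # Each name must appear as a whole "/"-delimited entry.
    names_re = re.compile(
        r"/\s*(?:" + "|".join(re.escape(n) for n in botnames) + r")\s*/")
    for kind, except_list in matches:
        found = names_re.search("/" + except_list + "/") is not None
        if found and kind == "bots":
            return False  # listed as an exception to {{bots}}: excluded
        if not found and kind == "nobots":
            return False  # not an exception to {{nobots}}: excluded
    return True
```

As in the PHP, every instance on the page is checked, so the bot must pass all of them to be allowed to edit.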

The major drawback is that this scheme has no provision for bots trying to honor the existing syntax; they'll probably either ignore the new syntax completely or treat the nobots version as {{nobots}}.

Perhaps the best way to do #5 is to just declare that these templates must be placed in section 0, and then bots need only fetch section 0 to perform the check (possibly like this). Anomie 20:46, 16 October 2009 (UTC)
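One way a bot could fetch only section 0 is through the `rvsection` parameter of the MediaWiki API's `prop=revisions` module. The sketch below only builds the request URL and does no network I/O. The English Wikipedia endpoint and the helper name `section0_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def section0_url(title):
    """Build an API query for the wikitext of section 0 of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvsection": "0",  # only the lead section, before the first heading
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

The wikitext returned for that request could then be passed to whichever bots/nobots check the bot uses, so only the lead section is downloaded.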

(Aside) I await clarification/new code; for now, fortunately my bot edits are trivial, so I just deal with any page containing "{{bot*" or "{{nobot*" manually.  Chzz  ►  13:24, 19 October 2009 (UTC)

*uber poke*

Could a BAGger please take a look at this? It's been open since September and has been tagged for BAG attention since October. --Chris 08:53, 8 November 2009 (UTC)

I would have, but I leave approval of adminbots to the admin BAGgers. Anomie 04:02, 9 November 2009 (UTC)
I wasn't looking at you :) The fact that you did code review and made suggestions could (theoretically) mean people could accuse you of having a COI when approving. --Chris 09:02, 9 November 2009 (UTC)


