Please remember that all editors are encouraged to participate in the requests listed below. Just chip in: your comments are appreciated more than you may think!

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you rather than running your own.

Instructions for bot operators

I. Before applying for approval
- You will need to create an account for your bot if you haven't already done so. Click here when logged in to create the account, linking it to yours. (If you do not create the bot account while logged in, it is likely to be blocked as a possible sockpuppet or unauthorised bot until you verify ownership.)
- Ensure that you have read the bot policy, that your bot complies with it, and that your idea isn't listed as a frequently denied bot.
- It is also generally a good idea to create a userpage for your bot, linking to your userpage, describing its functions and including an 'emergency shutoff button' (template here) just in case anything goes wrong.
- If your task could be controversial (e.g. most bots making non-maintenance edits to articles and most bots posting messages on user talk pages), seek consensus for the task in the appropriate fora. Common places to start include WP:Village pump (proposals) and the talk pages of the relevant policies, guidelines, templates, and/or WikiProjects. Link to this discussion from your request for approval.
II. Filing the application
- Replace BotName with your bot's user name in the box below and click the button. If this is a request for an additional task, put a task number as well (e.g. BotName 2).
- Complete the questions on the resulting page and save it.
- Your request must now be added to the correct section of the main approvals page:
- For new bots: Click here and add {{BRFA|YOUR BOT ACCOUNT NAME GOES HERE||Open}} to the top of the list, directly below the comment line.
- For new tasks for existing bots: Click here and add {{BRFA|YOUR BOT ACCOUNT NAME GOES HERE|TASK NUMBER GOES HERE|Open}} to the top of the list, directly below the comment line.

III. During the approvals process
- After a reasonable amount of time has passed for community input, an approvals group member may approve a trial for your bot and move the request to this section.
- Run the bot for the specified number of edits/time period, then add {{BotTrialComplete}} to the request page and move the request to the 'trial complete' section by moving the {{BRFA}} template that applies to your bot (it helps if you also link to the bot's contributions, and comment on any errors that may have occurred).
- If you feel that your request is being overlooked (no BAG attention for ~1 week) you can add {{BAGAssistanceNeeded}} to the page. However, please do not use it after every comment!
- At any time during the approvals process, you may withdraw your request by adding {{BotWithdrawn}} to your bot's approval page.
IV. After the approvals process
- After the trial edits have been reviewed and enough time has passed for any more discussion, a BAG member will approve or deny the request appropriately.
- For approved requests: The request will be listed here. If necessary, a bureaucrat will flag the bot within a couple of days and you can then run the task fully (it's best to wait for the flag, to avoid cluttering recent changes). If the bot already has a flag, or is to run without one, you may start the task when ready.
- For denied/expired/withdrawn requests: The request will be listed at the bottom of the main BRFA page in the relevant section.
- It is also good practice to list your bot on the bot status page to let other Wikipedians know about your bot and what it is doing.
Instructions for approvals group members
- Approving a trial
- Before granting a trial, consider whether the task could be controversial (e.g. most bots making non-maintenance edits to articles and most bots posting messages on user talk pages). If so, and the request does not already link to a discussion showing consensus in an appropriate forum (or silence after a reasonable waiting period), use {{BOTREQ|advertise}} to request that this be done.
- When you are satisfied that enough time has passed for discussion and (if relevant) that any technical issues have been resolved, use {{BotTrial}} to approve a trial run.
- You should then move the request from the 'open' section to the trial section, and also change the last parameter of {{BRFA}} from 'Open' to 'Trial'.
- Approving/denying a request
- When you feel enough time has passed after the trial for discussion/analysis/improvements, and are ready to approve/deny the request (the procedure is the same for expired/withdrawn requests), add {{subst:BT|STATUS|BOT_NAME_AND_TASK_NUMBER}} (where STATUS is Approved, Speedy, Denied, Withdrawn or Expired) to the top of the request page, replacing Category:Wikipedia bot requests for approval and the code surrounding it.
- Make any final comments along with a relevant status template (listed here), and add {{subst:BB}} to the bottom of the page.
- Archival
- After closing the BRFA, you'll need to remove the BRFA template from whichever section it is in on the main requests page (tip: copy the template to the clipboard for use in the next step) and then archive it in the appropriate section:
- For approved requests where a bot flag is required: Click here, and add {{BRFA|BOT_NAME|TASK_NUMBER (if needed)|Approved|~~~~~}} to the top of the list.
- For approved requests where the bot is already flagged: Click the above link and add {{subst:BRFAA|BOT_NAME|TASK_NUMBER (if needed)|Flagged|~~~~~}}.
- For approved requests where the bot is going to run unflagged: Click the above link and add {{subst:BRFAA|BOT_NAME|TASK_NUMBER (if needed)|Unflagged|~~~~~}}.
- For denied requests: Click here and add {{BRFA|BOT_NAME|TASK_NUMBER (if needed)|Denied|~~~~~}} to the top of the list.
- For expired/withdrawn requests: Click here and add {{BRFA|BOT_NAME|TASK_NUMBER (if needed)|Withdrawn/Expired|~~~~~}} to the top of the list.
| Bot name | Status | Last editor | Last edit (UTC) | Last BAG editor | Last BAG edit (UTC) |
|---|---|---|---|---|---|
| ActiveAdminBot | Open | Chillum | 2009-11-28, 23:34:41 | Never edited by BAG | n/a |
| RM bot | Open | Harej | 2009-11-25, 21:17:27 | Never edited by BAG | n/a |
| Alph Bot | Open | Alchimista | 2009-11-18, 23:28:58 | Never edited by BAG | n/a |
| Ripchip Bot | Open | IP69.226.103.13 | 2009-11-21, 08:53:42 | Kingpin13 | 2009-11-19, 13:18:48 |
| MichaelkourlasBot | Open | Michaelkourlas | 2009-11-24, 03:04:15 | Kingpin13 | 2009-11-23, 14:19:21 |
| SDPatrolBot II | Open | ThaddeusB | 2009-11-20, 01:16:06 | Kingpin13 | 2009-11-19, 20:02:00 |
| RjwilmsiBot | Open | IP69.226.103.13 | 2009-11-26, 04:42:28 | Anomie | 2009-11-25, 22:52:52 |
| Coreva-Bot 2 | Trial complete | IP69.226.103.13 | 2009-11-19, 11:30:48 | Anomie | 2009-10-30, 22:19:18 |
| ContentCreationBOT | Open | Cybercobra | 2009-11-16, 02:42:00 | Mr.Z-man | 2009-09-24, 01:59:20 |
| EmBOTellado | Open | Ezarate | 2009-11-19, 22:25:47 | Kingpin13 | 2009-11-19, 20:23:12 |
| SmackBot XXIII | Open: BAG assistance requested! | Rich Farmbrough | 2009-11-28, 10:25:23 | Never edited by BAG | n/a |
| SmackBot XXII | Open: BAG assistance requested! | Rich Farmbrough | 2009-11-28, 10:25:41 | Never edited by BAG | n/a |
| AnomieBOT 36 | Open | IP69.226.103.13 | 2009-11-27, 22:38:29 | Anomie | 2009-11-15, 00:15:06 |
| SmackBot XXI | Open | IP69.226.103.13 | 2009-11-28, 20:26:41 | Never edited by BAG | n/a |
| Chris G Bot 2 2 | Open | Fritzpoll | 2009-10-29, 16:13:41 | Fritzpoll | 2009-10-29, 16:13:41 |
| Andrea105Bot | In trial | IP69.226.103.13 | 2009-11-25, 05:33:05 | Kingpin13 | 2009-11-23, 10:16:22 |
| Orphaned image deletion bot | In trial | Chris G | 2009-11-21, 09:29:20 | Anomie | 2009-11-14, 00:00:12 |
| UltraBot | In trial | Junaidpv | 2009-11-09, 05:52:01 | Mr.Z-man | 2009-10-26, 22:06:01 |
| AnomieBOT 33 | Trial complete | Nmajdan | 2009-11-09, 21:12:52 | Anomie | 2009-11-09, 20:35:08 |
| AnomieBOT 35 | Trial complete | Reywas92 | 2009-11-09, 20:39:58 | Jarry1250 | 2009-11-09, 17:23:14 |
| SheepBot | Trial complete | Babylonian Armor | 2009-10-23, 01:41:41 | Anomie | 2009-10-21, 12:32:54 |
| SDPatrolBot 4 | Trial complete | Mr.Z-man | 2009-10-26, 22:13:21 | Mr.Z-man | 2009-10-26, 22:13:21 |

Current requests for approval

ActiveAdminBot
Operator: Chillum
Automatic or Manually assisted: Automatic
Programming language(s): Perl
Source code available: In progress, but I will certainly post it before going into production
Function overview: Maintains a list of admins who have recently edited.
Links to relevant discussions (where appropriate):
Edit period(s): Every 15 minutes
Estimated number of pages affected: 1
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N):
Function details: The bot reads the page http://en.wikipedia.org/w/index.php?title=Special:ListUsers/sysop&limit=5000 every 10 hours to keep a list of administrators.
It then watches the IRC feed to see the names of people who have recently performed some sort of on-wiki action. When it sees an administrator do something, it updates that admin's last edit time. A list of the 25 admins who have edited most recently is kept and updated on a dedicated page every 15 minutes. This would allow users to quickly find an administrator who is active. This bot would use the same IRC connection that HBC NameWatcherBot watches, so no additional load will be put on the channel. The 15-minute interval and the 25-admin cap are both up for debate, of course. I would like permission to perform a series of test edits to get the code in a reasonable state before I post it in full under the GFDL. Chillum 00:19, 27 November 2009 (UTC)
Discussion
- Note: I have created User:ActiveAdminBot/blacklist, which can be used to tell the bot not to include certain administrators in the list. This can be used to keep bots off the list, and any admin who does not wish to be included can also opt out there. Chillum 19:42, 27 November 2009 (UTC)
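The bookkeeping described in the request (remember each administrator's most recent action, skip names on the blacklist, publish the 25 most recent) could look roughly like the Python sketch below. It is an illustration only, not Chillum's Perl code: the class and method names are hypothetical, and reading the IRC feed and saving the page are left out.

```python
import time


class ActiveAdminTracker:
    """Hypothetical sketch: remember when each administrator last acted."""

    def __init__(self, admins, blacklist=(), cap=25):
        self.admins = set(admins)        # e.g. refreshed from Special:ListUsers/sysop
        self.blacklist = set(blacklist)  # e.g. names read from the opt-out page
        self.cap = cap                   # number of names to publish
        self.last_seen = {}              # admin name -> Unix timestamp of last action

    def record_action(self, user, timestamp=None):
        """Call once for every edit or logged action seen on the IRC feed."""
        if user in self.admins:
            self.last_seen[user] = time.time() if timestamp is None else timestamp

    def most_recent(self):
        """Return up to `cap` non-blacklisted admins, most recently active first."""
        visible = [(ts, name) for name, ts in self.last_seen.items()
                   if name not in self.blacklist]
        visible.sort(reverse=True)
        return [name for _, name in visible[:self.cap]]
```

Because only the latest timestamp per administrator is kept, memory grows with the number of administrators rather than with the volume of the feed.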
Using a wiki to store this information seems silly. A dynamic script on the Toolserver seems much smarter. The revision table is large enough. --MZMcBride (talk) 19:47, 27 November 2009 (UTC)
- I don't use the Toolserver; I have my own servers. I could run this as a web service, I suppose, but I don't think we are running out of revisions, and a web service would limit the integration into the watchlist and deprive us of a history. The rate of 25 names every 15 minutes is tiny compared to most bots' activity levels. The advantage of having it on-wiki is the historical record of activity. I can also put the top 3 most recently active admins in the edit summary so that people can see the information from their watchlists (example: "Posting active admins: User:admin1 - User:admin2 - User:admin3"). Chillum 20:06, 27 November 2009 (UTC)
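A helper for the edit-summary idea might look like this. The function name is hypothetical, and the 255-byte cap is an assumption based on the edit-summary limit MediaWiki enforced at the time; treat it as a parameter rather than a fixed rule.

```python
def edit_summary(active_admins, shown=3, max_bytes=255):
    """Build a summary naming the first `shown` admins (hypothetical helper)."""
    links = ["User:" + name for name in active_admins[:shown]]
    summary = "Posting active admins: " + " - ".join(links)
    # Truncate on a byte budget without leaving half of a multi-byte UTF-8 character.
    return summary.encode("utf-8")[:max_bytes].decode("utf-8", "ignore")
```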
Does it report sysops who have recently made logged admin actions too, or just actual edits? –Juliancolton | Talk 18:16, 28 November 2009 (UTC)
- Any action: either contributions or logged events such as admin actions or moves. I could alter this to be more discriminating if a good reason is given. One of my goals is to reduce the load at ANI by making direct communication with admins simpler. Chillum 20:24, 28 November 2009 (UTC)
- User:ActiveAdminBot/Raw output: an example of the bot's current raw information (the number on the left is a Unix epoch timestamp), and a draft of how it might be formatted. Chillum 21:41, 28 November 2009 (UTC)
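Turning a raw entry into readable wiki text could be done as below. The exact layout of the raw output page is not given here, so the sketch assumes one entry per line in the form "EPOCH NAME"; the function names and the output markup are hypothetical.

```python
from datetime import datetime, timezone


def parse_raw_line(line):
    """Split an assumed 'EPOCH NAME' line into (UTC datetime, name)."""
    epoch, name = line.split(None, 1)
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc), name.strip()


def format_entry(line):
    """Render one entry as a wiki list item with a signature-style UTC time."""
    when, name = parse_raw_line(line)
    stamp = f"{when:%H:%M}, {when.day} {when:%B %Y}"
    return f"* [[User:{name}|{name}]] (last active {stamp} (UTC))"
```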
- A rough estimate (assuming an average admin name length of 12 characters and showing 25 of them every 15 minutes) is that it would create a total of 199 kB of traffic per day. I don't think that is very much. Chillum 23:34, 28 November 2009 (UTC)
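For reference, the names alone under those assumptions come to 96 updates per day × 25 names × 12 bytes = 28,800 bytes (about 28 kB); anything else saved with each update (link markup, timestamps, the rest of the page) comes on top of that. The helper below just makes the arithmetic explicit; `overhead_per_name` is a hypothetical allowance for that extra text.

```python
def daily_bytes(name_length=12, names_per_update=25, interval_minutes=15,
                overhead_per_name=0):
    """Approximate bytes of list text written per day (rough estimate only)."""
    updates_per_day = (24 * 60) // interval_minutes  # 96 at a 15-minute interval
    return updates_per_day * names_per_update * (name_length + overhead_per_name)


print(daily_bytes())                      # 28800 (names only)
print(daily_bytes(overhead_per_name=20))  # 76800 (hypothetical 20 bytes of markup per name)
```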
RM bot
Operator: Harej
Automatic or Manually assisted: Automated
Programming language(s): PHP
Source code available: User:RFC bot/requestedmoves.php
Function overview: Maintains Wikipedia:Requested moves and related pages.
Links to relevant discussions (where appropriate): This bot has been operating under consensus via User:RFC bot since May; the original discussion is available in this archive.
Edit period(s): Every thirty minutes
Estimated number of pages affected: Wikipedia:Requested moves/current, Wikipedia:Requested moves/current-oldstyle, Wikipedia:Coordination/Requested moves, and talk pages involved in the process.
Exclusion compliant (Y/N): No, but that's an oversight on my part. By the time the process is migrated to this account, such functionality shall be added. @harej 19:58, 21 November 2009 (UTC) Update: Yes. @harej 21:17, 25 November 2009 (UTC)
Already has a bot flag (Y/N): N
Function details: This account will take over what User:RFC bot has been doing with WP:RM since May, which is primarily to update the list of requested move discussions (e.g. here) and to cross-notify talk pages that are involved in multi-move requests (e.g. here). @harej 19:58, 21 November 2009 (UTC)
Discussion
This process has had the consensus to operate, and has been operating successfully, for some time now. This BRFA is simply to shift the process to a different account. @harej 19:58, 21 November 2009 (UTC)
- RFC bot is yours also? --IP69.226.103.13 (talk) 07:11, 22 November 2009 (UTC)
- Yes. @harej 07:11, 22 November 2009 (UTC)
- Thanks. I don't see any possible issues in that case. If BAG members are concerned about anything the conversation is available. --IP69.226.103.13 (talk) 07:21, 22 November 2009 (UTC)
This can be speedy approved IMO. –Juliancolton | Talk 19:51, 22 November 2009 (UTC)
- It appears to me, also, to be an appropriate candidate for speedy approval, including a trial if needed. --IP69.226.103.13 (talk) 05:54, 23 November 2009 (UTC)
Alph Bot
Operator: Alchimista
Automatic or Manually assisted: Automatic
Programming language(s): Python
Source code available: Standard pywikipedia script
Function overview: Interwiki
Links to relevant discussions (where appropriate):
Edit period(s): Continuous
Estimated number of pages affected:
Exclusion compliant (Y/N):
Already has a bot flag (Y/N): No, request opened on pt.wikipedia
Function details: Add interwiki links. Alchimista (talk) 23:28, 18 November 2009 (UTC)
Discussion

Ripchip Bot
Operator: Beria
Automatic or Manually assisted: Automatic, supervised
Programming language(s): Python
Source code available: "Standard pywikipedia"
Function overview: Interwiki bot
Links to relevant discussions (where appropriate):
Edit period(s): Daily
Estimated number of pages affected:
Exclusion compliant (Y/N):
Already has a bot flag (Y/N): No, in request on pt.wikipedia
Function details: Add interwikis. Béria Lima Msg 19:59, 18 November 2009 (UTC)
Discussion
Hi there Beria. I see the bot has already made a small number of edits. While this is welcomed on some other wikis, here we prefer bots to make no edits until approved, or explicitly approved for a trial. Please make sure you read through WP:BOTPOL and that you understand it. If you need help understanding anything, feel free to ask me :). If you are just using Python, I think it should be exclusion compliant by default. I'm assuming you haven't changed this setting? I see you reverted this bot's edit to Do You Know (Jessica Simpson album); it's good that you are keeping track of the bot's edits. But can you explain why the bot made this edit in the first place? - Kingpin13 (talk) 13:18, 19 November 2009 (UTC)
- I stopped the bot. Sorry, I assumed it was like all the other wikis, where I have to make a few test edits.
- And no, I haven't made any changes to pywikipedia. I'm really using the standard version.
- I reverted the bot's edit because I gave it the wrong command (it asked me whether I wanted to make that change, and I said yes when I should have said no). Béria Lima Msg 23:52, 19 November 2009 (UTC)
- So the bot will be run supervised all of the time? Yeah, on en.wiki the bot policy is that you get permission before a trial. You only made a few edits, then stopped, so that shouldn't be an issue.
- Operator communicating, willing to learn. I don't have any special concerns about pywikipedia bots, as it seems BAG members watch out for them fairly well and know the ins and outs. --IP69.226.103.13 (talk) 08:53, 21 November 2009 (UTC)
MichaelkourlasBot
Current version: v. 1.4.1.0 (Nov. 22/09)
Operator: Michaelkourlas
Automatic or Manually assisted: Almost entirely automatic. The only manual action needed is to start the process.
Programming language(s): Visual Basic .NET 2008 Express Edition, DotNetWikiBot Framework
Source code available: Yes. See here.
Function overview: Tags new empty pages (or pages that contain only whitespace: spaces, tabs and line breaks) in the article namespace with db-a3 or db-blanked
Links to relevant discussions (where appropriate): [1]
Edit period(s): Daily, likely for a few minutes to an hour (I have school and homework, so I'll only be able to do it every so often)
Estimated number of pages affected: About 30 sec per page, but the bot checks that the last revision was at least 5 min ago (the time length can be changed) to allow new page patrollers to mark pages first if they merit a different CSD template, and to allow the article owner to place content on the page. The actual number of pages edited depends on the number of new empty pages that have not been marked with a CSD template, and on how many pages the bot is told to check.
Exclusion compliant (Y/N): Yes
Already has a bot flag (Y/N): No
Function details: Tags new empty pages (or pages that contain only whitespace: spaces, tabs and line breaks) from Special:NewPages (not patrolled) in the article namespace that have only one author, using db-a3 (if there is one edit) or db-blanked (if there is more than one edit), and also warns the contributor of the tagging (if db-a3).
Discussion
Hey there Michaelkourlas, and thanks for offering to run a bot :). For this task, I feel there would need to be a delay, as we don't want the bot marking pages half a second after they are created when the creator plans to add to them. We should also give any active new page patrollers time to see the article in case the title merits a different CSD (e.g. CSD G10), since both title and content can be taken into consideration when tagging for speedy deletion. Also, do you think you could start a thread at a relevant talk page (e.g. WT:CSD) asking for input? Thanks, - Kingpin13 (talk) 10:48, 18 November 2009 (UTC)
- You bring up some very interesting issues. The first issue you brought up (about the creator needing time to add content) can be addressed using the 'Timer' control in VB.NET. The second issue could also be addressed using the timer. From the sounds of it, I will need to make a delay of about 5-10 minutes per page. This should give new page patrollers the opportunity to mark pages first if they merit a different CSD template, and allow the article owner to place content on the page. I'll also create a discussion on WT:CSD, linking to this discussion. Thanks for your comments. --Michael Kourlas (talk) 22:25, 18 November 2009 (UTC)
- Source code updated to reflect 5 min wait. --Michael Kourlas (talk) 23:17, 18 November 2009 (UTC)
- I do not use .NET, but aren't you loading the new page, then doing a hard loop for 5 minutes, then checking whether the page is empty? What if the page changed in that period? I think Kingpin meant that a bot should ensure that a decent period (say 10 minutes) has elapsed since the last edit on the new page (i.e. from the timestamp). I did not study the code, but I think you are missing an "==" and a newline to finish the header on the user page. Also, I wonder how "IsEmpty" works. Would a single space character be regarded as empty? Johnuniq (talk) 00:48, 19 November 2009 (UTC)
- Code changed to check whether the last revision was at least 5 min ago. MAJOR UPDATE: Code also changed to check whether the page contains only whitespace. Code also changed to make the user notice make sense (i.e. added "==", etc.). Thanks for finding the bugs! --Michael Kourlas (talk) 01:24, 19 November 2009 (UTC)
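The two checks discussed above (the page holds nothing but whitespace, and its last revision is at least five minutes old) might be expressed as follows in Python. This is not the bot's VB.NET code; the function names are hypothetical, and timestamps are assumed to be timezone-aware UTC values taken from the page history.

```python
from datetime import datetime, timedelta, timezone

BLANK_CHARS = " \t\r\n"  # spaces, tabs and line breaks, as listed in the request


def is_blank(text):
    """True if the text is empty or made only of spaces, tabs and line breaks."""
    return text.strip(BLANK_CHARS) == ""


def is_candidate(text, last_revision, now=None, delay=timedelta(minutes=5)):
    """Blank page whose most recent revision is at least `delay` old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return is_blank(text) and now - last_revision >= delay
```

Measuring the delay from the last revision's timestamp, rather than waiting after the page has been loaded, means an edit made during the wait is picked up on the next check. A single space counts as blank here.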
More updates (bug fixes and such). --Michael Kourlas (talk) 23:13, 19 November 2009 (UTC)
The db-author tag should only be added by the author of the page. Tagging main namespace articles with A3 seems OK. For other namespaces, I think it's better to manually review empty pages from a database report, and the bot shouldn't be approved to tag them for deletion. — Carl (CBM · talk) 23:21, 19 November 2009 (UTC)
- Isn't db-author added when the user explicitly blanks the page, and thus wants it deleted? Scratch that. I'll change it to db-blanked. As for the other namespaces issue, the bot only looks at articles in the main namespace (the article namespace) in the first place. --Michael Kourlas (talk) 23:53, 19 November 2009 (UTC)
- You should probably change the bot description above to point out that it only runs in mainspace, since as written the description applies to all namespaces.
- Since your bot is only looking at new pages, if a user blanks a preexisting page, the bot will not notice it anyway. I think db-a3 (no content) is the clearest reason to delete a new, empty article. — Carl (CBM · talk) 23:57, 19 November 2009 (UTC)
- I'll change the description, but I don't really follow your second comment.--Michael Kourlas (talk) 00:15, 20 November 2009 (UTC)
- I mean that although users do sometimes blank existing articles to indicate the article should be deleted, your bot is not looking at long-existing articles. So rather than interpreting a new blank page as a request to delete the existing page (db-blanked), it makes more sense to treat a new empty page as simply not having any content (db-a3). — Carl (CBM · talk) 00:15, 20 November 2009 (UTC)
- That makes sense. OK, I'll change it. By the way, can people opt out of user warnings for speedy deletion templates through exclusion compliance? --Michael Kourlas (talk) 01:38, 20 November 2009 (UTC)
- Do you mean via Template:nobots? You could see if there is a type of exclusion there that is close enough, maybe the prod or afd exclusion. — Carl (CBM · talk) 01:48, 20 November 2009 (UTC)
- Well, I don't think you can pick an exclusion that's "close enough". I think that there has to be an exclusion that expressly says "deletion templates" or "speedy deletion templates". I don't see anything like this on Template:nobots.--Michael Kourlas (talk) 15:30, 21 November 2009 (UTC)
- I think you're right. At worst, you can just check for {{nobots}} and {{bots|deny=MichaelkourlasBot}}, if you want. — Carl (CBM · talk) 18:24, 21 November 2009 (UTC)
- Bot now exclusion compliant. --Michael Kourlas (talk) 21:13, 21 November 2009 (UTC)
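A simplified version of the check Carl describes is sketched below in Python. It only understands {{nobots}} and single-line {{bots|allow=...}} / {{bots|deny=...}} calls with comma-separated names (plus the keywords "all" and "none"); the full Template:Bots conventions cover more cases, and the function name is hypothetical.

```python
import re

NOBOTS = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
BOTS = re.compile(r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}]*)\}\}", re.IGNORECASE)


def allowed_to_edit(text, bot_name):
    """Return False if the page text opts out of edits by `bot_name`."""
    if NOBOTS.search(text):
        return False
    for kind, names in BOTS.findall(text):
        listed = {n.strip().lower() for n in names.split(",")}
        named = bot_name.lower() in listed or "all" in listed
        if kind.lower() == "deny" and named:
            return False  # this bot, or all bots, explicitly denied
        if kind.lower() == "allow" and not named:
            return False  # an allow list (including allow=none) without this bot
    return True
```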
- The bot should not be marking pages with good content in the history as CSD A3. As the bot was before, it would check the history, and if the author had blanked the page and was the only contributor to it, the bot would mark it as CSD G7. While I don't mind if this is taken out, I do mind if the bot is marking pages with a good history as A3. - Kingpin13 (talk) 10:22, 23 November 2009 (UTC)
- I thought that might be a problem, so I kept the original code. I will revert it when I have time. --Michael Kourlas (talk) 13:47, 23 November 2009 (UTC)
- Great. @Carl, You do realise that the bot will only mark as CSD G7 if the page was blanked by the creator, not if it is created with no content? - Kingpin13 (talk) 14:19, 23 November 2009 (UTC)
- The bot description says "Tags new empty pages", so I assumed that the page had to be created and then very quickly blanked for the bot to pick it up. Certainly a page that has been around for a few days and is then blanked is not a "new" page, and so the bot will not be looking at those at all, according to its description. I do not think it would be reasonable to automatically tag blanked pages for deletion if they have been around for some time. The pages the bot is looking at are thus very unlikely to have much good content in their (short) history. — Carl (CBM · talk) 22:48, 23 November 2009 (UTC)
- Original code restored... Carl, there may be a chance that the page has some sort of content from the beginning, and thus a blanking would make it eligible for deletion, not under A3 but under blanked (G7). (This would not be the case if there was only one edit, but the bot takes that into consideration). What Kingpin13 says makes sense, so I have reverted the code. --Michael Kourlas (talk) 23:41, 23 November 2009 (UTC)
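The tagging rule the thread settles on (a page created empty by a single author gets db-a3; a page its only author later blanked gets db-blanked, i.e. G7) can be summarised as a small decision function. It is a hypothetical Python rendering of the rule as described here, not the bot's source, and it leaves any page with more than one contributor for a human.

```python
def choose_tag(revision_authors, page_text):
    """Pick a speedy-deletion template for an unpatrolled new page.

    revision_authors: user name of each revision, oldest first.
    Returns the template name, or None to leave the page alone.
    """
    if page_text.strip(" \t\r\n"):
        return None          # the page has content: not this bot's job
    if len(set(revision_authors)) != 1:
        return None          # no history or several contributors: leave for a human
    if len(revision_authors) == 1:
        return "db-a3"       # created empty: no content (A3)
    return "db-blanked"      # the sole author blanked it later (G7)
```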
How does the bot decide which pages are "new"? — Carl (CBM · talk) 23:42, 23 November 2009 (UTC)
- It looks at Special:NewPages with the not-patrolled filter on (i.e. it does NOT look at patrolled pages). --Michael Kourlas (talk) 03:04, 24 November 2009 (UTC)
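The bot reads Special:NewPages through DotNetWikiBot. For comparison only, a similar list can be requested from the MediaWiki API with list=recentchanges, restricted to page creations (rctype=new) in the article namespace that are not yet patrolled. The Python sketch below only builds the request URL; the API may refuse patrol-status filtering to accounts without patrol rights, and the function name is hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def unpatrolled_new_pages_url(limit=50):
    """Build an API query for recent unpatrolled page creations in mainspace."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",           # page creations only
        "rcnamespace": 0,          # article namespace
        "rcshow": "!patrolled",    # not yet patrolled
        "rcprop": "title|timestamp|user",
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```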
SDPatrolBot II
Operator: Written by Kingpin13, will be run by Sodam Yat to start off with
Automatic or Manually assisted: Automatic
Programming language(s): C# using DotNetWikiBot
Source code available: No
Function overview: Will take over from User:CSDWarnBot
Links to relevant discussions (where appropriate): recent discussion at WT:CSD / Discussion which led to CSDWarnBot's block
Edit period(s): Continuous
Estimated number of pages affected: 50+ per day, according to CSDWarnBot's edits
Exclusion compliant (Y/N): Yes
Already has a bot flag (Y/N): N
Function details: CSDWarnBot was stopped because it wasn't waiting before notifying users; this bot will wait 15 minutes (this time can very easily be changed if wanted). Basically, for each page tagged under CSD, the bot will check whether the creator of the page was warned; if not, the bot will warn the creator.
Discussion
The main problem is which template the bot should use to notify the user. I'll have a look into creating one :). - Kingpin13 (talk) 19:06, 17 November 2009 (UTC)
- Note discussion on talk: Wikipedia talk:Bots/Requests for approval/SDPatrolBot II - Kingpin13 (talk) 08:58, 18 November 2009 (UTC)
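The per-page decision in the function details (wait 15 minutes after tagging, then warn the creator only if nobody has already done so) could be sketched in Python as below. The bot itself is written in C#; the function name is hypothetical, and the "already warned" test, which simply looks for the article title on the creator's talk page, is a deliberately crude stand-in for whatever check the bot really uses.

```python
from datetime import datetime, timedelta, timezone

WAIT = timedelta(minutes=15)  # the delay stated in the request; easy to change


def needs_warning(tagged_at, creator_talk_text, page_title, now=None, wait=WAIT):
    """Decide whether the bot should notify the creator of a CSD-tagged page."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now - tagged_at < wait:
        return False  # too soon: give the tagger time to notify the creator
    # Crude stand-in for "was the creator warned?": is the title mentioned at all?
    return page_title not in creator_talk_text
```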
- Can the bot also notify the CSDer that they should notify the creator of the article in a timely fashion? Time looks good.
- Why run by Sodam Yat? The user page suggests he is not willing to discuss issues about deletion:
It's rare that I actually re-visit a discussion that I'm not actively involved in. If I commented on an AFD or marked an article for Speedy Deletion, I'm not actually paying attention to it beyond that point. Should my points be proven wrong, or should evidence that I missed surface, I have no problem with an Admin reading my !vote the other way. -
- Deletions are a matter of community consensus that results from discussions. A user who makes unilateral decisions and announces an unwillingness, or rather a lack of concern, about the discussion does not strike me as an appropriate bot operator. The second sentence is pointless; admins aren't going to say, "Oh, I read this guy's user page, and he won't mind if I discount his vote." So, essentially, the user is commenting on AfDs and marking articles for speedy deletion, but is not amenable to community input once he's voiced his opinion on the article.
- So, he'll operate the bot and ignore what the bot does? IMO this bot should not go forward with this operator. --IP69.226.103.13 (talk) 11:24, 19 November 2009 (UTC)
- Firstly, I don't take that as him saying he doesn't care about it, rather that he doesn't manage to keep track of it. Anyway, I will be running this bot too, it's just I don't have a computer running 24/7, which Sodam Yat does. But I will most likely be the one paying attention to what the bot does, as I'm the one who has written it, and therefore the one who will actually be able to address concerns. - Kingpin13 (talk) 12:31, 19 November 2009 (UTC)
- P.S. Yes the bot can identify the nominator (and already does). I can have it leave them a message too. - Kingpin13 (talk) 12:34, 19 November 2009 (UTC)
- I think this is useful, as it will cut down on repeat offenders who don't understand that it's a courtesy to prioritize notifying the article's creator. --IP69.226.103.13 (talk) 19:27, 19 November 2009 (UTC)
- That was mainly because I rarely comment in AFD debates, and do not add them to my watchlist. So if I make an argument based on something presented, and that is later proven incorrect, chances are I will not recheck my !vote to make sure it is still the correct choice. If someone wants to bring something to my attention, I will be more than happy to talk about it. While I run the bot, I will make every effort to monitor it while I am available, and will try to check periodically even if I am away, in case something is going wrong. Sodam Yat (talk) 16:35, 19 November 2009 (UTC)
- My concerns are not alleviated. A bot operator has to keep track of what the bot does. I have to say that bot policy does not favor an operator who sees community consensus as his putting in his vote and then ignoring the discussion. Wikipedia is an encyclopedia written by community consensus. It's actually working. Bots require community consensus and operators who are communicating with the community. I am uncomfortable with this bot being operated at any time by this user, as I see high potential for unnecessary drama and incivility due to the operator's stated lack of involvement in developing community consensus for deletion discussions. This is particularly a problem with this bot and this bot operator. User:Sodam Yat has offered, on my user talk page, to attempt to find another operator to work with Kingpin13 on running the bot, and I think he should be taken up on this offer.
- As usual, I have no issues with Kingpin13 as operator or coder. --IP69.226.103.13 (talk) 19:27, 19 November 2009 (UTC)
- In light of the concerns raised, I withdraw my offer to run the bot. If I am able to find someone else to run it, I will do so, but even if I cannot I can't in good conscience run a bot with opposition. Sodam Yat (talk) 19:50, 19 November 2009 (UTC)
- Well, that's a shame IMO. I've contacted ThaddeusB, since he mentioned something about it.. - Kingpin13 (talk) 20:02, 19 November 2009 (UTC)
- If I don't get back to this, just a note that I see no issues with ThaddeusB as a bot operator. --IP69.226.103.13 (talk) 20:07, 19 November 2009 (UTC)
- I think a 10 minute delay would be sufficient.
- The templated message needs to be carefully crafted. I trust a draft version of it will be released soon.
- --ThaddeusB (talk) 01:16, 20 November 2009 (UTC)
RjwilmsiBot
Operator: Rjwilmsi
Automatic or Manually assisted: Automatic
Programming language(s): AWB
Source code available: AWB
Function overview: Set page ranges within the page parameter of citation templates to use en dashes, per the guidelines on Template:Citation etc.
Links to relevant discussions (where appropriate): Guidelines on Template:Citation etc.
Edit period(s): On download of each new database dump
Estimated number of pages affected: ~29,000 (first dump), fewer afterwards
Exclusion compliant (Y/N): Yes
Already has a bot flag (Y/N): N
Function details: AWB has logic to apply en dashes to page ranges within the 'page' or 'pages' parameter of citation templates such as {{citation}}, {{cite web}} etc. Many page ranges are incorrectly given using a simple hyphen or (occasionally) an em dash. Pages not matching this logic will be skipped.
Discussion
29,000 edits to replace '-' with '–'? I think this would be better done as something in AWB's general fixes (if it isn't already) than as a standalone task. Mr.Z-man 00:44, 12 November 2009 (UTC)
Sure, why not take care of it all in one go? --Cybercobra (talk) 10:40, 12 November 2009 (UTC)
- Are these considered cosmetic changes or not? -- Magioladitis (talk) 18:17, 12 November 2009 (UTC)
- I'd say they're slightly more significant than "cosmetic", especially within citations. I don't think AWB would be practical for 30k edits though. –Juliancolton | Talk
- They are cosmetic, but it's supported by the MoS and the task is very specific. --Cybercobra (talk) 07:49, 13 November 2009 (UTC)
- There are plenty of things supported by the MoS that are too trivial to have a bot enforce them individually; the specific-ness of the task is part of the problem. If it fixed several problems at once, or did this while fixing something more substantial - like AWB's general fixes - it would be fine. But I'm not convinced this is a significant enough task that it needs a bot to do it. Mr.Z-man 18:19, 13 November 2009 (UTC)
- I'm not seeing what the downside of running this bot would be. Are you suggesting server load as the problem or...? --Cybercobra (talk) 18:55, 13 November 2009 (UTC)
I can't find where an en-dash is required over a hyphen in the links above. Where is the discussion about this change? I thought I asked this before. --IP69.226.103.13 (talk) 08:30, 14 November 2009 (UTC)
- See WP:MOSDASH. --Cybercobra (talk) 09:29, 14 November 2009 (UTC)
- Thanks, the links above don't discuss it. Yes if there are other similar MoS details, the bot could take care of that. Also, this would be an ongoing task. I also don't understand the objection. That's a lot of articles that need a trivial change, and a lot of future maintenance. It seems bot worthy to me. --IP69.226.103.13 (talk) 20:25, 14 November 2009 (UTC)
- The problem is the idea of "one bot per change" fills up page histories with inconsequential edits. We frequently deny bots that run the Pywikipedia cosmetic_changes.py script, and that does about half a dozen different things. I believe I've suggested in the past that people interested in enforcing the MoS/WP:CHECKWIKI with a bot get together, find the things that can be reliably detected by a bot, and make one bot to do all of them. There's also no real urgency with tasks like this such that all 29000 pages need to be fixed immediately, which is why I don't quite understand the objection (or rather complete lack of reply) to my suggestion to add this to AWB's general fixes, which is pretty much designed for things like this. Mr.Z-man 22:27, 14 November 2009 (UTC)
- The logic is already in the gen fixes. My idea was to request a simple task as my first bot task, in the expectation that it would get approved more readily, rather than a more complex one. Perhaps I was mistaken. Rjwilmsi 23:45, 14 November 2009 (UTC)
- Your logic, or your reasoning for doing it this way, is fine; maybe not the specific task, considering how many pages it affects. Yes, I agree: single small edits to 29,000 pages, when the same edit could fix more things, should be reconsidered. How about trying out this change on a small number of edits, then adding something? I don't know. Any ideas from anyone else? I see your points, Mr.Z-man. --IP69.226.103.13 (talk) 04:13, 15 November 2009 (UTC)
- I could run the bot with all AWB gen fixes enabled, and ensure at least the page range dashes were fixed. Does that help? Rjwilmsi 08:47, 15 November 2009 (UTC)
- Are all of the AWB gen fixes 100% reliable? I don't think they are, but I could be mistaken. --ThaddeusB (talk) 04:51, 20 November 2009 (UTC)
- Major concern, then. Is there a known, finite list of AWB general fixes that are 100% reliable? Yes, this needs more input, but I think it's a great idea for a bot, particularly if it can do a number of fixes at once. It doesn't have to do all of them, which still leaves it a basic programming exercise. --IP69.226.103.13 (talk) 08:58, 21 November 2009 (UTC)
- {{BAGAssistanceNeeded}} - This could use some more comments by BAG members other than myself. Mr.Z-man 22:40, 18 November 2009 (UTC)
- I've looked at this several times in the past week, and I just can't seem to care much one way or the other. It would be good if any other bottable general fixes were done at the same time, WP:CHECKWIKI might be able to help with that. Since it seems that the only potential controversy here is blowing up people's watchlists with relatively minor edits to 29000 pages, maybe we should run it past WP:VPR to try for a wider consensus for or against? Anomie⚔ 22:52, 25 November 2009 (UTC)
- Probably appropriate to seek a wider audience. I can't even guess what the result of more input would be. I think details should be fixed. If they can be fixed by bot, so much the better. But I agree with not doing 29,000 edits to fix one thing if the same number of edits could fix a handful of details at the same time. --IP69.226.103.13 (talk) 04:42, 26 November 2009 (UTC)
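To make the page-range rule in this request concrete, here is a minimal Python sketch. It is not AWB's general-fixes code. The function name `fix_page_ranges` and both regular expressions are hypothetical, and the sketch only touches plain numeric ranges such as `12-15`. It also does not check which template a `page`/`pages` parameter belongs to.

```python
import re

# Hypothetical pattern: a |page= or |pages= parameter, capturing its value
# up to the next pipe or closing brace. It does not check which template
# the parameter belongs to.
PARAM_RE = re.compile(r"(\|\s*pages?\s*=\s*)([^|}]*)")

# A purely numeric range joined by a hyphen or an em-dash, e.g. "12-15".
RANGE_RE = re.compile(r"^(\d+)\s*(?:-|\u2014)\s*(\d+)(\s*)$")

EN_DASH = "\u2013"


def fix_page_ranges(wikitext: str) -> str:
    """Rewrite numeric page ranges in page/pages parameters to use en-dashes.

    Values that are not a plain 'number, separator, number' range are left
    untouched, in the spirit of "pages not matching this logic will be skipped".
    """

    def repl(match: re.Match) -> str:
        prefix, value = match.group(1), match.group(2)
        m = RANGE_RE.match(value)
        if m is None:
            return match.group(0)  # not a simple range: leave it alone
        start, end, trailing = m.groups()
        return f"{prefix}{start}{EN_DASH}{end}{trailing}"

    return PARAM_RE.sub(repl, wikitext)
```

For example, `fix_page_ranges("{{cite journal |pages=12-15}}")` yields `{{cite journal |pages=12–15}}`, while a value like `S1-S5` is skipped. Skipping anything unusual trades coverage for a low false-positive rate, which is the trade-off the request describes.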
This is a reopened bot request - please see the bottom of this page for the most recent information
Operator: Excirial (Contact me,Contribs) 18:42, 7 January 2009 (UTC)
Automatic or Manually Assisted: Fully automatic, with the possibility to manually override the bot's behavior if desired.
Programming Language(s): VB.net
Function Summary:
- Query the Wikipedia API every X minutes (currently: 30 minutes) for new pages
- If the bot is cold-started, fetch the new page list with the last X (idea: 500-1000) pages. (See: Note 1)
- If the bot is running, only fetch the list of new pages since the last visit.
- If the bot has found any new pages, load the page content and start to parse it.
- The bot will parse the content to determine whether any maintenance tags need to be placed.
- If there is a need to place a maintenance tag, add the tag to the article, and resume with the next article.
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Edit rate requested: 1 edit per new page at most. (Estimated 10 edits a minute at most; this is currently a test setting that can be lowered.)
Already has a bot flag (Y/N): Not applicable (new bot)
Function Details:
Note: Coreva-Bot had a previous bot request located here. Prototypes of the previous idea behind Coreva showed that it would be virtually useless. This request is for a functionally completely different bot (but with an identical name).
Coreva's main task is placing maintenance tags on new pages that require them, similar to the way most new page patrollers work their beat. Coreva will regularly (every 5-10 min) check the new page list for new articles, fetch each new article's content, parse the content (see: Parser Table) and finally update the article, adding the required maintenance tags.
Just like the previous Coreva, this one should also be quite light on server resources. The bot queries the server's new page list every 5-10 minutes, and (so far) each article requires two server queries: one to get the article's content, and one to check whether the article is an orphan. Category counts, link counts et cetera are handled internally by the bot. Additionally, the bot will require one database write to add the templates (if any are required). The estimated edit rate for the bot will be 2 edits per minute on average. (See: Note 2)
Coreva is not a miracle, and will never replace a living new page patroller. Coreva cannot patrol for WP:CSD and does not understand hoaxes, advertising or vandalism. However, a lot of articles slip off the new page list without any form of maintenance tags. About half the pages on the new page list show as not patrolled, and even though this is a very rough guess, this equals more than 2,000 pages a day.
(See: Note 3) Since adding maintenance tags is thoroughly boring work, I think Coreva could spare quite a few patrollers a bit of boredom :). (Unlike CSD tags, which require at least some use of your brain, maintenance tags require nothing more than checking 20 indicators, most of them nothing more than present/not present.)
Finally, just like the old Coreva, it's still very much a work in progress, which is only done in spare time. While progress on this Coreva is much faster than on the previous one, I assume it will still take a few months before it is capable of being a fully automated bot. Even if it were technically capable of doing so, it will not be a fully automatic bot until I have tested it thoroughly (a few weeks, I guess) in assist mode, which means Coreva would only give me feedback on what tag it would place on every page it checks. This way any annoying mistakes in the parser should be ironed out, while at the same time it allows me to improve the parser code.
Parser Table
This table gives an overview of the templates Coreva will be placing on articles, along with the current criteria configuration for doing so. Note that this is still very much in beta; templates may be added and removed depending on tests. Also, the criteria are still based on very simple algorithms. Coreva's tests are conducted on a very small and varied set of locally stored articles, so the criteria are still general. In their current form they should, however, produce very few false positives (but would likely have quite a few false negatives). So all in all: work in progress! (See: Note 4)
| Tag | Criteria | Comment |
| Wikify | No internal links | Number of internal links = 0 |
| Uncat | No categories in the article | Number of categories = 0 |
| Unreferenced | No references in the article | No ref tags or References/Notes header detected |
| Footnotes | Article contains a standard "Notes" or "References" header, but no ref tags | - |
| Internal Links | Article contains fewer than (number of words / number of links) internal links | Percentage not yet set. |
| Orphan | Article is linked from 0 articles | - |
| Stub | Article size is smaller than X | Suggestion: <1 kB / 100 words / 1000 characters (incl. spaces) |
| Sections | Article contains too few sections for readability's sake (Note: a section equals a line break) | < 6 sections && (number of sections * 2500) > article size |
| Too many links | To be determined | For this I still need to analyze the guidelines and the appropriate category. |
| Too many categories | Number of categories > X | X: 10? 20? 30? Depends quite a bit on the article size. Perhaps a base of, say, 10, and another category for every x words. (For example, World War II has 42 categories, but it's a huge article.) |
- Note 1: A second idea is to let the bot store its last query time permanently, and query all new (non-patrolled) pages since the bot went offline. These pages could then be processed at a lower priority, meaning that they would only be processed once the bot runs out of pages to process, with a limit on the number of pages processed each minute. (So, if the bot limited itself to 5-10 edits a minute at most, 3-8 old, low-priority pages could be processed a minute.) Being in the CET timezone, this would translate to a queue of 250 or so pages generated overnight that could be processed during the remainder of the day.
- Note 2: I am currently in doubt whether the bot should notify the user with a template when maintenance tags are placed, encouraging the article's creator to recheck the page while it is still "warm". This would double the bot's database writes, and at the same time I cannot predict whether users are averse to being templated, or whether anyone would change an article (or ask for help). On the other hand, if a user created a page on the basis of websites, warning them that no sources are added could prevent a hell of a lot of wasted time for other users verifying all of the article's content through web searches.
- Note 3: This is based on statistics from May 2006. At that time Wikipedia got 3600 new articles a day. Nowadays the number is almost certainly quite a bit higher, but due to the difference between peak and normal hours, it's quite hard to make a guess based on Special:NewPages. :)
- Note 4: It's rather obvious, but since I didn't mention it: Coreva does not add templates to pages marked for CSD, and does not add templates that already exist.
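The Wikify, Uncat, Unreferenced and Footnotes rows of the table above, together with the two restrictions in Note 4, can be sketched as plain pattern checks on the wikitext. This is a hedged illustration, not Coreva's parser. The regular expressions and the name `suggest_tags` are assumptions. The CSD check is deliberately incomplete, and template aliases are not handled.

```python
import re
from typing import List

# Internal links, excluding category and file/image links (illustrative only).
LINK_RE = re.compile(r"\[\[(?!\s*(?:Category|File|Image)\s*:)[^\[\]]+\]\]", re.IGNORECASE)
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE)
# <ref>, <ref name=...> and <ref ... /> but not <references />.
REF_TAG_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)
REF_HEADER_RE = re.compile(r"^==+\s*(?:References|Notes)\s*==+\s*$", re.IGNORECASE | re.MULTILINE)
# Hypothetical, incomplete check for speedy-deletion templates ({{db-...}}, {{db|...}}).
CSD_RE = re.compile(r"\{\{\s*db[-|}]", re.IGNORECASE)


def suggest_tags(wikitext: str) -> List[str]:
    """Apply the Wikify, Uncat, Unreferenced and Footnotes rows of the table."""
    if CSD_RE.search(wikitext):
        return []  # Note 4: pages marked for CSD are not tagged at all
    has_ref_tags = bool(REF_TAG_RE.search(wikitext))
    has_ref_header = bool(REF_HEADER_RE.search(wikitext))
    tags = []
    if not LINK_RE.search(wikitext):
        tags.append("Wikify")
    if not CATEGORY_RE.search(wikitext):
        tags.append("Uncategorized")
    if not has_ref_tags and not has_ref_header:
        tags.append("Unreferenced")
    elif has_ref_header and not has_ref_tags:
        tags.append("Footnotes")
    # Note 4: do not add a template that is already on the page
    # (aliases such as {{Uncat}} are not handled in this sketch).
    return [t for t in tags if not re.search(r"\{\{\s*" + t + r"\b", wikitext, re.IGNORECASE)]
```

A one-line stub such as `"Foo is a thing."` would get `["Wikify", "Uncategorized", "Unreferenced"]`. A page that has a References header but no `<ref>` tags would only get `Footnotes`.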
Discussion
Previous discussion
ThaddeusB: FYI, I was considering making a similar bot, so I am familiar with the concept. Here are some things to consider:
- many articles are not made in one edit. The bot should avoid tagging anything until a reasonable amount of time has passed since the last edit (perhaps 1 hour) to ensure that it doesn't tag works in progress.
- if the bot isn't going to make any effort to determine CSD criteria, it probably shouldn't mark pages as patrolled since a human look might still be needed.
- human editors often mark pages as patrolled without putting any maintenance tags on them (i.e. they only verify it is a legit article subject), so it is probably worthwhile to check all new articles.
- using {{articleissues}} is preferable to placing a bunch of different cleanup tags.
--ThaddeusB (talk) 20:21, 7 January 2009 (UTC)
- Thanks a lot for the suggestions! While I wrote down most of the bot's proposed structure, design, code layout and class allocation, there are masses of fine details I have not considered yet, so bringing them to my attention certainly helps a lot :)
- Many articles are not made in one edit. The bot should avoid tagging anything until a reasonable amount of time has passed
- I have spent some time determining which is better: tagging in (near) real time, or tagging with a delay. I am not certain which is better yet (probably I won't decide until Coreva goes on autopilot).
- There are some advantages to tagging with a delay: bad pages are likely already filtered out, and people get a chance to finish their article. The disadvantage, however, is that this modus operandi requires a few more internal checks to see whether (say) an hour has passed since the article was created. At times Coreva could end up with a fairly full queue, but no permission to reduce that queue due to these restrictions.
- There are also advantages to tagging in real time. First off, it's easier and more straightforward to implement. Another advantage would be that if the article's creator were notified about the tags with a template, it would be much more likely that they are still online. The disadvantage would be that it can be annoying to people when a page they are still working on gets tagged. Then again, that's the way each and every human new page patroller works, and it would also give an indication of things that are still needed within the article.
- If the bot isn't going to make any effort to determine CSD criteria, it probably shouldn't mark pages as patrolled since a human look might still be needed.
- Good point, and one that I hadn't really thought about yet. While most of the basic functionality is already there (get the page list, store it in a database, parse a working set and check which tags need to be added to the pages), the entire "return the data to the server" part of the bot is still nonexistent. When I get to that part I will surely pay attention to this suggestion/wisdom :).
- human editors often mark pages as patrolled without putting any maintenance tags on them (i.e. they only verify it is a legit article subject), so it is probably worthwhile to check all new articles.
- The only time I considered doing that was when the bot got a cold boot, meaning it had been offline for longer than a moment. Since the bot will run on a home PC, it's likely the bot will go down for a few hours a day, while I am asleep. Empirical data shows that 500-ish pages are created during that time. When I originally wrote the RFBA that seemed like a lot, so leaving out anything patrolled was a good way to reduce the load. Then again, even if I limited the bot to 5 edits a minute, with new pages taking up an average of 2 edits a minute, it would still take only about 3 hours to clear the backlog, and just over an hour if I set the limit to 10. So it is indeed worthwhile to check all pages.
- using {{articleissues}} is preferable to placing a bunch of different cleanup tags.
- I fully agree with that. Friendly works the same way, as do the majority of the other tools. It's just a few extra lines of code (or maybe even fewer) to work this way.
- Sorry for the length of this reply; it ended up being nearly the size of a separate RFBA. The truth is that I tend to write out my thoughts on suggestions while I'm busy building the bot. Sometimes it might take a while to implement a certain feature I thought about earlier, and by then I might already have forgotten half the things I had in mind. So, just like with coding, good documentation while you are busy can be quite helpful :) Excirial (Contact me,Contribs) 21:34, 7 January 2009 (UTC)
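The backlog estimate in the reply above, and the 3-8 pages a minute in Note 1, come from the same arithmetic. Spare capacity is the edit limit minus the roughly 2 edits a minute taken by fresh pages. Clearing time is the overnight backlog divided by that spare capacity. The Python check below uses the figures from the discussion: 500 pages, limits of 5 and 10 edits a minute, and 2 new pages a minute. The function names are illustrative.

```python
def spare_capacity(edit_limit_per_min: float, new_pages_per_min: float) -> float:
    """Edits per minute left over for old pages after keeping up with new ones."""
    return edit_limit_per_min - new_pages_per_min


def hours_to_clear(backlog: int, edit_limit_per_min: float, new_pages_per_min: float) -> float:
    """Hours needed to work through an overnight backlog at the spare rate."""
    spare = spare_capacity(edit_limit_per_min, new_pages_per_min)
    if spare <= 0:
        raise ValueError("edit limit must exceed the rate at which new pages arrive")
    return backlog / spare / 60


for limit in (5, 10):
    hours = hours_to_clear(500, limit, 2)
    print(f"{limit} edits/min: {spare_capacity(limit, 2)} spare/min, {hours:.2f} h")
# prints:
# 5 edits/min: 3 spare/min, 2.78 h
# 10 edits/min: 8 spare/min, 1.04 h
```

This confirms "about 3 hours" at a limit of 5 and "just over an hour" at 10. It assumes new pages keep arriving at a steady 2 a minute, which the reply itself treats as an average.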
Mr.Z-man: Some of these tags could have issues being added by bots. Some comments:
- Unreferenced - I would expand this to not tag if it has any external links as well, as those could potentially be refs, especially by people unfamiliar with Wikipedia style.
- Uncat - Make sure to get the prop=categories along with page text to get categories added by templates
- Internal Links - This could potentially be screwed up by templates and stuff. A 3 sentence article with an infobox and a couple stub templates might only need 2 links, but it'll have a lot of text
- Orphan - Should this really be added minutes after creation? I think it would be better to give people some time, especially for this, as it requires editing other articles.
- Stub - This (and most others) will have to make sure to exclude disambig pages
- Sections - Again, could be screwed up by infoboxes, other templates, and refs
- Too many categories - I'm not sure if this can really be determined reliably with just numbers. Wikipedia's category system can be really f*'d up in some cases. Ideally, no article should need 40 categories, but with no category intersections, it's unfortunately necessary in some cases.
- As for user talk messages, it has benefits and drawbacks. Messages customized based on the tags could encourage people to fix the issues rather than scaring them off, but they could be scared off anyway by a bot screaming at them about what's wrong with their article. It could also annoy established users (see next comment). You could put the messages on the article talk page instead, but people might not see them there.
- You might want to check the edit count of the creator, established users could probably be given the benefit of the doubt and their articles skipped, or just skip users on pages like User:JVbot/patrol whitelist.
- {{articleissues}} would probably be best if it's adding more than 2 tags.
- Make sure to add the date to tags, so other bots don't have to come through and do it.
-- Mr.Z-man 21:06, 7 January 2009 (UTC)
- So many suggestions to improve Coreva already. Thanks, both of you; I really appreciate your thoughts on this!
- Unreferenced
- Very, VERY helpful suggestion. I have seen that kind of referencing more than once, and I was already none too happy with the crude detection of references. I can simply modify one of the filters a little bit, and it would do the trick.
- Uncat
- Good point. Coreva can't detect those in its current incarnation. While I could request the page with the templates expanded, that would only create potential problems with article length and so on.
- Internal Links
- It would not make much of a difference, I think, but I need to research this a little more before I draw conclusions. Most infoboxes could use some form of linking themselves. Either way, the values I was thinking about are 0.5-2% or so. The Bill Gates infobox is 190 words, for example, meaning that the article would need between 0.95 and 3.8 links. Seeing as the infobox itself contains a few links, it should not be that hard to reach the threshold :).
- Stub
- Indeed, indeed, just as they need to avoid redirect pages. As long as a disambig template is present, it should not be too hard to detect a disambig page.
- Sections
- I'm treating a "section" as any block of text that does not contain a line break (Enter). Then again, there is no real need to keep a line break between the text and the closing characters of an infobox, table, etc. I'm not sure how I am going to solve that problem just yet, but I'll think of something eventually (I hope :-))
- Too many categories
- 99.5% of the articles don't have categories in the first place, so I never expected this late addition to the list to get much attention. The only reason I included it anyway was that I saw a few (read: 10 a year at most) articles where, mostly, business owners ended up adding their article to each and every category even remotely relevant, leaving it with a cartload of categories. As for normal, everyday use, I intend to set the threshold for adding this template quite high, so the absolute majority of articles will simply never be tagged with it; just the weird cases I saw every now and then. This is especially so since there is no definitive guideline on the number of categories that I know of.
- Orphan
- This tag is in no way easy to resolve, especially not for a new user. The problem with not adding this tag is that it is exceedingly difficult to find an orphaned page later on. Special:LonelyPages is no longer updated, and since the article has no incoming links, the only way to get to the page is by typing the title into the search box. I am (excuse my ignorance if I am mistaken) not aware of any method to check the number of links to an article other than querying the API, or a search in one of the special pages. Especially for pages with somewhat unusual titles ("Accessible publishing" would not be something I would put between link brackets, for one) or titles that are a bit different from regular written language (not a good example, but if someone wrote "Heart Disease" and it were orphaned, "Cardiovascular disease" would never come up), it could very well be that articles never get linked as time passes. Thus, the only solution I know of is placing an orphan tag. That said, I would very much like to hear suggestions or comments on this reasoning, as I'm not especially happy with this situation either.
- Talk Messages
- I kept a tally of this once, and virtually every unreferenced, uncat and wikify tag ended up on a page created by a quite new user. Adding talk page templates based upon edit count could end up being a very good idea. A new user could receive a friendly template containing information about the placed tags (if I end up including talk page messages, I intend to write step-by-step guides for each template Coreva adds, so that new users don't get slapped with a lengthy guideline). Experienced users could be skipped, or could receive a short one-line notice that Coreva added templates. Either way, there should, and will, be a way to opt out indefinitely if this is implemented.
- Again, my thanks for these suggestions! Excirial (Contact me,Contribs) 22:27, 7 January 2009 (UTC)
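Both API-based checks discussed above can be expressed as MediaWiki API requests. The category count uses `prop=categories`, which reads the stored category links and so includes categories added by templates. The orphan check uses `list=backlinks`, which is the API route to incoming links that the Orphan reply mentions. The Python sketch below only builds request parameters and parses JSON responses, so any HTTP client can send them. The helper names are hypothetical. Hidden categories are counted, links arriving via redirects are ignored, and query continuation is not handled.

```python
from typing import Dict, Union


def categories_query(title: str) -> Dict[str, str]:
    """prop=categories lists categories of the saved page, including template-added ones."""
    return {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }


def category_count(response: dict) -> int:
    """Count categories in a prop=categories JSON response."""
    pages: Union[dict, list] = response.get("query", {}).get("pages", {})
    if isinstance(pages, dict):  # default format: pages keyed by page ID
        pages = list(pages.values())
    return sum(len(page.get("categories", [])) for page in pages)


def backlinks_query(title: str) -> Dict[str, str]:
    """list=backlinks: one linking article is enough to rule out 'orphan'."""
    return {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": "0",          # only links from articles count
        "blfilterredir": "nonredirects",
        "bllimit": "1",
        "format": "json",
    }


def is_orphan(response: dict) -> bool:
    """True when the backlinks response contains no linking articles."""
    return not response.get("query", {}).get("backlinks")
```

Setting `bllimit` to 1 keeps the orphan check to a single small request per article, in line with the request's goal of being light on server resources.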
X!: Question - Why load the API every 5-10 minutes? Why not just use the IRC RC feed? Xclamation point 06:09, 8 January 2009 (UTC)
- Several reasons. The first reason is quite simple: it is a hell of a lot less work to code. I already needed an XML parser for the other API queries, so it was incredibly easy to let it fetch the new page list from the API. Furthermore, it's as easy as querying the server and receiving the data. If I were to use the IRC feed, I would have to code an entirely new feature based upon IRC parsing (connect to IRC, log in to the nick server, join the correct channel, parse the format used there). All in all it would end up being a lot of extra work, with the additional disadvantage that I have never coded anything involving IRC in the first place.
- The second reason is that the IRC feed offers no advantages over the API in Coreva's case. Coreva gains nothing from getting the new page list in real time. Moreover, a 5-10 minute wait gives CSD patrollers time to add CSD templates to the article, which prevents Coreva from parsing useless pages. Excirial (Contact me,Contribs) 10:42, 8 January 2009 (UTC)
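The reply above favours polling the API over the IRC feed. Combined with the cold-start/warm-start behaviour in the function summary, a polling request could be built as in the Python sketch below. Using `list=recentchanges` with `rctype=new` is an assumption about how new pages might be fetched, not a description of Coreva's VB.net code. The helper names are hypothetical. Continuation across more than 500 results is not handled.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

API_URL = "https://en.wikipedia.org/w/api.php"


def new_pages_query(last_seen: Optional[datetime], cold_start_limit: int = 500) -> Dict[str, str]:
    """Build list=recentchanges parameters for newly created articles.

    last_seen must be timezone-aware; None means a cold start.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",        # page creations only
        "rcnamespace": "0",     # article namespace
        "rcprop": "title|ids|timestamp",
        "format": "json",
    }
    if last_seen is None:
        # Cold start: the most recent creations, newest first.
        params["rclimit"] = str(cold_start_limit)
        params["rcdir"] = "older"
    else:
        # Warm start: everything since the last run, oldest first.
        params["rclimit"] = "500"  # 500 is the usual limit for non-bot accounts
        params["rcdir"] = "newer"
        params["rcstart"] = last_seen.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return params


def titles_from_response(response: dict) -> List[str]:
    """Extract page titles from a list=recentchanges JSON response."""
    return [rc["title"] for rc in response.get("query", {}).get("recentchanges", [])]
```

The parameters would be sent to `API_URL` every polling interval (30 minutes in the reopened request). With `rcdir=newer`, results come back oldest first, which suits a buffer that gives older articles priority.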
Reopening request
Over the past two-ish months the amount of time I could spend on Wikipedia was drastically reduced due to other duties, causing a certain lapse in Coreva's development. Another issue halting development was an old programmer's trap: building a patched-together prototype that should have been thrown away once I had proof of concept that it actually worked, and instead keeping the prototype and resuming work on it. That eventually led to a horrible code mess and a completely incomprehensible program. In the past month I finally found the time and willpower to use a step-through debugger on the entire program to decipher and salvage the mess as much as possible, before rewriting Coreva from scratch, save for a few salvaged functions that actually worked.
The actual workings of the bot have changed very little from the table I added above - I dropped STUB, TOMANYCATS and TOMANYLINKS because they were prone to false positives. I am currently testing a module that can detect peacock pages (based upon statistical analysis, weighted word lists and some basic calculations). So far it works fine when comparing featured articles with peacock articles (1 false positive on 270 correct tags), but the calculation algorithm makes too many mistakes on small articles, so it is disabled for now.
Et Cetera
- The bot language switched from C#.net to VB.net to force recoding, rather than copy-pasting while rewriting it.
- Per suggestion, the bot dates every tag, and uses {{articleissues}} instead of single templates when more than 2 tags are placed.
- The bot won't check redirects, pages marked as disambiguation, pages marked for CSD, or any pages that have already been deleted.
- The bot won't double-tag. It can detect already placed {{ArticleIssues}} tags; similarly, it can detect single-level templates, along with every listed alias of those templates.
- Since the bot won't be running all day, it remembers the article it last tagged before shutting down. When started again, it will proceed where it left off (this can be manually reset after vacations, etc.).
- By default the bot checks the API for new pages every 30 minutes; the bot keeps those stored in a database, forming a buffer of articles to tag, with older articles having priority. Tests showed it is extremely rare for the bot to tag articles younger than 5 minutes. I hope this covers the "Pages take time" argument.
- Edit rates are currently locked at 1 edit every 6 seconds. I am open to any advice regarding this rate.
Excirial (Contact me,Contribs) 21:05, 11 June 2009 (UTC)
- Bots that add tags to articles tend to be controversial. See Wikipedia:Bots/Requests for approval/Erik9bot 9, where a proposal to add {{unreferenced}} to all unreferenced articles had to be modified in order to overcome objections from a number of editors. Some people feel that visible cleanup templates on articles detract from the reader's experience, and should not be used. – Quadell (talk) 14:46, 14 June 2009 (UTC)
- Well, if there is consensus I will add categories instead of visible templates, but I find myself surprised by the discussion at Erik9bot 9. The majority of the tags added to new articles these days are added with Friendly, and those are all visible tags. Similarly, the 500k or so articles in WP:BACKLOG all use visible tags; why should a bot that does exactly the same work be subject to an issue called "Ugly Templates"? If this should be changed, I would much rather see an RFC that changes this Wikipedia-wide, rather than individual judgement that will only create a mass of different styles instead of uniformity. For now I won't pre-emptively change this, as I see no consensus on it. Excirial (Contact me,Contribs) 07:23, 16 June 2009 (UTC)
- "if there is consensus i will add categories instead of visible template" I oppose adding hidden categories without tags unless the bot regularly runs a review of edited articles to remove the categories when the criteria are no longer met. BirgitteSB 14:27, 17 June 2009 (UTC)
- It will not. Coreva will only tag an article once, shortly after it is created. Technically it could easily iterate through every Wikipedia article, but that would be inefficient, to say the least. The intent of the bot is to give new articles some basic improvement advice and traceability. New editors can see what they should improve, and it prevents articles from fading into the great unknown because they are not linked to or from anything else. I assume that was the basis of the original criticism: the other bot would scan a mass of (long-standing) articles and add visible templates to them. Excirial (Contact me,Contribs) 14:47, 17 June 2009 (UTC)
- I can't say whether that was the basis of the original criticism or not. I do not object to adding visible tags. But I disagree with the idea that, because people object to them, it will be OK to just add hidden categories. Without any sort of visible prompt, people are not going to remove these categories as the article matures. The categories will be filled with false positives by the time the backlogs are worked through to these months. BirgitteSB 18:37, 17 June 2009 (UTC)
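The buffer described in the Et Cetera list above keeps older articles first, rarely touches articles younger than 5 minutes, and is limited to 1 edit every 6 seconds. It can be sketched as a heap with two gates. The Python below is an illustrative sketch, not Coreva's implementation. The class names are hypothetical, and treating 5 minutes as a hard minimum age is an assumption drawn from the test observation in that list.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(order=True)
class QueuedPage:
    created: datetime                  # heap order: oldest creation first
    title: str = field(compare=False)


class TagBuffer:
    """Oldest-first buffer of checked pages waiting to be saved.

    Assumed settings: hold back pages younger than min_age and allow at most
    one save per save_interval (the request mentions 1 edit every 6 seconds).
    """

    def __init__(self, min_age: timedelta = timedelta(minutes=5),
                 save_interval: timedelta = timedelta(seconds=6)) -> None:
        self._heap: List[QueuedPage] = []
        self.min_age = min_age
        self.save_interval = save_interval
        self._last_save: Optional[datetime] = None

    def add(self, title: str, created: datetime) -> None:
        heapq.heappush(self._heap, QueuedPage(created, title))

    def next_to_save(self, now: datetime) -> Optional[str]:
        """Return the oldest eligible title, or None if nothing may be saved yet."""
        if self._last_save is not None and now - self._last_save < self.save_interval:
            return None  # rate limit: wait for the next save slot
        if not self._heap or now - self._heap[0].created < self.min_age:
            return None  # buffer empty, or even the oldest page is too new
        self._last_save = now
        return heapq.heappop(self._heap).title
```

A main loop would call `next_to_save` with the current time every few seconds. Raising `min_age` to an hour would implement the longer delay ThaddeusB and Kingpin13 suggest, at the cost of a fuller queue.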
Approved for trial (10 edits). This is a very long RfBA, and the specs have changed throughout and are difficult to follow. I think the best way for all parties to understand what this bot would do is to give it a very small trial. – Quadell (talk) 13:12, 18 June 2009 (UTC)
- It seems that Coreva needs a slight correction. On the first two edits I made a manual error, setting the bot to "Show only tags" mode, which caused it to blank a page. On the next two pages I noticed that I had made a slight error in the saving code: Coreva accidentally saved the contents of the article it was currently checking to the page it was processing, overwriting that page with the wrong contents.
- My fault for assuming that a function that had not been thoroughly tested would just work! It should not take too long to fix this though - I'll run Coreva in diagnostics mode to test it, and after that I will resume the test run. Excirial (Contact me,Contribs) 17:42, 18 June 2009 (UTC)
- {{BotTrialComplete}} - I took the liberty of making a new set of 10 edits after I fixed the above issue. It proved to be a minor issue where I confused two functions, one used for working on the NEXT page and one used for the current page (causing it to mix up two pages). The new edits are marked 0 to 9; the error on tag 9 - the incorrect addition of a sections template - has already been fixed. Excirial (Contact me,Contribs) 18:24, 18 June 2009 (UTC)
- Another issue, the incorrect dating of maintenance tags (It didn't include "Date=") has also been solved now. Excirial (Contact me,Contribs) 07:54, 19 June 2009 (UTC)
Approved for trial (20 edits). Okay, let's have another go. – Quadell (talk) 22:38, 22 June 2009 (UTC)
Trial complete. Again there were a few slight bugs, which have been fixed. In retrospect it might have been wiser to develop Coreva for a single template and add additional functionality later on - at least it would have prevented the need for repeated trials.
- * Bug 1: Incorrect tagging with Unref templates. - Not really a bug. I optimized the regex and missed a character in the process, causing it to always fail.
- Umm, of course it's not a bug if you foresaw the false positives, but it's not very considerate to leave it to others to fix them! Why don't you manually review flagged articles first? Sparafucil (talk) 00:33, 29 June 2009 (UTC)
- Of course I did not foresee the errors; if I had, I would have fixed them before running Coreva, wouldn't you think :)? What I meant by "Not really a bug" is that I already had the correct code, but I made a small copy-paste error in it, causing it to malfunction. As for cleaning this one up, I went after those myself and reverted them. Excirial (Contact me,Contribs) 06:53, 29 June 2009 (UTC)
- Ah, we're perhaps talking about different things then, and I'm sorry for the sarcasm. What got my temper up was this edit in the article space, labeled as a trial edit. Why should checking for references be done by a bot? The article is clearly based on the Encyclopedia article given at the end. Sparafucil (talk) 23:01, 29 June 2009 (UTC)
- * Bug 2: Tagging of a disambiguation page. Caused by the use of a specialized disambiguation template ({{Hndis}}). Coreva now checks for these, and for every other template listed in that template's "See also" section. I never even knew those existed :).
- * Bug 3: Dating categories incorrectly. Apparently an uppercase "Date" is not accepted as a parameter, so I use "date" now, which is accepted.
- Excirial (Contact me,Contribs) 20:37, 28 June 2009 (UTC)
- If you haven't put one in (I couldn't see a mention from skimming through this page), then you should add a limit to how soon after creation the bot is "allowed" to tag, because it's possible (although highly unlikely, since it doesn't run very often) that it would get in the way of deletion taggers etc. It should only mark articles which have been around for a few hours or so. - Kingpin13 (talk) 19:33, 2 July 2009 (UTC)
- * Bug 4: Adding requests for inline references to one- or two-sentence articles. This makes the article hard to read, particularly after SmackBot moved the huge banner to the center of the article. Footnotes are useful for the reader, but the banner has more text than the article, and with so little text, footnoting is not so urgent. Please look at the articles and watchlist them while your bot is in its trial phases. If the banner overpowers the article, it should not be there. Consider that very short stubs (about 32 words of text) with banners of the same length may not need the banner, or a couple of bots modifying the article. --69.226.103.13 (talk) 07:11, 4 July 2009 (UTC)
- Operator assistance needed. Any news on bug four or the status of this request? MBisanz talk 22:04, 18 July 2009 (UTC)
Quick status summary
Since this BRFA is quite old, it contains a lot of information which is no longer completely up to date. It has also become so long that it is somewhat unreadable, so here is a summary for quick reference.
General
What will Coreva-Bot's task be? Coreva-Bot will function as a new page patrol, checking articles for problems. Once it has found an issue it will add the appropriate maintenance templates to the article.
How will Coreva operate? If Coreva is started for the first time (that is, its database backend is empty), it will query the server for the last 500 new pages and save that list to the backend. If Coreva already has data in its backend, it will query the server for all pages created since it last ran (5000 limit; 500 for now, as it is not yet flagged as a bot). Coreva will then load and check pages, filling its save buffer. The speed at which pages are checked depends on the number of pages in the buffer: more pages means longer intervals. Every 6 seconds the buffer is checked for pages to save; if there are any, the oldest page is saved with templates added.
Tagging articles
What will Coreva-Bot tag for? {{Uncategorised}}, {{Unreferenced}}, {{Footnotes}}, {{Wikify}}, {{Orphan}}, {{Sections}}, {{internallinks}}. Statistical analysis shows that the {{peacock}} template is prone to errors, which is why it is disabled indefinitely.
What restrictions apply to tagging? Coreva will not tag any pages marked for CSD, but it will tag PROD and AFD pages. Coreva will not tag removed pages. Coreva will not tag pages marked as disambiguations (including the basic disambig template, all its aliases, and specialized disambiguation templates such as {{hndis}}). It will not tag a page twice with the same template, for instance when maintenance templates already exist.
What are the criteria for each template to be added? (Note: these criteria are constantly being improved, but they only ever grow stricter.) A template will not be added if one is already present.
- Uncategorized: The article has 0 categories. Note that any category, including maintenance categories, counts toward this limit.
- Unreferenced: The article contains no reference header, or an empty one, and no <ref> tags.
- Footnotes: The article contains a reference header with any non-whitespace content, and no <ref> tags. Also, the templates {{1911}} and {{JewishEncyclopedia}} must not be present.
- Wikify: The number of internal links is 0.
- Orphan: The article has no other articles linking to it.
- Sections: Exponential mathematical formula
- Internallinks: The article has fewer internal links than one for every 1000 characters. Note that, while rather unsophisticated, this filter works pretty well.
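The text-based criteria above can be approximated with a few regular expressions over an article's wikitext. This is a minimal sketch, not Coreva's code: the regexes, the assumed reference-section headings (References/Sources/Notes), and the choice to let Wikify take precedence over Internallinks when there are no links at all are all assumptions. Orphan (which needs a backlinks query) and Sections (whose formula is not given here) are left out.

```python
import re

# Hypothetical regexes approximating the criteria above; not Coreva's code.
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:[^\]]*\]\]", re.IGNORECASE)
REF_TAG_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)
# Internal links, excluding category and file links.
INTERNAL_LINK_RE = re.compile(
    r"\[\[(?!\s*(?:Category|File|Image)\s*:)[^\]]+\]\]", re.IGNORECASE)
# Assumed headings for the reference section.
REF_HEADER_RE = re.compile(
    r"^==+\s*(?:References|Sources|Notes)\s*==+\s*$", re.IGNORECASE | re.MULTILINE)
FOOTNOTE_EXEMPT = ("{{1911", "{{JewishEncyclopedia")


def reference_section(text):
    """Body of the reference section, or None if there is no such header."""
    match = REF_HEADER_RE.search(text)
    if match is None:
        return None
    rest = text[match.end():]
    next_header = re.search(r"^==", rest, re.MULTILINE)
    return rest[:next_header.start()] if next_header else rest


def section_is_empty(section):
    # Category links usually trail the last section, so ignore them here.
    return section is None or not CATEGORY_RE.sub("", section).strip()


def tags_for(text):
    """Templates the criteria above would add to this wikitext."""
    tags = []
    if not CATEGORY_RE.search(text):
        tags.append("Uncategorized")
    has_ref_tags = REF_TAG_RE.search(text) is not None
    empty_refs = section_is_empty(reference_section(text))
    if empty_refs and not has_ref_tags:
        tags.append("Unreferenced")
    if not empty_refs and not has_ref_tags and not any(t in text for t in FOOTNOTE_EXEMPT):
        tags.append("Footnotes")
    links = len(INTERNAL_LINK_RE.findall(text))
    if links == 0:
        tags.append("Wikify")
    elif links < len(text) / 1000:  # fewer than one link per 1000 characters
        tags.append("Internallinks")
    return tags
```

For example, an article with a "==References==" section listing a book but no <ref> tags would get only {{Footnotes}}, while plain unformatted text would collect Uncategorized, Unreferenced and Wikify.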
Technical and operational limits
- Articles younger than an hour are not checked; instead the bot goes into sleep mode until it is allowed to tag again.
- Coreva tracks pages tagged - unless manually reset it will not tag the same page twice.
- The edit rate will never exceed 10 edits per minute; mostly the bot will run at around 7 or 8 edits per minute, depending on the number of pages in its buffer.
- The bot will query the server once on startup, and then again once every 30 minutes, for new pages. Each page checked requires two queries: one for the article content, and one to check whether the article is an orphan. If the article needs to be updated, the bot will save the page once.
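A minimal sketch of these operational limits, under stated assumptions: new pages are fetched through the MediaWiki API's list=recentchanges module with rctype=new (the thread only says Coreva queries the API for recent changes, so the exact query and parameter choices here are illustrative), the one-hour minimum age is enforced as a query cutoff, and the save buffer is a FIFO queue drained one page per 6-second tick. All names are hypothetical; the arithmetic 60 / 6 = 10 is what yields the stated ceiling of 10 edits per minute.

```python
from collections import deque
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Hypothetical sketch of the operational limits listed above; the names and
# the exact query are illustrative assumptions, not Coreva's actual code.
API_URL = "https://en.wikipedia.org/w/api.php"
MIN_AGE = timedelta(hours=1)   # articles younger than an hour are not checked
SAVE_INTERVAL_SECONDS = 6      # the save buffer is checked every 6 seconds
USER_LIMIT, BOT_LIMIT = 500, 5000
API_TIME = "%Y-%m-%dT%H:%M:%SZ"


def new_pages_params(now, last_run=None, is_bot=False):
    """Parameters for a list=recentchanges query for new main-namespace pages."""
    cutoff = (now - MIN_AGE).strftime(API_TIME)
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",
        "rcnamespace": 0,
        "rcprop": "title|timestamp",
        "rclimit": BOT_LIMIT if is_bot else USER_LIMIT,
        "format": "json",
    }
    if last_run is None:
        # First run (empty backend): newest first, starting at the age cutoff.
        params.update(rcdir="older", rcstart=cutoff)
    else:
        # Later runs: oldest first, from the previous run up to the cutoff.
        params.update(rcdir="newer", rcstart=last_run.strftime(API_TIME), rcend=cutoff)
    return params


def new_pages_url(now, last_run=None, is_bot=False):
    return API_URL + "?" + urlencode(new_pages_params(now, last_run, is_bot))


def old_enough(created, now):
    """True once an article has reached the minimum age for tagging."""
    return now - created >= MIN_AGE


class SaveBuffer:
    """FIFO buffer of (title, templates); each tick saves at most the oldest page."""

    def __init__(self):
        self._queue = deque()

    def add(self, title, templates):
        self._queue.append((title, templates))

    def tick(self):
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


def max_edits_per_minute(interval_seconds=SAVE_INTERVAL_SECONDS):
    # One save per tick: 60 / 6 = 10, the stated ceiling.
    return 60 // interval_seconds
```

Putting the age cutoff into the query itself means a page that is still too young simply does not appear in the results until a later run, which is one way to get the "wait until it is allowed to tag" behavior without sleeping on individual pages.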
Todo
Coreva is quite near to being "finished", at least its integral part. Due to the number of templates the bot handles, its filters will likely be constantly tweaked to reflect new templates or guidelines. In the future I might submit another feature request so that, if Coreva runs out of new pages, it will check through older pages at snail speed. Other than this, the only things that remain are some work on the GUI and on the efficiency of certain sections, none of which should change it controversially.
Re-Opened (Yet again *Sigh*)
Due to some unforeseen circumstances I have been almost completely inactive for the last 3 or so months, causing this bot request to expire yet again. Having finally found some spare time to work on this bot again, I would like to reopen this BRFA. As for the current status: Bug number 4 is now solved; Coreva will only add the footnotes template to pages of substantial length. It also correctly converts ampersands and other reserved HTML characters before saving the page, and I have updated the regexes used to determine whether a template should be placed, reducing the number of false positives. Excirial (Contact me,Contribs) 22:11, 30 October 2009 (UTC)
- The intention is to tag 6-minute-old articles with maintenance templates? Why? Is there community consensus that an editor should have only 5 minutes to write before a bot tags the article? What might be missing, imo, is a few more minutes to write an article.
- Personally, I'd let the bot finish it if my editing was interfered with in this manner. It takes hours to write an article. Sometimes I post a stub first. I'd like to see the community consensus for these tasks, for the templates to be added by a bot, and for the amount of time before adding the templates. It seems hostile if I understand the time frame correctly.
- Also, how many templates will it add? It seems to say it will only add one, but which one of the many? Or will it add more? --69.226.106.109 (talk) 02:41, 31 October 2009 (UTC)
- I'm not certain where you got the 6 minutes part, as Coreva is hard-coded not to check any articles younger than an hour. If it runs into articles younger than an hour, it will automatically disengage from tagging them until they reach the required age. In that time the bot could very slowly iterate through Wikipedia's older articles to see if they have any issues, though for now it just halts itself until it is allowed to tag again.
- You said you'd check for new pages every 5-10 minutes, so I guessed 6 minutes after the new page appeared it could have a tag on it. Is an hour a time that the community considers reasonable?
- (See below)
- The "Minimal time" part is of course easily changeable to a longer or shorter duration (it used to be 30 minutes, actually), but in this case I chose an hour so that any new contributor still has a chance to see the tags, and thus receives some input on how to improve the article. Keep in mind that new page patrollers using FRIENDLY or similar software exhibit the same behavior as the bot, only faster. For example, this article was tagged within 15 minutes and this one was tagged within 40 minutes. Note that these are just two random articles I dug up; I have seen plenty being tagged within 10 minutes. Similarly, quite a few are left completely untagged while they still need quite some work.
- I don't think it works that way, and it's hard to follow the reasoning behind, well, human editors do this and it's worse so than what the bot will be doing...
- Due to the way patrol tools work, articles tend to get tagged sooner rather than later, as they are mostly processed in near real time. During development I tended to mimic existing tools and procedures as much as possible, as those are obviously acceptable to use within the guidelines. Coreva has the added advantage that it can simply query the API to receive a list of recent changes, so from that perspective it matters little whether the wait time is an hour, a day or a week. As far as I know there is no community consensus on how soon maintenance templates may be added; if there is, please tell me. Changing it takes very little effort, since it only means adjusting a single number.
- How about finding out about some reasonable length of time by getting some feedback from the community? An hour seems reasonable to me, unless someone is still working on the article right then. I created an article of average difficulty from one of the lists of missing articles, Chaetopterus, to see how long I usually work on it before I would leave it for a while, and an hour seems okay, because I usually add more sources to my articles than most editors. But I would feel more comfortable about the timing if it were in line with voiced community guidelines. I do appreciate that you considered how users usually go about it. --69.225.3.198 (talk) 21:36, 2 November 2009 (UTC)
- Certainly, it is always good if a bot has some form of community consent, and tomorrow I will ask at the village pump what users think a reasonable time would be. As said before, my timing was mostly based on giving editors some time while at the same time allowing new users to receive some feedback. However, since you raised the issue that a bot tagging an article halfway through can be annoying, I'm more than happy to change that. Personally, I always work in user space unless it's a small stub I can just create in minutes.
- Yes, I think asking the community is good for what would be a reasonable time for a bot tagging new articles.--69.225.3.198 (talk) 23:29, 2 November 2009 (UTC)
- Asked here. Feel free to comment if you are interested in it. :) Excirial (Contact me,Contribs)
- As for the templates, Coreva will add one for every issue it detects. However, when multiple issues are found it will mimic WP:Friendly and add the grouped {{articleissues}} template instead. Lastly, Coreva will only check an article once; after that it will not check it again unless I manually reset the bot. In this respect it is not that different from a new page patroller, who might tag your article with maintenance templates as well. Neither Coreva nor patrollers are mind readers, which means neither knows whether you intend to continue work on an article later on. In both cases, removing the templates in an edit you were already making is sufficient to keep the tags off. Excirial (Contact me,Contribs) 10:52, 31 October 2009 (UTC)
- So, if a user writes a single line stub on an organism it could essentially be tagged with so many templates in an hour after it has been written that the reader cannot find the text in the article? IMO this is the equivalent of a speedy deletion, if you make it impossible to read the article by obscuring the text with tags? --69.225.3.198 (talk) 16:15, 2 November 2009 (UTC)
- The only thing Coreva usually flags a well-written stub article for is the lack of references; Diego de Miguel, for example, came through without any tag at all. Of course what you mention is possible; on the other hand, Domohani Kelejora High School was tagged with three templates, but this was because the article was plain text without any wiki formatting at all, which meant it really needed work. Excirial (Contact me,Contribs) 18:16, 2 November 2009 (UTC)
- So, what's a well-written stub? --69.225.3.198 (talk) 21:36, 2 November 2009 (UTC)
- I wrote a quick tool this evening based on Coreva, which allows me to evaluate any article within seconds, while giving feedback on what Coreva would have done if it encountered it (and I tell you, it's a blessing, as it is more versatile than Coreva in its analysis, meaning that I can easily test and improve the detection algorithm).
- Now, as for a well-written stub: your own Chaetopterus article would not have received any tag since its first revision. Also, pressing Special:Random while looking for stubs, these were a few of the results: Aigües - unreferenced. Bērze parish - unreferenced. Paddy Forde - none. McCulley Township, Emmons County, North Dakota - none. Cigaritis - unreferenced. These are of course older articles, so I took 5 successive new articles as well: Belarusian Independence Party - none. Infimenstrous - ignored for CSD. Aventure en Australie (TV episode) - Uncategorised, Unreferenced. The Reincarnation of Peter Proud (1973 novel) - Uncategorised, Unreferenced, Orphan. Jonas Cutting Edward Kent House - Orphan.
- There was one false positive related to the sections template, which I traced back to a typo made while coding the analysis tool, rather than in Coreva. Excirial (Contact me,Contribs) 23:09, 2 November 2009 (UTC)
- Use {{Article issues}} not {{Articlesissues}}. Rich Farmbrough, 19:21, 9 November 2009 (UTC).
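The thread describes two formatting rules: maintenance tags are dated with a lowercase date= parameter (Bug 3 above), and when several issues are found they are grouped into a single {{Article issues}} banner. A minimal sketch of that output follows; the function names are hypothetical, and the grouped parameter style (one lowercase parameter per issue, each carrying the date) is an assumption about the template rather than something stated here.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def tag_date(day):
    """Month-and-year stamp, e.g. 'November 2009', for the lowercase date= parameter."""
    return f"{MONTHS[day.month - 1]} {day.year}"


def build_tags(templates, day):
    """Wikitext for the detected issues; several issues share one grouped banner."""
    if not templates:
        return ""
    stamp = tag_date(day)
    if len(templates) == 1:
        return "{{" + templates[0] + "|date=" + stamp + "}}"
    # Assumed parameter style for the grouped template: one lowercase
    # parameter per issue, each carrying the date.
    params = "".join("|" + name.lower() + "=" + stamp for name in templates)
    return "{{Article issues" + params + "}}"
```

A fixed month table is used instead of strftime("%B") so the stamp does not depend on the machine's locale.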
From looking at these, I think I would like to have broader community consensus for the orphan tagging, and for the tagging in general. The time looks like it should be longer, say 3 hours during some periods, but this may be flexible. I don't know if the question you asked is sufficient for understanding the community's desire to tag in general. I am concerned, as I said, about adding tags to certain types of generally stubby articles. Many stubs about living things are just a single line and a taxobox. While Cigaritis would be a better article if referenced, should be referenced, and its lack of references should be called to someone's attention, adding a "no references" banner across the top will overpower the text and essentially, imo, make the article useless to the reader. It might as well be deleted. Can articles be categorized as unreferenced without the huge banner, or can the banner be put at the bottom of the page? Where are these categories of unreferenced articles, by the way? I would like to add references to many of them. --69.225.3.198 (talk) 09:26, 4 November 2009 (UTC)
- On the "unreferenced" issue, the bot's stated mechanism doesn't seem nearly sophisticated enough. "The article contains no, or an empty reference header and no ref tags" misses many potential referencing techniques. Generally, I doubt the bot is going to be an effective way to process for this tag; when, for instance, there are raw links in the article, it will be difficult for the bot to differentiate between ones that are useful references and ones that are not. Christopher Parham (talk) 15:06, 4 November 2009 (UTC)
- (69.225.3.198) It is of course possible to add the category to the article without adding the "Visual" template, but I believe community consensus is against doing so because the requirements for improvement should be visible (if I remember a discussion from some time ago correctly). The reasoning was that readers should be aware of the issues with the information they are presented. As for the category: it is located under WP:backlog, or more specifically under Category:Articles_lacking_sources. Currently just 188,583 are tagged, so by tomorrow you could be done with the backlog :P.
- (Christopher Parham) Which is why I'm constantly busy improving Coreva's detection algorithms. The majority of articles either have no references or have references added correctly as described in WP:MOS. There are indeed other techniques, such as linking websites in the middle of the text (either as an external link or just as text) or dumping them all at the bottom without a section header or ref tags, and I could go on for a while.
- Most of these can, however, be reliably detected. A regular expression can easily filter websites out of the article, even if they are not marked as an external link. Since these kinds of pages are somewhat rare, I do not have as many test subjects as I would normally like, but I was considering marking pages with multiple external links in the text for cleanup. Alternatively, it is possible to ignore articles which seem to have links. This would certainly give false negatives, but it would still tag plenty of articles correctly. Currently a substantial part of the articles end up completely untagged in the first place, so it would already improve the situation, even if it does not solve it. Excirial (Contact me,Contribs) 16:34, 4 November 2009 (UTC)
- You should have a look at Erik9Bot's BRFA to see some more ways articles can contain references that aren't immediately apparent. Also \([^)]* p+\. is a good string to look for. Rich Farmbrough, 19:21, 9 November 2009 (UTC).
- That is indeed quite a handy BRFA. I'm glad to see that Coreva covers most of the points it mentions, but there are a few things that Coreva doesn't do, or does differently. It seems that the mentioned bot accepts any form of link starting with http:// as a reference, regardless of where the link leads. Perhaps a valid strategy, as it is quite difficult to get a false positive this way (though false negatives would likely increase). Searching for ISBNs is something I certainly have to add, as is a "List of"/"Lists of" check, but this is something I was already planning to add.
- If anything, I would rather not be forced to create a separate hidden category in which Coreva lists possibly unreferenced articles. If that were required, I think I would prefer to drop the check for the unreferenced template, as it wouldn't justify the extra work implementing it would create. I will be integrating the suggestions from that BRFA soon, but for now I have become a little sidetracked by the idea that I could use Coreva to track dead references as well. The last few days I have mostly spent my time tinkering with a prototype that I could integrate with Coreva. Since Coreva will likely have quite some downtime due to the finite number of articles it has to check, it seems that a second activity could fit neatly into that time. Excirial (Contact me,Contribs) 22:51, 9 November 2009 (UTC)
- * I would be dead against repeating what Erik9bot did. We have a hidden category with 100,000+ articles in it: I have seen people go through their "bailiwicks" just hoiking them out.
- * In terms of the tag overpowering the article, I have offered an alternative version of the tag, and this could be made smaller and used for orphans too. Uncat is not a problem; that is one backlog that is under control.
- * there is a question in my mind about the usefulness of "orphan" anyway. I shall raise that at VP.
- Rich Farmbrough, 21:05, 18 November 2009 (UTC).
- I like it much better than the current one. Living thing stubs, though, aren't likely to be removed even with this tag, and, again, for one sentence and a taxobox it's still overpowering. Can it be put at the bottom of the article? I think it's better to have an article flagged in some way, by a banner like this for example, if it has no references, because encyclopedia articles, in general, should not be unreferenced. I'm just never sure who's fixing these unreferenced articles, or if the banners are just permanent parts of the articles. --IP69.226.103.13 (talk) 11:30, 19 November 2009 (UTC)
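The thread above lists several signals that an article may already be referenced even without <ref> tags or a reference section: Rich Farmbrough's page-citation pattern \([^)]* p+\. (which catches text such as "(Smith 1999, p. 12)"), websites in the text whether or not they are marked up as links, ISBNs, and "List of"/"Lists of" titles. The sketch below combines them into one hypothetical exemption check; the URL regex, the order of checks, and treating list articles as exempt are assumptions, not a description of Coreva or Erik9Bot.

```python
import re

PAGE_CITATION_RE = re.compile(r"\([^)]* p+\.")  # e.g. "(Smith 1999, p. 12)" or "pp."
BARE_URL_RE = re.compile(r"(?:https?://|www\.)[^\s\]|<>]+", re.IGNORECASE)
ISBN_RE = re.compile(r"\bISBN\b")


def find_urls(text):
    """Websites in the text, whether or not they are marked up as external links."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return [u.rstrip(".,;:)") for u in BARE_URL_RE.findall(text)]


def unreferenced_exemption(title, text):
    """Why an article should NOT get {{Unreferenced}}, or None if nothing was found."""
    if title.startswith(("List of", "Lists of")):
        return "list article"
    if ISBN_RE.search(text):
        return "ISBN"
    if PAGE_CITATION_RE.search(text):
        return "page citation"
    if find_urls(text):
        return "URL"
    return None
```

Returning a reason rather than a bare boolean makes it easier to review false negatives, which, as noted above, are the expected cost of accepting any link as a reference.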
Operator: ThaddeusB
Automatic or Manually assisted: automatic, unsupervised
Programming language(s): Perl
Source code available: here
Function overview: fill in tables with data on prehistoric creatures
Edit period(s): one time run
Estimated number of pages affected: 13
Exclusion compliant (Y/N): N/A - this only applies to user & talk pages, correct?
Already has a bot flag (Y/N): N
Function details: Using a large database of information downloaded from http://paleodb.org and http://strata.geology.wisc.edu/jack/, the first function of this bot will be to fill in the tables found in various "list of" articles. A sample entry has been filled in here. Any data that is missing from the database will simply be left blank. Only a tiny number of pages will be affected, but the amount of bot-filled content will be immense. As such, I am suggesting the bot trial be something along the lines of "the first 10 entries on each page" rather than a number of edits. A copy of the database is available here (425k). The database is organized as follows: Genus--Valid?--Naming Scientist--Year named--Time period it lived during--Approx dates lived--locations
- A "1" in the valid column means it is currently listed as a valid genus, "NoData" means it couldn't be determined (most likely because there are two genera with the same name), and "No-{explanation}" means it is not currently listed as a valid genus.
- Data preceded by a "*" means it was derived from Sepkoski's data, using the dates found here (compiled by User:Abyssal). All other data came from paleodb, using their fossil collection data for more precise dates (when available).
- Spot checking of my data is encouraged, although I'm confident no novel errors have been introduced. If anyone knows of additional sources to derive similar data, let me know and I'll incorporate those sources into the database.
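A minimal sketch of reading one record in the layout described above, assuming fields are separated by "--" exactly as written (the actual delimiter in the downloadable file may differ). It applies the stated conventions: "1", "NoData" and "No-{explanation}" in the validity column, and a leading "*" marking Sepkoski-derived data. The field names are illustrative, and this is not the bot's Perl source.

```python
FIELDS = ("genus", "status", "authors", "year", "period", "dates", "locations")


def parse_value(raw):
    """Split off the '*' marker that flags Sepkoski-derived data."""
    raw = raw.strip()
    if raw.startswith("*"):
        return raw[1:].strip(), True
    return raw, False


def parse_status(raw):
    """Interpret the Valid? column: '1', 'NoData' or 'No-{explanation}'."""
    if raw == "1":
        return "valid", None
    if raw == "NoData":
        return "undetermined", None
    if raw.startswith("No-"):
        return "invalid", raw[3:]
    raise ValueError(f"unrecognised status: {raw!r}")


def parse_record(line, sep="--"):
    """Parse one database line into {field: {'value': ..., 'sepkoski': ...}}."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = {}
    for name, raw in zip(FIELDS, parts):
        value, from_sepkoski = parse_value(raw)
        record[name] = {"value": value, "sepkoski": from_sepkoski}
    record["status"]["parsed"] = parse_status(record["status"]["value"])
    return record
```

Keeping the Sepkoski flag per field, rather than per record, matters because a single genus can mix paleodb and Sepkoski-derived values.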
List of pages to be affected (might be expanded slightly if others are found):
Discussion
Source code will be published shortly, although the code itself is quite a simple "read from db, publish to Wikipedia" operation. --ThaddeusB (talk) 01:48, 3 September 2009 (UTC)
- Now available here. --ThaddeusB (talk) 13:16, 8 September 2009 (UTC)
I have asked the relevant projects for input: [2] --ThaddeusB (talk) 02:08, 3 September 2009 (UTC)
Bullets?
Maybe the countries should be a bulleted list to save horizontal space, i.e. as opposed to "Switzerland, Poland". Abyssal (talk) 15:30, 3 September 2009 (UTC)
- Which is preferable: Option 1
| Genus | Authors | Year | Status | Age | Location(s) | Notes |
| Advenaster | Hess | 1955 | Valid | 171.6 Late Bajocian to Late Callovian | Switzerland, Poland | |
| SampleEntry | Hess | 1955 | Valid | 150 Early Cretaceous to present | Switzerland, Poland, United States, France | |
| Bad Genus | | | Invalid | | | rank changed to subgenus, name to Genus (Sub genus) |
- Option 2
| Genus | Authors | Year | Status | Age | Location(s) | Notes |
| Advenaster | Hess | 1955 | Valid | 171.6 Late Bajocian to Late Callovian | Switzerland<br />Poland | |
| SampleEntry | Hess | 1955 | Valid | 150 Early Cretaceous to present | Switzerland<br />Poland<br />United States<br />France | |
| Bad Genus | | | Invalid | | | rank changed to subgenus, name to Genus (Sub genus) |
- or Option 3
| Genus | Authors | Year | Status | Age | Location(s) | Notes |
| Advenaster | Hess | 1955 | Valid | 171.6 Late Bajocian to Late Callovian | | |
| SampleEntry | Hess | 1955 | Valid | 150 Early Cretaceous to present | • Switzerland<br />• Poland<br />• United States<br />• France | |
| Bad Genus | | | Invalid | | | rank changed to subgenus, name to Genus (Sub genus) |
- Any is fine by me. --ThaddeusB (talk) 18:49, 3 September 2009 (UTC)
- Definitely 2 or 3, although I don't care which. I love that you're using the sort template. <3 Also, could you have the year link to the "year in paleontology" article? And link to the countries? Abyssal (talk) 04:35, 4 September 2009 (UTC)
- No problem, I will link the date & countries. --ThaddeusB (talk) 13:45, 4 September 2009 (UTC)
- Hmm, WP:Context and all that - re countries. Surprise links to "year in" are not that great either. Rich Farmbrough, 09:51, 6 September 2009 (UTC).
Invalid genera should be in a separate table, because that is better for the general public. If there is a reason to have them together, then that could also be OK. Such tables will also be useful. --Snek01 (talk) 11:48, 4 September 2009 (UTC)
- I'm neutral on linking the countries, although I will point out that such links would allow someone to easily figure out where in the world the fossil was found. I view the "year in" links as completely appropriate, as the naming of new genera is something that is (or should be) covered in those articles. --ThaddeusB (talk) 13:16, 8 September 2009 (UTC)
- The bot is only going by the entries that already exist in the tables, and it looks like there are only about 5 invalid entries across the dozen or so pages. As such, I'll leave it up to regular editors to pull those entries out of the main table rather than trying to write a function to do it.
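For reference, one row of the Option 2 layout could be produced roughly as below. This is a hypothetical sketch, not the bot's Perl code: it assumes the sort template takes the form {{sort|key|display text}}, that "year in" articles are titled "<year> in paleontology", and it exposes country linking as a flag, since linking countries was questioned above (WP:Context).

```python
def year_link(year):
    """Pipe a naming year to its 'year in paleontology' article (assumed title pattern)."""
    return f"[[{year} in paleontology|{year}]]" if year else ""


def sort_cell(key, text):
    """Wrap display text in the sort template so the column sorts on `key`."""
    return "{{sort|" + key + "|" + text + "}}" if text else ""


def table_row(genus, authors, year, status, age_key, age_text, locations,
              notes="", link_countries=True):
    """One MediaWiki table row, with one location per line (Option 2 above)."""
    places = [f"[[{p}]]" if link_countries else p for p in locations]
    cells = [
        "''" + genus + "''",
        authors,
        year_link(year),
        status,
        sort_cell(age_key, age_text),
        "<br />".join(places),
        notes,
    ]
    return "|-\n| " + " || ".join(cells)
```

Missing values come out as empty cells, matching the plan to leave data that is absent from the database blank.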
Improper age sorting (fixed)
A sample update has been performed here. --ThaddeusB (talk) 13:16, 8 September 2009 (UTC)
- I haven't had time to look closely at it, but I notice the age sorting isn't working right, and I can't figure out why. Any idea? Abyssal (talk) 18:31, 8 September 2009 (UTC)
- Hmm, could you be more specific as it seems to work for me. The first click put it from most recent to oldest and the second from oldest to most recent. --ThaddeusB (talk) 00:57, 9 September 2009 (UTC)
- The first 20% or so is alright, but after the Miocene (when viewed in ascending order) it starts listing Jurassic stages, then it goes back to Pleistocene, and for some reason Cretaceous ages are listed as if they were the oldest. It's looked this way both at home and at school. The browser I used is Firefox. Abyssal (talk) 19:09, 9 September 2009 (UTC)
- OK, I figured out the problem. Apparently the {{sort}} template doesn't work properly with numbers, so everything is being sorted "alphabetically" - that is 1 < 10 < 2 etc. It was only coincidence that the first 20% is still correct (and I didn't look down far enough to realize the error). I'll add a fix for this tonight & re-upload the sample page. --ThaddeusB (talk) 20:05, 9 September 2009 (UTC)
- Good sleuthing! Abyssal (talk) 22:28, 9 September 2009 (UTC)
I believe the latest upload fixes the issues. --ThaddeusB (talk) 21:19, 10 September 2009 (UTC)
- How did you fix the problem? (I'll need to use the same method for other articles.) Abyssal (talk) 02:28, 11 September 2009 (UTC)
- By adding enough zeros in front of the numbers to make them correctly sort as strings. That is, by converting "1.23" to "001.23", "45" to "045", etc. --ThaddeusB (talk) 20:41, 12 September 2009 (UTC)
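The fix described above can be shown in a few lines: zero-pad the integer part so that plain string comparison agrees with numeric order. The helper name is hypothetical, and the width of 3 digits assumes all ages are below 1000 million years, which holds for the examples given ("1.23" becomes "001.23", "45" becomes "045").

```python
def pad_sort_key(value, width=3):
    """Zero-pad the integer part so string order matches numeric order."""
    integer, dot, fraction = value.partition(".")
    return integer.zfill(width) + dot + fraction


ages = ["10", "2", "1.5", "171.6", "45"]
print(sorted(ages))                    # ['1.5', '10', '171.6', '2', '45']
print(sorted(ages, key=pad_sort_key))  # ['1.5', '2', '10', '45', '171.6']
```

Only the integer part is padded; once integer parts have equal length, comparing the fractional digits left to right already gives the right order.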
Need for outside input
Just a thought, but in light of the Anybot debacle, it might be a good idea to put a call out to the WikiProjects and recruit some marine biologists/fossil guys/crustacean guys/etc. to come take a look at your trial edits and check them over with a fine-tooth comb before the bot is given final approval. If Anybot taught us anything, it's how simple errors in interpreting database content can lead to masses of incorrect information going live on the 'pedia and remaining there for months, unnoticed. --Kurt Shaped Box (talk) 22:28, 10 September 2009 (UTC)
- I certainly want several people to look over the data and have notified the 6 most relevant WikiProjects. So far, those notices don't seem to have attracted many people. :( --ThaddeusB (talk) 22:45, 10 September 2009 (UTC)
- Can you try asking specific editors by the type of articles you intend to produce? Asking editors if they will check the data? --69.225.12.99 (talk) 02:34, 11 September 2009 (UTC)
13 pages? Why do we need to approve a bot for that? Run your program. Dump the output to a screen. Post it by hand. Preview. Save. A bot will save you a few minutes whilst vastly increasing the risk to the project. Hesperian 23:55, 10 September 2009 (UTC)
- First off, let's not be ridiculous: me running a program locally and manually uploading the data is no less of a risk than me running a program locally and having it upload the data automatically. Now, there are several reasons I am requesting approval rather than just uploading the data:
- Yes it will only edit a few pages, but the amount of data it will import is immense, as we are talking about the automated filling of several thousand table entries. The amount of info that is being auto generated and added deserves community consent, IMO.
- I want as many people as possible to look it over to make sure the bot isn't adding inaccurate info. If I just uploaded it all in my name, it wouldn't get the same scrutiny.
- There is a planned second part of this task (automated creation of stubs) that will edit thousands of articles. This will be a separate BRFA, but the idea here is to get any bugs or inaccurate input data fixed on a relatively non-controversial task before moving on to a possibly controversial one. If the bot can accurately add content to existing articles, then there is concrete evidence that it should be able to create stubs with prose based on the same information.
- I hope that clears it up. --ThaddeusB (talk) 00:20, 11 September 2009 (UTC)
- I obviously agree with Thad. This sort of tedious point-by-point extraction of information from a database is what Wikipedia bots are made for. As someone who has filled out similar tables manually, I can vouch that using a bot for this purpose is the most effective way to accomplish it. Abyssal (talk) 02:18, 11 September 2009 (UTC)
- I agree with that, Abyssal. I run scripts like that myself. But you don't need a bot account to run a script against an external data source. You only need a bot account to post the formatted results to Wikipedia. Personally I prefer to run my scripts, examine the results, tweak the scripts and run them again if necessary, iterate, eventually load the results into an edit window, preview, tweak, and finally save. It is a lesser risk to do it this way. The risk is only the same if you are going to copy-paste-save without examining the results, just like a bot would do. And in that case, I cannot comprehend why a bot is necessary. Once you've generated the data, posting 13 pages by hand will take you 6½ minutes. Thaddeus, I sure hope you'll be spending more time than that implementing and testing your bot. So where is the benefit? I also dispute the scrutiny argument. People don't scrutinise bots more; they largely ignore them until they screw up royally. And no, you don't need to obtain community consent before you edit Wikipedia, even if you are posting big pages. Hesperian 05:42, 16 September 2009 (UTC)
- I have already examined the input data, tested the bot, and reviewed its results fairly extensively; I've probably put around 40 hours into it, in fact. I am well aware that I didn't need approval to do this task. I merely feel it is better to do it with approval than without. This vetting process has already led to some subtle improvements that likely would never have happened if I'd only reviewed the output on my own. --ThaddeusB (talk) 13:09, 16 September 2009 (UTC)
Discussion regarding some objections
- In my opinion, I don't think this bot should go forward without proactive community support for the bot. This means more than no one disapproves or shows negative interest. It requires editors from relevant projects get on board for vetting uploaded data. Without a group of editors to check data, it is my opinion the potential for another AnyBot type mess exists. Yes, this is the type of work that bots should be used for, in my opinion. But it requires a human editorial community to accompany the creation of articles. I'm also not thrilled with the sideways answers to some of my questions about this bot, before the RFBA. A single straight-forward answer to a question, when first asked, is more in keeping with the kind of communication that should be done when running bots that create articles, in my opinion. --69.225.12.99 (talk) 02:33, 11 September 2009 (UTC)
- To be fair, no one asked me a single question. An IP (possibly you) asked Abyssal some questions, but they seemed to be directed towards his editing activities & not this bot. Additionally, he obviously didn't know the exact details of how the bot would operate since he wasn't programming it... and frankly I didn't know the exact details either since it wasn't complete yet. I have explained what the bot plans to do in this BRFA (which is the correct place to do so), released the source code, and released the database. I personally have manually checked dozens of entries and I'm pretty sure Abyssal has as well. If you are offering your help in spot checking, then please check however many entries you want from the database, against whatever data source you can. If you find any problems, by all means please tell me so I can make adjustments to the database.
- Beyond that, there really isn't anything I can do. I can't force people to spot check the data. Nor can I, or anyone else, check more than a trivial % of the total. I have to rely on the accuracy of the data I obtained from reliable sources. I can't verify every piece of data by hand, but only confirm the general integrity of the sources from which the data came. --ThaddeusB (talk) 03:06, 11 September 2009 (UTC)
- And BTW, the bot isn't going to be creating any articles at this time. --ThaddeusB (talk) 03:08, 11 September 2009 (UTC)
- If the bot is going to be creating tables of data on wikipedia and there is not a single editor outside of the creators interested in the data, there appears to be no desire for the bot on wikipedia. If editors aren't willing to check the data and are not interested enough to comment on the bot, who wants the bot?
- According to bot policy, to gain approval, a bot must :
- be harmless
- be useful
- not consume resources unnecessarily
- perform only tasks for which there is consensus
- carefully adhere to relevant policies and guidelines
- use informative messages, appropriately worded, in any edit summaries or messages left for users
- Consensus involves discussion and other editors. No other editors = no discussion = no consensus for the task to be done, much less by a bot. --69.225.12.99 (talk) 06:57, 11 September 2009 (UTC)
- You are mistaken about the way Wikipedia works. Discussion is not needed to take action; discussion is only needed if the action taken is met with objection. Being bold is a core principle. If we had to discuss every action first, very little would actually get done. The task clearly falls under policy so there is already consensus to do the task. This discussion is to establish that my bot can do the task accurately and efficiently.
- Second, you mistake low input for lack of interest. Just because few people have commented here, doesn't mean no one is interested in the data. We are talking about scientific data on prehistoric creatures here, not Britney Spears. The audience interested in this material is limited, of course, but nearly everyone would agree that this sort of information is at least as important to have on Wikipedia as pop culture, even though far fewer people are interested in it.
- Third, you misunderstand what this request is for. The request is to fill in the existing tables, not create new ones. Technically (as pointed out by Hesperian) I do not even need BOT approval to do the task. The request was created in part to solicit additional eyes to make sure the data is accurate. Again, the information is 100% from reliable sources and a bot will copy the information far more accurately than a human ever could. So, I ask again are you willing to look over the data? Or are you just trying to block the task from happening? --ThaddeusB (talk) 15:28, 11 September 2009 (UTC)
- No, I'm not mistaken about how wikipedia works, and I'm not mistaken about your insulting me rather than addressing the issue. The use of bots does not function by the "be bold, run a bot, make 10,000 entries, then decide if the community wants it" theory. Please read the bots policy in its entirety before requesting approval for a bot. It is your responsibility as a bot operator to adhere to bot policy. You can't do that if you don't know it.
- I think it's time to close this bot request for approval until there is community consensus for this task to be done. The lack of editors monitoring AnyBot was problematic enough, but that bot at least had some community consensus. This bot has absolutely none, and its operator denies there is any need for community consensus for its task. This is a bad start. Coupled with the bot operator's combative nature and inability and/or unwillingness to address issues, I can't see anything but disaster and another mess of 1000s of entries for someone else to clean up. --69.225.12.99 (talk) 16:32, 11 September 2009 (UTC)
- As the creator, designer, and near sole contributor to Wikipedia's lists of prehistoric invertebrates, and also the user who "proactively" sought out someone capable of programming a bot to perform the task at hand, I am curious as to who you expect us to seek consensus from. Should I make sock puppets and then ask them if they approve? I created every single one of the lists ContentCreationBOT will be contributing to, and somewhere around 23-26 of the 28 lists of prehistoric invertebrates in total on Wikipedia. Yes, I gave a range of pages there, as in "I created so many of them that I've lost count." The community-of-people-who-contribute-to-Wikipedia's-lists-of-prehistoric-invertebrates consists nearly entirely of myself, and there is strong consensus between myself and I that this task should move ahead. It's also nice of you to dishonestly claim that we're creating stubs here. We've worked diligently for months preparing this bot and you're willing to shut us down without even having read the description of the task we're requesting approval for. And then when we don't just hump up and take it, you throw a hissy fit and demand that the discussion be closed. Wow. Abyssal (talk) 17:20, 11 September 2009 (UTC)
- I didn't use the word stub, until now. I think that this type of personal attack of people ("you throw a hissy fit") who have questions and concerns about the bot, once more, bodes poorly for the use of this bot to create any type of content. --69.225.12.99 (talk) 03:39, 12 September 2009 (UTC)
- I took "create 10,000 entries" as you implying the bot would engage in stub generation; if that's not what you meant, sorry. Assuming you were referring to the data-adding task, the bot isn't "creating" the entries, it's filling in blank entries in tables that already exist. Abyssal (talk) 18:15, 12 September 2009 (UTC)
- 1) I didn't insult you, and I am sorry you took it that way. I merely stated that you don't appear to understand what consensus means (at least as it applies to bot tasks).
- 2) "Perform only tasks for which there is consensus" means perform a task for which there is generally consensus. It doesn't mean we need 20 editors to come comment on every bot and say "yep, there is consensus for this task." There is already implicit consensus for adding this sort of information to articles as it has been done hundreds of times by many different editors with no objections. The bot can just do it faster and more accurately.
- 3) I have 3 approved bots and understand bot policy thoroughly.
- 4) You are arguing over semantics, not substance. I most certainly didn't claim the bot doesn't need consensus.
- 5) I can't "address the issue" as you have not outlined any actual issue with either the bot or the RS data it is using. You have only stated your personal opinion that you think more people should look at the data.
- 6) Do you have any actual policy based objection to the task of filling in existing tables with reliable source data? Or any objections to the data/code to put it on Wikipedia?
- 7) If the bot is rejected I will just manually upload the exact same data - as is my right as an editor - and the community will lose the benefit of explicitly knowing it was extracted from a database by a bot. Again, I didn't even have to ask for approval of this task as there is no actual need to automate the uploading. I did so for the community's benefit. --ThaddeusB (talk) 16:56, 11 September 2009 (UTC)
I would also like to add that the data has been looked over by accredited scientists or it wouldn't have been on Paleodb.org to begin with. To expect me or anyone else to manually check every entry is completely unreasonable and, in my opinion, would be more likely to introduce novel errors than improve overall accuracy even if it was possible. --ThaddeusB (talk) 17:02, 11 September 2009 (UTC)
- I stand by my original "hissy fit." --69.225.12.99 (talk) 03:39, 12 September 2009 (UTC)
- Does that mean you also stand by your refusal to provide any concrete objections? --ThaddeusB (talk) 17:38, 12 September 2009 (UTC)
Question for ThaddeusB and Abyssal
As a matter of interest - and this may help to assuage "the IP algae guy"'s (as I think of him, based on our previous work together cleaning up after Anybot and in lieu of a better name) doubts and concerns, how familiar are you guys with the taxonomy of prehistoric invertebrates in a non-WP context? Is this your chosen field, area of scholarly interest or hobby? IOW - do you *know* these ex-critters, or are you strictly data-processing here? The reason I ask is so that it may be established how likely it would be that a subtle misunderstanding/misinterpretation of the data presented at Paleodb (perhaps due to incorrect assumptions being made from incomplete knowledge) could occur, go unnoticed during the transfer because no-one knows what to look for - and thus result in massive factual errors being introduced to the wiki. Going back to Anybot, one of the reasons that it failed so hard was that the BotOp didn't really 'know' algae to any great extent - but he earnestly believed that he was capable of extracting the data automatically and formatting it into encyclopedia articles. That's all well and good, if it works - but as well as misunderstanding some fundamental algae-related terms, he incorrectly assumed (as far as I am aware) that 'number of taxa/species listed at AlgaeBase = number of taxa/species known to science' and ran with it. Then, as there was no-one else around at the time who knew better (or perhaps because no-one else even looked at the resultant articles once they'd gone live), the assumption was made that the bot's output was correct. IIRC, the systematic errors were only uncovered when "IP algae guy"'s students started handing in coursework containing WP-sourced nonsense.
This scenario may not be exactly applicable to the Paleodb dataset - but before this goes any further, I would like to gauge the likelihood of the same thought process being applied again and leading to a different, but equally-borked end result. --Kurt Shaped Box (talk) 09:09, 12 September 2009 (UTC)
- I believe the main problem with Anybot was programming, not lack of knowledge - although the second obviously contributed. The programmer made several fundamental errors like not resetting variables & making it runnable from a remote location without a password. These problems were not caught because 1) the code wasn't published and 2) no one who knew what they were doing looked over the sample stubs from the trial run. My code has been published, as has the data, as has a sample page. The code is not runnable remotely.
- I do not personally have any knowledge of the subject. I was solicited as a capable bot op, with (what I believe to be) a reputation for carefully checking my bots' output & correcting errors. Abyssal is the one with the knowledge of the subject and the idea for the bot. He was doing the task manually for some time, but some users contacted him to say they thought a bot would do the task faster and more accurately, which is true. A human copying and pasting data will make the occasional error, despite their understanding of the subject, that a bot wouldn't make.
- The reason I have been asking User:Abyssal about the bot is because there are problems with the fish stubs he/she created en masse. I am still waiting for him/her to respond to a question about the fish stubs I put on the user's talk page on May 30th.
- While Abyssal claims to be the only working invertebrate paleontologist on wikipedia that is incorrect. I edit invertebrate paleo articles as do a number of my colleagues. Ultimately I will be more concerned about the stubs, but, thank you Kurt Shaped Box for reminding me how I met Abyssal: correcting problems with stubs. Oh, by the way, I'm not really algae guy, as I've said before, I'm marine invertebrate paleo guy. --69.225.12.99 (talk) 09:29, 12 September 2009 (UTC)
- You are putting words in Abyssal's mouth. He didn't claim to be the "only working invertebrate paleontologist." He claimed to be the only one working on the specific articles for which this bot will provide data. Additionally, the error you found is precisely why this task should be done by a bot. No human is going to be able to copy gobs of data without introducing novel errors. A well-made bot won't introduce novel errors, although obviously it won't correct any errors in the original data either. (However, by Wikipedia policy we really should be going by what the source says anyway, not using our own knowledge.) If you have any source to cross-check the paleodb data against, I'd love to hear it. Otherwise, I say that this is the best available data and that there is absolutely nothing wrong with reproducing it.
- Again, why are you trying to block this task? You say you have knowledge of the subject, yet you refuse to offer your help looking over the data. You demand I find people willing to do this, yet you yourself are a prime candidate to help & refuse. Why? --ThaddeusB (talk) 17:38, 12 September 2009 (UTC)
- If anyone has issues with the stubs I made before, all they have to do is ask. If they've asked and I've forgotten to respond, all they have to do is ask again and remind me. That's much nicer than asking, then waiting for months before bringing it up as an act of passive aggression. Also, the substubs I created are not only irrelevant to the discussion on face, but they also bear little resemblance to the fuller, more complete stubs that ContentCreationBOT may create in the future and serve as a poor analogy for such. Abyssal (talk) 18:30, 12 September 2009 (UTC)
- Okay - 'Marine Invertebrate Paleo Guy' it is, then... :) By the way, is this the user talkpage post you're talking about? If so, Abyssal did reply to you. --Kurt Shaped Box (talk) 09:41, 12 September 2009 (UTC)
- No, in fact he/she didn't. The last question I posed has been ignored since it was posted--reread the post about the two articles with similar names. He/she only responded to the first part, agreeing I had corrected his data for the one article, and thanking me for doing so, but not for the question of whether both articles for what appear to be a single organism should be on wikipedia. This last is precisely the type of mistake that needs to be reviewed and corrected by humans with content creation bots. This bot's owner and assistant have resorted to bullying. Bullying by the bot operators coupled with failure to act = giant mess on wikipedia that someone else has to clean up. It took months and probably a dozen wikipedia editors to clean up the AnyBot mess. In my opinion it's time to put an end to this RfBA as a place for User:ThaddeusB to post his personal attacks, since he doesn't have what is necessary for running a bot of this nature, and is focused on attacking me rather than getting the bot together. --69.225.12.99 (talk) 18:17, 12 September 2009 (UTC)
- The only one attempting to bully people here is you with your whole "I don't agree with this so let's shut down discussion right now" attitude. You have repeatedly demanded this not take place, but still have yet to offer a single constructive comment. You say I am focused on "attacking you rather than getting the bot together." Um, the bot is together. There is nothing to "get together." Again, you have yet to offer a single actionable complaint with the actual bot that I can address.
- You claim to be interested in making sure the bot doesn't make any errors, yet you refuse to help. I think it is pretty clear that your objection is either philosophical against this sort of task ever being done, or is motivated by personal dislike for me and/or Abyssal.
- I have not made a single personal attack against you. I merely commented on your comments, just as you have commented on mine. Somehow it is perfectly acceptable for you to distort others' comments and say whatever crap you want about them, but if they dare mention you in a reply they are personally attacking you?
- Finally "this is the kind of mistake that needs to be reviewed by humans" is an irrelevant comment because this mistake was made by a human, not a bot. In fact, this example is proof why the task should be done by a bot - humans will always make some mistakes when copying large amounts of data. --ThaddeusB (talk) 19:05, 12 September 2009 (UTC)
- I'm male, no need to use the double pronoun thing. As for bots adding errors, the articles won't be set in stone after creation; they will be subjected to the same scrutiny and incremental revisions and fact-checking that all other Wikipedia articles are. It's almost certain that bot generated content will introduce some errors; however, our human editors do substantial amounts of that as well. If a human added 99% good information and 1% inaccurate information, we would think of them as doing a good job. It's illogical to demand more from an automated contributor than from a flesh-and-blood one, but you seem to be expressing that double standard anyway. "Failure to act"? We've already run successful demonstrations and made the full code public! What would it take to please you? As for us being bullies, well, an old proverb comes to mind. Abyssal (talk) 18:51, 12 September 2009 (UTC)
- I understand that all bots can hiccup and make mistakes from time to time. Unless they start crapflooding, blanking or overwriting a huge number of articles, it's a pretty simple matter to put right. I'm more concerned about systematic errors that could result in non-apparent-except-to-experts factual inaccuracies across the majority of the bot-generated content. How confident are you with the subject matter at hand and the interpretation of the database content that this may be avoided - or if it did occur, that you'd be able to spot it quickly? I don't want it to come across as though I'm picking on you and ThaddeusB here - but Anybot has left me wary of bots that autogenerate content in this manner, wary enough at least to be thorough in asking questions. --Kurt Shaped Box (talk) 20:17, 12 September 2009 (UTC)
- I think that thorough testing and a bit of preliminary fact-checking will demonstrate whether or not ContentCreationBOT can successfully utilize the database to generate new content for Wikipedia. If it does prove successful in drawing from the database, then any errors will be on the database's side and thus out of our control. However, since the database was compiled and is operated by scientists, I have confidence that there will be no major problems. At this point it's really just a matter of testing. Abyssal (talk) 15:22, 14 September 2009 (UTC)
- One of the reasons the anybot mess wound up so spectacularly bad was poor communication on the part of the bot operator and unwillingness to respond to concerns. These two users see expressions of concerns about their bot as an opportunity to attack someone for expressing concerns. This will make communication hard to impossible. Poor communication means it won't matter how a mistake is made, because the response will be to attack those who raise issues. And keep attacking and attacking them. Then come back and attack them some more. In my opinion, it simply doesn't matter, once an attitude of this manner has taken hold of the bot operator, there will be no means for issues of concern about the bot to be raised, no means for problems with the data to be pointed out. All such actions will get is an attack. And another attack, and an attack from a different angle, and a new attack. --69.225.12.99 (talk) 06:43, 13 September 2009 (UTC)
- For the dozenth time, do you have any actual objection to express or are you just trying to block the bot? Also, for the dozenth time I can't address "your concerns" until you actually express something concrete. And no that isn't a personal attack despite what you seem to think. --ThaddeusB (talk) 13:06, 13 September 2009 (UTC)
- Oh yeah, I see what you mean. Abyssal - could you check to see whether Graphiuricthys and Graphiurichthys (with an extra 'h') are supposed to be two separate articles? --Kurt Shaped Box (talk) 20:23, 12 September 2009 (UTC)
- They're the same animal, so far as I can tell, but both have been used in the technical literature. I'm not sure which one is correct. Abyssal (talk) 15:16, 14 September 2009 (UTC)
- What to do, then? Pick the most commonly used name and redirect the other article to it (Google Scholar would suggest that 'Graphiurichthys' is the way to go)? I know that these two were human-created articles - but this is exactly the sort of thing that the bot must not be permitted to do, if it starts creating stubs. --Kurt Shaped Box (talk) 19:47, 14 September 2009 (UTC)
- Sepkoski spelled it wrong. It seems I added both the correct and incorrect spellings, forgot about it, and accidentally created an article for both. The problem seems to be pure human error on my part, and therefore unlikely to be duplicated by a bot. Abyssal (talk) 13:03, 15 September 2009 (UTC)
The problem with anybot was not necessarily the database. The data at algaeBase are fine, and the means of gathering data are identified. Part of what led to a huge mess, which caused the deletion of over 4000 articles and a couple of thousand redirects, was the lack of understanding of the data by the human coder and no community involvement in checking and verifying the articles. Add to this an operator who would not deal with the problem articles as they were pointed out and you get a couple dozen other editors having to sort out the mess and delete the content. In spite of the accusation of my being "passive-aggressive" two problem articles generated by Abyssal have been on wikipedia for a long time. He's willing to throw accusations at me, but still hasn't risen to the occasion of correcting the error. If he's going to leave up articles that need to be deleted or corrected, and these are just two, maybe he's expecting that someone else will clean up after the bot. These are just two articles with little information in them, and one article is wrong and needs to be either a redirect or deleted. If this bot contributes 10,000 data items, who's going to check for accuracy? If there is any inaccuracy who's going to clean it up? It seems ThaddeusB is going to blame me for not cleaning up the articles he wants to create. No, I'll do my own volunteer work on wikipedia, not yours, ThaddeusB. Let me know when you're going to start creating the articles I want. And Abyssal is going to throw accusations at the reporters of errors, but not going to correct errors. Anybot generated errors due to human mistakes. Bots are subject to human errors. An unwillingness to address or correct errors is not an indicator for responsible bot running. --69.225.3.119 (talk) 05:18, 17 September 2009 (UTC)
- Yet again, do you have any actual objection I can address? Anybot's code was never checked & it seems was riddled with errors.
I am sorry that happened, but its operator's problems are not a reflection on me. This bot's code has been checked & verified that it will copy the data exactly as planned. Further, there is no special knowledge required to copy, for example, the naming scientist from a database to a table.
- I most certainly will listen to complaints if you have any that I can actually address. So far your complaints consist of 1) I won't manually check every entry and 2) I allegedly won't respond to complaints. The first is an unreasonable demand that would defeat the point of the bot. The second is merely speculation on your part, and runs contrary to my actual history on Wikipedia.
- Then there is your constant stretching the truth/drawing unreasonable conclusions. E.g., "two problem articles generated by Abyssal" somehow equates to this bot screwing up massively. Wow, a human that made two errors in over 2000 articles. (OK, he probably actually made a few more, but clearly the error rate is very low.) One of which was copying Sepkoski's non-standard spelling, which is hardly a serious error. That is hardly reason to shut down this bot. And again, it is completely disingenuous to compare stub creation to filling in a table - the two are hardly the same thing.
- Finally, I am not asking you to check 10,000 items; I am just asking you to be reasonable and not expect me to check every item either. I have checked about 100 items and found no errors. That is a reasonable spot check. Others have checked some items as well. If you are unwilling to check even one item, then you have no right to complain that others haven't checked enough. --ThaddeusB (talk) 12:23, 17 September 2009 (UTC)
- I'd feel a little less concerned about all this if Abyssal had fixed those duplicate articles already. It's been a few days now since they were pointed out to him. Now, if the bot buggers up the current task, fixing it will be a simple matter of a few one-click reverts. However, if something goes wrong if/when the bot starts creating stubs and we end up with another Anybot-type mess, I'd hope that A. would be much more enthusiastic in trying to put it right (being the guy with the subject knowledge) than he seems to be WRT the above. --Kurt Shaped Box (talk) 23:17, 17 September 2009 (UTC)
- I went ahead and redirected it myself. Rest assured that if the stub creation (which obviously isn't being approved in this task) were to go awry I wouldn't hesitate to "delete all" first and then go and find the problem before starting over from scratch. --ThaddeusB (talk) 01:59, 18 September 2009 (UTC)
My impression from following this discussion (and participating slightly) is that the bot operator is reasonably careful and conservative, and appreciates the concerns being raised in the aftermath of the AnyBot debacle. There is no reason to tar Thaddeus with that brush. So long as Thaddeus continues to bear in mind the concerns raised here, and works slowly, and works closely with Abyssal or someone else who has a solid grasp of the field, and is willing to put the brakes on if and when problems and issues are raised by others, then I am not opposed to this going ahead. Hesperian 23:52, 17 September 2009 (UTC)
- I will certainly be cautious with this. I view myself as directly responsible for every edit my bots make & always proceed with caution. I always comb my bots' contributions and try to stamp out even the tiniest errors before releasing them on a larger scale. I assure everyone reading this that I most certainly will take any and all complaints about data integrity seriously.
- Furthermore, I am well aware the reputation of bots that produce content has been severely tarnished by Anybot. This is part of the reason I brought this minor task here to begin with. Sure, I could have just uploaded the tables manually and no one would have ever questioned it. However, I want accurate data and I want to start re-building the community's trust that bots can build content. Thus I came here. --ThaddeusB (talk) 01:52, 18 September 2009 (UTC)
- Thank you. I still think we should run some more trials before going ahead, just to be safe. Abyssal (talk) 01:04, 18 September 2009 (UTC)
- I disagree with you, Hesperian. The attitude by ThaddeusB and Abyssal is: they are not responsible for the mistakes the bot makes. I was accused of being "passive aggressive" for failing to parent Abyssal through a correction of an article mistake he made. I don't think this is a team that will clean up after themselves.
- The correct response to the problem with the two articles, to show good faith effort toward dealing with future problems with bots, would have been for one of them to correct the articles immediately. But, no, it was more important to call someone names ("passive aggressive") than to make the encyclopedia accurate.
- There is no community support for this bot. ThaddeusB is weirdly trying to bully me into being the bot's monitor. If he can't get anyone to check the bot, and he is not able to, and Abyssal won't, and the community isn't interested, why should this bot go forward?
- The way to deal with someone who disagrees with something you want and to gain their support is to address their concerns, stay rigorously on target on the issue, and don't tell them they are "passive-aggressive," "throwing hissy fits," "mistaken about how wikipedia operates." All of these comments are personal issues about me. If they are more important than the data, maybe the data aren't that valuable or useful to the encyclopedia.
- ThaddeusB and Abyssal have established how they will act already: They will make personal accusations against people raising issues about the bot.
- This bot is a disaster in the making because of its operating team. That's my passive aggressive, mistaken-user, can't-raise-substantive-issues, hissy-fitting opinion. --69.225.3.119 (talk) 05:09, 18 September 2009 (UTC)
- If you don't have any concrete objections (and you have yet to offer any), and no actual evidence of how I'll address complaints, then this is just your personal opinion and nothing more. And if you look at my actual record with my actual bots, you will see that I do address actual complaints in a timely manner.
- No one is trying to force you to do anything, but you posted here and said you think the bot will screw up but offered no evidence. Of course I am going to respond to that by telling you to check the data if you think it'll mess up. I have personally already checked it & found it to be accurate, but that isn't good enough for you.
- Yet again, I can't respond to some theoretical eventual complaint until one actually surfaces. Yet again, do you have an actual complaint with the bot or is this just a philosophical objection and/or personal vendetta?
- P.S. Saying someone is mistaken about something isn't a "personal issue" and I take no responsibility for the other two comments, which I didn't make. --ThaddeusB (talk) 13:18, 18 September 2009 (UTC)
- The correct response to the problem with the two articles would have been to tell you to shut up and stay on topic, but I tried to be more diplomatic. I said part of the reason your problems with edits I made (that are irrelevant to ContentCreationBOT's approval) were not addressed is because I get busy and sometimes forget about messages left on my talk page. Further, I said, if you had problems with me not addressing those issues, all you had to do was remind me about them on my talk page. Instead you waited weeks and weeks and only bought the subject up when you could use it to beat me over the head in an unrelated discussion, namely, this one. "Name-calling" or not, I stand by my description of your actions as "passive agressive."
-
-
- I have little reason to believe that you raised the issue out of legitimate concern because even after you expressed the complaint you were not very helpful in the matter of getting your own problems addressed. No progress was made towards resolving your own issues until Kurt Shaped Box stepped in. Not that any of this matters, because this is the ContentCreationBOT request for approval discussion page, not the "whine about Abyssal making an error that wasn't even entirely his own fault in a tiny article on an obscure genus prehistoric fish" discussion page.
- There is no community support for this bot? What? Who should we be asking? The guy who started the List of graptolites? That was me. What about its chief contributor? Me, again. List of prehistoric starfish? That's me as well. List of prehistoric barnacles? Another one by me. Crap! List of crinoid genera? Uh oh, it looks like a pattern is emerging. Turns out I'm both the creator and sole major contributor to every single page that the bot is slated to edit. Every. single. one. If you can find another major contributor to the articles, please do invite them to see if we can form a consensus.
- Thad trying to bully you into monitoring the bot? Come on. You claim to be an invertebrate paleontologist and you're on a website based around volunteering to edit encyclopedia articles. So, when we come here with a plan to add a lot of information to encyclopedia articles on prehistoric invertebrates, and you then oppose the addition of information to articles in your field and complain that no one would be monitoring the data, it's only natural that we stare at you in disbelief. Regarding my willingness to fact-check: considering that I've explicitly called for more fact checking and testing before we proceed with the bot, even though we've both performed successful tests and received tentative approval from another member, your claim that I'm unwilling to check the data rings very hollow.
- I'd love to "rigorously" stay on topic, but someone keeps raising issues about stub creation and something to do with a typo in the title of an article I created about a prehistoric fish no one has ever heard of. I'd love to address your very serious objections, but for the life of me I can't remember them. I remember a complaint about a lack of consensus, but when I pointed out that I was the only one who contributed any meaningful content to the articles the bot would edit, and that I both supported and actively solicited the creation of the bot, you ignored me. Other than that, all I remember is a long series of complaints that we weren't taking your complaints seriously enough.
- Congratulations. You've cast a dark shadow over the topic and single-handedly discolored the entire discussion. Sadly, the useful input given by Kurt Shaped Box, Anomie, and Hesperian has gotten somewhat lost in the resulting din. Abyssal (talk)
- --69.225.3.119 (talk) 21:23, 18 September 2009 (UTC)
I see no issues with this bot's proposed work, and all the legitimate issues raised have been addressed. Unless the anon IP wishes to raise a useful objection that has to do with this specific bot approval request, I don't see any further issues which need to be addressed. I'm in favor of approving the bot as it currently stands for this test run on the 13 lists given above (and perhaps the additional ones listed below if they are similar enough and the source database contains information which could be added to them). ···日本穣? · 投稿 · Talk to Nihonjoe 14:18, 23 September 2009 (UTC)
- I've raised issues, and ThaddeusB and Abyssal have played word games and delivered personal insults and criticisms against me as a person. If this is the response to issues before it's running, this is, imo, how they'll respond when it's running: insult the person who raises the issue (personal attacks), play word games (wikilawyering), insult the level of Wikipedia knowledge of the person raising the issue (biting the newbie, although I'm not new), and demand that if someone has a problem with the data they should devote their wiki career to monitoring the bot's input.
- No. That's my opinion. --69.225.3.119 (talk) 22:45, 23 September 2009 (UTC)
- You have absolutely refused to make any concrete objection that can be addressed. Instead you merely repeat the same line over and over about how this bot will obviously screw up because Abyssal and I are bad people.
- According to your own words, you have the ability to provide expert advice on the material. Your advice on the data would be appreciated, but apparently all you want to do is criticize others and offer nothing. It's a shame that you want to play petty games ("they tried to bully me into helping, so I won't help") rather than helping to improve Wikipedia. --ThaddeusB (talk) 23:39, 23 September 2009 (UTC)
- I see that, despite your fervent insistence that this proposal not go forward, Mr. IP, my challenge for you to remind us of just one of your many very informed and serious objections continues to go unanswered. Abyssal (talk) 00:11, 24 September 2009 (UTC)
- Given the work generated for volunteers and the risks to Wikipedia's credibility when these bots go badly wrong, in my view something close to a standard of proof should be applied to bots wishing to undertake such work. If it is met, fine, then it goes ahead. But it appears none is being applied (note I am not blaming present parties for this; I am blaming BAG for making themselves the sole arbiters and granters of such things and then failing to take responsibility). This is quite dangerous given the scale of the Anybot fiasco, and some others I have seen at times where semi-automated accounts and bots have gone on mass creation sprees which have had to be deleted. Is it possible to generate a dataset, or even a list, with the bot (as in, not articles) containing all the data to be included, which some appropriate person with the requisite knowledge can check? If we can do, say, 500 or 1,000 of those and they basically work fine, then go create the articles as per the original plan. That way we get the articles, but they're credible at the end of the process. I believe Abyssal is acting in good faith; the problem is not that, but the lack of a checking process by people who know the content area, or of a meaningful approval process. Even I wouldn't feel comfortable applying under such circumstances. Orderinchaos 02:04, 24 September 2009 (UTC)
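Orderinchaos's suggestion is to have the bot produce a checkable list instead of article edits. That could be as simple as writing the proposed table rows to a spreadsheet for a reviewer. A minimal Python sketch follows. `review_sheet` and its column names are hypothetical. The default cap of 1,000 rows mirrors the sample size suggested above.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout; the real tables' columns may differ.
FIELDS = ["genus", "time_period", "locations", "checked_ok"]


def review_sheet(rows: List[Dict[str, str]], limit: int = 1000) -> str:
    """Render up to `limit` proposed table rows as CSV for a human checker.

    The 'checked_ok' column is left blank for the reviewer to fill in.
    Extra keys in a row are ignored; missing ones are written as "".
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows[:limit]:
        writer.writerow({**row, "checked_ok": ""})
    return buf.getvalue()
```

A reviewer would mark each row. Only once a sample passed would the bot go on to edit the tables.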
Code review
ThaddeusB asked me for a code review, so here it is. Not much to mention, really:
- The error checking could use some work. You properly check for HTTP errors, but not for API errors in the initial page query or for JSON decoding errors (i.e. a truncated response) (never mind, from_json just dies on error).
- $timestamp2 will not have a value unless you run into a maxlag error when querying the edit token (check rvprop in the first query). That would probably give an error in the action=edit request.
- Will it output a period such as "Mid Ashgill to Mid Ashgill"? If so, wouldn't that be better as just "Mid Ashgill"?
- It looks like it will screw up the location field if the last entry is not a one-word location name; that may be left over from changing from plain text to a bulleted list. Should the <br /> and the substr($line, 0, -4); line just be removed?
- I note that the bot will wipe out whatever content is currently in the tables, even those marked "NoData". This may not matter, as I don't know whether there is any such content currently in the tables for those entries. It also seems that it will die if any of the tables contains a genus not in the database, which is an appropriately safe failure mode.
Anomie⚔ 16:41, 13 September 2009 (UTC)
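The first two review points come down to failing loudly before any edit is attempted. This discussion does not show the bot's own code, so what follows is only a Python sketch of the idea. It assumes JSON replies in the MediaWiki API's legacy `formatversion=1` layout. In that layout an error arrives as an `error` object carrying `code` and `info`, and a `prop=revisions` query keys `pages` by page ID. `ApiError`, `parse_api_response` and `base_timestamp` are hypothetical names.

```python
import json


class ApiError(Exception):
    """Raised when an API reply should stop the bot before it edits."""


def parse_api_response(body: str) -> dict:
    """Decode a JSON reply, failing loudly on truncation or an API error."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ApiError(f"undecodable (possibly truncated) response: {exc}") from exc
    if "error" in data:  # e.g. {"error": {"code": "maxlag", "info": "..."}}
        err = data["error"]
        raise ApiError(f"{err.get('code', 'unknown')}: {err.get('info', '')}")
    return data


def base_timestamp(data: dict) -> str:
    """Return the latest revision timestamp for use as the edit's base time.

    Assumes the query used prop=revisions with "timestamp" in rvprop; if it
    did not, there is nothing to return and this raises instead.
    """
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions") or []
        if revisions and "timestamp" in revisions[0]:
            return revisions[0]["timestamp"]
    raise ApiError("no revision timestamp; is 'timestamp' missing from rvprop?")
```

The bot would then pass the value from `base_timestamp` as `basetimestamp` in the `action=edit` request. An empty value is therefore caught before the edit rather than during it.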
- Thanks for the help. I fixed all the errors. The "Mid Ashgill to Mid Ashgill" thing was something I meant to correct, but apparently forgot to do. The location thing was indeed left over from changing to a bulleted list. All of the tables are currently blank, so overwriting them isn't an issue. If I needed to rerun the task for some unforeseen reason I'd change the code to be more cautious at that time. --ThaddeusB (talk) 02:55, 15 September 2009 (UTC)
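On the location field, building the list with a join removes the need for the `substr($line, 0, -4)` step Anomie questions. The alternative is appending a separator to every entry and then trimming a fixed number of characters off the end. A minimal Python sketch follows. `format_locations` is a hypothetical name, and the sketch assumes wikitext bullets, where each `*` item must begin at the start of a line.

```python
from typing import List


def format_locations(locations: List[str]) -> str:
    """Return a wikitext bulleted list with one location per line.

    Joining means there is no trailing separator to trim afterwards, so
    multi-word names come out whole. Blank entries are dropped.
    """
    cleaned = [loc.strip() for loc in locations if loc.strip()]
    return "\n".join(f"* {loc}" for loc in cleaned)
```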
- Not a programmer, so I can't say much, but thanks for reviewing the code. Also, the "Ashgill to Ashgill" thing has been bothering me, too. Is it possible to remove the duplicate? Abyssal (talk) 15:14, 14 September 2009 (UTC)
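Removing the duplicate only needs a comparison before the two interval names are joined. A minimal Python sketch, assuming the start and end intervals reach the formatting step as plain strings (`format_range` is a hypothetical name):

```python
def format_range(start: str, end: str) -> str:
    """Join two interval names, collapsing "X to X" into just "X"."""
    start, end = start.strip(), end.strip()
    if start == end:
        return start
    return f"{start} to {end}"
```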
Additional pages
The bot may also be useful on the following pages: These pages would work well with the bot if put into the table format: Abyssal (talk) 16:37, 17 September 2009 (UTC)
Trial?
Is this ready for a trial? Mr.Z-man 00:43, 24 September 2009 (UTC)
- So, this is a 100% dismissal of all objections to the bot? Why? --69.225.3.119 (talk) 01:04, 24 September 2009 (UTC)
- Oh, so this is what your bad faith-filled tirade on my talk page and the village pump is about. No, it's a request for someone to summarize the huge amount of text above. If it was a dismissal of concerns, I probably would have actually done something other than ask a simple question. Mr.Z-man 01:45, 24 September 2009 (UTC)
- No, no, it's a hissy fit, not a tirade. And, if you had read the discussion you would have known that.
- You asked elsewhere if there was a summary for another bot. Here, you don't ask for a summary, you ask if it's ready for trial. This implies that the next step is a trial. --69.225.3.119 (talk) 01:54, 24 September 2009 (UTC)
- My apologies for not being perfectly consistent. That's what you get with unpaid labor. Mr.Z-man 01:59, 24 September 2009 (UTC)
- Sorry for the bold, but I want to point out that successful trials have already been run. A link to the results of the test is here. Abyssal (talk) 15:10, 24 September 2009 (UTC)
- It would seem from reading the text that it is not yet ready for a trial. Orderinchaos 01:56, 24 September 2009 (UTC)
- On what basis? I will be happy to address any actual concerns but the only objection to date is 69.225's philosophical objection that 1) bots shouldn't do this sort of task unless every scrap of data is pre-approved and 2) I am a bad person who won't address concerns when they are raised. --ThaddeusB (talk) 02:00, 24 September 2009 (UTC)
- I raised issues. I think that bots should be used for content creation. You missed my anybot arguments. I was the editor in support of bots being used for content creation. I think User:Hesperian was against it. And, ThaddeusB, I think you're the one telling me I'm having hissy fits, I'm passive-aggressive, and now this ridiculous comment. You can continue to ignore all of my issues and call them invalid. But, my issues stand. --69.225.3.119 (talk) 02:15, 24 September 2009 (UTC)
- I obviously am too stupid to get your objections because I have told you 10 times I don't see any that aren't "there is no consensus" or "you must check every fact manually" (neither of which I can address because they are both your opinion only). I have requested that you be specific 10 times and you have ignored me 10 times, so who exactly is ignoring whom? Oh, and for the record, I didn't make a single one of those comments you attributed to me, so I would appreciate it if you struck that part of the comment. --ThaddeusB (talk) 02:24, 24 September 2009 (UTC)
- Yes, of course, if I say something is a problem, and you say it ain't, it must not be an issue. Oh, I'm sorry, are they Abyssal's comments? Well, still, you're an administrator, you've approved this discussion being held while one party is being personally attacked. So, no, I won't strike it. I will render a correction: the personal attacks and insults are above. They were issued by Abyssal without any desire to see them stricken by this administrator.
- Call my issues what you want. Dismiss them. Allow others to call me names. I raised objections. You still choose to ignore them. If you choose to ignore my issues and then claim I'm ignoring you, that's just a game. I will continue to not play your game, while you dismiss my issues. No community consensus. Bot operator aggressively ignoring issues, encouraging personal attacks. --69.225.3.119 (talk) 02:29, 24 September 2009 (UTC)
- Yes, I ignored Abyssal's personal comments. That doesn't mean I approved of them. I also ignored various personal comments you directed at me and Abyssal. That doesn't mean I approved of them either. --ThaddeusB (talk) 02:40, 24 September 2009 (UTC)
- Please also note that, while some of the things I've said were a bit uncivil, and maybe even in poor taste, I only said them after the anonymous IP had displayed a pattern of rudeness and condescension both here and on my talk page that potentially stretched back months. I would hereby like to apologize for anything inappropriate I've said thus far. Abyssal (talk) 15:23, 24 September 2009 (UTC)
I've reviewed the discussion and the anon's complaints are for the most part invalid. The only issue that I see is limited input from the affected wikiprojects, something that we cannot force. The only thing I can suggest is to make a post to ANI and see if that brings in a wider group for input. Otherwise I think a small trial would be useful. βcommand 02:05, 24 September 2009 (UTC)
- My issues are valid. If you can only claim they aren't without naming them and identifying their invalidity, just to parrot Thaddeus, you haven't increased the support for this bot. It doesn't matter if the projects don't support it. Fact is, Thaddeus hasn't gotten anyone who supports the creation of data in paleontology tables. --69.225.3.119 (talk) 02:15, 24 September 2009 (UTC)
- Ok, let me explain the facts to you since you obviously ignored everyone else's comments. The information in question is coming from a very reliable source. The bot operator is not creating articles, just expanding a few lists. The operator has spot checked and confirmed that the data in question is reliable. Several other users who work in related areas have confirmed that the information is correct, and the programming of the bot is accurate, so there will be no issues with the imported information. The only real complaint that I see left is the fact that you did not like anybot's method of operation. On that point we agree. This bot, however, will not be creating articles but rather filling in tables from a reliable database on existing articles. Please stop trying to raise the drama level. If you have any issues that have not been addressed, the best method is a numbered list with a short explanation. βcommand 02:26, 24 September 2009 (UTC)
- No, I didn't ignore it, and others won't get it either because the credits for the information are incorrect. I understand the bot is putting data into lists (thanks for not reading my comments, but saying you've read everyone else's instead). The bot operator is not a vertebrate paleontologist. He hasn't confirmed the reliability of the data. What other users have confirmed? The ultimate purpose of "ContentCreationBot" is to create content.
The drama level? Ignored, called names, personally attacked gets drama. Don't issue personal attacks, don't dismiss my legitimate complaints. No drama. And, stay on target. That also helps avoid drama. --69.225.3.119 (talk) 02:32, 24 September 2009 (UTC)
- I see no proof other than your word that you're anything more than a 13 year old child who is attempting to make a point by forum shopping, leaving uncivil comments, and attacking others. The information that the bot is adding is reliable. You have yet to prove otherwise. So unless you can actually make a logical statement and prove that the content and database the bot will be using is wrong (besides a few typos), I see no reason for your behavior. βcommand 02:39, 24 September 2009 (UTC)
- Ahem! Enough, Betacommand! You've been warned about this before. This is not the way for you to interact with people. Stick to the actual 'bot issue at hand (Goodness knows! There's been enough diversion from the core focus of the discussion, already.), and do not give us your guesses about who participants in the discussion may be. Uncle G (talk) 03:43, 24 September 2009 (UTC)
- I am not asking for approval to create stubs at this time, so the bot's "ultimate purpose" is irrelevant to this BRFA. --ThaddeusB (talk) 02:38, 24 September 2009 (UTC)
Doesn't matter, since I'm a hissy-fitting, drama-mongering, passive-aggressive 13-year-old. I suggest you now delete all the anybot articles I saved, because you don't really want a hissy-fitting, drama-mongering, passive-aggressive 13-year-old writing articles. That's a good one, though: since I'm 13, I'm incompetent. I missed the age limit earlier. My bad. --69.225.3.119 (talk) 02:46, 24 September 2009 (UTC)
- No one has criticized or doubted the quality of your work outside the BRFA or how valuable you are to Wikipedia as an editor. My issues, and so far every issue I've seen raised against you, have concerned your conduct here. Please stop trying to play the victim here. Abyssal (talk) 15:17, 24 September 2009 (UTC)
No need for approval
A "bot" that runs one time only on 23 pages is not distinguishable from a human editor and does not require approval. In fact, you could generate the content for the 23 pages on your local computer and then save them by hand using your usual account. So there is not really any need for a discussion here at all. Just do the edits, and then discuss them on the talk pages of the articles involved like you would discuss any other edits. — Carl (CBM · talk) 02:54, 24 September 2009 (UTC)
- Nonetheless, ThaddeusB should be applauded for seeking approval anyway, so that experts can review the data sources, and we don't repeat history. Learning from history, and acting upon that learning, is a good thing. In all of the above, it seems that none of the self-declared subject experts have provided the necessary evaluation of the data source. Uncle G (talk) 03:33, 24 September 2009 (UTC)
- I agree with both comments here. The actual edits could very well be made by a human; it may have even been easier to leave a note on a few WikiProject's talk pages and discuss it there, but no matter. This will have to do as an alternative outlet for discussion about the task. Easily noticed by reading the discussion above is the fact that this has morphed into less of a conversation about the actual edits the bot will make, and more of an unconstructive argument between various parties. I just hope we can return to discussing the task itself. Regards, The Earwig (Talk | Contribs) 03:54, 24 September 2009 (UTC)
- That's the issue. When we are talking about 23 edits total, which is the current scope of this request, there is nothing for BAG to review. The edits themselves can be reviewed so easily that it's counterproductive to spend too long trying to do a technical review of the code. Moreover, this forum is not ideal for discussion of content issues, as the discussion above painfully highlights. A discussion on a wikiproject page would be much more likely to be productive. BAG is not intended to review the quality of data sources. — Carl (CBM · talk) 10:46, 24 September 2009 (UTC)
- Please note that I have said on several occasions that I was aware approval wasn't actually required. The reason I sought approval was to make it explicit where the content came from (a bot pulling RS data). I could have just done the actual edits manually and avoided the drama. In retrospect, maybe that would have been best. However, I will say this process has resulted in some improvements to the output, so it wasn't a total waste. --ThaddeusB (talk) 12:58, 24 September 2009 (UTC)
Note: I will be modifying the code to fix the problem outlined here. I thank 69.225 for pointing out this error, and kindly ask him/her to restrict future comments to pointing out specific problems that can be addressed. --ThaddeusB (talk) 01:04, 25 September 2009 (UTC)
- For the record, here is my reply to the issue which the IP reverted as "taunting" --ThaddeusB (talk) 15:24, 25 September 2009 (UTC)
- Could you explain, in layman's terms, how that error occurred? Is this a widespread problem, affecting a significant percentage of the trial run output? If so, I suggest that a fresh trial run be carried out following the fix. I'm not going to comment on the drama that occurred over the last day or so, of which I was completely oblivious to until now - save to say that we should probably just attempt to put it in the past and move on from it. --Kurt Shaped Box (talk) 02:04, 25 September 2009 (UTC)
- The bot fills the "Age" column based on the fossil record information returned by Paleodb. In this case, the fossil record says 14 Ma to 4 Ma; however, the genus is actually extant, so the fossil record is obviously insufficient. Since over 90% of the db is extinct, this particular problem should be quite rare. However, it does raise a question about the accuracy of using the fossil record in general.
- I used the fossil record to estimate ages because I feel this is, in general, the best estimate available. However, the fossil record is very much incomplete (and paleo's db is far from a complete record of the known fossil record either) - a fact which is not obvious to the casual observer. As such, I will address this concern via the following adjustments:
- Any genus that is extant will get "present" as the end date regardless of the fossil record
- The column will be renamed to "estimated time period" (or alternate upon suggestion)
- A footnote disclaimer will be added to state the estimates are based on the fossil record, which by nature makes them imperfect.
- I am also open to suggestions about alternative sources for age range estimates.
- Once the code is adjusted, I will re-upload the demo page. (I suggest everyone use this terminology to describe that page, as it isn't a "trial" in the BAG sense of the word.) --ThaddeusB (talk) 03:00, 25 September 2009 (UTC)
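The adjustments listed above could look roughly like this. This Python sketch assumes the extant flag and the first and last fossil intervals are already known for each genus. `estimated_time_period` and `COLUMN_HEADER` are hypothetical names, and the footnote wording is a placeholder for whatever the editors settle on.

```python
from typing import Tuple

# Hypothetical renamed header with the disclaimer footnote; wording is open.
COLUMN_HEADER = ("Estimated time period<ref>Estimates are based on the "
                 "fossil record, which is incomplete.</ref>")


def estimated_time_period(first: str, last: str, extant: bool) -> Tuple[str, str]:
    """Return (start, end) for the time-period column.

    The fossil record only shows when a genus is known to have lived, so an
    extant genus gets "present" as its end regardless of its last fossil.
    """
    end = "present" if extant else last
    return first, end
```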
- There is no significant percentage of the output. There are only 23 pages, so even 100% would be an insignificant percentage. That's why the whole idea of "trial runs" is flawed in this case. — Carl (CBM · talk) 02:25, 25 September 2009 (UTC)
- You may have missed it, but in the original request (way up there somewhere :)) I suggested having the bot fill in only a small percentage of each table for the trial rather than the normal "X pages" approach. --ThaddeusB (talk) 03:00, 25 September 2009 (UTC)
- As long as it fits with the size limits, the size of the edit really isn't the issue. A "trial" is warranted when the bot is going to make a lot of actions, so that it would be painful to have to undo them all. The expectation is that the bot operator will carefully review every edit in the trial to make sure there are no technical problems. The size of the edits during the trial is entirely up to the bot operator. The difficulty here is that this project, regardless of any other merits it might have, simply doesn't fit into the framework for approving bots. — Carl (CBM · talk) 10:55, 25 September 2009 (UTC)
- I am well aware of the "rules", including WP:IAR. I thought it would be beneficial to get approval for the bot, even though none was technically required, and thus I filed the request. Can we please stop the wikilawyering now? --ThaddeusB (talk) 15:24, 25 September 2009 (UTC)
- I'm saying there are more appropriate forms of review than BAG for this task (for example, the talk pages of the articles involved, or a wikiproject page). As you say below, you are not even looking for a bot flag — so why create a "bot" account, for a task that has none of the attributes of a bot? I wanted to bring up this point in case other people see this nomination and mistakenly think it represents our best practice for when to ask for bot approval. — Carl (CBM · talk) 02:20, 26 September 2009 (UTC)
- Fair enough, thanks for clarifying. --ThaddeusB (talk) 03:05, 26 September 2009 (UTC)
Chaunax
69.225.5.4 directs my attention to the original version of the article Chaunax, posted by Abyssal back in May. I must say I find this a disappointingly poor effort at a cookie-cutter stub. Problems include:
- Unsubstituted {{pagename}} templates;
- Omission of class, order and family from the taxobox;
- Specifying (but leaving blank) taxobox parameters that are inappropriate for a genus article, such as "binomial";
- The absence of a fossil range, a piece of information that I would have thought was critical to the decision to post a stub like this;
- The incorrect claim that it is extinct;
- The redundancy of referring to it as both "extinct" and "prehistoric" in the same sentence. "Prehistoric" implies "extinct"; extant genera that are present in the fossil record are never referred to as prehistoric.
- The absence of references.
Naturally the false claim that the genus is extinct is the biggest problem. This sums up the problem with content creation bots. (1) They introduce errors; and (2) even when they don't introduce errors, they produce clunky, incomplete, redundant articles that utterly fail to communicate in an interesting or informative manner. I don't want to beat a straw man here; but it does seem reasonable to assume that the purpose of ContentCreationBOT is to enable the creation of these dreadful cookie-cutter stubs on a grand scale. Am I wrong? If so, what steps have or will be taken to ensure that the content produced is more useful than the example above? Hesperian 12:16, 25 September 2009 (UTC)
- Thad and I have prepared at least a tentative template that is much more thorough than my "cookie cutter stubs." Further input and ideas are appreciated.
- The possibility of introducing inaccurate claims to Wikipedia is a real one, and could come from two sources: 1, faulty information in the database, and 2, the bot mishandling the data. Source one is unlikely, since the database is maintained by experts. Source two is preventable and is the reason we'll have to do test runs. We recognize the possibility of things going very much awry, and we have always intended to proceed cautiously. It's one of the reasons we decided to try the data-table filling process: to prove the bot could handle the data properly.
- I don't claim to have made good stubs, but surely you wouldn't suggest that a very short "substub" is worse than having no article at all? Even if all the stub did was set up the basic framework for the article, that would still justify its creation. Say you wanted to create the Chaunax article. First you'd have to go to another animal article, copy the taxobox, replace the data, etc. Then you'd have to find the best stub template to use and add it. Then add the name, portal templates, links, etc. Let's just say that it takes three minutes of time to do that. Now, if every prehistoric fish article were done manually, the community would have to spend three minutes on every article. Let's say I made 500 stubs the cookie-cutter way. Because I'm just copy-pasting, article creation time is practically instantaneous. Therefore, by creating 500 stubs practically instantaneously, I saved the community approximately 25 man-hours of work. It's the same idea with the bot. If the bot creates 5,000 stubs (which would all be much higher in quality than the one I made, see the template), the amount of work saved would be 250 man-hours even if all it had done was build the basic set-up of the article.
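Abyssal's time-saving estimate works out as follows, taking his assumed three minutes of manual setup per stub at face value:

```latex
\[
500 \times 3\ \text{min} = 1500\ \text{min} = 25\ \text{h},
\qquad
5000 \times 3\ \text{min} = 15\,000\ \text{min} = 250\ \text{h}
\]
```

The three-minute figure is his assumption, not a measurement, so both totals scale directly with it.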
- I'm not going to respond to specific criticisms of the Chaunax article, not because they aren't valid (they were, although I'd still personally refer to extant ancient taxa as prehistoric), but because the bot-generated stubs will be so much better in quality than my "cookie cutter" types that they aren't really relevant.
- To answer your last question, we do intend to create large numbers of relatively high-quality stubs eventually; however, this particular discussion is supposed to be only about the data-table completion. We will start another Request for Approval when we feel that we're more prepared to handle the much larger task of stub creation. Thanks for the input! Abyssal (talk) 15:51, 25 September 2009 (UTC)
- "I don't want to beat a straw man here" well that is exactly what you are doing.
- 1) The "horrible stub" wasn't created via any sort of automation but rather by hand by Abyssal
- 2) I am not asking for approval to create stubs at this time --ThaddeusB (talk) 15:26, 25 September 2009 (UTC)
- I did ask "Am I wrong? If so, what steps have or will be taken...?" You haven't answered that. Abyssal has, but the link he has provided, User:ThaddeusB/PAC template, fails to re-assure me.
Re (2), if I posted a request here that said "I am requesting permission to scratch my backside... but later I might want to deploy my bot to correct spelling errors... but right now I am only requesting permission to scratch my backside", I'm pretty sure discussion would centre on my proposal to deploy a bot to correct spelling errors. This is only natural. Hesperian 05:52, 26 September 2009 (UTC)
- Considering I am not asking for approval to make stubs, am not ready to make them, and if and when I am, would most certainly have to post a new request to do so, it is completely reasonable for me not to want to debate them here. --ThaddeusB (talk) 15:11, 26 September 2009 (UTC)
It's an example of how articles created by people without knowledge in the subject area aren't useful. I had corrected a few hundred of Abyssal's fish stubs, making them more useful by adding class, when I was interrupted by the Anybot mess. This article problem is relevant to this discussion because the article was added by Abyssal, who is strongly advocating for this bot and worked with ThaddeusB on creating the bot. Adding 10,000 pieces of data to 23 pages is worse than not having the data, when those adding the data are not reading the database correctly (see Abyssal's sample "successful" upload above) and admit (above) they don't have the necessary expertise to read the database correctly. If Wikipedia editors don't know if the data are correct, they do not belong on Wikipedia for any amount of time. They do not belong uploaded by a bot or by a human. As I said early on, until this bot has experts (whether paleontologists or Wikipedia enthusiasts on the taxa) on board, its task is inappropriate. It is not supported by the community. The community is not asking for unvetted data to be uploaded. Abyssal and Thaddeus don't know the data, can't tell when it's incorrect, and they don't act quickly when they create articles that are incorrect. --69.225.5.4 (talk) 20:26, 25 September 2009 (UTC)
I think also that Abyssal's comment that the test run can prove the bot can handle the data correctly should be remarked upon, because what the test run did was prove exactly my problem with this bot: it doesn't matter if the bot can handle the data correctly when there is no one available who can vet the data. --69.225.5.4 (talk) 20:37, 25 September 2009 (UTC)
- The fish articles aren't relevant, no matter how hard you insist that they are. You might as well choose any random project I've engaged in on Wikipedia. The fish articles will not reflect the quality of the created stubs because we are using a different template for the article design. They were not created by the same process that will be used here. We are not even supposed to be discussing the planned stub creation process here. Also your language is misleading. If you were just adding class information, that's not "correcting," that's just "adding."
- We can read the database just fine. It doesn't matter if we understand the content; it's just a matter of making sure the content added to the article is the same as what is in the database. If the generated article on Abyssalgenus says it's a member of the Thadidae while the PBDB says it's a Kurtboxid, then we know an error has been made regardless of whether we understand the basics of either taxon's anatomy/classification/lifestyle/etc. The only skill needed to ensure the validity of the final result is the ability to compare the data in the article to the data listed in the database: either the words will be exactly the same or an error will have occurred. Expertise is irrelevant.
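The check Abyssal describes compares each value written to the article against the database record, and it can be automated. The Python sketch below assumes both the generated table and the database export are available as mappings from genus name to field values. `find_mismatches` is a hypothetical name. The sketch tests transcription only. A value copied faithfully from the database passes unflagged, including a fossil-record range for a genus that is actually extant.

```python
from typing import Dict, List


def find_mismatches(article: Dict[str, Dict[str, str]],
                    database: Dict[str, Dict[str, str]]) -> List[str]:
    """List every genus/field where the article text differs from the database."""
    problems: List[str] = []
    for genus, row in article.items():
        record = database.get(genus)
        if record is None:
            problems.append(f"{genus}: not in database")
            continue
        for field, value in row.items():
            expected = record.get(field)
            if value != expected:
                problems.append(f"{genus}.{field}: article={value!r} db={expected!r}")
    return problems
```

Any non-empty result would be reason to stop before saving.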
- What do you mean, implying that the bot handled the data correctly? The bot didn't handle the data correctly; it failed to verify the "age range" information for Cryptoplax against the "basic information" data in the first tab. That mishandling was supposedly the basis of the complaint you raised yesterday on your talk page. It has nothing to do with mine or Thad's ability to read the database. Even as a non-programmer I can see an easy solution for this: use the Sepkoski age range data for extant species and the PBDB "age range" information solely for extinct taxa. Had we foreseen that, the problem would never have occurred, but you can't foresee everything, which is why we ran the test in the first place. Abyssal (talk) 23:34, 25 September 2009 (UTC)
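Abyssal's proposed source-selection rule can be sketched in Python as below. The sketch assumes the Sepkoski and PBDB ranges have already been fetched for each genus; this discussion does not describe how that data is obtained. `choose_age_range` is a hypothetical name. It returns `None` rather than falling back to the other source, so the caller can leave the cell as "NoData" instead of guessing.

```python
from typing import Optional


def choose_age_range(extant: bool,
                     sepkoski_range: Optional[str],
                     pbdb_range: Optional[str]) -> Optional[str]:
    """Pick the age-range source: Sepkoski for extant taxa, PBDB for extinct.

    Returns None when the preferred source has no range; the other source
    is deliberately not used as a fallback.
    """
    return sepkoski_range if extant else pbdb_range
```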
- You're the one who said you ran the trial to prove the bot can handle the data correctly, see your above post.
"It doesn't matter if we understand the content, it's just a matter of making sure the content added to article is the same as is in the database." - Yes, it matters if you understand the content. You didn't, so you posted a "successful" trial that included wrong information. The information in the database was correct. It still is correct. It lists the species as Late Miocene to recent. --69.225.5.4 (talk) 00:08, 26 September 2009 (UTC)
- I thought you were going to stay on topic? I guess that was either an empty promise or you are completely incapable or unwilling to do so.
- At least a dozen times you have said Abyssal and I have admitted to having no knowledge about the subject. That is entirely 100% untrue. I have stated I am not an expert, but that is not the same thing as "having no clue." Abyssal has never said anything at all about not having knowledge of the subject and indeed he has contributed more to paleobiology on Wikipedia than nearly anyone else.
- Abyssal has created more than 1000 articles on prehistoric genera. To date you have found 3 that contained errors. Wow, a human with only a 99.97% accuracy rate must clearly be a complete fool who doesn't know a thing about the subject matter. Right?
- Every post you make is a half-truth or distortion of the facts. You repeatedly make insulting claims, like that I ignore all feedback or that I asked for you to be blocked, that have absolutely no basis in fact. You claim to be an expert yet you refused to provide a single concrete criticism until after 3 weeks of bickering and a block for being disruptive. You claim you want a bot to supply this data, but your actions say otherwise. Tell me, what is your real motivation here? If you want to help, then please do so. If you want to argue, then please go someplace else.
- I 100% absolutely want every shred of data produced by this bot to be accurate. I will listen to any concrete complaints you or anyone else has, and will fix any actual problems that are identified. However, your standard, some magic human who can review 10,000 items and instantly know that item 245 is an error, is impossible to meet. There is not one person on the entire planet with the ability to do what you demand. Anybot was a horrible POS, but I had absolutely nothing to do with that, so please stop taking out your rightful hatred of that bot on me. --ThaddeusB (talk) 23:37, 25 September 2009 (UTC)
- Take the personal comments elsewhere, ThaddeusB. --69.225.5.4 (talk) 00:08, 26 September 2009 (UTC)
- LOL, your refusal to be truthful is directly relevant to this conversation. You can't just imply I'm incompetent, post outright lies about what I & Abyssal have said previously, claim I never listen to people, and ignore 90% of everything that is said and harp on the 10% that looks bad, and not expect me to comment on it. You hate the very idea of this bot (despite your claims to the contrary) and your actions make it quite clear you are interested only in derailing the bot, not in making it work correctly. --ThaddeusB (talk) 00:16, 26 September 2009 (UTC)
Note: after a productive conversation with 69.225, I have sent out requests for more expert input. --ThaddeusB (talk) 03:40, 26 September 2009 (UTC)
Let it drop
Considering you're only proposing to make 30-odd edits, which you're entirely capable of doing through your user account; and considering this request has been utterly derailed, for better or for worse; it seems to me that the best way forward for you is to let this request drop, and go ahead and produce these lists, if you still want to. If and when the time comes that you want to do something that actually requires approval, a fresh start to this approval process would be useful for everyone concerned. By that time you will have learned from the experience of posting these lists, and you'll go into the approval process knowing that some of us are still smarting from the last debacle, and need concrete reassurance. Hesperian 05:52, 26 September 2009 (UTC)
- I still think it is more appropriate to make bot edits under a bot account rather than hiding them under my own user name. If I had just put them under my own name to begin with, of course, none of this would have ever happened, but I really don't think that would have been better. For example, no one would have questioned anything, and whatever errors might have occurred most likely would never have been caught. --ThaddeusB (talk) 15:05, 26 September 2009 (UTC)
- I agree with Thad. Despite the drama we did get feedback that saved us from very serious errors. Abyssal (talk) 16:16, 26 September 2009 (UTC)
- Don't "hide bot edits under your own name". Make human edits. There are only 30 of them. Hesperian 06:47, 27 September 2009 (UTC)
- I meant hide the fact that the table was generated by a script; I didn't mean make the script upload them under my name. --ThaddeusB (talk) 16:25, 27 September 2009 (UTC)
- There is no reason to worry about whether the content of a single edit is generated by a script or not. If the content of the edit is good, it doesn't matter if a script made it, and if the content is flawed, it also doesn't matter. I use scripts somewhat often to create citation templates, for example, but there is no reason I need to indicate that in the edit summary. Indeed, it would even be valid in this case to let the script upload the content in your name, provided that you review the content yourself. — Carl (CBM · talk) 03:21, 30 September 2009 (UTC)
Bot flag?
Are you asking for a bot flag? There seems to be no need for one. --Apoc2400 (talk) 21:15, 25 September 2009 (UTC)
- If it doesn't get one, that is fine by me. --ThaddeusB (talk) 23:37, 25 September 2009 (UTC)
- FWIW, I would support a bot flag if one was given. I agree that one is not necessary for this test, however. ···日本穣? · 投稿 · Talk to Nihonjoe 06:46, 30 September 2009 (UTC)
Task approval
Let's just approve this task! It's a handful of edits, and it's good that it's been through the process. Or approve a trial run of 25 edits... They can be reverted and re-run if there are any problems. Rich Farmbrough, 01:56, 13 October 2009 (UTC). {{BAGAssistanceNeeded}}
- It is currently stalled, probably because of some issues that arose from expert input about one of the lists. It's only a handful of edits, but each edit is hundreds of lines in a table, for a total of thousands of lines of information.
- As there is an issue about the validity of the genera in the lists that should be addressed first, there's no point in pushing the bot operator to get the bot going to create data that will be mirrored and is incorrect.
- Which reminds me, I have to delete a made up organization from an article that shows up in 77 google hits, all wiki mirrors. It's much politer to not do this in the first place, meaning not create articles with faulty data to begin with, rather than go and correct them after the fact. There's no hurry here. --69.225.5.4 (talk) 04:53, 13 October 2009 (UTC)
- Fixing the bug won't be difficult, but I haven't had a chance to do it yet because I've been busy with more pressing tasks. Once I've fixed it, I'll update here. --ThaddeusB (talk) 14:33, 13 October 2009 (UTC)
Operator: Ezarate
Automatic or Manually assisted: Manually
Programming language(s): Python
Source code available: Standard pywikipedia
Function overview: add interwikis
Edit period(s): daily
Estimated number of pages affected: 50
Exclusion compliant (Y/N):
Already has a bot flag (Y/N): N
Function details: add interwikis; you can see the test edits here
Discussion
- Er, I'm slightly concerned at the test edits without having actually gotten authorization for test editing; unless, was the bot being run in assisted editing mode (i.e. with human supervision)? --Cybercobra (talk) 23:25, 2 November 2009 (UTC)
It's only five edits, though. Why did the bot remove the it interwiki link on the one article? --69.225.3.198 (talk) 23:56, 2 November 2009 (UTC) -
- The interwiki removed said "It's A Long Way To The Top (If You Wanna Rock 'N' Roll)" (wrong). The valid interwiki is: "It's A Long Way To The Top (If You Wanna Rock 'n' Roll)"
- This bot has the flag on the Spanish Wikipedia, as you can see here. These five edits were to test the bot here, only adding interwikis. Regards!!!
- Here you can see the bot's edits on the Spanish Wikipedia. --Esteban (talk) 00:53, 3 November 2009 (UTC)
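Why the old link counts as invalid can be shown with a small title comparison. MediaWiki normally treats only the first letter of a title as case-insensitive, and treats underscores as spaces, so the two titles above ('N' versus 'n') name different pages. The sketch below approximates that rule in Python; it assumes the target wiki uses the default first-letter capitalization, and it is an illustration, not part of pywikipedia.

```python
def normalize_title(title: str) -> str:
    """Approximate MediaWiki title normalization, assuming the target wiki
    capitalizes only the first letter (the MediaWiki default)."""
    # Underscores become spaces; runs of whitespace collapse to one space.
    title = " ".join(title.replace("_", " ").split())
    # Only the first character is case-folded; the rest stays case-sensitive.
    return title[:1].upper() + title[1:]


def same_page(a: str, b: str) -> bool:
    """True if two link targets name the same page after normalization."""
    return normalize_title(a) == normalize_title(b)
```

Under this rule `same_page("it's_A_Long_Way", "It's A Long Way")` is true, while the 'N' and 'n' variants of the song title compare as different pages.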
- So it will handle Italian Wikipedia links also, not just Spanish? Well, I'm not too concerned about interwiki bots; you're addressing user questions, and that's all I care about. --69.225.3.198 (talk) 03:33, 3 November 2009 (UTC)
- I agree, interwiki bots are uncontroversial, and a few unapproved edits aren't cause for a lot of concern, except that it shows that Esteban doesn't understand our bot policy. Esteban: please read through WP:BOTPOL; it's important that you and your bot keep to it. If you have trouble understanding anything in there, just ask us to clarify. Best, - Kingpin13 (talk) 08:26, 3 November 2009 (UTC)
- I think that those test edits are due to a "cultural problem". In the Spanish Wikipedia we do ask the bot controllers to perform some test edits in order to facilitate a decision about the way the bot is operated. Regards, Poco a poco...¡adelante! 13:46, 3 November 2009 (UTC)
- I'm sorry for the unapproved test edits; I wanted to make sure the bot was working properly before submitting it to you. I have just read the policy. I am using the latest pywikipedia and I'll wait for your authorization before continuing to test. Best regards!--Esteban (talk) 14:06, 3 November 2009 (UTC)
- I'm trying to see what happened. I'm sorry --Esteban (talk) 21:09, 5 November 2009 (UTC)
This should not happen at all, especially when you make such edits on several wikis at the same time and don't even have a userpage stating who owns the bot. -- Mercy (☎|✍) 20:11, 5 November 2009 (UTC) - Yes, that's a problem. If you make 5 test edits, then editors tell you to read bot policy first, then you continue making test edits, that's not a good thing. --69.225.2.24 (talk) 03:31, 6 November 2009 (UTC)
- Hopefully Ezarate's lack of knowledge about the bot approval process in this Wikipedia (and after correction of the mentioned problems) will not be determining for the decision to approve the bot. I think that these kind of bots are always welcome, regards, Poco a poco...¡adelante! 14:40, 6 November 2009 (UTC)
- I forgot to delete this Wikipedia's line from the user-config.py file on my notebook; I edit both at home and from my notebook. --Esteban (talk) 15:48, 6 November 2009 (UTC)
Is the bot exclusion compliant? I believe that py is by default, you haven't changed it have you? - Kingpin13 (talk) 08:14, 9 November 2009 (UTC) -
- I haven't changed anything in pywikipedia; the bot is exclusion compliant. --Esteban (talk) 14:15, 9 November 2009 (UTC)
- Will you be adding interwikis to templates using this bot? Also, why do you estimate that 50 pages will be edited? Is this throughout the bot's whole "life", or per day? - Kingpin13 (talk) 20:23, 19 November 2009 (UTC)
- Only in articles or categories, not in templates. I think a week for editing 50 pages. It's per day. Regards!!!--Esteban (talk) 22:25, 19 November 2009 (UTC)
Requests to add a task to an already-approved bot
Operator: Rich Farmbrough
Automatic or Manually assisted: Auto
Programming language(s): AWB
Source code available: AWB
Function overview: Replace or consolidate deprecated parameters in {{Cite web}} and other cite templates
Links to relevant discussions (where appropriate): Created section at template talk here
Edit period(s): one time
Estimated number of pages affected: c. 10k
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Function details: merge various access date fields into one ("accessdate"); merge "year", "month" and "day" fields (when all present) into "date"
Discussion
Per these edits the following fields are deprecated, and their use places the article in the hidden category Category:Cite_web_templates_using_unusual_accessdate_parameters.
- accessmonthday
- accessdaymonth
- accessyear
- day
- accessmonth
- accessday
The removal is a simple matter of merging the fields into "accessdate", except for "day", which needs merging with "month" and "year" into "date". I foresee that this task may need finishing by hand, as there will be many poorly formatted expressions; however, preliminary testing indicates most cases (over 90%) can be dealt with using half a dozen simple rules, and refined rules will doubtless improve the hit rate further. - Rich Farmbrough, 23:11, 18 November 2009 (UTC).
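The merging rules described above can be sketched as follows. The task itself used AWB rules, so this Python version is only an illustration: it assumes the template's parameters have already been parsed into a dict, and the specific rules (including day-month-year order for `date`) are hypothetical, not the ones actually used. Anything no rule covers is reported as unhandled, matching the expectation that some cases will need finishing by hand.

```python
from typing import Dict, Optional, Tuple

# Deprecated access-date parameters listed above.
ACCESS_PARAMS = ("accessmonthday", "accessdaymonth", "accessday",
                 "accessmonth", "accessyear")


def merge_access_date(params: Dict[str, str]) -> Optional[str]:
    """Build one accessdate value, or return None when no simple rule applies."""
    v = {k: params.get(k, "").strip() for k in ACCESS_PARAMS}
    year = v["accessyear"]
    if not year:
        return None
    if v["accessmonthday"]:                      # "November 18" -> "November 18, 2009"
        return f"{v['accessmonthday']}, {year}"
    if v["accessdaymonth"]:                      # "18 November" -> "18 November 2009"
        return f"{v['accessdaymonth']} {year}"
    if v["accessday"] and v["accessmonth"]:      # "18", "November" -> "18 November 2009"
        return f"{v['accessday']} {v['accessmonth']} {year}"
    if v["accessmonth"]:                         # "November" -> "November 2009"
        return f"{v['accessmonth']} {year}"
    return None


def consolidate(params: Dict[str, str]) -> Tuple[Dict[str, str], bool]:
    """Return (new_params, fully_handled); unhandled cases are left for a human."""
    p = dict(params)
    handled = True
    if any(k in p for k in ACCESS_PARAMS):
        merged = merge_access_date(p)
        if merged is None or p.get("accessdate", "").strip():
            handled = False                      # ambiguous or conflicting: skip
        else:
            for k in ACCESS_PARAMS:
                p.pop(k, None)
            p["accessdate"] = merged
    if "day" in p:
        parts = [p.get(k, "").strip() for k in ("day", "month", "year")]
        if all(parts) and not p.get("date", "").strip():
            for k in ("day", "month", "year"):
                del p[k]
            p["date"] = " ".join(parts)          # assumes day-month-year order
        else:
            handled = False
    return p, handled
```

For instance, `accessmonthday=November 18` plus `accessyear=2009` becomes `accessdate=November 18, 2009`, while a lone `accessyear` is left untouched and reported as unhandled.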
- Looks fine, deprecated template, competent bot owner, community requested task (through deprecation of other templates), going after reasonable level of fixing with bot. --IP69.226.103.13 (talk) 07:14, 22 November 2009 (UTC)
A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please remove this tag.
Operator: Rich Farmbrough
Automatic or Manually assisted: Auto
Programming language(s): AWB
Source code available: AWB
Function overview: Delink full dates
Links to relevant discussions (where appropriate): See User:Full-date unlinking bot
Edit period(s): One time
Estimated number of pages affected: Depending on division of labour and other tasks taking priority, up to 500,000, more likely 80,000-100,000
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Function details: Delinks dates pretty much per the spec of User:Full-date unlinking bot
Discussion
I have been asked to turn SmackBot to this task, as FDUB has run into difficulties with maintaining the throughput planned. The settings are available here. They have been tested against FDUB's test data [3]. Regards, Rich Farmbrough, 22:35, 18 November 2009 (UTC).
- When FDUB was approved it was agreed that any given article would only be processed once, to prevent human editors finding themselves in an edit war with a bot. This bot should coordinate with FDUB so that neither will revisit an article that either of them has edited. Jc3s5h (talk) 01:41, 19 November 2009 (UTC)
- This is a valid point. I have inspected all of FDUB's edits (12,000+ at the time), and there were 5 where date linking had been re-introduced to an existing date, by two editors: one was unaware and was simply editing the articles; the other wished to retain the auto-formatting, as there were mixed formats on the page. Nonetheless it is possible for SmackBot to do this relatively easily, and to create logs, provided FDUB can deal with them. Rich Farmbrough, 08:59, 19 November 2009 (UTC).
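One way to meet the "process each article only once" condition is a shared log that both bots consult before editing. The sketch below assumes a JSON file of page titles; the file format and function names are hypothetical (not how FDUB or SmackBot actually log), and it does not handle two bots writing to the log at the same moment.

```python
import json
from pathlib import Path
from typing import Set


def load_processed(path: Path) -> Set[str]:
    """Read the shared log of titles either bot has already processed."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_processed(path: Path, processed: Set[str]) -> None:
    """Write the log back as a sorted JSON list of titles."""
    path.write_text(json.dumps(sorted(processed)), encoding="utf-8")


def claim(title: str, processed: Set[str]) -> bool:
    """Return True (and record the title) if no bot has touched it yet;
    return False if it was already processed, so it is never revisited."""
    if title in processed:
        return False
    processed.add(title)
    return True
```

Each bot would load the log, call `claim()` before editing a page, skip the page when it returns `False`, and save the log afterwards.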
A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please remove this tag.
Operator: Anomie⚔
Automatic or Manually assisted: Automatic, unsupervised
Programming language(s): Perl
Source code available: User:AnomieBOT/source/tasks/CategoryCleaner.pm
Function overview: Clean up miscategorized pages
Links to relevant discussions (where appropriate): WP:VPT#Category:Redirects
Edit period(s): periodically
Estimated number of pages affected: 956 to start, then depending on the rate of pages mis-added.
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Function details: Certain categories have easily machine-testable bright-line criteria for excluding certain classes of pages. For example, Category:Redirects should contain only pages and categories about redirects; it explicitly states that it should not contain any actual redirects. But due to people not reading the category header text, it currently contains 956 redirects. This task will periodically check that category for redirects and remove any that are found. If any similar situations are brought to my attention, I intend to do the same for them.
Discussion
Looks useful and straightforward for this particular miscategorization. Will you be explaining to the miscategorizing editor what was done, to prevent the bot being reverted? --IP69.226.103.13 (talk) 08:34, 14 November 2009 (UTC)
- For the initial run at least, posting to users' talk pages wouldn't likely be all that useful, as some of them date back to early 2008 (others may be even earlier). Do you think a sufficiently detailed edit summary would be sufficient? For example, "Removing Category:Redirects: That category contains pages and categories ''about'' redirects. 
It should not contain any ''actual'' redirects, as there are so many of them that a category of them all would be unmanageable." Anomie⚔ 00:15, 15 November 2009 (UTC)
- I can see not notifying for the initial run, then notifying to maybe limit the number of users miscategorizing. The summary may not need to be that detailed, but, imo, yes, using the edit summary would work. --IP69.226.103.13 (talk) 04:10, 15 November 2009 (UTC)
- There are only about 970 redirects in this category, it's not like there were thousands. Most of those were added by bots anyway, and users don't seem to use it regularly, so a notification seems unnecessary. A rename has been suggested at VPT and when the category is cleared, I'd like to open a CFD. Cenarium (talk) 22:28, 27 November 2009 (UTC)
- That sounds okay by me. I'm not too familiar with categorizing, and you've answered my concerns. --IP69.226.103.13 (talk) 22:38, 27 November 2009 (UTC)
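The check in the function details can be sketched in Python (AnomieBOT itself is written in Perl; see the source link above). This illustration only removes the category from pages whose text starts with the `#REDIRECT` magic word, and the function names are hypothetical. It deliberately does not match a longer category name such as `Category:Redirects from moves`.

```python
import re
from typing import Tuple

# A page is a redirect if its text starts with the #REDIRECT magic word,
# which MediaWiki matches case-insensitively.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT", re.IGNORECASE)


def category_link_re(name: str):
    """Match [[Category:Name]] or [[Category:Name|sort key]], allowing a
    lowercase first letter and underscores for spaces. Longer names that
    merely start with `name` are not matched."""
    first = re.escape(name[0].upper()) + re.escape(name[0].lower())
    rest = "".join("[ _]" if ch == " " else re.escape(ch) for ch in name[1:])
    return re.compile(r"\[\[\s*(?i:category)\s*:\s*[" + first + "]" + rest
                      + r"\s*(?:\|[^\]]*)?\]\]\n?")


def clean_page(wikitext: str, category: str = "Redirects") -> Tuple[str, bool]:
    """Remove the category only from actual redirects; return (text, changed)."""
    if not REDIRECT_RE.match(wikitext):
        return wikitext, False
    new_text = category_link_re(category).sub("", wikitext)
    return new_text, new_text != wikitext
```

A page about redirects that carries the category is left alone, because only pages that are themselves redirects are touched.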
VVVV See revised proposal VVVV
Operator: Rich Farmbrough
Automatic or Manually assisted: Automatic
Programming language(s): AWB
Source code available: AWB source available
Function overview: Replace category:Articles_lacking_sources_(Erik9bot) with an unreferenced tag, dated to the creation date of the article.
Edit period(s): One time
Estimated number of pages affected: 116,000
Exclusion compliant (Y/N): Yes
Already has a bot flag (Y/N): Y
Function details:
- If it has an unref tag or similar, will simply remove the cat
- If has evidence of references will simply remove the cat
- There are some "long tail" items i.e. months with a single item or a few. These may be consolidated into the oldest reasonable non-empty month.
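The per-article decision in the function overview and details can be sketched as follows. This is a hypothetical Python illustration, not the AWB settings: the tag aliases and the "evidence of references" patterns are assumptions, `created` stands for the article's creation date, and the long-tail consolidation step is not shown.

```python
import re
from datetime import date
from typing import Tuple

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# The hidden maintenance category added by Erik9bot (spaces or underscores).
ERIK9_CAT_RE = re.compile(
    r"\[\[\s*(?i:category)\s*:\s*Articles[ _]lacking[ _]sources[ _]\(Erik9bot\)"
    r"\s*(?:\|[^\]]*)?\]\]\n?")
# Hypothetical heuristics; a real run would need a fuller list of tag aliases.
UNREF_TAG_RE = re.compile(r"\{\{\s*(?:unreferenced|unsourced)\b", re.IGNORECASE)
HAS_REFS_RE = re.compile(
    r"<ref[\s>/]|\{\{\s*cite\b|==\s*(?:references|sources|bibliography)\s*==",
    re.IGNORECASE)


def process(wikitext: str, created: date) -> Tuple[str, str]:
    """Return (new_text, action) for one article."""
    if not ERIK9_CAT_RE.search(wikitext):
        return wikitext, "skip"
    without_cat = ERIK9_CAT_RE.sub("", wikitext)
    # Already tagged, or shows evidence of references: just drop the category.
    if UNREF_TAG_RE.search(wikitext) or HAS_REFS_RE.search(wikitext):
        return without_cat, "removed category"
    # Otherwise swap the category for a tag dated to the creation month.
    stamp = f"{MONTHS[created.month - 1]} {created.year}"
    return "{{Unreferenced|date=" + stamp + "}}\n" + without_cat, "tagged"
```

Month names are spelled out in a tuple, rather than taken from `strftime("%B")`, so the output does not depend on the machine's locale.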
Discussion
Background. Erik9bot created this category, when it should probably have simply applied an appropriate tag. CiterSquad has provided cites for a good number of articles, and a couple thousand others have had cites added and the cat left in; I have fixed these already, though there may be more by now. The category fails for a number of reasons:
- it is hidden, so it is not seen and it gets left behind.
- it is too large.
- it stops articles getting tagged properly.
- it doesn't follow the dating pattern of other categories
- Erik9bot is now banned so it is not maintained.
Rich Farmbrough, 00:00, 31 October 2009 (UTC).
Hidden?
- Will this be a hidden version of the template? What I don't want to see is a lot of stub articles getting plastered with annoying, obnoxious templates. Is there some way of adding it so that it's not visible, yet still in a hidden category? βcommand 01:47, 31 October 2009 (UTC)
- We can make the template hidden but it is rather against consensus. Stub articles need to be referenced too. Rich Farmbrough, 18:51, 31 October 2009 (UTC).
- I too dislike the idea of very short stubs being tagged. Could the bot be programmed to recognize stubs, perhaps with a regex, and not tag those articles with unreferenced? Or maybe recognize them by byte size? -- Ϫ 18:17, 1 November 2009 (UTC)
- It could simply skip stubs, or we could have a stub=yes parameter. But stubs are dangerous: they hang around a long time un-reviewed and unreferenced. They are smaller and get less traffic, but that doesn't mean they are less prone to error. In theory stubs should be easy to find references for, since they contain very little. Also, the act of finding reference number 1 should go with some basic assessment/fixing of other aspects of the stub. One of the things that does concern me is that we get copied and used all over the place, and can easily end up using something derived from a WP article as a reference for the article itself. Rich Farmbrough, 22:02, 1 November 2009 (UTC).
We could do - Rich Farmbrough, 22:06, 1 November 2009 (UTC).
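Skipping stubs, as suggested above, could be done by recognizing stub templates with a regex or by page size. The sketch below shows both; the pattern and the `max_bytes` cut-off are hypothetical, and a stub tag alone does not prove that a page is actually short.

```python
import re
from typing import Optional

# Matches {{stub}} and {{Something-stub}} (e.g. {{Beetle-stub}}), but not
# {{Unreferenced stub}}, whose name has no hyphen before "stub".
STUB_TAG_RE = re.compile(r"\{\{\s*(?:[^{}|]*-)?stub\s*\}\}", re.IGNORECASE)


def is_tagged_stub(wikitext: str) -> bool:
    """True if the page carries a stub template."""
    return bool(STUB_TAG_RE.search(wikitext))


def should_skip(wikitext: str, max_bytes: Optional[int] = None) -> bool:
    """Skip pages carrying a stub tag and, optionally, any page whose
    wikitext is at most `max_bytes` bytes (a hypothetical size cut-off)."""
    if is_tagged_stub(wikitext):
        return True
    return max_bytes is not None and len(wikitext.encode("utf-8")) <= max_bytes
```

Keeping the size check optional lets the stub-tag rule be used alone, or combined with a byte limit when tagged stubs are suspected of being mislabelled.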
- One problem area with stubs is with organism stubs. The orphan templates and the unreferenced templates make the text of the article hard to see. It looks like there is no text at all. When an organism article is a stub it can be a one-lined stub, maybe someone didn't add a reference, and it certainly should be referenced, but by putting a template that obscures the text, you've made the article completely useless. You might as well speedy delete the article. But with the single line of text and the taxobox the organism stub is at least a starting point for a reader finding more information. Is it possible to categorize these articles? In particular, could they be categorized unreferenced by phyla or divisions or orders? It might be easy to get wikiprojects to reference the articles, then. I add references to organism articles by the boatload while cleaning them up.
- I don't see that with single line stubs, as most organism articles are, that the information is copied into reliable sources from wikipedia. Wikipedia is pretty much forbidden as a source for taxonomic information, because it uses mixed taxonomies, among other reasons. So, this is a non-issue for these articles. --69.225.3.198 (talk) 16:08, 2 November 2009 (UTC)
- The majority of organism stubs are created in chunks and can be referenced in chunks - if they aren't already, most recent ones are. And yes they can be categorised, most are already, I was looking at some beetle stubs the other day and they really fit my ideas for dynamic taxonomic categorisation. Rich Farmbrough, 22:35, 2 November 2009 (UTC).
- Is there a page where I can find unreferenced by category organism stubs? I used to have lists of articles I was going to write or reference, and I think they were in categories like this. Having them categorized by phyla, in my opinion, would be incredibly useful. I don't know about referencing in chunks, but if they can be that would be great. Still, categorizing by phyla at least, better by class, etc., would make future referencing easier. --IP69.226.103.13 (talk) 08:38, 14 November 2009 (UTC)
Unreferenced stub
- I like this proposal; I particularly like the small separate {{Unreferenced stub}}. Would these articles be identifiable in the (same or different) category? As mentioned above, finding references (or not) for short stubs is an easier body of work than full articles that are unreferenced. I would support the automated use of {{Unreferenced stub}} placed at the bottom of the article for any article with a stub marker. This is a good compromise between all the different views on unreferenced templates and stubs. JeepdaySock (AKA, Jeepday) 16:35, 2 November 2009 (UTC)
- As it stands same categories. Rich Farmbrough, 22:35, 2 November 2009 (UTC).
The reason this category was used at all is because there was not agreement to just add the unreferenced tag automatically (see Wikipedia:Bots/Requests for approval/Erik9bot 9). The arguments made there (I did not participate) would apply to this request as well. On the subject of stubs, the stub tag itself indicates that the article requires significant improvement. There is no reason to add "unreferenced", "expand", or similar templates to articles that are already marked as stubs. The stub tag itself indicates that the article is likely to be deficient in many ways. — Carl (CBM · talk) 12:54, 19 November 2009 (UTC) -
- Would those who would prefer not to have stub articles tagged be willing to jointly pursue expanding {{stub}} (This article is a stub. You can help Wikipedia by expanding it.) to something like "This article is a stub. You can help Wikipedia by expanding it with cited sources."? Jeepday (talk) 11:31, 22 November 2009 (UTC)
Opposed
- I oppose this task, as it is currently formed. I find the over-use of these cleanup templates to be garish, distracting, and unnecessary. Worse, the addition of these templates often seems arbitrary, which has led to the use of them becoming semi-permanent on many articles, which is problematic for several reasons. I bring all of these up because I feel strongly that this is not an appropriate bot activity. Bot actions should be essentially uncontroversial and easily reverted, and I don't think that this activity qualifies. Something should definitely be done with the category and the pages that are in it, but this doesn't seem to be the most effective solution in my view. We could, and likely should, have a wider discussion (outside of the RFBA process) on this issue as a whole.
— V = I * R (talk to Ω) 13:43, 19 November 2009 (UTC) - I agree - in fact I had rather optimistically assumed that the bot was adding this category to articles as an alternative to the ugly and misleading "unreferenced" tags. We certainly don't want a whole lot of automatically generated instances of this tag, that serves no purpose except to tell people what they can already see, and falsely implies that articles which cite references are somehow reliable.--Kotniski (talk) 13:45, 19 November 2009 (UTC)
- Well there are items that have remained unreferenced for 8 years. The tagged items only backlog 3 years. If you think the tag is ugly and misleading go and fix the tag. Rich Farmbrough, 22:18, 19 November 2009 (UTC).
- Having said that, a smaller tag is suggested for stubs, or indeed a sane compromise is to make the tag invisible for stubs. Why is this sane, compared with what was done before? Because it addresses items 2, 3, 4 and 5, and if 1 is still found to be a problem it can be addressed later on. Rich Farmbrough, 22:47, 19 November 2009 (UTC).
- See though, this won't actually be dealing with the real issue here at all. Namely, those articles that have been unreferenced for 8 years will still be unreferenced regardless of how many tags, categories, or anything else that is added to them. I don't object to giving smackbot or any other bot as many useful tasks as possible, and this certainly isn't about any aesthetic concerns (on my part), but this task really just plain doesn't seem useful. I don't think that either this or the original Erik9Bot approval to add the cat really took the criticisms that are being offered to heart. Is this proposal (or the existing erik9bot category) really helping the encyclopedia in some manner?
— V = I * R (talk to Ω) 00:10, 20 November 2009 (UTC) - As proposed all the articles will be addressed by Wikipedia:Unreferenced articles which will work to systematically address each article based on age. Articles get references or they get deleted, if they can't be referenced. I would say that qualifies as "helping the encyclopedia in some manner" Jeepday (talk) 00:27, 20 November 2009 (UTC)
- Assuming that said project is actually active (which is quite an assumption), is it clear that they are supportive of having their workload essentially doubled overnight? Has anyone actually notified the project, or talked to them at all? I should reiterate that the usefulness of categorization or tagging is really an entirely separate question, since we should be discussing the bot editing here. The two issues naturally conflate together somewhat, but the basic question is "should a bot do this", and my reply to that question tends to be "no".
— V = I * R (talk to Ω) 00:39, 20 November 2009 (UTC) - The project has completed 219 articles so far this month, which puts them on track to finish all the presently-tagged articles 31 years from now if they keep up the same rate. — Carl (CBM · talk) 00:48, 20 November 2009 (UTC)
- OK, well I just added a notification about this discussion to Wikipedia talk:Unreferenced articles. I don't want to speak for them, and this is really a secondary concern for me regardless. My main issue is essentially the same as the objections that were ignored in regards to Erik9Bot. The simple fact is that this is clearly not an uncontroversial task, and therefore it should not receive approval.
— V = I * R (talk to Ω) 12:45, 20 November 2009 (UTC) - I have some familiarity with Wikipedia:Unreferenced articles; I expect there will be no objections to adding these articles to the tasks. We will start with the oldest ones and move forward. Jeepday (talk) 12:38, 21 November 2009 (UTC)
- What about the fact that erik9bot tagged these articles a long time ago and that many of them may have references now? Will articles with references automatically be excluded, even if erik9bot has categorized the article as needing references? - ʄɭoʏɗiaɲ τ ¢ 18:15, 19 November 2009 (UTC)
- I have run a clean-up and taken out those with references, and would do so again. This shows why the hidden cat was a bad compromise. Rich Farmbrough, 22:18, 19 November 2009 (UTC).
- I ran some code this morning to check, and it looks like about 70,000 of the 115,000 articles in the category are tagged as stubs. — Carl (CBM · talk) 18:18, 19 November 2009 (UTC)
- That excludes those maths stubs where you removed the category? Stubs do need references. Certainly they don't need expand tags which I have removed many of. Rich Farmbrough, 22:21, 19 November 2009 (UTC).
- Of course; I ran the counting code again this morning to get a fresh count. Although the number of math articles I can manually inspect is much less than 70,000. The stated goal of the Erik9 category was to let people review the articles by hand, and I followed that goal by looking at the wiki source code of a collection of math stubs in the category to make sure the articles really were tagged as stubs before removing the category. To be fair, I also removed {{unsourced}} from stubs when I come across it. Stubs do need to be verifiable, but they do not need most maintenance templates. — Carl (CBM · talk) 22:54, 19 November 2009 (UTC)
- "The stated goal of the Erik9 category was to let people review the articles by hand": very true. This was, I believe, mostly because there was a concern that many of the articles could have poorly formatted references that the bot would not recognize. I personally have reviewed and tagged somewhere between a few hundred and a couple thousand from the category, working from Wikipedia:CiterSquad. I found maybe one in 200 that had poorly formatted references. Having also worked on Wikipedia:Unreferenced articles, I can tell you that the number of articles that currently have {{Unreferenced}} or a variation and also have numerous well-formatted references is far in excess of any that may be incorrectly identified. When the process was suggested at Wikipedia:Bots/Requests for approval/Erik9bot 9, I was hesitant to have a bot adding {{Unreferenced}} or {{Unreferenced stub}}. I now have lots of experience with these bot-identified articles, and what has been marked is appropriate for tags in the unreferenced family. Jeepday (talk) 23:40, 19 November 2009 (UTC)
- Over half the articles in the Erik9 category are tagged as stubs already. Both in the Erik9bot BRFA, and higher above on this one, various people have expressed the opinion that stubs should not be automatically given "unreferenced" tags when they are already tagged as stubs.
- In particular: in the previous BRFA, Erik9 said, "Based on comments by Gimmetrow, Antandrus, Keith D, and Geogre opposing the automated addition of template:unreferenced to articles, I am revising the task for which approval is requested: ...". In this BRFA, Betacommand, OlEnglish, Ohm's Law, and Kotniski have objected. I also object to the automated addition of "unreferenced" to stubs. So it seems there is quite a bit of objection to the idea of mass tagging. — Carl (CBM · talk) 23:48, 19 November 2009 (UTC)
- This is a good point, imo. Stubs are already classified by the projects, animal stubs as feline, shark, moth stubs, etc., etc. Plants also. Maybe eliminating stubs from this list is a good idea. --IP69.226.103.13 (talk) 00:15, 20 November 2009 (UTC)
- Just because they have stub tags does not mean they are stubs (see User:Triddle/stubsensor). All articles, including stubs, require references (WP:V); any article that has no references is appropriate for an unreferenced template (Template:Unreferenced). I appreciate that many users have opinions counter to this statement, but the application of the tags proposed is completely within policy. Jeepday (talk) 00:36, 20 November 2009 (UTC)
- WP:V requires that all articles are verifiable, not that every article must explicitly list references right now. Of course the goal is to eventually reference all articles, and the goal is also to expand all articles so that they are not tagged as stubs. In the meantime, articles that are still tagged as stubs can be expected to have many problems, so adding extra cleanup tags to them is just redundant. When the articles are expanded beyond stubs, references will be added, and if they are not added, then an unreferenced tag is reasonable. — Carl (CBM · talk) 00:45, 20 November 2009 (UTC)
Huge category
- This category is so huge it's worthless.
- Yes! That's exactly point #2 above. Rich Farmbrough, 22:19, 19 November 2009 (UTC).
- Can orphans be subcategorized some way, like orphaned angiosperm stubs? --IP69.226.103.13 (talk) 19:48, 19 November 2009 (UTC)
- We aren't talking about orphans - that's another matter - see my Village Pump proposal to de-deprecate orphans. But yes it is possible to categorise unreferenced angiosperm stubs if that would be useful. Or by classis, or regnum or whatever. Rich Farmbrough, 22:18, 19 November 2009 (UTC).
- Anything that makes it possible for an editor to jump in and start dealing with a list of 30,000 or 70,000 articles should be done. No one's going to tackle a list they can't make a dent in, or a list where 27,600 articles are outside of their area. But, sometimes an editor will be willing to tackle a list that is manageable.
- I will ask at the projects how they would like their stubs classified. Animals lists probably by order, but plants and fungi may be different. Removing at least stubs of living things to lists that are workable would be really useful, imo. --IP69.226.103.13 (talk) 22:57, 19 November 2009 (UTC)
Year articles and lists
I notice that a lot of the articles are years or other navigational pages (are there disambigs in this category too?). I was under the impression that these do not require a list of references. OrangeDog (τ • ε) 13:31, 20 November 2009 (UTC)
- Disambigs will have the category removed (I think I have already done this). List articles may need references, or they may de facto delegate some referencing. For example, List of harmonium players might be a pure bulleted list of links, in which case, provided each linked article supported the harmonium hypotheses, it would be fine (but better to use a category?). If, however, it included an unlinked bullet:
Then a citation would be needed. The same applies to year articles: they probably mostly don't need citations, just checking against the articles they link to. Rich Farmbrough, 14:44, 20 November 2009 (UTC).
- A discussion on this topic is in the archives at Wikipedia_talk:CiterSquad/Archive_1#Tagging_date_articles. If anyone is aware of anything else relevant, please point it out. Jeepday (talk) 12:32, 21 November 2009 (UTC)
- Strong Support Jeepday (talk) 12:40, 21 November 2009 (UTC)
- OK with me, although I don't think there is much reason to think the template is useful. — Carl (CBM · talk) 13:05, 21 November 2009 (UTC)
- If it categorizes it in some useful way, yes. If not, as in just puts it in unreferenced category, it's still useful to gather unreferenced articles together knowing they require work. --IP69.226.103.13 (talk) 07:16, 22 November 2009 (UTC)
- Support this. Rettetast (talk) 11:52, 22 November 2009 (UTC)
- Support, but I can understand the reluctance of some to embrace even {{Unreferenced stub}}. Some people believe the {{stub}} tag sufficiently indicates that the article is in need of significant work. Other people feel more strongly about highlighting the lack of references, as a warning to novice readers and as an inspiration to improve the article by adding references, while providing links for how and why. Jeepday (talk) 12:50, 21 November 2009 (UTC)
- Numerous editors have disagreed with this, both on this BRFA and the Erik9 BRFA. The stub tag already indicates that the article requires a lot of expansion. When the article is expanded until it isn't a stub, if it still has no references, then tagging it as unreferenced is more reasonable. But when we are talking about a typical one- to two-sentence stub, the lack of references is overshadowed by the lack of content in general. Anyone who adds to the content is likely to add at least one reference, and anyone who adds a reference is also likely to know enough to expand the content if they like. So, for stubs, expanding the article and adding references are two sides of the same coin. For that reason, the stub tag itself is enough of a maintenance tag on stub articles. — Carl (CBM · talk) 13:10, 21 November 2009 (UTC)
- Again, an unreferenced echinoderm stub is useful, but once an article is tagged as a stub and then as an unreferenced stub, the main-space templates overpower the text while failing to categorize it meaningfully, so I don't see the point. --IP69.226.103.13 (talk) 07:17, 22 November 2009 (UTC)
- I would support this but there is no consensus for this. What about using the tag, but hiding it in some way. A |bot=yes parameter? This would categorize articles in the dated cats and make it easier for wikiprojects or editors who work on specific categories to identify the articles needing work using tools like WP:CATSCAN or Cleanup listings, and at the same time hide the big bad tag. Rettetast (talk) 11:59, 22 November 2009 (UTC)
- The articles already have a hidden marker, namely the category that Erik9bot placed on them. How does changing from one hidden marker to another help? — Carl (CBM · talk) 19:00, 27 November 2009 (UTC)
- Part of the rationale is based on Wikipedia:Administrators'_noticeboard/IncidentArchive566#Erik9_appears_to_be_the_sock_of_a_banned_user. While there is no question that the actual work of the bot was within policy, having a category that includes the name of a banned user has created controversy. If the stub articles remain marked as unreferenced (new category), then they will be available for work in a category for WP:CSQ or other projects. There are stubs with references and there are stubs without; only stubs without references will be in the category. These are relatively easy to address and will be good for users not up to referencing projects like WP:FRC. If I have not fully addressed your question, please let me know. JeepdaySock (AKA, Jeepday) 20:45, 27 November 2009 (UTC)
Revised proposal
Function details:
- If the article already has an unref tag or similar, the bot will simply remove the category.
- If it has evidence of references, the bot will simply remove the category.
- If it is a stub, an invisible tag will be added. This will include a parameter to show it was automatic and, if the article contains a taxobox, the appropriate taxon or taxa. The template will categorise the stub accordingly.
- Otherwise a simple tag will be added, and the category removed.
- Rich Farmbrough, 22:54, 26 November 2009 (UTC).
- Works for me. JeepdaySock (AKA, Jeepday) 12:00, 27 November 2009 (UTC)
- I still fundamentally fail to see the point of any of this. How does it help the encyclopedia to add tags to thousands of articles, informing people of something they can see already? --Kotniski (talk) 13:09, 27 November 2009 (UTC)
- It is a question of workflow, and a significant amount of cleanup work is being driven by these categories. If you object to visible tags, that is a different question; see for example VP Orphans. That answers your question. Now, the hidden assumption that Joe Random will look at an article and say "Oops, not cited." seems to me false. Joe may never have read a learned journal, or even a book with footnotes, for a whole bunch of reasons: age, socio-economic background, accessibility, education, interest, etc. Rich Farmbrough, 11:24, 28 November 2009 (UTC).
- Well yes, I don't object to the categories, but it is the question of visible tags that concerns me. Looking at the orphans thread you link to I see I'm not alone in being opposed to the "drive-by tagging" of which this bot's proposed activity would consist. Unless there's been a proper wide community discussion that shows there is general consensus for this activity (i.e. that we opponents are a small minority) then I don't think it's appropriate for a bot to do it.--Kotniski (talk) 12:03, 28 November 2009 (UTC)
- Do we have community consensus for each of the tags? --IP69.226.103.13 (talk) 20:26, 28 November 2009 (UTC)
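The revised proposal's function details amount to a four-branch decision made once per article in the Erik9bot category. The following is a minimal Python sketch of that decision, for illustration only; it is not the operator's code. Everything concrete in it is an assumption: the regular expressions used to detect unreferenced-family tags, references, stub templates and taxoboxes; the hidden template name `Unreferenced stub hidden` with its `auto=yes` and `taxon=` parameters; the choice of the taxobox `classis` field; and the category name in the usage line. It also assumes the category is removed in the stub branch, which the proposal implies but does not state.

```python
import re

# Illustrative detection patterns (assumptions, not the bot's actual rules).
UNREF_TAG = re.compile(r"\{\{\s*(?:unreferenced|unsourced)", re.IGNORECASE)
REF_EVIDENCE = re.compile(
    r"<ref[\s>/]|\{\{\s*cite|==\s*(?:references|sources|bibliography)\s*==",
    re.IGNORECASE,
)
STUB_TAG = re.compile(r"\{\{[^{}|]*stub\s*[|}]", re.IGNORECASE)
TAXOBOX = re.compile(r"\{\{\s*taxobox", re.IGNORECASE)


def taxon_from_taxobox(text, rank):
    """Return the unlinked value of a taxobox field such as 'classis', or None."""
    m = re.search(r"\|\s*" + re.escape(rank) + r"\s*=\s*\[*([^\]|\n]+)", text)
    return m.group(1).strip() if m else None


def process(text, category, rank="classis"):
    """Apply the four revised-proposal rules to one article's wikitext.

    Returns (new_text, action). The tracking category is removed in every
    branch (an assumption for the stub branch).
    """
    cat_re = re.compile(
        r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*(?:\|[^\]]*)?\]\]\n?",
        re.IGNORECASE,
    )
    body = cat_re.sub("", text)

    # 1. Already carries an unreferenced-family tag: just drop the category.
    if UNREF_TAG.search(body):
        return body, "removed category (already tagged)"
    # 2. Shows evidence of references: just drop the category.
    if REF_EVIDENCE.search(body):
        return body, "removed category (has references)"
    # 3. Stub: add a hidden tag marked as automatic, plus a taxon if there is a taxobox.
    if STUB_TAG.search(body):
        params = "auto=yes"
        taxon = taxon_from_taxobox(body, rank) if TAXOBOX.search(body) else None
        if taxon:
            params += "|taxon=" + taxon
        # "Unreferenced stub hidden" is a hypothetical template name.
        return "{{Unreferenced stub hidden|" + params + "}}\n" + body, "hidden stub tag"
    # 4. Otherwise: add a plain visible tag.
    return "{{Unreferenced}}\n" + body, "simple tag"


if __name__ == "__main__":
    cat = "Articles lacking sources (Erik9bot)"  # assumed category name
    page = "A moth.\n{{Moth-stub}}\n[[Category:" + cat + "]]\n"
    print(process(page, cat))
```

Checking the "already tagged" and "has references" branches before the stub branch keeps the bot from adding a second tag to articles that either already carry one or already cite something, which matches the order of the bullets in the proposal.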
Operator: Chris
Automatic or Manually assisted: Auto
Programming language(s): PHP, my classes
Source code available: yes
Function overview: Remapping the old sockpuppet templates to the new single template
Edit period(s): One time
Estimated number of pages affected:
Exclusion compliant (Y/N): N
Already has a bot flag (Y/N): Y
Function details: Full details of the remapping here
Discussion
It's a suitable bot task, but I notice that there is some debate between two templates to be used as replacements. Avi and Foxy Loxy have each come up with a template, but I'm not seeing a specific consensus for which one should be used. I'd suggest we wait until consensus and the operation of the template are clearer: in the discussion here, some of which occurred after the filing of this BRFA, there is some debate about the use of the templates and whether there should be one, two, different flags, etc. Fritzpoll (talk) 16:13, 29 October 2009 (UTC)
Bots in a trial period
Bots that have completed the trial period
Approved requests
Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.
Denied requests
Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.
Expired/withdrawn requests
These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.