Arachnode.net Information & Arachnode.net Links at HealthHaven.com
arachnode.net
arachnode.net homepage
Developer(s) Mike Anderson
Stable release 1.2 / 2009-08-02; 4 months ago
Written in C#
Operating system Windows
Type Web crawler
License GPL
Website http://arachnode.net/

arachnode.net is a .NET web crawler written in C# that uses SQL Server 2008 and Lucene. It is released under the GPL.

Features

  • Lucene integration allows for full-text searching through a familiar web interface.
  • Can be configured to run any number of threads and to use as much or as little processor time and memory as required.
  • A plug-in architecture lets custom pre- and post-request crawl rules and actions be added without recompiling the source. The same rules-and-actions architecture can be used to add crawling enhancements such as federation, partitioning and distributed caching.
  • Pre- and post-request rules govern address and content filtering, robots.txt behavior, request frequency and crawl depth.
  • 250 stored procedures, views and functions designed for use with SQL Server Analysis Services and other business intelligence software. These procedures and views address trending, popularity, term extraction, phrase extraction and many other common analysis and reporting needs.
  • Pre-configured with several SQL Server Integration Services (SSIS) packages that extract and prepare key information from collected data for text mining and analysis.
  • Can extract, store and index all available Exif data fields from images found during a crawl.
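arachnode.net itself is written in C#, and this article does not describe its plug-in interfaces. The Python sketch below only illustrates the general idea behind composable pre- and post-request rules for address filtering, crawl depth, robots.txt and request frequency. Every name in it (`CrawlRequest`, `address_filter`, `robots_rule`, `passes` and so on) is hypothetical and is not part of arachnode.net.

```python
# Hypothetical rule pipeline for illustration; NOT arachnode.net's plug-in API.
import re
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlRequest:
    url: str
    depth: int
    content_type: Optional[str] = None  # set once a response has arrived


# A rule returns True when the request may proceed.
Rule = Callable[[CrawlRequest], bool]


def address_filter(pattern: str) -> Rule:
    """Pre-request: allow only URLs matching a regular expression."""
    compiled = re.compile(pattern)
    return lambda req: compiled.search(req.url) is not None


def max_depth(limit: int) -> Rule:
    """Pre-request: refuse links found more than `limit` hops from a seed."""
    return lambda req: req.depth <= limit


def robots_rule(robots_txt: str, user_agent: str) -> Rule:
    """Pre-request: honour an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda req: parser.can_fetch(user_agent, req.url)


def min_interval(seconds: float, clock: Callable[[], float] = time.monotonic) -> Rule:
    """Pre-request: require `seconds` between accepted requests to one host."""
    last_seen: Dict[str, float] = {}

    def rule(req: CrawlRequest) -> bool:
        host = urlparse(req.url).netloc
        now = clock()
        if host in last_seen and now - last_seen[host] < seconds:
            return False
        last_seen[host] = now
        return True

    return rule


def content_type_filter(allowed: List[str]) -> Rule:
    """Post-request: keep only responses whose media type is allowed."""
    return lambda req: (req.content_type or "").split(";")[0].strip() in allowed


def passes(rules: List[Rule], req: CrawlRequest) -> bool:
    # all() stops at the first failing rule, so a rate limiter placed last
    # only records requests that every earlier rule accepted.
    return all(rule(req) for rule in rules)


if __name__ == "__main__":
    pre_request = [
        address_filter(r"^https://example\.com/"),
        max_depth(3),
        robots_rule("User-agent: *\nDisallow: /private/\n", "examplebot"),
        min_interval(1.0),
    ]
    req = CrawlRequest("https://example.com/private/report", depth=1)
    print(passes(pre_request, req))  # False: robots.txt disallows /private/
```

Because each rule is a plain function, new rules can be added to a list without modifying the existing ones. This mirrors, in spirit only, the idea of extending a crawler through plug-ins rather than recompiling it.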

Applications

Content Aggregation: Use for personal content aggregation, crawling intranets of any size or crawling the Internet as a whole. Discovered content is parsed and stored into multiple configurable forms and locations.
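At the heart of any crawl, whether of an intranet or the wider Internet, is a loop that downloads a page, extracts its links and queues the new ones. The Python sketch below shows that step using only the standard library; it is not arachnode.net's implementation, and the `LinkExtractor` and `Frontier` names are hypothetical.

```python
# Hypothetical helpers for illustration; not part of arachnode.net.
from collections import deque
from html.parser import HTMLParser
from typing import Deque, List, Set, Tuple
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) links from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, then drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


class Frontier:
    """Breadth-first queue of (url, depth) pairs that ignores URLs already seen."""

    def __init__(self, seeds: List[str]) -> None:
        self.queue: Deque[Tuple[str, int]] = deque((url, 0) for url in seeds)
        self.seen: Set[str] = set(seeds)

    def add(self, urls: List[str], depth: int) -> None:
        for url in urls:
            if url not in self.seen:
                self.seen.add(url)
                self.queue.append((url, depth))

    def pop(self) -> Tuple[str, int]:
        return self.queue.popleft()

    def __len__(self) -> int:
        return len(self.queue)
```

A crawl loop would pop a URL, download it, pass the body to `extract_links`, and add the results at `depth + 1`. Deduplicating at enqueue time keeps each page from being fetched more than once.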

Research and Analysis: Extract, collect and parse downloaded content into multiple forms, including XML. SSIS packages and Common Language Runtime (CLR) functions extract terms and phrases from text content, and more than 250 stored procedures, views and functions help jumpstart SQL Server Analysis Services and other text mining applications.
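The article does not document arachnode.net's XML format. As a hedged illustration of converting a downloaded page into one such form, this Python sketch uses the standard library to separate the title and visible text of an HTML page and write them out as XML. The `page`, `title` and `text` element names and the `page_to_xml` function are invented for the example.

```python
# Illustrative only: the XML element names below are hypothetical.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Separates the <title> from visible body text, skipping script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: List[str] = []
        self.text_parts: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def page_to_xml(url: str, html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace left over from the markup.
    title = " ".join("".join(extractor.title_parts).split())
    text = " ".join(" ".join(extractor.text_parts).split())

    page = ET.Element("page", attrib={"url": url})
    ET.SubElement(page, "title").text = title
    ET.SubElement(page, "text").text = text
    return ET.tostring(page, encoding="unicode")  # ElementTree escapes &, <, >
```

Emitting XML through a library rather than by string concatenation ensures that characters such as `&` in the page text are escaped correctly.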

Search: Discovered content is indexed and stored in Lucene indexes and can be searched through a familiar Web interface.
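Lucene's internals are far richer (analyzers, relevance scoring, on-disk segments), and the article does not say how arachnode.net calls it. The core data structure behind this kind of full-text search is an inverted index, which maps each term to the documents that contain it. The toy Python version below, with hypothetical names, answers boolean AND queries over such an index; it is not Lucene's API.

```python
# Toy inverted index; not Lucene's API or arachnode.net's search code.
import re
from collections import defaultdict
from typing import DefaultDict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of document ids (the "postings" for that term)
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids creating empty postings for unknown terms.
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return sorted(results)


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("page-1", "A .NET web crawler written in C#")
    index.add("page-2", "Full-text search with Lucene")
    index.add("page-3", "Crawler output indexed for web search")
    print(index.search("web crawler"))  # ['page-1', 'page-3']
```

Intersecting posting sets means a query only touches the documents listed under its terms, rather than scanning every stored page.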

Text Mining: Extract words, phrases, tags and text from discovered content.
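The SSIS packages and CLR functions mentioned above are not described in detail here. As a rough, hypothetical sketch of the kind of term and phrase extraction involved, the Python code below counts frequent words and two-word phrases (bigrams) after removing a small, invented stop-word list.

```python
# Illustrative term/phrase counting; not arachnode.net's SSIS or CLR code.
import re
from collections import Counter
from typing import List, Tuple

# A deliberately small, hypothetical stop-word list.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is",
             "of", "on", "or", "the", "to", "with"}
WORD = re.compile(r"[a-z0-9]+")


def words(text: str) -> List[str]:
    return WORD.findall(text.lower())


def top_terms(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """Most frequent single words, ignoring stop words."""
    counts = Counter(w for w in words(text) if w not in STOPWORDS)
    return counts.most_common(n)


def top_phrases(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """Most frequent two-word phrases (bigrams) containing no stop word."""
    tokens = words(text)
    bigrams = Counter(
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )
    return bigrams.most_common(n)
```

This sketch ignores punctuation, so bigrams can span sentence boundaries. A production pipeline would usually split text into sentences first and use a larger stop-word list or a statistical measure of phrase strength.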

Education: Learn crawling techniques from introductory to advanced, along with features of the .NET Framework and SQL Server 2008, including full-text indexing, multi-threading, caching, reflection, interfaces, object-oriented concepts, SQL Common Language Runtime (CLR) functions and regular expressions.
