Technology

“GUI DEiXTo” feature list

  • user friendly graphical interface – no programming required
  • enhanced, tree based, extraction rules (wrappers)
  • HTML tag filtering (sometimes, ignoring some tags makes life easier)
  • tolerates structural variations in the HTML source code of the record instances
  • fast, flexible and high performance tree pattern matching algorithm
  • most of the time, 100% precision and recall can be achieved
  • automatic simple form submission
  • multi-record, multi-page and multi-URL extraction modes
  • regular expression support
  • can follow “Next Page” links with adjustable crawling depth
  • can create RSS feeds from any web source
  • can export results to XML and tab delimited formats
  • can extract text, URLs and HTML source code
  • XML encoded wrapper project files (.wpf) – can be executed at will
  • wrapper files are compatible with DEiXTo Executor
  • command line execution, so extraction tasks can be scheduled with the Windows Task Scheduler
  • last but not least, it’s freeware!
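
As a rough illustration of the XML and tab delimited export listed above, the following Python sketch serializes a few extracted records both ways. It is not DEiXTo code and does not reproduce DEiXTo's actual output layout; the record fields (title, url), the element names (records, record) and the function names are assumptions made only for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extracted records; the field names are illustrative only.
records = [
    {"title": "First item", "url": "http://example.com/1"},
    {"title": "Second item", "url": "http://example.com/2"},
]

def to_tsv(rows):
    """Serialize a list of dicts as tab delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_xml(rows):
    """Serialize a list of dicts as one <record> element per row."""
    root = ET.Element("records")  # assumed root element name
    for row in rows:
        rec = ET.SubElement(root, "record")
        for name, value in row.items():
            # Each field becomes a child element holding the extracted text.
            ET.SubElement(rec, name).text = value
    return ET.tostring(root, encoding="unicode")

tsv_text = to_tsv(records)
xml_text = to_xml(records)
```

The csv module takes care of quoting fields that contain tabs or newlines, and ElementTree escapes characters such as & and < in extracted text, which is why neither format is assembled by plain string concatenation here.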

“DEiXTo Executor” feature list

  • portable, efficient and fast command line executor of GUI DEiXTo wrappers
  • provides options and flexibility that you cannot get with GUI DEiXTo
  • supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet (.ods)
  • provides database support via DBI (the Database Independent interface for Perl) and a dbconfig file
  • supports HTML output using an HTML template processor and an editable template file
  • command line options can override those stored in .wpf files
  • overwrite, append and prepend output modes for all supported formats
  • proxy support
  • can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)
  • can sleep for random time intervals between HTTP requests to avoid annoying webmasters
  • it is free and open source, distributed under the GNU General Public License (GPL) Version 3!
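
The idea of sleeping for random intervals between HTTP requests can be sketched in a few lines of Python. This is a minimal illustration of the technique, not DEiXTo Executor's implementation (which is written in Perl); the function name fetch_politely, its parameters and the default delay bounds are assumptions chosen for the example.

```python
import random
import time

def fetch_politely(urls, fetch, min_delay=1.0, max_delay=5.0,
                   sleep=time.sleep, rng=random):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    fetch is any callable that retrieves one URL; sleep and rng are
    injectable so the behaviour can be checked without real waiting.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Pause only between requests, not before the first one.
            sleep(rng.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

Passing a real downloader as fetch (for example a small wrapper around urllib.request.urlopen) turns this into a simple polite crawler loop; randomizing the pause avoids hitting a server at a fixed, machine-like rhythm.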

 

“Since we discovered DEiXTo, we realized this was the tool we needed to crawl the web in search of any piece of information related to our clients. Its simplicity makes it easy to learn and you get results from your first attempt. We have made a set of robots and via executor, they extract information from several web resources to create a custom report.”

gestiondereputacion.com
