“GUI DEiXTo” feature list
- user-friendly graphical interface – no programming required
- enhanced tree-based extraction rules (wrappers)
- HTML tag filtering (sometimes, ignoring some tags makes life easier)
- tolerates structural variations in the HTML source code of the record instances
- fast, flexible, high-performance tree pattern matching algorithm
- most of the time, 100% precision and recall can be achieved
- automatic simple form submission
- multi-record, multi-page and multi-URL extraction modes
- regular expression support
- can follow “Next Page” links with adjustable crawling depth
- can create RSS feeds from any web source
- can export results to XML and tab-delimited formats
- can extract text, URLs and HTML source code
- XML encoded wrapper project files (.wpf) – can be executed at will
- wrapper files are compatible with DEiXTo Executor
- command-line execution, so extraction tasks can be scheduled with MS Scheduler
- last but not least, it’s freeware!
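
To illustrate why HTML tag filtering helps with structural variations between record instances, here is a minimal Python sketch. It is not DEiXTo's own matching algorithm: the ignored-tag set, the `TagSequence` class and the `tag_signature` helper are hypothetical names used only for this example. It shows that two records with different inline markup reduce to the same tag sequence once formatting tags are ignored.

```python
from html.parser import HTMLParser

# Hypothetical set of tags to ignore (an assumption for this sketch).
IGNORED_TAGS = {"b", "i", "span", "font", "br"}


class TagSequence(HTMLParser):
    """Collects opening tags in document order, skipping ignored ones."""

    def __init__(self, ignored):
        super().__init__()
        self.ignored = ignored
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.ignored:
            self.tags.append(tag)


def tag_signature(html, ignored=IGNORED_TAGS):
    """Return the list of non-ignored opening tags found in an HTML fragment."""
    parser = TagSequence(ignored)
    parser.feed(html)
    parser.close()
    return parser.tags


# Two record instances with the same layout but different inline formatting.
record_a = '<div><a href="/1">First</a><p>Plain text</p></div>'
record_b = ('<div><a href="/2"><b>Second</b></a>'
            '<p><span>Styled</span> text<br></p></div>')

print(tag_signature(record_a))         # ['div', 'a', 'p']
print(tag_signature(record_b))         # ['div', 'a', 'p']
print(tag_signature(record_b, set()))  # ['div', 'a', 'b', 'p', 'span', 'br']
```

With filtering on, both records yield the same sequence, so one rule can cover both. With filtering off, the second record looks structurally different.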
“DEiXTo Executor” feature list
- portable, efficient and fast command-line executor of GUI DEiXTo wrappers
- provides options and flexibility that you cannot get with GUI DEiXTo
- supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet (.ods)
- provides database support via DBI (the Database independent interface for Perl) and a dbconfig file
- supports HTML output using an HTML template processor and an editable template file
- command-line options can override those in .wpf files
- overwrite, append and prepend output modes for all supported formats
- proxy support
- can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)
- can sleep for random time intervals between HTTP requests to avoid annoying webmasters
- it is free and open source, distributed under the GNU General Public License (GPL) Version 3!
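
The idea of sleeping for random intervals between HTTP requests can be sketched in a few lines of Python. This is an illustration only, not DEiXTo Executor's code: the function name `fetch_politely`, its parameters and the 1 to 5 second delay range are all assumptions made for the example. The fetcher and the sleep function are injectable, so the demo runs without touching the network.

```python
import random
import time


def fetch_politely(urls, fetch, min_delay=1.0, max_delay=5.0,
                   sleep=time.sleep, rng=random):
    """Fetch each URL in turn, pausing a random interval between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause only between requests, not before the first one.
            sleep(rng.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Demo with a stand-in fetcher and a recording "sleep" (no real waiting).
pauses = []
pages = fetch_politely(
    ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    fetch=lambda url: "<html>" + url + "</html>",
    sleep=pauses.append,
)
print(len(pages), "pages fetched with", len(pauses), "pauses")
```

In real use, `fetch` would be an actual HTTP client call and `sleep` would be left at its default, so the process genuinely waits between requests.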
“The user interface is clean and efficient and the user manual looks more professional than most commercial user guides you see out there. With DEiXTo, I feel like I stumbled upon a hidden treasure!”