Welcome to DEiXTo!
DEiXTo (or ΔEiXTo) is a powerful web data extraction tool that is based on the W3C Document Object Model (DOM). It allows users to create highly accurate “extraction rules” (wrappers) that describe what pieces of data to scrape from a website. DEiXTo consists of three separate components:
- GUI DEiXTo, an MS Windows™ application implementing a friendly graphical user interface that is used to manage extraction rules (build, test, fine-tune, save and modify). This is all that you need for small-scale extraction tasks.
- DEiXToBot, a Perl module implementing a flexible and efficient Mechanize agent (essentially a browser emulator) capable of extracting data of interest using patterns generated with GUI DEiXTo. It builds on best-of-breed Perl technology and allows extensive customization, which makes tailor-made solutions possible.
- DEiXTo CLE (Command Line Executor), a stand-alone, DEiXToBot-based, cross-platform utility that can apply an extraction rule to multiple target pages in bulk and produce structured output in a variety of formats.
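To give a feel for what a DOM-based extraction rule does, here is a minimal Python sketch. It is not DEiXTo's rule format, algorithm, or API: the `RULE` tuple layout, the `match` and `extract` helpers, and the sample page are all hypothetical, chosen only to illustrate the idea of matching a small tree pattern against a page's DOM and collecting the marked nodes into records.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed sample page (hypothetical data, for illustration only).
PAGE = """
<html><body>
  <div class="item"><h2>Blue Widget</h2><span>9.99</span></div>
  <div class="item"><h2>Red Widget</h2><span>12.50</span></div>
  <p>Unrelated footer text</p>
</body></html>
"""

# Hypothetical rule layout: (tag name, field name or None, child rules).
# Nodes with a field name have their text captured into the output record.
RULE = ("div", None, [("h2", "title", []), ("span", "price", [])])


def match(node, rule, record):
    """Return True if `node` matches `rule`, storing captured text in `record`."""
    tag, field, child_rules = rule
    if node.tag != tag:
        return False
    children = list(node)
    if len(children) < len(child_rules):
        return False
    # Compare the leading children of the node with the rule's children, in order.
    for child, child_rule in zip(children, child_rules):
        if not match(child, child_rule, record):
            return False
    if field is not None:
        record[field] = (node.text or "").strip()
    return True


def extract(page, rule):
    """Try the rule at every element of the page and return the matching records."""
    root = ET.fromstring(page.strip())
    records = []
    for node in root.iter():
        record = {}
        if match(node, rule, record):
            records.append(record)
    return records


records = extract(PAGE, RULE)
for record in records:
    print(record)
```

The sketch tries the pattern at every element and keeps one record per match, so each repeated `div` on the page yields one row of structured output. A real wrapper would also have to cope with malformed HTML, optional nodes, and attribute conditions, none of which are handled here.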
“Quickly and easily pulled data from tables on the web. Used the GUI version to build a ‘model’ that I could then use with a headless batch file using the CLE version. Saved me several days of programming! So easy to use! Fotis and Kostas were both so willing to accommodate a special request I made. You guys are great! I can’t wait to try it on another project!”
Andrew F.
DEiXTo can handle a wide range of websites with high precision and recall. It offers an arsenal of features for constructing well-engineered extraction rules. Wrappers built with GUI DEiXTo can be scheduled to run automatically, providing automated access to resources of interest and saving users a lot of time, energy, and repetitive effort.
DEiXTo is developed by Fotis Kokkoras and Kostas Ntonas.
ΔEiXTo is an acronym for Data Extraction Tool.
First of all, Δ is the Greek equivalent of D. Now, you are probably wondering what this “i” character is all about. Well, in Greek “ΔEIXTO” (pron. dechto) is the imperative form of “point at”, which is what DEiXTo users do inside a browser window when they start building an extraction rule. Now you know… 😉