PhantomJS and DEiXTo

Recently we experimented with PhantomJS, a headless WebKit browser that can serve a wide variety of purposes, such as web browser automation, site scraping, website testing, SVG rendering and network monitoring. It’s a very interesting tool that could be used in combination with DEiXToBot. More specifically, it can enable DEiXToBot agents to access hard-to-reach web pages, such as those that can only be reached through user interaction or that rely heavily on JavaScript and AJAX calls.
Check out the blog article entitled “PhantomJS & finding pizza using Yelp and DEiXTo!”

Posted in News

DEiXTo & the Global Cancer Collaboratory project!

We are glad to announce that DEiXTo will be used by the Open Health Systems Laboratory (OHSL) in their Global Cancer Collaboratory project. This project aims to aggregate information from cancer center web sites into central VIVO hubs, in order to bootstrap the creation of research networks. A good part of their work will involve using DEiXTo to extract structured text from existing web sites. They will then transform it and map it to the set of ontologies contained in the VIVO platform.

Posted in News

Hello (scraping) world (again)!

Here we are!

We moved our main DEiXTo site to a WordPress platform, aiming to gather all DEiXTo material here. The DEiXTo Wiki and probably the DEiXTo Blog will be integrated here soon as well. We also plan to allow other users to post here and contribute material (guides, how-tos, tutorials, etc.).

Until everything is fixed and properly set up, we apologize for any visible mess or site misbehavior!

Posted in News

A Stemmer for the Greek Language

We are glad to host a JavaScript-based Greek Stemmer. It’s not our own work, but we do plan to use it extensively in the near future. We will definitely make any useful “side-product” available. Stay tuned!

Posted in News

DEiXTo powers a Code Search Engine

DEiXTo powers a federated, open source code search engine called OCEAN. A paper explaining how this was made possible will be presented at the 38th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2012).

Posted in News