If making extraction rules isn’t second nature to you, we can do it for you. Most of the time we do it for fun, but if your extraction task is complex and takes a lot of time, we do charge a bit. For hard cases that require multiple patterns and cooperative wrappers, we could probably write customized DEiXToBot-based Perl scripts that can do the job. Tell us by email exactly what the task is, providing the URL and describing the parts of the page you are interested in.
We can definitely help you to:
- make your legacy web-site smartphone-friendly
- quickly populate product catalogues with full specifications
- monitor prices of the competition
- transform the contents of a digital library into OAI-PMH or another suitable format
- get reliable feeds from stock markets
- prepare large, focused datasets for scientific tasks (e.g. data mining)
- perform data mining tasks for you (classification, clustering, association rules, opinion mining, summarization, etc.)
- build alerting web services that inform you when something, somewhere on the web, changes
- build personalized information applications for you or your clients
- extract and summarize large volumes of text
- <your extraction task goes here!>
“Quickly and easily pulled data from tables on the web. Used the GUI version to build a ‘model’ that I could then use with a headless batch file using the CLE version. Saved me several days of programming! So easy to use! Fotis and Kostas were both so willing to accommodate a special request I made. You guys are great! I can’t wait to try it on another project!”
Getting data from many unstructured web pages, probably in a repetitive fashion with extensive copy-paste operations, is tedious and time-consuming. Wouldn’t it be nice to define the content you want from a web page once and then have an application do the laborious job for you? We at deixto.com can help you accomplish this or, even better, do it for you for free or at a small cost!
We can provide you with either the extraction rules to gather the data yourself or the clean, structured data. Check out some happy DEiXTo users!