Sunday, March 20, 2011

Use ScraperWiki to Help Turn Web Pages Into Usable Data [Programming]

Source: http://lifehacker.com/#!5783602/use-scraperwiki-to-help-turn-web-pages-into-usable-data

A scraper is a program written to take content from a web page or other data source and turn it into a usable format, usually an RSS feed or entries in a database. Designing a scraper can be tricky because each site is different. ScraperWiki aims to ease that pain by maintaining a repository of these scripts.

Here is an example use of a scraper: say a government entity releases daily financial information, and you want to graph or otherwise track this data for personal or business use. Going to the website each day and entering the data manually is certainly one way to do it, though a labor-intensive one. As any good hacker will tell you, if you have to do something more than once, it is better to automate it.
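To make the idea concrete, here is a minimal sketch in Python of what such a scraper might look like. It is not ScraperWiki's own API or any script from its repository: the sample HTML, the `balances` table, and the `scrape` and `store` helpers are all made up for illustration, and the page is embedded as a string so the example runs without a network connection. It uses only the standard library's `html.parser` and `sqlite3` modules.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page standing in for a daily government finance report.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Balance</th></tr>
  <tr><td>2011-03-18</td><td>1,250.50</td></tr>
  <tr><td>2011-03-19</td><td>1,310.00</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            self._in_cell = False
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape(html):
    """Return (date, balance) tuples parsed from the report table."""
    parser = TableScraper()
    parser.feed(html)
    # Strip thousands separators so "1,250.50" becomes the float 1250.5.
    return [(day, float(amount.replace(",", ""))) for day, amount in parser.rows]


def store(records, conn):
    """Save records to a database; re-running for the same day overwrites it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS balances (day TEXT PRIMARY KEY, balance REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO balances VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(scrape(SAMPLE_HTML), conn)
    for row in conn.execute("SELECT day, balance FROM balances ORDER BY day"):
        print(row)
```

In a real scraper the HTML would be fetched from the live site (for example with `urllib.request.urlopen`) and the script would run on a daily schedule. The parsing step is the part that has to be tailored to each site, which is why sharing finished scrapers saves effort.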

ScraperWiki is a centralized location for these custom-built scrapers. Instead of writing your own from scratch, you can search its database to see if a scraper has already been written for a source.

One of the functions of ScraperWiki is to support open government initiatives. The Big Clean is actually being held today with the goal of opening local government data with the help of scrapers and data processors.

Scrapers are categorized by language: PHP, Python, and Ruby. The site is currently in beta.

ScraperWiki