Thursday, January 19, 2017

9 FREE Web Scrapers That You Cannot Miss

(from http://www.octoparse.com/blog/9-free-web-scrapers-that-you-cannot-miss/)

(picture from gifaom.com)



In this post I want to share some of the best free web scrapers for non-programmers who want to gain insight from large online data sets at low cost. Many web scrapers are available online, but the free tools listed below are the ones I consider best for scraping web pages. They are easy to use and have helped a lot with my own web scraping tasks.

While using them, I realized that it is necessary to know XPath expressions to select elements on a web page precisely, or at least to know how to copy an element's XPath using the browser's developer tools.
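To give a feel for what an XPath expression does, here is a minimal sketch in Python using only the standard library. The page fragment, class names and id are made up for illustration, and `xml.etree.ElementTree` supports only a subset of XPath, so an expression copied from a browser may need adjusting (for example, it must start with `.//` rather than `//`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (assumption for illustration only);
# real pages are usually messier HTML and may need a more tolerant parser.
PAGE = """
<html>
  <body>
    <div id="results">
      <h2 class="title">Marketing Associate</h2>
      <span class="company">Example Corp</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ".//h2[@class='title']" means: any <h2> below this node whose class is "title".
title = root.find(".//h2[@class='title']").text

# A longer path: the <span class="company"> directly inside <div id="results">.
company = root.find(".//div[@id='results']/span[@class='company']").text

print(title, "|", company)
```

The scrapers below let you paste or tweak expressions like these instead of writing the surrounding code yourself.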


Web Scraping Plugins/Extensions

  • Data Scraper (Chrome)

Data Scraper is a simple yet useful tool for scraping data from the tables and lists of a single page into CSV or XLS files.
Most of its features are free, and paid subscription plans are available if you need more. No coding is required, but you need the Google Chrome browser to install the plugin. Here is how to use this simple web scraping tool.

1. Open this link and add the plugin to Chrome.
After you have successfully installed the extension, you can begin scraping data from the web page.

For instance, let's scrape a list of LinkedIn search results for Marketing Associate jobs.

1) Right-click on the URL and select "DataMiner - Get similar" from the context menu.

2) The table will be parsed automatically.
3) You need to create a free Data Miner account to continue scraping the web page.
4) You can adjust the table/column selector with XPath.
5) Export your data to a CSV file and open it in Excel or Google Sheets.
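The table/column selector idea from the steps above can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Data Scraper works internally: the table, the class names and the column names are invented. One XPath picks the rows, one XPath per column picks a cell relative to each row, and the result is written out as CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical search-result table (assumption for illustration only).
TABLE = """
<table>
  <tr><td class="job">Marketing Associate</td><td class="city">Boston</td></tr>
  <tr><td class="job">Marketing Assistant</td><td class="city">Austin</td></tr>
</table>
"""

ROW_XPATH = ".//tr"  # selects every table row
COLUMN_XPATHS = {    # evaluated relative to each row
    "job": "./td[@class='job']",
    "city": "./td[@class='city']",
}

root = ET.fromstring(TABLE)
rows = [
    {name: row.find(xpath).text for name, xpath in COLUMN_XPATHS.items()}
    for row in root.findall(ROW_XPATH)
]

# Write the rows as CSV into memory; a real export would write to a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(COLUMN_XPATHS))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Changing a column's XPath changes which cell ends up in that column, which is roughly what adjusting the selector in step 4 does.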

Learn more about Data Scraper by visiting its homepage: https://data-miner.io/.

  • Web Scraper (Chrome)

Web Scraper is another great scraper available as an extension for Google Chrome. It lets you create a sitemap (a plan) describing how a website should be navigated and what data should be scraped from it.
Just add the extension to Chrome and find it in the Developer Tools.

You need to set up or import a sitemap to tell Web Scraper what data you want to scrape. It is quite simple to use once you get used to it. This freeware can scrape data from multiple web pages and handle dynamic web pages, but it may not have many built-in automation functions.

It takes a while to learn how to use this freeware, but you will get clean data from Web Scraper and can export the extracted data to a CSV file.
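To make the sitemap idea more concrete, here is a rough sketch of such a plan as a Python data structure. This is a hypothetical layout for illustration, not Web Scraper's actual sitemap format; the URL, selector ids and field names are all assumptions. Each selector has a parent, so the plan forms a tree: link selectors say where to navigate, text selectors say what to extract there.

```python
# Hypothetical scraping plan (NOT Web Scraper's real sitemap format).
plan = {
    "start_url": "https://example.com/jobs",
    "selectors": [
        {"id": "job_link", "parent": "root", "kind": "link"},
        {"id": "title", "parent": "job_link", "kind": "text"},
        {"id": "company", "parent": "job_link", "kind": "text"},
    ],
}


def walk(plan, parent_id="root", depth=0):
    """Return the selectors as an indented outline, parents before children."""
    lines = []
    for selector in plan["selectors"]:
        if selector["parent"] == parent_id:
            lines.append("  " * depth + f'{selector["kind"]}: {selector["id"]}')
            lines.extend(walk(plan, selector["id"], depth + 1))
    return lines


outline = walk(plan)
print("\n".join(outline))
```

The outline reads as: follow each job link from the start page, then extract the title and the company on the page it leads to.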

Visit the homepage to learn more from the tutorials: http://webscraper.io/.

  • Scraper (Chrome)

Scraper is another easy-to-use screen scraper that can extract data from an online table into a clean data set.
Just select some text in a table or a list, right-click on the selected text and choose "Scrape Similar" from the browser menu. You will then get the data, and you can extract other content by adding new columns using XPath or jQuery. You can copy or export the clean data to an XLS file or Google Docs.
The interface is similar to Data Scraper. You can add the extension here: https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?authuser=2


Web Scraper Client

  • Octoparse

Octoparse is an easy-to-use yet powerful web scraper that can handle both static and dynamic websites that use AJAX, JavaScript, cookies, etc. You can download the client and create a scraping task to extract data from almost any website, including sites that require login and paginated sites. Octoparse can extract both the information that is visible online and hidden content that can be found in the source code. It also provides extraction services to help you create the scraping task or get the data you want. What's more, the cloud service enables you to bulk extract huge amounts of data within a short time using many cloud servers, and you can get the data via the Octoparse API.

This freeware supports Windows only and is not available for other systems.

Octoparse has three modes - Smart, Wizard and Advanced.
  • Smart mode - Simply enter a URL and Smart it; you can get results in a few seconds.
  • Wizard mode - Just follow the guide and select the data that you want from the web pages by simple point-and-click.
  • Advanced mode - Find as much data as you can from the website that you want by creating a scraping workflow.

The basic plan (the free plan) enables you to create 10 scraping tasks and run 2 tasks in parallel on your machine, so you can get a lot of data with a suitable scraping task. You can import or download a scraping task from the tutorials to collect the data directly on your laptop.

The paid subscription plans offer more sophisticated features, such as the API and many anonymous IP proxies, that speed up your extraction and fetch large volumes of data in real time.

Learn more about how to get the data you need by visiting http://www.octoparse.com/


  • ParseHub

ParseHub is a great web scraper that supports collecting data from websites that use AJAX technologies, JavaScript, cookies, etc. Its machine learning technology can read, analyze and then transform web documents into relevant data.
The ParseHub desktop application supports Windows, Mac OS X and Linux, or you can use the web app that is built into the browser.
On the free plan, you can set up no more than five public projects in ParseHub. The paid subscription plans allow you to create at least 20 private projects for scraping websites.

There are plenty of tutorials to help you get started with ParseHub, and you can get more information from the homepage: https://parsehub.com/

 
(picture from ugamarkj.blogspot.com)


  • Visual Scraper

VisualScraper is another great free web scraper with a simple point-and-click interface that can be used to collect data from the web. You can get real-time data from several web pages and export the extracted data as CSV, XML, JSON or SQL files.

The freeware, which is available for Windows, enables a single user to scrape data from up to 50,000 web pages. Its premium plans, starting from $10 per month, allow you to scrape more than 100,000 web pages.

Besides the SaaS, VisualScraper offers web scraping services such as data delivery and the creation of software extractors.

  • Outwit Hub

Outwit Hub is a Firefox extension that can be easily downloaded from the Firefox add-ons store. Once installed and activated, it gives web scraping capabilities to your browser. Out of the box, it has data point recognition features that can make your scraping job easier. Extracting data from sites using Outwit Hub doesn't demand programming skills, and the setup is fairly easy to learn. You can refer to our guide on using Outwit Hub to get started with web scraping using the tool. As it is free of cost, it makes for a great option if you need to scrape some data from the web quickly.

Web-based Scraping Application

  • Dexi.io (formerly known as CloudScrape)

As a browser-based web scraper, Dexi.io allows you to scrape data from any website directly in your browser and provides three types of robot for creating a scraping task - Extractor, Crawler and Pipes. After you sign in to Dexi.io, you will be taken to https://app.dexi.io/#/ and its simple user interface. Click on "New robot" to start setting up your scraping robot and extract data in real time. It may also take you a while to get used to its web scraping robots. Check out their homepage to learn more from the knowledge base.

The freeware provides anonymous web proxy servers for your web scraping, and your extracted data will be hosted on Dexi.io's servers for two weeks before it is archived, or you can export the extracted data directly to JSON or CSV files. It offers paid services if you need real-time data.

  • Webhose.io

Webhose.io enables you to get real-time data by scraping online sources from all over the world into various clean formats. It lets you scrape data by keyword in many different languages using multiple filters, and you can save the scraped data in XML, JSON and RSS formats.

The free subscription plan lets you make 1000 HTTP requests per month, and paid subscription plans allow more HTTP requests per month to suit your web scraping needs.

Visit the homepage https://webhose.io/ to learn more about their services, sign up for an account with your company email and start scraping huge amounts of data.


Author: The Octoparse Team


