Friday, September 1, 2017

Web Scraping Service vs. Automatic Web Scraper: Which is the best option for web scraping?


What is web scraping?
Web scraping, also known as web extraction or web crawling, refers to the process of obtaining unstructured information from websites and turning it into structured, clean data in formats such as xls, csv, or txt, or loading the captured data directly into a database. Common uses of web scraping include lead generation, data collection for academic research, price monitoring on competitors' websites, product catalogue scraping and many more. People turn to web scraping for all kinds of good reasons and can get pretty confused about which path is best. In this article, I will walk through the pros and cons of both web scraping services and automatic web scrapers.
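To make the "unstructured page to structured data" idea concrete, here is a minimal sketch in Python using only the standard library. The HTML snippet, the `name` and `price` class names, and the `ProductParser` and `html_to_csv` helpers are all made up for illustration; a real site will have its own markup, and a real scraper would also need to fetch pages and handle errors.

```python
import csv
import io
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects name/price pairs from <span class="name"> and <span class="price">.

    The class names are hypothetical; adapt them to the page being scraped.
    """

    def __init__(self):
        super().__init__()
        self.rows = []       # finished records
        self._field = None   # field currently being read, if any
        self._current = {}   # record under construction

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # A record is complete once both fields have been seen.
            if len(self._current) == 2:
                self.rows.append(self._current)
                self._current = {}


def html_to_csv(html):
    """Parse the HTML and return the extracted records as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


# Made-up sample page fragment.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">24.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(html_to_csv(SAMPLE_HTML))
```

Scraping tools and services take care of this step for you, along with the harder parts such as fetching pages, scheduling runs and coping with layout changes.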

What are some web scraping options?
When it comes to web scraping, there are two major kinds of providers available in the market: scraping tool providers and scraping service providers. Tool providers are the many so-called web scrapers or web extractors, such as Import.io, Octoparse, Scrapy and others. Some of these products, such as Octoparse and Import.io, are easier for non-technical users to handle. Others, such as Scrapy and Content Grabber, require more programming background. Providers running on a service model are commonly known as DaaS, short for Data as a Service. These companies do all the scraping work themselves and deliver the data to you in whatever format and at whatever frequency you like; they will even provide weekly or monthly data feeds via API if needed. A few well-known ones are Scrapinghub, Datahen and Data Hero. There are also companies that provide a scraping tool and a scraping service at the same time, such as Mozenda and Octoparse. Just because they offer a self-customizable scraper doesn't mean their scraping service is any less proficient than that of companies that only do scraping service. In fact, data services provided by crawler companies can be a lot more cost-efficient and much friendlier to one-time scrapes, because they already own a customizable scraping tool and need only minimal manual intervention.

So what is the essential difference between using a DIY web scraper and seeking help from a web scraping company? There are many, but the most critical ones are:
  1. Cost
  2. Willingness to learn
  3. Deadline
  4. Complexity of the scraping project

If you are a student on a tight budget looking to scrape some public data to support your thesis research, a scraping tool will be the best way to go. If you are an enterprise looking to outsource a brand monitoring project on a tight schedule, a data scraping service will provide you with what you need. These are only two obvious examples of how different groups of people will be better off with one product or service over another, but they should give you a general feeling for how to approach the question: go through your specific demands, budget, schedule, project complexity and so on.

Comparing web scraping alternatives: 

| | Web Scraper SaaS Service | Professional Data Service (DaaS) | Data Service provided by Crawler Company |
|---|---|---|---|
| Pricing | $60 ~ $200 per month | $350 ~ $2500 per project + $60 ~ $500 monthly maintenance fee if applicable | $100 ~ $2500 per project + $60 ~ $300 monthly maintenance fee if applicable |
| Turnaround | Depending on your efforts | 3 ~ 10 business days | 1 ~ 10 business days |
| Format of data delivery | Most support export to xls, csv, html, txt, JSON, xml | Most support csv, html, JSON, xml | Most support csv, html, JSON, xml |
| Database, API supported | Depends on the specific product | Yes | Yes |
| Dealing w/ Complex Website (JavaScript, AJAX etc.) | Depends on the specific tool | Supported most of the time | Supported most of the time |
| Mass scale scraping | Good volume for low cost if you can get what you need with the scraper | Scalable scrape but cost increases as volume goes up | Scalable scrape but cost increases as volume goes up |
| Support Customized Request | Self help | Highly flexible | Highly flexible most of the time |
| One-time Request Friendly | Yes, pay as you go | Mostly no | Yes |
| Customer Support | Busy support, some are really helpful depending on the product | Pretty responsive most of the time | High priority support |
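To get a rough feel for how the pricing ranges above add up over time, here is a small sketch. It assumes a single project and that the monthly maintenance fee applies every month, which will not hold for every provider; the `total_cost` and `cost_range` helpers are hypothetical names, and the figures are simply the low and high ends of the ranges quoted in the comparison.

```python
def total_cost(months, monthly_fee=0, project_fee=0):
    """Cumulative cost after `months` months: one-off project fee plus recurring fee."""
    return project_fee + monthly_fee * months


# (monthly fee, one-off project fee) in USD, taken from the ranges above.
# Assumption: the "if applicable" maintenance fee is charged every month.
OPTIONS = {
    "Web scraper SaaS": {"low": (60, 0), "high": (200, 0)},
    "Professional DaaS": {"low": (60, 350), "high": (500, 2500)},
    "Crawler-company data service": {"low": (60, 100), "high": (300, 2500)},
}


def cost_range(option, months):
    """Return (lowest, highest) estimated total cost for an option over `months`."""
    low_monthly, low_project = OPTIONS[option]["low"]
    high_monthly, high_project = OPTIONS[option]["high"]
    return (
        total_cost(months, low_monthly, low_project),
        total_cost(months, high_monthly, high_project),
    )


if __name__ == "__main__":
    for name in OPTIONS:
        low, high = cost_range(name, months=6)
        print(f"{name}: ${low} to ${high} over 6 months")
```

These numbers cover fees only. With a do-it-yourself tool, the time you spend building and maintaining the scraper is not counted here, so treat the output as a starting point for your own estimate.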

 
Are you ready to scrape?
Just like everything else, there are pros and cons to both a web scraping service and a data scraping tool. Which is the better option will largely depend on the specific data schema, how the data will be used, and the project budget. Go through your requirements thoroughly and research the products and services available in the market; both are essential to finding the web scraping solution that fits your needs.

That's all I have for now. Feel free to drop a message here if you have any specific questions about any web scraper or service. Cheers!
