Thursday, December 29, 2016

Web Scraping|Scrape Booking Reviews

 
(picture from www.luxurybackpacker.com)

Collecting online customer reviews, including star ratings, comments, likes, dislikes, images, videos, and share channels, can help an online retailer understand whether a product is a good purchase and popular among customers, and adjust its marketing strategy accordingly. Many web scraping tools are available online for extracting this kind of data from websites.
In this article we cover the key points of scraping customer reviews of hotels in Tbilisi from booking.com with Octoparse. We won’t provide step-by-step instructions for building the scraping task. If you want to learn how to build such a task, or want to get other types of customer reviews from booking.com, we offer extraction services to suit your needs. Please contact us via support@octoparse.com.

We’ve made the scraping task, and you can download the .otd file directly (What is an .otd file?) to begin collecting hotel reviews from booking.com. (Download my extraction task for this article HERE in case you need it.)
An .otd file can be opened only in Octoparse, so download Octoparse before downloading the scraping task.

Please click HERE to open the website URL we used.
The data fields include hotel name, hotel address, star rating, customer name and comments posted by the customer.
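As a rough illustration of what one extracted row might look like once it is exported, here is a minimal Python sketch. The class name, field names, and sample values are assumptions made for this example; they are not taken from Octoparse or from booking.com.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HotelReview:
    """One extracted row. Field names are illustrative, not Octoparse's own."""
    hotel_name: str
    hotel_address: str
    star_rating: str
    customer_name: str
    comment: str


# Made-up sample values, used only to show the shape of a row.
sample = HotelReview(
    hotel_name="Example Hotel",
    hotel_address="1 Example Street, Tbilisi",
    star_rating="4",
    customer_name="Anonymous",
    comment="Clean room, friendly staff.",
)

# Convert to a plain dict, e.g. before writing the row to CSV or JSON.
row = asdict(sample)
```

Keeping every value as text mirrors how scraped fields usually arrive; any conversion of the rating to a number can happen later.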
The scraping task we’ve made in Octoparse works as follows: it goes to the detail page of each hotel and gets the reviews under the “Read all trusted reviews” tab.


Since the actual number of reviews is sometimes larger than what is shown on the detail page, we need to get the reviews from all the countries displayed. We therefore click the plus button to display all the countries in which the customers are located.

In Octoparse, we create a list of items to extract all the countries. The XPath generated for the loop also matches extra elements on the web page, so we need to modify the XPath expression so that it selects only the correct elements.
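To show what narrowing an XPath means in practice, here is a small Python sketch using the standard library. The markup, class names, and country names are hypothetical and heavily simplified; the real booking.com page and the XPath Octoparse generates will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; booking.com's real page structure differs.
PAGE = """
<div>
  <ul class="site-menu">
    <li><label>Sign in</label></li>
  </ul>
  <ul class="country-filter">
    <li><label>Georgia</label></li>
    <li><label>Germany</label></li>
    <li><label>Turkey</label></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# Too broad: matches every list item, including the unrelated menu entry.
broad = [el.findtext("label") for el in root.findall(".//li")]

# Narrowed: only list items inside the (assumed) country filter list.
narrow = [
    el.findtext("label")
    for el in root.findall(".//ul[@class='country-filter']/li")
]
```

The same idea applies inside Octoparse: anchor the expression on a container that holds only the country entries, so the loop does not pick up neighbouring elements.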

When you create a list of items in Octoparse, the elements are extracted by clicking each one in turn. Because booking.com selects the first country in the pop-up window by default, the loop’s click on that country unselects it.
In this case, we need to select the first country again by clicking its checkbox, and Octoparse will generate a “Click Item” action in the rule.

All the customer reviews of the hotel will be extracted country by country.
Because some reviews come from anonymous customer accounts, the extraction output will contain duplicate records. When exporting, you can choose to export only the valid data.
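If you process an exported file outside Octoparse and it still contains repeats, a simple pass like the following can drop them. This is a sketch under assumptions: the rows are dictionaries, and the key names (`hotel_name`, `customer_name`, `comment`) are hypothetical, as are the sample rows.

```python
def drop_duplicates(rows):
    """Return rows with exact repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        # Two rows count as duplicates when hotel, customer and comment all match.
        key = (row["hotel_name"], row["customer_name"], row["comment"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows: the first two are identical, the third differs in its comment.
rows = [
    {"hotel_name": "Example Hotel", "customer_name": "Anonymous", "comment": "Great view."},
    {"hotel_name": "Example Hotel", "customer_name": "Anonymous", "comment": "Great view."},
    {"hotel_name": "Example Hotel", "customer_name": "Anonymous", "comment": "Noisy street."},
]

cleaned = drop_duplicates(rows)
```

Including the comment text in the key means two different anonymous reviews of the same hotel are both kept; only exact repeats are removed.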



Author: The Octoparse Team
- See more at: Octoparse Blog
