Thursday, January 4, 2018

Extract Reviews - Dealing with "Show More" Buttons

Product reviews are important resources for both sellers and buyers. Sellers find out how users rate their products, while buyers often spend a lot of time wading through pages of reviews to decide whether a product is worth buying. 
Many Octoparse users extract reviews on a daily basis. One of the most frequently asked questions is how to deal with "load more" buttons, which must be clicked to reveal the full review text instead of only the first few lines.  
This is easy to solve in Octoparse: build a loop that clicks the "load more" buttons one by one before extracting the reviews.

Let's look at an example from Walmart (example URL):
On Walmart.com, a "Read More" button appears right below some of the reviews. 


First, we have the program click every "Read more" button so that the complete text of each review is loaded. Then we extract the reviews. Follow the steps below:
  • Drag a Loop Item into the workflow after opening the webpage in Octoparse
  • Choose "Single element" in Loop Mode
  • Enter the XPath of the "Read More" buttons (//BUTTON[text()='Read more'])*
  • Click "Save"
*Note that this XPath only applies to this particular example; for other webpages, you need to find a suitable XPath yourself. The XPath you choose must locate all the "Read More" buttons on the page (click here to learn more about XPath).
  • Drag a Click Item into the Loop Item
  • Tick "Click items in the loop"
  • Tick "Load the page with AJAX" and select an appropriate AJAX timeout
  • Click "Save"
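To see what the click loop above does, here is a small Python model of it. It is a sketch under stated assumptions, not Octoparse's implementation: the review markup is hypothetical, each truncated comment keeps its full text in a made-up data-full attribute that stands in for the AJAX load, and the loop assumes that a "Single element" loop re-locates the first matching button on each pass and that a clicked button stops matching the XPath. Python's built-in ElementTree supports only a subset of XPath, so .//button[.='Read more'] is used as a lowercase stand-in for //BUTTON[text()='Read more'].

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (real Walmart pages differ). The made-up
# data-full attribute holds the text that clicking "Read more" would load.
PAGE = """
<div id="reviews">
  <div class="review">
    <p class="comment" data-full="Great TV. Setup took five minutes.">Great TV...</p>
    <button>Read more</button>
  </div>
  <div class="review">
    <p class="comment" data-full="Short review.">Short review.</p>
  </div>
  <div class="review">
    <p class="comment" data-full="Picture is sharp, but the remote feels cheap.">Picture is sharp...</p>
    <button>Read more</button>
  </div>
</div>
"""

# ElementTree's XPath subset has no text(); "." compares an element's full
# text content, and tag names are case-sensitive.
READ_MORE = ".//button[.='Read more']"


def click_all_read_more(root):
    """Click "Read more" buttons one at a time until none are left.

    Assumed loop semantics: each pass re-locates the first matching button,
    and a clicked button disappears, so the loop ends on its own.
    """
    clicks = 0
    while (button := root.find(READ_MORE)) is not None:
        # ElementTree has no parent pointers, so map each child to its parent.
        parent = {child: p for p in root.iter() for child in p}[button]
        comment = parent.find("p[@class='comment']")
        comment.text = comment.get("data-full")  # stands in for the AJAX load
        parent.remove(button)  # the clicked button no longer matches READ_MORE
        clicks += 1
    return clicks


root = ET.fromstring(PAGE.strip())
print(len(root.findall(READ_MORE)))  # 2 buttons before the loop
print(click_all_read_more(root))     # 2 clicks
print([p.text for p in root.iter("p")])
```

Because the loop stops only when no element matches, the XPath has to cover every "Read More" button on the page, which is why the note above asks for an XPath that locates all of them.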

  • Next, make a loop list of all the review sections. 
  • Drag the review Loop Item out of the first Loop Item and place it directly below the first loop
  • Click the "Extract Data" action, then click any sub-elements you want to extract (such as reviewer, review date, and comment) in the first review section outlined in the built-in browser. 
  • Rename the data field if needed
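The extraction step can be modeled the same way. Again, this is a hypothetical sketch, not Octoparse's code: the class names reviewer, date, and comment are invented, and each field is read with a path relative to its review section, which mirrors the idea of picking sub-elements from the first outlined review and applying them to every review in the loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for reviews that have already been expanded;
# the class names are made up for this example.
PAGE = """
<div id="reviews">
  <div class="review">
    <span class="reviewer">Alice</span>
    <span class="date">1/2/2018</span>
    <p class="comment">Great TV. Setup took five minutes.</p>
  </div>
  <div class="review">
    <span class="reviewer">Bob</span>
    <span class="date">12/28/2017</span>
    <p class="comment">Picture is sharp, but the remote feels cheap.</p>
  </div>
</div>
"""

# One relative path per data field, evaluated inside each review section.
FIELDS = {
    "reviewer": "span[@class='reviewer']",
    "review_date": "span[@class='date']",
    "comment": "p[@class='comment']",
}


def extract_reviews(root):
    """Loop over every review section and collect one row per review."""
    rows = []
    for section in root.iterfind(".//div[@class='review']"):
        row = {}
        for name, path in FIELDS.items():
            node = section.find(path)
            # Missing or empty sub-elements become empty strings.
            row[name] = node.text.strip() if node is not None and node.text else ""
        rows.append(row)
    return rows


for row in extract_reviews(ET.fromstring(PAGE.strip())):
    print(row)
```

In this model, renaming a data field corresponds to changing a key in FIELDS.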

This way, Octoparse clicks all the "Read More" buttons before extracting the reviews, so the full content of every review is captured. 

To learn more about scraping reviews, refer to these tutorials:


Author: The Octoparse Team
