Octoparse offers a convenient way to scrape data from websites. Although little programming knowledge is required, some new users still say they have no idea how to use Octoparse. This post aims to help them settle into Octoparse smoothly.
Below you will find links to 10 of the most helpful tutorials to support your first steps in Octoparse. These guides will not only help you scrape different kinds of website structures, but also show you tips for making Octoparse easier to use and getting more out of it.
- What is Octoparse?
Let’s start at the very beginning. The tutorial Octoparse Introduction introduces you to the elements that make Octoparse so great. It will help you understand the functions and familiarize yourself with the workflow for creating a crawler in Octoparse. Follow the link and watch the video tutorial, going through each section and menu, to discover the specific actions available for configuring a task in Octoparse.
- How to scrape websites with pagination?
Pagination is commonly used to divide records into pages when a website has a very large number of records to show. You will therefore often need to handle pagination during data extraction. The tutorial Scrape Data from Websites with Pagination (Query Strings) shows you exactly how to add actions in the Workflow Designer to get information from websites with pagination.
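To make the query-string idea concrete, here is a minimal Python sketch of what paginating by query string amounts to outside of Octoparse. It is not how Octoparse works internally; the URL, the `page` parameter name, and the `page_urls` helper are all assumptions made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def page_urls(base_url, param, first, last):
    """Build one URL per page by overwriting a single query-string parameter."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)  # e.g. {"category": ["books"], "page": ["1"]}
    urls = []
    for page in range(first, last + 1):
        query[param] = [str(page)]  # only the page number changes
        new_query = urlencode(query, doseq=True)
        urls.append(urlunparse(parts._replace(query=new_query)))
    return urls


# Hypothetical listing URL; real sites may name the parameter differently.
urls = page_urls("https://example.com/products?category=books&page=1", "page", 1, 3)
for url in urls:
    print(url)
```

Each generated URL points at one page of the listing, so the same extraction steps can be repeated on every page.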
- How to scrape websites requiring login?
- How to scrape websites by automatically searching the key words?
- How to get information from drop-down menus?
Drop-down menus are common on websites where the content is dynamically linked to what you choose in the drop-down list. The tutorial Scrape Web Data from A Drop-Down Menu shows you that values from drop-down menus can also be extracted in Octoparse.
- How to get data in seconds?
Octoparse Smart Mode lets users get data in seconds, lowering the barrier to entry for anyone who needs data. The tutorial Octoparse Smart Mode -- Get Data in Seconds shows you how to extract all of the data without having to configure an extraction rule in Octoparse. Before you move on, remember that Smart Mode currently works only on websites with list information.
- How to extract data with certain URLs?
Sometimes you may just want to scrape information from a specific set of URLs. Octoparse lets you do that, much as you would enter an address in a search engine. All you need to do is follow the tutorial Extract Data from A List of URLs with Similar Web Content Layouts to learn how to get the data you want from URL lists. It’s absolutely simple!
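As a rough illustration of the idea behind URL lists (one extraction rule applied to many pages with a similar layout), here is a small Python sketch. It is not Octoparse's implementation: the `scrape_url_list` and `extract_title` helpers, the example URLs, and the in-memory `FAKE_PAGES` standing in for real downloads are all assumptions.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    """The single 'rule' reused for every page: return the page title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def scrape_url_list(urls, fetch):
    """Apply the same rule to each URL; `fetch` maps a URL to its HTML."""
    return [{"url": url, "title": extract_title(fetch(url))} for url in urls]


# Hypothetical pages held in memory so the sketch runs without network access.
FAKE_PAGES = {
    "https://example.com/a": "<html><head><title>Page A</title></head></html>",
    "https://example.com/b": "<html><head><title>Page B</title></head></html>",
}

results = scrape_url_list(list(FAKE_PAGES), FAKE_PAGES.get)
for row in results:
    print(row)
```

Because the pages share a layout, one rule is enough; only the list of URLs changes. To fetch real pages, `fetch` could be swapped for a function built on `urllib.request.urlopen`.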
- How to extract information from detail web pages?
You will find that most websites list a record on one page and display the related information on another page, namely the detail page. On Amazon, for example, you need to click a product's link to open its detail page. To get this information precisely, you can follow the tutorial List & Detail Web Page - Advanced Mode.
- Why and how to manually modify the XPath?
XPath is required in Octoparse in certain cases. One of them is dealing with missing data. The tutorial Modify XPath Manually in Octoparse guides you through extracting data using XPath.
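To show why a hand-written XPath can help with missing data, here is a minimal Python sketch. It uses the limited XPath subset in the standard library's `xml.etree.ElementTree`, not Octoparse's own XPath engine, and the sample markup, class names, and `extract` helper are invented for illustration. It compares a position-based path with an attribute-based one when a field is absent from one record.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing: the second item has no "name" span.
PAGE = """
<html><body>
  <div class="item"><span class="name">Lamp</span><span class="price">12.50</span></div>
  <div class="item"><span class="price">8.00</span></div>
</body></html>
"""

root = ET.fromstring(PAGE)


def extract(item):
    """Return the price found by position and by attribute for one item."""
    positional = item.findtext("./span[2]")  # "the second span"
    by_class = item.findtext("./span[@class='price']")  # "the span marked as price"
    return positional, by_class


rows = [extract(item) for item in root.findall(".//div[@class='item']")]
for row in rows:
    print(row)
```

In the second item the positional path finds nothing because the missing name shifts the price into first position, while the attribute-based path still returns the price.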
- How to be precise with configuration rule?
The visual workflow interface gives you full control over configuring your own rule in Octoparse. To make sure your rule stays accurate, all you need to do is manually check it in the workflow interface. Follow the tutorial Check The Extraction Rule When Errors Occur to avoid issues when running the task.
Ready for more? Visit the Octoparse Tutorial for useful how-to guides on getting the most out of Octoparse.
Author: The Octoparse Team
Tags: Big data, Business, Octoparse, tutorial, Web scraping