Thursday, January 5, 2017

Scraping Stock Data on Yahoo Finance

Octoparse enables you to scrape data from financial websites. Getting real-time data with Octoparse takes two parts: making a scraping task, and scheduling that task to run in the Octoparse cloud.

In this web scraping tutorial, we will scrape stock data, such as the most active stocks, stock gainers, and stock losers, from Yahoo Finance with Octoparse.
The website URLs we will use are as follows:
http://finance.yahoo.com/most-active
http://finance.yahoo.com/gainers
http://finance.yahoo.com/losers
The data fields include company name, company symbol, last price, change, percentage change, volume, average volume over 3 months, market capitalization, and the extraction time.
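For readers who later process the exported data in code, the three list pages and the nine data fields above can be written down as a small schema. This is a minimal Python sketch, not something Octoparse produces: the `StockRow` class and its attribute names are hypothetical, chosen only to mirror the fields listed above.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# The three list pages named in this tutorial.
LIST_URLS = [
    "http://finance.yahoo.com/most-active",
    "http://finance.yahoo.com/gainers",
    "http://finance.yahoo.com/losers",
]


@dataclass
class StockRow:
    """One extracted table row (hypothetical attribute names)."""
    company_name: str
    symbol: str
    last_price: float
    change: float
    percent_change: float
    volume: int
    avg_volume_3m: int
    market_cap: str        # kept as text because the page abbreviates large numbers
    extracted_at: datetime


# Column order for any later export, derived from the dataclass itself.
FIELD_NAMES = [f.name for f in fields(StockRow)]
print(FIELD_NAMES)
```

Keeping the market capitalization as text is a design choice of this sketch; converting it to a number would need a rule for the abbreviations the page uses.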

You can download the task file (the .otd file) directly to begin collecting the data, or you can follow the steps below to make a scraping task that scrapes stock data from Yahoo Finance. (Download my extraction task for this tutorial HERE in case you need it.)

Part 1. Make a scraping task in Octoparse

Step 1. Set up basic information.
Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Complete the basic information ➜ Click "Next".
 

Step 2. Open the website URLs in the built-in browser, in order, by creating a loop.

Drag a "Loop Item" into the workflow ➜ Choose "list of URLs" in the "Loop mode" ➜ Paste the list of website URLs you want to scrape into the "list of URLs" box ➜ Click "OK" ➜ Click "Save".
A "Go To Web Page" action will be generated inside the loop, and Octoparse will open the first website URL. When you click the loop box, the list of website URLs is shown in the "Loop Item" box.
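Conceptually, a "Loop Item" in "list of URLs" mode visits each address in the order it was pasted. The following Python sketch illustrates only that idea and says nothing about how Octoparse implements it. The names `visit_in_order` and `fake_open_page` are hypothetical, and the page loader is a stand-in that returns placeholder text rather than contacting Yahoo Finance.

```python
from typing import Callable, Dict, Iterable


def visit_in_order(urls: Iterable[str],
                   open_page: Callable[[str], str]) -> Dict[str, str]:
    """Open each URL in list order and keep whatever open_page returns."""
    pages = {}
    for url in urls:
        pages[url] = open_page(url)
    return pages


def fake_open_page(url: str) -> str:
    # Stand-in for a real page load: returns a placeholder, makes no request.
    return f"<html><!-- placeholder for {url} --></html>"


urls = [
    "http://finance.yahoo.com/most-active",
    "http://finance.yahoo.com/gainers",
    "http://finance.yahoo.com/losers",
]
pages = visit_in_order(urls, fake_open_page)
for url in pages:
    print("visited", url)
```

Passing the loader in as a function keeps the loop itself independent of how a page is actually fetched.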
 

Step 3. Move your cursor over the section of the table that holds the stock data for the companies you want to extract.

Navigate to the "Go To Web Page" action and wait until the page has completely loaded before creating a loop to extract all the companies.

Click the first company ➜ Click the "Expand the selected area" button to select the whole row ➜ Click "Create a list of items" (sections with similar layout) ➜ Click "Add current item to the list".

The first company has now been added to the list. ➜ Click "Continue to edit the list".

Click the second company ➜ Click the "Expand the selected area" button to select the whole row ➜ Click "Add current item to the list" again (now the list contains all the companies with a similar layout) ➜ Click "Finish Creating List" ➜ Click "loop" to process the list and extract the detailed information for each company.

Step 4. Extract the stock information from the table.

Right-click the company symbol ➜ Select "Extract text". Other content can be extracted in the same way.
All the extracted content will be listed in Data Fields. ➜ Click a "Field Name" to modify it. Then click "Save".
 

Note: If necessary, right-click the content so that you do not trigger its hyperlink.
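Steps 3 and 4 amount to treating every table row as one item and reading the text of each cell in it. As a rough illustration of that idea outside Octoparse, the sketch below uses Python's standard `html.parser` module on a small made-up table. The `TableRowParser` class, the sample HTML, and the company names in it are all hypothetical; the real Yahoo Finance markup is more complex and is not reproduced here.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still part of the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip header rows, which contain no <td> cells
                self.rows.append(self._row)
            self._row = None


# Made-up sample table; not real Yahoo Finance markup or data.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Name</th><th>Price</th></tr>
  <tr><td><a href="/quote/AAA">AAA</a></td><td>Example Corp</td><td>10.50</td></tr>
  <tr><td><a href="/quote/BBB">BBB</a></td><td>Sample Inc</td><td>7.25</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
for row in rows:
    print(row)
```

Reading the link text rather than following the link mirrors the "Extract text" choice in Step 4.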

Step 5. Click "Save" to save your configuration. Then click "Next" ➜ Click "Next" ➜ Click "Local Extraction" to run the task on your computer. Octoparse will automatically extract all the selected data.


Step 6. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database, or another format, and save the file to your computer.
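In Octoparse the export itself is done with the "Export" button. For readers who want to see what a tabular export of such rows looks like, here is a minimal Python sketch that writes rows to CSV text with the standard `csv` module. The function name `rows_to_csv`, the column names, and the sample rows are hypothetical and are not Octoparse output.

```python
import csv
import io


def rows_to_csv(field_names, rows):
    """Return CSV text with one header line followed by one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows for illustration only.
csv_text = rows_to_csv(
    ["symbol", "company_name", "last_price"],
    [
        ["AAA", "Example Corp", "10.50"],
        ["BBB", "Sample, Inc", "7.25"],
    ],
)
print(csv_text)
```

The second sample row contains a comma in the company name, which the `csv` writer quotes automatically; this is the main reason to use the module instead of joining strings by hand.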
 



Part 2. Schedule a task and run it on Octoparse's cloud platform.

After you have made the scraping task by following the steps in this web scraping tutorial, you can schedule the task to run in the Octoparse cloud.

Step 1. Find the task you have just made ➜ Double-click the task to open it ➜ Keep clicking "Next" until you reach the "Done" step ➜ Select the option "Schedule Cloud Extraction Settings" to begin the scheduling process.


Step 2. Set the parameters. 

In the "Schedule Cloud Extraction Settings" dialog box, you can select the Periods of Availability for the extraction and the Run Mode, which determines how often your periodic task runs to collect data.
 · Periods of Availability - The data extraction period, set by the Start date and End date.
 · Run Mode - Once, Weekly, Monthly, Real Time
Set a suitable time interval for collecting the stock data and click "Start" to schedule your task.

After you click "OK" in the Cloud Extraction Scheduled window, the task will be added to the waiting queue and you can check the status of the task.
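To get a feel for how a start date, an end date, and a time interval combine into a series of runs, the following Python sketch lists the run times that fall inside an availability period. It only illustrates the arithmetic and does not describe how the Octoparse cloud scheduler works internally. The function name `planned_runs` and the example dates and interval are assumptions.

```python
from datetime import datetime, timedelta


def planned_runs(start, end, interval):
    """List run times from start to end (inclusive), spaced by interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += interval
    return runs


# Example values chosen for illustration: one hour, every 30 minutes.
runs = planned_runs(
    start=datetime(2017, 1, 5, 9, 30),
    end=datetime(2017, 1, 5, 10, 30),
    interval=timedelta(minutes=30),
)
for run in runs:
    print(run.isoformat())
```

A shorter interval yields fresher stock data but more runs within the same period.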




Author: The Octoparse Team
- See more at: Octoparse Tutorial
