Speeding Up Data Gathering in Places Scout
Posted by Mark Kabana on 21 August 2013 03:04 PM
This article will cover how to speed up the data gathering process in Places Scout, as many users have expressed concerns about the speed at which Places Scout gathers data.

 

Background - How Places Scout Gathers Data
Places Scout gathers data in a human-like fashion, using a real browser with random delays between interactions to simulate human browsing activity. This keeps your IP from being banned or flagged as a bot by Google, and keeps the data 100% accurate. The downside to this approach is that it takes longer to gather data, but I do it this way because of lessons learned in the past.

When I first released Places Scout, I had it gathering data in parallel, hammering away at Google and returning results with lightning-quick speed. Google soon caught on and started returning results that were inaccurate compared to what you see in the browser when verifying manually, which is why I had to change the code to use a real browser and simulate human browsing activity.

From an SEO perspective, accuracy is more important than speed, which is why I chose that route. Doing it this way, I can tell you that your IP won't get banned and the data will be 100% accurate, which matters more than getting the data quicker while IPs get banned and results come back inaccurate. I have the skills to write parallel code that gathers data super fast, but again, it's not the right way to do it.

With that being said, there are a few things you can do to speed up the data gathering:

Data Gathering Settings


  1. Reduce the Random Google Delay lower / upper bound settings to the lowest allowed values of 750 / 1000 ms.

  2. Uncheck the 'Use Real Browser to Gather Data' setting. With this unchecked, Places Scout won't use a real browser, which saves a lot of time and overhead from controlling the browser and thus reduces the time it takes to gather data. However, I don't recommend running large data gathering operations with this setting unchecked, as using a real browser is the only way to ensure that you pass all of Google's security checks and make the data gathering look as human-like as possible.
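
To get a rough feel for how much the delay bounds matter, here is a minimal sketch of the waiting time they add. It is not Places Scout's actual code: it assumes one delay per request, drawn uniformly between the lower and upper bound, and the function name and the 3000 / 5000 ms comparison values are made up for illustration.

```python
def expected_delay_seconds(requests: int, lower_ms: int, upper_ms: int) -> float:
    """Expected total waiting time in seconds.

    Assumption: one delay per request, uniformly distributed between
    lower_ms and upper_ms, so the average delay is the midpoint.
    """
    if lower_ms > upper_ms:
        raise ValueError("lower bound must not exceed upper bound")
    mean_delay_ms = (lower_ms + upper_ms) / 2
    return requests * mean_delay_ms / 1000


# 1000 requests at the lowest allowed bounds mentioned above (750 / 1000 ms)
print(expected_delay_seconds(1000, 750, 1000))   # 875.0 seconds of pure waiting

# The same 1000 requests with hypothetical higher bounds (3000 / 5000 ms)
print(expected_delay_seconds(1000, 3000, 5000))  # 4000.0 seconds of pure waiting
```

This counts only the deliberate delays, not the time spent loading pages, so real runs will take longer than these figures.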


Find Local Clients Specific Notes


  • Ensure that you only gather the data that you need: the more data items you select under the 'Select Additional Data to Gather' drop-down menu button, the longer it will take to gather the selected data for each result. It is not wise to blindly choose Select All for the Additional Data items unless you really need this data.

 

  • Some Additional Data items take longer than others. Specifically, the following data items take the longest to gather (in order of longest to shortest):

 

1) Perform Citation Analysis for Business Website - Takes the longest of any data item.

2) Gather Reputation Ratings for Business - This is an extremely intensive data gathering process and takes a long time. We recommend choosing only the sources that you need. Selecting all reputation rating sources will take an extremely long time to gather for each result.

3) Gather Business Owner Name and Business Data - Gathering Business Owner name data from Manta is a lengthy process for each result. The 'Manta Delay Between Request' setting influences the speed as well. A higher setting (3000 - 5000) will help your proxies stay alive longer, but will greatly slow down the process.

4) Get Email Address for Business - This one takes a while because we check up to 6 different data sources for an email address per result.

5) Gather SEOMoz Competition Data for Business Website - Getting SEOMoz data takes a while, as they have mandatory delays between requests and there is no way around this.

6) Get Facebook Fan Page Data for Business - This one takes some time because we have to find the business' fan page and then make sure it's the right one.

 

Keep the above notes in mind about each data item when choosing what additional data to gather, as the above items will greatly increase the amount of time it takes to gather this data for each result if selected.

 

Bear in mind too that the number of keywords you are searching for and the 'number of results to gather' setting influence the speed as well. If you have 5 keywords and choose the top 100 results per keyword, we have to gather all the selected data for 5 x 100 = 500 results.

 

If you are having problems getting the data in a timely fashion with multiple keywords, it is advisable to split the data gathering into smaller runs of 1 keyword at a time.
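
The workload arithmetic and the one-keyword-per-run suggestion can be sketched as follows. This is only an illustration for planning purposes: the function names and the sample keywords are hypothetical, and nothing here calls into Places Scout itself.

```python
from typing import List


def total_results(keyword_count: int, results_per_keyword: int) -> int:
    """Number of results for which all selected data items must be gathered."""
    return keyword_count * results_per_keyword


def split_into_runs(keywords: List[str]) -> List[List[str]]:
    """Split a keyword list into separate runs of one keyword each."""
    return [[keyword] for keyword in keywords]


# Hypothetical keyword list, for illustration only
keywords = ["plumber", "electrician", "roofer", "locksmith", "dentist"]

print(total_results(len(keywords), 100))  # 500 results in a single large run

for run in split_into_runs(keywords):
    # Each smaller run covers 1 keyword x 100 results = 100 results
    print(run, total_results(len(run), 100))
```

Splitting does not reduce the total amount of data gathered, but each run finishes sooner and can be checked before starting the next one.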

 

Rank Tracker Specific Notes


  • The settings with the most influence on how long it takes to run reports are the 'Number of Organic Results to Gather' and 'Number of Places Results to Gather' settings under the 'Report Settings' tab when creating / editing a report.

    • These settings tell Places Scout how many pages of SERP results to gather for each keyword, and have a dramatic effect on how long it takes to gather SERP data

    • For example, if you choose to gather the top 100 organic and places results versus the top 10 organic and places results, it will take about 10 times longer per keyword to gather the SERP data, as we now have to crawl 10 pages of results in both the organic and places sections (20 total SERP pages) per keyword instead of just the first page of each (2 total SERP pages). And this is for every keyword in the report, so if you have 50 keywords in the report, gathering the top 100 organic and places results means we have to gather 50 keywords x 20 SERP result pages per keyword = 1000 SERP result pages, which is a lot of data gathering and does take time to complete. In comparison, choosing to only gather the top 10 results means we gather 2 SERP result pages per keyword, so 50 keywords x 2 = 100 SERP result pages, which is one tenth of the pages needed for the top 100 results.

    • The recommendation is to gather only the number of results necessary to properly track your client's rankings. The first time you run the report, track the rankings in the top 100 organic and places results. This first run will take a little longer, but it will give you a nice benchmark for what these settings should be. Once the results come back, analyze the rankings for each keyword in the organic and places results, and take note of the client's lowest (worst) organic and places ranking across all keywords. For example, say the client's lowest organic ranking is 36 and lowest places ranking is 18 out of all the keywords tracked. Since they do not rank beyond the top 40 organic results, gathering 100 organic results is overkill. You can safely change the 'Number of Organic Results to Gather' to, say, 50 or 60 to give yourself a little buffer in case they do drop. Similarly for the places results, they do not rank beyond the top 20, so gathering 100 places results is overkill. You can safely change the 'Number of Places Results to Gather' to, say, 30 or 40 to give yourself a little buffer in case they do drop.
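
The page counts and the buffer rule of thumb above can be worked through in a short sketch. It assumes 10 results per SERP page (as implied by the example, where 100 results means 10 pages) and a buffer of 20 results; the function names, the buffer size, and the cap of 100 are illustrative choices, not Places Scout settings or code.

```python
import math

RESULTS_PER_PAGE = 10  # assumption taken from the example: 100 results = 10 pages


def serp_pages(keyword_count: int, organic_results: int, places_results: int) -> int:
    """Total SERP pages to crawl for a report (organic plus places, per keyword)."""
    pages_per_keyword = (
        math.ceil(organic_results / RESULTS_PER_PAGE)
        + math.ceil(places_results / RESULTS_PER_PAGE)
    )
    return keyword_count * pages_per_keyword


def suggested_setting(worst_rank: int, buffer: int = 20, max_results: int = 100) -> int:
    """Round the worst observed rank up to a full page, add a buffer, cap at the maximum.

    The default buffer of 20 is an assumed value; pick whatever margin suits you.
    """
    rounded = math.ceil(worst_rank / RESULTS_PER_PAGE) * RESULTS_PER_PAGE
    return min(rounded + buffer, max_results)


print(serp_pages(50, 100, 100))  # 1000 pages for the top 100 organic and places
print(serp_pages(50, 10, 10))    # 100 pages for the top 10 organic and places

organic = suggested_setting(36)  # worst organic rank 36 -> 60
places = suggested_setting(18)   # worst places rank 18 -> 40
print(organic, places)
print(serp_pages(50, organic, places))  # 500 pages after tuning, half of the full run
```

With a smaller buffer of 10, the same rule gives 50 organic and 30 places results, the lower end of the ranges suggested above.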

 

Keeping the above in mind should help you configure Places Scout in the most optimal way to gather data as quickly as possible.
