November Linkscape Update is Live; Binary Files Issue Dramatically Reduced

On Thursday of this past week (November 3rd), Linkscape’s index updated in record time: just 3 weeks. New link data is once again available in OpenSiteExplorer, via the SEOmoz API, and in the Mozbar. Here are the stats for this latest index update (our 46th):

  • 43,077,387,028 (43 billion) URLs
  • 480,597,551 (480 million) Subdomains
  • 105,570,741 (105 million) Root Domains
  • 356,255,241,471 (356 billion) Links
  • Followed vs. Nofollowed
    • 2.18% of all links found were nofollowed
    • 58.21% of nofollowed links are internal, 41.79% are external
  • Rel Canonical – 10.46% of all pages now employ a rel=canonical tag
  • The average page has 77.28 links on it (down 0.19 from the last index)
    • 64.86 internal links on average
    • 12.42 external links on average
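For anyone who wants to sanity-check these figures, here is a minimal Python sketch. It confirms that the internal and external averages add up to the per-page total and that the nofollow split sums to 100%, then derives a rough count of nofollowed links. The constant and function names are made up for illustration, and the derived count assumes the 2.18% figure applies to the full 356 billion link total, which the stats above don't spell out.

```python
# Figures copied from the index stats listed above.
TOTAL_LINKS = 356_255_241_471
NOFOLLOW_BASIS_POINTS = 218  # 2.18%, expressed in hundredths of a percent

AVG_LINKS_PER_PAGE = 77.28
AVG_INTERNAL = 64.86
AVG_EXTERNAL = 12.42

NOFOLLOW_INTERNAL_PCT = 58.21
NOFOLLOW_EXTERNAL_PCT = 41.79


def breakdowns_are_consistent() -> bool:
    """Check that each reported breakdown sums to its reported total."""
    links_ok = round(AVG_INTERNAL + AVG_EXTERNAL, 2) == AVG_LINKS_PER_PAGE
    nofollow_ok = round(NOFOLLOW_INTERNAL_PCT + NOFOLLOW_EXTERNAL_PCT, 2) == 100.0
    return links_ok and nofollow_ok


def estimated_nofollowed_links() -> int:
    """Rough count of nofollowed links.

    Assumption (not stated in the post): the 2.18% share is measured
    against the full link total. Integer math avoids float rounding.
    """
    return TOTAL_LINKS * NOFOLLOW_BASIS_POINTS // 10_000


if __name__ == "__main__":
    print("Breakdowns consistent:", breakdowns_are_consistent())
    print("Estimated nofollowed links:", f"{estimated_nofollowed_links():,}")
```

Under that assumption, the estimate comes out to roughly 7.8 billion nofollowed links. Treat it as a back-of-the-envelope number rather than an official stat.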

Since August, we’ve been struggling with the particularly devious problem of binary files in the index messing up link counts and showing links that Google + Bing probably are not counting. In September’s crawl, we put a blacklist on these files and saw a reduction of ~40% in binary files. This time, we’ve made even more progress (though it’s tough to know exactly how much; we’re continuing to evaluate), and you should see a significant reduction in these binary files.


In part because of the reduction in these files, processing time for the Linkscape index was reduced, enabling us to produce a much faster index update. However, in December we’re planning to produce a much larger index, so we expect processing time to rise back up. On the plus side, this will mean a lot more link data. In 2012, we’re aiming to reach an index size of 100 billion+ URLs, closer to what we’ve heard Bing + Google keep in their main indices (~120-140 billion URLs).

As always, feedback on the new index is greatly appreciated – if you’re seeing stuff we’ve missed, files we shouldn’t have crawled or metrics that feel wrong, please let us know. Our engineers would love to hear from you.

Article source: http://feedproxy.google.com/~r/seomoz/~3/Ep6xK0yJKns/november-linkscape-update-is-live
