Google

Who will dare to call Google’s bluff?

As Matt Brittin, the new head of Google UK, said in an interview with NMA back in May (which, ironically given the subject of this post, you can’t read online unless you subscribe, so no link, I’m afraid), it is always disappointing to see newspaper publishers spit their bile at Google for ‘stealing’ their content when Google works with them as partners to help drive traffic to their sites.

Of course, Google make money from this, but so do the newspapers. Equally, it has been well publicised that the free content model isn’t making enough revenue for papers, but I, for one, don’t think that is Google’s fault. Google shouldn’t be made victims of their own success when the UK has too many newspapers, too few of which realise how the media landscape is changing, or that they no longer have a monopoly on the content people want to read.

So Google have pointed out that if publishers don’t want their content to be read for free, they can adjust their robots.txt files to stop their sites being crawled.

http://www.guardian.co.uk/media/pda/2009/jul/16/google-newspapers-robots-block-suggest
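
For illustration, here is a minimal robots.txt that turns Google’s crawler away from an entire site while leaving other crawlers alone. The sketch passes it to Python’s standard urllib.robotparser to check the effect. The URL and the second bot name are hypothetical, and Python’s parser only approximates how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block Googlebot everywhere, allow everyone else.
# An empty "Disallow:" line means nothing is disallowed for that group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

url = "https://news.example.com/story.html"  # hypothetical URL
print(can_crawl("Googlebot", url))     # blocked by the first group
print(can_crawl("SomeOtherBot", url))  # falls through to the "*" group
```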

Will anyone call Google’s bluff? I hope not. As Google News Manager Josh Cohen says in the above article "Some proposals we’ve seen from news publishers are well-intentioned, but would fundamentally change – for the worse – the way the web works."

Some old newspaper barons just need to learn that their time is swiftly drawing to a close. Charging for specialist content is one thing, but anything more will simply send users elsewhere to find what they want.

As for the MPs describing Google as a monopoly: that implies users have no choice when it comes to search, which is obviously rubbish. After all, Google’s competitors are only ever a click away…

Google Toolbar PageRank update 23rd June

Google seems to be switching from 3-4 month gaps between Toolbar PageRank updates to a much shorter interval – we’ve just had another update, less than a month after the last one.

Google updated its “Toolbar PageRank” (the PageRank values shown in the Google Toolbar) on the 23rd of June. This is the second update in a row to come earlier than expected – the previous Toolbar PageRank update happened less than two months after the one preceding it. The gap this time around is even smaller – less than one month since the last such update.

Historically, Google has updated its Toolbar PageRank values roughly every 3-4 months (before the last update, which came earlier than expected, the previous PageRank updates happened on 1st April, 31st December and, before that, in September). Are we seeing the start of a shift towards more frequent updates?
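
To put numbers on the shrinking gaps, the sketch below counts the days between consecutive updates. It assumes that the updates on 1st April, 27th May and 23rd June fell in 2009 and that "31st December" means 2008. The September update is left out because no exact day is given.

```python
from datetime import date

# Toolbar PageRank update dates mentioned in these posts.
# Assumptions: 31st December is in 2008; the others are in 2009.
UPDATES = [
    date(2008, 12, 31),
    date(2009, 4, 1),
    date(2009, 5, 27),
    date(2009, 6, 23),
]

def gaps_in_days(dates):
    """Return the number of days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

for earlier, later, days in zip(UPDATES, UPDATES[1:], gaps_in_days(UPDATES)):
    print(f"{earlier} -> {later}: {days} days")
```

Under those assumptions, each gap is roughly half the previous one.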

Please note that the Toolbar PageRank does not necessarily reflect the current standing of a page – Google continually updates PageRank values internally (at least every day), but only provides a “snapshot” every so often.

How old are Toolbar PageRank values?

Google only updates the PageRank values seen in its toolbar every few months, but calculates new values internally much more frequently.

In this piece of research we try to answer the question “how old are the PageRank values shown when they are published?”, and uncover something surprising in the process.

Please note: The web pages used in this article are used for reference only. LBi does not endorse any of the pages linked to from this article.

Google updates the PageRank values shown in the Google Toolbar every 3-4 months (and sometimes more often). However, Google also calculates the PageRank values that it uses internally much more frequently (at least daily). The PageRank shown in the Google Toolbar is therefore a "snapshot" of values at some point in time.

A commonly asked question when Google updates the PageRank values displayed within its toolbar is "How old are these new PageRank values?" – are they fresh, up-to-date values which have just been calculated, or are they several months old? Although, in general, we would recommend not obsessing about Google’s green bar too much, knowing the answer to this question has several implications – for example, if you know how recent the values are, you can determine whether any recent linkbuilding activity is being accounted for within the new PageRank values.

Methodology

The methodology for this experiment is fairly simple: to know how old the values are, we need to establish the length of time between pages last being given PageRank and the PageRank update. Therefore, we need to find:

  • The most recent page possible which has a PageRank value
  • The earliest possible mention of the recent PageRank update

Oldest mentions of PageRank update

For the purposes of finding the earliest possible date that a PageRank update was mentioned, we have looked at a number of different SEO discussion sites in order to find the earliest mention by a member of their community. We’ll convert all times into British Summer Time (GMT+1) for comparison.

  • Digital Point forums – many posts here, but the earliest is dated "May 28th 2009, 1:01 am". Times are GMT-7, so this is 09:01 BST on May 28
  • High Rankings Forum – there is a post at "7:38pm" – the forum appears to be 6 hours behind BST, so the time of the post is 01:38 BST on May 28
  • SEORoundTable – the first forum post is 06:12 AM on 28th May – as this time is GMT-5, the time is 12:12 BST on May 28
  • WebmasterWorld – the earliest post is "10:12pm UTC" – this is 23:12 BST on May 27

There are lots of other sites, but we’ve picked a selection of the earliest posts. The earliest one seems to be the WebmasterWorld thread, with a time of 23:12 BST on May 27th.
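
The conversions in the list above can be double-checked with a short sketch. It records each post’s local time with the UTC offset stated above, converts everything to BST and picks the earliest. BST is UTC+1, so "6 hours behind BST" is taken as UTC-5. The local dates for the High Rankings and WebmasterWorld posts are inferred from the BST times quoted, so treat them as assumptions.

```python
from datetime import datetime, timedelta, timezone

# British Summer Time is UTC+1.
BST = timezone(timedelta(hours=1), "BST")

# Earliest post per site: (local time as displayed, offset from UTC in hours).
# Local dates for High Rankings and WebmasterWorld are inferred (assumption).
POSTS = {
    "Digital Point": (datetime(2009, 5, 28, 1, 1), -7),
    "High Rankings": (datetime(2009, 5, 27, 19, 38), -5),  # 6 hours behind BST
    "SEORoundTable": (datetime(2009, 5, 28, 6, 12), -5),
    "WebmasterWorld": (datetime(2009, 5, 27, 22, 12), 0),  # shown in UTC
}

def to_bst(local: datetime, utc_offset_hours: int) -> datetime:
    """Attach the site's UTC offset to a naive local time and convert to BST."""
    aware = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return aware.astimezone(BST)

in_bst = {site: to_bst(when, offset) for site, (when, offset) in POSTS.items()}

# Print the posts in chronological order.
for site, when in sorted(in_bst.items(), key=lambda item: item[1]):
    print(f"{site:15} {when:%d %b %H:%M} BST")

earliest = min(in_bst, key=in_bst.get)
print("Earliest mention:", earliest)
```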

Newest articles with PageRank

The next step requires finding the most recent page possible which has a PageRank value. Please note that this does not mean the most recent page with a PageRank of 1 or more – a PageRank value of "zero" also constitutes a page having a PageRank value assigned to it. A PageRank of zero simply means that, on the sliding scale used by Google, the page falls into the set of pages with the lowest PageRank values. This is different from having no PageRank value at all.

The best way to find recent pages which may have PageRank is to look at a high-PageRank, high-traffic site which is frequently updated and which uses web feeds to ensure that new pages are rapidly indexed. News sites are ideal for this. We’ve picked The Guardian because its website includes detailed date information, including both the original publication date and the date that each article was last updated, whereas many other online newspapers don’t include the original publication date.

Here are a few of the most recent articles found, along with their dates. These articles are all PageRank zero.

We have not used articles with no PageRank value at all to narrow down the interval further, as Google may simply not have crawled these pages yet.

Hang on… what’s this?

After looking through a number of articles, we stumbled across this article, which has a PageRank value assigned (zero). The "article history" says:

"This article was first published on guardian.co.uk at 00.01 BST on Thursday 28 May 2009. It appeared in the Guardian on Thursday 28 May 2009 on p35 of the Editorials & reply section. It was last updated at 00.05 BST on Thursday 28 May 2009."

This poses something of a puzzle: here we have an article which has a PageRank score and which was apparently published 49 minutes after the PageRank update started. Thinking caps on! Here are some possible explanations for this seemingly paradoxical situation.
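
Under the methodology described earlier, the age of the published values is the time from the newest page carrying PageRank to the earliest mention of the update. The sketch below applies that arithmetic to the two BST timestamps quoted above. It shows why this article is a puzzle: the apparent age comes out negative.

```python
from datetime import datetime, timedelta, timezone

# British Summer Time is UTC+1.
BST = timezone(timedelta(hours=1), "BST")

# Earliest mention of the update found (the WebmasterWorld thread) and the
# Guardian article's first-publication time, both in BST.
first_mention = datetime(2009, 5, 27, 23, 12, tzinfo=BST)
newest_page_with_pagerank = datetime(2009, 5, 28, 0, 1, tzinfo=BST)

def age_in_minutes(newest_page: datetime, update_seen: datetime) -> float:
    """Apparent age of the PageRank values, in minutes.

    Positive: the newest page with PageRank predates the update.
    Negative: the page appeared after the update was first noticed.
    """
    return (update_seen - newest_page).total_seconds() / 60

age = age_in_minutes(newest_page_with_pagerank, first_mention)
print(f"Apparent age of the PageRank values: {age:.0f} minutes")
```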

Theory 1 – The dates are wrong

This is the simplest explanation. Either the date on the WebmasterWorld thread is wrong, or the date on the rogue Guardian article is wrong.

Theory 2 – Datacenters, datacenters, datacenters

"Datacenters" – the standard fall-back answer to many a Google puzzle. As we know that different datacenters will start showing updated PageRank values at different times, it could be that the datacenter currently serving up the PageRank values that we are seeing is different to the one which first served new results to the poster who started the WebmasterWorld thread listed above.

This theory has interesting implications: given the time gap, it would mean that different datacenters calculate PageRank independently of each other.

Theory 3 – Rolling PageRank update

Another possibility is that the PageRank update happens in a number of stages or over a period of time – this would mean that the update had begun when it was first noticed but had not yet been completed by the time that Google found the aforementioned Guardian article.

Conclusion

When Google performs a Toolbar PageRank update, it would appear that the values are fresh and up-to-date.

There may also be an additional mechanism at work which can sometimes result in PageRank values being assigned to some pages shortly after the Toolbar PageRank update has occurred.

Got any comments about this research piece? Let us know in the comments field below!

Google Toolbar PageRank update 27th May

Google updated its Toolbar PageRank on the 27th of May. This is slightly unusual, as it comes around a month earlier than expected.

Google updated its “Toolbar PageRank” (the PageRank values shown in the Google Toolbar) on the 27th of May. The timing has caught many by surprise, as it comes less than two months after the last update.

Typically, Google updates its Toolbar PageRank values roughly every 3-4 months (the previous PageRank updates happened on 1st April, 31st December and, before that, in September).

Please note that the Toolbar PageRank does not necessarily reflect the current standing of a page – Google continually updates PageRank values internally (at least every day), but only provides a “snapshot” every so often.

What do quantum disorder and Google have in common?

Could random matrix theory, as used to analyse disorder in quantum systems, be the next thing to challenge Google?

In the April 4th edition of New Scientist, there was an article entitled "Quantum mathematics could boost keyword searches", although the website version bears a slightly more provocative title: "Could quantum mathematics shake up Google?". It reports on random matrix theory, a mathematical technique that Pedro Carpena has used in the analysis of disorder in quantum systems, and which might just be the next big thing in search.

What it boils down to is this. Words critical to the subject of a text tend to cluster in certain areas of the copy. When a concept is introduced and explored, its key words are used frequently, then drop off in frequency as the text moves on. Conversely, common yet irrelevant words (what some people refer to as stop words or sentence glue) tend to be scattered fairly evenly through the text. As a result, analysing how words cluster gives a better picture than frequency or density analysis.
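
To make the idea concrete, here is a deliberately simplified sketch of clustering-based keyword scoring. For each word, it measures the gaps between successive occurrences and divides their standard deviation by their mean. Evenly scattered words score low, while words that bunch up in a few passages score above 1. This spacing ratio is an assumption chosen for illustration. It is not the random matrix technique from the New Scientist article, and the sample text is made up.

```python
import re
from statistics import mean, pstdev

def spacing_clustering(text: str, min_count: int = 3) -> dict[str, float]:
    """Score each word by how clustered its occurrences are.

    For every word seen at least min_count times, take the gaps (in word
    positions) between successive occurrences and return the ratio of their
    standard deviation to their mean. Higher means more clustered.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positions: dict[str, list[int]] = {}
    for index, word in enumerate(words):
        positions.setdefault(word, []).append(index)

    scores = {}
    for word, pos in positions.items():
        if len(pos) < min_count:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores

# Made-up sample: "whale" appears in two tight bursts, while the
# other words recur at regular intervals.
even = "the ship sails on " * 10
sample = "whale whale whale " + even + "whale whale whale " + even
scores = spacing_clustering(sample)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:6} {score:.2f}")
```

With this toy input, "whale" gets the highest score, while evenly repeated words such as "the" score well below 1.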

Now, modern search engines don’t use anything as simple as keyword density analysis, but could this, as the article’s title rather sensationally asks, "shake up Google"? The results produced seem a little hit and miss, with both "you" and "I" appearing in the top five for both The Odyssey and Moby Dick. It does, however, seem to generate some interesting results with all the spaces removed from the text, but that is a different discussion.

While Carpena’s method may be good at pulling relevance from an unbiased text, how good is it at pulling actual relevance from a biased one? Compiling a list of relevant words from a text isn’t the hard part: search engines are already pretty good at identifying text that is relevant to a search. The difficulty is pulling relevance from a text that is deliberately misleading. New analysis algorithms will just force people to develop new ways of gaming the system. The real challenge is separating the wheat from the chaff.

To my mind, this is where many journalists fall down: too many ask whether the latest clever method of discerning relevance is the next Google killer, but few look at what Google is actually struggling to achieve. Let’s face it, Google has text analysis down pat. While it may not be as elegant as some sophisticated quantum analysis technique, Google will return pages with text that is fairly relevant to your search words. What it struggles with, though, is matching the meaning of the search with the intent of the content.

We have all been there: looking for customer reviews of our next intended purchase to see whether it has been well received by its current users, only to find that the search results are cluttered with pages selling the product, each with an empty review section. Another scenario is the "this MP3 player isn’t an iPod"-style eBay listing.

There are plenty of pages out there that mislead or misrepresent, and there is nothing more frustrating than wading through piles of valueless results that promise the Earth. It is advances in filtering these out of the shortlist of relevant pages that will bring the next quantum leap in search.
