I post at SearchCommander.com now, and this post was published 5 years 7 months 4 days ago. This industry changes FAST, so blindly following the advice here *may not* be a good idea! If you’re at all unsure, feel free to hit me up on Twitter and ask.
Inside Google Webmaster Tools, there is an option called “Fetch As Googlebot” that is supposed to crawl a page you specify and return what Googlebot sees.
Until this moment, I’ve never had much use for this, but that’s not the case now!
I ran into a problem when a website that had been hacked and then fixed was still showing a polluted snippet as its description on the results page.
A client reported seeing pharmaceuticals mentioned in their listing on a SERP (never good). I repeated the search, saw the same thing, then visited the page to view the source. I verified that yes, the page is clean, and there are no drugs mentioned there….
Normally there’s no way to get Google to refresh its cache of a page faster unless you link to that page, but I updated the XML sitemap anyway and took a shot at “Fetch as Googlebot”, figuring that it couldn’t hurt.
What I found surprised me…
Was Google not “fetching” this live? Was it pulling some outdated version of the page? I thought for sure that the site was clean, so either Google is pulling from cache, or the site isn’t really fixed, right?
I decided to do a quick test by changing some words in a blog post on my own site, and then I did a quick fetch.
I went to Google Webmaster Tools – Chose to Fetch as Googlebot – and yes, Google DID fetch it live!
So, this is the dilemma… Is the site still hacked? Is there something really insidious that makes it look okay to us, but not to others?
Obviously something is wrong, but what is it?
I’m posting this now without a solution, because I DON’T KNOW THE ANSWER – but as I look into it further I’ll update this post.
In the meantime, got any ideas?
For the record –
- There are no warnings about Malware in Google WMT
- “View source” shows the correct title tag etc., and no WP info
- Visiting the site through a proxy shows it’s fine too
- The server response code thrown back by the URL is a 200 OK
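One more check that fits this list is requesting the same URL twice, once with a normal browser user agent and once claiming to be Googlebot, and then comparing what comes back. Below is a minimal Python sketch of that idea. The `SPAM_TERMS` watch list and the function names are my own placeholders, not anything from the site in question. It is also an assumption that the hack keys on the user-agent header at all; some cloaking checks the requesting IP address instead, in which case this comparison would come back clean even on an infected site.

```python
import urllib.request

# User-agent strings to impersonate. The Googlebot value follows Google's
# published format; the browser value is just a plausible desktop browser.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Hypothetical watch list; swap in whatever words showed up in the bad snippet.
SPAM_TERMS = ["viagra", "cialis", "pharmacy"]


def fetch_as(url, user_agent, timeout=10):
    """Download a page while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def find_spam_terms(html, terms=SPAM_TERMS):
    """Return the watch-list terms that appear in the HTML (case-insensitive)."""
    lowered = html.lower()
    return [term for term in terms if term in lowered]


def compare_views(url):
    """Fetch the URL as a browser and as Googlebot and report spam terms in each."""
    return {
        "browser": find_spam_terms(fetch_as(url, BROWSER_UA)),
        "googlebot": find_spam_terms(fetch_as(url, GOOGLEBOT_UA)),
    }
```

Calling `compare_views("https://example.com/")` (a placeholder URL) returns a dictionary with one list per view. Terms that show up only under `"googlebot"` would suggest the page is serving different content to crawlers.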
***Update – July 9th, 9 am***
Ok, so this site is NOT WordPress, but thanks to @blafrance we think we’re on the right track after he sent me a link to the WordPress Pharma hack –
While we exhibit none of the same afflictions, the end result is the same – a hack visible only to the search engines.
Last night their programmer found some malicious code in a .php file and removed it, but the fetch is still returning the bad info. Here’s the code in its entirety, although it’s not much help…
***Update July 10, 2010***
The answer, of course, was that the site was still hacked: there was another piece of malicious code, like the one shown above, in the header file.
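Since the leftover code was hiding in a second file, a rough scan of every .php file on the server is a reasonable way to look for stragglers. The sketch below is only a starting point: the two patterns are my assumptions about what injected code commonly looks like (decoding and then evaluating a payload, or branching on a crawler's user agent), not a description of the code found on this site, and clean code can match them too.

```python
import re
from pathlib import Path

# Assumed heuristics, not signatures of this particular hack.
SUSPICIOUS_PATTERNS = {
    "eval of decoded data": re.compile(
        r"eval\s*\(\s*(base64_decode|gzinflate|str_rot13)", re.IGNORECASE
    ),
    "user-agent sniffing": re.compile(
        r"HTTP_USER_AGENT.*(googlebot|slurp|bingbot)", re.IGNORECASE
    ),
}


def scan_text(source):
    """Return the labels of every suspicious pattern found in a PHP source string."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(source)
    ]


def scan_tree(root):
    """Walk a directory and map each flagged .php file path to its matched labels."""
    findings = {}
    for path in Path(root).rglob("*.php"):
        labels = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if labels:
            findings[str(path)] = labels
    return findings
```

Running `scan_tree` on a local copy of the site gives a list of files to read by hand. A match is a reason to open the file, not proof of infection.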
I think the most interesting and scary thing about this pharma hack was that it did not affect every page. It’s going to become harder and harder to protect yourself…
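Because only some pages were affected, spot-checking the home page isn't enough; every URL needs a look. One way to sketch that, assuming the site has a standard XML sitemap, is to pull the URLs out of the sitemap and run each one through whatever check you trust. Here `looks_infected` is a hypothetical callback you supply yourself; nothing in this snippet decides what "infected" means.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Extract every <loc> URL from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def flag_pages(urls, looks_infected):
    """Return the URLs for which the caller-supplied check returns True."""
    return [url for url in urls if looks_infected(url)]
```

This keeps the page list and the infection test separate, so the same loop works whether the check is a keyword search, a diff against a known-good copy, or something else.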