Help the Googlebot understand your web site

A list of recommendations.

Google is the best search engine on the 'net right now. The Googlebot is Google's indexing software. The Googlebot visits billions of web sites over time and records their contents, which makes them available to search. The Googlebot is very smart and works really well. But, like everyone, it could use a little help from its friends.

When authoring a web site, keep in mind that the Googlebot is software, which means it has a set of capabilities, limitations, and algorithms it uses to index content. There are lots of effective ways to trip up the Googlebot and make it impossible for it to index your content. Alternatively, you can help the Googlebot index your site well, and then people will find it when searching for words it contains.

As a web site author, there are a few simple things you can do to help the Googlebot understand your web site as fully as possible.

Here's a list.

Make every single page on your site accessible via a text-based link - as opposed to JavaScript, Flash, DHTML, etc. The Googlebot only speaks text.

Keep the number of links on a given page under 100.
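
One way to check a page against that limit is to count its plain text links. What follows is only a rough sketch using Python's standard html.parser module; the count_links helper and the sample markup are made up for illustration, and it counts only <a> tags that carry an href, since those are the text-based links discussed above.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip named anchors and empty hrefs
                self.hrefs.append(href)


def count_links(html):
    """Return the number of <a href="..."> links in an HTML string."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return len(parser.hrefs)


# Hypothetical sample page fragment: two real links, one named anchor.
sample = (
    '<p><a href="/about.html">About</a> '
    '<a href="/archive.html">Archive</a> '
    '<a name="top">anchor only</a></p>'
)
print(count_links(sample))  # 2
```

Run it over a saved copy of a page and compare the result with 100.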

Give every single page on the site a complete and meaningful <title>. Google offers the allintitle syntax, which lets users search only text that appears in a page title. There are over 3 million results returned for Untitled Document.

Avoid frames. Avoid frames like the plague.

Use URLs with query strings sparingly, if at all. When using dynamic URLs, keep in mind that the shorter the list of query string parameters, the better.

Make sure that the title and alt attributes exist and are complete and meaningful in each page's markup. For example, the markup for that picture of your goldfish should be something like
<img src="/imgs/goldie.jpg" alt="my beloved goldfish, Goldie" />

Make all relevant information on a page textual. Don't embed page content into images or objects like Flash movies. Did I mention the Googlebot only speaks text?
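
The <title> and alt recommendations are easy to check mechanically. Here is a minimal sketch, again with Python's standard html.parser; the audit function, the PageAudit class, and the /imgs/bowl.jpg image are hypothetical names invented for this example, and treating "Untitled Document" as a placeholder title is my own assumption based on the search result mentioned above.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Record the page title and any <img> tags lacking non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt.append(attrs.get("src", "(no src)"))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit(html):
    """Return a list of human-readable problems found in the page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    problems = []
    title = parser.title.strip()
    if not title or title.lower() == "untitled document":
        problems.append("missing or placeholder <title>")
    for src in parser.images_missing_alt:
        problems.append(f"<img> without alt text: {src}")
    return problems


# Hypothetical page: placeholder title, one good image, one image without alt.
page = """<html><head><title>Untitled Document</title></head>
<body><img src="/imgs/goldie.jpg" alt="my beloved goldfish, Goldie" />
<img src="/imgs/bowl.jpg"></body></html>"""

for problem in audit(page):
    print(problem)
```

An empty list back from audit means the page passed both checks. It says nothing about whether the text is actually meaningful; that part is still on you.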

Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since the Googlebot last crawled your site. Supporting this feature saves you bandwidth and overhead.
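
To make the mechanism concrete: the crawler sends the date of its last copy in the If-Modified-Since header, and the server answers 304 Not Modified (with no body) if nothing has changed since then, or 200 with the full page otherwise. The following is a simplified sketch of that decision, not how any particular web server implements it; the conditional_status function and the dates are invented for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the content is unchanged since the client's copy, else 200.

    if_modified_since: the raw header value sent by the client, or None.
    last_modified: timezone-aware datetime of the content's last change.
    """
    if if_modified_since is None:
        return 200  # no header: a plain, unconditional request
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: just send the full page
    if client_copy.tzinfo is None:
        client_copy = client_copy.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_copy:
        return 304
    return 200


# Hypothetical page last changed on 1 March 2003 at noon UTC.
last_modified = datetime(2003, 3, 1, 12, 0, 0, tzinfo=timezone.utc)

print(conditional_status("Sat, 01 Mar 2003 12:00:00 GMT", last_modified))  # 304
print(conditional_status("Fri, 28 Feb 2003 12:00:00 GMT", last_modified))  # 200
```

Most web servers already do this for static files. It is dynamically generated pages that tend to need the extra attention.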

Use robots.txt and meta robots tags to show the Googlebot around your site. These standard mechanisms for directing well-behaved robots like the Googlebot will allow you to specify important things like whether or not Google will cache your page content and/or images, and whether or not the Googlebot will index content on pages that maybe you don't want available to the searching public.
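
If you want to see how a well-behaved robot would read your rules before you publish them, Python's standard urllib.robotparser module can interpret a robots.txt for you. This is a small sketch under made-up assumptions: the rules, the /drafts/ and /private/ paths, the "OtherBot" name, and the allowed helper are all hypothetical, and the module's matching may differ in details from what the Googlebot itself does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group just for the Googlebot, one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the given robot may fetch the given path."""
    return parser.can_fetch(agent, path)


print(allowed("Googlebot", "/drafts/post.html"))    # False
print(allowed("Googlebot", "/private/notes.html"))  # True
print(allowed("OtherBot", "/private/notes.html"))   # False
```

Note the second result: a robot that has its own group follows only that group, so the Googlebot here is not bound by the rules listed under *. If you want something hidden from everyone, repeat it in every group.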

Webloggers: use the meta tags to help the Googlebot index only your permalinks, not your constantly changing front page. To do this, use
<meta name="robots" content="noindex,follow" >
on your front page and
<meta name="robots" content="index,follow" >
on your posts' permanent locations.

Use meaningful text inside your <a> tags so the Googlebot can associate that text with the link's href. Meaning, if I am going to link my pictures from the war protest, I should say "Take a look at my photos from the war protest," with the link on the words "photos from the war protest," instead of "My war protest pictures are here," with the link on "here." Now, Google doesn't explicitly recommend this. But I have a friend named Martin who has a weblog, which I link with the word "Martin" on my Bookmarks list. If you do a Google search for the word Martin, his weblog is the third result. So what, you say? Well, Martin doesn't mention his name anywhere on his site.

So don't use link text like read more or go here or download it or, God help us, click here. Don't click here.

Webloggers: take heed of this when you display the permanent link for a post. Link the title of the post, which presumably contains words that indicate what the post is about, instead of a [+] or the word permalink or, common amongst Blogger users, the date and time.

Include a
<meta name="description" content="[insert your site's description here]">
tag in your page header to summarize your site; even better, include descriptive text on the site's front page where users can actually read it, like, "Scribbling.net is a self-documentation project, occasionally interrupted by misdirected attempts at explaining the vaguely technical." This text will appear as the description for the site in Google results.

Forget <meta name="keywords"> ever existed. Really. It's meaningless.

Place more important content higher in the markup than less important content in a page.

Don't try to fool the Googlebot with hidden links or duplicate content or irrelevant pages of words like "sex" and "hot girls." The Googlebot doesn't like being played. The Googlebot will make you sorry.

Every few days your forum could be ripe with new content, just waiting and wanting to be indexed and searched. Your forum trembles with anticipation for its weekly-or-so Googlebot visit, and when the big G arrives, let me tell you, it's like a well-choreographed dance. The Googlebot and your forum have all the elements of a healthy relationship: love, trust, respect, honesty and understanding. It's beautiful, really. You too can know this kind of bliss.

