Ivan Brunetti has compiled a handy list of techniques and tips for people building websites they'd like search engines to remember and archive. He got many of his ideas from http://scribbling.net/help_the_googlebot_understand_your_web_site.
Google is the best and, I would guess, the most popular search engine on the Internet (judging by the fact that "Googling" is now a verb). The Googlebot is Google's indexing software: over time it visits billions of web pages and records their contents, which is what makes them searchable. While the Googlebot is very smart and works really well, it needs some help from you and your website.
Keep in mind that the Googlebot is software. There are lots of ways to trip it up and make it impossible for it to index your content. As a web site author, you can do a few simple things to help the Googlebot understand your web site as fully as possible.
Make every single page on your site reachable through a plain text link, rather than only through JavaScript, Flash, DHTML, and the like. The Googlebot only speaks text. Likewise, make all relevant information on a page textual: don't embed important page content in images or in objects like Flash movies.
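One way to check this tip is to walk your site the way a text-only robot would, following nothing but ordinary href links. The Python sketch below uses only the standard library's html.parser module to do that over a small hypothetical site held in a dictionary (the page names and the go() script call are made up for illustration) and reports any page the walk never reaches. It assumes links are plain relative file names that match the dictionary keys exactly.

```python
from collections import deque
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> element; scripts and onclick are ignored."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        # Empty, "#"-only and javascript: links give a robot nowhere to go.
        if href and not href.startswith("#") and not href.lower().startswith("javascript:"):
            self.hrefs.append(href)


def reachable_pages(site, start):
    """Breadth-first walk over `site` (page name -> HTML) via plain href links."""
    seen = {start}
    queue = deque([start])
    while queue:
        collector = HrefCollector()
        collector.feed(site.get(queue.popleft(), ""))
        collector.close()
        for href in collector.hrefs:
            if href in site and href not in seen:
                seen.add(href)
                queue.append(href)
    return seen


# Hypothetical site: photos.html is linked only through a script call.
SITE = {
    "index.html": '<a href="about.html">About me</a>'
                  '<a href="#" onclick="go(\'photos.html\')">Photos</a>',
    "about.html": '<a href="index.html">Home</a>',
    "photos.html": "<p>My goldfish</p>",
}

unreachable = sorted(set(SITE) - reachable_pages(SITE, "index.html"))
print("Not reachable by plain links:", unreachable)
```

On a real site you would resolve each href against the current page with urllib.parse.urljoin and drop any #fragment before comparing.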
Images in a web document, while meaningful to human eyes, are just a collection of 1s and 0s to search engine indexing software and non-graphical browsers. Make sure all of the information on your site exists in text form. For example, if your site has a masthead image that contains your site's title, set its alt attribute to describe the content of the image. Ideally, the most relevant information on a page should also come first in your markup, with other elements (navigation, etc.) following it.
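To catch images that would mean nothing to an indexing robot, you could scan your markup for img elements whose alt (or title) attribute is missing or blank. This is a minimal sketch using Python's standard html.parser module; the file names and descriptions in the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record each <img> whose alt or title attribute is missing or blank."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        for name in ("alt", "title"):
            if not (attributes.get(name) or "").strip():
                self.problems.append((src, name))


def audit_images(html):
    """Return (src, attribute) pairs for images lacking descriptive text."""
    auditor = ImageAudit()
    auditor.feed(html)
    auditor.close()
    return auditor.problems


# Hypothetical markup: an undescribed masthead image and a described photo.
page = (
    '<img src="masthead.gif" alt="">'
    '<img src="goldfish.jpg" alt="My pet goldfish" title="My pet goldfish">'
)
for src, attribute in audit_images(page):
    print(f"{src}: {attribute} is missing or empty")
```

An empty alt is a legitimate choice for purely decorative images such as spacers, so treat the output as a list to review rather than a list of errors.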
Make sure that the title and alt attributes exist and are complete and meaningful in each page's markup. For example, the markup for that picture of your goldfish should be something like <img src="goldfish.jpg" alt="My pet goldfish" title="My pet goldfish"> (with your own file name and description). This is easy as pie in Dreamweaver: just select the image and type your alt text into the Properties window, in the field marked "Alt".
A good way to see what your site looks like to an indexing robot or a non-graphical browser is to turn off images in your browser. In Internet Explorer, open the Tools menu, choose Internet Options, go to the Advanced tab, and under Multimedia uncheck "Show pictures." In Mozilla, go to Tools, Image Manager, and choose "Block Images from this Site." Then view your site and make sure that, without images, all of its information is still adequately represented. The same concept applies to all embedded objects, such as Flash movies and Java applets.
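The images-off test can also be approximated in code. This Python sketch (standard library only) builds a rough text-only view of a page: each image is replaced by its alt text, script and style contents are dropped, and embedded objects such as Flash movies or applets show up only as a placeholder. It is a simplification of what a non-graphical client does, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Approximate what a client with images and plug-ins turned off can read."""

    SKIPPED = {"script", "style"}            # contents are never shown as text
    OBJECTS = {"object", "embed", "applet"}  # plug-in content (Flash, Java, ...)

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag == "img":
            alt = (dict(attrs).get("alt") or "").strip()
            self.parts.append(f"[{alt}]" if alt else "[image with no alt text]")
        elif self.skip_depth == 0 and tag in self.OBJECTS:
            # Any fallback text inside <object> is still read by handle_data.
            self.parts.append("[embedded object]")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def text_only(html):
    view = TextOnlyView()
    view.feed(html)
    view.close()
    return " ".join(view.parts)


# Hypothetical page with a masthead image, a script, a Flash movie and a divider.
page = (
    '<img src="masthead.gif" alt="My Goldfish Page">'
    '<script>document.write("Welcome!");</script>'
    '<object data="intro.swf"></object>'
    '<p>Tips for keeping goldfish healthy.</p>'
    '<img src="divider.gif">'
)
print(text_only(page))
```

Here the welcome message written by the script never appears, and the divider image shows up only as a complaint about its missing alt text.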
Don't count on search engine robots executing JavaScript. If you're handy with HTML editing and any navigation elements on your site use JavaScript, put the JavaScript call in the onclick attribute of the a element and the link's destination in its href attribute. That way JavaScript-enabled browsers will run the script, and the link will still work for clients without JavaScript.
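A link built the way this tip describes, for example <a href="photos.html" onclick="openGallery(); return false;">, gives script-enabled browsers the script and gives robots a real destination (returning false from the onclick handler stops the browser from also following the href). The Python sketch below flags the opposite pattern: links whose href is empty, "#", or a javascript: URL, so that the destination exists only inside a script. The page names and the openGallery() function are hypothetical.

```python
from html.parser import HTMLParser


class ScriptLinkAudit(HTMLParser):
    """Flag <a> elements whose destination exists only inside JavaScript."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag != "a" or ("href" not in attributes and "onclick" not in attributes):
            return  # not a link at all (e.g. a named anchor like <a name="top">)
        href = (attributes.get("href") or "").strip()
        if href in ("", "#") or href.lower().startswith("javascript:"):
            self.flagged.append(attributes.get("onclick") or href or "(no href)")


def script_only_links(html):
    audit = ScriptLinkAudit()
    audit.feed(html)
    audit.close()
    return audit.flagged


page = (
    # Robot-friendly: the script runs on click, robots follow the href.
    '<a href="photos.html" onclick="openGallery(); return false;">Photos</a>'
    # Not crawlable: the destination lives only inside the script.
    '<a href="javascript:openGallery()">Photos</a>'
    '<a href="#" onclick="location.href=\'links.html\'">Links</a>'
)
for link in script_only_links(page):
    print("Script-only link:", link)
```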
Keep the number of links on a given page under 100. Why, you ask? See Google's Webmaster Guidelines: http://www.google.com/webmasters/guidelines.html
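Counting links is easy to automate. This Python sketch counts the a elements that carry an href and warns when a page reaches the 100-link mark from the tip; the generated archive page is hypothetical.

```python
from html.parser import HTMLParser

LINK_LIMIT = 100  # the per-page figure from the tip


class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Count only anchors that point somewhere, not <a name="..."> targets.
        if tag == "a" and dict(attrs).get("href"):
            self.count += 1


def count_links(html):
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical archive page listing 120 posts.
archive = "".join(f'<a href="post{i}.html">Post {i}</a>' for i in range(120))
total = count_links(archive)
if total >= LINK_LIMIT:
    print(f"{total} links: consider splitting this page")
```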
Give every single page on the site a complete and meaningful title (the <title> element in the page's head).
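Missing titles, and one generic title repeated across many pages, are easy to spot with a script. This Python sketch collects each page's <title> text and reports pages with no title or a duplicated one. The sample site is hypothetical; "Untitled Document" is used because it is the placeholder title Dreamweaver typically gives new pages, which makes it a common duplicate.

```python
from collections import Counter
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Accumulate the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    grabber = TitleGrabber()
    grabber.feed(html)
    grabber.close()
    return " ".join(grabber.title.split())  # collapse runs of whitespace


def title_report(site):
    """Return (pages without a title, pages sharing a title with another page)."""
    titles = {name: page_title(html) for name, html in site.items()}
    counts = Counter(title for title in titles.values() if title)
    missing = sorted(name for name, title in titles.items() if not title)
    shared = sorted(name for name, title in titles.items() if title and counts[title] > 1)
    return missing, shared


# Hypothetical four-page site.
SITE = {
    "index.html": "<head><title>Goldfish Care Tips</title></head>",
    "about.html": "<head><title>Untitled Document</title></head>",
    "photos.html": "<head><title>Untitled Document</title></head>",
    "links.html": "<head></head>",
}
missing, shared = title_report(SITE)
print("No title:", missing)
print("Duplicate title:", shared)
```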
Avoid frames. Avoid frames like the plague.
Use meaningful text inside your <a> tags so the Googlebot can associate that text with the href link. Meaning, if I am going to link to my pictures from the war protest, I should write "Take a look at my photos from the war protest" and make "photos from the war protest" the link text, rather than writing "My war protest pictures are here" and linking only the word "here." Don't use link text like "read more," "go here," "download it," or "click here."
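A quick automated pass can flag the vague link texts this tip warns against. The Python sketch below gathers the text inside each a element (including text in nested tags like <b>) and reports any that match a small list of phrases taken from the tip, plus a bare "here"; the sample markup is hypothetical.

```python
from html.parser import HTMLParser

# Phrases from the tip, plus a bare "here"; compared case-insensitively.
VAGUE_TEXT = {"read more", "go here", "download it", "click here", "here"}


class LinkTextAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.link_text = None  # text of the <a> being read; None outside links
        self.vague = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_text = ""

    def handle_data(self, data):
        if self.link_text is not None:
            self.link_text += data  # includes text inside nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "a" and self.link_text is not None:
            text = " ".join(self.link_text.split()).lower()
            if text in VAGUE_TEXT:
                self.vague.append(text)
            self.link_text = None


def vague_links(html):
    audit = LinkTextAudit()
    audit.feed(html)
    audit.close()
    return audit.vague


page = (
    '<p>Take a look at my <a href="protest.html">photos from the war protest</a>.</p>'
    '<p>My war protest pictures are <a href="protest.html">here</a>.</p>'
    '<p><a href="notes.html"><b>Click here</b></a></p>'
)
print(vague_links(page))
```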
For the more technically minded: use robots.txt and meta robots tags to show the Googlebot around your site. These standard mechanisms for directing well-behaved robots like the Googlebot let you specify important things, such as whether Google may cache your page content and/or images, and whether the Googlebot should index pages you may not want available to the searching public.
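Python's standard library includes urllib.robotparser, which answers the question a well-behaved robot asks before crawling: may this user agent fetch this URL? The sketch below feeds it a hypothetical robots.txt (the paths and the example.com host are made up) instead of downloading a real one. Keep in mind that robots.txt only governs crawling; per-page choices such as caching are expressed with meta robots tags.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. A robot that finds a group naming it uses only that
# group, so /drafts/ is listed again under Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/
Disallow: /photos/private/

User-agent: *
Disallow: /drafts/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/drafts/new-post.html",
    "http://www.example.com/photos/private/me.jpg",
):
    allowed = rules.can_fetch("Googlebot", url)
    print(f"Googlebot {'may' if allowed else 'may not'} fetch {url}")
```

In production a crawler would call set_url() and read() to fetch the site's live robots.txt; parse() is used here so the example runs without a network connection.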
For Bloggers:
Use meta robots tags to help the Googlebot index only your permalinks, not your constantly changing front page. To do this, use
<meta name="robots" content="noindex,follow" >
on your front page and
<meta name="robots" content="index,follow" >
on your posts' permanent locations.
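To confirm that the front page and the permalinks carry the intended directives, this Python sketch reads the content of any <meta name="robots"> tag using the standard html.parser module. The two sample pages mirror the tags above; the function and variable names are illustrative.

```python
from html.parser import HTMLParser


class MetaRobots(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            for directive in (attributes.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    reader = MetaRobots()
    reader.feed(html)
    reader.close()
    return reader.directives


front_page = '<head><meta name="robots" content="noindex,follow" ></head>'
permalink = '<head><meta name="robots" content="index,follow" ></head>'

# The front page should stay out of the index but still lead robots to the posts.
front_ok = {"noindex", "follow"} <= robots_directives(front_page)
post_ok = "noindex" not in robots_directives(permalink)
print("front page set up correctly:", front_ok)
print("permalink set up correctly:", post_ok)
```

A page with no meta robots tag yields an empty set here, which robots treat the same as index,follow.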
More recommended reading for everyone:
http://scribbling.net/nine_things_you_can_do_to_make_your_web_site_better.