Crawler/Spider Considerations

 

Also, consider technical factors. If a site has a slow connection, it might time out for the crawler.

Very complex pages, too, may time out before the crawler can harvest the text.

 

If you have a hierarchy of directories at your site, put the most important information high, not deep. Some search engines will presume that the higher you placed the information, the more important it is. And crawlers may not venture deeper than three to five directory levels.
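To see how deep a page sits, you can count the directory levels in its URL. The following is a minimal sketch, not any search engine's actual rule: the function name `directory_depth` and the example.com URLs are made up for illustration, and it assumes a path that does not end in a slash names a file.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Count the directory levels between the site root and the page."""
    path = urlparse(url).path
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    segments = [s for s in path.split("/") if s]
    # Assumption: a path without a trailing slash ends in a file name,
    # so the last segment is not counted as a directory.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/products/widgets/blue/specs.html",
    ):
        print(directory_depth(url), url)
```

A page that reports a depth of four or five is a candidate for moving closer to the root.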

 

Above all, remember the obvious: full-text search engines index text. You may well be tempted to use fancy and expensive design techniques that either block search engine crawlers or leave your pages with very little plain text that can be indexed. Don’t fall prey to that temptation.

 

Ranking Rules Of Thumb

The simple rule of thumb is that content counts, and that content near the top of a page counts for more than content at the end. In particular, the HTML title and the first couple of lines of text are the most important parts of your pages. If the words and phrases that match a query appear in the HTML title or the first couple of lines of text of one of your pages, chances are very good that the page will appear high in the list of search results.

 

A crawler/spider search engine can base its ranking on both static factors (a computation of the value of a page, independent of any particular query) and query-dependent factors.

 

Values

 

  • Long pages, which are rich in meaningful text (not randomly generated letters and words).

 

  • Pages that serve as good hubs, with lots of links to pages that have related content (topic similarity, rather than random meaningless links, such as those generated by link exchange programs or intended to generate a false impression of “popularity”).

 

  • The connectivity of pages, including not just how many links there are to a page but where the links come from: the number of distinct domains and the “quality” ranking of those particular sites. This is calculated for the site and also for individual pages. A site or a page is “good” if many pages at many different sites point to it, and especially if many “good” sites point to it.

 

  • The level of the directory in which the page is found. Higher is considered more important. If a page is buried too deep, the crawler simply won’t go that far and will never find it.

 

These static factors are recomputed about once a week, and new good pages slowly percolate upward in the rankings. Note that there are advantages to having a simple address and sticking to it, so that others can build links to it and so that you know it’s in the index.

Query-Dependent Factors

 

  • The HTML title.

 

  • The first lines of text.

 

  • Query words and phrases appearing early in a page rather than late.

 

  • Meta tags, which are treated as ordinary words in the text, but weighted like words that appear early in the text (unless the meta tags are patently unrelated to the content on the page itself, in which case the page will be penalized).

 

  • Words mentioned in the “anchor” text associated with hyperlinks to your pages. (E.g., if lots of good sites link to your site with anchor text “breast cancer” and the query is “breast cancer,” chances are good that you will appear high in the list of matches.)
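The idea that title matches and early matches count for more can be illustrated with a toy score. This is only a sketch to make the rule of thumb concrete: the function `toy_query_score`, its weights, and the sample pages are invented here, and no search engine is known to use this formula.

```python
def toy_query_score(query: str, title: str, body: str) -> float:
    """Toy score: title matches get a fixed bonus, body matches count
    for more the earlier they first appear."""
    terms = query.lower().split()
    # Simple whitespace tokenization; punctuation is not stripped.
    title_words = title.lower().split()
    body_words = body.lower().split()

    score = 0.0
    for term in terms:
        if term in title_words:
            score += 2.0  # arbitrary illustrative weight for a title match
        if term in body_words:
            position = body_words.index(term)  # index of first occurrence
            score += 1.0 / (1 + position)      # earlier words score higher
    return score


if __name__ == "__main__":
    early = toy_query_score(
        "breast cancer",
        "Breast Cancer Facts",
        "breast cancer screening and treatment options",
    )
    late = toy_query_score(
        "breast cancer",
        "Health News",
        "today we cover many unrelated topics and finally breast cancer",
    )
    print(early > late)
```

The page with the query terms in its title and opening words outscores the page that mentions them only at the end, which is the behavior the list above describes.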

 

Blanket Policy On Doorway Pages And Cloaking

Many search engines are opposed to doorway pages and cloaking. They consider doorway and cloaked pages to be spam and encourage people to use other avenues to increase the relevancy of their pages. We’ll talk about doorway pages and cloaking a bit later.

 


Meta Tags (Ask.Com As An Example)

 

Though meta tags are indexed and treated as regular text, Ask.com claims it doesn’t give them priority over HTML titles and other text. You should generally use meta tags in all your pages, although some webmasters claim their doorway pages for Ask.com rank better without them. If you do use meta tags, make your description tag no more than 150 characters and your keywords tag no more than 1,024 characters long.
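The length limits above are easy to check automatically. Here is a minimal sketch using Python's standard `html.parser` module; the class `MetaCollector` and the function `check_meta_lengths` are names chosen for this example, and the limits are simply the 150 and 1,024 characters suggested above.

```python
from html.parser import HTMLParser

MAX_DESCRIPTION = 150   # description limit suggested in the text
MAX_KEYWORDS = 1024     # keywords limit suggested in the text


class MetaCollector(HTMLParser):
    """Collects the content of the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attributes.get("content") or ""


def check_meta_lengths(html: str) -> dict:
    """Map each meta tag found to True if it is within its limit."""
    collector = MetaCollector()
    collector.feed(html)
    limits = {"description": MAX_DESCRIPTION, "keywords": MAX_KEYWORDS}
    return {name: len(value) <= limits[name]
            for name, value in collector.meta.items()}


if __name__ == "__main__":
    page = ('<html><head><title>Blue Widgets</title>'
            '<meta name="description" content="Hand-made blue widgets.">'
            '<meta name="keywords" content="widgets, blue widgets">'
            '</head><body></body></html>')
    print(check_meta_lengths(page))
```

A value of False for either tag means the content is longer than the suggested limit and should be trimmed.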

 

Keywords In The URL And File Names

 

It’s generally believed that Ask.com gives some weight to keywords in filenames and URL names. If you’re creating a file, try to name it with keywords.

 

Keywords In The ALT Tags

 

Ask.com indexes ALT tags, so if you use images on your site, make sure to add ALT tags to them. ALT tags should contain more than the image’s description. They should include keywords, especially if the image is at the top of the page. ALT tags are explained later.

 

Page Length

 

There’s been some debate about how long doorway pages for AltaVista should be. Some webmasters say short pages rank higher, while others argue that long pages are the way to go. According to AltaVista’s help section, it prefers long and informative pages. We’ve found that pages with 600-900 words are most likely to rank well.
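To compare a page against the 600-900 word range, you can count the words outside its markup. The sketch below is one rough way to do that with Python's standard `html.parser`; the names `TextExtractor`, `visible_word_count`, and `in_suggested_range` are invented for this example, the count includes the title text, and a search engine may count words differently.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers page text, skipping the contents of script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_word_count(html: str) -> int:
    """Count word-like tokens in the text outside script and style."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(re.findall(r"\w+", " ".join(extractor.chunks)))


def in_suggested_range(html: str, low: int = 600, high: int = 900) -> bool:
    """True if the word count falls in the range suggested in the text."""
    return low <= visible_word_count(html) <= high


if __name__ == "__main__":
    sample = "<html><body><p>Hello big world</p></body></html>"
    print(visible_word_count(sample), in_suggested_range(sample))
```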

 

Frame Support

 

AltaVista has the ability to index frames, but it sometimes indexes and links to pages intended only as navigation. To keep this from happening to you, submit a frame-free site map containing the pages that you want indexed. You may also want to include a “robots.txt” file to prohibit AltaVista from indexing certain pages.
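As an illustration, a “robots.txt” file that keeps crawlers away from navigation-only frames might look like the one embedded below. The paths `/nav/` and `/frames/menu.html` and the example.com URLs are hypothetical, and the rules apply to all crawlers (`User-agent: *`) rather than to AltaVista specifically. The sketch uses Python's standard `urllib.robotparser` to confirm which pages the rules block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the navigation frames, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /nav/
Disallow: /frames/menu.html
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/sitemap.html",
        "http://www.example.com/nav/left.html",
        "http://www.example.com/frames/menu.html",
    ):
        print(is_allowed(ROBOTS_TXT, url), url)
```

Checking the rules this way before uploading the file helps avoid accidentally blocking the pages you do want indexed, such as the frame-free site map.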