Indexing options

What to index

Here you can specify which parts of a page are included in or excluded from indexing. These include the page title, content, and filename. Meta information, such as meta descriptions, keywords, and author information, can also be indexed. Excluding certain sections of pages makes the index data files smaller and the indexing process faster and less memory-intensive. It can also make your searches more accurate, because only the relevant sections of each page are indexed.

The URL domain option indexes the domain name. For example, when indexing the page "http://www.mysite.com/section1/index.html", we index the words "www", "mysite", and "com" if Dots are not enabled as a join character in the word rules. If Dots are enabled, the full domain "www.mysite.com" is indexed as a single word.

The URL path option indexes the path as well. For example, when indexing the page "http://www.mysite.com/section1/index.html", we index the word "section1".
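To illustrate the URL domain and URL path options, here is a minimal Python sketch of how the example URL could be broken into terms. It is not Zoom's implementation. The function name url_terms and its dots_join flag are hypothetical, and the sketch assumes that only the directory segments of the path become path terms (the filename "index.html" is left to the separate filename option).

```python
from urllib.parse import urlsplit


def url_terms(url: str, dots_join: bool = False) -> tuple:
    """Return (domain_terms, path_terms) for a page URL.

    Hypothetical sketch: the domain is split on dots unless dots_join is
    True; the path contributes its directory segments, not the filename.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if dots_join:
        domain_terms = [host] if host else []
    else:
        domain_terms = [w for w in host.split(".") if w]
    # Drop the last segment (the filename) and keep non-empty directories.
    directories = parts.path.split("/")[:-1]
    path_terms = [d for d in directories if d]
    return domain_terms, path_terms


if __name__ == "__main__":
    page = "http://www.mysite.com/section1/index.html"
    print(url_terms(page))
    print(url_terms(page, dots_join=True))
```

For the example page, url_terms returns (['www', 'mysite', 'com'], ['section1']), and (['www.mysite.com'], ['section1']) when dots_join is True.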

Dublin Core metadata can also be indexed. When this option is enabled, Zoom indexes the DC.Title, DC.Subject, and DC.Identifier meta tags as described by the Dublin Core Metadata Initiative (DCMI).
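The following Python sketch uses the standard library's html.parser to collect the content of those three meta tags from a page. It is only an illustration: the names DublinCoreParser and dublin_core_text are hypothetical, and matching the tag names case-insensitively is an assumption, not documented Zoom behavior.

```python
from html.parser import HTMLParser

# The three Dublin Core elements named in the text (compared in lowercase).
DC_NAMES = {"dc.title", "dc.subject", "dc.identifier"}


class DublinCoreParser(HTMLParser):
    """Collect the content of DC.Title, DC.Subject and DC.Identifier meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values = {}  # lowercase DC name -> list of content strings

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Assumption: the name attribute is matched case-insensitively.
        name = (a.get("name") or "").lower()
        content = a.get("content")
        if name in DC_NAMES and content:
            self.values.setdefault(name, []).append(content)


def dublin_core_text(html: str) -> dict:
    parser = DublinCoreParser()
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    page = ('<meta name="DC.Title" content="My Pets">'
            '<meta name="DC.Subject" content="cats; dogs">'
            '<meta name="description" content="not Dublin Core">')
    print(dublin_core_text(page))
```

Ordinary meta tags such as "description" are ignored here; in Zoom they are governed by the separate meta description option.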

Note that the "Link text" and "ALT text for images" options index these words for the target (destination) file, not for the page that contains them. For example, if "pageA.html" contains a link to "imageB.jpg" with the link text (or ALT text) "picture of my pets", these words are indexed for the file "imageB.jpg", and NOT for "pageA.html".
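A minimal Python sketch of this attribution rule follows, again using html.parser. The class name LinkTextParser is hypothetical, and the sketch assumes that ALT text is credited to the image file named in the img tag's src attribute. Link targets are resolved against the containing page's URL with urljoin.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkTextParser(HTMLParser):
    """Credit link text and image ALT text to the target URL, not the page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.by_target = defaultdict(list)  # target URL -> list of texts
        self._href: Optional[str] = None    # target of the open <a>, if any
        self._buf: list = []                # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self._href = urljoin(self.page_url, a["href"])
            self._buf = []
        elif tag == "img" and a.get("src") and a.get("alt"):
            # Assumption: ALT text belongs to the image file itself.
            self.by_target[urljoin(self.page_url, a["src"])].append(a["alt"])

    def handle_data(self, data):
        if self._href is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the collected link text.
            text = " ".join("".join(self._buf).split())
            if text:
                self.by_target[self._href].append(text)
            self._href = None


if __name__ == "__main__":
    p = LinkTextParser("http://www.mysite.com/pageA.html")
    p.feed('<p>See <a href="imageB.jpg">picture of my pets</a>.</p>')
    p.close()
    print(dict(p.by_target))
```

Running it prints {'http://www.mysite.com/imageB.jpg': ['picture of my pets']}: the words are recorded against imageB.jpg, and pageA.html receives nothing from the link.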

Indexing word rules

This option lets you specify which characters may act as a join character between two words. Characters that are not enabled here act as word separators. For example, if the dash/hyphen character is a join character, a word such as "web-based" is indexed as one word; otherwise, it is split into two words, "web" and "based". Note that the character must be immediately preceded and followed by another valid character to be indexed.

A list of the characters available for this option:

Name          Character   Example words indexed
Dots          .           F.B.I.;  .NET;  www.mysite.com;  32.10
Hyphens       -           web-site
Underscores   _           temporary_name
Apostrophes   '           John's
Hash sign     #           #3218B;  Serial#;  ID#
Dollar sign   $           $50
Comma         ,           60,000
Colon         :           Ref:A;  Exhibit:A
Ampersand     &           A&B;  &var
Slashes       / and \     either/or;  12/5/2007;  \myfiles\pages.txt
@ sign        @           bob@mycompany;  bob@mycompany.com (with Dots enabled)
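The basic word rule can be sketched in Python as a small tokenizer. This is a hypothetical illustration, not Zoom's code: the function name tokenize is an assumption, letters and digits stand in for "valid characters", and a join character is kept only when it sits between two such characters, as described above. It deliberately does not model leading or trailing cases from the table, such as ".NET", "$50", "Serial#", or the final dot in "F.B.I.".

```python
def tokenize(text: str, join_chars: set) -> list:
    """Split text into words, keeping a join character only between two
    letters/digits (hypothetical model of the word rule above)."""
    words = []
    current = []  # characters of the word being built
    n = len(text)
    for i, ch in enumerate(text):
        if ch.isalnum():
            current.append(ch)
        elif (ch in join_chars and current
              and i + 1 < n and text[i + 1].isalnum()):
            # Preceded by a word character (current is non-empty) and
            # followed by one, so this character joins the two parts.
            current.append(ch)
        else:
            # Any other character acts as a word separator.
            if current:
                words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(tokenize("a web-based tool", set()))
    print(tokenize("a web-based tool", {"-"}))
    print(tokenize("mail bob@mycompany.com", {"@", "."}))
```

With no join characters, "web-based" yields "web" and "based"; with the hyphen enabled it stays "web-based", and enabling both @ and Dots keeps "bob@mycompany.com" whole, matching the @ sign row of the table.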

Rewrite links

This option allows you to rewrite the URLs of the indexed pages. This is useful if you are spidering a development version of your site on a test server (e.g. http://test.mycompany.com/) and creating index files to go on the live server (e.g. http://www.mycompany.com/). To do this, specify rewrite options that replace all instances of "http://test.mycompany.com/" in the indexed URLs with "http://www.mycompany.com/".

You could also use this option to make all the search result links relative rather than absolute, by replacing the domain (e.g. "http://www.mysite.com/") with a relative path (e.g. "./" or "../"). We only recommend this for users who are very familiar with relative linking and understand that the links will only work if the generated search files are placed in an appropriate folder on the server.
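A minimal Python sketch of rewrite rules, assuming each rule is a plain find-and-replace string pair applied to every occurrence in an indexed URL, with rules applied in order. The helper rewrite_url is hypothetical, not a Zoom API. The same function covers both the test-to-live case and the relative-link case described above.

```python
def rewrite_url(url: str, rules: list) -> str:
    """Apply (find, replace) pairs to a URL in order (hypothetical helper).

    str.replace substitutes every occurrence, mirroring "replace all
    instances" in the description of the option.
    """
    for find, replace in rules:
        url = url.replace(find, replace)
    return url


if __name__ == "__main__":
    to_live = [("http://test.mycompany.com/", "http://www.mycompany.com/")]
    print(rewrite_url("http://test.mycompany.com/about/team.html", to_live))

    to_relative = [("http://www.mysite.com/", "./")]
    print(rewrite_url("http://www.mysite.com/section1/index.html", to_relative))
```

The first call produces "http://www.mycompany.com/about/team.html" and the second "./section1/index.html". Because rules run in order, a later rule sees the output of earlier ones, so overlapping rules should be ordered with care.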

Note: Using the Rewrite Links option disables incremental indexing for the resulting set of index files. This means you cannot perform an incremental update, or add or remove pages from the index, without re-indexing your entire site. For more information on these features, see "Incremental indexing".

Note: Sitemaps are also affected by the Rewrite Links option. Your text and XML sitemaps will contain the URLs as they are after the rewrite rules have been applied, not the URLs as indexed. Take care with XML sitemaps: you must specify a Sitemap Base URL, and any page whose URL does not fall within it is filtered out. Make sure your Rewrite Links settings do not change a URL so that it no longer falls within the Sitemap Base URL. For more information, please see "Sitemaps".
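The interaction between rewrite rules and the Sitemap Base URL can be sketched in Python as follows. This sketch assumes that a URL "falls within" the base URL when it starts with it, and it uses a hypothetical find-and-replace helper; neither detail is taken from Zoom itself. The function names rewrite_url and sitemap_urls are illustrative only.

```python
def rewrite_url(url: str, rules: list) -> str:
    """Apply (find, replace) rewrite pairs in order (hypothetical helper)."""
    for find, replace in rules:
        url = url.replace(find, replace)
    return url


def sitemap_urls(indexed: list, rules: list, base_url: str) -> tuple:
    """Rewrite each indexed URL, then keep only those under base_url.

    Assumption: "falls within the Sitemap Base URL" means a prefix match.
    Returns (kept, dropped), both holding the rewritten URLs.
    """
    kept, dropped = [], []
    for url in indexed:
        final = rewrite_url(url, rules)
        if final.startswith(base_url):
            kept.append(final)
        else:
            dropped.append(final)
    return kept, dropped


if __name__ == "__main__":
    pages = ["http://test.mycompany.com/a.html"]
    base = "http://www.mycompany.com/"
    # Test-to-live rewrite: the rewritten URL is inside the base URL.
    print(sitemap_urls(pages, [("http://test.mycompany.com/",
                                "http://www.mycompany.com/")], base))
    # Relative rewrite: nothing starts with the base URL any more.
    print(sitemap_urls(["http://www.mycompany.com/a.html"],
                       [("http://www.mycompany.com/", "./")], base))
```

In the second case every page is dropped, which is why rewriting to relative links and generating an XML sitemap do not combine well.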