This article describes how to use the Tagged Image File
Format (TIFF) IFilter, noise word files, thesaurus files, and the Robots.txt
file to customize SharePoint Portal Server 2003 and SharePoint Server 2007.
Overview of IFilters
Note In SharePoint Server 2007, the TIFF filter feature is
removed.
To crawl documents that have proprietary file extensions, you
have to register the IFilter for that file type in SharePoint Portal Server
2003. When you configure a content source, you can specify the file types that
you want to include in the content index. For example, you might want to
include files that have an .xyz extension and a .yyy extension in the content index. The inclusion
of a file type applies only to content that is stored outside the portal site
and that is included in the content index by using content sources. The
inclusion of a file type does not apply to content that is stored in the portal
site.
If a file type has an associated IFilter, you have to register that IFilter on the
SharePoint Portal Server 2003 computer that crawls that file type. After you
register the IFilter, SharePoint Portal Server 2003 can crawl documents that
use that file type and include those documents in the content index. If you add
a file type, and you do not register the IFilter for that file type, SharePoint
Portal Server 2003 only includes the file properties in the content
index.
The steps that you follow to register an IFilter vary according
to the IFilter that you want to register. For more information about how to
register an IFilter, see the documentation that is included with the IFilter
that you want to register. SharePoint Portal Server 2003 includes filters for
the following items:
- Microsoft Office documents, including Microsoft Publisher
documents and Microsoft Visio documents
- HTML files
- TIFF files
- Text files
SharePoint Portal Server 2003 also accepts third-party IFilters
for custom file types.
The TIFF IFilter
When you install SharePoint Portal Server 2003, the Setup program
automatically installs an IFilter for TIFF files. The TIFF filter handles both
the .tif extension and the .tiff extension. The following sections explain how
to do the following tasks:
- Enable optical character recognition (OCR) for TIFF
files
- Change the TIFF file size limit
- Enable automatic file rotation
- Log TIFF error messages to the application event
log
Note After you edit registry entries that are associated with TIFF
files, you have to restart the Microsoft Search service.
How to enable optical character recognition in TIFF Files
By default, when SharePoint Portal Server 2003 crawls TIFF files, it looks only
at the file properties. If you enable optical character recognition, SharePoint
Portal Server 2003 scans the TIFF file and tries to recognize characters in the
document so that additional information can be included in the index.
To enable optical character recognition in TIFF
files, use one of the following methods.
Method 1: Manually edit the registry
Add the PerformOCR registry entry to the following registry
subkey, and then set the PerformOCR registry entry to a value of 1:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
To enable optical character recognition in TIFF files, follow
these steps.
Important This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
322756 (http://kbalertz.com/Feedback.aspx?kbNumber=322756/)
How to back up and restore the registry in Windows
- Click Start, and then click
Run.
- In the Open box, type
regedit, and then click OK.
- Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
- On the Edit menu, point to
New, and then click DWORD Value.
- Type PerformOCR, and then press
ENTER.
- On the Edit menu, click
Modify.
- To enable optical character recognition, type
1 in the Value data box, and then click
OK.
Note To disable optical character recognition, set the PerformOCR
registry entry to 0 (zero).
- Quit Registry Editor.
- Restart the Microsoft Search service. To do this, follow
these steps:
- Click Start, point to
Administrative Tools, and then click
Services.
- Right-click Microsoft Search, and then
click Restart.
Method 2: Use the Tiff_ocr_on.reg file
Use the Tiff_ocr_on.reg file to add the PerformOCR registry entry
to the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
Important This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
322756 (http://kbalertz.com/Feedback.aspx?kbNumber=322756/)
How to back up and restore the registry in Windows
- Locate the Support\Tools folder on the SharePoint Portal
Server 2003 CD, and then double-click the Tiff_ocr_on.reg
file.
- Restart the Microsoft Search service. To do this, follow
these steps:
- Click Start, point to
Administrative Tools, and then click
Services.
- Right-click Microsoft Search, and then
click Restart.
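The contents of the Tiff_ocr_on.reg file are not reproduced in this article. As a rough sketch only, the following Python snippet builds the text of a registration entries (.reg) file that would set the PerformOCR entry to 1 under the MSPaper subkey. The function name build_reg_text and the exact layout are assumptions for illustration and may differ from the file on the CD.

```python
MSPAPER_SUBKEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper"


def build_reg_text(subkey, dword_values):
    """Return .reg file text that sets the given DWORD values under subkey.

    build_reg_text is a hypothetical helper, not part of any product.
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{subkey}]"]
    for name, value in dword_values.items():
        # DWORD data is written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


reg_text = build_reg_text(MSPAPER_SUBKEY, {"PerformOCR": 1})
print(reg_text)
```

The generated text can be reviewed before it is saved with a .reg extension. The Microsoft Search service still has to be restarted after the entry is imported.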
How to change the TIFF file size limit
By default, when optical character recognition is enabled,
SharePoint Portal Server 2003 does not include any single-page TIFF files that
are larger than 1 megabyte (MB) in the content index. To change the size limit
for TIFF files, change the MaxImageSize registry entry in the following
registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
Important This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
322756 (http://kbalertz.com/Feedback.aspx?kbNumber=322756/)
How to back up and restore the registry in Windows
- Click Start, and then click
Run.
- In the Open box, type
regedit, and then click OK.
- Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
- Right-click MaxImageSize, and then click
Modify.
- Type 100,000 in the Value
data box, and then click OK.
Note A value of 100,000 is equal to a 1-MB file size limit.
- Quit Registry Editor.
- Restart the Microsoft Search service. To do this, follow
these steps:
- Click Start, point to
Administrative Tools, and then click
Services.
- Right-click Microsoft Search, and then
click Restart.
How to enable automatic file rotation
If you enable optical character recognition, and if some TIFF
files are oriented upside down or sideways, you can enable automatic file
rotation to increase scanning accuracy. When automatic file rotation is
enabled, the filter rotates these TIFF files in memory before the filter scans
them. Rotating a file uses resources. However, the results from scanning a file
that is oriented upside down or sideways may be poor. If you know that all your
TIFF files are oriented upright, you do not have to enable this option.
To enable automatic file rotation, set the
AutoRotation registry entry in the following registry subkey to a value of 1:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
By default, automatic file rotation is enabled when you install
SharePoint Portal Server 2003. However, if the PerformOCR registry entry is set
to 0 (zero) or does not exist, the AutoRotation registry entry has no
effect.
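The dependency between the two registry entries can be summarized in a few lines of Python. This is only an illustration of the rule stated above. The function name rotation_is_effective is hypothetical, and None stands for a registry entry that does not exist.

```python
def rotation_is_effective(perform_ocr, auto_rotation):
    """Return True when automatic file rotation would actually be applied.

    Both arguments are registry DWORD values, or None if the entry is absent.
    AutoRotation has no effect unless PerformOCR is set to 1.
    """
    return perform_ocr == 1 and auto_rotation == 1


print(rotation_is_effective(perform_ocr=1, auto_rotation=1))
print(rotation_is_effective(perform_ocr=None, auto_rotation=1))
```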
To enable automatic file rotation, follow these
steps.
Important This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
322756 (http://kbalertz.com/Feedback.aspx?kbNumber=322756/)
How to back up and restore the registry in Windows
- Click Start, and then click
Run.
- In the Open box, type
regedit, and then click OK.
- Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper
- Right-click AutoRotation, and then click
Modify.
- Type 1 in the Value
data box, and then click OK.
Note To disable automatic file rotation, set the AutoRotation registry
entry to 0 (zero).
- Quit Registry Editor.
- Restart the Microsoft Search service. To do this, follow
these steps:
- Click Start, point to
Administrative Tools, and then click
Services.
- Right-click Microsoft Search, and then
click Restart.
How to log TIFF error messages to the application event log
By default, SharePoint Portal Server 2003 logs error messages that
are associated with TIFF files in the gatherer log. If you want SharePoint
Portal Server 2003 to log error messages that are associated with TIFF files in
the application event log, set the LoggingLevel registry entry in the following
registry subkey to the value that you want:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application\Microsoft Office Document Imaging
You can set the LoggingLevel registry entry to one of the following values:
- To disable logging, set the LoggingLevel registry entry to
a value of 0 (zero). This setting is the default setting.
- To log information messages and error messages, set the
LoggingLevel registry entry to a value of 1.
- To log warning messages and error messages, set the
LoggingLevel registry entry to a value of 2.
- To log all messages, set the LoggingLevel registry entry to
a value of 3.
- To log only error messages, set the LoggingLevel registry
entry to a value of 4.
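For scripts that document or validate this setting, the values in the list above can be kept in a small lookup table. The following Python sketch mirrors the list. The names LOGGING_LEVELS and describe_logging_level are assumptions for illustration only.

```python
# Meaning of each LoggingLevel value, as listed in this article.
LOGGING_LEVELS = {
    0: "Logging disabled (default)",
    1: "Information messages and error messages",
    2: "Warning messages and error messages",
    3: "All messages",
    4: "Error messages only",
}


def describe_logging_level(value):
    """Return the meaning of a LoggingLevel value, or raise ValueError."""
    if value not in LOGGING_LEVELS:
        raise ValueError(f"LoggingLevel must be one of {sorted(LOGGING_LEVELS)}")
    return LOGGING_LEVELS[value]


print(describe_logging_level(2))
```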
To enable logging of TIFF file messages in the application event
log, follow these steps:
- Click Start, and then click
Run.
- In the Open box, type
regedit, and then click OK.
- Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application\Microsoft Office Document Imaging
- Right-click LoggingLevel, and then click
Modify.
- Type the value that you want in the Value
data box, and then click OK.
- Quit Registry Editor.
- Restart the Microsoft Search service. To do this, follow
these steps:
- Click Start, point to
Administrative Tools, and then click
Services.
- Right-click Microsoft Search, and then
click Restart.
Noise word files
A noise word is a word that is not useful in a search. For
example, very common words such as "a," "and," and "the" are typically noise words.
A list of noise words for a language is stored in the noise
word file for that language. SharePoint Portal Server 2003 and SharePoint
Server 2007 include noise word files for the following languages:
- Chinese-Simplified (Noisechs.txt)
- Chinese-Traditional (Noisecht.txt)
- Czech (Noisecsy.txt)
- Dutch (Noisenld.txt)
- English-International (Noiseeng.txt)
- English-US (Noiseenu.txt)
- Finnish (Noisefin.txt)
- French (Noisefra.txt)
- German (Noisedeu.txt)
- Hungarian (Noisehun.txt)
- Italian (Noiseita.txt)
- Japanese (Noisejpn.txt)
- Korean (Noisekor.txt)
- Polish (Noiseplk.txt)
- Portuguese (Brazil) (Noiseptb.txt)
- Russian (Noiserus.txt)
- Spanish (Noiseesn.txt)
- Swedish (Noisesve.txt)
- Thai (Noisetha.txt)
- Turkish (Noisetrk.txt)
If a noise word list does not exist for a language, SharePoint
Portal Server 2003 and SharePoint Server 2007 use the neutral Noiseneu.txt
noise word file. The word breaker for the language parses noise
words.
By default, SharePoint Portal Server 2003 noise word files are
stored in the following location on the server:
Drive:\Program Files\SharePoint Portal Server\Data\Config
If you installed SharePoint Portal Server 2003 in a location that
is different from the default location, the Data folder is located in a
different folder on your server.
By default, SharePoint Server 2007
stores noise word files in the following location on the server:
Drive:\Program Files\Microsoft Office Servers\12.0\Data\Config
You can change the noise word file. If you add noise words, the
accuracy of your searches may decrease. However, the size of the content index
also decreases. A smaller content index helps increase performance. You can
delete noise words if you want searches to return those words.
If you
remove words from the noise word file, the changes do not take effect until you
reset the content indexes and perform a full update of the content indexes in
SharePoint Portal Server 2003 and in SharePoint Server 2007. Noise words are
removed from files before the files are included in an index. Therefore, you
must update the content index after you modify the noise word list. Otherwise,
documents that contain the removed noise words are not returned in queries.
Do not delete noise word files. If
you do not want noise words removed during an update or a query, remove those
specific entries from the file. If you delete the noise word file, all single
characters are removed as noise words. If you remove
all noise words from your noise word file, you will experience errors
during crawling. Therefore, you must have at least one noise word in the file,
even if the noise word is something as simple as a period
character.
By default, noise word files in SharePoint Portal Server
2003 are copied to the following folder:
Drive:\Program Files\SharePoint Portal Server\DATA\Applications\ProgramUID\Config
By default, noise word files in SharePoint Server 2007 are copied
to the following folder:
Drive:\Program Files\Microsoft Office Servers\12.0\Data\Applications\ProgramUID\Config
You can specify noise words at the program level instead of at
the server level or at the server farm level. For example, if SharePoint Portal
Server 2003 or SharePoint Server 2007 and Microsoft SQL Server are installed on
the same server, you can specify one noise word list for SharePoint Portal
Server 2003 or for SharePoint Server 2007 and a different noise word list for
SQL Server.
How to change the noise word file
To change the noise word file, follow these steps:
- Start Notepad, and then open the noise word
file.
- Add or delete the words that you want.
- Save the noise word file, and then exit Notepad.
- In SharePoint Portal Server 2003, restart the Microsoft
SharePointPS Search service. In SharePoint Server 2007, restart the Windows
SharePoint Services Search service. To do this, follow these steps:
- Click Start, point to
Administrative Tools, and then click
Services.
- Right-click Microsoft SharePointPS
Search or Windows SharePoint Services Search, and
then click Restart.
- Perform a full update of the content index.
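The edit in step 2 can also be scripted. The following Python sketch works on the text of a noise word file and refuses to produce an empty list, because an empty noise word file causes crawl errors. It assumes one entry per line and compares entries without regard to case. Both assumptions, and the function name update_noise_words, are illustrative only.

```python
def update_noise_words(text, add=(), remove=()):
    """Return new noise word file text after adding and removing entries.

    Assumes one noise word per line (an assumption for this sketch).
    """
    words = [line.strip() for line in text.splitlines() if line.strip()]
    removed = {word.lower() for word in remove}
    kept = [word for word in words if word.lower() not in removed]
    existing = {word.lower() for word in kept}
    for word in add:
        if word.lower() not in existing:
            kept.append(word)
            existing.add(word.lower())
    if not kept:
        # At least one noise word must remain in the file.
        raise ValueError("the noise word file must keep at least one entry")
    return "\n".join(kept) + "\n"


print(update_noise_words("a\nand\nthe\n", add=["of"], remove=["and"]))
```

After the file is saved, the service restart and the full update in the remaining steps are still required.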
Note When you search the portal site, SharePoint Portal Server 2003
and SharePoint Server 2007 may discard some query terms as noise words even if
the query term itself is not a noise word. This behavior occurs in situations
when the query term is an inflectional form of the noise word. For example, if
the noise word file contains the word "be," and if you search for the word
"am," the word "am" is treated as a noise word because it is a form of "be."
Thesaurus files
The thesaurus is a query-expansion search feature in SharePoint
Portal Server 2003 and in SharePoint Server 2007. The thesaurus permits you to
type a phrase in a search query and to receive results for words that are
related to the phrase that you typed. For example, you can search for the word
"run" and receive results that contain either the word "run" or the word "jog" if the
two terms are related in the thesaurus. Additionally, the thesaurus permits the
server farm administrator to configure search rankings by assigning different
weights to words. SharePoint Portal Server 2003 and SharePoint Server 2007
include thesaurus files for the following languages:
- Chinese-Simplified (Tschs.xml)
- Chinese-Traditional (Tscht.xml)
- Czech (Tscsy.xml)
- Dutch (Tsnld.xml)
- English-International (Tseng.xml)
- English-US (Tsenu.xml)
- Finnish (Tsfin.xml)
- French (Tsfra.xml)
- German (Tsdeu.xml)
- Hungarian (Tshun.xml)
- Italian (Tsita.xml)
- Japanese (Tsjpn.xml)
- Korean (Tskor.xml)
- Polish (Tsplk.xml)
- Portuguese (Brazil) (Tsptb.xml)
- Russian (Tsrus.xml)
- Spanish (Tsesn.xml)
- Swedish (Tssve.xml)
- Thai (Tstha.xml)
- Turkish (Tstrk.xml)
The thesaurus files contain inactive sample content. The neutral
Tsneu.xml thesaurus file is always applied to queries. It is used alone when no
thesaurus file is associated with the query language, and it is applied
together with the language-specific thesaurus file when one exists.
By default, SharePoint Portal
Server 2003 stores thesaurus files in the following folder on the server:
Drive:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications
If you installed SharePoint Portal Server 2003 in a location that
is different from the default location, the Data folder is located in a
different folder on your server.
Note The path to the correct thesaurus file can be found
as the value for "DefaultApplicationsPath" in the following registry subkey:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager
By
default, SharePoint Server 2007 stores thesaurus files in the following folder
on the server:
Drive:\Program Files\Microsoft Office Servers\12.0\Data\Config
Thesaurus files for SharePoint Portal Server 2003 are also copied
to the following folder:
Drive:\Program Files\SharePoint Portal Server\Data\Applications\Application UID\Config
This occurs for each instance of the Microsoft Search service or
of the Microsoft SharePointPS Search service.
Thesaurus files for
SharePoint Server 2007 are also copied to the following folder:
Drive:\Program Files\Microsoft Office Servers\12.0\Data\Applications\Application UID\Config
This occurs for each instance of the Microsoft Search service or
of the Windows SharePoint Services Search service.
You can modify the
thesaurus at the program level instead of at the server level or at the server
farm level. For example, if SharePoint Portal Server 2003 or SharePoint Server
2007 and SQL Server are installed on the same server, you can specify one
thesaurus file for SharePoint Portal Server 2003 or for SharePoint Server 2007
and a different thesaurus file for SQL Server.
You can change the
thesaurus entries by changing the thesaurus file in a text editor. The
thesaurus file must use well-formed XML that contains matching opening and
closing tags around each entry. If the XML is malformed, SharePoint Portal
Server 2003 and SharePoint Server 2007 log an error in the application event
log.
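Before you copy an edited thesaurus file to the server, you may want to confirm that it is still well-formed. The following Python sketch uses the standard xml.etree.ElementTree parser for that check. The function name check_well_formed and the sample strings are assumptions for illustration. The check does not validate the file against Tsschema.xml.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise a short error description."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


good = "<replacement><pat>W2K</pat><sub>Windows 2000</sub></replacement>"
bad = "<replacement><pat>W2K</pat><sub>Windows 2000</replacement>"  # missing </sub>

print(check_well_formed(good))
print(check_well_formed(bad))
```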
When you change the thesaurus file, make sure that you do not
change the case of the tags. Only the <XML> tags use uppercase letters. All other
tags use lowercase letters. For example, the <replacement> tag must use
lowercase letters.
Important A file that is named Tsschema.xml is installed together with the
thesaurus files. Do not modify the Tsschema.xml file.
Thesaurus files
contain two types of thesaurus entries. These types are replacement sets and
expansion sets. Thesaurus files also permit you to configure the word weighting
and word stemming options in a replacement set or an expansion set.
Important From a performance perspective, it is important to be aware of
how many entries are defined in the thesaurus file. Additionally, it is
important to be mindful not to exceed the recommendation of 1,000/10,000
(typical/max) entries as outlined in the capacity planner. To view the capacity
planner, visit the following Microsoft Web site:
Additionally, be aware that each <sub> and <pat> tag
counts as an entry that goes against the recommended values.
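To see how close a thesaurus file is to the recommended limits, you can count its <pat> and <sub> elements. The following Python sketch does this with the standard xml.etree.ElementTree parser. The function name count_thesaurus_entries and the sample document, including its <thesaurus> root element, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def count_thesaurus_entries(xml_text):
    """Count <pat> and <sub> elements, each of which counts as one entry."""
    root = ET.fromstring(xml_text)
    count = 0
    for element in root.iter():
        # Drop a "{namespace}" prefix if the file declares an XML namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("pat", "sub"):
            count += 1
    return count


sample = """
<thesaurus>
  <replacement><pat>W2K</pat><sub>Windows 2000</sub></replacement>
  <expansion><sub>writer</sub><sub>author</sub><sub>journalist</sub></expansion>
</thesaurus>
"""

print(count_thesaurus_entries(sample))
```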
Replacement sets
A replacement set specifies a pattern that is replaced by one or
more substitutions in a search query. For example, you can add a replacement
set where W2K is the pattern and where Windows 2000 is the substitution. If you
query the term W2K, SharePoint Portal Server 2003 and SharePoint Server 2007
return only search results that contain the term Windows 2000. You do not
receive items in the search results that contain the term W2K.
Each replacement set is enclosed in a
<replacement> tag. In the replacement tag, you specify one or more
patterns by enclosing the patterns in a <pat> tag. You specify one or
more substitutions by enclosing the substitutions in a <sub> tag.
Patterns and substitutions can contain a word or a sequence of words. For
example, to add a replacement set where W2K is the pattern and Windows 2000 is
the substitution, use the following:
<replacement>
<pat>W2K</pat>
<sub>Windows 2000</sub>
</replacement>
You can have more than one substitution for each pattern that you
specify. By default, patterns are case sensitive. For example, if your
thesaurus file contains the term W2K, and if a user searches for the term w2k,
SharePoint Portal Server 2003 and SharePoint Server 2007 do not return search
results that contain the term Windows 2000. SharePoint Portal Server 2003 and
SharePoint Server 2007 do not recognize the term w2k as being the same as the
term W2K because the case of the text is different.
You can
specify patterns to be case sensitive or not to be case sensitive if you add a
tag to the thesaurus file for your language. For example, if you specify that
patterns are not case sensitive, the <pat> and <sub> terms match
query terms regardless of the case of the query term.
When you query
by using the CONTAINS FORMSOF syntax, the thesaurus works as described
previously. For more information about the CONTAINS FORMSOF syntax, see the
Microsoft SharePoint Products and Technologies 2003 Software Development
Kit.
By default, a portal site uses the FREETEXT query type. FREETEXT
queries automatically use the thesaurus. However, if you type your search
terms in quotation marks, SharePoint Portal Server 2003 and SharePoint Server
2007 disable the FREETEXT query and do not use the thesaurus. Therefore,
SharePoint Portal Server 2003 and SharePoint Server 2007 return results that
are based on the exact search term or terms that are enclosed by the quotation
marks. If the thesaurus replaces one word of a phrase with another word, a
FREETEXT query returns results for the new version of the whole
phrase.
For the replacement set where the term Windows 2000 replaces the term W2K,
the following table shows the results that occur based on
different user input from the search interface on the portal site. This example
assumes that the thesaurus is set as case sensitive and that the search is not
case sensitive.
| User input | Whether a thesaurus is used | Text in documents that are returned in the search results |
|---|---|---|
| w2k | Yes. A FREETEXT query. | W2k, W2K, w2k, or w2K. No results are returned for Windows 2000 because the pattern in the thesaurus is uppercase W2K. |
| "w2k" | No | W2K, w2k, W2k, or w2K. |
| W2K | Yes. A FREETEXT query. | Windows 2000, windows 2000, w2k, W2k, w2K, or case combinations such as wInDows 2000. No results are returned for W2K. |
| "W2K" | No | W2K, w2k, W2k, or w2K. |
| W2K Server | Yes. A FREETEXT query. | Windows 2000, windows 2000, and case combinations such as wInDows 2000; w2k, W2k, or w2K; Server, server, and case combinations such as SeRvEr; W2K Server and case combinations of that term. No results are returned for W2K operating system. |
| "W2K Server" | No | W2K Server, w2k Server, W2k Server, w2K Server, W2K server, w2k server, W2k server, or w2K server. |
Note In each of the previous examples in the table, the
case-sensitivity setting for search is specified as false. If the
case-sensitivity setting is specified as true, all the case differences are
significant when pattern matching is performed.
If two replacement sets that have similar patterns are being matched, the
replacement set that has the longer pattern takes precedence. For example, if
you have the following two replacement sets, the pattern Internet Explorer
takes precedence over the pattern Internet:
<replacement>
<pat>Internet</pat>
<sub>intranet</sub>
</replacement>
<replacement>
<pat>Internet Explorer</pat>
<sub>IE</sub>
<sub>IE 5</sub>
</replacement>
For these replacement sets, the following table shows the results that
occur based on user input from the search interface on the portal site.
| User input | Whether a thesaurus is used | Text in documents that are returned in the search results |
|---|---|---|
| Internet | Yes. A FREETEXT query. | Intranet, intranet, or case combinations such as iNtranEt. No results are returned for IE or IE 5. |
| Internet Explorer | Yes. A FREETEXT query. | IE, IE 5, and case combinations such as iE or Ie 5. No results are returned for Internet, Internet Explorer, or intranet. |
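The precedence rule for overlapping patterns can be modeled in a few lines of Python. This is a toy model for illustration only and is not how the search service is implemented. It matches whole words, is case sensitive, and picks the matching pattern that has the most words. The names REPLACEMENTS and substitutions_for are hypothetical.

```python
REPLACEMENTS = {
    "Internet": ["intranet"],
    "Internet Explorer": ["IE", "IE 5"],
}


def _contains_phrase(query_words, pattern_words):
    """Return True if pattern_words occurs as a run of words in query_words."""
    n = len(pattern_words)
    return any(query_words[i:i + n] == pattern_words
               for i in range(len(query_words) - n + 1))


def substitutions_for(query, replacements):
    """Return the substitutions of the longest matching pattern, or None."""
    query_words = query.split()
    matches = [pattern for pattern in replacements
               if _contains_phrase(query_words, pattern.split())]
    if not matches:
        return None
    longest = max(matches, key=lambda pattern: len(pattern.split()))
    return replacements[longest]


print(substitutions_for("Internet", REPLACEMENTS))
print(substitutions_for("Internet Explorer", REPLACEMENTS))
```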
Expansion sets
An expansion set is a group of substitutions that are synonyms of
each other. Queries that contain matches in one substitution are expanded to
include all other substitutions in the expansion set. For example, you can add
an expansion set where the following substitutions are synonyms:
- writer
- author
- journalist
If you query the term author, SharePoint Portal Server 2003 and SharePoint
Server 2007 also return search results that contain the term writer and the
term journalist.
Each expansion set is enclosed in an <expansion>
tag. In the expansion tag, you specify one or more substitutions that are
enclosed by a <sub> tag. For the example that is described earlier, add
the following lines:
<expansion>
<sub>writer</sub>
<sub>author</sub>
<sub>journalist</sub>
</expansion>
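The effect of an expansion set on a query term can be sketched as follows. This Python snippet is a simplified model for illustration, not the product's query pipeline. It ignores case handling, weighting, and stemming. The names EXPANSION_SETS and expand_term are hypothetical.

```python
EXPANSION_SETS = [
    ["writer", "author", "journalist"],
]


def expand_term(term, expansion_sets):
    """Return the term followed by every synonym from sets that contain it."""
    terms = [term]
    for group in expansion_sets:
        if term in group:
            terms.extend(sub for sub in group if sub not in terms)
    return terms


print(expand_term("author", EXPANSION_SETS))
```

A term that is not in any expansion set is returned unchanged.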
Word stemming
Word stemming maps a linguistic stem to all matching words. You
can specify word stemming in pattern entries and substitution entries. For
example, in English, the stem buy matches inflected forms such as buys,
buying, and bought.
You can specify word stemming by adding two asterisks to the
end of the string. SharePoint Portal Server 2003 and SharePoint Server 2007
then return matches for variations of the word. For example, you might want to
create queries for the term run that also return the terms running, jog, and
jogging. To do this, modify the expansion set as follows:
<expansion>
<sub weight="0.5">run**</sub>
<sub weight="0.5">jog**</sub>
</expansion>
If you query the term run or the term running, the search results include the
term jog and the term jogging. If you query the term running, you receive the
same search results that you receive when you query the term run.
For example, if your thesaurus file includes the
<pat>User1 ran to the store**</pat> pattern or the
<sub>User1 ran to the store**</sub> substitution, the query returns the
following strings, or search adds the following strings to the query:
- User1 runs to the store
- User1 running to the store
- User1 ran to the store
- User1 runs to the stores
- User1 running to the stores
- User1 ran to the stores
How to change a thesaurus file
To change the thesaurus file, follow these steps:
- Start Notepad, and then open the thesaurus file.
Note If the thesaurus file contains double-byte character set (DBCS)
characters, you must save the thesaurus file in Unicode format before you
change the thesaurus file.
- If you are changing the thesaurus file for the first time,
remove the comment lines that appear at the beginning and at the end of
the file.
- If you do not want the patterns to be case sensitive, add
the following tag at the beginning of the file:
<case caseflag="false"></case>
If you want the patterns to be case sensitive later in the file,
change the setting from "false" to "true" in the tag as follows:
<case caseflag="true"></case>
- Make the changes that you want. Add, modify, or delete a
replacement set or an expansion set. Add, modify, or delete the weighting or
the stemming that is configured for a set.
Note The entries that you add to the thesaurus file cannot contain
only special characters or only noise words. However, you can have blank
entries. For example, if you want to make sure that queries for a specific term
return no results, change the entry. In the following example, queries for the
term windows do not return results:
<replacement>
<pat>windows</pat>
<sub></sub>
</replacement>
- Save the thesaurus file, and then quit Notepad.
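If you maintain many replacement sets, generating the XML from a list can reduce typing errors such as unmatched tags. The following Python sketch renders one replacement set in the layout that this article uses and escapes characters that are special in XML. The function name render_replacement is an assumption for illustration. The output still has to be pasted into the thesaurus file by hand.

```python
from xml.sax.saxutils import escape


def render_replacement(patterns, substitutions):
    """Return the XML text of one replacement set, one tag per line."""
    lines = ["<replacement>"]
    lines += [f"<pat>{escape(pattern)}</pat>" for pattern in patterns]
    lines += [f"<sub>{escape(sub)}</sub>" for sub in substitutions]
    lines.append("</replacement>")
    return "\n".join(lines)


print(render_replacement(["W2K"], ["Windows 2000"]))
# A blank substitution, as in the "windows" example in this article:
print(render_replacement(["windows"], [""]))
```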
How to use the Robots.txt file and HTML tags to prevent access to content on the portal site
You can use a Robots.txt file to control where robots (Web
crawlers) can go on a Web site. You can also use the Robots.txt file to
indicate whether to exclude specific crawlers. Web servers use these rules to
control access to Web sites by preventing robots from accessing certain areas.
SharePoint Portal Server 2003 and SharePoint Server 2007 look for this file
when they crawl, and they obey the restrictions that are contained in the
Robots.txt file.
You can prevent another server from crawling content
on the portal site by modifying the Robots.txt file. For example, you might
want to restrict a specific robot from accessing the server because the
frequency of requests from the robot is blocking the Web site. You may also
want to restrict all robots from certain areas on the
server.
SharePoint Portal Server 2003 and SharePoint Server 2007 do
not install a Robots.txt file. However, you can create a Robots.txt file and
put the Robots.txt file in the home directory of the default Web site on the
server. To determine the home directory of the default Web site on the server,
follow these steps:
- Start Internet Information Services (IIS)
Manager.
- Expand server
name, and then expand Web
Sites.
- Right-click Default Web Site, and then
click Properties.
- Click the Home Directory tab.
- Make a note of the path that appears in the Local
Path box, and then click Cancel.
Put the
Robots.txt file in the path that appears in the Local Path
box. For example, if the path is D:\Inetpub\Wwwroot, put the Robots.txt file in the
D:\Inetpub\Wwwroot folder on the server. To confirm that the Robots.txt file is
in the correct folder on the server, start your Web browser, and then type
http://server name/robots.txt.
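Before you publish a Robots.txt file, you can check how a standards-following crawler would read its rules. The following Python sketch uses the standard urllib.robotparser module for that purpose. The robot name ExampleBot, the host name portal.example, the /private/ path, and the sample rules are hypothetical. The sketch shows how these rules are commonly interpreted and does not claim to reproduce the SharePoint crawler's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one robot entirely, and block all other
# robots from one area of the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

for agent, url in [
    ("ExampleBot", "http://portal.example/docs/a.htm"),
    ("OtherBot", "http://portal.example/docs/a.htm"),
    ("OtherBot", "http://portal.example/private/a.htm"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```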
You can restrict access to certain documents by using HTML META
tags. HTML META tags tell the robot whether a document can be included in the
index and whether the robot can follow the links in the document by using the
INDEX/NOINDEX attribute and the FOLLOW/NOFOLLOW attributes in the tag. For
example, you can mark a document with the following if you do not want the
document crawled and you do not want links in the document followed:
<META name="robots" content="NOINDEX, NOFOLLOW">
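To audit which pages carry such a tag, you can extract the robots directives from a page's HTML. The following Python sketch uses the standard html.parser module. The class name RobotsMetaParser and the function name robots_directives are assumptions for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <META name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(
                part.strip().upper() for part in content.split(",") if part.strip()
            )


def robots_directives(html_text):
    """Return the set of robots directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


page = '<html><head><META name="robots" content="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))
```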
SharePoint Portal Server 2003 and SharePoint Server 2007
automatically obey the restrictions that are contained in the Robots.txt file.
Note For Microsoft Office SharePoint Server 2007, you must
restart the Office SharePoint Server Search service before thesaurus updates
are applied to search queries. Also, changes to thesaurus files must be
manually copied to every server in the farm that is serving search queries. To
be thorough and allow for topology changes, you can copy the changes to all
servers in the farm.