This article lists the crawl types that are available in Microsoft Office SharePoint Portal Server 2003. It also describes, for each crawl type, the conditions under which content is removed from a content index.
The following table lists conditions under which content may be removed from a content index and shows, for each crawl type, whether the content is removed.
| Condition | Full crawl | Incremental crawl | Incremental-inclusive crawl | Adaptive crawl |
|---|---|---|---|---|
| An HTTP 300 error message is returned. | Content is not removed. | Content is not removed. | Content is not removed. | Content is not removed. |
| An HTTP 400 error message is returned. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. |
| An HTTP 500 error message is returned. | Content is not removed. | Content is not removed. | Content is not removed. | Content is not removed. |
| A Web page on the portal site is deleted. | Content is removed after the third crawl. | Content is not removed. | Content is immediately removed. | Content is not removed. |
| A Web page on a Microsoft Windows SharePoint Services Web site is deleted. | Content is removed after the third crawl. | Content is not removed. | Content is immediately removed. | Content is not removed. |
| A rule is created to exclude content. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. |
| A content source is deleted. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. |
| A URL in the content index has no hits. | Content is removed after the third crawl. | Content is not removed. | Content is not removed. | Content is not removed. |
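The table can be read as a lookup from a condition and a crawl type to a removal behavior. The following Python sketch encodes the table that way. The condition keys (for example, `"http_400"` and `"portal_page_deleted"`) and the helper names are hypothetical labels chosen for this example; they are not identifiers used by SharePoint Portal Server 2003.

```python
from enum import Enum


class CrawlType(Enum):
    FULL = "Full crawl"
    INCREMENTAL = "Incremental crawl"
    INCREMENTAL_INCLUSIVE = "Incremental-inclusive crawl"
    ADAPTIVE = "Adaptive crawl"


class Removal(Enum):
    NOT_REMOVED = "Content is not removed."
    IMMEDIATE = "Content is immediately removed."
    AFTER_THIRD_CRAWL = "Content is removed after the third crawl."


def _per_crawl(full, incremental, inclusive, adaptive):
    """Build one table row, in the same column order as the table."""
    return {
        CrawlType.FULL: full,
        CrawlType.INCREMENTAL: incremental,
        CrawlType.INCREMENTAL_INCLUSIVE: inclusive,
        CrawlType.ADAPTIVE: adaptive,
    }


def _all(behavior):
    """Build a row in which every crawl type behaves the same way."""
    return _per_crawl(behavior, behavior, behavior, behavior)


N, I, T = Removal.NOT_REMOVED, Removal.IMMEDIATE, Removal.AFTER_THIRD_CRAWL

# Hypothetical condition keys; each entry mirrors one row of the table.
REMOVAL_RULES = {
    "http_300": _all(N),
    "http_400": _all(I),
    "http_500": _all(N),
    "portal_page_deleted": _per_crawl(T, N, I, N),
    "wss_page_deleted": _per_crawl(T, N, I, N),
    "exclusion_rule_created": _all(I),
    "content_source_deleted": _all(I),
    "url_no_hits": _per_crawl(T, N, N, N),
}


def removal_behavior(condition, crawl):
    """Return what happens to indexed content for a condition and crawl type."""
    return REMOVAL_RULES[condition][crawl]


print(removal_behavior("portal_page_deleted", CrawlType.INCREMENTAL_INCLUSIVE).value)
# prints: Content is immediately removed.
```

Note that only full crawls use the deferred "after the third crawl" behavior; the other crawl types either remove content immediately or leave it in place.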
For conditions where content is removed after the third crawl, three full crawls of the content index must run before a previously crawled page is removed from the content index.
This logic exists for full crawls because the crawler in SharePoint Portal Server 2003 does not keep track of all links between URLs, so it generally cannot determine why a particular URL was not visited during a full crawl. Waiting for three full crawls is therefore a precaution against unintentionally removing content from the content index.
For conditions where content is immediately removed, the time that it takes for the content to disappear may vary. It depends on how long the crawl takes to complete and how long the index takes to propagate from the index management server to the search server.
You may see documents removed after fewer than three full crawls. This can happen if another crawl ran between the time that the document was deleted and the time that you started a full crawl, or between full crawls. However, the constant that determines how many crawls a document is kept before it is deleted is set to three.
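The deferred removal can be pictured as a per-URL counter of missed crawls. The following Python sketch is a simplified, hypothetical model, not the actual crawler implementation. It assumes that every crawl passed to `record_crawl` can count toward the limit (which is one way that fewer than three full crawls can be enough), and it assumes that the counter resets when a URL is visited again. Only the limit of three comes from this article; `IndexModel`, `record_crawl`, and `MAX_MISSED_CRAWLS` are illustrative names.

```python
from dataclasses import dataclass, field

# The article states that this constant is set to three.
MAX_MISSED_CRAWLS = 3


@dataclass
class IndexModel:
    """Hypothetical model of an index that defers removal of unvisited URLs."""
    urls: set = field(default_factory=set)
    missed: dict = field(default_factory=dict)

    def record_crawl(self, visited):
        """Apply one crawl that visited the URLs in `visited`.

        Returns the set of URLs that this crawl removed from the index.
        """
        removed = set()
        for url in sorted(self.urls):
            if url in visited:
                self.missed[url] = 0  # Assumption: a visit resets the counter.
            else:
                self.missed[url] = self.missed.get(url, 0) + 1
                if self.missed[url] >= MAX_MISSED_CRAWLS:
                    removed.add(url)
        for url in removed:
            self.urls.discard(url)
            del self.missed[url]
        self.urls |= set(visited)
        return removed


index = IndexModel()
index.record_crawl({"http://portal/a.htm", "http://portal/b.htm"})
# b.htm is deleted, so the next three crawls visit only a.htm.
history = [index.record_crawl({"http://portal/a.htm"}) for _ in range(3)]
print([sorted(r) for r in history])
# prints: [[], [], ['http://portal/b.htm']]
```

In this model, b.htm stays in the index through the first two crawls that miss it and is removed on the third, which matches the precautionary behavior described for full crawls.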
For more information about how to manage search settings in SharePoint Portal Server 2003, see the "Managing Search Settings" topic in the "Administration" chapter of the Microsoft Office SharePoint Portal Server 2003 Administrator's Guide. The Microsoft Office SharePoint Portal Server 2003 Administrator's Guide (Administrator's Help.chm) is located in the Docs folder in the root of the SharePoint Portal Server 2003 CD.
For more information about SharePoint Portal Server 2003, visit the following Microsoft Web site: