When you use Microsoft SharePoint Portal Server or Microsoft Office SharePoint Portal Server 2003 to build a search catalog of a Microsoft Content Management Server (MCMS) 2002 Web site and then update that catalog incrementally (that is, you run a SharePoint Portal Server incremental crawl), every incremental crawl against the MCMS 2002 site behaves like a full crawl.
To resolve this issue, add code to your MCMS 2002 page templates so that SharePoint Portal Server receives the Last-Modified date and time stamp and the Microsoft Internet Information Services (IIS) response code that it needs to determine whether a posting must be re-cataloged. To do this, you must remove the output cache directive from the MCMS 2002 template code, because a page that is served from the ASP.NET output cache does not run that code. The output cache directive is typically declared at the beginning of the MCMS 2002 template code-behind file (the .aspx.cs or .aspx.vb file). After you remove the directive, you can still use downlevel caching with the sample code that this article contains.
The code first retrieves the If-Modified-Since HTTP header value from the conditional HTTP GET request. It then obtains the last modified value of the posting, compares the two date and time stamps, and returns the corresponding status code to the client (304 if the posting has not changed). For published postings, the code also sends the Last-Modified header and sets the cache policy through Response.Cache, so your site can still use output caching even though the output cache directive is removed from the template.
Sample Code
//This fragment belongs in the template page class (for example, in Page_Load) and assumes
//using directives for System, System.Web, and Microsoft.ContentManagement.Publishing.
//Declare the variables that you need.
System.DateTime LastModifiedTime, IncrementalIndexTime;
System.String MyString;
bool Return304 = false;
//Get the last modified time for the current MCMS posting.
LastModifiedTime = CmsHttpContext.Current.Posting.LastModifiedDate;
//HTTP dates have a resolution of one second, so drop the fractional seconds before comparing.
LastModifiedTime = LastModifiedTime.AddTicks(-(LastModifiedTime.Ticks % System.TimeSpan.TicksPerSecond));
//Retrieve the If-Modified-Since HTTP header value from the HTTP GET request.
MyString = HttpContext.Current.Request.Headers.Get("If-Modified-Since");
//Check whether this is a conditional HTTP GET request.
if (MyString != null)
{
    //This is a conditional HTTP GET request. Compare the date and time stamps.
    try
    {
        IncrementalIndexTime = Convert.ToDateTime(MyString).ToUniversalTime();
        //The posting is unchanged if it was last modified at or before the If-Modified-Since time.
        if (LastModifiedTime <= IncrementalIndexTime)
        {
            Return304 = true;
        }
    }
    catch (System.FormatException)
    {
        //The header value is not a valid date; ignore it and return the full page (200).
    }
}
if (Return304)
{
    //Tell the client that the posting has not changed since it was cataloged.
    Response.StatusCode = 304;
    Response.End();
}
if (CmsHttpContext.Current.Mode == Microsoft.ContentManagement.Publishing.PublishingMode.Published)
{
    //This is the code that causes ASP.NET to send the Last-Modified header.
    Response.Cache.SetLastModified(CmsHttpContext.Current.Posting.LastModifiedDate.ToLocalTime());
    //The following lines enable downlevel caching in proxy servers or the browser cache.
    Response.Cache.SetCacheability(System.Web.HttpCacheability.Public);
    //Set the expiration time for the downlevel cache (5 minutes is used in this sample).
    Response.Cache.SetExpires(System.DateTime.Now.AddMinutes(5));
    Response.Cache.SetValidUntilExpires(true);
}
A SharePoint Portal Server incremental crawl relies on two values that IIS returns:
- A response status code of either 304 (Not Modified) or 200 (OK) to the conditional HTTP GET request from SharePoint Portal Server.
- A Last-Modified date and time stamp for the posting, which IIS sends in the Last-Modified HTTP header.
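The Last-Modified and If-Modified-Since headers carry HTTP date strings in the RFC 1123 format (for example, Thu, 15 Jan 2004 08:30:00 GMT). The following C# sketch shows how such a string can be produced from, and parsed back into, a UTC DateTime with the standard "r" format specifier. HttpDateFormat, ToHttpDate, and TryParseHttpDate are hypothetical names for illustration; they are not part of MCMS or SharePoint Portal Server.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not an MCMS or SharePoint Portal Server API) for the
// RFC 1123 date strings used by the Last-Modified and If-Modified-Since headers.
public static class HttpDateFormat
{
    // Formats a UTC time as an HTTP date. The "r" format writes the literal
    // "GMT" but does not convert the value, so the caller must pass UTC.
    public static string ToHttpDate(DateTime utc)
    {
        return utc.ToString("r", CultureInfo.InvariantCulture);
    }

    // Parses an HTTP date back into a UTC DateTime; returns false if the
    // string is not in RFC 1123 format.
    public static bool TryParseHttpDate(string value, out DateTime utc)
    {
        return DateTime.TryParseExact(value, "r", CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out utc);
    }
}
```

Because the format has no fractional seconds, a round trip through the header always loses any sub-second part of the original time.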
When SharePoint Portal Server starts an incremental crawl, it sends HTTP GET requests to all the postings on the Web site. If its records show that a posting has been cataloged before, SharePoint Portal Server sends a conditional HTTP GET request, that is, an HTTP GET request with an If-Modified-Since HTTP header. The If-Modified-Since value is the Last-Modified date and time stamp that IIS returned when the posting was cataloged. IIS compares this value with the last modified date and time of the requested file. If the last modified date is earlier than or equal to the If-Modified-Since value, IIS returns status code 304, and SharePoint Portal Server skips the posting. If the last modified date is later than the If-Modified-Since value, IIS returns status code 200, and SharePoint Portal Server re-indexes the posting.
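The comparison described above can be sketched in C# as follows. ConditionalGetSketch and Evaluate are hypothetical names, and the sketch assumes that the If-Modified-Since value uses the RFC 1123 date format. It drops the fractional seconds from the stored time because HTTP dates carry whole seconds only; without that step, a time such as 08:30:00.250 would never compare as equal to the header value 08:30:00.

```csharp
using System;
using System.Globalization;

// A minimal sketch of the If-Modified-Since rule; not IIS or MCMS code.
public static class ConditionalGetSketch
{
    // Returns 304 (Not Modified) when the resource has not changed since the
    // client's copy, otherwise 200 (OK).
    public static int Evaluate(DateTime lastModifiedUtc, string ifModifiedSince)
    {
        // No header: this is an ordinary GET, so send the full response.
        if (string.IsNullOrEmpty(ifModifiedSince))
            return 200;

        DateTime sinceUtc;
        if (!DateTime.TryParseExact(ifModifiedSince, "r", CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out sinceUtc))
            return 200; // Unparseable header: ignore it and send the full response.

        // Drop sub-second precision so the comparison matches the header's resolution.
        DateTime lastModifiedSeconds = new DateTime(
            lastModifiedUtc.Ticks - lastModifiedUtc.Ticks % TimeSpan.TicksPerSecond,
            DateTimeKind.Utc);

        // Earlier than or equal to the If-Modified-Since value: not modified.
        return lastModifiedSeconds <= sinceUtc ? 304 : 200;
    }
}
```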
By design, a request to an MCMS 2002 posting always returns status code 200, because MCMS 2002 postings are generated on the fly and there is no physical file whose last modified date and time IIS can compare. As a result, incremental crawls from SharePoint Portal Server against MCMS 2002 cannot skip unchanged postings, and every incremental crawl re-indexes the whole site. This can be very time-consuming on large sites. This behavior has not been confirmed with search engines other than SharePoint Portal Server; however, it may also affect other search engines that rely on the HTTP status code and the Last-Modified HTTP header value to perform incremental indexing of a Web site. If it does, you can use the solution that this article describes to resolve the issue.
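To see why an always-200 response turns every incremental crawl into a full crawl, the following C# sketch models one crawl over a set of postings. It is a hypothetical simplification, not SharePoint Portal Server's crawler: CrawledPosting, CrawlSketch, and CountReindexed are illustrative names, and the server side is reduced to the If-Modified-Since rule described earlier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of an incremental crawl; these types are illustrative,
// not SharePoint Portal Server APIs.
public sealed class CrawledPosting
{
    // Last-Modified value the crawler stored when it last cataloged the posting.
    public DateTime CatalogedLastModifiedUtc { get; }
    // Time the posting was last modified on the server now.
    public DateTime CurrentLastModifiedUtc { get; }

    public CrawledPosting(DateTime catalogedLastModifiedUtc, DateTime currentLastModifiedUtc)
    {
        CatalogedLastModifiedUtc = catalogedLastModifiedUtc;
        CurrentLastModifiedUtc = currentLastModifiedUtc;
    }
}

public static class CrawlSketch
{
    // Status code returned for the conditional GET. A server that cannot
    // compare dates (like an MCMS 2002 template without the sample code)
    // always answers 200.
    private static int StatusFor(CrawledPosting posting, bool honorsIfModifiedSince)
    {
        if (!honorsIfModifiedSince)
            return 200;
        return posting.CurrentLastModifiedUtc <= posting.CatalogedLastModifiedUtc ? 304 : 200;
    }

    // Counts the postings the crawler re-indexes: 304 is skipped, 200 is re-indexed.
    public static int CountReindexed(IEnumerable<CrawledPosting> postings, bool honorsIfModifiedSince)
    {
        int count = 0;
        foreach (CrawledPosting posting in postings)
        {
            if (StatusFor(posting, honorsIfModifiedSince) == 200)
                count++;
        }
        return count;
    }
}
```

With three postings of which only one has changed, CountReindexed returns 1 when the server honors If-Modified-Since and 3 when every response is 200.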
When you index an MCMS 2002 Web site for search, you may also want to make sure that the MCMS 2002 postings do not use Microsoft Office Thicket files as resources or attachments.
For more information, click the following article number to view the article in the Microsoft Knowledge Base:
830718 (http://kbalertz.com/Feedback.aspx?kbNumber=830718/) Indexing takes a long time when an HTML resource exists in MCMS