How do I index HTML files with Lucene?
1 answer
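1. Treat an HTML file like any other file: first strip the HTML tags with a regular expression and keep only the text content, then index that text.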
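2. Add the extracted text to the index. The method below uses the Lucene.Net (C#) API; LUCENEVERSION, IndexPath, EnsureDirectoryExists, GetContentSummary, ApplicationSettings and SearchException are members of the surrounding application and are not shown here.

    using System;
    using System.IO;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    public virtual void Add(PageViewModel model)
    {
        try
        {
            EnsureDirectoryExists();
            StandardAnalyzer analyzer = new StandardAnalyzer(LUCENEVERSION);

            // create: false appends to an existing index, so the index must already exist.
            using (IndexWriter writer = new IndexWriter(FSDirectory.Open(new DirectoryInfo(IndexPath)), analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                Document document = new Document();

                // ANALYZED fields are tokenized for full-text search.
                // model.Content is expected to hold the text with the HTML tags already stripped (step 1).
                document.Add(new Field("id", model.Id.ToString(), Field.Store.YES, Field.Index.ANALYZED));
                document.Add(new Field("content", model.Content, Field.Store.YES, Field.Index.ANALYZED));
                document.Add(new Field("title", model.Title, Field.Store.YES, Field.Index.ANALYZED));
                document.Add(new Field("tags", model.SpaceDelimitedTags(), Field.Store.YES, Field.Index.ANALYZED));

                // NOT_ANALYZED fields are indexed as a single exact term.
                document.Add(new Field("createdby", model.CreatedBy, Field.Store.YES, Field.Index.NOT_ANALYZED));
                document.Add(new Field("createdon", model.CreatedOn.ToShortDateString(), Field.Store.YES, Field.Index.NOT_ANALYZED));

                // Field.Index.NO fields are stored for display only and cannot be searched.
                document.Add(new Field("contentsummary", GetContentSummary(model), Field.Store.YES, Field.Index.NO));
                document.Add(new Field("contentlength", model.Content.Length.ToString(), Field.Store.YES, Field.Index.NO));

                writer.AddDocument(document);

                // Optimize() merges index segments; it is expensive, so consider calling it less often than once per document.
                writer.Optimize();
            }
        }
        catch (Exception ex)
        {
            if (!ApplicationSettings.IgnoreSearchIndexErrors)
                throw new SearchException(ex, "An error occurred while adding page '{0}' to the search index", model.Title);
        }
    }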
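Step 1 (stripping the tags before indexing) is not shown in the code above, so here is a minimal sketch of it. It assumes reasonably well-formed HTML. The class name HtmlText and the method StripHtml are hypothetical helpers made up for this illustration and are not part of Lucene.Net. Regular expressions cannot handle every malformed page, so for messy input a real HTML parser may be the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class HtmlText
{
    public static string StripHtml(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Remove <script> and <style> elements together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Decode entities such as &amp; into the characters they stand for.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

For example, HtmlText.StripHtml("<p>Hello <b>world</b></p>") returns "Hello world". The result is what you would put into the "content" field before calling AddDocument.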