

Create Full Text Index

HubbleDotNet can build a full-text index for existing tables or views in a relational database. The whole setup takes less than 5 minutes. We will use a few examples to demonstrate the process. This document covers building a full-text index in "Append Only" mode.
To create a full-text index for existing tables or views, we first need to create a database in HubbleDotNet. Please refer to "Create and Delete Database" for how to do this.

After the HubbleDotNet database is created, we can build a full-text index for tables or views in the relational database.

Here we use the News database as an example.

Open Query Analyzer, right-click the "News" database, and choose Create Table.

 

Create a full text index for Chinese news

Configure HubbleDotNet database

 

As shown in the screenshot above, we enter "CNews" as the table name, specify the directory for the full-text index, and choose a database adapter. Here we use SQLServer 2005 as the database adapter; it works with SQL Server 2005 and later versions.

Configure the connection string and click Next.

Choose Index Mode

Now we need to choose Index Mode

Build Index from exist table

In the following text box, we type the name of the actual table or view in the database. Here we enter "News".

"Incremental Mode" offers two options.

Append Only mode is used when data is only ever added to the database. Its only requirement is that the indexed fields never change. This mode uses less memory than Append/Delete/Update mode and is also faster.

To use this mode, the table in the database must have a DocId field. This field must have a unique index (preferably a clustered index), and its values must be auto-incrementing: every new value must be larger than all existing values.
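The DocId requirement can be illustrated with a small Python sketch. It is not HubbleDotNet code, and the helper names and sample rows are hypothetical. It only shows why strictly increasing values matter: if a row counts as new exactly when its DocId exceeds the highest DocId already indexed, then any row inserted with a smaller value would never be picked up.

```python
def is_append_only_safe(doc_ids):
    """Return True if the DocId values are strictly increasing in insertion order."""
    return all(a < b for a, b in zip(doc_ids, doc_ids[1:]))


def rows_to_index(rows, last_indexed_doc_id):
    """Select rows not yet indexed, assuming a row is new exactly when
    its DocId is larger than the highest DocId indexed so far."""
    return [row for row in rows if row["DocId"] > last_indexed_doc_id]


# Hypothetical sample rows from the News table.
rows = [
    {"DocId": 1, "Title": "first"},
    {"DocId": 2, "Title": "second"},
    {"DocId": 5, "Title": "third"},
]

print(is_append_only_safe([row["DocId"] for row in rows]))  # True
print([row["DocId"] for row in rows_to_index(rows, last_indexed_doc_id=2)])  # [5]
```

Gaps between values (2 followed by 5) are fine in this model; duplicates or decreasing values are not.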

Append/Delete/Update mode supports adding, deleting, and updating records. It uses more memory than Append Only mode (4 more bytes for each record). In this mode, the table must not have a DocId field, but it must have an int id field. The name of this id field can be anything except "DocId".
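To get a feeling for the 4 bytes per record, here is a rough Python estimate. The record count is a made-up figure for illustration; only the 4-byte value comes from the text above.

```python
EXTRA_BYTES_PER_RECORD = 4  # per-record overhead stated in the text above


def extra_memory_bytes(record_count):
    """Estimate the additional bytes used by Append/Delete/Update mode."""
    return record_count * EXTRA_BYTES_PER_RECORD


records = 10_000_000  # hypothetical table size
extra = extra_memory_bytes(records)
print(f"{extra} bytes, about {extra / 1024 ** 2:.1f} MiB")  # 40000000 bytes, about 38.1 MiB
```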

We will discuss tables with a non-int primary key later.

Now we introduce Append Only mode. The following diagram shows the structure of the data table.


Once this is configured, click Next to set up the fields.

 

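Pay special attention here: in version 8.3.0 and earlier, if the data table contains certain special data types, a "Tcp closed" error occurs. This is a bug; please upgrade to version 8.3.0.1 or later. For upgrade instructions, see "HubbleDotNet Open Source Full-Text Search Database Project: How to Upgrade". After upgrading, the correct message is shown. I will describe how special types are handled in a later article.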

Configure Index Fields

As shown in the screenshot above, HubbleDotNet automatically lists all index fields.

Now we choose Title and Content as the full-text index fields. Because these fields contain Chinese text, we set them to Tokenized and use PanGuSegment as the analyzer.

For the Time field, we choose Single Value Index and UnTokenized.

For the Url field, we choose None.

The checkbox to the left of each record is used for deleting fields.

Click Next to go to the last step.

Finish index

 

This step lists the SQL create statement. Click Finish to create the index.

Choose Yes to build the table.

Once the full-text index is built, we can optimize it.

 

Now we can search

 

Search News

Search Example 1

Search for titles that match “北京” and “大学”, ordered by rank.

The arguments that follow each term (word component) are listed below:

The first argument is the weight of the term. It is 5000 in this example.
The second argument is the position of the term in the text.

For example, “北京” is at position 0 and “大学” is at position 2.

Top 10 means that only the first 10 records are returned.
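Since the original query screenshot is not available here, the following Python sketch shows how such a search expression could be assembled. It assumes each term is written as word^weight^position with terms separated by spaces. The helper name is hypothetical, and the query text in the comment is an assumed shape for illustration, not the query from the screenshot.

```python
def build_match_expression(terms):
    """Join (word, weight, position) tuples into an assumed
    'word^weight^position' expression, separated by spaces."""
    return " ".join(f"{word}^{weight}^{position}" for word, weight, position in terms)


# Weights and positions taken from the description above.
terms = [("北京", 5000, 0), ("大学", 5000, 2)]
expression = build_match_expression(terms)
print(expression)  # 北京^5000^0 大学^5000^2

# Assumed query shape (table CNews and field Title are from this walkthrough):
query = f"select top 10 * from CNews where Title match '{expression}' order by score desc"
print(query)
```

The positions agree with the character offsets of the two words in the text “北京大学”.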

 

Example 2

In this example, the title field carries the argument "^2". This means the weight of the title field is 2.

Weight values range from 0 to 9. This value can be used for pagination.

Example 3

Search using "Contains" for a precise match. A Contains search finds far fewer records than the previous examples.

This is because only titles that contain both “北京” and “大学” are returned.
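The difference in result counts can be modeled with a simplified Python sketch. It assumes that the earlier match search accepts a title containing any of the keywords, while Contains requires all of them. Real tokenization and ranking are not modeled, and the sample titles are made up.

```python
def match_any(title, keywords):
    """Simplified model of a match search: at least one keyword occurs."""
    return any(keyword in title for keyword in keywords)


def contains_all(title, keywords):
    """Simplified model of a Contains search: every keyword occurs."""
    return all(keyword in title for keyword in keywords)


# Hypothetical titles for illustration.
titles = ["北京大学招生", "北京天气", "大学排名", "上海新闻"]
keywords = ["北京", "大学"]

matched = [t for t in titles if match_any(t, keywords)]
contained = [t for t in titles if contains_all(t, keywords)]
print(len(matched), len(contained))  # 3 1
```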

Example 4

Search for records whose title contains “北京” and “大学” and whose time is between 2007.01.01 and 2007.08.16, ordered by time.

Example 5

Search for records whose title contains “北京” and “大学” and whose time is between 2007.01.01 and 2007.08.16, ordered by time and rank.

 

 


Last edited Aug 19, 2011 at 5:20 AM by linkspeed, version 3
