Installing the IK Analysis Plugin
Download the project from GitHub (I downloaded it to /tmp) and unzip it:
cd /tmp
wget https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
unzip master.zip
Change into the elasticsearch-analysis-ik-master directory:
cd elasticsearch-analysis-ik-master/
Then use the mvn command to build the jar, elasticsearch-analysis-ik-1.4.0.jar. This step may take several attempts before it succeeds:
mvn package
By the way, the mvn command requires Maven to be installed. On Ubuntu, the commands to install Maven are:
apt-cache search maven
sudo apt-get install maven
mvn -version
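Since the build depends on mvn being available, a small guard before running it can make a missing install obvious. This is only a minimal sketch; require_cmd is a hypothetical helper name, not part of Maven or the plugin:

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "$1 not found; install it first" >&2
    return 1
  }
}
```

It could be used as `require_cmd mvn && mvn package`, so the build only starts when Maven is actually installed.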
Copy the ik folder under elasticsearch-analysis-ik-master/ to ${ES_HOME}/config/.
Copy elasticsearch-analysis-ik-1.4.0.jar from elasticsearch-analysis-ik-master/target to ${ES_HOME}/lib.
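The two copy steps can be sketched as a small shell function. This is an illustration under assumptions: install_ik is a hypothetical name, the first argument is the unpacked project directory, the second is your ES_HOME, and the ik folder is taken to sit directly under the project directory as described here.

```shell
# Hypothetical helper: copy the ik folder and the built jar into an Elasticsearch install.
install_ik() {
  src_dir="$1"   # unpacked project, e.g. /tmp/elasticsearch-analysis-ik-master
  es_home="$2"   # Elasticsearch home directory
  jar="$src_dir/target/elasticsearch-analysis-ik-1.4.0.jar"

  # Fail early if either source is missing.
  [ -d "$src_dir/ik" ] || { echo "missing $src_dir/ik" >&2; return 1; }
  [ -f "$jar" ] || { echo "missing $jar (did mvn package succeed?)" >&2; return 1; }

  mkdir -p "$es_home/config" "$es_home/lib"
  cp -r "$src_dir/ik" "$es_home/config/"
  cp "$jar" "$es_home/lib/"
}
```

For example: `install_ik /tmp/elasticsearch-analysis-ik-master "$ES_HOME"`. Checking both sources before copying avoids a half-finished install when the build has not produced the jar yet.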
In the configuration file elasticsearch.yml under ${ES_HOME}/config/, add the ik configuration at the end of the file:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik
In addition, httpclient-4.3.5.jar and httpcore-4.3.2.jar must also be placed in ${ES_HOME}/lib.
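Before restarting Elasticsearch, it can help to confirm that all three jars mentioned so far are really in place. A minimal sketch, where missing_ik_jars is a hypothetical helper that takes your ES_HOME as its only argument:

```shell
# Hypothetical helper: print each expected jar that is NOT present in <es_home>/lib.
missing_ik_jars() {
  es_home="$1"
  for jar in elasticsearch-analysis-ik-1.4.0.jar httpclient-4.3.5.jar httpcore-4.3.2.jar; do
    [ -f "$es_home/lib/$jar" ] || echo "$jar"
  done
}
```

Running `missing_ik_jars "$ES_HOME"` prints nothing when everything is installed, and otherwise lists the jar files still to be copied.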
Testing the IK Analyzer
Create an index named index:
curl -XPUT http://localhost:9200/index
Create a mapping for the index:
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d '
{
    "fulltext": {
        "_all": {
            "analyzer": "ik"
        },
        "properties": {
            "content": {
                "type": "string",
                "boost": 8.0,
                "term_vector": "with_positions_offsets",
                "analyzer": "ik",
                "include_in_all": true
            }
        }
    }
}'
Test the analyzer:
curl -XGET 'localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '
{
测试Elasticsearch分词器
}'
The response:
{
  "tokens" : [ {
    "token" : "测试",
    "start_offset" : 9,
    "end_offset" : 11,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "elasticsearch",
    "start_offset" : 11,
    "end_offset" : 24,
    "type" : "ENGLISH",
    "position" : 2
  }, {
    "token" : "分词器",
    "start_offset" : 24,
    "end_offset" : 27,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "分词",
    "start_offset" : 24,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "词",
    "start_offset" : 25,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "器",
    "start_offset" : 26,
    "end_offset" : 27,
    "type" : "CN_CHAR",
    "position" : 6
  } ]
}
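To compare analyzer results quickly, it is often enough to see just the token strings. The following is a rough sketch, not a JSON parser: list_tokens is a hypothetical helper that relies on the pretty-printed layout shown above, where each "token" field sits on its own line.

```shell
# Hypothetical helper: read a pretty-printed _analyze response on stdin
# and print one token per line. Assumes the `"token" : "...",` line layout above.
list_tokens() {
  sed -n 's/.*"token" : "\(.*\)",.*/\1/p'
}
```

Piping the test command above into it (`curl -XGET '...' -d '...' | list_tokens`) would print the six tokens in order. If the response is not pretty-printed, or the layout differs in your version, a real JSON tool is the safer choice.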