Elasticsearch bulk operations

This post records how to run Elasticsearch (ES) bulk operations with curl.

Bulk request

Prepare the data

vim documents.json
{ "index": {"_index": "library", "_type": "book", "_id": "1"}}
{ "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3}
{ "index": {"_index": "library", "_type": "book", "_id": "2"}}
{ "title": "Catch-22","author": "Joseph Heller","year": 1961,"characters": ["John Yossarian", "Captain Aardvark", "Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"],"tags": ["novel"],"copies": 6, "available" : false, "section" : 1}
{ "index": {"_index": "library", "_type": "book", "_id": "3"}}
{ "title": "The Complete Sherlock Holmes","author": "Arthur Conan Doyle","year": 1936,"characters": ["Sherlock Holmes","Dr. Watson", "G. Lestrade"],"tags": [],"copies": 0, "available" : false, "section" : 12}
{ "index": {"_index": "library", "_type": "book", "_id": "4"}}
{ "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}
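
A malformed file is an easy way to break a bulk request: the body is newline-delimited, and the final line must also end with a newline. The sketch below checks those two properties before the file is sent. check_bulk_file is a hypothetical helper name, and the line-pairing check assumes the file contains only index actions, as documents.json does.

```shell
# check_bulk_file: hypothetical helper, not part of Elasticsearch or curl.
# Prints "ok: N index actions" or a short reason and returns non-zero.
check_bulk_file() {
    file="$1"
    # Command substitution strips a trailing newline, so a non-empty
    # result means the last byte of the file is not a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo "missing trailing newline"
        return 1
    fi
    # Assumption: only index actions, so lines come in pairs
    # (action line, then document source).
    lines=$(wc -l < "$file")
    if [ $((lines % 2)) -ne 0 ]; then
        echo "odd number of lines: $((lines))"
        return 1
    fi
    echo "ok: $((lines / 2)) index actions"
}

# Usage:
# check_bulk_file documents.json
```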

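Disable refresh

Create the library index with automatic refresh disabled (if the index already exists, send the same setting to library/_settings instead):

curl -XPUT '192.168.99.100:9200/library' -d '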
{
    "settings":{
        "refresh_interval":"-1"
    }
}
'

Send the request

curl -s -XPOST '192.168.99.100:9200/_bulk' --data-binary @documents.json
{"took":2603,"errors":false,"items":[{"index":{"_index":"library","_type":"book","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"status":201}},{"index":{"_index":"library","_type":"book","_id":"2","_version":2,"_shards":{"total":2,"successful":2,"failed":0},"status":200}},{"index":{"_index":"library","_type":"book","_id":"3","_version":2,"_shards":{"total":2,"successful":2,"failed":0},"status":200}},{"index":{"_index":"library","_type":"book","_id":"4","_version":2,"_shards":{"total":2,"successful":2,"failed":0},"status":200}}]}
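
A bulk request can succeed as a whole while individual items fail, so it is worth checking the top-level "errors" flag in the response. The following is a minimal sketch using a plain string match. bulk_ok is a hypothetical helper name, and the match assumes the compact response format shown above (no ?pretty).

```shell
# bulk_ok: hypothetical helper. Succeeds when the bulk response
# reports "errors":false, i.e. no item in the batch failed.
# Assumption: compact JSON output, so there is no whitespace after the colon.
bulk_ok() {
    printf '%s' "$1" | grep -q '"errors":false'
}

# Usage (requires a running node):
# resp=$(curl -s -XPOST '192.168.99.100:9200/_bulk' --data-binary @documents.json)
# if bulk_ok "$resp"; then echo "all items indexed"; else echo "some items failed"; fi
```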

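Refresh

Restore the setting so that in-memory segments are refreshed to the filesystem cache every 1s. The index already exists at this point, so the change goes to the _settings endpoint:

curl -XPUT '192.168.99.100:9200/library/_settings' -d '
{
    "index":{
        "refresh_interval":"1s"
    }
}
'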
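
The disable, load, restore sequence can be sketched in one place as follows. refresh_settings_body is a hypothetical helper that only prints the JSON body, and the commented commands assume the library index already exists on the host used in this post.

```shell
# refresh_settings_body: hypothetical helper that prints an index settings
# body for the given refresh interval ("-1" disables automatic refresh).
refresh_settings_body() {
    printf '{"index":{"refresh_interval":"%s"}}' "$1"
}

# Usage (requires a running node and an existing library index):
# curl -XPUT '192.168.99.100:9200/library/_settings' -d "$(refresh_settings_body -1)"
# curl -s -XPOST '192.168.99.100:9200/_bulk' --data-binary @documents.json
# curl -XPUT '192.168.99.100:9200/library/_settings' -d "$(refresh_settings_body 1s)"
```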

Or trigger one manual refresh:

curl -XPOST '192.168.99.100:9200/_refresh'

Installing the head plugin

cd /usr/share/elasticsearch
./bin/plugin install mobz/elasticsearch-head

Restart ES

cd /etc/init.d
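./elasticsearch restart

Query

Search by title (the body below can be sent to library/_search, for example from the head plugin):
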
{
    "query": {
        "query_string": {
            "query": "title:crime"
        }
    }
}

To return version information as well:

{
    "version": true, 
    "query": {
        "query_string": {
            "query": "title:crime"
        }
    }
}

To return only specific fields:

{
    "fields": ["title","year"], 
    "query": {
        "query_string": {
            "query": "title:crime"
        }
    }
}
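
The query bodies above can also be sent with curl instead of the head plugin. The sketch below builds the body in the shell. query_body is a hypothetical helper, it does no JSON escaping, and the usage line assumes the library index on the host used in this post.

```shell
# query_body: hypothetical helper that prints a query_string search body.
# $1: query string, e.g. title:crime (must not contain double quotes;
#     no escaping is done)
# $2: optional "true" to also request each hit's _version
query_body() {
    if [ "${2:-false}" = "true" ]; then
        printf '{"version":true,"query":{"query_string":{"query":"%s"}}}' "$1"
    else
        printf '{"query":{"query_string":{"query":"%s"}}}' "$1"
    fi
}

# Usage (requires a running node):
# curl -s -XPOST '192.168.99.100:9200/library/_search?pretty' -d "$(query_body 'title:crime' true)"
```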

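About flush

A refresh only moves the in-memory segment into the filesystem cache (once it is there, Lucene can search the segment); the data has not yet reached disk. While writing data to the in-memory buffer, ES also writes an entry to the translog, and a refresh leaves the translog untouched.
A flush is what actually writes the segments to disk, updates the commit file (which records all segments in the index), and clears the translog. By default a flush is triggered every 30 minutes, or whenever the translog grows beyond 512 MB.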
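
Both operations can also be triggered by hand through the _refresh and _flush endpoints. The following is a small sketch for comparing the two. es_post and the CURL override are hypothetical conveniences, and ES_HOST defaults to the host used in this post.

```shell
# Assumed host; adjust to your node.
ES_HOST="${ES_HOST:-192.168.99.100:9200}"

# es_post: hypothetical wrapper around curl for POST calls without a body.
# CURL can be overridden (e.g. CURL=echo) to print the arguments
# instead of sending the request.
es_post() {
    ${CURL:-curl} -s -XPOST "$ES_HOST/$1"
}

# Usage (requires a running node):
# es_post library/_refresh   # make recent writes searchable
# es_post library/_flush     # commit segments to disk and clear the translog
```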
