ElasticSearch in 5 minutes

ElasticSearch makes it easy to run a full-featured search server. In fact, it's so easy, I'm going to show you how in 5 minutes!

Installing and running ElasticSearch

For the purposes of this tutorial, I'll assume you're on a Linux or Mac environment.

You should also have JDK 6 or above installed.

wget https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.0.tar.gz
tar -zxvf elasticsearch-0.19.0.tar.gz
cd elasticsearch-0.19.0
bin/elasticsearch -f

You should see something like this in the terminal.

[2012-03-03 12:36:15,327][INFO ][node] [Ultra-Marine] {0.19.0}[9097]: initializing ...
[2012-03-03 12:36:15,337][INFO ][plugins] [Ultra-Marine] loaded [], sites []
...
[2012-03-03 12:36:20,509][INFO ][http] [Ultra-Marine] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.16.174.1:9200]}
[2012-03-03 12:36:20,509][INFO ][node] [Ultra-Marine] {0.19.0}[9097]: started
[2012-03-03 12:36:20,511][INFO ][gateway] [Ultra-Marine] recovered [0] indices into cluster_state

ElasticSearch is now running! Open http://localhost:9200 in your web browser and it returns this:

{
  "ok" : true,
  "status" : 200,
  "name" : "Ultra-Marine",
  "version" : {
    "number" : "0.19.0",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}
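If you start ElasticSearch from a script rather than by hand, it helps to wait until the HTTP port answers before sending any requests. Here's a minimal sketch of that idea. The wait_for helper is a name I made up for illustration, and the 30-second limit in the example is an arbitrary assumption.

```shell
# wait_for ATTEMPTS COMMAND...: run COMMAND about once per second until it
# succeeds or ATTEMPTS tries have been used. (Hypothetical helper.)
wait_for() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (not run here): wait up to 30 tries for the node started above.
# wait_for 30 curl -s -f http://localhost:9200 && echo "ElasticSearch is up"
```

The probe command is passed in as arguments, so the same loop works with any check you prefer.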

Indexing Data

We're now going to index some data into our ElasticSearch instance. We'll use the example of a blog engine, which has some posts and comments.

curl -XPUT 'http://localhost:9200/blog/user/dilbert' -d '{ "name" : "Dilbert Brown" }'

curl -XPUT 'http://localhost:9200/blog/post/1' -d '
{
    "user": "dilbert",
    "postDate": "2011-12-15",
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}'


curl -XPUT 'http://localhost:9200/blog/post/2' -d '
{
    "user": "dilbert",
    "postDate": "2011-12-12",
    "body": "Distribution is hard. Distribution should be easy." ,
    "title": "On distributed search"
}'


curl -XPUT 'http://localhost:9200/blog/post/3' -d '
{
    "user": "dilbert",
    "postDate": "2011-12-10",
    "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
    "title": "Lorem ipsum"
}'

Each of these requests should return a response confirming that the operation succeeded, for example:

{"ok":true,"_index":"blog","_type":"post","_id":"1","_version":1}
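If you index documents from a script, you may want to check that response instead of reading it by eye. The sketch below assumes the response looks like the sample above. The index_ok helper and the $post_json variable are hypothetical names, and the check is a plain text match rather than real JSON parsing.

```shell
# index_ok RESPONSE: succeed if the response body contains "ok":true,
# as in the sample response above. This is a plain text match,
# not a real JSON parser. (Hypothetical helper.)
index_ok() {
  printf '%s' "$1" | grep -q '"ok" *: *true'
}

# Example (not run here): $post_json is assumed to hold a document body.
# response=$(curl -s -XPUT 'http://localhost:9200/blog/post/1' -d "$post_json")
# if index_ok "$response"; then echo "indexed"; else echo "failed: $response"; fi
```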

Let's verify that all operations were successful.

curl -XGET 'http://localhost:9200/blog/user/dilbert?pretty=true'
curl -XGET 'http://localhost:9200/blog/post/1?pretty=true'
curl -XGET 'http://localhost:9200/blog/post/2?pretty=true'
curl -XGET 'http://localhost:9200/blog/post/3?pretty=true'

Note that there are two main ways of adding data to ElasticSearch:

  1. JSON over HTTP
  2. Native client

We'll explore these in greater detail in a subsequent tutorial.

Searching

Let's see if we can retrieve the documents we just added via search.

Find all blog posts by Dilbert:

curl 'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty=true'

This returns the following JSON result:

{
  "took" : 85,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 1.0, "_source" :
{
    "user": "dilbert",
    "postDate": "2011-12-15",
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 0.30685282, "_source" :
{
    "user": "dilbert",
    "postDate": "2011-12-12",
    "body": "Distribution is hard. Distribution should be easy." ,
    "title": "On distributed search"
}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 0.30685282, "_source" :
{
    "user": "dilbert",
    "postDate": "2011-12-10",
    "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
    "title": "Lorem ipsum"
}
    } ]
  }
}

Nice!
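Search responses get long quickly, so it can be handy to pull out just one field. Here's a rough sketch that prints only the titles. It assumes the pretty-printed layout shown above, with one field per line, and the titles helper is a hypothetical name. It is a text filter, not a JSON parser, so treat it as a convenience for the terminal only.

```shell
# titles: read a pretty-printed search response on stdin and print the
# value of every "title" line. Relies on one field per line, as in the
# output above. (Hypothetical helper.)
titles() {
  sed -n 's/^ *"title" *: *"\(.*\)" *,\{0,1\} *$/\1/p'
}

# Example (not run here):
# curl -s 'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty=true' | titles
```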

Find all posts whose title doesn't contain the term search:

curl 'http://localhost:9200/blog/post/_search?q=-title:search&pretty=true'

Retrieve the titles of all posts whose title contains search but not distributed:

curl 'http://localhost:9200/blog/post/_search?q=%2Btitle:search%20-title:distributed&pretty=true&fields=title'
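Query strings like these contain characters that have a special meaning in URLs. A space has to be written as %20, and a literal + is commonly decoded as a space, so it is safer to send it as %2B. Here's a minimal sketch of a percent-encoder in plain shell. The urlencode helper is a hypothetical name, and it only handles ASCII input.

```shell
# urlencode STRING: percent-encode every character except unreserved
# ones (letters, digits, '.', '~', '_', '-'). ASCII input only.
# (Hypothetical helper, not part of ElasticSearch or curl.)
urlencode() {
  rest=$1
  encoded=
  while [ -n "$rest" ]; do
    after=${rest#?}             # everything after the first character
    char=${rest%"$after"}       # the first character itself
    rest=$after
    case $char in
      [a-zA-Z0-9.~_-]) encoded="$encoded$char" ;;
      *) encoded="$encoded$(printf '%%%02X' "'$char")" ;;
    esac
  done
  printf '%s\n' "$encoded"
}

# Example (not run here):
# q=$(urlencode '+title:search -title:distributed')
# curl "http://localhost:9200/blog/post/_search?q=$q&pretty=true&fields=title"
```

If you'd rather not carry a helper around, curl can do the encoding itself when you combine -G with --data-urlencode 'q=+title:search -title:distributed'.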

A range search on postDate:

curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d '
{
    "query" : {
        "range" : {
            "postDate" : { "from" : "2011-12-10", "to" : "2011-12-12" }
        }
    }
}'
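If you run range searches often, you can generate the request body instead of editing the dates by hand. The sketch below builds the same query as above for any two dates. The range_query helper is a hypothetical name, and it assumes the dates contain no characters that need escaping in JSON.

```shell
# range_query FROM TO: print a JSON body for a range search on postDate.
# (Hypothetical helper; the arguments are inserted as-is, unescaped.)
range_query() {
  printf '{ "query": { "range": { "postDate": { "from": "%s", "to": "%s" } } } }\n' "$1" "$2"
}

# Example (not run here):
# curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d "$(range_query 2011-12-10 2011-12-12)"
```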

You'll learn more about the various URL query parameters in a separate tutorial.

The usual Lucene query syntax is available either through the JSON query language, or through the query parser.

Shutdown

To shut down ElasticSearch, hit Ctrl+C in the terminal where you launched it. This will shut down ElasticSearch cleanly.

ElasticSearch is fairly robust, so even after an OS or disk crash, its index is unlikely to become corrupted.

Where to from here?

  1. Check out one of the books about ElasticSearch below.
  2. Learn more about ElasticSearch's basic concepts

Popular books related to ElasticSearch and search