If you're an Elasticsearch person, I am narrowing my problem.



          

If you're an Elasticsearch person, I am narrowing my problem. If you're not, please feel free to ignore this update. I'm posting this far and wide, because I really need someone to see this and say, "Dude, you're an idiot. Here's what you're doing wrong..."

Okay, let's attack this directly. We have a cluster of 6 machines (6 nodes). We have an index of just under 3.5 million documents. Each document represents an Internet domain name. We are performing queries against this index to see names that exist in our index. Most queries are coming back in the sub-50ms range. But a bunch are taking 600ms to 900ms and, thus, showing up in our slow query log. If they ALL were performing at this speed, I wouldn't be nearly as confused, but it looks like only about 10% to 20% of the queries are slow. That's clearly too much.

Head reports that this index looks like this:

aftermarket-2014-07-31_02-38-19
size: 424Mi (2.47Gi)
docs: 3,428,471 (3,428,471)

Here is the configuration for a typical node (they're all pretty much the same). We have 2 machines in a dev data center, 2 machines in a mesa data center and 2 machines in a phx data center. Each of the two machines in a data center has a node.zone tag set, and, as you can see, I have the cluster routing awareness set to see zone as its marching orders. The data pipes between the data centers are beefy, and while I acknowledge that cross-DC isn't something that's generally smiled upon, it appears to work fine. Each machine has 96G of RAM. We start ES giving it 30G for the heap size. File descriptors are set at 64,000. Note that I've selected the memory mapped file system.
#
# Server-specific settings for cluster domainiq-es
#
cluster.name: domainiq-es
node.name: Mesa-03
node.zone: es-mesa-prod
discovery.zen.ping.unicast.hosts: [dev2.glbt1.gdg, m1p1.mesa1.gdg, m1p4.mesa1.gdg, p3p3.phx3.gdg, p3p4.phx3.gdg]

#
# The following configuration items should be the same for all ES servers
#
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 5
index.store.type: mmapfs
index.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 25000
index.refresh_interval: 30s
bootstrap.mlockall: true
cluster.routing.allocation.awareness.attributes: zone
gateway.recover_after_nodes: 4
gateway.recover_after_time: 2m
gateway.expected_nodes: 6
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.timeout: 10s
discovery.zen.ping.retries: 3
discovery.zen.ping.interval: 15s
discovery.zen.ping.multicast.enabled: false

And here is a typical slow query:

[2014-07-31 07:35:31,530][WARN ][index.search.slowlog.query] [Mesa-03] [aftermarket-2014-07-31_02-38-19][2] took[707.6ms], took_millis[707], types[premium], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[], extra_source[{size:35,query:{query_string:{query:sld:petusies^20.0 OR tokens:(((pet^1.2 pets^1.0 *^1.0)AND(us^1.2 *^0.8)AND(ie^1.2 *^0.6)AND(s^1.2 *^0.4)) OR((pet^1.2 pets^1.0)AND(us^1.2)AND(ie^1.2))^3.0) AND tld:(com^1.001 OR in^0.99 OR co.in^0.941174367459617 OR net.in^0.8848832474555992 OR us^0.85 OR org.in^0.8397882862729736 OR gen.in^0.785829669672289 OR firm.in^0.7414549824163524 OR ind.in^0.7 OR org^0.6) OR _id:petusi.es^5.0-domaintype:partner,lowercase_expanded_terms:true,analyze_wildcard:false}}}],

So note that I create 5 shards and 5 replicas, so that each node has all 5 shards at all times. I THOUGHT THIS MEANT BETTER PERFORMANCE. That is, I thought having all 5 shards on every node meant that a query to a node didn't have to ask another node for data. IS THIS NOT TRUE?
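For reference, the arithmetic behind "each node has all 5 shards" can be sketched as below. This is only an illustration, not anything Elasticsearch exposes: the function names are made up for the example, and it assumes the allocator spreads copies evenly and that no two copies of the same shard ever land on the same node.

```python
def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard exists once, plus once per configured replica."""
    return primary_shards * (1 + replicas)


def copies_per_node(primary_shards: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies per node, assuming an even spread."""
    return total_shard_copies(primary_shards, replicas) / nodes


def every_node_holds_every_shard(replicas: int, nodes: int) -> bool:
    """A shard has (1 + replicas) copies, each on a different node,
    so every node holds the shard only if there are at least as many
    copies as nodes."""
    return 1 + replicas >= nodes


if __name__ == "__main__":
    # Values taken from the configuration above: 5 shards, 5 replicas, 6 nodes.
    print(total_shard_copies(5, 5))          # 30 shard copies in the cluster
    print(copies_per_node(5, 5, 6))          # 5.0 copies on each node
    print(every_node_holds_every_shard(5, 6))  # True
```

Holding a local copy of every shard only shows that a node could answer from local data. Whether a given search actually stays on the receiving node is a separate question that this sketch does not answer.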
Here's where it also gets interesting: I tried setting the number of shards to 2 (with 5 replicas) and my slow queries went to almost 2 seconds (2000ms). This is also terribly counter-intuitive! I thought fewer shards meant less lookup time. Clearly, I want to optimize for read here. I don't care if indexing is three times as slow; we need our queries to be sub-100ms.
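One way to narrow this further would be to check whether the slow entries cluster on particular nodes or shards. The sketch below parses slow log lines in the format quoted earlier and tallies them per (node, shard). The regular expression is inferred from that single sample line, so treat it as an assumption about the format; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the one slow log line quoted in this post; other
# Elasticsearch versions or log layouts may need a different expression.
SLOWLOG_RE = re.compile(
    r"\[index\.search\.slowlog\.query\]\s+"
    r"\[(?P<node>[^\]]+)\]\s+"
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s+"
    r"took\[[^\]]+\],\s+took_millis\[(?P<millis>\d+)\]"
)

# Prefix of the sample line from the post (query body omitted for brevity).
SAMPLE = (
    "[2014-07-31 07:35:31,530][WARN ][index.search.slowlog.query] [Mesa-03] "
    "[aftermarket-2014-07-31_02-38-19][2] took[707.6ms], took_millis[707], "
    "types[premium], stats[], search_type[QUERY_THEN_FETCH], total_shards[5]"
)


def parse_slowlog_line(line):
    """Return (node, shard, millis) for a slow log line, or None if no match."""
    match = SLOWLOG_RE.search(line)
    if match is None:
        return None
    return match.group("node"), int(match.group("shard")), int(match.group("millis"))


def summarize(lines):
    """Count slow entries and track the worst latency per (node, shard)."""
    stats = defaultdict(lambda: {"count": 0, "max_millis": 0})
    for line in lines:
        parsed = parse_slowlog_line(line)
        if parsed is None:
            continue  # skip lines that are not slow-query entries
        node, shard, millis = parsed
        entry = stats[(node, shard)]
        entry["count"] += 1
        entry["max_millis"] = max(entry["max_millis"], millis)
    return dict(stats)


if __name__ == "__main__":
    for key, entry in summarize([SAMPLE]).items():
        print(key, entry)
```

If the counts turn out to be roughly even across nodes and shards, the slowness is more likely tied to the queries themselves than to one machine or data center; if they pile up on one zone, the cross-DC links would be the first thing to look at.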
Posted on: Thu, 31 Jul 2014 14:56:43 +0000
