ruby on rails - Elasticsearch fixed score based on content similarity


I am working on a tool to identify similar documents and mark them as duplicates.

To do so, I am using Elasticsearch to compare the documents' content, so that Elasticsearch takes care of handling synonyms and possible typos, but I haven't been able to come up with a query that reaches my goal.

So far I have come up with this query:

{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": ["description"],
          "like_text": "lorem ipsum dolor sit amet, consectetur adipiscing elit.",
          "min_term_freq": 1,
          "max_query_terms": 999,
          "min_doc_freq": 1
        }
      }
    }
  },
  "from": 0,
  "size": 999,
  "search_type": "dfs_query_then_fetch",
  "sort": [
    { "_score": { "order": "desc" } }
  ]
}

But the score it gives me seems quite random. I would like to have a score of 100 when the contents are equal and 0 when they are completely different.

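I see where you are going. Out of the box, the score is only relevant to that particular query, because it is based on term frequencies and positions. The score is great for ranking the results of one query, but it is meaningless when compared from query to query. So, wrap the query in a constant score query.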
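As a rough illustration of that suggestion, here is a minimal sketch that wraps the question's more_like_this query in constant_score. The helper name `constant_score_mlt` and the `boost` default are made up for this example. It also assumes an older Elasticsearch release (the question uses `filtered` and `like_text`), where constant_score accepts a `query` key; newer releases expect `filter` there instead. Note that with a constant score every matching document gets the same score, namely the boost value.

```ruby
require 'json'

# Hypothetical helper (not from the original post): builds a search body where
# the more_like_this query is wrapped in constant_score, so every matching
# document receives the same fixed score (the boost) instead of a TF/IDF score.
def constant_score_mlt(text, field: 'description', boost: 1.0)
  {
    query: {
      constant_score: {
        # Assumption: older Elasticsearch accepts "query" here;
        # newer versions expect "filter" instead.
        query: {
          more_like_this: {
            fields: [field],
            like_text: text,
            min_term_freq: 1,
            max_query_terms: 999,
            min_doc_freq: 1
          }
        },
        boost: boost
      }
    }
  }
end

body = constant_score_mlt('lorem ipsum dolor sit amet, consectetur adipiscing elit.')
puts JSON.pretty_generate(body)
```

The resulting hash can be sent as the body of a search request with whatever client the application already uses.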

If it comes down to putting each term in its own query, I can provide an example of possibly solving this with multiple constant score queries in a bool query nested inside another bool query.
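The per-term idea could look something like the sketch below. It is a simplified, single-level version (one bool query, not the nested form mentioned above), and it is not the answerer's actual example. The helpers `naive_terms` and `per_term_constant_score` are hypothetical, the tokenizer is a crude stand-in for Elasticsearch's analyzer, and the equal weighting of 100 divided by the number of distinct terms is an assumption chosen so that a document matching every term would sum toward 100. How the final score is actually computed depends on the Elasticsearch version and similarity settings, so treat the numbers as illustrative.

```ruby
require 'json'

# Hypothetical tokenizer for illustration only: lowercases the text and keeps
# distinct alphanumeric runs. Elasticsearch's analyzer may split differently.
def naive_terms(text)
  text.downcase.scan(/[[:alnum:]]+/).uniq
end

# Hypothetical helper: one constant_score clause per term inside bool/should.
# Each clause carries an equal share of 100 as its boost (an assumption).
def per_term_constant_score(text, field: 'description')
  terms = naive_terms(text)
  weight = terms.empty? ? 0.0 : 100.0 / terms.size

  clauses = terms.map do |term|
    {
      constant_score: {
        # "match" is used so the term is still analyzed (synonyms, stemming).
        # Assumption: older releases accept "query" here; newer ones use "filter".
        query: { match: { field => term } },
        boost: weight
      }
    }
  end

  { query: { bool: { should: clauses } } }
end

body = per_term_constant_score('lorem ipsum dolor sit amet')
puts JSON.pretty_generate(body)
```

With this shape, a document that matches only some of the terms collects only the boosts of the clauses it matches, which is closer to a similarity percentage than a single constant score.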

