performance - MongoDB: Given a list of keys, get all matching docs and create new docs for non-matching keys


Say I've got a collection of user documents indexed by email address. Given a list of email addresses, I need to:

  1. Get each user doc whose email is in the list.

  2. Create a new user doc for each email in the list for which no user exists.

I can solve the first problem with an $in query, but I was hoping there was a way for the $in query to return the list of emails not found in the db, so I can efficiently insert the new docs. Otherwise, I'd have to loop over the docs to find the emails that weren't picked up.

What's an efficient way to accomplish both of the above tasks? Is there a fast way to batch insert new user docs from a set of unique emails?

"I was hoping there was a way for the $in query to return the list of emails not found in the db."

You could try $nin. Unfortunately, $ne and $nin can't make efficient use of indexes, so it might not be your best bet (but maybe worth a try).

The best approach depends on the 'cache-miss rate', but this should work well if the number of existing matches isn't too high (pseudocode):

    var emails;
    var matchingMails = users.find({"email": {$in: emails}}, {"email": 1, "_id": 0});
    var newEmails = emails.subtract(matchingMails); // set difference
    db.batchInsert(createUsersFromEmails(newEmails));

That is:

  1. Find the users matching the email addresses using $in. Make sure to return only the email field so the query is covered (i.e. MongoDB only looks at the index and doesn't have to scan the documents).

  2. Remove the emails that are already in the database from the list (a simple string operation, fast).

  3. Batch insert the newly created users (i.e. create a list or array of user objects client-side and send them to the db).
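As a rough sketch of those three steps in JavaScript: this assumes `users` is a collection object from the MongoDB Node.js driver (only its `find` and `insertMany` methods are used), and `createUserFromEmail`, `missingEmails` and `findOrCreateUsers` are hypothetical names made up for illustration, not anything MongoDB provides.

```javascript
// Hypothetical factory: builds a minimal user document for a new email.
function createUserFromEmail(email) {
  return { email: email, createdAt: new Date() };
}

// Client-side set difference: emails from the list (deduplicated) that
// none of the documents returned by the $in query carry.
function missingEmails(emails, matchedDocs) {
  const found = new Set(matchedDocs.map((doc) => doc.email));
  return [...new Set(emails)].filter((email) => !found.has(email));
}

async function findOrCreateUsers(users, emails) {
  // Step 1: ask only for the indexed field so the index can cover the query.
  const matched = await users
    .find({ email: { $in: emails } }, { projection: { email: 1, _id: 0 } })
    .toArray();

  // Step 2: drop the emails that already exist.
  const newEmails = missingEmails(emails, matched);

  // Step 3: one batch insert for all new users (skipped if there are none).
  if (newEmails.length > 0) {
    await users.insertMany(newEmails.map(createUserFromEmail));
  }
  return { existing: matched.map((doc) => doc.email), created: newEmails };
}
```

That is two round-trips in total, however long the list is. If you need the full user docs rather than just the emails, drop the projection, at the cost of the query no longer being covered.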

This limits the number of round-trips to the database. Since the query is index covered, it should be very, very fast unless RAM is exhausted and the index doesn't fit in RAM anymore.

It's wise to use a unique index on the email address and to allow the batch insert to complete even if individual inserts fail, in case someone signed up in between or there is another thread running the same code.
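A minimal sketch of that idea, again assuming a Node.js driver collection object called `users`. The helper names are hypothetical, and the exact shape of the bulk-write error (a `writeErrors` array whose entries have a `code`) is an assumption that may differ between driver versions; 11000 is MongoDB's duplicate-key error code.

```javascript
// MongoDB's error code for a unique-index violation.
const DUPLICATE_KEY = 11000;

// True when every failure in the error is a duplicate key.
// Assumption: the error exposes `writeErrors` (entries with a `code`)
// or, failing that, a top-level `code`.
function onlyDuplicateKeyErrors(err) {
  const writeErrors = [].concat(err.writeErrors || []);
  if (writeErrors.length > 0) {
    return writeErrors.every((e) => e.code === DUPLICATE_KEY);
  }
  return err.code === DUPLICATE_KEY;
}

// Run once at setup time: rejects a second document with the same email.
async function ensureUniqueEmailIndex(users) {
  await users.createIndex({ email: 1 }, { unique: true });
}

async function insertIgnoringDuplicates(users, docs) {
  try {
    // ordered: false lets the remaining inserts go ahead after one fails.
    await users.insertMany(docs, { ordered: false });
  } catch (err) {
    // A duplicate means someone else created that user first; anything
    // else is a real failure and is passed on to the caller.
    if (!onlyDuplicateKeyErrors(err)) throw err;
  }
}
```

With the default ordered insert, the first duplicate would stop the batch and leave the later users uncreated, which is why the unordered option matters here.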

The number of elements in the $in query shouldn't be too high; somewhere between 1,000 and 10,000 is a rule of thumb.
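If the list is longer than that, one option is to split it into batches and run the find-and-insert steps once per batch. A small sketch, where `chunk` is a hypothetical helper and the batch size of 1,000 is just the lower end of the rule of thumb above:

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 2,500 emails become batches of 1,000, 1,000 and 500.
const emails = Array.from({ length: 2500 }, (_, i) => `user${i}@example.com`);
const batches = chunk(emails, 1000);
```

Each batch then gets its own $in query and its own batch insert, so the number of round-trips grows with the list but each query stays a manageable size.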

