aggregation framework to search for and return the duplicates. It’s really quite slick.

For example, in the question, Wall documents had a field called event_time. Here’s one approach:

```javascript
db.Wall.aggregate([
  // Group by event_time and count the documents in each group.
  { $group: { _id: "$event_time", count: { $sum: 1 } } },
  // Keep only the groups that contain more than one document.
  { $match: { count: { $gt: 1 } } }
])
```

The trick is to use the $group pipeline operator to collect and count each unique event_time. Then $match keeps only the groups that contain more than one document.

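To make the two stages concrete, here is a minimal plain-JavaScript sketch of what they do, run against an in-memory array instead of a real collection. The sample documents and the helper names `groupAndCount` and `onlyDuplicates` are hypothetical and exist only for illustration; MongoDB's actual implementation is of course different.

```javascript
// Hypothetical stand-in for a few Wall documents (values are made up).
const docs = [
  { _id: 1, event_time: "9:00am" },
  { _id: 2, event_time: "9:30am" },
  { _id: 3, event_time: "9:00am" },
];

// Mimics {$group: {_id: "$<field>", count: {$sum: 1}}}:
// one output document per distinct value, with a running count.
function groupAndCount(documents, field) {
  const counts = new Map();
  for (const doc of documents) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([key, count]) => ({ _id: key, count }));
}

// Mimics {$match: {count: {$gt: 1}}}: keep only groups seen more than once.
function onlyDuplicates(groups) {
  return groups.filter((group) => group.count > 1);
}

const groups = groupAndCount(docs, "event_time");
const duplicates = onlyDuplicates(groups);
console.log(duplicates);
```

Because the Map is keyed by the field value, each distinct event_time can appear only once in the grouped output, which is the same role _id plays in the real pipeline.
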
While it may not be as readable as the equivalent SQL statement, it’s still easy to follow. The only really odd thing is the mapping of event_time into _id. As each document passes through the pipeline, its event_time is used as the key of the new aggregate document. The $ sign marks a field reference to a property of the document in the pipeline (a Wall document). Remember that the _id field of a MongoDB document must be unique (and this is how the $group pipeline operator does its magic).

So, if the documents contained the following event_time values:

| event_time |
|------------|
| 4:00am     |
| 5:00am     |
| 4:00am     |
| 6:00pm     |
| 7:00am     |

the $group stage would produce this set of aggregate documents:

| _id    | count |
|--------|-------|
| 4:00am | 2     |
| 5:00am | 1     |
| 6:00pm | 1     |
| 7:00am | 1     |

Notice how the _id is the event_time. After the $match stage, the aggregate results would look like this:

```
{
  "result" : [
    {
      "_id" : "4:00am",
      "count" : 2
    }
  ],
  "ok" : 1
}
```
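
The result only reports which event_time is duplicated, not which documents share it. One possible variant, sketched here on the assumption that the standard $push accumulator is available in your MongoDB version, also collects each group member's own _id. The output field name `ids` and the variable name `duplicatePipeline` are hypothetical choices for this sketch.

```javascript
// Sketch only: "ids" is a hypothetical output field name, not from the original pipeline.
const duplicatePipeline = [
  // Group by event_time, count the members, and remember each member's own _id.
  { $group: { _id: "$event_time", count: { $sum: 1 }, ids: { $push: "$_id" } } },
  // Keep only the event_time values shared by more than one document.
  { $match: { count: { $gt: 1 } } },
];

// In the mongo shell this would be run as: db.Wall.aggregate(duplicatePipeline)
console.log(JSON.stringify(duplicatePipeline, null, 2));
```

Note that "$_id" inside $push refers to the _id of the incoming Wall document, while the _id key of the $group stage defines the key of the new aggregate document.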