+2 votes
by
The task is to process raw statistics data.
The queries are simple aggregates: SELECT SUM/AVG ... GROUP BY (AGE, SEX, DAY, SOURCE), usually with 10-20 parameters in the GROUP BY. The aggregated data is placed in a separate table, which is then searched with a WHERE clause on those same 10-20 parameters.
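
To make the pattern above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table names (raw_events, agg_events), the value column, and the sample rows are made up for illustration; only the grouping columns come from the question.

```python
import sqlite3

# In-memory database used purely as a stand-in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events "
    "(age INTEGER, sex TEXT, day TEXT, source TEXT, value REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?, ?, ?)",
    [
        (25, "M", "2015-04-01", "web", 10.0),
        (25, "M", "2015-04-01", "web", 30.0),
        (25, "F", "2015-04-01", "web", 5.0),
        (40, "F", "2015-04-02", "mobile", 7.0),
    ],
)

# Step 1: aggregate the raw data into a separate table.
conn.execute(
    """
    CREATE TABLE agg_events AS
    SELECT age, sex, day, source,
           SUM(value) AS total, AVG(value) AS average, COUNT(*) AS n
    FROM raw_events
    GROUP BY age, sex, day, source
    """
)

# Step 2: search the aggregated table with a WHERE on the same parameters.
rows = conn.execute(
    "SELECT total, average, n FROM agg_events "
    "WHERE age = ? AND sex = ? AND day = ? AND source = ?",
    (25, "M", "2015-04-01", "web"),
).fetchall()
print(rows)
```

In the real case the GROUP BY and the WHERE would each carry 10-20 columns instead of four.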

Right now all of this is handled by MongoDB (aggregation framework), and I'm not happy with the performance. (The indexes all seem to fit in memory, so there is clearly not much left to optimize in Mongo.)
Is there a database better suited to this kind of task?

2 Answers

0 votes
by
Alternatively, you can use Impala or Hive on Tez on a Hadoop cluster. It scales very well, and a distribution such as CDH or HDP is fairly easy to deploy.
If you have a large budget and CPU is not a constraint, you can use Spark SQL on top of the same Hive.
+3 votes
by
ElasticSearch
https://www.elastic.co/guide/en/elasticsearch/refe...
https://www.elastic.co/guide/en/elasticsearch/refe...

From my own experience, it does a very good job. We currently run a small cluster holding 300+ GB of event statistics, and everything works very fast.
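
For the kind of query in the question, the request could look roughly like the sketch below. It only builds the JSON body for a search request with nested terms aggregations and sum/avg metrics; the field names (age, sex, day, source, value) and the helper nested_terms_agg are assumptions for illustration, not something taken from the linked pages.

```python
import json


def nested_terms_agg(group_fields, metric_field):
    """Build a search body that groups by each field in turn (hypothetical helper)."""
    # Innermost level: the metrics computed for every bucket.
    aggs = {
        "sum_value": {"sum": {"field": metric_field}},
        "avg_value": {"avg": {"field": metric_field}},
    }
    # Wrap the metrics in one terms aggregation per grouping field,
    # starting from the last field so the first one ends up outermost.
    for field in reversed(group_fields):
        aggs = {"by_" + field: {"terms": {"field": field}, "aggs": aggs}}
    # size 0: return only the aggregation results, not the matching documents.
    return {"size": 0, "aggs": aggs}


body = nested_terms_agg(["age", "sex", "day", "source"], "value")
print(json.dumps(body, indent=2))
```

Deeply nested terms aggregations can produce a very large number of buckets, so this is only meant to show the shape of the request.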

Here are a few links that help avoid common mistakes in cluster configuration.
radar.oreilly.com/2015/04/10-elasticsearch-metrics...
https://www.loggly.com/blog/nine-tips-configuring-...
https://www.elastic.co/blog/found-optimizing-elast...

I recently ran into the pitfall described in this article:
https://www.elastic.co/blog/support-in-the-wild-my...
When you configure the index mapping, specify this option for fields that are not analyzed:
"doc_values" : true
...
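
As a sketch of where that option goes, here is a mapping built as a Python dict and printed as JSON. It assumes the older string / not_analyzed mapping syntax that was current when this was written (later Elasticsearch releases changed it); the document type "event" and the field names are hypothetical.

```python
import json


def not_analyzed_field(doc_values=True):
    """Mapping for an exact-value string field (old-style syntax, assumed)."""
    return {"type": "string", "index": "not_analyzed", "doc_values": doc_values}


mapping = {
    "mappings": {
        "event": {  # hypothetical document type
            "properties": {
                # Fields used for grouping and filtering, stored as doc values.
                "sex": not_analyzed_field(),
                "source": not_analyzed_field(),
                "age": {"type": "integer", "doc_values": True},
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```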