MapD Core cannot execute more than one SQL query at the same time?


#1

When I opened two mapdql sessions and executed two SQL queries, it seemed the second query could not be processed in parallel with the first one. Is this correct for MapD Core?


#2

Not at all; I executed several SQL statements when I benchmarked the product, and the wall clock time of the scripts I used was lower than the sum of the serial executions, so statement execution was overlapped.
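
For reference, here is a minimal sketch of that kind of measurement, assuming the pymapd Python client and a local MapD instance (the connection parameters and the `flights` queries are placeholders, not from this thread): run the same queries serially, then from concurrent threads, and compare wall-clock times.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import pymapd

# Placeholder queries; substitute your own workload.
QUERIES = [
    "SELECT COUNT(*) FROM flights",
    "SELECT origin, COUNT(*) FROM flights GROUP BY origin",
]

def run_query(sql):
    # Each call opens its own connection, mirroring separate mapdql sessions.
    con = pymapd.connect(user="mapd", password="HyperInteractive",
                         host="localhost", dbname="mapd", port=9091)
    cur = con.cursor()
    cur.execute(sql)
    cur.fetchall()
    con.close()

# Serial baseline: total time approximates the sum of individual query times.
start = time.time()
for q in QUERIES:
    run_query(q)
serial = time.time() - start

# Concurrent run: if execution truly overlapped, this should come in well
# under the serial number; with single-file execution it will not.
start = time.time()
with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
    list(pool.map(run_query, QUERIES))
concurrent = time.time() - start

print(f"serial: {serial:.2f}s  concurrent: {concurrent:.2f}s")
```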


#3

@bladedk contrary to what @aznable said, this is actually true, and is alluded to in the docs:

http://docs-hoarder.mapd.com/latest/getting-started/performance/?highlight=single#parallel-gpus

“Parsing, optimization, and parts of rendering can overlap between queries, but most of the execution occurs single file.”

Which means that query parsing/planning can run in parallel, but when it comes time to actually execute a query, only ONE query can execute at a time, in serial fashion. I had several weeks of back and forth with MapD support over this very point; not only did they confirm that this is how it works, but I also conducted extensive testing and verified that it is indeed true. Even when running in distributed mode, the cluster as a whole can only execute one query at a time. Obviously in GPU mode the internals of that one query's execution are massively parallel, but you can still only execute one query at a time.

If you want to support multiple concurrent queries, you need to run multiple instances of MapD with identical data sets, and load balance between them. That’s what we do. In addition, you’ll want to tune your queries very carefully to make sure they have as little latency as possible.
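
As a rough illustration (not our actual setup), an application-level round-robin sketch over two hypothetical replicas using pymapd could look like the following; in practice you would more likely put a TCP load balancer such as HAProxy in front of the instances.

```python
import itertools

import pymapd

# Two hypothetical replicas loaded with identical data sets.
INSTANCES = [
    {"host": "localhost", "port": 9091},
    {"host": "localhost", "port": 9092},
]

# One connection per replica; queries are dealt out in round-robin order.
connections = [
    pymapd.connect(user="mapd", password="HyperInteractive",
                   dbname="mapd", **inst)
    for inst in INSTANCES
]
rotation = itertools.cycle(connections)

def execute(sql):
    """Send the query to the next replica in the rotation."""
    cur = next(rotation).cursor()
    cur.execute(sql)
    return cur.fetchall()
```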

Obviously the fine folks @ MapD can correct me if I’ve misrepresented anything here.


#4

@bploetz you are correct here. We deliberately do not overlap GPU execution as a) in general maximum throughput will be achieved with all computational resources focused on a single query, and b) overlapping can often cause memory thrashing if different queries need different data. This is very similar to the philosophy of VoltDB (albeit for transactional workloads).

For the future we see two key areas of improvement.

A) Query execution gets faster in general, so single-file execution becomes even less of an issue because a single query executes even more quickly than it does today. (We're fast now, but there's still significant room for optimization.)

B) Some queries involve significant reduction work on the CPU. Ideally we could kick off GPU execution of a query while still reducing the previous query on the CPU, increasing throughput and decreasing latency in certain cases. Doing this would require some refactoring of the executor code.
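
Purely as an illustration of the idea in (B), and not actual executor code: a two-stage pipeline where device-side execution of the next query overlaps with CPU-side reduction of the previous one might be sketched like this (the stage bodies are stand-ins).

```python
import queue
import threading
import time

work = queue.Queue()

def gpu_stage(queries):
    # Launches device-side execution for each query and hands partial results
    # to the CPU reduction stage as soon as they are ready.
    for q in queries:
        time.sleep(0.1)              # stand-in for GPU kernel execution
        work.put(f"partial results of {q}")
    work.put(None)                   # sentinel: no more work

def cpu_reduce_stage():
    # Reduces the previous query's partial results while the GPU stage is
    # already busy with the next query.
    while (item := work.get()) is not None:
        time.sleep(0.1)              # stand-in for CPU-side reduction
        print(f"reduced {item}")

t = threading.Thread(target=cpu_reduce_stage)
t.start()
gpu_stage([f"q{i}" for i in range(4)])
t.join()
```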

Hope this helps explain things a bit.