Why is the CPU utilization so low?


#1

Hi Mapd experts,

I'm confused about the CPU utilization of MapD. I generated 30G of data with the TPC-H dbgen tool and ran the Q1 query in both CPU mode and GPU mode, and I noticed a strange phenomenon:
In CPU mode, the CPU usage is only about 600% (my machine has Intel E5-2683 CPUs with 14 cores each; full usage would be 2800%).
Time cost in CPU mode is about 3.2 seconds, versus about 0.6 seconds in GPU mode.

My machine hardware status:
CPU: 2x Intel E5-2683
MEM: 128G
Video cards: 4x GeForce GTX 1080, each with 8G of memory

My question is: why is the CPU utilization so low in MapD? Ideally, if all the CPU cores were fully used, shouldn't the performance of CPU mode be close to GPU mode?

Thanks,
Mike


#2

Hi Mike,

Thanks for taking the time to look at MapD.

MapD on CPU attempts to divide its work evenly among all CPU cores, so on a machine like yours with dual E5-2683s, MapD would by default try to use 28 threads.

BUT… there is a caveat: MapD also has the concept of a fragment of data. A fragment is the smallest unit in which work is parceled out. By default each fragment is 32M rows.

So extrapolating from the info you gave: TPC-H Query 1 works on the lineitem table, which at SF30 (a 30GB dataset) has about 180M rows, I believe. By my math that is only 6 fragments: 180M/32M = 5.625, rounding up to 6.

So this means MapD can only allocate 6 CPU cores to the task, which aligns with the 600% usage you observed.
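The arithmetic above can be sketched as follows (the 180M row count and 28-core figure are from this thread; 32M rows is the default fragment size):

```python
# Estimate how many fragments a table yields and how many CPU cores
# MapD can therefore keep busy (figures from this thread).
import math

rows = 180_000_000          # lineitem at SF30, approximate
fragment_rows = 32_000_000  # MapD default fragment size
cores = 28                  # 2 x E5-2683, 14 cores each

fragments = math.ceil(rows / fragment_rows)
busy_cores = min(fragments, cores)

print(f"{fragments} fragments -> at most {busy_cores} cores ({busy_cores * 100}% CPU)")
# 6 fragments -> at most 6 cores (600% CPU)
```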

When you intend to work on CPU with small dataset sizes, you need to specify FRAGMENT_SIZE when creating the table. So in your case, if you really want to utilize all of your CPU cores on a 180M-row dataset, you would set FRAGMENT_SIZE to about 180M/28 ≈ 6.4M rows. This will allow MapD to use all your cores.
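For example, a sketch of the DDL (the column list is abbreviated; 6,500,000 is the rounded 180M/28 figure from above, so substitute your actual row count):

```sql
CREATE TABLE lineitem (
  l_orderkey BIGINT,
  l_quantity DECIMAL(15,2),
  -- ... remaining TPC-H lineitem columns ...
  l_comment TEXT ENCODING DICT
) WITH (FRAGMENT_SIZE = 6500000);
```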

see: http://docs.mapd.com/latest/mapd-core-guide/tables/?highlight=fragment_size

Regards


#3

Hi dwayneberry,

The information is very helpful. Thanks a lot.
Following your suggestion, I re-imported the 30G of data and executed the TPC-H Q1 query under both CPU mode and GPU mode. I got some statistics:

[statistics table posted as an image]

From the statistics table, we can see that fragment_size affects MapD performance. The best fragment_size for CPU mode is row_count / cpu_core_count. But for GPU mode I'm not sure what fragment_size gives the best performance; do you have any suggestion?

Another concern is that the performance gap between CPU mode and GPU mode is much smaller than I expected. CPU mode has only 56 worker threads, while the GPUs can launch ten thousand threads for computing, so why is GPU mode only about twice as fast as CPU mode?

Thanks,
Mike


#4

Hi @mikedimsf. Given that you have four GPUs, you would likely get the best GPU performance by setting fragment_size to the dataset size in rows divided by the number of GPUs. So with 180M rows and 4 GPUs, that is 180M/4 = 45M.
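Both fragment_size picks discussed in this thread can be sketched in one place (one fragment per worker, whether that worker is a CPU core or a GPU; row and device counts are the figures assumed above):

```python
# fragment_size suggestions from this thread: one fragment per
# worker, given ~180M rows in lineitem at SF30.
rows = 180_000_000
cpu_cores = 28  # 2 x E5-2683
gpus = 4        # 4 x GTX 1080

cpu_fragment_size = rows // cpu_cores  # ~6.4M rows per fragment
gpu_fragment_size = rows // gpus       # 45M rows per fragment

print(cpu_fragment_size, gpu_fragment_size)
```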

There’s always tuning that can be done on GPU to make it faster, but I’d point out that you are comparing two high-end enterprise CPUs with four mid-range gamer GPUs. If using gamer cards, I’d suggest Nvidia 1080 Ti or Titan X cards, as they have significantly more cores and memory bandwidth. And if you can use 8 of them, that will be even faster.