GPU utilization at 0 - 5% during large operation


#1

I’ve been having a lot of problems with MapD recently, and I may have found out why. It doesn’t look like my GPU is getting used at all. I ran a query on a dataset of more than 600 million rows and the GPU never went above 10% in nvidia-smi before crashing with the broken pipe error I posted about here.

I’m really not sure how to begin troubleshooting this. Output from nvidia-smi is below; I’ll post updates as I find out more.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   44C    P8    10W /  N/A |   3266MiB /  8105MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2976    C+G   /opt/mapd/bin/mapd_server                   2754MiB |
|    0     13211      G   ...-token=C98144F924CBD22A966EAB49D47FAC9B    58MiB |
|    0     13230      G   /usr/lib/xorg/Xorg                           297MiB |
|    0     14014      G   compiz                                       152MiB |
+-----------------------------------------------------------------------------+

#2

That output shows MapD isn’t using the GPU at all. Look at the performance state: P8 is idle, so you are using 7% of a GPU that is drawing 10 W but is capable of drawing 150 W.
So from your previous post I guess you have disabled the watchdog and are running operations like massive projections that weigh on the CPU and system memory/storage. Am I wrong?
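
To watch this while a query runs instead of reading a single snapshot, here is a minimal Python sketch that polls nvidia-smi through its `--query-gpu` CSV interface. The field list, the helper names and the "idle" rule (utilization under 10% in state P8 or lower-power) are my own assumptions for illustration, not an NVIDIA or MapD definition, and some fields may come back as N/A on GeForce cards.

```python
import subprocess

# Fields requested from `nvidia-smi --query-gpu`; `nvidia-smi --help-query-gpu`
# lists what a given driver supports.
FIELDS = ["utilization.gpu", "pstate", "power.draw", "memory.used"]


def parse_gpu_line(line):
    """Parse one line produced with --format=csv,noheader,nounits."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def looks_idle(sample, util_threshold=10):
    """Rough heuristic (an assumption): low utilization in a low-power state."""
    util = sample["utilization.gpu"]
    pstate = sample["pstate"]
    if not util.isdigit() or not pstate[1:].isdigit():
        return False  # value reported as N/A or unsupported; cannot tell
    # Performance states run from P0 (maximum) to P12 (minimum).
    return int(util) < util_threshold and int(pstate[1:]) >= 8


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(l) for l in out.splitlines() if l.strip()]
```

Calling `query_gpus()` in a loop with a short sleep while the query is running, and printing `looks_idle(gpu)` for each entry, shows whether the card ever leaves the idle state.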


#3

Hi,

As @aznable mentioned, it’s likely your query is going to the CPU, not the GPU.

Please include your mapd_server.INFO log for review if you want to get to the bottom of the issue more quickly.

regards


#4

Sure, luckily I still have the server logs backed up for this issue:

Log file created at: 2017/10/10 14:22:18
Running on machine: beast
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1010 14:22:18.923414 13057 MapDServer.cpp:602] MapD started with data directory at '/var/lib/mapd/data'
I1010 14:22:18.923781 13057 MapDServer.cpp:609]  Watchdog is set to 1
I1010 14:22:18.923786 13057 MapDServer.cpp:631]  cuda block size 0
I1010 14:22:18.923789 13057 MapDServer.cpp:632]  cuda grid size  0
I1010 14:22:18.923790 13057 MapDServer.cpp:633]  calcite JVM max memory  1024
I1010 14:22:18.923792 13057 MapDServer.cpp:634]  MapD Server Port  9091
I1010 14:22:18.923794 13057 MapDServer.cpp:635]  MapD Calcite Port  9093
I1010 14:22:18.923825 13057 MapDHandler.cpp:152] MapD Server 3.2.3-20170922-fbc71bb
I1010 14:22:19.037272 13057 CudaMgr.cpp:127] Using 1 Gpus.
I1010 14:22:19.037317 13057 DataMgr.cpp:120] cpuSlabSize is 4096M
I1010 14:22:19.037328 13057 DataMgr.cpp:122] reserved GPU memory is 604.837M includes render buffer allocation
I1010 14:22:19.037708 13057 DataMgr.cpp:132] gpuSlabSize is 2048M
I1010 14:22:19.037974 13057 FileMgr.cpp:116] Read table metadata, Epoch is 0 for table data at '/var/lib/mapd/data/mapd_data/table_0_0/'
I1010 14:22:19.038203 13057 Calcite.cpp:156] Creating Calcite Handler,  Calcite Port is 9093 base data dir is /var/lib/mapd/data
I1010 14:22:19.038209 13057 Calcite.cpp:95] Running calcite server as a daemon
I1010 14:22:19.358710 13057 Calcite.cpp:124] Calcite server start took 300 ms 
I1010 14:22:19.358733 13057 Calcite.cpp:125] ping took 11 ms 
I1010 14:22:19.363212 13057 Calcite.cpp:281] [{"name":"Tan","ret":"double","args":["double"]},{"name":"Truncate__","ret":"float","args":["float","i32"]},{"name":"ln","ret":"double","args":["double"]},{"name":"Floor__2","ret":"i32","args":["i32"]},{"name":"Floor__3","ret":"i64","args":["i64"]},{"name":"rect_pixel_bin","ret":"float","args":["float","float","float","i32","i32"]},{"name":"Floor__1","ret":"i16","args":["i16"]},{"name":"degrees","ret":"double","args":["double"]},{"name":"approx_distance_in_meters","ret":"double","args":["float","float","float","float"]},{"name":"Ceil__2","ret":"i32","args":["i32"]},{"name":"Ceil__1","ret":"i16","args":["i16"]},{"name":"Ceil__3","ret":"i64","args":["i64"]},{"name":"Log10","ret":"double","args":["double"]},{"name":"Log","ret":"double","args":["double"]},{"name":"distance_in_meters","ret":"double","args":["double","double","double","double"]},{"name":"round_to_digit","ret":"double","args":["double","i32"]},{"name":"Atan2","ret":"double","args":["double","double"]},{"name":"Acos","ret":"double","args":["double"]},{"name":"Sin","ret":"double","args":["double"]},{"name":"Ceil","ret":"double","args":["double"]},{"name":"distance_in_meters__","ret":"double","args":["float","float","float","float"]},{"name":"Floor__","ret":"float","args":["float"]},{"name":"Truncate","ret":"double","args":["double","i32"]},{"name":"radians","ret":"double","args":["double"]},{"name":"Ceil__","ret":"float","args":["float"]},{"name":"ln__","ret":"double","args":["float"]},{"name":"reg_hex_vert_pixel_bin_y","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"reg_hex_vert_pixel_bin_x","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"Round","ret":"double","args":["double"]},{"name":"power","ret":"double","args":["double","double"]},{"name":"conv_4326_900913_x","ret":"double","args":["double"]},{"name":"reg_
hex_horiz_pixel_bin_y","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"conv_4326_900913_y","ret":"double","args":["double"]},{"name":"Atan","ret":"double","args":["double"]},{"name":"reg_hex_horiz_pixel_bin_x","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"Floor","ret":"double","args":["double"]},{"name":"Truncate__1","ret":"i16","args":["i16","i32"]},{"name":"Truncate__2","ret":"i32","args":["i32","i32"]},{"name":"Truncate__3","ret":"i64","args":["i64","i32"]},{"name":"Cos","ret":"double","args":["double"]},{"name":"Log__","ret":"double","args":["float"]},{"name":"Log10__","ret":"double","args":["float"]},{"name":"Asin","ret":"double","args":["double"]},{"name":"Cot","ret":"double","args":["double"]},{"name":"Tan__","ret":"double","args":["float"]},{"name":"rect_pixel_bin_x","ret":"float","args":["float","float","float","float","float","i32"]},{"name":"rect_pixel_bin_y","ret":"float","args":["float","float","float","float","float","i32"]},{"name":"pi","ret":"double","args":[]},{"name":"Exp","ret":"double","args":["double"]}]
I1010 14:22:19.363812 13057 MapDHandler.cpp:197] Started in GPU mode
I1010 14:22:19.399488 13057 EglGLWindow.cpp:107] Window Setting: DRAWABLE_TYPE: PBUFFER.
I1010 14:22:19.399602 13057 EglUtils.cpp:61] EGL Setting: BITS_RGBA = 8.
I1010 14:22:19.399633 13057 EglUtils.cpp:84] EGL Setting: BITS_ALPHA = 8.
I1010 14:22:19.399698 13057 EglGLRenderer.cpp:101] Renderer Setting: USE_CORE_PROFILE: True.
I1010 14:22:19.438513 13057 EglGLRenderer.cpp:170] Renderer Setting: <OPENGL_MAJOR>.<OPENGL_MINOR>: 4.5.
I1010 14:22:19.496145 13057 QueryRenderManager.cpp:306] QueryRenderManager initialized for rendering. start GPU: 0, num GPUs: 1, Render cache limit: 500
I1010 14:22:32.417233 13112 MapDHandler.cpp:349] User mapd connected to database mapd
I1010 14:22:32.419075 13112 MapDHandler.cpp:561] sql_execute :VOiw3Ogzmh29nC7PP0V8KAzcYseA2FMo:query_str:CREATE TABLE tx_out_small AS (SELECT address, txid FROM tx_out);
I1010 14:22:32.419174 13112 MapDHandler.cpp:2908] passing query to legacy processor
I1010 14:22:32.426645 13112 Calcite.cpp:247] User mapd catalog mapd sql '(SELECT address, txid FROM tx_out);'
I1010 14:22:32.956732 13106 FileMgr.cpp:116] Read table metadata, Epoch is 1120 for table data at '/var/lib/mapd/data/mapd_data/table_1_21/'

#5

Hi,

Your log appears truncated, but from what I can see the query being executed was a simple projection with no filters (SELECT address, txid FROM tx_out). A query like that would not use the GPU at all, as there is no need to pass this work over. In general, the GPU will only be used when there are calculated columns, filters or aggregates.
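
If you want to check a longer log for queries of this kind, a rough Python sketch follows. It assumes the `sql_execute :<session>:query_str:<sql>` line format seen in the log above, and the keyword test is only a crude text heuristic of my own (it does not detect calculated columns and is not how MapD decides where to run a query); the function names are hypothetical.

```python
import re

# Matches the sql_execute lines seen in the log above, e.g.
# "... MapDHandler.cpp:561] sql_execute :<session>:query_str:<sql>"
QUERY_RE = re.compile(r"sql_execute :[^:]*:query_str:(?P<sql>.*)$")

# Keywords whose presence suggests filters, grouping or aggregates.
# Crude text matching only; calculated columns are not detected.
GPU_HINTS = re.compile(
    r"\b(WHERE|GROUP\s+BY|HAVING|COUNT|SUM|AVG|MIN|MAX)\b", re.IGNORECASE
)


def extract_queries(log_lines):
    """Yield the SQL text of every sql_execute entry in the log."""
    for line in log_lines:
        match = QUERY_RE.search(line.rstrip("\n"))
        if match:
            yield match.group("sql").strip()


def is_plain_projection(sql):
    """True when none of the hint keywords appear in the query text."""
    return GPU_HINTS.search(sql) is None
```

For example, passing `open("mapd_server.INFO")` to `extract_queries` and printing each query together with `is_plain_projection(query)` gives a quick list of statements that are probably plain projections.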

regards