GPU utilization at 0 - 1% during large operation


#1

My GPU utilization sits at 0 - 1% during a large operation.
GPUs: 8x P40, each using about 4 GB of GPU memory. The table has 500 million rows.
The SQL is: select event_id,sum(pv) as ppv,sum(uv) as uuv from (select dt, event_id, city, count(*) as pv, count(distinct device_id) as uv from product_unite_wide where dt>='2017-12-01' and dt<='2017-12-07' group by dt,city,event_id ) a group by event_id;
It takes 58 s, and this is not the first execution, so the data does not need to be loaded. Is that normal? Why does it take so long while the GPU stays idle?
The same SQL on a 50-node Facebook Presto cluster takes only 6 s. Have I set something wrong on the GPU side?
Can anyone help?


#2

@llz I see that @aznable is offering suggestions over in "GPU utilization at 0 - 5% during large operation" and I believe his first idea is a good one.


#3

@easy, I have checked it; it is using the GPU. Execution time: 43328 ms, Total time: 49043 ms
MapD Server CPU Memory Summary:
MAX USE ALLOCATED FREE
206140.09 MB 26142.83 MB 28672.00 MB 2529.17 MB

MapD Server GPU Memory Summary:
[GPU] MAX USE ALLOCATED FREE
[0] 22308.10 MB 3755.57 MB 6144.00 MB 2388.43 MB
[1] 22308.10 MB 3353.16 MB 4096.00 MB 742.84 MB
[2] 22308.10 MB 3337.22 MB 4096.00 MB 758.78 MB
[3] 22308.10 MB 3340.15 MB 4096.00 MB 755.85 MB
[4] 22308.10 MB 3351.90 MB 4096.00 MB 744.10 MB
[5] 22308.10 MB 3335.12 MB 4096.00 MB 760.88 MB
[6] 22308.10 MB 2685.55 MB 4096.00 MB 1410.45 MB
[7] 22308.10 MB 2629.40 MB 4096.00 MB 1466.60 MB


Fri Jun 15 10:18:22 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Graphics Device Off | 0000:04:00.0 Off | 0 |
| N/A 29C P0 50W / 250W | 6333MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Graphics Device Off | 0000:05:00.0 Off | 0 |
| N/A 29C P0 50W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Graphics Device Off | 0000:08:00.0 Off | 0 |
| N/A 30C P0 50W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Graphics Device Off | 0000:09:00.0 Off | 0 |
| N/A 31C P0 50W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Graphics Device Off | 0000:85:00.0 Off | 0 |
| N/A 29C P0 49W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Graphics Device Off | 0000:86:00.0 Off | 0 |
| N/A 28C P0 50W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Graphics Device Off | 0000:89:00.0 Off | 0 |
| N/A 28C P0 49W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Graphics Device Off | 0000:8A:00.0 Off | 0 |
| N/A 31C P0 50W / 250W | 4285MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 435491 C /data/mapd/mapd/bin/mapd_server 6331MiB |
| 1 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |
| 2 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |
| 3 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |
| 4 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |
| 5 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |
| 6 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |
| 7 435491 C /data/mapd/mapd/bin/mapd_server 4283MiB |


#4
The SQL execution plan is:
mapdql> explain select event_id,sum(pv)as ppv,sum(uv) as uuv from (select dt, event_id,city, count(*) as pv, count(distinct device_id) as uv from product_unite_wide where dt>='2017-12-01' and dt<='2017-12-07' group by dt,city,event_id ) a group by event_id;
Explanation
IR for the GPU:
===============

; Function Attrs: uwtable
define void @query_group_by_template(i8** nocapture readnone %byte_stream, i8* nocapture readonly %literals, i64* nocapture readnone %row_count_ptr, i64* nocapture readonly %frag_row_off_ptr, i32* %max_matched_ptr, i64* %agg_init_val, i64** %group_by_buffers, i64** %small_groups_buffer, i32 %frag_idx, i64* %join_hash_tables, i32* %total_matched, i32* %error_code) #21 {
.entry:
  %0 = getelementptr i8*, i8** %byte_stream, i32 0
  %1 = load i8*, i8** %0
  %2 = getelementptr i8*, i8** %byte_stream, i32 1
  %3 = load i8*, i8** %2
  %4 = getelementptr i8*, i8** %byte_stream, i32 2
  %5 = load i8*, i8** %4
  %6 = getelementptr i8*, i8** %byte_stream, i32 3
  %7 = load i8*, i8** %6
  %8 = load i64, i64* %row_count_ptr, align 8
  %9 = load i32, i32* %max_matched_ptr, align 4
  %crt_matched = alloca i32
  %old_total_matched = alloca i32
  %10 = call i32 @pos_start_impl(i32* %error_code)
  %11 = call i32 @pos_step_impl()
  %12 = call i32 @group_buff_idx_impl()
  %13 = sext i32 %10 to i64
  %14 = getelementptr i64*, i64** %group_by_buffers, i32 %12
  %15 = load i64*, i64** %14, align 8
  %16 = call i64* @init_shared_mem_nop(i64* %15, i32 0)
  %17 = icmp slt i64 %13, %8
  br i1 %17, label %.loop.preheader, label %.exit

.loop.preheader:                                  ; preds = %.entry
  %18 = sext i32 %11 to i64
  br label %.forbody

.forbody:                                         ; preds = %.forbody, %.loop.preheader
  %pos = phi i64 [ %13, %.loop.preheader ], [ %20, %.forbody ]
  %19 = call i32 @row_func(i64* %16, i64* null, i32* %crt_matched, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val, i64 %pos, i64* %frag_row_off_ptr, i64* %row_count_ptr, i8* %literals, i8* %1, i8* %3, i8* %5, i8* %7, i64* %join_hash_tables)
  %20 = add i64 %pos, %18
  %21 = icmp slt i64 %20, %8
  br i1 %21, label %.forbody, label %._crit_edge

._crit_edge:                                      ; preds = %.forbody
  br label %.exit

.exit:                                            ; preds = %._crit_edge, %.entry
  call void @write_back_nop(i64* %15, i64* %16, i32 0)
  ret void
}

; Function Attrs: alwaysinline
define i32 @row_func(i64* %group_by_buff, i64* %small_group_by_buff, i32* %crt_match, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val, i64 %pos, i64* %frag_row_off, i64* %num_rows_per_scan, i8* %literals, i8* %col_buf0, i8* %col_buf1, i8* %col_buf2, i8* %col_buf3, i64* %join_hash_tables) #22 {
entry:
  %0 = load i64, i64* %frag_row_off
  %1 = call i64 @fixed_width_int_decode(i8* %col_buf3, i32 4, i64 %pos)
  %2 = trunc i64 %1 to i32
  %3 = getelementptr i8, i8* %literals, i16 0
  %4 = bitcast i8* %3 to i64*
  %5 = load i64, i64* %4
  %6 = sext i32 %2 to i64
  %7 = call i8 @bit_is_set(i64 %5, i64 %6, i64 0, i64 8, i64 -2147483648, i8 -128)
  %8 = icmp sgt i8 %7, 0
  %9 = and i1 true, %8
  %10 = getelementptr i8, i8* %literals, i16 8
  %11 = bitcast i8* %10 to i64*
  %12 = load i64, i64* %11
  %13 = sext i32 %2 to i64
  %14 = call i8 @bit_is_set(i64 %12, i64 %13, i64 0, i64 8, i64 -2147483648, i8 -128)
  %15 = icmp sgt i8 %14, 0
  %16 = and i1 %9, %15
  br i1 %16, label %filter_true, label %filter_false

filter_true:                                      ; preds = %entry
  %17 = alloca i64, i32 3
  %18 = sext i32 %2 to i64
  %19 = getelementptr i64, i64* %17, i32 0
  store i64 %18, i64* %19
  %20 = call i64 @fixed_width_int_decode(i8* %col_buf2, i32 4, i64 %pos)
  %21 = trunc i64 %20 to i32
  %22 = call i64 @translate_null_key_int32_t(i32 %21, i32 -2147483648, i32 393)
  %23 = sext i32 %21 to i64
  %24 = getelementptr i64, i64* %17, i32 1
  store i64 %22, i64* %24
  %25 = call i64 @fixed_width_int_decode(i8* %col_buf1, i32 4, i64 %pos)
  %26 = trunc i64 %25 to i32
  %27 = sext i32 %26 to i64
  %28 = getelementptr i64, i64* %17, i32 2
  store i64 %27, i64* %28
  %29 = call i32 @perfect_key_hash(i64* %17)
  %30 = call i64* @get_matching_group_value_perfect_hash(i64* %group_by_buff, i32 %29, i64* %17, i32 3, i32 6)
  %31 = getelementptr i64, i64* %30, i32 0
  %32 = sext i32 %26 to i64
  call void @agg_id_shared(i64* %31, i64 %32)
  %33 = getelementptr i64, i64* %30, i32 1
  %34 = bitcast i64* %33 to i32*
  %35 = atomicrmw add i32* %34, i32 1 monotonic
  %36 = call i64 @fixed_width_int_decode(i8* %col_buf0, i32 4, i64 %pos)
  %37 = trunc i64 %36 to i32
  %38 = getelementptr i64, i64* %30, i32 2
  %39 = sext i32 %37 to i64
  %40 = bitcast i8* %literals to i64*
  %41 = getelementptr i64, i64* %40, i32 -1
  %42 = load i64, i64* %41
  %43 = bitcast i8* %literals to i64*
  %44 = getelementptr i64, i64* %43, i32 -2
  %45 = load i64, i64* %44
  call void @agg_count_distinct_bitmap_skip_val_gpu(i64* %38, i64 %39, i64 0, i64 -2147483648, i64 %42, i64 %45, i64 1, i64 27832)
  br label %filter_false

filter_false:                                     ; preds = %filter_true, %entry
  ret i32 0
}

#5

When I use \cpu to switch to CPU mode and execute the SQL, it takes about the same time.
Execution time: 44165 ms, Total time: 49736 ms
To avoid the cache, I changed dt<='2017-12-07' in the SQL to dt<='2017-12-06'.
Execution time: 40840 ms, Total time: 45909 ms; no big change in cost.
CPU mode and GPU mode take almost the same time.
Why?

The only error I can find is at mapd_server startup:
E0614 10:44:17.655951 509009 QueryRenderManager.cpp:448] There's a synchronization issue between CUDA/OpenGL devices
E0614 10:44:17.658412 509009 MapDHandler.cpp:209] Backend rendering disabled: /home/jenkins-slave/workspace/mapd2-multi/compiler/gcc/gpu/cuda/host/centos/render/render/QueryRenderer/QueryRenderManager.cpp:448 There's a synchronization issue between CUDA/OpenGL devices


#6

I found something very interesting in the INFO log.

I0615 11:44:40.157446 496572 MapDHandler.cpp:574] sql_execute :noMnwBHIq8F8c73wzMByZxq0va2Bi5Ul:query_str:select event_id,sum(pv)as ppv,sum(uv) as uuv from (select dt, event_id,city, count(*) as pv, count(distinct device_id) as uv from product_unite_wide where dt>='2017-12-01' and dt<='2017-12-05' group by dt,city,event_id ) a group by event_id;
I0615 11:44:40.157991 496572 Calcite.cpp:277] User mapd catalog test sql 'select event_id,sum(pv)as ppv,sum(uv) as uuv from (select dt, event_id,city, count(*) as pv, count(distinct device_id) as uv from product_unite_wide where dt>='2017-12-01' and dt<='2017-12-05' group by dt,city,event_id ) a group by event_id;'
I0615 11:44:40.163408 496572 Calcite.cpp:290] Time in Thrift 0 (ms), Time in Java Calcite server 5 (ms)
I0615 11:44:40.163717 496572 RelAlgOptimizer.cpp:302] ID=5 (RelProject<139930978623648> (RexInput 0 139930978875648) (RexInput 2 139930978875648) (RexInput 1 139930978875648) (RexInput 3 139930978875648) (RexInput 4 139930978875648)) deleted!
I0615 11:44:40.173300 4400 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 4, num_bytes 6316305408
I0615 11:44:40.173301 4398 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 6, num_bytes 6316305408
I0615 11:44:40.173301 4399 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 5, num_bytes 6316305408
I0615 11:44:40.173388 4397 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 7, num_bytes 6316305408
E0615 11:44:40.173533 4400 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.173589 4399 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.173641 4398 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.173688 4397 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.173810 4401 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 0, num_bytes 6316305408
I0615 11:44:40.173871 4402 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 1, num_bytes 6316305408
I0615 11:44:40.173890 4403 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 2, num_bytes 6316305408
E0615 11:44:40.173913 4401 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.173925 4404 BufferMgr.cpp:39] OOM trace:
runImpl:288
alloc_gpu_mem:33 : device_id 3, num_bytes 6316305408
E0615 11:44:40.174031 4403 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.174119 4404 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.174161 4402 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.181550 4405 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 0, num_bytes 6316305408
I0615 11:44:40.181556 4406 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 1, num_bytes 6316305408
I0615 11:44:40.181632 4409 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 4, num_bytes 6316305408
I0615 11:44:40.181569 4408 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 3, num_bytes 6316305408
E0615 11:44:40.181646 4405 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.181565 4407 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 2, num_bytes 6316305408
I0615 11:44:40.181633 4411 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 6, num_bytes 6316305408
I0615 11:44:40.181635 4410 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 5, num_bytes 6316305408
I0615 11:44:40.181653 4412 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 7, num_bytes 6316305408
I0615 11:44:40.181687 4413 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 0, num_bytes 6316305408
I0615 11:44:40.181773 4414 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 1, num_bytes 6316305408
E0615 11:44:40.181824 4406 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.181898 4417 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 4, num_bytes 6316305408
I0615 11:44:40.181944 4415 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 2, num_bytes 6316305408
I0615 11:44:40.182025 4416 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 3, num_bytes 6316305408
I0615 11:44:40.182117 4419 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 6, num_bytes 6316305408
E0615 11:44:40.182229 4409 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182423 4415 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.182333 4420 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 7, num_bytes 6316305408
E0615 11:44:40.182461 4407 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.182235 4418 BufferMgr.cpp:39] OOM trace:
runImpl:260
alloc_gpu_mem:33 : device_id 5, num_bytes 6316305408
E0615 11:44:40.182518 4419 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182576 4410 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182633 4412 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182667 4413 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182703 4414 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182742 4417 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182787 4408 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182821 4411 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182862 4420 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.182922 4416 ExecutionDispatch.cpp:534] SlabTooBig
E0615 11:44:40.183007 4418 ExecutionDispatch.cpp:534] SlabTooBig
I0615 11:44:40.183305 496572 RelAlgExecutor.cpp:1635] Query ran out of GPU memory, punt to CPU
I0615 11:45:24.000342 496572 MapDHandler.cpp:600] sql_execute-COMPLETED Total: 43842 (ms), Execution: 38710 (ms)

How can I fix it?


#7

So it's obvious MapD is falling back to CPU to execute this query, and I guess it is doing that because of the combination of a high-cardinality GROUP BY and the code path the software takes to compute the count distinct on device_id.

Is there a special reason the query can't be rewritten this way?

select event_id, count(*) as ppv, count(distinct device_id) as uuv
from product_unite_wide
where dt>='2017-12-01' and dt<='2017-12-07'
group by event_id;

Without knowing the cardinalities, the datatypes, and the values in the columns, I cannot give you more than generic recommendations.


#8

Thanks @aznable. If I change the SQL to:
select event_id,sum(pv) as ppv,sum(uv) as uuv from (select dt, event_id, count(*) as pv, count(distinct device_id) as uv from product_unite_wide where dt>='2017-12-01' and dt<='2017-12-07' group by dt,event_id ) a group by event_id;
that is, dropping the city field, it works fine and runs on the GPU.
But I need this kind of high-cardinality GROUP BY. Can I change the default gpuSlabSize = 2G to gpuSlabSize = 8G to support it, and how do I change it?
I have 8 Titan X cards, each with about 22 GB, so 22 GB x 8 ≈ 176 GB of GPU RAM in total. How can I use it efficiently?


#9

That is expected: you are lowering the cardinality of the inner query, so fewer groups means less memory.

Anyway, is there a particular reason forcing you to use this kind of inner query? It looks unnecessary.

I don't know whether there is a way to change the slab size with a configuration parameter (I guess there isn't); maybe you could change it by modifying the source code of the open-source mapd-core and recompiling, but I don't know whether that would resolve your problem.

If you give me the datatypes, min/max values, and distinct value counts of the columns involved in the inner query, I can try to reproduce the problem on my system and find a workaround.


#10

Hi

@llz those are some interesting Titan X's you have there.

As @aznable has asked, we need a little more information from you to be able to help get this particular query running on your setup. We need to know the cardinalities involved and roughly what the result-set size is going to be.

As an aside, and as previously mentioned, your query

select
event_id,sum(pv)as ppv,sum(uv) as uuv
from
(
   select
   dt,
   event_id,
   city,
   count(*) as pv,
   count(distinct device_id) as uv
   from product_unite_wide
   where dt>='2017-12-01'
   and dt<='2017-12-05'
   group by dt,
   city,
   event_id
)
a
group by event_id
;

Doesn’t seem to need the inner subquery to get the result you need.

MapD has an approximate count distinct function, approx_count_distinct(column). It returns an approximate value for the count distinct but has a significantly smaller memory footprint.

You can substitute this directly where you are using count(distinct column)
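For example, the query at the top of this thread could be written as follows (a sketch; uuv then becomes an approximate value):

```sql
-- approx_count_distinct replaces the exact count(distinct ...)
select event_id, sum(pv) as ppv, sum(uv) as uuv
from (
  select dt, event_id, city,
         count(*) as pv,
         approx_count_distinct(device_id) as uv
  from product_unite_wide
  where dt >= '2017-12-01' and dt <= '2017-12-07'
  group by dt, city, event_id
) a
group by event_id;
```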

Assuming column dt is a DATE column, could you get us the cardinalities of city and event_id so we can see whether any of these suggestions are going to help?

As far as increasing slab_size goes, it is not an exposed parameter. There is an open issue (https://github.com/mapd/mapd-core/issues/127) for this. We are looking at it internally.

Assuming the cardinalities are reasonable and you cannot accept an approximation for the count distinct, there is potentially scope to shard your DB across city or device_id and run the query sharded. This reduces the memory requirement per GPU by a factor of the number of GPUs (in your case the shard count would be 8).
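Sharding is declared at table-creation time. Roughly like this (a sketch only; the column list is abbreviated and the types are assumptions, since the full schema is not shown in this thread):

```sql
-- hypothetical re-creation of the table, sharded on device_id across 8 GPUs
CREATE TABLE product_unite_wide_sharded (
  dt DATE,
  event_id INTEGER,
  city TEXT ENCODING DICT,
  device_id TEXT ENCODING DICT,
  SHARD KEY (device_id)
) WITH (shard_count = 8);
```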

regards


#11

Thanks @dwayneberry,
I tried approx_count_distinct(column). It still falls back to CPU, but you are right, it runs faster than before. I will try setting a smaller fragment_size.
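Something like this, I guess (just a sketch; the column list here is abbreviated and the fragment size is a guess, the default being 32 million rows):

```sql
-- hypothetical table with a smaller fragment_size than the 32000000-row default
CREATE TABLE product_unite_wide_frag (
  dt DATE,
  event_id INTEGER,
  device_id TEXT ENCODING DICT
) WITH (fragment_size = 8000000);
```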
I have another problem. I have a table with 600M rows and 70 columns. When I run a select *, it takes a very long time.
SQL:
select * from product_unite_wide where dt='2017-12-06' and msg_id='20f93b9b-a2e9-424d-9dd0-41177f969cf9';
It loads all the dictionaries and allocates a lot of CPU RAM. The SQL has only two filter fields and one matching row, yet it seems to load every field, and it looks like it crashes.

I0619 09:53:46.533706 177940 Catalog.cpp:1541] Time to load Dictionary 2_287 was 74ms
I0619 09:53:46.607033 177940 Catalog.cpp:1541] Time to load Dictionary 2_286 was 73ms
I0619 09:53:46.683928 177940 Catalog.cpp:1541] Time to load Dictionary 2_285 was 76ms
I0619 09:53:46.757357 177940 Catalog.cpp:1541] Time to load Dictionary 2_284 was 73ms
I0619 09:53:46.827996 177940 Catalog.cpp:1541] Time to load Dictionary 2_283 was 70ms
I0619 09:53:46.895598 177940 Catalog.cpp:1541] Time to load Dictionary 2_282 was 67ms
I0619 09:53:46.960875 177940 Catalog.cpp:1541] Time to load Dictionary 2_281 was 65ms
I0619 09:53:47.030984 177940 Catalog.cpp:1541] Time to load Dictionary 2_280 was 70ms
I0619 09:53:47.105504 177940 Catalog.cpp:1541] Time to load Dictionary 2_279 was 74ms
I0619 09:53:47.180802 177940 Catalog.cpp:1541] Time to load Dictionary 2_278 was 75ms
I0619 09:53:47.252830 177940 Catalog.cpp:1541] Time to load Dictionary 2_277 was 71ms
I0619 09:53:47.325551 177940 Catalog.cpp:1541] Time to load Dictionary 2_276 was 72ms
I0619 09:53:47.397735 177940 Catalog.cpp:1541] Time to load Dictionary 2_275 was 72ms
I0619 09:53:47.459309 177940 Catalog.cpp:1541] Time to load Dictionary 2_274 was 61ms
I0619 09:53:47.524336 177940 Catalog.cpp:1541] Time to load Dictionary 2_273 was 64ms
I0619 09:53:47.589589 177940 Catalog.cpp:1541] Time to load Dictionary 2_272 was 65ms
I0619 09:53:47.655309 177940 Catalog.cpp:1541] Time to load Dictionary 2_267 was 65ms
I0619 09:56:13.193397 177940 Catalog.cpp:1541] Time to load Dictionary 2_257 was 145537ms
I0619 09:57:18.723071 177940 Catalog.cpp:1541] Time to load Dictionary 2_256 was 65529ms
I0619 09:57:18.814143 177940 Catalog.cpp:1541] Time to load Dictionary 2_253 was 90ms
I0619 09:57:18.888624 177940 Catalog.cpp:1541] Time to load Dictionary 2_254 was 74ms
I0619 09:57:18.979218 177940 Catalog.cpp:1541] Time to load Dictionary 2_270 was 90ms
I0619 09:57:19.057875 177940 Catalog.cpp:1541] Time to load Dictionary 2_263 was 78ms
I0619 09:57:19.121873 177940 Catalog.cpp:1541] Time to load Dictionary 2_271 was 63ms
I0619 09:57:19.198206 177940 Catalog.cpp:1541] Time to load Dictionary 2_264 was 76ms
I0619 09:57:19.283110 177940 Catalog.cpp:1541] Time to load Dictionary 2_266 was 84ms
I0619 09:57:19.360023 177940 Catalog.cpp:1541] Time to load Dictionary 2_292 was 76ms
I0619 09:57:23.464715 177940 Catalog.cpp:1541] Time to load Dictionary 2_294 was 4104ms
I0619 09:57:25.081632 177940 Catalog.cpp:1541] Time to load Dictionary 2_295 was 1616ms
I0619 09:57:25.160776 177940 Catalog.cpp:1541] Time to load Dictionary 2_296 was 79ms
I0619 09:57:25.271080 177940 Catalog.cpp:1541] Time to load Dictionary 2_297 was 110ms
I0619 09:57:25.586798 177940 Catalog.cpp:1541] Time to load Dictionary 2_298 was 315ms
I0619 09:57:25.664965 177940 Catalog.cpp:1541] Time to load Dictionary 2_299 was 78ms
I0619 09:57:25.776162 177940 Catalog.cpp:1541] Time to load Dictionary 2_300 was 111ms
I0619 09:57:25.859582 177940 Catalog.cpp:1541] Time to load Dictionary 2_301 was 83ms
I0619 09:57:25.953483 177940 Catalog.cpp:1541] Time to load Dictionary 2_302 was 93ms
I0619 09:57:26.045961 177940 Catalog.cpp:1541] Time to load Dictionary 2_303 was 92ms
I0619 09:57:26.149528 177940 Catalog.cpp:1541] Time to load Dictionary 2_304 was 103ms
I0619 09:57:26.252046 177940 Catalog.cpp:1541] Time to load Dictionary 2_305 was 102ms
I0619 09:57:26.344897 177940 Catalog.cpp:1541] Time to load Dictionary 2_306 was 92ms
I0619 09:57:26.446760 177940 Catalog.cpp:1541] Time to load Dictionary 2_307 was 101ms
I0619 09:57:26.551580 177940 Catalog.cpp:1541] Time to load Dictionary 2_308 was 104ms
I0619 09:57:26.650851 177940 Catalog.cpp:1541] Time to load Dictionary 2_265 was 99ms
I0619 09:57:26.737419 177940 Catalog.cpp:1541] Time to load Dictionary 2_309 was 86ms
I0619 09:57:26.830494 177940 Catalog.cpp:1541] Time to load Dictionary 2_258 was 92ms
I0619 10:00:57.357743 177940 Catalog.cpp:1541] Time to load Dictionary 2_259 was 210527ms
I0619 10:00:57.418648 177940 Catalog.cpp:1541] Time to load Dictionary 2_260 was 60ms
I0619 10:00:57.476310 177940 Catalog.cpp:1541] Time to load Dictionary 2_255 was 57ms
I0619 10:00:57.547843 177940 Catalog.cpp:1541] Time to load Dictionary 2_262 was 71ms
I0619 10:01:07.851114 258180 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 1 ms CPU_MGR:0
I0619 10:01:10.761175 258181 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 1 ms CPU_MGR:0
I0619 10:01:14.625392 258174 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:01:18.890010 258174 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:01:24.174590 258175 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 1 ms CPU_MGR:0
I0619 10:01:28.224390 258178 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:01:43.117117 258176 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:02:07.270020 258181 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:03:00.468348 258181 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:03:29.395956 258177 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:03:55.831858 258179 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:04:28.303525 258177 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:04:54.000891 258175 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:05:16.812661 258178 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:05:40.251863 258178 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:05:58.476510 258177 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 1 ms CPU_MGR:0
I0619 10:06:25.745426 258180 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:06:50.540935 258181 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:07:18.107967 258177 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:07:38.652019 258179 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:08:02.250202 258174 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 1 ms CPU_MGR:0
I0619 10:08:28.772279 258179 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:08:53.891650 258175 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:09:17.678248 258181 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:09:37.245029 258176 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0619 10:09:56.860926 258175 BufferMgr.cpp:283] ALLOCATION slab of 8388608 pages (4294967296B) created in 2 ms CPU_MGR:0


I0619 10:22:07.257082 412512 BufferMgr.cpp:392] ALLOCATION failed to find 128000000B free. Forcing Eviction. Eviction start 7243825 Number pages requested 250000 Best Eviction Start Slab 41 CPU_MGR:0
I0619 10:22:07.331094 412513 BufferMgr.cpp:392] ALLOCATION failed to find 145382400B free. Forcing Eviction. Eviction start 7493825 Number pages requested 283950 Best Eviction Start Slab 41 CPU_MGR:0
I0619 10:22:07.439566 412512 BufferMgr.cpp:392] ALLOCATION failed to find 128000000B free. Forcing Eviction. Eviction start 7777775 Number pages requested 250000 Best Eviction Start Slab 41 CPU_MGR:0
I0619 10:22:07.529997 412512 BufferMgr.cpp:392] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 0 Number pages requested 500000 Best Eviction Start Slab 42 CPU_MGR:0
I0619 10:22:07.881053 412512 BufferMgr.cpp:392] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 500000 Number pages requested 500000 Best Eviction Start Slab 42 CPU_MGR:0
F0619 10:22:08.683652 177940 StringDictionaryProxy.cpp:91] Check failed: it != transient_int_to_str_.end() 

What do I need to do when I have to select detail rows from a big table?


#12

It loads all fields because the query needs all of them for the projection phase on the CPU; the two fields needed for filtering are loaded into GPU RAM, but every field has to be loaded into system RAM for projection.
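So if you only need a few columns, projecting them explicitly avoids loading every dictionary into system RAM. Something like this (the projected columns other than the filter fields are just examples):

```sql
-- project only the needed columns instead of select *
select dt, msg_id, event_id, city
from product_unite_wide
where dt = '2017-12-06'
  and msg_id = '20f93b9b-a2e9-424d-9dd0-41177f969cf9';
```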


#13

Thanks, @aznable.
I have another issue when I use the SQLImporter tool.
2018-06-19 15:12:46 INFO SQLImporter:executeQuery:258 - Imported 1100000 records
2018-06-19 15:13:03 INFO SQLImporter:executeQuery:258 - Imported 1200000 records
2018-06-19 15:13:21 INFO SQLImporter:executeQuery:258 - Imported 1300000 records
2018-06-19 15:13:39 ERROR SQLImporter:executeQuery:282 - TException failed - org.apache.thrift.transport.TTransportException
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at com.mapd.thrift.server.MapD$Client.recv_load_table_binary_columnar(MapD.java:1374)
at com.mapd.thrift.server.MapD$Client.load_table_binary_columnar(MapD.java:1359)
at com.mapd.utility.SQLImporter.executeQuery(SQLImporter.java:251)
at com.mapd.utility.SQLImporter.doWork(SQLImporter.java:189)
at com.mapd.utility.SQLImporter.main(SQLImporter.java:59)

I have tried many times. The issue occurs randomly; when it does, mapd_server gets a broken pipe and I have to re-import all the data. My SQL source is Hive.


#14

The render thread exited normally. How can I fix this? Can anyone help?

 E0619 18:56:15.438980 121246 QueryRenderManager.cpp:448] There's a synchronization issue between CUDA/OpenGL devices
 I0619 18:56:15.439379 121355 QueryRenderManager.cpp:188] Render thread exited normally
 E0619 18:56:15.441013 121246 MapDHandler.cpp:209] Backend rendering disabled: /home/jenkins-slave/workspace/mapd2-multi/compiler/gcc/gpu/cuda/host/centos/render/render/QueryRenderer/QueryRenderManager.cpp:448 There's a synchronization issue between CUDA/OpenGL devices



    nvidia-smi shows only Type C processes, no C+G:
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    1     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    2     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    3     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    4     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    5     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    6     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    |    7     87780    C   /data/mapd/mapd/bin/mapd_server               2213MiB |
    +-----------------------------------------------------------------------------+

#15

I am sorry @llz, I have never used the SQL importer with Hive.

About the back-end renderer being disabled: I guess you have a driver issue. Which kind of Titan X are you using? Both the Maxwell and Pascal Titan X have 12 GB on board, not 22 GB, so maybe the driver isn't detecting your GPUs correctly?


#16

Thanks @aznable. Today I used the SQLImporter tool to import data from Hive and it worked, so I guess Hive was just unstable.
You are right, I made a mistake: it is not a Titan X, I am using P40s. Driver version: 367.48, CUDA V8.0.44.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

What do I need to do to fix the back-end-renderer-disabled issue?


#17

I am using driver 384.130 and CUDA toolkit V8.0.61 on a GTX 1080 Ti, which uses the Nvidia GP102 chip, more or less the same as the Tesla P40, but I cannot say whether the driver is causing the problem.


#18

Hi,

@llz your driver is too old to support rendering. Off the top of my head, I think it needs at least 375; ideally you should go to 384 to future-proof for upcoming releases.

regards


#19

Thanks @dwayneberry very much. I have updated the Nvidia driver to 384.66 and now it works.


#20

I created a table with 9 fields and 1000M rows. I have two SQL statements.
SQL 1: select count(*) from table1 where msg_id = '1fcee9f6-8497-4f8b-a33d-bfcb8049fa2a';
It takes 6652 ms.
SQL 2: select count(*) from table1 where dt = '2017-12-01';
It takes 93 ms.
The field msg_id is unique (it has 1000M values), while the field dt has only 30 distinct values.
Why do different filter fields cause such different execution times?