Help: how do I solve "Check failed: cuModuleLoadDataEx"?


#1

My machine has two GPUs: a Tesla K40c and a GeForce GTX 970.

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:02:00.0     Off |                    0 |
| 24%   46C    P8    20W / 235W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 970     Off  | 0000:04:00.0      On |                  N/A |
| 41%   39C    P8    22W / 200W |    185MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1     15018    G   /usr/bin/X                                      88MiB |
|    1     15380    G   /usr/bin/gnome-shell                            92MiB |
|    1     16492    G   /usr/lib64/firefox/plugin-container              2MiB |
+-----------------------------------------------------------------------------+

I have already added the following configuration.

mapd.conf

gpu = true
num-gpus = 0
start-gpu = 0

The K40c is a Kepler card, but I can't get a query result.
mapdql> select count(*) from flights_2008_7M;

Error message:
F0717 10:42:24.680974 17012 NvidiaKernel.cpp:93] Check failed: cuModuleLoadDataEx(&module_, image, num_options, options, option_vals) == CUDA_SUCCESS (300 vs. 0)
*** Check failure stack trace: ***
@ 0x19c2bba google::LogMessage::Fail()
@ 0x19c2b11 google::LogMessage::SendToLog()
@ 0x19c24dc google::LogMessage::Flush()
@ 0x19c5475 google::LogMessageFatal::~LogMessageFatal()
@ 0xfac2d8 GpuCompilationContext::GpuCompilationContext()
@ 0xfa6048 Executor::optimizeAndCodegenGPU()
@ 0xfa8656 Executor::compileWorkUnit()
@ 0xf5dd16 Executor::ExecutionDispatch::compile()
@ 0xf4e965 Executor::executeWorkUnit()
@ 0xfefad8 RelAlgExecutor::executeWorkUnit()
@ 0xff075d RelAlgExecutor::executeCompound()
@ 0xff30cd RelAlgExecutor::executeRelAlgStep()
@ 0xff38bf RelAlgExecutor::executeRelAlgSeq()
@ 0xff486d RelAlgExecutor::executeRelAlgQuery()
@ 0xe5f135 MapDHandler::execute_rel_alg()
@ 0xe659a5 MapDHandler::sql_execute_impl()
@ 0xe689e1 MapDHandler::sql_execute()
@ 0xdb7b0e MapDProcessor::process_sql_execute()
@ 0xda9697 MapDProcessor::dispatchCall()
@ 0xda354c apache::thrift::TDispatchProcessor::process()
@ 0x2f7026f apache::thrift::server::TConnectedClient::run()
@ 0x2f4b6b5 apache::thrift::concurrency::ThreadManager::Task::run()
@ 0x2f4bbf5 apache::thrift::concurrency::ThreadManager::Worker::run()
@ 0x2f6a59b apache::thrift::concurrency::PthreadThread::threadMain()
@ 0x7f79b0e2bdc5 start_thread
@ 0x7f79ae68f76d __clone
./startmapd: line 102: 16971 Aborted (core dumped) ./bin/mapd_server $MAPD_DATA $RO --port $MAPD_TCP_PORT --http-port $MAPD_HTTP_PORT --calcite-port MAPD_CALCITE_PORT *

$> nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Mon Jul 17 10:49:24 2017
Driver Version : 375.66

Attached GPUs : 2
GPU 0000:02:00.0
Product Name : Tesla K40c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0320415010475
GPU UUID : GPU-f139bc7b-144c-aac3-fd45-90bffd66d463
Minor Number : 0
VBIOS Version : 80.80.3E.00.02
MultiGPU Board : No
Board ID : 0x200
GPU Part Number : 900-22081-2250-000
Inforom Version
Image Version : 2081.0206.01.04
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x102410DE
Bus Id : 0000:02:00.0
Sub System Id : 0x098310DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 25 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Sync Boost : Not Active
Unknown : Not Active
FB Memory Usage
Total : 11439 MiB
Used : 0 MiB
Free : 11439 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0 ms
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 47 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 21.18 W
Power Limit : 235.00 W
Default Power Limit : 235.00 W
Enforced Power Limit : 235.00 W
Min Power Limit : 180.00 W
Max Power Limit : 235.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Video : 405 MHz
Applications Clocks
Graphics : 745 MHz
Memory : 3004 MHz
Default Applications Clocks
Graphics : 745 MHz
Memory : 3004 MHz
Max Clocks
Graphics : 875 MHz
SM : 875 MHz
Memory : 3004 MHz
Video : 540 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

GPU 0000:04:00.0
Product Name : GeForce GTX 970
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-be407530-c665-e731-b9a0-f9183c6f8f07
Minor Number : 1
VBIOS Version : 84.04.84.00.51
MultiGPU Board : No
Board ID : 0x400
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x04
Device : 0x00
Domain : 0x0000
Device Id : 0x13C210DE
Bus Id : 0000:04:00.0
Sub System Id : 0x113110DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 4000 KB/s
Fan Speed : 44 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Sync Boost : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4037 MiB
Used : 159 MiB
Free : 3878 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 4 MiB
Free : 252 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 4 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0 ms
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 40 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 91 C
Power Readings
Power Management : Supported
Power Draw : 24.65 W
Power Limit : 200.00 W
Default Power Limit : 200.00 W
Enforced Power Limit : 200.00 W
Min Power Limit : 100.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 324 MHz
Video : 405 MHz
Applications Clocks
Graphics : 1178 MHz
Memory : 3505 MHz
Default Applications Clocks
Graphics : 1178 MHz
Memory : 3505 MHz
Max Clocks
Graphics : 1519 MHz
SM : 1519 MHz
Memory : 3505 MHz
Video : 1397 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 15018
Type : G
Name : /usr/bin/X
Used GPU Memory : 82 MiB
Process ID : 15380
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 75 MiB


#2

Hi,

Your config should look more like this:

num-gpus = 1
start-gpu = 0

regards


#3

Hello dwayneberry.

I have changed the configuration file, but I still can't get a result.

The problem persists.

Could you give me some more advice? ^^


#4

Hi,

It looks like you are using startmapd to start your server. startmapd does not read mapd.conf, so you would need to add all the parameters directly to the startmapd command.

You may want to use systemd, or at least mapd_server --config <configfile>, so that it is easier to confirm you are running with the parameters you think you are.

Once you are running with the correct config, if you still have issues, please set GLOG_v=1 in your environment variables and rerun.

Then include the contents of the mapd_server.INFO log from your data/mapd_logs directory.

regards
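
The two steps above (pass the config file explicitly, and set GLOG_v in the environment rather than in the config file) can be sketched in Python. This is only an illustration: the helper name mapd_launch_spec and the default binary name are assumptions, and only the --config flag and the GLOG_v variable come from this thread.

```python
import os
from typing import Dict, List, Tuple


def mapd_launch_spec(
    config_path: str, binary: str = "mapd_server"
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for starting the server.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # Copy the current environment and turn on verbose glog output.
    env = dict(os.environ)
    env["GLOG_v"] = "1"
    # Name the config file explicitly so there is no doubt which one is used.
    argv = [binary, "--config", config_path]
    return argv, env
```

The result could then be passed to subprocess.run(argv, env=env) from the directory that contains the binary, adjusting the binary path to your install.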


#5

I was already using "mapd_server --config ".
I have added "GLOG_v=1" to my config file.
Here is the mapd_server.INFO output:
I0719 11:04:09.297863 17002 MapDHandler.cpp:471] User mapd connected to database mapd
I0719 11:04:11.016347 17002 MapDHandler.cpp:667] sql_execute :QRhdsf9XmwJX3ud97BeM7hDy6gdEFwuB:query_str:select count(*) from flights_2008_7M ;
I0719 11:04:11.017223 17002 Calcite.cpp:233] User mapd catalog mapd sql 'select count(*) from flights_2008_7M ;'
I0719 11:04:11.478029 17002 Calcite.cpp:274] Time in Thrift 12 (ms), Time in Java Calcite server 448 (ms)
I0719 11:04:11.478360 17002 FileMgr.cpp:116] Read table metadata, Epoch is 30 for table data at 'data/mapd_data/table_1_1/'
F0719 11:04:11.527390 17002 NvidiaKernel.cpp:93] Check failed: cuModuleLoadDataEx(&module_, image, num_options, options, option_vals) == CUDA_SUCCESS (300 vs. 0)
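
For reference, the number in "(300 vs. 0)" is the CUresult value that cuModuleLoadDataEx returned. In the CUDA driver API, 300 is CUDA_ERROR_INVALID_SOURCE and 999 is CUDA_ERROR_UNKNOWN. A small Python sketch for decoding such log lines follows; the table is a deliberately partial subset of the enum, and the function name is made up for illustration.

```python
import re
from typing import Optional

# Partial subset of the CUresult enum from the CUDA driver API (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    200: "CUDA_ERROR_INVALID_IMAGE",
    209: "CUDA_ERROR_NO_BINARY_FOR_GPU",
    218: "CUDA_ERROR_INVALID_PTX",
    300: "CUDA_ERROR_INVALID_SOURCE",
    999: "CUDA_ERROR_UNKNOWN",
}

# Two forms appear in this thread's logs:
#   "... == CUDA_SUCCESS (300 vs. 0)"  and  "CUDA error code=999"
_PATTERNS = (
    re.compile(r"\((\d+) vs\. \d+\)"),
    re.compile(r"CUDA error code=(\d+)"),
)


def cuda_error_name(log_line: str) -> Optional[str]:
    """Return the CUresult name for the code in a log line, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(log_line)
        if match:
            code = int(match.group(1))
            return CU_RESULT_NAMES.get(code, f"unlisted CUresult {code}")
    return None
```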


#6

Full mapd_server.INFO log:
Log file created at: 2017/07/19 11:04:01
Running on machine: localhost.localdomain
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0719 11:04:01.992249 16954 MapDServer.cpp:589] MapD started with data directory at 'data'
I0719 11:04:01.992846 16954 MapDServer.cpp:596] Watchdog is set to 1
I0719 11:04:01.992859 16954 MapDServer.cpp:618] cuda block size 0
I0719 11:04:01.992866 16954 MapDServer.cpp:619] cuda grid size 0
I0719 11:04:01.992871 16954 MapDServer.cpp:620] calcite JVM max memory 1024
I0719 11:04:01.992893 16954 MapDHandler.cpp:261] MapD Server 3.1.1-20170626-45a6fa8
I0719 11:04:02.479895 16954 CudaMgr.cpp:127] Using 2 Gpus.
I0719 11:04:02.480026 16954 DataMgr.cpp:120] cpuSlabSize is 4096M
I0719 11:04:02.480046 16954 DataMgr.cpp:122] reserved GPU memory is 604.837M includes render buffer allocation
I0719 11:04:02.480054 16954 DataMgr.cpp:132] gpuSlabSize is 2048M
I0719 11:04:02.480059 16954 DataMgr.cpp:132] gpuSlabSize is 2048M
I0719 11:04:02.480104 16954 FileMgr.cpp:116] Read table metadata, Epoch is 0 for table data at 'data/mapd_data/table_0_0/'
I0719 11:04:02.480181 16954 Calcite.cpp:161] Creating Calcite Handler, Calcite Port is 9093 base data dir is data
I0719 11:04:02.480190 16954 Calcite.cpp:128] Running calcite server as a daemon
I0719 11:04:04.480365 16954 Calcite.cpp:136] slept 2 before checking server
I0719 11:04:04.502960 16954 Calcite.cpp:150] ping took 22 ms
I0719 11:04:04.512583 16954 Calcite.cpp:340] [{"name":"Tan","ret":"double","args":["double"]},{"name":"Truncate__","ret":"float","args":["float","i32"]},{"name":"ln","ret":"double","args":["double"]},{"name":"distance_in_meters__","ret":"double","args":["float","float","float","float"]},{"name":"Floor__","ret":"float","args":["float"]},{"name":"Floor__2","ret":"i32","args":["i32"]},{"name":"Floor__3","ret":"i64","args":["i64"]},{"name":"Truncate","ret":"double","args":["double","i32"]},{"name":"Floor__1","ret":"i16","args":["i16"]},{"name":"radians","ret":"double","args":["double"]},{"name":"degrees","ret":"double","args":["double"]},{"name":"Ceil__","ret":"float","args":["float"]},{"name":"ln__","ret":"double","args":["float"]},{"name":"approx_distance_in_meters","ret":"double","args":["float","float","float","float"]},{"name":"Ceil__2","ret":"i32","args":["i32"]},{"name":"Ceil__1","ret":"i16","args":["i16"]},{"name":"Round","ret":"double","args":["double"]},{"name":"Ceil__3","ret":"i64","args":["i64"]},{"name":"power","ret":"double","args":["double","double"]},{"name":"conv_4326_900913_x","ret":"double","args":["double"]},{"name":"conv_4326_900913_y","ret":"double","args":["double"]},{"name":"Atan","ret":"double","args":["double"]},{"name":"Floor","ret":"double","args":["double"]},{"name":"Log10","ret":"double","args":["double"]},{"name":"Truncate__1","ret":"i16","args":["i16","i32"]},{"name":"Truncate__2","ret":"i32","args":["i32","i32"]},{"name":"Log","ret":"double","args":["double"]},{"name":"Truncate__3","ret":"i64","args":["i64","i32"]},{"name":"Cos","ret":"double","args":["double"]},{"name":"Log__","ret":"double","args":["float"]},{"name":"Log10__","ret":"double","args":["float"]},{"name":"Asin","ret":"double","args":["double"]},{"name":"Cot","ret":"double","args":["double"]},{"name":"Tan__","ret":"double","args":["float"]},{"name":"distance_in_meters","ret":"double","args":["double","double","double","double"]},{"name":"Atan2","ret":"double","args":["double","double"]},{"name":"Acos","ret":"double","args":["double"]},{"name":"pi","ret":"double","args":[]},{"name":"Sin","ret":"double","args":["double"]},{"name":"Ceil","ret":"double","args":["double"]},{"name":"Exp","ret":"double","args":["double"]}]
I0719 11:04:04.512909 16954 MapDHandler.cpp:307] Started in GPU mode
I0719 11:04:04.521603 16954 EglGLWindow.cpp:107] Window Setting: DRAWABLE_TYPE: PBUFFER.
I0719 11:04:04.521641 16954 EglUtils.cpp:61] EGL Setting: BITS_RGBA = 8.
I0719 11:04:04.521654 16954 EglUtils.cpp:84] EGL Setting: BITS_ALPHA = 8.
I0719 11:04:04.535286 16954 EglGLRenderer.cpp:100] Renderer Setting: USE_CORE_PROFILE: True.
I0719 11:04:04.604212 16954 EglGLRenderer.cpp:166] Renderer Setting: <OPENGL_MAJOR>.<OPENGL_MINOR>: 4.5.
I0719 11:04:04.624897 16954 EglGLWindow.cpp:107] Window Setting: DRAWABLE_TYPE: PBUFFER.
I0719 11:04:04.624938 16954 EglUtils.cpp:61] EGL Setting: BITS_RGBA = 8.
I0719 11:04:04.624987 16954 EglUtils.cpp:84] EGL Setting: BITS_ALPHA = 8.
I0719 11:04:04.625025 16954 EglGLRenderer.cpp:100] Renderer Setting: USE_CORE_PROFILE: True.
I0719 11:04:04.674034 16954 EglGLRenderer.cpp:166] Renderer Setting: <OPENGL_MAJOR>.<OPENGL_MINOR>: 4.5.
I0719 11:04:04.723846 16954 QueryRenderManager.cpp:309] QueryRenderManager initialized for rendering. start GPU: 0, num GPUs: 2, Render cache limit: 500


#7

Hi,

Your log is still reporting:

I0719 11:04:02.479895 16954 CudaMgr.cpp:127] Using 2 Gpus.

It is still trying to use both GPUs.

Please share the entire contents of your config file.

The command you show, "mapd_server --config ", is incomplete. You need to give the command the name of your config file: mapd_server --config <myconfig file>

regards
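
A quick way to confirm how many GPUs the server actually picked up is to look for the CudaMgr line in mapd_server.INFO. Here is a minimal Python sketch, assuming the line keeps the format quoted above ("Using N Gpus."); the function name is illustrative and not part of MapD.

```python
import re
from typing import Optional

# Matches the startup line, e.g. "... CudaMgr.cpp:127] Using 2 Gpus."
# The pattern is an assumption based on the log excerpt in this thread.
_GPU_LINE = re.compile(r"CudaMgr\.cpp:\d+\] Using (\d+) Gpus\.")


def gpus_in_use(log_text: str) -> Optional[int]:
    """Return the GPU count from the last CudaMgr line, or None if absent."""
    count = None
    for line in log_text.splitlines():
        match = _GPU_LINE.search(line)
        if match:
            # Keep overwriting so the most recent server start wins.
            count = int(match.group(1))
    return count
```

With num-gpus = 1 in effect, calling gpus_in_use on the text of the log should return 1; a result of 2 suggests the server was started without the intended config file.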


#8

Thank you, sir ^^

I'm very happy. I can see the result now.

num-gpus = 2 // This was my mistake.

I had already tried the settings below, but they didn't work, so today I reinstalled, maybe five times.

I forgot to change "num-gpus = 2" back. I had tried so many different values.

I'm so sorry. Thank you very much for your help.

My config file (it works now):
GLOG_v = 1
gpu = true
num-gpus = 1
start-gpu = 0
port = 9091
http-port = 9090
data = "/var/lib/mapd/data"

[web]
port = 9092
frontend = "/root/MapD-GPU/frontend"


#9

I have a problem again.

How can I solve it?

The problem looks like this:

[root@localhost MapD-GPU]# ./startmapd &
[4] 18011
[3] Terminated ./startmapd
[root@localhost MapD-GPU]# Backend TCP: localhost:9091
Backend HTTP: localhost:9090
Frontend Web: localhost:9092
Calcite TCP: localhost:9093

- sleeping for 5s while server starts
Thrift: Wed Jul 19 22:01:55 2017 TServerSocket::listen() BIND 9091
Thrift: Wed Jul 19 22:01:55 2017 TServerSocket::listen() BIND 9090
E0719 22:01:55.863275 18059 MapDServer.cpp:153] Exception: Could not bind: Transport endpoint is not connected
E0719 22:01:55.863415 18060 MapDServer.cpp:153] Exception: Could not bind: Transport endpoint is not connected
F0719 22:01:55.869799 18015 QueryBuffer.cpp:193] Check failed: result == CUDA_SUCCESS CUDA error code=999
*** Check failure stack trace: ***
@ 0x19c2bba google::LogMessage::Fail()
@ 0x19c2b11 google::LogMessage::SendToLog()
@ 0x19c24dc google::LogMessage::Flush()
@ 0x19c5475 google::LogMessageFatal::~LogMessageFatal()
@ 0x11fa7aa QueryRenderer::QueryBuffer::checkCudaErrors()
@ 0x11fbb1a QueryRenderer::QueryBuffer::~QueryBuffer()
@ 0x11fc354 QueryRenderer::QueryResultVertexBuffer::~QueryResultVertexBuffer()
@ 0xd77495 std::_Sp_counted_base<>::_M_release()
@ 0x11820a2 std::_Sp_counted_ptr<>::_M_dispose()
@ 0xd77495 std::_Sp_counted_base<>::_M_release()
@ 0x1185945 std::_Sp_counted_ptr<>::_M_dispose()
@ 0xd77495 std::_Sp_counted_base<>::_M_release()
@ 0x1184740 std::_Sp_counted_ptr<>::_M_dispose()
@ 0xd77495 std::_Sp_counted_base<>::_M_release()
@ 0x1176882 QueryRenderer::QueryRenderManager::~QueryRenderManager()
@ 0xe470c0 MapDHandler::~MapDHandler()
@ 0xe47299 MapDHandler::~MapDHandler()
@ 0xd88f59 boost::detail::shared_count::~shared_count()
@ 0xd10704 main
@ 0x7fe1be374b35 __libc_start_main
@ 0xd75d05 (unknown)
./startmapd: line 100: 18015 Aborted (core dumped) ./bin/mapd_server $MAPD_DATA $RO --config /root/MapD-GPU/systemd/mapd.conf --port $MAPD_TCP_PORT --http-port $MAPD_HTTP_PORT --calcite-port MAPD_CALCITE_PORT *
^C
[4]- Terminated ./startmapd

#10

Hi,

You appear to be using startmapd. Please include the full mapd_server.INFO log; there is not enough information here to help you.

I would not recommend using startmapd. Because your machine has an unusual setup with GPU cards of different generations, please use either the systemd services or mapd_server and mapd_web_server directly.

regards


#11

OK, I see.
It works perfectly now.
Thank you, dwayneberry ^^