No GPUs detected, falling back to CPU mode


#1

Just set up MapD Community Edition and upgraded the Nvidia drivers and CUDA. Everything seems up to date with the MapD requirements, and we have 3 different GPUs in the machine, but MapD seems to be rejecting all of them. A Titan X and a Quadro K4000 are among the cards. Sample output:

root@dell13:/home/ageis# nvidia-smi
Wed May 17 04:41:32 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVS 315             Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   46C    P8    N/A /  N/A |      0MiB /   964MiB |     N/A   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT…    Off  | 0000:04:00.0     Off |                  N/A |
| 22%   35C    P8    15W / 250W |      2MiB / 12207MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Quadro K4000        Off  | 0000:81:00.0     Off |                  N/A |
| 30%   38C    P8    11W /  87W |      2MiB /  3017MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

root@dell13:/home/ageis# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

root@dell13:/home/ageis# /opt/mapd/bin/mapd_server /Volumes/Local_SSD/mapd/ --gpu
E0517 04:46:09.547808 11153 MapDHandler.cpp:281] No GPUs detected, falling back to CPU mode
E0517 04:46:09.547983 11153 MapDHandler.cpp:310] Backend rendering disabled: running in CPU mode
Thrift: Wed May 17 04:46:09 2017 TServerSocket::listen() BIND 9091
E0517 04:46:09.550012 11241 MapDServer.cpp:134] Exception: Could not bind: Address already in use
Thrift: Wed May 17 04:46:09 2017 TServerSocket::listen() BIND 9090
E0517 04:46:09.550271 11242 MapDServer.cpp:134] Exception: Could not bind: Address already in use

Any help is greatly appreciated,
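Separate from the GPU question, the two `Could not bind: Address already in use` lines mean something was already listening on ports 9090 and 9091 when this server started (possibly an earlier mapd_server instance, though the thread doesn't say). A minimal sketch for checking a port before launching; `port_is_free` is a hypothetical helper, and binding on all interfaces (`0.0.0.0`) is an assumption about how the server binds:

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if host:port can be bound right now (nothing is listening on it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR lets us ignore lingering TIME_WAIT sockets; on Linux the bind
        # still fails if another socket is actively listening on the port.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    for port in (9090, 9091):  # the ports named in the "Could not bind" errors
        print(port, "free" if port_is_free(port) else "in use")
```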


#2

Hi,

Thanks for exploring the world of MapD :slight_smile:

We do not normally work with heterogeneous cards in one machine. We generally assume a balance between the memory available on each card.

That said, I do not think that is your immediate issue.

I believe the issue may be that when we check the capabilities of the GPUs in your box at startup, one of your cards is failing that check, probably the NVS 315.

If possible, I would recommend trying with just the Titan X. Adding the flags
--start-gpu=1 --num-gpu=1
should make it start using only the Titan.
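Assuming (this is not confirmed in the thread) that the two flags select a contiguous run of device ordinals starting at `--start-gpu`, the suggestion amounts to the slice below. `select_gpus` is a hypothetical helper, not MapD code, and the list uses the order `nvidia-smi` printed in the first post; the result only names the Titan X if MapD numbers GPUs the same way `nvidia-smi` does.

```python
def select_gpus(devices, start_gpu, num_gpus):
    """Hypothetical model of --start-gpu/--num-gpu: a contiguous run of ordinals.

    Assumption: ordinal i refers to devices[i], in whatever order the server
    enumerates GPUs.
    """
    if not 0 <= start_gpu < len(devices):
        raise ValueError(f"start_gpu={start_gpu} is out of range for {len(devices)} GPUs")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return devices[start_gpu:start_gpu + num_gpus]


# GPU order as printed by nvidia-smi in the first post.
smi_order = ["NVS 315", "GeForce GTX TITAN X", "Quadro K4000"]
print(select_gpus(smi_order, start_gpu=1, num_gpus=1))  # ['GeForce GTX TITAN X']
```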

Regards


#3

Many thanks! We’re using this as our testing machine, hoping to get a working version to play around with before building a production machine, as MapD is pretty new and exciting to us. I tried your recommended flags and got interesting results. It does look like one of the cards is causing an issue, but interestingly, unless I’m missing something about the GPU numbering, we think it’s the Titan X. I’ve included a screenshot of the ID output from nvidia-smi and the results we got from trying to launch MapD on each of the GPUs (2 out of 3 were successful). The error on the third goes on for quite a while, but I figured the top of the output would suffice for now. My gut tells me the GPU numbering may not correspond correctly, unless there really is just an issue with the Titan X?
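The numbering suspicion is plausible. `nvidia-smi` lists GPUs in PCI bus order, while CUDA programs by default use `CUDA_DEVICE_ORDER=FASTEST_FIRST`, which makes the device CUDA guesses is fastest ordinal 0 and leaves the order of the rest unspecified; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes CUDA ordinals match `nvidia-smi`. The sketch below only approximates the default: it ranks by a caller-supplied score (memory bandwidth here, an assumption, not CUDA's actual heuristic) and keeps the remaining devices in bus order. Bus IDs are from the `nvidia-smi` output in the first post; the bandwidth figures are the ones reported later in the thread.

```python
from typing import NamedTuple


class Gpu(NamedTuple):
    name: str
    bus_id: str           # PCI bus ID as printed by nvidia-smi
    bandwidth_gbs: float  # stand-in "speed" score (assumption, not CUDA's heuristic)


def pci_bus_order(gpus):
    """Order nvidia-smi uses (and CUDA uses with CUDA_DEVICE_ORDER=PCI_BUS_ID)."""
    return sorted(gpus, key=lambda g: g.bus_id)


def fastest_first_approx(gpus):
    """Rough approximation of CUDA's default: fastest at ordinal 0, rest in bus order."""
    by_bus = pci_bus_order(gpus)
    fastest = max(by_bus, key=lambda g: g.bandwidth_gbs)
    return [fastest] + [g for g in by_bus if g is not fastest]


gpus = [
    Gpu("Quadro K4000", "0000:81:00.0", 134.784),
    Gpu("NVS 315", "0000:03:00.0", 14.0),
    Gpu("GeForce GTX TITAN X", "0000:04:00.0", 336.48),
]

orders = {
    "nvidia-smi": pci_bus_order(gpus),
    "CUDA default (approx.)": fastest_first_approx(gpus),
}
for label, order in orders.items():
    print(label, [f"{i}: {g.name}" for i, g in enumerate(order)])
```

On these inputs the PCI order puts the Titan X at index 1, while the approximated CUDA order puts the NVS 315 there, which is consistent with the device listing posted later in the thread.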

Again, the help is greatly appreciated!


#4

As a follow-up after some light research: the Quadro K4000 is Kepler (the NVS 315 is actually an older Fermi part), but the Titan X we have is technically Maxwell, which isn’t listed in the MapD requirement specs. I did read an article somewhere about someone running a Maxwell Titan X with MapD, but I’m taking that with a grain of salt.

Would that be enough to cause problems? If so, should we do our testing with the K4000 and look to upgrade to a Titan Xp? We’re most interested in a high GPU-RAM-to-price ratio.


#5

Hi,

There are some issues with Maxwell, so we do not officially support it, but it should still run there.

To get more detail about the cards at startup, set the environment variable GLOG_v=1 before launching the server; the startup log will then include further card details.

Of the 3 cards you have, I would recommend using only the Maxwell Titan for now.
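For example, the server command from the first post can be rerun with verbose glog output by setting `GLOG_v` only in the child process's environment. A minimal sketch; the binary path and data directory are the ones from the first post, and `verbose_env` / `launch_verbose` are hypothetical helper names:

```python
import os
import subprocess

MAPD_SERVER = "/opt/mapd/bin/mapd_server"  # binary path from the first post
DATA_DIR = "/Volumes/Local_SSD/mapd/"      # data directory from the first post


def verbose_env(base=None, level=1):
    """Return a copy of the environment with glog's verbosity (GLOG_v) set."""
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(level)
    return env


def launch_verbose(extra_args=(), level=1):
    """Hypothetical helper: start mapd_server with GLOG_v set in its environment."""
    argv = [MAPD_SERVER, DATA_DIR, *extra_args]
    return subprocess.Popen(argv, env=verbose_env(level=level))
```

Usage: `launch_verbose(["--gpu"]).wait()`. The shell equivalent is simply prefixing the command: `GLOG_v=1 /opt/mapd/bin/mapd_server /Volumes/Local_SSD/mapd/ --gpu`.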


#6

We wrote a script to print CUDA’s device info and found it is indeed an ordering issue: CUDA numbers the cards differently from nvidia-smi, so the NVS 315 is the offending card after all. We’ll be testing with the Titan X. Thanks again for all the help!

root@dell13:/home/ageis/cuda/devices# ./devices
Device Number: 0
Device name: GeForce GTX TITAN X
Memory Clock Rate (KHz): 3505000
Memory Bus Width (bits): 384
Peak Memory Bandwidth (GB/s): 336.480000

Device Number: 1
Device name: NVS 315
Memory Clock Rate (KHz): 875000
Memory Bus Width (bits): 64
Peak Memory Bandwidth (GB/s): 14.000000

Device Number: 2
Device name: Quadro K4000
Memory Clock Rate (KHz): 2808000
Memory Bus Width (bits): 192
Peak Memory Bandwidth (GB/s): 134.784000
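Each "Peak Memory Bandwidth" line follows from the two values printed above it: memory clock (kHz) times bus width in bytes, doubled because the memory is double data rate, then divided by 10^6 to get GB/s. This output appears to match the layout of NVIDIA's device-query blog example, which uses that formula; whether this particular `devices` script computes it exactly this way is an assumption, but the sketch reproduces all three figures:

```python
def peak_bandwidth_gbs(mem_clock_khz: int, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: 2 transfers per clock x clock (kHz) x bytes per transfer / 1e6."""
    return 2.0 * mem_clock_khz * (bus_width_bits / 8) / 1.0e6


# (memory clock in kHz, bus width in bits) as printed by ./devices
cards = {
    "GeForce GTX TITAN X": (3505000, 384),
    "NVS 315": (875000, 64),
    "Quadro K4000": (2808000, 192),
}

for name, (clock_khz, width_bits) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(clock_khz, width_bits):.6f} GB/s")
```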


#7

Hi,

I think I’m having a similar issue with GPU detection, but in a different scenario: we only have one GPU. I’ve already reinstalled and upgraded the NVIDIA driver and CUDA, and triple-checked PATHs and environment variables, with no luck. Are there any tests I could perform to debug why my card isn’t detected by MapD?

OS: Ubuntu 16.04.2
Card: Kepler QUADRO K2200

Output of startmapd:

$MAPD_PATH/startmapd --data $MAPD_DATA
Backend TCP:  localhost:9091
Backend HTTP: localhost:9090
Frontend Web: localhost:9092
E0614 10:15:52.172863  4836 MapDHandler.cpp:281] No GPUs detected, falling back to CPU mode
E0614 10:15:52.193634  4836 MapDHandler.cpp:310] Backend rendering disabled: running in CPU mode

Output of nvidia-smi:

Wed Jun 14 10:25:51 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 0000:02:00.0      On |                  N/A |
| 42%   28C    P8     1W /  39W |    320MiB /  4039MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

Output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Thanks!


#8

Hi

Could you please check whether you see the same issue immediately after a full power-off and reboot of the machine? I have seen cases where a card gets oddly lost over time; I believe it is related to suspend.

Regards
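One test that separates the layers: `nvidia-smi` talks to the driver through NVML, so it can list a card that CUDA programs still cannot open. The sketch below asks the CUDA driver API directly (`cuInit`, then `cuDeviceGetCount` from `libcuda.so.1`, loaded with Python's `ctypes`); running it before and after the suggested reboot would show whether CUDA itself is losing the card. Only the two driver calls and the listed `CUresult` codes are real CUDA API; `cuda_device_count` is a hypothetical helper name.

```python
import ctypes

# CUresult values from the CUDA driver API (cuda.h) most relevant here.
CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100
CUDA_ERROR_UNKNOWN = 999
ERROR_NAMES = {
    CUDA_ERROR_NO_DEVICE: "CUDA_ERROR_NO_DEVICE",
    CUDA_ERROR_UNKNOWN: "CUDA_ERROR_UNKNOWN",
}


def cuda_device_count(libname="libcuda.so.1"):
    """Return (status, count) as seen by the CUDA driver API.

    status is the CUresult of cuInit/cuDeviceGetCount, or None if the
    driver library itself could not be loaded.
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None, 0
    status = libcuda.cuInit(0)
    if status != CUDA_SUCCESS:
        return status, 0
    count = ctypes.c_int(0)
    status = libcuda.cuDeviceGetCount(ctypes.byref(count))
    return status, count.value


if __name__ == "__main__":
    status, count = cuda_device_count()
    if status is None:
        print("libcuda.so.1 could not be loaded")
    elif status == CUDA_SUCCESS:
        print(f"CUDA driver sees {count} device(s)")
    else:
        print(f"cuInit/cuDeviceGetCount failed: {status} {ERROR_NAMES.get(status, '')}")
```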