Mapd_server not running, no errors


#1

New to this forum. I installed MapD per the CentOS 7 instructions, apparently with no problems. But then I tried to insert sample data, and that's when I got:

Thrift: Thu Aug 23 14:01:44 2018 TSocket::open() connect() <Host: localhost Port: 9091>Connection refused

firewalld/iptables are stopped, and there is nothing running at port 9091:

[root@n37 mapd]# systemctl start mapd_server
[root@n37 mapd]# systemctl status mapd_server
● mapd_server.service - MapD database server
Loaded: loaded (/usr/lib/systemd/system/mapd_server.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2018-08-23 13:48:00 EDT; 4s ago
Main PID: 17643 (mapd_server)
CGroup: /system.slice/mapd_server.service
└─17643 /opt/mapd/bin/mapd_server --config /var/lib/mapd/mapd.conf

Aug 23 13:48:00 n37 systemd[1]: Started MapD database server.
Aug 23 13:48:00 n37 systemd[1]: Starting MapD database server…

[root@n37 mapd]# ps -efl | grep 17643
0 S root 19981 6315 0 80 0 - 28176 pipe_w 13:51 pts/1 00:00:00 grep --color=auto 17643

No errors in /var/lib/mapd/data/mapd_log/mapd_server.INFO, it reports
Using 4 GPUs …

[root@n37 mapd]# lsof -i:9093
[root@n37 mapd]# lsof -i:9092
[root@n37 mapd]# lsof -i:9091
[root@n37 mapd]# lsof -i:9090
[root@n37 mapd]#
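For what it's worth, the same check works without lsof via bash's built-in /dev/tcp redirection; here is a rough sketch (`port_state` is a helper name I'm making up, and the ports are the ones checked above):

```shell
# Sketch: probe a localhost TCP port using bash's /dev/tcp feature.
# Prints "open" if a connect succeeds, "closed" otherwise.
port_state() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null && echo open || echo closed
}

for p in 9090 9091 9092 9093; do
  echo "$p: $(port_state "$p")"
done
```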

Where to look for a fix?

-Henk


#2

Hi @hmeij ,

Could you please send us the output from the following commands:
nvidia-smi
nvidia-smi --query-gpu=gom.current --format=csv,noheader

Regards,
Veda


#3

Sure. But why would the type of GPU matter? Also, in the 18 hours since reboot there are close to 10,000 log files in mapd_log; is that normal?

[root@n37 ~]# grep -i error /var/lib/mapd/data/mapd_log/*

[root@n37 ~]#

[root@n37 ~]# grep -i warning /var/lib/mapd/data/mapd_log/*

[root@n37 ~]#

-Henk

[root@n37 ~]# nvidia-smi

Fri Aug 24 07:31:41 2018


#4

Ah. This explains the mass of log files. mapd_server is trying to start, over and over again.

[root@n37 systemd]# while true; do top -u mapd -b -n 1 | grep mapd; lsof -i:9091;sleep 10; done
13772 mapd      20   0  351316  25272  18536 R  23.5  0.0   0:00.96 mapd_server
 9042 mapd      20   0  349660   8436   3788 S   0.0  0.0   0:00.02 mapd_web_server
 9042 mapd      20   0  349660   8436   3788 S   0.0  0.0   0:00.02 mapd_web_server
13941 mapd      20   0  352192  25416  18616 D   0.0  0.0   0:00.37 mapd_server
14020 mapd      20   0  280.8g 231752 211416 R  94.4  0.1   0:01.63 mapd_server
 9042 mapd      20   0  349660   8436   3788 S   0.0  0.0   0:00.02 mapd_web_server
14180 mapd      20   0  351144  27216  18448 D  41.2  0.0   0:00.64 mapd_server
 9042 mapd      20   0  349660   8436   3788 S   0.0  0.0   0:00.02 mapd_web_server
14288 mapd      20   0   35.4g  47360  12672 S 175.0  0.0   0:00.35 java
 9042 mapd      20   0  349660   8436   3788 S   0.0  0.0   0:00.02 mapd_web_server
14268 mapd      20   0  280.9g 341856 310712 S   0.0  0.1   0:01.85 mapd_server
 9042 mapd      20   0  349660   8436   3788 S   0.0  0.0   0:00.02 mapd_web_server
14421 mapd      20   0  351144  25168  18448 D   0.0  0.0   0:00.74 mapd_server

Looking at the most recent minute of log files, I now have an error…possibly
coming from the GPUs? Are there particular compute modes that need to be set on the
GPUs, like exclusivity or persistence?

I0824 08:51:07.092370 16989 MapDHandler.cpp:201] Started in GPU mode
I0824 08:51:07.107959 16989 MapDServer.cpp:108] Interrupt signal (11) received.
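With that many files, one way to pull up the latest crash is to tail whichever log was written most recently; a quick sketch (`newest_log` is a helper name I'm making up; the path is the one on this box):

```shell
# Sketch: print the name of the most recently modified file in a directory.
newest_log() {
  ls -t "$1" | head -1
}

# Tail the freshest server log, if the log directory exists.
dir=/var/lib/mapd/data/mapd_log
if [ -d "$dir" ]; then
  tail -20 "$dir/$(newest_log "$dir")"
fi
```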

-Henk


#6

Hmm, part of the output got stripped…

[root@n37 ~]# nvidia-smi
Fri Aug 24 07:31:41 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:02:00.0 Off |                    0 |
| N/A   32C    P0    47W / 225W |      3MiB /  4743MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          Off  | 00000000:03:00.0 Off |                    0 |
| N/A   33C    P0    48W / 225W |      3MiB /  4743MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          Off  | 00000000:83:00.0 Off |                    0 |
| N/A   30C    P0    48W / 225W |      1MiB /  4743MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20m          Off  | 00000000:84:00.0 Off |                    0 |
| N/A   30C    P0    46W / 225W |      0MiB /  4743MiB |     59%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@n37 ~]# nvidia-smi --query-gpu=gom.current --format=csv,noheader
Compute
Compute
Compute
Compute

#7

Hi @hmeij,

The K20 GPU requires that you explicitly enable graphics/rendering support (your gom.current query shows all four GPUs in Compute-only mode).
Please enable the All On operation mode on the GPUs:
sudo nvidia-smi --gom=0
sudo reboot

This should solve your issue.
Regards,
Veda


#8

Phenomenal! Solved indeed. Can’t wait to play with it.

Thanks,
-Henk


#9

@hmeij, Thanks for confirming.