MapD core p2.8xlarge large table distribution


#1

Hello,

I have a table with thousands of columns (or more) and 500,000,000 rows.

I loaded the table in MapD.

Initially, I was using a p2.xlarge instance to work with this table. Given its size, the table did not fit into GPU RAM, so I moved to a larger instance (i.e. p2.8xlarge).

I can see that the table is loaded onto just one of the 8 available GPUs.

A snapshot of the nvidia-smi output is as follows:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1785    C+G   /opt/mapd/bin/mapd_server                 11439MiB |
|    1      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
|    2      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
|    3      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
|    4      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
|    5      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
|    6      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
|    7      1785    C+G   /opt/mapd/bin/mapd_server                   2812MiB |
+-----------------------------------------------------------------------------+

Is a single table always loaded onto one GPU only? (I am using MapD Core Community Edition.)

Is there any configuration setting that tells MapD to load this large table across multiple GPUs?

Thanks.

Best,
Ankit


#2

Try sharding. It is mainly useful for joins, but it's worth a try here.

https://www.mapd.com/docs/latest/mapd-core-guide/tables/#create-table

Recommendations

Set shard_count to the number of GPUs you eventually want to distribute the data table across.
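
A minimal sketch of what that could look like, assuming the SHARD KEY / shard_count syntax from the CREATE TABLE docs linked above. The table name, column names, and the helper function are hypothetical; the shard key is declared when the table is created, so pick a column from your own schema. The helper only builds the DDL string, which you would then run through mapdql or whatever client you use.

```python
def sharded_table_ddl(table, columns, shard_key, shard_count):
    """Build a CREATE TABLE statement with a shard key and a shard count.

    `columns` maps column name -> SQL type. All names here are
    hypothetical placeholders, not taken from the original table.
    """
    if shard_key not in columns:
        raise ValueError(f"shard key {shard_key!r} is not a column of {table!r}")
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    # One definition per column, followed by the SHARD KEY clause.
    definitions = [f"  {name} {sql_type}" for name, sql_type in columns.items()]
    definitions.append(f"  SHARD KEY ({shard_key})")
    body = ",\n".join(definitions)

    return f"CREATE TABLE {table} (\n{body})\nWITH (shard_count = {shard_count});"


# Assumption: 8 shards, one per GPU on a p2.8xlarge.
ddl = sharded_table_ddl(
    table="big_table",
    columns={"row_id": "BIGINT", "feature_1": "FLOAT", "feature_2": "FLOAT"},
    shard_key="row_id",
    shard_count=8,
)
print(ddl)
```

After recreating the table this way and reloading the data, checking nvidia-smi again should show whether the memory usage is spread more evenly across the GPUs.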