Within a single server, MapD Core can handle datasets of 1.5-3 TB raw size in GPU RAM


#1

What does this mean? 1.5-3 TB seems too big to fit in GPU RAM. How can I achieve that on a single server?


#2

Hi

Just to be clear, this is referring to the raw size of the CSV being loaded into the DB.

There are two main things that come into play:

Firstly, during the load the data is compressed into the smallest reasonable form we can work with. Consider a textual TIMESTAMP field (e.g. 31-Oct-13 11:30:25 -0800): it takes 24 bytes in the CSV file, but in database form with fixed encoding it is only 4 bytes, a 6x reduction. If there are lots of repeated strings, dictionary encoding can shrink long values to 4 bytes or fewer depending on cardinality; we often see very large reductions on log-file type data, where the text is semi-structured and highly repeated.
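To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not MapD code, and the column names and widths are made-up examples; it just shows how fixed and dictionary encoding shrink a row relative to its CSV text.

```python
# Rough illustration of per-row size before and after encoding.
# Column names and widths are hypothetical, not actual MapD internals.

csv_row_bytes = {
    "timestamp": 24,    # "31-Oct-13 11:30:25 -0800" as text
    "user_agent": 120,  # long, highly repeated string
    "status": 3,        # e.g. "200" as text
}

encoded_row_bytes = {
    "timestamp": 4,     # fixed encoding: 32-bit seconds since epoch
    "user_agent": 4,    # dictionary encoding: 32-bit id into a string dictionary
    "status": 2,        # small integer type
}

csv_total = sum(csv_row_bytes.values())          # 147 bytes of raw CSV per row
encoded_total = sum(encoded_row_bytes.values())  # 10 bytes per row after encoding

print(f"compression ratio ~{csv_total / encoded_total:.1f}x")  # ~14.7x
```

The exact ratio obviously depends on your schema; wide text columns with low cardinality compress the most.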

Secondly, only the columns required for filters, aggregates and calculations are actually loaded onto the GPU.

With these two factors in mind, you should be able to see that, depending on your original dataset and the questions you need to answer, the raw data size can easily be far larger than the GPU memory required.
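As a hypothetical illustration of how the two factors combine (again, the row count, column widths and query are assumptions for the sake of the example, not measurements):

```python
# Hypothetical estimate of GPU memory needed for a single query.
# Real sizes depend entirely on your schema, encodings and data.

rows = 10_000_000_000  # ~10 billion rows

encoded_bytes_per_column = {
    "timestamp": 4,
    "user_agent": 4,
    "status": 2,
}

# Only the columns the query touches need to be on the GPU, e.g. for
# SELECT status, COUNT(*) FROM logs WHERE timestamp > ... GROUP BY status
query_columns = ["timestamp", "status"]

gpu_bytes = rows * sum(encoded_bytes_per_column[c] for c in query_columns)
print(f"GPU memory for this query: ~{gpu_bytes / 1e9:.0f} GB")  # ~60 GB
```

So a dataset whose CSV form is measured in terabytes can end up needing only tens of gigabytes of GPU memory for a typical query, which is well within reach of a multi-GPU server.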

You may want to download our white paper and read through that to get some additional information.

As a side note, we do allow paging to the GPU from CPU memory if a query needs more GPU memory than is available, but normally we would recommend looking at a larger box or going distributed.

Regards