Query couldn't keep the entire working set of columns in GPU memory


#1

Hi,
On my CentOS 7 machine I have 128 GB of CPU RAM and 8 GB of GPU RAM.

I loaded only one table into MapD: 20 million rows, 30 columns.

I am pretty sure this table should fit in my combined CPU+GPU RAM, but I keep getting the following error:

```
Error: Query failed : Exception: Query couldn't keep the entire working set of columns in GPU memory
SQLState: null
```

It seems that MapD is trying to fit ALL the data into GPU RAM; why doesn't it also use the CPU RAM?


#2

Hi

Thanks for giving MapD a try.

Could you share the schema and the query you are trying to run? Without that info it's rather difficult to tell exactly what options you have and what might be going on.

By default MapD tries to execute the query on the GPU as best it can. In this case it failed to fit all the components the query requires into GPU memory. Without seeing the nature of the query it is a little difficult to speculate on a useful next step. If the query involves many filtered columns, we have some additional options for how best to load the data to take advantage of MapD.

If some of your queries are too memory-heavy for the GPU, we have options to fall back and execute more of the work on the CPU; see the sketch below.
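As a rough sketch (the server flag and the mapdql commands below are from my recollection of MapD Core's options; verify them against your version's --help output before relying on them):

```
# Server-side: allow a query that cannot fit in GPU memory to be
# retried on the CPU (flag name assumed from MapD Core)
./bin/mapd_server /path/to/mapd/data --allow-cpu-retry

# Session-side: force CPU execution from mapdql
mapdql> \cpu
mapdql> SELECT COUNT(*) FROM my_table;
# switch the session back to GPU execution
mapdql> \gpu
```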

Let's gather some more details together.

Regards


#3

Thanks for the quick answer.

Here is the schema of the table:

I cannot execute any query, even the simplest one:

```sql
SELECT voce FROM mapd_fh206_fg894_fg510_V3_AC
```

I always get the same message.


#4

And here is a sample of the data:

In addition, I am using MapD 3.0.0.


#5

Never mind, I solved it.

The problem was that I had shut down a TensorFlow Jupyter notebook running on the GPU in order to free GPU memory for MapD, but I also needed to stop and restart MapD: after the restart it found the freed memory, and the query now works.
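For anyone who hits the same thing, the steps were roughly as follows (the systemd unit name is an assumption; it depends on how MapD was installed):

```
# Confirm nothing else (e.g. a TensorFlow notebook) is still holding GPU memory
nvidia-smi

# Restart the server so it re-detects the available GPU memory at startup
# (unit name assumed; adjust to your install)
sudo systemctl restart mapd_server
```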

Thank you!


#6

I’d also add that while MapD can do projections without limits, it’s not really optimized for that use case. A projection with a limit, or some kind of aggregation, will generally be performant, but returning 20M rows to the client will typically become network-bound.
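For example, a couple of illustrative queries against the table from this thread (the aggregation is hypothetical; any GROUP BY would make the same point):

```sql
-- Projection with a limit: only a small result set crosses the network
SELECT voce FROM mapd_fh206_fg894_fg510_V3_AC LIMIT 100;

-- Aggregation: the GPU reduces 20M rows to a few groups before returning
SELECT voce, COUNT(*) AS n
FROM mapd_fh206_fg894_fg510_V3_AC
GROUP BY voce;
```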


#7

Hi Todd,
Thanks for the answer. Of course I use a LIMIT 100 and don't do a full extraction: otherwise I presume I would get a Java heap space error in SQuirreL, at least.
Cheers.