Exception: Failed to run the cardinality estimation query: Query couldn't keep the entire working set of columns in GPU memory


#1

Hi,

I’m currently running MapD Core with 1.2 billion rows of the NYC taxi data set on the following hardware:

CPU: Intel i7-4790 CPU @ 3.60GHz
Memory: 16GB DDR3 RAM
GPU: NVIDIA GTX 980

I’m trying to run some of the example queries from Mark’s benchmark blog (http://tech.marksblogg.com/billion-nyc-taxi-rides-aws-ec2-p2-8xlarge-mapd.html), but I encounter problems on query 4:

SELECT passenger_count,
       extract(year from pickup_datetime) AS pickup_year,
       cast(trip_distance as int) AS distance,
       count(*) AS the_count
FROM taxi_large
GROUP BY passenger_count,
         pickup_year,
         distance
ORDER BY pickup_year,
         the_count desc;

I understand the hardware is far from ideal, especially for a dataset this size, but from what I’ve read about MapD it should still be able to run queries by spilling into RAM and onto the hard disk.

On execution in GPU mode the above query errors with the exception:

Exception: Failed to run the cardinality estimation query: Query couldn’t keep the entire working set of columns in GPU memory

It does, however, run to completion in CPU mode.

I’ve already read the thread "Is there a memory replacement mechanism for mapd?" and set both flags mentioned there. My config file is as follows:

port = 9091
http-port = 9090
data = "/home/mapd/MAPD_STORAGE/data"
null-div-by-zero = true
gpu = true
allow-cpu-retry = true
enable-watchdog = false

[web]
port = 9092
frontend = "/home/mapd/mapd/frontend"

After the query returns with the error, running \memory_summary displays the following:

MapD Server Memory Usage
CPU RAM IN BUFFER USE : 2990.72 MB
GPU VRAM USAGE (in MB’s)
GPU    MAX        ALLOC       IN-USE     FREE
0      3432.91    3086.68*    2990.72    95.96

As you can see, it fails with lots of free RAM still available. Could anyone suggest why this is?

Additionally, could anyone help diagnose the problem and suggest a fix that would allow the query to run to completion in GPU mode?

Thanks!


#2

Hi,

Your configuration is set up correctly to allow fallback to CPU execution.

The issue you’re experiencing is a bug: before the query attempts execution, some cardinality calculations are performed, and the fallback path is not fully implemented for a failure in that step.
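
For a rough sense of why the columns cannot all stay resident on this card, here is a back-of-envelope sketch in Python. The row count and the GPU MAX figure are taken from the original post; the per-column byte widths are assumptions (the table schema is not shown in this thread), and treating the reported MB as 2**20 bytes is also an assumption. The helper names are hypothetical and not part of MapD.

```python
# Back-of-envelope estimate of the GPU working set for query 4.
# Row count and GPU figure come from the thread; byte widths are ASSUMED.

ROWS = 1_200_000_000   # "1.2 billion rows" from the original post
GPU_MAX_MB = 3432.91   # MAX column of \memory_summary in the original post

# Hypothetical fixed-width encodings; the real schema is not shown here.
ASSUMED_BYTES_PER_VALUE = {
    "passenger_count": 2,
    "pickup_datetime": 8,
    "trip_distance": 8,
}


def working_set_mb(rows, widths):
    """Size in MB (2**20 bytes) of the listed columns held fully in memory."""
    return rows * sum(widths.values()) / 2**20


def fits_on_gpu(rows, widths, gpu_max_mb):
    """True if all listed columns fit within the GPU's maximum at once."""
    return working_set_mb(rows, widths) <= gpu_max_mb


if __name__ == "__main__":
    need = working_set_mb(ROWS, ASSUMED_BYTES_PER_VALUE)
    print(f"estimated working set: {need:.0f} MB, GPU max: {GPU_MAX_MB:.0f} MB")
    print("fits on GPU:", fits_on_gpu(ROWS, ASSUMED_BYTES_PER_VALUE, GPU_MAX_MB))
```

Under these assumed widths, the three columns the query touches need several times the memory the GPU reports as its maximum, which would be consistent with the error message. With different encodings the numbers change, so treat this only as an order-of-magnitude check.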

I will raise an issue so we can correct this path.

Regards