StreamImporter in Docker fails to import large (>5GB) CSV files


#1

Hi, I’m trying to load a large (>5GB) CSV file into MapD using StreamImporter.
I have MapD in Docker (CPU only).
Here is what I do:

  1. Start the container: docker run -d -v $HOME/mapd-docker-storage:/mapd-storage -p 9090-9092:9090-9092 mapd/mapd-ce-cpu
  2. Start the client: docker exec -it <containerID> bin/mapdql
  3. Create a table: CREATE TABLE IF NOT EXISTS basic ( ...);
  4. Import the file: cat /mapd-storage/bas.csv | bin/StreamImporter basic mapd -u mapd -p HyperInteractive --delim ',' --batch 1000000

Some rows are imported, then the container suddenly terminates. In the logs I found this error: Could not sync file to disk.

I’ve tried other import methods as well, but StreamImporter imported the most rows.

Thanks for any suggestions…

Here is the entire log:
Log file created at: 2018/03/22 08:25:31
Running on machine: 10bbe3099ee0
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0322 08:25:31.523133 8 MapDHandler.cpp:152] This build isn't CUDA enabled, will run on CPU
E0322 08:25:31.980234 8 MapDHandler.cpp:184] No GPUs detected, falling back to CPU mode
F0322 08:35:43.294337 54 FileMgr.cpp:529] Could not sync file to disk


#2

Hi,

It's a poor error message.

It most likely means that you have run out of disk space.

regards
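
One way to check the disk-space theory is to look at how much room is left on the filesystem behind the mounted volume. The sketch below assumes the host directory from the docker run command above ($HOME/mapd-docker-storage) is where MapD writes its data; free_gb is a hypothetical helper name, not part of MapD or Docker.

```shell
# Hypothetical helper: print the free space, in whole GB, on the
# filesystem that holds the given path. Uses POSIX df output (-P)
# in 1K blocks (-k); column 4 of the second line is "Available".
free_gb() {
  df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Assumption: this is the host directory mounted at /mapd-storage.
storage_dir="$HOME/mapd-docker-storage"
if [ -d "$storage_dir" ]; then
  echo "Free space under $storage_dir: $(free_gb "$storage_dir") GB"
fi
```

If the number printed is not comfortably larger than the size of the CSV file, running out of space partway through the import would be a plausible explanation. The same df command can also be run inside the container against /mapd-storage via docker exec.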