Data Load Failing - StreamInsert



I’m attempting to load 1B rows into a table via CSV piped to StreamInsert. I can get 750M rows in fine; however, once it gets towards 800M the load fails. All core MapD processes seem to still be running except Calcite.

The message in the log files is:

F1005 15:04:06.242560 51235 File.cpp:40] Check failed: f

The overall size of the data is large, at over 200G, so I’m not sure if that’s related. It’s running on an EC2 r4.16xlarge, so it should have enough memory to cope with that amount of data.

Any ideas on what Check failed: f means?



I believe you are probably running into a ulimit restriction on the user you are running MapD as.

Please check that you have set your ulimit to more than 1024 to allow more files to be created.
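If you want to confirm the open files limit the server process itself ends up with (rather than what your shell reports), here is a minimal sketch using the standard POSIX getrlimit call. This is illustrative only, not MapD code:

```cpp
// Minimal sketch: print the open-files limit this process sees.
// Uses standard POSIX getrlimit; illustrative, not MapD code.
#include <sys/resource.h>
#include <cstdio>

int main() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
    std::printf("open files: soft=%llu hard=%llu\n",
                static_cast<unsigned long long>(rl.rlim_cur),
                static_cast<unsigned long long>(rl.rlim_max));
  }
  return 0;
}
```

If the soft limit comes back as 1024, that would line up with the load dying once enough data files have been created.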

How are you starting MapD? If you are using systemd, it should be taking care of the ulimit for you (the service unit can raise the open files limit via the LimitNOFILE directive).

If not that, check if you have run out of space on the disk.

I will raise an issue to improve the error message there, so something more descriptive is logged before the check fires.
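For reference, a message like that is what a bare glog CHECK on a file handle produces. Here is a hedged sketch of the likely pattern, and of what a more descriptive version could look like; the actual File.cpp code may differ, and the openFile helper is just for illustration:

```cpp
// Sketch of a glog-style check that yields "Check failed: f".
// A bare CHECK(f) aborts with only the expression text, no errno
// or path, which is why the log line is so terse. Illustrative
// only; the real File.cpp may differ.
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <glog/logging.h>

FILE* openFile(const char* path) {
  FILE* f = std::fopen(path, "w+b");
  // Likely current form:  CHECK(f);
  // Streaming context into the check makes the failure self-explanatory:
  CHECK(f) << "fopen failed for " << path << ": " << std::strerror(errno);
  return f;
}
```

With errno streamed in, hitting the open files cap would show up in the log as Too many open files rather than a bare Check failed: f.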



Thanks @dwayneberry, I’ll check ulimit; that sounds like a likely candidate.

Actually, I spoke too soon. ulimit is reporting as unlimited, so that looks solid. There’s over 600G free on the disk, so that should be good too.

I’m just running MapD with nohup at the moment, rather than under systemd.



What does the output of ulimit -a say? We are specifically interested in open files. Note that bare ulimit only reports the file size limit, so it can say unlimited even while the open files limit is still at its 1024 default.



Yup, the max open files limit was still at 1024. My bad.

Just testing the node again now.