Export Data/File Corruption [SOLVED: System RAM issue]


#1

I used the COPY command from mapdql to export a 100 million row table to a CSV file. It mostly worked. However, a few unprintable characters such as ^S or ^F appear at random places in the file. There were also a few lines where two database rows appeared on the same line, separated by a “*” character.

I am not sure whether there are stability or file system problems with my Ubuntu 16.04 Linux system, or whether the MapD mapdql/COPY command introduced these errors into the file.

Has anyone else seen these kinds of issues using the COPY command to make a CSV file?
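
For anyone trying to locate this kind of damage in an exported file, here is a minimal sketch of a scanner. It flags lines that contain control characters (^S is 0x13, ^F is 0x06) or that have an unexpected number of fields, which is how two rows merged onto one line would show up. The function name and the `expected_fields` parameter are hypothetical, and it assumes no quoted field contains an embedded newline.

```python
import csv

# Control characters other than tab, newline, and carriage return, plus DEL.
SUSPECT = {chr(c) for c in range(32)} - {"\t", "\n", "\r"}
SUSPECT.add("\x7f")


def find_suspect_lines(lines, expected_fields):
    """Yield (line_number, reason) for each line that looks corrupted.

    Assumes one CSV record per line (no embedded newlines in quoted fields).
    """
    for number, line in enumerate(lines, start=1):
        bad = sorted({ch for ch in line if ch in SUSPECT})
        if bad:
            shown = ", ".join(repr(ch) for ch in bad)
            yield number, "control characters: " + shown
        try:
            fields = next(csv.reader([line]))
        except csv.Error as exc:
            # Some Python versions reject NUL bytes and similar input outright.
            yield number, f"unparseable line: {exc}"
            continue
        if len(fields) != expected_fields:
            yield number, f"expected {expected_fields} fields, found {len(fields)}"
```

To run it over a real export, something like `open(path, newline="", encoding="utf-8", errors="replace")` can be passed as `lines`, so that undecodable bytes do not stop the scan.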


#2

I had an issue with MapD 4 installed on Ubuntu 16.04, but it was related to projections involving smallint and geometric datatypes, and the data wasn’t corrupted in the way you describe.

Maybe you are trying to export too many rows with many columns at once. MapD is a database designed for analytics, so it is very efficient at ingesting large numbers of records and at filtering and grouping them, but it is not as efficient at projecting large numbers of rows.

So I suggest dividing your export into batches using the rowid attribute.

e.g.
copy (select * from table where rowid between 0 and 999999)
copy (select * from table where rowid between 1000000 and 1999999)
...
copy (select * from table where rowid between 9000000 and 9999999)

or (if you have enough memory)
copy (select * from table where mod(rowid,10)=0)
copy (select * from table where mod(rowid,10)=1)
...
copy (select * from table where mod(rowid,10)=9)
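
The range-based batches can also be generated rather than typed by hand. Below is a minimal sketch, not a definitive implementation: the function names, the table name, and the output path pattern are all hypothetical. It assumes rowid values run contiguously from 0 to `total_rows - 1`, and that the export uses the `COPY (SELECT ...) TO '<file>'` form of the command. Because BETWEEN is inclusive on both ends, each batch ends one row before the next one starts.

```python
def rowid_batches(total_rows, batch_size):
    """Yield inclusive (first, last) rowid bounds covering 0..total_rows-1 once."""
    for first in range(0, total_rows, batch_size):
        # Clamp the final batch so it does not run past the last row.
        yield first, min(first + batch_size, total_rows) - 1


def copy_statements(table, total_rows, batch_size, out_pattern):
    """Yield one COPY statement per batch.

    out_pattern is a hypothetical path template such as '/tmp/export_{index}.csv'.
    """
    batches = rowid_batches(total_rows, batch_size)
    for index, (first, last) in enumerate(batches):
        path = out_pattern.format(index=index)
        yield (
            f"COPY (SELECT * FROM {table} "
            f"WHERE rowid BETWEEN {first} AND {last}) TO '{path}';"
        )
```

For a 100 million row table, `copy_statements("my_table", 100_000_000, 1_000_000, "/tmp/export_{index}.csv")` would produce 100 statements that can be fed to mapdql one at a time.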


#3

I’m an old Oracle DBA so using rowid as a workaround as you described is definitely a useful technique for many issues.

Thanks


#4

I am an Oracle DBA too :wink: Anyway, rowid in MapD is quite different from Oracle’s and definitely easier to use.

I suggest taking a different approach than you would with a row-organized database, because in a columnar database it is very difficult to implement an efficient projection stage with cursors, especially on MPP systems.


#5

After extensive testing of my Linux system with memtest86+, I have verified that one of the four installed RAM modules was bad. I removed the bad module and ran extensive overnight tests, both with memtest86+ and with various application-level benchmark programs I’ve developed to do data loads, and everything is now running error free.
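
An application-level check of this kind can be quite small. The sketch below is only an illustration, not the benchmark programs mentioned above: it hashes two exports of the same data and compares them. The function names are hypothetical. A mismatch between repeated exports of identical data hints at an unstable system, although differing row order between runs would also change the hash, so it assumes the exports are written in the same order.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exports_match(path_a, path_b):
    """True when two export files are byte-for-byte identical (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Reading in chunks keeps memory use flat even for multi-gigabyte CSV files.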

I just wanted to document in this thread, in case others come across it in the future, that the file corruption problems I ran into were almost certainly caused by bad RAM and not by MapD.

I don’t want to unfairly blame MapD for problems not of its making!

Thanks.