Unable to copy from text file using max_reject


#1

I am getting the following exception: String too long for dictionary encoding.

Here is the script I am trying to execute:

max_reject = 1000000

COPY table_1 from 'input_file' with (delimiter = '\t', max_reject = 1000000);

Even increasing max_reject does not seem to work. Am I doing anything wrong here?


#2

Hi,

Could you share some details?

Are there some known bad strings (i.e., very long ones) in the file?

Please share the schema and the first 50 lines of input_file.

How large are the strings you are trying to load?

regards


#3

Hi,

I was able to reproduce your issue with a small test case.

It appears that a single excessively long string causes an exception that stops the entire COPY.

This is not the correct behavior. I will create an issue and we will prioritize the fix.

For now you will have to clean up your input file to keep individual string lengths under 65K.
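
One way to do that cleanup is a small pre-filter, sketched below in Python. This is only an illustration under stated assumptions: the function name `split_clean_rows` and the 65,000-byte limit are hypothetical choices (the thread only says "under 65K", so the value is deliberately conservative), and the split is a plain delimiter split that does not handle quoted fields containing tabs.

```python
import io


def split_clean_rows(src, good, bad, delimiter="\t", max_len=65_000):
    """Copy rows from src to good, diverting rows with an overlong field to bad.

    max_len is an assumed limit in UTF-8 bytes; a field must be strictly
    shorter than it to be kept. Returns (kept, rejected) row counts.
    """
    kept = rejected = 0
    for line in src:
        # Naive split: assumes the delimiter never appears inside a field.
        fields = line.rstrip("\n").split(delimiter)
        if any(len(f.encode("utf-8")) >= max_len for f in fields):
            bad.write(line)
            rejected += 1
        else:
            good.write(line)
            kept += 1
    return kept, rejected


# In-memory demo: the second row carries a 70,000-character field.
src = io.StringIO("1\tshort\n2\t" + "x" * 70_000 + "\n3\tok\n")
good, bad = io.StringIO(), io.StringIO()
kept, rejected = split_clean_rows(src, good, bad)
print(kept, rejected)  # 2 1
```

For a real file, the three arguments would be file objects opened in text mode; the function reads line by line, so a 30GB input does not need to fit in memory. The rejected rows end up in their own file, which also shows where the long strings are.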

regards


#4

I am loading a 30GB dataset, and I don’t know exactly where it is failing.

What I do know is that, even though the COPY failed, it imported 0.4M of the 89M records. Changing max_reject is not helping. Also, I wanted to confirm: by default, STRING has an encoding of DICT(32), right?


#5

Hi

Yes, ENCODING DICT(32) is the default. It is unrelated to the length of the string.

The 32 governs the number of unique strings a dictionary can contain.
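
To illustrate why the dictionary size is independent of string length, here is a toy dictionary encoder in Python. It is not how the database implements it; `dict_encode` is a hypothetical helper that only shows the idea: each distinct string is stored once and the column holds small integer ids, so the id width limits how many distinct strings fit, not how long each one is.

```python
def dict_encode(values):
    """Return (dictionary, ids): each distinct string gets one integer id."""
    dictionary = {}
    ids = []
    for v in values:
        if v not in dictionary:
            # Next free id; an N-bit id can distinguish at most 2**N strings.
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return dictionary, ids


column = ["red", "green", "red", "a much longer string value", "green"]
dictionary, ids = dict_encode(column)
print(ids)              # [0, 1, 0, 2, 1]
print(len(dictionary))  # 3
```

The long string takes one dictionary slot, the same as the short ones.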

regards


#6

Hi,

Your long string issue that causes the COPY failure should be resolved by the fix for this issue:

https://github.com/mapd/mapd-core/issues/52

If you are using the open source release, you can try it now; otherwise, it should be out in the next release early next week.

regards