Exception: Cast from dictionary-encoded string to none-encoded would be slow


#1

I have created the following table:

mapdql> \d ind_fg040_geocoded
CREATE TABLE ind_fg040_geocoded (
cod INTEGER NOT NULL,
tipocli INTEGER,
rs TEXT ENCODING DICT(32),
indirizzo TEXT ENCODING DICT(32),
lat DOUBLE,
lng DOUBLE,
accuracy TEXT ENCODING DICT(32),
formatted_address TEXT ENCODING DICT(32),
address_type TEXT ENCODING DICT(32),
status TEXT ENCODING DICT(32),
index DOUBLE)

But this query trows the error in object: I can not get why, because the two columns have exactly the same encoding:

select case when formatted_address IS NOT NULL THEN formatted_address ELSE indirizzo END from ind_fg040_geocoded

Note that thare are some NULL values in formatted_address.

MapD-3.1.1


#2

Hi

When a text field is DICT encoded all that gets to the GPU is a dictionary id (a number referencing the unique occurrence of the text). This static watchdog error is basically trying to tell you that the process of converting/decoding the dict encoded field back to a textual field as required currently by this case is going to be expensive.

You could turn off watchdog and see if the query performs acceptably to you.

Are there lots of duplicates in your formatted_address column or your indrizzio column? You could potentially change these two columns to TEXT ENCODING NONE to hopefully avoid this issue at the expense of GPU memory.

It is an interesting case that we may want to explore more, as it does seem a limitation for this kind of work.

How much data is involved.

Regards


#3

Hi dwayne,
thanks for your support.

In my case formatted address has very few duplicates and is little, having no more then 2000 rows, but of course, this is not the only possible scenario.

Can you point me at the documentation explaining how to turn off watchdog ? I have only 8 GB of GPU RAM and I will try to save it.

In addition, what are the others consequences about turning watchdog off?


#4

I found in another post how to successfully disable watchdog : the test for this query is successful.

But I would like to use --allow-cpu-retry as a way to manage datasets not fitting in GPU, in order to use also the CPU RAM:

Are --allow-cpu-retry and --enable-watchdog=false alternatives?

With watchdog=false, now, do I am not using the 128GB of CPU RAM on my server?


#5

Hi,

--allow-cpu-retry and --enable-watchdog=false are compatible, if I understand your question.

watchdog=false will potentially allow more high memory (more than can fit on the GPU’s) to be executed which will cause swapping of memory in the GPU. It really depends on your queries and your data.

regards


#6

Perfect, is clear. Thank you.