Unique Set of Columns


#1

I have billions of records with columns A, B, C, D. I’d like to keep track of only the unique combinations of values in those columns. Is there an efficient way to do that? In other words, I’d like to insert a new record only if that combination has never been seen before.

Is there an efficient way to do this in MapD?


#2

Hi,

I am assuming you are not treating cross combinations of the same values as duplicates here.

i.e.

1,2,3,4,5

is not the same as

5,4,3,2,1

MapD has no built-in constraint to enforce this.

There was a discussion here on a process you could use to do deduplicated incremental inserts into a table.

See if that is useful for your purposes.
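
As a rough illustration of what a deduplicated incremental insert can look like (not necessarily the process from that discussion), here is a minimal sketch that loads each batch into a staging table and inserts only the rows with no match in the target. It uses Python's built-in sqlite3 as a stand-in database, not MapD, and the table names `combos` and `staging`, the column names, and the function name are all hypothetical. It also assumes the columns never contain NULLs.

```python
import sqlite3


def insert_new_combinations(conn, rows):
    """Insert only unseen (a, b, c, d) combinations; return the ones inserted.

    Hypothetical helper: `combos` is the target table, `staging` holds the batch.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS combos (a, b, c, d)")
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS staging (a, b, c, d)")
    cur.execute("DELETE FROM staging")  # start each batch from an empty staging table
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)

    # Anti-join: keep staging rows that have no matching row in the target.
    # DISTINCT removes duplicates inside the batch itself.
    # Assumes no NULLs, since NULL never compares equal in the join.
    new_rows = cur.execute(
        """
        SELECT DISTINCT s.a, s.b, s.c, s.d
        FROM staging s
        LEFT JOIN combos t
          ON s.a = t.a AND s.b = t.b AND s.c = t.c AND s.d = t.d
        WHERE t.a IS NULL
        """
    ).fetchall()

    cur.executemany("INSERT INTO combos VALUES (?, ?, ?, ?)", new_rows)
    conn.commit()
    return new_rows
```

The same staging-plus-anti-join idea should carry over to other SQL engines, but whether a given MapD version supports this exact statement form is something to check against its documentation.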

Regards


#3

Correct - no cross combinations.

I read through the “Avoiding Duplicate Rows” thread. In my case, all the data will come from a single table in another database.

Ideally, the functionality I eventually want to develop is detecting when a new combination of values comes in. Perhaps I can initially seed the MapD table by deduplicating with some other tool before loading, and then just use MapD to run queries on new records as they come in to see whether they already exist in MapD…
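
To make the seed-then-check idea concrete, here is a minimal sketch in plain Python. It keeps the seen combinations in an in-memory set, which only works if the number of distinct combinations fits in memory; in the real setup the membership check would be a query against the MapD table instead. The function names `seed_unique` and `find_new` are hypothetical.

```python
def seed_unique(records):
    """Deduplicate the initial load outside the database.

    Each record is an (A, B, C, D) sequence; order matters, so
    (1, 2, 3, 4) and (4, 3, 2, 1) count as different combinations.
    """
    return {tuple(r) for r in records}


def find_new(seen, incoming):
    """Return combinations in `incoming` not seen before, in arrival order.

    Newly found combinations are added to `seen`, so a repeat later in
    the same batch is not reported twice.
    """
    new = []
    for record in incoming:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            new.append(key)
    return new
```

For example, after seeding with the existing table's rows, each incoming batch is passed to `find_new`, and only the returned combinations need to be inserted.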