Hamming Distance / XORing Matrix


#1

Good day!

I am determining the degree of similarity between cells in rows.

In MySQL I can use the function BIT_COUNT(a ^ b) to calculate the Hamming distance.

Now I am testing MapD Version: 3.6.0-20180313-7a75b56 (GPU), and I do not know how to calculate the Hamming distance here.

Or why are there no matrix functions? A GPU should work with matrices perfectly!

Thanks.


#2

Hi @seregaperm,

Thanks for your message. We already allow extension functions (essentially row-level compiled UDFs) on scalars and arrays, just not text yet. With an extension function you could easily write the logic for xor and bit_count to do what you want.
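As a rough illustration of the "xor and bit_count" logic mentioned above, here is a minimal C++ sketch of two scalar functions on BIGINT-sized values. The names bit_count64 and hamming_distance64 are hypothetical, and the sketch leaves out whatever declaration macros or registration steps MapD needs for an extension function; it only shows the arithmetic.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper: number of set bits in a 64-bit value.
// std::bitset is used so the sketch stays portable standard C++.
inline int32_t bit_count64(int64_t x) {
  return static_cast<int32_t>(
      std::bitset<64>(static_cast<uint64_t>(x)).count());
}

// Hypothetical helper: Hamming distance between two 64-bit values,
// i.e. the number of bit positions in which a and b differ.
// This mirrors MySQL's BIT_COUNT(a ^ b).
inline int32_t hamming_distance64(int64_t a, int64_t b) {
  return bit_count64(a ^ b);
}
```

For example, hamming_distance64(0xB, 0x2) XORs the inputs to 0x9, which has two set bits, so the distance is 2.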

It shouldn’t be hard to add the ability to enable running extension functions on text, and so we’ll try to take a look at this as soon as we have the bandwidth. Will keep you posted!

Regards


#3

I am working with BIGINT data and computing the Hamming distance (not text).

I tried decomposing my BIGINT values into bits, one bit per cell, and ended up with over 500 cells for computing the Hamming distance (I have 9 BIGINT cells in my data). But it is very slow:

H = abs(a1+b1-1) + abs(a2+b2-1) + abs(a3+b3-1) + … + abs(a500+b500-1)

I think this task could be solved in hardware much faster …

Of course, I would like to work with a very long numeric sequence (up to 1000 bytes) as a single value.
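For comparison with the per-bit sum above, here is a minimal C++ sketch that works on whole 64-bit words instead of single-bit cells. It assumes the data is available as two equal-length arrays of 64-bit words (9 words for 9 BIGINT cells); the function names are hypothetical and nothing here is MapD-specific. Note that if every a_i and b_i is a 0/1 bit, abs(a_i + b_i - 1) is 1 when the bits are equal and 0 when they differ, so the sum in the post counts matching positions, which equals the total bit count minus the Hamming distance.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: Hamming distance between two equal-length
// sequences of 64-bit words (number of differing bit positions).
inline int64_t hamming_distance_words(const uint64_t* a,
                                      const uint64_t* b,
                                      std::size_t n_words) {
  int64_t distance = 0;
  for (std::size_t i = 0; i < n_words; ++i) {
    // XOR marks the differing bits; count() tallies them.
    distance += static_cast<int64_t>(std::bitset<64>(a[i] ^ b[i]).count());
  }
  return distance;
}

// Hypothetical helper: number of matching bit positions, which is what
// the per-bit sum of abs(a_i + b_i - 1) computes for 0/1 inputs.
inline int64_t matching_bits_words(const uint64_t* a,
                                   const uint64_t* b,
                                   std::size_t n_words) {
  const int64_t total_bits = static_cast<int64_t>(n_words) * 64;
  return total_bits - hamming_distance_words(a, b, n_words);
}
```

With 9 words this does 9 XORs and 9 bit counts per row pair rather than several hundred additions and abs calls, and a byte sequence of up to 1000 bytes would fit in 125 such words.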

But thanks, I will be waiting.


#4

Hi @darwin,

It would be useful to know how to define these UDFs; I tried to add a function to ExtensionFunctions.hpp with no success, because the Calcite server didn't recognize the added function.

A bit_count function would be easy to implement efficiently on the GPU with the intrinsic functions __popc and __popcll.
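A minimal sketch of such a bit_count function, written so that it uses __popcll when compiled as CUDA device code and falls back to a compiler builtin or std::bitset on the host. The name bit_count_bigint and the HD_DEVICE macro are hypothetical, and the sketch does not address how the function gets registered so that the Calcite server recognizes it.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical macro: mark the function for host and device when
// compiled with nvcc, and expand to nothing for a plain C++ compiler.
#if defined(__CUDACC__)
#define HD_DEVICE __host__ __device__
#else
#define HD_DEVICE
#endif

// Hypothetical helper: number of set bits in a BIGINT-sized value.
HD_DEVICE inline int32_t bit_count_bigint(int64_t x) {
  const unsigned long long u = static_cast<unsigned long long>(x);
#if defined(__CUDA_ARCH__)
  // Device code path: CUDA intrinsic for 64-bit population count.
  return __popcll(u);
#elif defined(__GNUC__) || defined(__clang__)
  // Host code path on GCC/Clang: compiler builtin.
  return __builtin_popcountll(u);
#else
  // Portable fallback for other host compilers.
  return static_cast<int32_t>(std::bitset<64>(u).count());
#endif
}
```

The Hamming distance of two BIGINT values a and b would then be bit_count_bigint(a ^ b); for 32-bit values, __popc plays the same role on the device.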