MySQL and RedShift function compatibility


#1

Is there any documentation that compares MapD supported SQL functions with those of other databases like MySQL and RedShift? I’m looking to port some queries for performance reasons. It would be nice to see where the gaps are to determine how difficult a rewrite would be.

I’ve looked at the DML (https://www.mapd.com/docs/latest/mapd-core-guide/dml) and DDL (https://www.mapd.com/docs/latest/mapd-core-guide/data-definition). Are those the definitive sources and always up-to-date?


#2

Hi,

There is no documentation around comparisons with other database’s functions.

The documentation you have highlighted is definitive and is kept as up to date as we can manage.

If there is a specific function or question about a function please feel free to ask.

Regards


#3

Thanks.

We frequently use LEFT(date_field, 7) to get results in the yyyy-mm format. I don’t see any string manipulation functions. I can extract the month and year date parts but not put them together again. How would you do this?


#4

Hi,

I would recommend you try the DATE_TRUNC() function and truncate to the month.

For example SELECT DATE_TRUNC(month, mytimestamp)) FROM mytable limit 50;

It will not give you the truncated string but it will give you a date based on the month logically truncated to your needs hopefully.

Regards


#5

Thanks.

Unfortunately Redshift requires month to be a string (for example DATE_TRUNC('month', mytimestamp), see http://docs.aws.amazon.com/redshift/latest/dg/r_DATE_TRUNC.html). So this doesn’t allow direct reuse of scripts between the two databases. Could the string value also be supported?


#6

Interesting. Here’s a specific case where dynamic encoding of the string would perform well, guessing that the number of month strings is pretty well bounded at 12. Maybe there’s some more general heuristic for encoding dynamic strings of reasonable cardinality, or passing short short ones. SUBSTRING yielding a finite moderate length might be no worse than fixed width types. Maybe a rats nest of special cases, or maybe this figures into calculated space estimate for when to push it to CPU RAM. Thoughts, @dwayneberry? @asuhan?


#7

I’d rather make DATE_TRUNC work with strings instead if it solves @dallinregehr’s problem. Pro tip: PG_DATE_TRUNC('month', mytimestamp) is actually equivalent to DATE_TRUNC(month, mytimestamp) in our system.

Substring in itself is not that hard, we can do something along the lines of string_view. If we don’t want to filter / group on substring expressions, it can be done. Basically, if it’s only used for the formatting of the final result, there’s no reason we shouldn’t have it. We still have to weigh it against other things we can do, but it can be done.