Query a row while grouping by only one column


#1

Is there a way to select rows of column1 and column2 where column1 is distinct and column2 doesn’t need to be distinct?

When I try select distinct column1, column2 from table1 limit 100 I get the error Failed to get a high quality cardinality estimation. When I try to use select column1,column2 from table1 group by column1 limit 100 then I get an error that column2 is not being grouped.

I can understand how this would be problematic, because MapD wouldn’t know which values for column2 to return, but it actually doesn’t matter which of the available results get returned for column2. Can it be done?


#2

Hi,

What is the type of column2. MapD does not currently have a SAMPLE style aggregator, but if column2 is a numeric/date or time you could choose the MIN or the MAX.

regards


#3

Column 2 is a string, not dictionary encoded. Just in case I checked that min doesn’t work on a string.

My end solution was something like:
select column1,column2 from table1 where column1 in (select distinct column 1 from table1 limit 100) and then I reduce it on my end.

The real query actually has several columns, and grouping by all of them crushed the performance. This is the most performant way I’ve found to accomplish this.


#4

This type of query can be solved using the rowid pseudo column.

In this example, carrier_name is column1, and dest_city is column2.
Because you can get a min(rowid) if you then join back to the same table on that rowid, you can get your sample value for the second column.

SELECT a.carrier_name, b.dest_city
FROM (
SELECT a.carrier_name, min(a.rowid) rowid
from flights_2008_7m a
GROUP BY 1 LIMIT 100) a join
flights_2008_7m b on (a.rowid=b.rowid);


#5

p.s. didn’t realize it had been 3 months. I am new to the community and was reading through the forums. :slight_smile:


#6

Hi @RobertDupuy. The time goes really quickly. I’ve been really happy to discover MapD and the incredible help of people like @dwayneberry.

The data set I have doesn’t actually have a row id. Due to the size of the data files and the lack of support for update statements, I use sed on the csv files to clean and pre-process the rows, removing some columns and merging others. Maybe I could add row ids in the same way.

What does GROUP BY 1 do?


#7

Rowid is a pseudo column that exists in any mapd table. Group by 1 is equivalent to group by column1 in your query


#8

Neat, thank you.:grinning: