Exception: Query would require a scan without a limit on table(s)


#1

I’d like to scan a large 40 million row table of IP addresses for a set of suspicious addresses that I am interested in. I formulated the query/subquery below - and this works perfectly on another database I’m using (postgres) but it does not work on mapd (both databases are running on the same server).

mapdql> select sip_dip_string from suspicious_ips where sip_dip_string not in (select sip_dip_string from netflow_data);
Exception: Query would require a scan without a limit on table(s): netflow_data

Is there a way to increase the limit on how many rows can be returned by a subquery? Also, how can I go about measuring how much of my 40 million row table I can fit into GPU memory. Is there documentation I could consult to begin to get a deeper level of the tuning characteristics of mapd and how my data set fits in available memory?


#2

Interestingly, reformulating the query along the lines below works fine.

select * from netflow_data where sip_dip_string in (select distinct(sip_dip_string) from suspicious_ips);


#3

The calculation is a simple one; you have to sum up the bytes of columns involved on filtering, join and aggregation then multiply for the number of records of respective tables; the additional Memory needed for operations like distintics varies on cardinalities etc.