Select field, count(*) from table group by field, The result is confuse


#1

I use sql to select my tables, by the result is confuse me.
The table’s count of row is 128571365
Then select condition unit_id = 2238, the count is 369458. It is ok
But I select by group unit_id. The result confuse me .

where i has mistake? the unit_id = 2238 , count is 8590227878, it is bigger then the count of table, why?


#2

Hi,

Yes, those results don’t look right based on your details.

Would it be possible to try these same SQL queries from the mapdql command line tool, and include all the output here?

regards


#3

I have try this sql use mapdql, The result is same.


#4

Could you please confirm the query is actually

Select unit_id, count(*) unit_count from f_flow where platform = 'door' group by unit_id limit 10;

That there is a * in the count(*)

regards


#5

mapdql> select count() unit_count from f_flow;
unit_count
128571365
mapdql> select unit_id, count() unit_count from f_flow where platform = ‘door’ group by unit_id limit 10;
unit_id|unit_count
2238|8590227878
3001|6442451268
3207|4294969355
2230|8589936204
3000|8589997725
2942|6442452776
2908|8589937217
2838|8589971440
2837|2147483711
2490|4294967744
mapdql> select count(
) unit_count from f_flow where platform = ‘door’ and unit_id = 2238;
unit_count
293290


#6

Thanks

What version are you using and what platform are you running on.

Is it possible to get a copy of the dataset, this is occurring on?

regards


#7

I use mapd 3.0. The data has something about my customer. I can’t copy it to you. sorry


#8

Hi @llz, what hardware (particularly number and type of GPUs) are you running on?


#9

Thanks

Where did the version of the software come from (AWS, build or CE download)

What precise hardware platform are you running on

regards


#10

oeasy@oeasy:~/mapd-3.0.0-render/bin$ nvidia-smi
Mon Jun 5 13:04:24 2017
±----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------±---------------------±---------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108… Off | 0000:02:00.0 Off | N/A |
| 23% 33C P8 9W / 250W | 2810MiB / 11172MiB | 0% Default |
±------------------------------±---------------------±---------------------+
| 1 GeForce GTX 108… Off | 0000:03:00.0 Off | N/A |
| 23% 38C P8 10W / 250W | 2730MiB / 11172MiB | 0% Default |
±------------------------------±---------------------±---------------------+
| 2 GeForce GTX 108… Off | 0000:82:00.0 Off | N/A |
| 23% 31C P8 8W / 250W | 2730MiB / 11172MiB | 0% Default |
±------------------------------±---------------------±---------------------+
| 3 GeForce GTX 108… Off | 0000:83:00.0 Off | N/A |
| 23% 32C P8 9W / 250W | 2730MiB / 11172MiB | 0% Default |
±------------------------------±---------------------±---------------------+

±----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2808MiB |
| 1 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
| 2 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
| 3 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
±----------------------------------------------------------------------------+
The version of software, I download from your website. mapd-ce-latest-Linux-x86_64-render.tar


#11

I have try this sql on other table, the table has only 4000 records. The result is right.


#12

Thanks

Could you share the schema and some obsfucated data.

What is the OS you are running on? uname -a?

regards


#13

Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenia

CREATE TABLE IF NOT EXISTS f_flow (
platform TEXT,
uid TEXT,
pid TEXT,
ref_a TEXT,
ref_plat TEXT,
os TEXT,
dt TEXT,
rate TEXT,
sz TEXT,
lg TEXT,
timer TEXT,
url TEXT,
start_time BIGINT,
level_time BIGINT,
ip TEXT,
vid TEXT,
st BIGINT,
sid TEXT,
u_create_date TEXT,
country TEXT,
area TEXT,
region TEXT,
city TEXT,
county TEXT,
isp TEXT,
s_type TEXT,
s_word TEXT,
version TEXT,
short_timer TEXT,
street TEXT,
community TEXT,
gps_iterms TEXT,
volume INT,
unit_id INT,
door_code TEXT,
signal TEXT,
network TEXT,
open_door INT,
open_status TEXT,
response TEXT,
data_source TEXT,
create_date TEXT
);


#14

Thanks

What is the RAM in this box? What CPU/s?

How did you install your nvidia driver, I see it is a newer than then default one I believe for ubuntu LTS from nvidia?

trying to solve the puzzle :slight_smile:

regards


#15

Can you tell us what the min and max are for the unit_id column? Also are there any nulls in that column?


#16

Hi liz

Would it be possible to change your schema so that unit_id is declare as a TEXT field rather than INT.

Then reload and give the query another try?

regards


#17

My computer is 8 Core and RAM is 64G. unit_id has nulls in that column, min=0, max=971102.


#18

Then,
mapdql> select uid, start_time from f_flow where platform = ‘door’ order by start_time desc limit 10;
Exception: Query would require a scan without a limit on table(s): f_flow
Can someone help me , what that mean?


#19

Thanks, You advice is right, When i change the schema of unit_id as a TEXT field, It works well.


#20

Btw @llz we’ve replicated the issue you described and should have a fix soon. Thank you for patiently walking us through it!


Encounter an exception: Query would require a scan without a limit on table(s)