SQL: optimisation of complex query


#1

Our project needs to perform a embedded query looking for moving objects in spatio-temporal space. For example, we wrote a SQL in below that objects moving from ‘hotel’ to ‘school’ and then to ‘bank’. Is there any way to optimise the statement because it needs much time to get results? Thanks in advance.

with round2 as (with round as (with hotel as (SELECT pid, lat, lng FROM poi WHERE type = ‘hotel’)

SELECT distinct r1.tid, r1.lat, r1.lng, r1.dtime FROM wtaxi_full r1, hotel h1

WHERE r1.lat < h1.lat+0.01 AND r1.lat > h1.lat-0.01 AND r1.lng < h1.lng+0.01 AND r1.lng > h1.lng-0.01

AND DATEDIFF(‘SECOND’, r1.dtime, TIMESTAMP’2008-03-06 10:21:22’) < 0 AND DATEDIFF(‘SECOND’, r1.dtime, TIMESTAMP’2008-03-06 12:21:22’) > 0)

(with school as (SELECT pid, lat, lng FROM sigpoi WHERE type = ‘school’)

SELECT distinct t.tid, t.lat, t.lng, t.dtime FROM round t, school s

WHERE t.lat < s.lat+0.01 AND t.lat > s.lat-0.01 AND t.lng < s.lng+0.01 AND t.lng > s.lng-0.01 ))

(with bank as (SELECT pid, lat, lng FROM sigpoi WHERE type = ‘bank’)

SELECT distinct r.tid, r.lat, r.lng FROM round2 r, bank b

WHERE r.lat < b.lat+0.01 AND r.lat > b.lat-0.01 AND r.lng < b.lng+0.01 AND r.lng > b.lng-0.01

AND DATEDIFF(‘SECOND’, r.dtime, TIMESTAMP’2008-03-06 10:21:22’) < 0 AND DATEDIFF(‘SECOND’, r.dtime, TIMESTAMP’2008-03-06 12:21:22’) > 0)


#2

Hi @bobofee,

Could you please elaborate on the dataset you are using, i.e. the DDL for the tables, how many rows/columns and how much time it’s currently taking. Can you also send us the output from mapdql commands - \memory_summary and \memory_gpu.

Regards,
Veda


#3

if you can you can start removing datediff and using between instead

evidence

> select count() from yellow_tripdata where datediff('second',tpep_pickup_datetime,timestamp '2014-02-02 10:00:00') <0 and datediff('second',tpep_pickup_datetime,timestamp '2014-02-02 12:00:00')>0;
> Execution time: 150 ms, Total time: 150 ms
> 
> select count() from yellow_tripdata where tpep_pickup_datetime between timestamp '2014-02-02 10:00:01'  and timestamp '2014-02-02 11:59:59';
> Execution time: 9 ms, Total time: 10 ms

there is an improvement of 13-15 times on the filtering operation


#4

It does work! I run this SQL, and the running time drops from 33100ms to 9981ms. Thank you so much @aznable