Geospatial manipulation example


#1

I am very excited when this 4.0 version was released.
There are so many added features including geospatial objects.

from the documentation, the query for determine how many points which inside a polygon, is something like …
SELECT count(*) from geo1 where ST_CONTAINS(poly1, ‘POINT(0 0)’), which poly is my determined polygon.

if I have longitude and latitude with float data type, is there any easy method to query my total points inside my polygon?

thanks before


#2

do you mean joining two tables one containing points and the other one containing a polygon? fwik it’s not possible at the moment.

i ended up creating a lookup table with a python script


#3

Hi @easymavinmind,

To your question, 4.0 doesn’t have the binary constructor for points st_mkpoint (see here for usage in Postgis). To achieve what you’d like, you’ll have to import the point data as a bona-fide point column and not as lon/lat floats or doubles. However we are planning on adding st_mkpoint in an upcoming release to allow this use case.

Hi @aznable,

Joins (assuming you have a table with a point type and a table with a poly type) indeed possible, although with a few kinks and caveats that we hope to work out in future releases.

First off, you need to enable loop joins via the --allow-loop-joins argument to mapd_server. Second, you’ll get the best performance if the poly table is on the right of the join, i.e. SELECT COUNT(*) FROM point_table, poly_table WHERE ST_CONTAINS(poly_col, point_col). Currently though we reorder tables for joins such that the largest tables get pushed to the left. So if your point table is smaller than your poly table, you should make sure to a) put it on the left side of the join from clause, and b) set the mapd_server argument --from-table-reordering=false.

Also currently we don’t allow an update via a subquery, which is what you would need to enrich a point table with poly ids, but this is something we are working on as well.

Final caveat is that since MapD does not yet support dynamic spatial hashing (or spatial indexes), you will be doing a brute force loop join between the two tables. Due to the computation power of GPUs, this actually works pretty well when the cartesian cardinality of the two tables is reasonable (i.e. billions, not necc trillions), i.e. 10M points agains 3K counties, but even GPUs can’t beat physics as the point or poly tables get very large. Adding dynamic spatial hashing is something else we’re investigating currently.

In summary, currently the spatial support is best used for things like filtering all points in a single poly, and of course for rendering. That said, more complex things like joins do work, and our plan is to make them both easier to use and significantly more performant in upcoming releases.

Regards


#4

thanks @darwin

thanks for the parameter; i already tried with a cross-join and a filtering with no luck. In this way is fatser and the results are more accurate than the python script


#5

hi @darwin,

yes. mkpoint is exactly what I am looking for.
Thanks for your suggestion too. I definitely will do it to achieve what I state before.
and I cant wait for this feature to be released soon.


#6

hi @darwin

i have a further question about querying total data inside multipolygon.

i have a multipolygon (quite long) which represents boundary of any region, but mapd could not handle this query and restarted.
meanwhile, when I query with a polygon (also, a long one), it worked very well.


edited

solved, because my query is too long


#7

Hi @easymavinmind,

Sorry to hear this. Can you give us a bit more detail on the query you are running? Also if you have any ability to share the dataset that would be great as well.

Thanks in advance for your help!

Regards


#8

FYI, the binary point constructor is now on the official geo roadmap on GitHub. The work-around for 4.0 is to re-import from original source with a DDL specifying Point spanning consecutive columns with the x,y coordinates in decimal degrees WGS84 (see new geodocs). I’m writing a blog post about this to be released shortly.


#9

here is one example of my queries that causing crashed multipolygon.txt (779.6 KB)

@darwin, how could I set a table to have POINT datatype? I can only do it if I manually import csv data to mapd immerse (even when I set this to POINT datatype, I can not query this field, the error said it was a TINYINT) . I have tried to use spark to stream from kafka, but it cannot be set as POINT datatype. Any clue?


#10

You want something like:

CREATE TABLE bld_sc_points (bldID BIGINT, mapd_geo GEOMETRY(POINT,4326);

The 4326 is the spatial reference id or ‘SRID’ for WGS84 lat/lon. That’s pretty much required for downstream use of the points. (It defaults to undefined, if you don’t specify.)


#11

i think he asked for a way to insert point datatatypes from kafka so with jdbc driver i guess; i got problems with python driver because when i try to insert a row with an id and a point (i havent tried other geo datatypes) the drivers cannot do the insert saying i am trying to insert a row with 2 fields on a table with 4 fields, so i am forced to generate a text file and import with copy command


#12

This is true. As far as I know, spark cannot make a table with point datatype