Connect Superset/Looker to Mapd


#1

Our business analysts are familiar with tools like Looker (https://looker.com) and Apache Superset (https://github.com/apache/incubator-superset).
What would be the best way of connecting Mapd to these query/visualization tools?
We’re currently running Mapd Community Edition on AWS for evaluation purposes.


#2

I asked a similar question on the Looker forum recently; here is the reply I got:


#3

Hi Dimitri,

Thanks for your reply!
Did you get Looker to work with Mapd?
If so: how are you currently connecting the two?
And if not: did you contact Mapd directly about this?


#4

I’m still testing it locally on the assumption that I will be able to connect somehow :slight_smile:


#5

Hi @jjdr ,

I just finished a minimal (read-only) implementation of a SQLAlchemy dialect for Mapd; I can share the code if you want to test it.


#6

Hi @aznable, that’s awesome to hear! Is this something you’d be willing to share with the community?


#7

Of course, but I have to package it (at least a basic setup) and clean up the code; it’s working, but I haven’t tested it extensively.

I’ve already opened a pull request to include the Mapd query runner I wrote for Redash; I expect it will be available in the next few days.

I’m a bit short on time lately, so writing connectors for the most promising open source data visualization tools is taking longer than expected.


#8

Sounds great @aznable! Would be happy to test your code. Did you put it on GitHub or somewhere else?
If you don’t want to share publicly you can also send me a DM of course.


#9

No problem, happy to share.

First of all, you have to install pymapd, the Python driver for Mapd.

It is available on PyPI, so you can install it with pip install pymapd. Then you have to download this file:

sqlalchemy_mapd.zip (38.1 KB)

To install it, run setup.py with the install command:

python setup.py install

The connection string format is
mapd://username:password@host:port/database_name
so, for example:
mapd://mapd:HyperInteractive@yourhost:9091/mapd
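The connection string follows the usual SQLAlchemy URL layout, where @ separates the credentials from the host. A minimal stdlib sketch that assembles such a URL and splits it back into its parts; build_mapd_url is a hypothetical helper, and 9091 and "mapd" are simply the port and database used in the example:

```python
from urllib.parse import quote, urlsplit


def build_mapd_url(user, password, host, port=9091, database="mapd"):
    """Hypothetical helper: assemble a URL in the
    mapd://username:password@host:port/database_name layout."""
    # Percent-encode the credentials so characters such as '@', ':' or '/'
    # inside a password cannot be mistaken for URL separators.
    return "mapd://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


url = build_mapd_url("mapd", "HyperInteractive", "yourhost")
print(url)  # mapd://mapd:HyperInteractive@yourhost:9091/mapd

# Split it back apart to check that every component lands where expected.
parts = urlsplit(url)
print(parts.username, parts.password, parts.hostname, parts.port, parts.path.lstrip("/"))
# mapd HyperInteractive yourhost 9091 mapd
```

Encoding the password matters as soon as it contains @ or :, since an unescaped separator shifts everything after it into the wrong field.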

I suggest trying the newest release of Mapd, because for every statement Superset calls create_engine, then connect, and then close on the connection, which doesn’t actually close anything on the database :slight_smile: (close releases the connection back to the SQLAlchemy pool, but instead of reusing it on the next call, Superset closes the connection and reopens it… crazy). So with just 8 sessions you run out of connections quite fast.
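To make the session-exhaustion argument concrete, here is a toy model in pure Python with no real database involved. FakeSessionServer and run_statements are hypothetical names, and the limit of 8 is taken from the post. It assumes, as described above, that each statement opens a fresh server session whose client-side close never reaches the server, and compares that with reusing a single connection:

```python
class FakeSessionServer:
    """Toy stand-in for a database server that accepts at most
    max_sessions concurrent sessions (8, matching the post)."""

    def __init__(self, max_sessions=8):
        self.max_sessions = max_sessions
        self.open_sessions = 0

    def open_session(self):
        if self.open_sessions >= self.max_sessions:
            raise RuntimeError("out of sessions")
        self.open_sessions += 1


def run_statements(server, n_statements, reuse_connection):
    """Issue n_statements. With reuse_connection=False every statement opens
    a new server session; the old one is never closed on the server side."""
    connected = False
    for _ in range(n_statements):
        if not (reuse_connection and connected):
            server.open_session()
            connected = True
    return server.open_sessions


print(run_statements(FakeSessionServer(), 20, reuse_connection=True))  # 1
try:
    run_statements(FakeSessionServer(), 20, reuse_connection=False)
except RuntimeError as exc:
    print("failed:", exc)  # failed: out of sessions
```

With reuse, twenty statements share one session; without it, the ninth statement already fails.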

You can try adding "engine_params": {"poolclass": "NullPool"} to the Extra section of the database connection, but in my Windows environment it doesn’t work with any dialect; even calling SQLAlchemy’s create_engine directly fails, but that may be a problem with my environment only.
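Superset reads the Extra section as JSON, so it has to use plain ASCII double quotes; curly quotes picked up when copying text from a forum post make it invalid. A small stdlib check to run before pasting the text in (parse_extra is a hypothetical helper, not part of Superset):

```python
import json


def parse_extra(text):
    """Return the parsed Extra dict, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Straight quotes: valid JSON.
extra = parse_extra('{"engine_params": {"poolclass": "NullPool"}}')
print(extra["engine_params"])  # {'poolclass': 'NullPool'}

# Curly quotes copied from a web page: rejected.
print(parse_extra('{“engine_params”: {“poolclass”: “NullPool”}}'))  # None
```

This only checks JSON syntax. SQLAlchemy’s create_engine expects poolclass to be a Pool class such as sqlalchemy.pool.NullPool, so whether the "NullPool" string works depends on how a given Superset version passes engine_params through.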

If you want time grains available in Superset, you have to add this class to Superset’s db_engine_specs.py file:

# BaseEngineSpec, Grain and _ are the names already defined/imported in db_engine_specs.py
class MapdEngineSpec(BaseEngineSpec):
    engine = 'mapd'
    time_grains = (
        Grain("Time Column", _('Time Column'), "{col}"),
        Grain("second", _('second'), "DATE_TRUNC(second, {col})"),
        Grain("minute", _('minute'), "DATE_TRUNC(minute, {col})"),
        Grain("hour", _('hour'), "DATE_TRUNC(hour, {col})"),
        Grain("day", _('day'), "DATE_TRUNC(day, {col})"),
        Grain("week", _('week'), "DATE_TRUNC(week, {col})"),
        Grain("month", _('month'), "DATE_TRUNC(month, {col})"),
        Grain("quarter", _('quarter'), "DATE_TRUNC(quarter, {col})"),
        Grain("year", _('year'), "DATE_TRUNC(year, {col})"),
    )
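The third field of each grain is a SQL template in which {col} stands for the chosen time column. A standalone stdlib sketch of how such a template expands; it mirrors the (name, label, function) tuple layout used in the class above but is an illustration, not Superset’s own code, and the column event_ts and table events are hypothetical:

```python
from collections import namedtuple

# Same (name, label, function) layout as the grains above; labels are plain
# strings here instead of the translated _() wrapper.
Grain = namedtuple("Grain", "name label function")

MAPD_TIME_GRAINS = (
    Grain("Time Column", "Time Column", "{col}"),
    Grain("day", "day", "DATE_TRUNC(day, {col})"),
    Grain("month", "month", "DATE_TRUNC(month, {col})"),
)


def grain_expression(grain_name, column):
    """Expand the SQL template of the named grain for the given column."""
    for grain in MAPD_TIME_GRAINS:
        if grain.name == grain_name:
            return grain.function.format(col=column)
    raise KeyError(grain_name)


expr = grain_expression("month", "event_ts")
print(expr)  # DATE_TRUNC(month, event_ts)

# Roughly the shape of a time-series query grouped at that grain.
print("SELECT {} AS ts, COUNT(*) FROM events GROUP BY 1".format(expr))
```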

The performance with Mapd is very good, but it would be better without all those reconnections to the database; they cost 0.1 to 0.2 seconds of lag.

I don’t know if SQL Lab works, because on my installation it simply doesn’t work with any database, but you will get the table and column lists, so I’m confident it will work for you.

Any feedback is appreciated, so I can improve the code.


#10

Will report back as soon as I have had time to try this out, thanks again for sharing!