Demo: Explore every GitHub PR between 2012 and 2017


#1

Hey guys, just wanted to share something I built over the weekend using MapD:

https://github-investigation.firebaseapp.com/

It may be a little slow because I’m using CPU-only instances at the moment.


#2

Hi @shusson, this is amazing! Crossfilter is a great paradigm for exploring this dataset; it’s very easy to see the relative rise and fall of different languages over time.

I know you are running on a CPU instance, but I had one idea for something that might speed it up. By default we horizontally partition tables (or “fragment” them, in MapD parlance) into fragments of 32M rows. This means that with 50M rows the table is split into only 50/32 = 1.56, i.e. 2, fragments, so at most 2 of the 8 cores on those GCP instances will be busy. You might try reimporting with a smaller fragment size, say 50/8 = 6.25M rows, to see if you get a speedup. (Fragment size is documented here.)
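To make the arithmetic concrete, here is a rough Python sketch of the sizing idea, using the 50M-row, 8-core, and 32M-default figures from this post. The helper names are made up for illustration and are not part of MapD, and it assumes (as the reasoning above does) that each fragment keeps at most one core busy at a time.

```python
import math

ROWS = 50_000_000                    # approximate table size mentioned above
CORES = 8                            # cores on the instance mentioned above
DEFAULT_FRAGMENT_SIZE = 32_000_000   # default fragment size mentioned above


def fragment_count(rows: int, fragment_size: int) -> int:
    """Number of fragments a table of `rows` rows is split into."""
    return math.ceil(rows / fragment_size)


def fragment_size_for_cores(rows: int, cores: int) -> int:
    """Fragment size that yields roughly one fragment per core."""
    return math.ceil(rows / cores)


# With the default size the table ends up in only a couple of fragments,
# so (under the one-core-per-fragment assumption) most cores sit idle.
default_fragments = fragment_count(ROWS, DEFAULT_FRAGMENT_SIZE)

# A smaller fragment size spreads the rows across all cores.
suggested_size = fragment_size_for_cores(ROWS, CORES)
suggested_fragments = fragment_count(ROWS, suggested_size)

print(f"default:   {default_fragments} fragments of up to {DEFAULT_FRAGMENT_SIZE:,} rows")
print(f"suggested: {suggested_fragments} fragments of up to {suggested_size:,} rows")
```

The suggested size would then be passed as the fragment size when the table is recreated and reimported; whether it actually helps is something to measure on your own hardware.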

Best,

Todd


#3

Thanks @todd,

I’m actually already changing the fragment size depending on what infrastructure I deploy on, and can confirm that it gives a significant speedup. Soon I’ll have some increased quotas on AWS, so next time I’ll use P2 spot instances.

Cheers,
Shane


#4

Hi @shusson,

We saw that you had to take your demo down: “Due to server costs this live demo is no longer available. Contact me if you want to see it or deploy it yourself using the instructions on github.”

We at MapD were very impressed with your demo. Let us know if we can help you with hosting it on MapD servers.

Thanks,
Rebecca