MapD Tweetmap demo


#1

I am interested in learning about how MapD generates top hashtags for a particular spatio-temporal query.

As I understand, MapD Core will retrieve all tweets inside a spatio-temporal region and generate top hashtags from the results. Is everything done on GPU?


#2

Hi,

Thanks for your interest in MapD.

Yes, the top hashtags query runs entirely on the GPU. It filters the set of hashtags down by the spatio-temporal constraints and then does a count with GROUP BY on the unnested hashtags.

Hashtags are stored as an array of strings.

An example query looks like this:

SELECT UNNEST(hashtags) AS key0, COUNT(*) AS val
FROM tweets_new
WHERE (goog_x >= -20037508.33999941 AND goog_x < 20037508.33999947)
  AND (goog_y >= -19362086.710562475 AND goog_y < 19362086.71052295)
GROUP BY key0
ORDER BY val DESC
LIMIT 227
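
For readers who want to see what that query computes, here is a minimal CPU-side sketch in Python. It only illustrates the filter, unnest, and count-group-by steps; it is not how MapD Core executes them on the GPU, and the function name top_hashtags and the sample rows are made up for the example.

```python
from collections import Counter

def top_hashtags(tweets, x_min, x_max, y_min, y_max, limit):
    """Emulate: SELECT UNNEST(hashtags), COUNT(*) ... GROUP BY ... ORDER BY ... DESC LIMIT."""
    counts = Counter()
    for tweet in tweets:
        # WHERE clause: half-open bounding box on the projected coordinates.
        if x_min <= tweet["goog_x"] < x_max and y_min <= tweet["goog_y"] < y_max:
            # UNNEST + COUNT(*): every array element contributes one row to its group.
            counts.update(tweet["hashtags"])
    # ORDER BY val DESC LIMIT n
    return counts.most_common(limit)

# Hypothetical sample rows, only for illustration.
sample = [
    {"goog_x": 0.0, "goog_y": 0.0, "hashtags": ["gpu", "sql"]},
    {"goog_x": 10.0, "goog_y": 5.0, "hashtags": ["gpu"]},
    {"goog_x": 1e9, "goog_y": 0.0, "hashtags": ["gpu", "far"]},  # outside the box
]
result = top_hashtags(sample, -100, 100, -100, 100, 2)
```

A tweet with an empty hashtags array simply contributes nothing to the counts in this sketch.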

regards


#3

Very interesting! Thank you for your response. So, as I understand it, the hashtags column in the tweets_new table is encoded as TEXT DICT?


#4

Hi

It is an array of dictionary-encoded text.

The table definition is:

CREATE TABLE tweets_new (
tweet_id TEXT ENCODING NONE,
tweet_time TIMESTAMP ENCODING FIXED(32),
lat FLOAT,
lon FLOAT,
sender_id BIGINT,
sender_name TEXT ENCODING DICT(32),
location TEXT ENCODING DICT(32),
source TEXT ENCODING DICT(16),
reply_to_user_id BIGINT,
reply_to_tweet_id BIGINT,
lang TEXT ENCODING DICT(8),
followers INTEGER,
followees INTEGER,
tweet_count INTEGER,
join_time TIMESTAMP ENCODING FIXED(32),
tweet_text TEXT ENCODING NONE,
country TEXT ENCODING DICT(8),
admin1 TEXT ENCODING DICT(32),
admin2 TEXT ENCODING DICT(16),
place_name TEXT ENCODING DICT(32),
state_abbr TEXT ENCODING DICT(8),
county_state TEXT ENCODING DICT(16),
origin TEXT ENCODING DICT(8),
hashtags TEXT[] ENCODING DICT(32),
tweet_tokens TEXT[] ENCODING DICT(32),
goog_x FLOAT,
goog_y FLOAT,
is_exact BOOLEAN)
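
To illustrate what dictionary encoding of a text array means conceptually, here is a minimal Python sketch. It assumes only the general idea of dictionary encoding (each distinct string is mapped to a small integer id, and the arrays store ids instead of strings); it is not MapD Core's actual storage code, and dict_encode is a hypothetical helper.

```python
def dict_encode(rows):
    """Map each distinct string to an integer id and rewrite the arrays as ids."""
    ids = {}        # string -> integer id, assigned in order of first appearance
    encoded = []    # one list of ids per input row
    for tags in rows:
        encoded.append([ids.setdefault(tag, len(ids)) for tag in tags])
    return ids, encoded

# Hypothetical hashtag arrays, only for illustration.
dictionary, encoded_rows = dict_encode([["gpu", "sql"], ["gpu"], []])
```

With this layout, grouping and counting can work on integers, and the strings only need to be looked up for the final result rows.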

regards


#5

Thank you so much for that information! I plan to use MapD Core to create something similar to Tweetmap, and this would help me a lot!


#6

Great,

Hope to see it in action soon.

One note: you do not need to use the Mercator projection (goog_x, goog_y) anymore; you can use lat and lon directly.
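
For reference, the relationship between the two coordinate pairs can be sketched as below. This assumes goog_x and goog_y hold spherical Web Mercator metres; the ±20037508.34 bounds in the earlier query suggest this, but it is an assumption, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by spherical Web Mercator

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Longitude 180 on the equator maps to the eastern edge of the projected plane.
x_edge, y_equator = lonlat_to_web_mercator(180.0, 0.0)
```

The projection is undefined at the poles, which is why Web Mercator maps are usually clipped near ±85.05 degrees latitude.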

There is a project underway to bring the full Tweetmap app up to date and share it as a sample of a custom-built app on the MapD framework, but it may be too late for you if you are already underway.

regards


#7

Thank you for this information! Actually I am working on the backend right now, so hopefully I will be able to reuse part of the frontend.


#8

Do Arrays work in the current release?

I only ask because I do not see them mentioned in the documentation (but boy, am I glad to see them)


#9

Hey acmeguy,
MapD Core does have basic support for arrays, currently for non-distributed architectures only. For example:

select unnest(hashtags),count(*) from tweets group by unnest(hashtags) limit 50

Further documentation here.

Best,
Ed


#10

Can confirm. I was able to create a simple tweets table with tweet_text TEXT[] ENCODING DICT(32) and do aggregation based on that field.