LEFT JOIN performance


#1

Hi,
I’m currently testing MapD as potential supplement to BigQuery. I’ve constructed a typical query that we run on BQ to run on MapD and was wondering whether the results I’m getting are correct or there is some optimisation potential.

We have two tables with the following # of rows and unique user_ids:

  1. user_stats (facts table)

  2. users (dimensions table)

The field user_id is defined in both tables as “user_id TEXT ENCODING DICT”

We now try to run the following test query.

    SELECT 
    stats.date,
    u.age,
    count(1)
    FROM user_stats_90d stats
    LEFT JOIN users u USING(user_id)

    GROUP BY 1,2
    LIMIT 10

This query takes about 10s to finish (when the data is already in memory)

Is that an expected performance in this case or am I doing something wrong?

My setup:

  • Latest community edition of MapD (cpu only version on Ubuntu)
  • Google Cloud Compute engine instance with 64cpu and 420GB of memory (nothing else running on it)
  • CPU platform: intel skylake
  • 415GB of free memory after the test

Looking forward to your replies.
Best,
Dimitri


#2

Hi @dimitrimasin, thanks for your post.

I noticed that your join is not going to be 1-to-1 (i.e. there are multiple entries in the dimension table for each user), and such joins are likely never to be as performant as 1-to-1 joins. That said, I think we can improve performance by tweaking a few things.

By default MapD partitions data in 32M “fragments”, and when running on CPU we split work between cores by partition/fragment. So in your case only two cores would be used, since 56M rows / 32M default fragment size = 1.75.

The first thing I would try is making the fragment size smaller, in your case perhaps 880000 rows so that each of the 64 cores has roughly equal data to work with(documentation here). I do worry about overhead of merging all that data at the end of the query, so it may end up that a bigger fragment size may work better.

The second thing we can try if fragment resizing doesn’t help performance enough is sharding both tables on user_id (docs here). But I would recommend trying the first approach and seeing if it helps performance first. Let us know what you see.

Regards


#3

@darwin thanks a lot for your suggestions. Whenever I’ve tried to define fragments size MapD didn’t load the data after waiting for hours, no error message, no cpu or disk usage.
I’ve also tried sharding which has worked well. 4 shards brought the execution time to 3 secs, 8 shards to 1 sec and 16 shards to 0.5sec.

Would you expect an additional speed-up from a GPU if it were a GPU setup?


#4

it depends which and how many GPUs you have installed on system.

trying with two tables with a realtion 1-n like your and using 2 cores of an haswell clocked @4.2ghz (you are using an estimate of 13-14) and a pascal GPU with 3584 CUDA Cores the response times are the following

mapdql> \gpu
mapdql> select DATA_EMISSIONE, STATO_FATTURA_PCC,count() from fatture join cig_cup on fatture.NUME_PROG_FATTURA=cig_cup.NUME_PROG_FATTURA group by 1,2 limit 10;
2014-01-01|LA|36
2014-01-02|LA|348
2014-01-03|LA|125
2014-01-04|LA|15
2014-01-05|LA|16
2014-01-06|LA|16
2014-01-07|LA|357
2014-01-08|LA|262
2014-01-09|LA|324
2014-01-10|LA|391
10 rows returned.
Execution time: 378 ms, Total time: 388 ms

mapdql> \cpu
mapdql> select DATA_EMISSIONE, STATO_FATTURA_PCC,count() from fatture join cig_cup on fatture.NUME_PROG_FATTURA=cig_cup.NUME_PROG_FATTURA group by 1,2 limit 10;
2014-01-01|LA|36
2014-01-02|LA|348
2014-01-03|LA|125
2014-01-04|LA|15
2014-01-05|LA|16
2014-01-06|LA|16
2014-01-07|LA|357
2014-01-08|LA|262
2014-01-09|LA|324
2014-01-10|LA|391
10 rows returned.
Execution time: 5936 ms, Total time: 5949 ms

the cardinality of tables is the following

mapdql> select count() from fatture;
61789225
1 rows returned.
Execution time: 14 ms, Total time: 14 ms
mapdql> select count(
) from cig_cup;
39894179
1 rows returned.
Execution time: 14 ms, Total time: 15 ms

using more 12-13 cores the CPU response time would be around 1 sec, so it’s likely that with a gpu enabled system you would get a big speedup


#5

Hi @dimitrimasin,

Sorry to hear that. Can you paste your DDL for the table creation? What fragment size did you use.

Glad to hear that sharding worked well. As @aznable mentioned, adding GPUs should drop the query times significantly, and it should scale well to multiple GPUs particularly with sharding enabled (that said, timings will be dependent on a variety of factors).

Regards