MapD CE crashes on relatively small dataset with a star schema (resolved in v 3.3.0)


#1

Hi.

I have errors Thrift: Thu Jul 27 14:59:50 2017 TSocket::write_partial() send() <Host: localhost Port: 9091>Broken pipe and other errors in situation that very similiar to that unresolved topic: SlabTooBig exception

I have relatively small datase with a classical star schemat: 78M rows in fact table and 30K in biggest dimension table

Given that I get the server crashed with such queries as

select productName, storeName, sum(measure2) as salesAmount
  from main_data, invent_table, invent_location
  where main_data.itemKey=invent_table.itemKey
   and main_data.locationKey=invent_location.locationKey
   and documentType = 'Sales'
  group by productName, storeName
  order by salesAmount desc
 limit 100;

explain calcite for that query looks reasonably:

  Explanation
LogicalSort(sort0=[$2], dir0=[DESC])
  LogicalAggregate(group=[{0, 1}], salesAmount=[SUM($2)])
    LogicalProject(productName=[$16], storeName=[$33], measure2=[$7])
      LogicalFilter(condition=[AND(=($2, $13), =($3, $31), =($1, 'Sales'))])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner])
            EnumerableTableScan(table=[[mapd, main_data]])
            EnumerableTableScan(table=[[mapd, invent_table]])
          EnumerableTableScan(table=[[mapd, invent_location]])

schemas for these three tables are:

mapdql> \d main_table
CREATE TABLE main_data (
dateKey SMALLINT,
documentType TEXT ENCODING DICT(8),
itemKey SMALLINT,
locationKey SMALLINT,
propertyKey SMALLINT,
itemLocationKey INTEGER,
measure1 FLOAT,
measure2 FLOAT,
measure3 FLOAT,
measure4 FLOAT,
measure5 FLOAT,
liquidity TEXT ENCODING DICT(32))


mapdql> \d invent_table
CREATE TABLE invent_table (
itemKey SMALLINT,
productCounter SMALLINT,
productCode TEXT ENCODING DICT(32),
productName TEXT ENCODING DICT(32),
vendorCode TEXT ENCODING DICT(32),
Direction TEXT ENCODING DICT(32),
Category TEXT ENCODING DICT(32),
Groups TEXT ENCODING DICT(32),
Subgroup TEXT ENCODING DICT(32),
Unit TEXT ENCODING DICT(32),
Brand TEXT ENCODING DICT(32),
categoryManager TEXT ENCODING DICT(32),
RatingProductPOCompanyTO SMALLINT,
SalesABCc TEXT ENCODING DICT(8),
MarkupABCc TEXT ENCODING DICT(8),
XYZ TEXT ENCODING DICT(32),
XyzProductCVofMonth INTEGER)

mapdql> \d invent_location
CREATE TABLE invent_location (
locationKey SMALLINT,
groupStores TEXT ENCODING DICT(32),
storeName TEXT ENCODING DICT(32),
storeCode TEXT ENCODING DICT(32),
storeType TEXT ENCODING DICT(32),
companyAll TEXT ENCODING DICT(32),
storesSubgroup TEXT ENCODING DICT(32),
groupSubdivision TEXT ENCODING DICT(32),
company TEXT ENCODING DICT(32),
subdivision TEXT ENCODING DICT(32),
roleSubdivision TEXT ENCODING DICT(32),
organization TEXT ENCODING DICT(32),
supervizer TEXT ENCODING DICT(32),
storeTag TEXT ENCODING DICT(32))

Please, tell me, what I’m doing wrong here


#2

Some additional information.
I make that research on a ubuntu machine from velocihost.net with 32Gb of memory and NVidia pascal 1060 with 6Gb of memory
What is interesting that similiar query on fact table run just fine and even not tap into so much memory.


select itemKey, locationKey, sum(measure2) as salesAmount
  from main_data
  where 
   documentType = 'Sales'
  group by itemKey, locationKey
  order by salesAmount desc
  LIMIT 100;

.....

7459|53|0.000000
2672|49|0.000000
3779|47|0.000000
3779|8|0.000000
11828|161|0.000000
7459|51|0.000000
1155|78|0.000000
6981|5|0.000000
12261|29|0.000000
17687|57|0.000000
100 rows returned.
Execution time: 189 ms, Total time: 190 ms
mapdql>

mapdql> \memory_summary
MapD Server CPU Memory Summary:
            MAX            USE      ALLOCATED           FREE
    25711.76 MB      640.75 MB     4096.00 MB     3455.25 MB

MapD Server GPU Memory Summary:
[GPU]            MAX            USE      ALLOCATED           FREE
  [0]     5467.91 MB      640.75 MB     2048.00 MB     1407.25 MB


mapdql> \status
The Server Version Number  : 3.2.4-20171006-6a29a52
The Server Start Time      : 2017-10-26 : 14:47:8
The Server edition         : 3.2.4-20171006-6a29a52
mapdql>


So problems start to appear when I try to use dimensions in my queries


#3

Hi,

Is it possible to get the mapd_server.INFO log from the failed exectuion and the data content for the tables?

regards


#4

Definitely.
We heavily depend on a classical star schema model in our application.
If my current errors are unexpected and could be fixed I will make all necessary efforts to help to reproduce this scenario. I will provide logs in several hours ( we are on opposite timezones with SF) . It will take some time to obfuscate a data but after that we are ready to provide a failing dataset or temporary access to this server if that would be more effective. Any other action on my behalf than could help to solve this problem?


#5

Hi,

If you can reproduce your issue on the smallest data set possible is obviously the best way for us to try to resolve quickly.

Getting us the logs plus some cardinalty information on the join would be the minimum starting position to be able to try to offer any help.

Working “up” through a layer of simpler single table joins on the two tables dimension tables would also be useful to start gathering some base information.

regards


#6

Actually that is just a bit of additional information I forgot to mention, but you pointed it out. Query with just one join (fact table and one of dimension) works just fine. I’ll be able to provide examples of working queries simultaneously with logs. As first things available.


#7

Just to note major improvements to our join functionality are landing in 3.3.0 which should be released in the next few days. It does a better job of folding multi-way joins so that we don’t need to materialize intermediate join results. Would you be ok with waiting until that lands and then seeing if it fixes your issue? If not then we can go from there.


#8

And as to cardinality : invent_table has 30 000 rows and invent_location less then that.


#9

Can you tell us what version you are running? Or are you building from source?


#10

Good. We are looking forward to v3.3


#11

My current version is 3.2.4. Full output from \status command is in second comment if it could help. I installed ce through the Ubuntu apt installation instructions.


#12

I paste mapd_server.INFO content below, last lines there are

I1027 02:23:50.759819  3920 Calcite.cpp:260] Time in Thrift 15 (ms), Time in Java Calcite server 1922 (ms)
I1027 02:23:50.763725 18685 BufferMgr.cpp:266] ALLOCATION slab of 4194304 pages (2147483648B) created in 2 ms GPU_MGR:0
I1027 02:23:50.763916 18685 BufferMgr.cpp:266] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:100:

Looks like it trying to allocate on GPU more data then memory is available?

Log file created at: 2017/10/27 02:23:35
Running on machine: dev
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1027 02:23:35.576241  3871 MapDServer.cpp:602] MapD started with data directory at '/var/lib/mapd/data'
I1027 02:23:35.576483  3871 MapDServer.cpp:609]  Watchdog is set to 1
I1027 02:23:35.576499  3871 MapDServer.cpp:631]  cuda block size 0
I1027 02:23:35.576510  3871 MapDServer.cpp:632]  cuda grid size  0
I1027 02:23:35.576519  3871 MapDServer.cpp:633]  calcite JVM max memory  1024
I1027 02:23:35.576529  3871 MapDServer.cpp:634]  MapD Server Port  9091
I1027 02:23:35.576537  3871 MapDServer.cpp:635]  MapD Calcite Port  9093
I1027 02:23:35.576581  3871 MapDHandler.cpp:154] MapD Server 3.2.4-20171006-6a29a52
I1027 02:23:36.458072  3871 CudaMgr.cpp:127] Using 1 Gpus.
I1027 02:23:36.458158  3871 DataMgr.cpp:120] cpuSlabSize is 4096M
I1027 02:23:36.458181  3871 DataMgr.cpp:122] reserved GPU memory is 604.837M includes render buffer allocation
I1027 02:23:36.458202  3871 DataMgr.cpp:132] gpuSlabSize is 2048M
I1027 02:23:36.458349  3871 FileMgr.cpp:173] Completed Reading table's file metadata, Elasped time : 0ms Epoch: 0 files read: 0 table location: '/var/lib/mapd/data/mapd_data/table_0_0/'
I1027 02:23:36.458384  3871 Calcite.cpp:156] Creating Calcite Handler,  Calcite Port is 9093 base data dir is /var/lib/mapd/data
I1027 02:23:36.458392  3871 Calcite.cpp:95] Running calcite server as a daemon
I1027 02:23:36.781669  3871 Calcite.cpp:124] Calcite server start took 300 ms 
I1027 02:23:36.781714  3871 Calcite.cpp:125] ping took 20 ms 
I1027 02:23:36.791291  3871 Calcite.cpp:281] [{"name":"Tan","ret":"double","args":["double"]},{"name":"Truncate__","ret":"float","args":["float","i32"]},{"name":"ln","ret":"double","args":["double"]},{"name":"distance_in_meters__","ret":"double","args":["float","float","float","float"]},{"name":"Floor__","ret":"float","args":["float"]},{"name":"Floor__2","ret":"i32","args":["i32"]},{"name":"Floor__3","ret":"i64","args":["i64"]},{"name":"rect_pixel_bin","ret":"float","args":["float","float","float","i32","i32"]},{"name":"Truncate","ret":"double","args":["double","i32"]},{"name":"Floor__1","ret":"i16","args":["i16"]},{"name":"radians","ret":"double","args":["double"]},{"name":"degrees","ret":"double","args":["double"]},{"name":"Ceil__","ret":"float","args":["float"]},{"name":"ln__","ret":"double","args":["float"]},{"name":"approx_distance_in_meters","ret":"double","args":["float","float","float","float"]},{"name":"reg_hex_vert_pixel_bin_y","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"reg_hex_vert_pixel_bin_x","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"Ceil__2","ret":"i32","args":["i32"]},{"name":"Ceil__1","ret":"i16","args":["i16"]},{"name":"Ceil__3","ret":"i64","args":["i64"]},{"name":"power","ret":"double","args":["double","double"]},{"name":"conv_4326_900913_x","ret":"double","args":["double"]},{"name":"reg_hex_horiz_pixel_bin_y","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"conv_4326_900913_y","ret":"double","args":["double"]},{"name":"Atan","ret":"double","args":["double"]},{"name":"reg_hex_horiz_pixel_bin_x","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"Floor","ret":"double","args":["double"]},{"name":"Log10","ret":"double","args":["double"]},{"name":"Truncate__1","ret":"i16","args":["i16","i32"]},{"name":"Truncate__2","ret":"i32","args":["i32","i32"]},{"name":"Log","ret":"double","args":["double"]},{"name":"Truncate__3","ret":"i64","args":["i64","i32"]},{"name":"Cos","ret":"double","args":["double"]},{"name":"Log__","ret":"double","args":["float"]},{"name":"Log10__","ret":"double","args":["float"]},{"name":"Asin","ret":"double","args":["double"]},{"name":"Cot","ret":"double","args":["double"]},{"name":"Tan__","ret":"double","args":["float"]},{"name":"rect_pixel_bin_x","ret":"float","args":["float","float","float","float","float","i32"]},{"name":"rect_pixel_bin_y","ret":"float","args":["float","float","float","float","float","i32"]},{"name":"distance_in_meters","ret":"double","args":["double","double","double","double"]},{"name":"round_to_digit","ret":"double","args":["double","i32"]},{"name":"Atan2","ret":"double","args":["double","double"]},{"name":"Acos","ret":"double","args":["double"]},{"name":"pi","ret":"double","args":[]},{"name":"Sin","ret":"double","args":["double"]},{"name":"Ceil","ret":"double","args":["double"]},{"name":"Exp","ret":"double","args":["double"]}]
I1027 02:23:36.791652  3871 MapDHandler.cpp:199] Started in GPU mode
I1027 02:23:36.803804  3871 EglGLWindow.cpp:107] Window Setting: DRAWABLE_TYPE: PBUFFER.
I1027 02:23:36.803875  3871 EglUtils.cpp:61] EGL Setting: BITS_RGBA = 8.
I1027 02:23:36.803892  3871 EglUtils.cpp:84] EGL Setting: BITS_ALPHA = 8.
I1027 02:23:36.803972  3871 EglGLRenderer.cpp:101] Renderer Setting: USE_CORE_PROFILE: True.
I1027 02:23:36.866713  3871 EglGLRenderer.cpp:170] Renderer Setting: <OPENGL_MAJOR>.<OPENGL_MINOR>: 4.5.
I1027 02:23:37.097955  3871 QueryRenderManager.cpp:306] QueryRenderManager initialized for rendering. start GPU: 0, num GPUs: 1, Render cache limit: 500
I1027 02:23:48.821068  3920 MapDHandler.cpp:351] User mapd connected to database mapd
I1027 02:23:48.821497  3920 MapDHandler.cpp:563] sql_execute :hXTJf6mEN9CXPH1Qh3TKYNWzIEqnWgI2:query_str:select productName, storeName, sum(measure2) as salesAmount   from main_data, invent_table, invent_location   where main_data.itemKey=invent_table.itemKey
   and main_data.locationKey=invent_location.locationKey
   and documentType = 'Sales'
  group by productName, storeName
  order by salesAmount desc
 limit 100;
I1027 02:23:48.822662  3920 Calcite.cpp:247] User mapd catalog mapd sql 'select productName, storeName, sum(measure2) as salesAmount   from main_data, invent_table, invent_location   where main_data.itemKey=invent_table.itemKey
   and main_data.locationKey=invent_location.locationKey
   and documentType = 'Sales'
  group by productName, storeName
  order by salesAmount desc
 limit 100;'
I1027 02:23:49.759760  3922 FileMgr.cpp:173] Completed Reading table's file metadata, Elasped time : 9ms Epoch: 82 files read: 7 table location: '/var/lib/mapd/data/mapd_data/table_1_12/'
I1027 02:23:49.766551  3922 Catalog.cpp:1439] Instantiating Fragmenter for table main_data took 16ms
I1027 02:23:49.813925  3922 Catalog.cpp:1485] Time to load Dictionary 105 was 47ms
I1027 02:23:49.858645  3922 Catalog.cpp:1485] Time to load Dictionary 106 was 44ms
I1027 02:23:49.893241  3923 FileMgr.cpp:173] Completed Reading table's file metadata, Elasped time : 6ms Epoch: 1 files read: 2 table location: '/var/lib/mapd/data/mapd_data/table_1_11/'
I1027 02:23:49.893646  3923 Catalog.cpp:1439] Instantiating Fragmenter for table invent_table took 7ms
I1027 02:23:49.940206  3923 Catalog.cpp:1485] Time to load Dictionary 92 was 46ms
I1027 02:23:49.986261  3923 Catalog.cpp:1485] Time to load Dictionary 93 was 45ms
I1027 02:23:50.031177  3923 Catalog.cpp:1485] Time to load Dictionary 94 was 44ms
I1027 02:23:50.054100  3923 Catalog.cpp:1485] Time to load Dictionary 95 was 22ms
I1027 02:23:50.087842  3923 Catalog.cpp:1485] Time to load Dictionary 96 was 33ms
I1027 02:23:50.131839  3923 Catalog.cpp:1485] Time to load Dictionary 97 was 43ms
I1027 02:23:50.176362  3923 Catalog.cpp:1485] Time to load Dictionary 98 was 44ms
I1027 02:23:50.221256  3923 Catalog.cpp:1485] Time to load Dictionary 99 was 44ms
I1027 02:23:50.253933  3923 Catalog.cpp:1485] Time to load Dictionary 100 was 32ms
I1027 02:23:50.266898  3923 Catalog.cpp:1485] Time to load Dictionary 101 was 12ms
I1027 02:23:50.280084  3923 Catalog.cpp:1485] Time to load Dictionary 102 was 13ms
I1027 02:23:50.292654  3923 Catalog.cpp:1485] Time to load Dictionary 103 was 12ms
I1027 02:23:50.304687  3923 Catalog.cpp:1485] Time to load Dictionary 104 was 11ms
I1027 02:23:50.314812  3924 FileMgr.cpp:173] Completed Reading table's file metadata, Elasped time : 5ms Epoch: 1 files read: 2 table location: '/var/lib/mapd/data/mapd_data/table_1_10/'
I1027 02:23:50.315193  3924 Catalog.cpp:1439] Instantiating Fragmenter for table invent_location took 5ms
I1027 02:23:50.327558  3924 Catalog.cpp:1485] Time to load Dictionary 79 was 12ms
I1027 02:23:50.339241  3924 Catalog.cpp:1485] Time to load Dictionary 80 was 11ms
I1027 02:23:50.350872  3924 Catalog.cpp:1485] Time to load Dictionary 81 was 11ms
I1027 02:23:50.362421  3924 Catalog.cpp:1485] Time to load Dictionary 82 was 11ms
I1027 02:23:50.373755  3924 Catalog.cpp:1485] Time to load Dictionary 83 was 11ms
I1027 02:23:50.385257  3924 Catalog.cpp:1485] Time to load Dictionary 84 was 11ms
I1027 02:23:50.396857  3924 Catalog.cpp:1485] Time to load Dictionary 85 was 11ms
I1027 02:23:50.408501  3924 Catalog.cpp:1485] Time to load Dictionary 86 was 11ms
I1027 02:23:50.420563  3924 Catalog.cpp:1485] Time to load Dictionary 87 was 12ms
I1027 02:23:50.432111  3924 Catalog.cpp:1485] Time to load Dictionary 88 was 11ms
I1027 02:23:50.443601  3924 Catalog.cpp:1485] Time to load Dictionary 89 was 11ms
I1027 02:23:50.455441  3924 Catalog.cpp:1485] Time to load Dictionary 90 was 11ms
I1027 02:23:50.466943  3924 Catalog.cpp:1485] Time to load Dictionary 91 was 11ms
I1027 02:23:50.759819  3920 Calcite.cpp:260] Time in Thrift 15 (ms), Time in Java Calcite server 1922 (ms)
I1027 02:23:50.763725 18685 BufferMgr.cpp:266] ALLOCATION slab of 4194304 pages (2147483648B) created in 2 ms GPU_MGR:0
I1027 02:23:50.763916 18685 BufferMgr.cpp:266] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0


#13

Actually we have found a workaround. There is a working variant for a similiar query. (That query even join and use three dimension tables, not two)

select t.Category, l.subdivision, -sum(s.measure) salesQty
from (select d.itemKey, d.locationKey, sum(d.measure1) as measure
      from main_data d, calendar c
      where c.dateKey=d.dateKey
        and d.documentType = 'Sales'
        and c.yearNum=2017
        and c.monthNum=8
      group by d.itemKey, d.locationKey, c.yearNum, c.monthNum) s
left join invent_table t on t.itemKey=s.itemKey
left join invent_location l on l.locationKey=s.locationKey
group by t.Category, l.subdivision;

Explanation for it looks like:

                                                                group by t.Category, l.subdivision;
Explanation
LogicalProject(Category=[$0], subdivision=[$1], ПродажиКоличество=[-($2)])
  LogicalAggregate(group=[{0, 1}], agg#0=[SUM($2)])
    LogicalProject(Category=[$9], subdivision=[$30], measure=[$2])
      LogicalJoin(condition=[=($21, $1)], joinType=[left])
        LogicalJoin(condition=[=($3, $0)], joinType=[left])
          LogicalProject(itemKey=[$0], locationKey=[$1], measure=[$4])
            LogicalAggregate(group=[{0, 1, 2, 3}], measure=[SUM($4)])
              LogicalProject(itemKey=[$2], locationKey=[$3], yearNum=[$15], monthNum=[$17], measure1=[$6])
                LogicalFilter(condition=[AND(=($13, $0), =($1, 'Sales'), =(CAST($15):INTEGER, 2017), =(CAST($17):INTEGER, 8))])
                  LogicalJoin(condition=[true], joinType=[inner])
                    EnumerableTableScan(table=[[mapd, main_data]])
                    EnumerableTableScan(table=[[mapd, calendar]])
          EnumerableTableScan(table=[[mapd, invent_table]])
        EnumerableTableScan(table=[[mapd, invent_location]])


#14

As Darwin said 3.3 version was landing soon.
I’ve just upgraded mapd server and with that release old queries that crashed server now work just fine for me.
Thank you for great job


How could I implement paging on sorted dataset?
#15

Hi,

Thanks for the update, that is great that you let us know.

regards