Cross-filtering/charting with nested aggregation


#1

Hi,

We started using MapD recently and I’m trying to put together a quick demo page after experimenting some with the dashboard. Our data table is in the following (simplified) format:

Columns:
- Date
- Portfolio
- Stock
- Return

One of the charts on the page is a time series chart, displaying the sum of the Return column by the date. This is pretty straightforward to do: just define the dimension on the crossfilter as "Date" and use reduceSum("Return").

On the other chart however, I want to group by Portfolio but I want to display the average of the the daily total returns. In SQL terms, this would be equivalent to the following query:

SELECT Portfolio as key0, AVG(val) FROM (SELECT Portfolio, Date, SUM(Return) AS val FROM MyTable GROUP BY Portfolio, Date) GROUP by key0;

Of course I want these two charts to be crossfiltered to each other, so selecting a specific date range in the first one would filter through the second one.

I cannot figure out how to do this. Can this be done at all?

Thanks for the help in advance!


#2

Hi @siklosg,

Thanks for trying out MapD. Currently I don’t believe there is a way to do what you are asking but adding sub-query support to Immerse is in our longer term plans. We’ll keep you up to date as things progress.

You can always try using mapd-charting and build the custom logic into the SQL generated (contained in mapd-crossfilter, a dependency).

Let us know what ends up being the right course for you.

Regards


#3

Ah based on the above you may already be using mapd-charting. If so your logic would require some customization of mapd-crossfilter, as described above. We’re happy to discuss more if you are interested in going this route.


#4

Thanks for you quick response! I did manage to find a solution for this particular problem, which I’d like to detail here for the interest of the community, as I think this might be useful for others as well. I also have some additional questions that came up in the process.

So basically, I realized that my original query could be transformed into another one that I could do with mapdc.js. The original SQL of:

SELECT Portfolio as key0, AVG(val) FROM (SELECT Portfolio, Date, SUM(Return) AS val FROM MyTable GROUP BY Portfolio, Date) GROUP by key0;

Could be made into:

SELECT Portfolio as key0, SUM(return)/COUNT(DISTINCT(Date)) FROM MyTable GROUP by key0;

This I could do with a “custom” reducer like so:

var dimension = crossfilter.dimension("Portfolio");
var groups = dimension.group.reduce([{ expression: "SUM(return)/COUNT(DISTINCT(Date))", agg_mode: "custom", name: "val" }]);

First, I was only able to find the “custom” aggregation mode by looking at the mapdc.js source code. Is this documented anywhere? Second, this didn’t quite work because the resulting, generated query looked like this:

SELECT Portfolio as key0, SUM(return)/COUNT(DISTINCT(Date)) as val FROM MyTable WHERE DISTINCT(Date) IS NOT NULL GROUP by key0 ORDER BY key0;

This is invalid, because of the **WHERE DISTINCT(Date) IS NOT NULL**. Just for the heck of it, I tried putting in different things in the denominator, inside the COUNT and the generated query always had that in the WHERE clause, requiring it not to be NULL. I’m not sure why that is and how to get rid of that.

I finally managed to get it right by defining a multi-reducer expression, calculating the sum and the count separately and then defining an valueAccessor function for the chart that does the division. (This also ends up playing nicer with divide by zero scenarios, I think…)

I’d still be very interested to find out why the first solution did not work though! Thanks in advance!


Generated SQL for custom reducer
#5

Found the solution and posted it here.


#6

10x for sharing with community