Update / Upsert support


#1

Hi
Is there any way to update existing rows in the MapD database?
As an example, suppose we have a window aggregation over 1h and the data is emitted every minute.
We would then want to upsert the new values into MapD.

Are there any plans to allow that functionality?

Regards
T


#2

Hi,
Thanks for the question. We don’t currently support update or upsert, but it is on our development roadmap for later in the year. We’ll make an announcement when it’s ready.

Regards,
Ed


#3

Thanks, looking forward to it


#4

As a follow-up question:
We have a use case where we could generate hundreds of millions of rows per day and need to keep them around for quite a while. At the same time, we generate aggregated data rows for reporting, to minimize the use of raw data.
Currently they are being upserted into a report DB.
How would this use case currently be covered by MapD if there is no update/upsert support?


#5

Hi,

How long is “keep it around for quite a while”?

How much of a window do you want to be able to query on interactively?

We have the concept of a capped collection, where a table is defined with a maximum number of rows and, as more rows are added, the older rows drop off once the maximum size is reached. With this approach users always have the last X rows available for full-granularity queries. So your 100 million rows a day would just land in the DB and be available until the table rolls them off when enough new data is loaded.
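To make the roll-off behaviour concrete, here is a minimal sketch of the idea in Python. It uses the standard library's sqlite3 purely as a stand-in and does not show MapD's own table definition; the table name `events`, the cap of 5 rows, and the helper `insert_capped` are all hypothetical.

```python
import sqlite3

MAX_ROWS = 5  # hypothetical cap; a real capped table would hold far more rows

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)

def insert_capped(conn, payload, max_rows=MAX_ROWS):
    """Append one row, then drop the oldest rows beyond max_rows."""
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
    # Keep only the newest max_rows rows (highest ids); older ones roll off.
    conn.execute(
        "DELETE FROM events WHERE id NOT IN "
        "(SELECT id FROM events ORDER BY id DESC LIMIT ?)",
        (max_rows,),
    )

for i in range(8):
    insert_capped(conn, f"row-{i}")

remaining = [r[0] for r in conn.execute("SELECT payload FROM events ORDER BY id")]
print(remaining)
```

After eight inserts only the five newest rows remain, which is the "last X rows always available" effect described above.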

Rather than update, why can’t you just insert a new aggregated record for each unit of granularity of the items you’re interested in? That is, at the end of each hour, run a query to generate the aggregates you are interested in for the previous hour and insert them into a different SUMMARY table.
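As a rough sketch of that hourly rollup, the following Python uses sqlite3 as a stand-in database (not MapD). The table names `raw_events` and `summary`, their columns, and the helper `rollup_hour` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ts and hour_start are epoch seconds in this sketch.
conn.executescript("""
CREATE TABLE raw_events (ts INTEGER, item TEXT, amount REAL);
CREATE TABLE summary (hour_start INTEGER, item TEXT, total REAL, n INTEGER);
""")

conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [(3600, "a", 1.0), (3700, "a", 2.0), (5000, "b", 5.0), (7200, "a", 9.0)],
)

def rollup_hour(conn, hour_start):
    """Aggregate one completed hour of raw rows and append it to summary."""
    conn.execute(
        """
        INSERT INTO summary (hour_start, item, total, n)
        SELECT ?, item, SUM(amount), COUNT(*)
        FROM raw_events
        WHERE ts >= ? AND ts < ?
        GROUP BY item
        """,
        (hour_start, hour_start, hour_start + 3600),
    )

# Run once at the end of the hour that started at ts = 3600.
rollup_hour(conn, 3600)
rows = conn.execute(
    "SELECT hour_start, item, total, n FROM summary ORDER BY item"
).fetchall()
print(rows)
```

Because each completed hour is only ever inserted once, no existing summary row has to be updated.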

regards


#6

The users need reporting on data spanning several years, which means using the raw data is a bit infeasible.
Doing an hourly rollup would be possible if we can do a union with the raw data from the current hour, as the users would expect more or less real-time data (a lag of a couple of minutes is OK, but preferably none).

Is this something other users of MapD have achieved?
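To show what I mean by the union, here is a small sketch in Python with sqlite3 standing in for the database. Whether MapD accepts a UNION ALL of this shape is exactly what I am unsure about; the same result could also be obtained client-side by running two queries and adding the totals. The table names, columns, and the `report` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (ts INTEGER, item TEXT, amount REAL);
CREATE TABLE summary (hour_start INTEGER, item TEXT, total REAL);
""")

# Hours that have already been rolled up.
conn.executemany(
    "INSERT INTO summary VALUES (?, ?, ?)", [(0, "a", 10.0), (3600, "a", 4.0)]
)
# Raw rows for the hour still in progress (starting at ts = 7200).
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)", [(7210, "a", 1.5), (7300, "a", 2.5)]
)

CURRENT_HOUR = 7200

def report(conn, current_hour):
    """Combine completed-hour summaries with raw rows of the current hour."""
    return conn.execute(
        """
        SELECT item, SUM(total) FROM (
            SELECT item, total FROM summary WHERE hour_start < ?
            UNION ALL
            SELECT item, amount AS total FROM raw_events WHERE ts >= ?
        ) GROUP BY item ORDER BY item
        """,
        (current_hour, current_hour),
    ).fetchall()

totals = report(conn, CURRENT_HOUR)
print(totals)
```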


#7

Hi Tobad,
We wanted to let you know we’ve published a near-term roadmap for the open-source MapD Core project. You can find the link under the table of contents here, or use the direct link: https://github.com/mapd/mapd-core/blob/master/ROADMAP.md

We have not listed an exact timeline associated with these items, but these are our top internal priorities for the project over the coming months.

Regards


#8

Hi,
Has there been any progress on support for update/upsert? I really like the look of MapD, and am considering it for a big finance client of mine. But one of their key requirements is updates.
Alex


#9

Hi @alex,

Thanks for your message. Bulk update/delete is currently in development, although upsert support is not planned at this time. I hope this helps!

Regards


#10

Hi Darwin,
I assume what you are talking about is:

  • users declaring columns as primary keys
  • MapD maintaining an index on a table’s PK
  • looking up in that index each row being inserted, and if found, invalidating the row (assuming you are not modifying in place).
If that is what you are talking about, upsert support should follow from update support.
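The scheme in the list above can be sketched as a small in-memory structure in Python. This only illustrates the idea I am describing and makes no claim about MapD's internals; the class `AppendOnlyTable` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class AppendOnlyTable:
    # Row storage; an invalidated row is replaced by None rather than edited.
    rows: List[Optional[Tuple[Any, ...]]] = field(default_factory=list)
    # Primary-key index: key -> position of the current live row.
    pk_index: Dict[Any, int] = field(default_factory=dict)

    def upsert(self, pk, *values):
        """Invalidate any existing row for pk, then append the new version."""
        old_pos = self.pk_index.get(pk)
        if old_pos is not None:
            self.rows[old_pos] = None  # invalidate, do not modify in place
        self.rows.append((pk, *values))
        self.pk_index[pk] = len(self.rows) - 1

    def live_rows(self):
        """Return only the rows that have not been invalidated."""
        return [r for r in self.rows if r is not None]

table = AppendOnlyTable()
table.upsert("k1", 1)
table.upsert("k2", 2)
table.upsert("k1", 10)  # same key again: old row invalidated, new row appended
print(table.live_rows())
```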
Regards,
Alex

#11

I also need to update my GIS data.


#12

Hi @alex,

No, the update/delete support I am speaking about won’t use an index, but will follow the standard update <table_name> set <column_name> = ... and delete from <table_name> where ... SQL semantics, using our normal fast scan approach. You would of course be able to delete rows based on some set of key values using SQL and then re-insert them.
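The delete-then-re-insert pattern could look roughly like the sketch below, written in Python against sqlite3 as a stand-in rather than against MapD. The `report` table, its key columns `(window_start, item)`, and the helper `replace_rows` are hypothetical, and the transaction shown is sqlite3 behaviour, not a statement about MapD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (window_start INTEGER, item TEXT, total REAL)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?)",
    [(0, "a", 1.0), (0, "b", 2.0), (3600, "a", 3.0)],
)

def replace_rows(conn, new_rows):
    """Delete rows matching each (window_start, item) key, then insert new versions."""
    with conn:  # a single transaction in sqlite3; not assumed for MapD
        conn.executemany(
            "DELETE FROM report WHERE window_start = ? AND item = ?",
            [(w, i) for w, i, _ in new_rows],
        )
        conn.executemany("INSERT INTO report VALUES (?, ?, ?)", new_rows)

# One existing key is replaced, one new key is simply added.
replace_rows(conn, [(0, "a", 5.0), (7200, "a", 7.0)])
result = conn.execute(
    "SELECT * FROM report ORDER BY window_start, item"
).fetchall()
print(result)
```

The net effect matches an upsert on the chosen key, at the cost of a scan for the delete instead of an index lookup.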

Regards


#13

@todd

Is there a timeline for update/delete support?

Thanks
Mawdo


#14

Hi @mawdo1, it’s imminent, hopefully to be released within the next 3-4 weeks.

Regards


#15

Thanks @todd, great news - eagerly looking forward to it.