Transfer a MSFT SQL DB to MapD Core


#1

Hi,

I have a .bak file (an MSFT SQL backup file) on an FTP server, and I want to load it into MapD Core. The .bak file is approximately 400 GB, so it is a large file and I want to plan the transfer carefully.

As of now, my plan is:

  1. Connect to the FTP server from an EC2 instance running MSFT SQL Server
  2. Transfer the .bak file, in .zip-compressed form, from the FTP server to my MSFT SQL Server machine
  3. Uncompress the .zip file to get the .bak file (steps 1-3 are sketched after this list)
  4. Restore the DB in MSFT SQL Server
  5. Export the DB (all tables) from MSFT SQL Server to MapD Core
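
For steps 1-3, this is roughly what I have in mind as a script; the host, credentials, and paths are placeholders, not real values:

```python
# Sketch of steps 1-3: stream the .zip down from the FTP server, then unpack
# it to recover the .bak. Host, credentials, and paths are placeholders.
import ftplib
import zipfile

FTP_HOST = "ftp.example.com"        # placeholder
REMOTE_ZIP = "backups/db.bak.zip"   # placeholder
LOCAL_ZIP = r"D:\staging\db.bak.zip"

with ftplib.FTP(FTP_HOST) as ftp:
    ftp.login(user="ftpuser", passwd="secret")  # placeholders
    with open(LOCAL_ZIP, "wb") as out:
        # retrbinary streams the file in blocks, so the ~400 GB archive is
        # never held in memory all at once.
        ftp.retrbinary(f"RETR {REMOTE_ZIP}", out.write, blocksize=1024 * 1024)

with zipfile.ZipFile(LOCAL_ZIP) as zf:
    zf.extractall(r"D:\staging")  # produces the .bak for the RESTORE step
```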

Advice on steps 1-4 is more than welcome, but the part I am really looking for help with is step 5.

  1. What would be a good format to export the DB from MSFT SQL Server in, to make the transfer to MapD Core as seamless as possible?
  2. I was also planning to put the files, in whatever format is recommended in question 1, on EBS storage, and then have MapD Core read the data from an EBS volume attached to the p2.xlarge EC2 instance running MapD. Is reading from EBS faster than reading from S3?

Any advice on this process would be greatly appreciated.

NOTE:
I am using an Amazon m4.xlarge instance for MSFT SQL Server and a p2.xlarge instance running the latest MapD AWS Marketplace version, with S3 and EBS storage available if needed.


#2

Hi,

Thanks for trying out MapD.

MapD can import directly from another DB via SQLImporter; see http://docs.mapd.com/latest/mapd-core-guide/loading-data/#sql-importer. This may be the easiest way to at least get started with your transfer: once the DB is restored in SQL Server, SQLImporter can pull from it, assuming you can reach the source DB through a JDBC driver.
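
A rough sketch of the invocation, driven from Python for convenience. The class name and flags follow the SQLImporter documentation linked above, but verify them against your release; every jar path, host, and credential below is a placeholder:

```python
# Run SQLImporter (a Java utility shipped with MapD) as a subprocess.
# Jar paths, hosts, and credentials are placeholders -- substitute your own.
import subprocess

select_sql = "SELECT * FROM dbo.orders"  # see the limited first-pass variant below

cmd = [
    "java", "-cp",
    # MapD utility jar plus the Microsoft SQL Server JDBC driver:
    "/opt/mapd/bin/mapd-1.0-SNAPSHOT-jar-with-dependencies.jar:"
    "/opt/jdbc/mssql-jdbc.jar",
    "com.mapd.utility.SQLImporter",
    "-u", "mapd", "-p", "HyperInteractive",  # MapD credentials (defaults)
    "-db", "mapd", "--port", "9091",         # target MapD database and port
    "-t", "orders",                          # table to create in MapD
    "-su", "sa", "-sp", "secret",            # source SQL Server credentials
    "-c", "jdbc:sqlserver://mssql-host:1433;databaseName=MyDb",
    "-ss", select_sql,                       # SELECT to run against the source
]
subprocess.run(cmd, check=True)
```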

I would suggest doing a limited import the first time (i.e., restrict your SELECT statement; note that SQL Server uses TOP rather than LIMIT) and then checking the types being used in MapD to make sure they are appropriate to your purpose and expectations. You can adjust them by specifying explicit CASTs in the SELECT, as in the sketch below.
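
For instance, a first pass along those lines might look like this (the table and column names are made up):

```python
# Limited first-pass SELECT: SQL Server uses TOP rather than LIMIT, and the
# explicit CASTs pin down the types MapD will use for the imported columns.
select_sql = (
    "SELECT TOP 10000 "
    "  order_id, "
    "  CAST(amount AS FLOAT) AS amount, "      # e.g. avoid a DECIMAL mapping
    "  CAST(note AS VARCHAR(256)) AS note "
    "FROM dbo.orders"
)
```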

I do have a concern about loading a 400 GB DB onto a single p2.xlarge instance. A p2.xlarge has a single GPU with 12 GB of VRAM and 61 GB of CPU RAM, and MapD is an in-memory DB, so your suggested plan raises some red flags about the sizing of your test machine. You may want to consider a p2.8xlarge, which has 8 of those GPUs (96 GB of VRAM in total) and 488 GB of CPU RAM, much closer to your working-set size.

Regards