From Spreadsheet to Database with ddlgenerator
I’ve been using some fun open data sets recently but I’m more familiar with databases than with traditional data science formats (this one is CSV). Importing CSV into Postgres is fairly well-established, but you need a table structure to import into … and I was looking at a CSV with 50 columns (for my post about exoplanets on the Aiven blog). Enter this neat utility: ddlgenerator.
I used it for a CSV file and put the data into PostgreSQL but on closer inspection I realised that this tool can ingest a great selection of different formats, including JSON, YAML and even HTML. And it can create the table structure for any of the dialects that SQLAlchemy can speak — I expected PostgreSQL and MySQL support but the selection is pretty extensive, including SQLite, SQL Server and even Oracle databases.
The tool itself is so simple it’s almost disappointing! Install it with
pip, and it's a CLI tool.
$ pip install ddlgenerator $ ddlgenerator postgres planets.csv
The output is the SQL statements to run on the platform you specified (e.g.
postgres) to create the table structure for the input data (e.g.
planets.csv). I used the built in support to import the CSV file to my new PostgreSQL database, but
ddlgenerator can also generate all the
INSERT statements you'd need to get SQL statements to put the data in the table as well, so that's neat.
If you’re also wrangling data, this is one more tool for the box. And now I’ve written it down, I might remember it next time I need it too :)
Originally published at https://lornajane.net on March 22, 2021.