How to import data from MySQL database into Pandas Data Frame

It is easy to load CSV data into Python’s Pandas Data Frame. I always think this is also the case for loading MySQL into Data Frame. However, it turns out to be quite troublesome.

I did a search in Google on how to import MySQL data into Pandas, but most search results were StackOverflow topics about how to import Pandas data into MySQL.

Then I came across this topic, which explicitly said:

No

Pandas has no native way of reading a mysqldump without it passing through a database.

There is a possible workaround, but it is in my opinion a very bad idea.

Interesting :/ I did a search further and found some Pandas’s function about SQL: pandas.read_sql, pandas.read_sql_table. But I couldn’t find good code example on how to use these functions with MySQL database.

Solution for importing MySQL data into Data Frame

I decided to transform data in MySQL database to CSV files because Python is very friendly with CSV.Using PHPMyAdmin, it is easy to

Using PHPMyAdmin, it is easy to export CSV in the desired format. Take note on configurations I am using here, or it will gives error when trying to import.

Step by step: Exporting MySQL database (Require PHPMyAdmin)

  1. In PHPMyAdmin, go into the database you would like to export. Click ‘Export’ tab.
    Screen Shot 2559-09-05 at 22.55.46
  2. Choose ‘Custom’ export method, and set format as ‘CSV’
    mysql-data-to-pandas-2
  3. Select which tables in the database you would like to export. You only need some of them for data analysis purpose.
    Ask developers of the project if you are unsure which tables you need to get which data.
    mysql-data-to-pandas-3
  4. For the output setting, choose ‘Save to file’ and tick ‘Export tables as seperate files’ because you want 1 table per 1 CSV to easily import into data frames.
    mysql-data-to-pandas-4
  5. This step is important!
    Type , as separator for columns as it is the default value for CSV import function in Python.
    Type  to enclose values, because you might run into some data with , sign in it.
    Tick ‘Put column names in the first row’
    mysql-data-to-pandas-5
  6. Now, click ‘Go’ and you should get databasename.csv.zip. You may pat yourself on the back.

    For Mac Users:
    Unfortunately, we are not done here. Why? you asked.
  7. When you extract zip file, you will get one databasename.csv, but you are supposed to get separated CSV files for each table.What you need to do is right click on databasename.csv file and go to  ‘Show Package Content’
    mysql-data-to-pandas-6
    You should find all the CSV files for each table. Copy those data files to where you wanted for Python to get easy access. You may also want to rename the files because they usually have long name.

Note: If you are dealing with big data (large table), you will have to export each table one by one. You can do this by going into table and click ‘Export’ tab. All the configurations are the same, except that you will have the ability to limit how many rows you would like to export.

Step by step: Import data into Pandas data frame

We can simply use pandas.read_csv to import the data files.

import pandas as pd

# Import data
links = pd.read_csv('links.csv')
options = pd.read_csv('options.csv')
postmeta = pd.read_csv('postmeta.csv')
posts = pd.read_csv('posts.csv')

I will write more on how to do data wrangling in the next article. Pandas is very powerful tool.

Please kindly give me feedback about this article, and please let me know if you have other ways to import MySQL data !


Also published on Medium.

One thought on “How to import data from MySQL database into Pandas Data Frame

  1. Thank you for the article.
    I use pd.read_sql

    Here is an example:
    “`
    import pymysql.cursors
    import pandas as pd
    import numpy as np

    mysql_config = {
    ‘host’: ‘localhost’,
    ‘user’: ‘user’,
    ‘password’: ‘password’,
    ‘db’: ‘database’,
    ‘charset’: ‘utf8’,
    ‘cursorclass’: pymysql.cursors.DictCursor
    }
    mysql_connection = pymysql.connect(**mysql_config)
    sql = “SELECT id, col_1, col_2 /* …. */ FROM table WHERE /* … */”
    df = pd.read_sql(sql, mysql_connection, index_col=’id’)
    “`

Leave a Reply

Your email address will not be published. Required fields are marked *