[Google Analytics API] How to get data from Google Analytics with Python in Jupyter Notebook (with tutorial file)

Today I found an online tool that pulls the stats of published articles from Google Analytics. That’s how I got interested in the Google Analytics API. As I am studying Data Science at the moment, knowing how to do web analytics would open up a lot of new possibilities.

I started Googling and found this basic tutorial for the Analytics Reporting API V4 (Google Analytics is the product name, so the API itself is not called “Google Analytics API”!). By following this tutorial, I was able to get data from the Google Analytics platform successfully.

However, I struggled at several points because the tutorial has no pictures. For example, it was not obvious how to find the view ID or the service account email.

Therefore, I created this guide with lots of screenshots, and I also provide the Jupyter Notebook file so you can easily follow along and retrieve data from Google Analytics.

Step-by-Step Guide to get data from Google Analytics in Python

1. Install Google API Client in Python

Create a new Jupyter Notebook file, then add this code to the first block.

# Install Google API Client (oauth2client is listed explicitly because the code in step 3 imports it)
!pip install --upgrade google-api-python-client oauth2client

The leading ! runs pip from inside Jupyter Notebook. If you run the command from a terminal, just remove the exclamation mark.

After the client is installed, go to the next step.
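
If you want to confirm the installation before moving on, a quick check like the one below can help. This is only a minimal sketch: the helper name missing_packages is my own, and the list of module names is an assumption based on the imports used in step 3.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Top-level modules imported by the code later in this guide (assumed list).
REQUIRED = ['googleapiclient', 'oauth2client', 'httplib2']

missing = missing_packages(REQUIRED)
if missing:
    print('Not installed yet: ' + ', '.join(missing))
else:
    print('All required packages are installed.')
```

If anything is reported as missing, run the install block again and restart the notebook kernel.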

2. Create a new project & a new service account

This part is the most confusing in my opinion, because the tutorial does not say much about the steps we have to go through. But don’t worry, I have added enough screenshots to help you here.

First, go to Google Console and create a new project. You can name it anything you want.

The steps to create a new project. You might not have to click 1. if you haven’t created any project yet.

Then select the project, and create a service account key by going to Credentials > Create credentials > Service account key.

How to create a new service account

Fill in the form as in the image below. For the key type, choose P12, because the code in step 4 loads the key with from_p12_keyfile().

Name the credential anything you want.

When you click the ‘Create’ button, you will be asked to confirm that you want to create the credential without a role. Just click the ‘Create without Role’ button to confirm.

After the credential is created, the key file will be downloaded. This file will be used in the next step to connect to the Google API.

You will then be taken back to the credentials page. Click the “Manage Service Account” link.

The next part is the most important. Simply copy the email from the “Service account ID” column. This email will be used in the next step.

3. Connect Jupyter Notebook to Google Analytics

Add this code to a new block in Jupyter Notebook.

"""Hello Analytics Reporting API V4."""

import httplib2

# googleapiclient is the current package name; apiclient is a legacy alias for it.
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials


SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
DISCOVERY_URI = ('https://analyticsreporting.googleapis.com/$discovery/rest')
KEY_FILE_LOCATION = '<FILE-NAME-HERE>'
SERVICE_ACCOUNT_EMAIL = '<SERVICE-EMAIL-HERE>'
VIEW_ID = '<VIEW-ID-HERE>'

There are 3 places for you to fill in.

  1. KEY_FILE_LOCATION : Move the key file from the previous step into the same folder as the Jupyter Notebook file, then put the file name here. For example, the key file was called ‘download’ when I used Google Chrome.
  2. SERVICE_ACCOUNT_EMAIL : Paste the email that you copied in the previous step.
  3. VIEW_ID : This ID can be retrieved from Google Analytics. I will show you how to retrieve it below.

When you go into Google Analytics, you will be able to select the account, property, and view whose stats you want to see. The number below each view is the view ID.

After these 3 places are filled in with the correct values, we can start querying data from Google Analytics. If the query later fails with a permission error, check that the service account email has been added as a user with read access to that view in Google Analytics.
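
Forgetting to replace one of the placeholders is an easy mistake, so a small sanity check before querying can save some confusion. The sketch below is not part of the original tutorial: the function names are my own, and the rule that an unfilled value looks like '<...>' is an assumption based on the template above.

```python
import os


def is_placeholder(value):
    """True if the value is empty or still looks like '<SOMETHING-HERE>'."""
    return (not value) or (value.startswith('<') and value.endswith('>'))


def find_config_problems(key_file, service_email, view_id):
    """Return a list of readable problems; an empty list means all looks fine."""
    problems = []
    if is_placeholder(key_file):
        problems.append('KEY_FILE_LOCATION is not filled in')
    elif not os.path.isfile(key_file):
        problems.append('Key file not found: ' + key_file)
    if is_placeholder(service_email):
        problems.append('SERVICE_ACCOUNT_EMAIL is not filled in')
    elif '@' not in service_email:
        problems.append('SERVICE_ACCOUNT_EMAIL does not look like an email')
    if is_placeholder(view_id):
        problems.append('VIEW_ID is not filled in')
    return problems


# Demo with the untouched template values. In the notebook, pass
# KEY_FILE_LOCATION, SERVICE_ACCOUNT_EMAIL and VIEW_ID instead.
for problem in find_config_problems('<FILE-NAME-HERE>',
                                    '<SERVICE-EMAIL-HERE>',
                                    '<VIEW-ID-HERE>'):
    print(problem)
```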

4. Getting data from Google Analytics

Create a new block in Jupyter Notebook and add the code below. This code is modified from the basic tutorial.

def initialize_analyticsreporting():
  """Initializes an analyticsreporting service object.

  Returns:
    analytics an authorized analyticsreporting service object.
  """

  credentials = ServiceAccountCredentials.from_p12_keyfile(
    SERVICE_ACCOUNT_EMAIL, KEY_FILE_LOCATION, scopes=SCOPES)

  http = credentials.authorize(httplib2.Http())

  # Build the service object.
  analytics = build('analytics', 'v4', http=http, discoveryServiceUrl=DISCOVERY_URI)

  return analytics


def get_report(analytics):
  # Use the Analytics Service Object to query the Analytics Reporting API V4.
  return analytics.reports().batchGet(
      # Alternative query: get the number of sessions from the last 7 days
      # body={
      #     'reportRequests': [
      #         {
      #             'viewId': VIEW_ID,
      #             'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
      #             'metrics': [{'expression': 'ga:sessions'}]
      #         }
      #     ]
      # }
      # Get stats of the posts viewed in the last 7 days
      body={
          'reportRequests': [
              {
                  'viewId': VIEW_ID,
                  'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
                  'metrics': [
                      {'expression': 'ga:pageviews'},
                      {'expression': 'ga:uniquePageviews'},
                      {'expression': 'ga:timeOnPage'},
                      {'expression': 'ga:bounces'},
                      {'expression': 'ga:entrances'},
                      {'expression': 'ga:exits'}
                  ],
                  "dimensions": [
                      {"name": "ga:pagePath"}
                  ],
                  "orderBys": [
                      {"fieldName": "ga:pageviews", "sortOrder": "DESCENDING"}
                  ]
              }
          ]
      }
  ).execute()


def print_response(response):
  """Parses and prints the Analytics Reporting API V4 response"""

  for report in response.get('reports', []):
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])

    for row in rows:
      dimensions = row.get('dimensions', [])
      dateRangeValues = row.get('metrics', [])

      for header, dimension in zip(dimensionHeaders, dimensions):
        print(header + ': ' + dimension)

      for i, values in enumerate(dateRangeValues):
        print('Date range (' + str(i) + ')')
        for metricHeader, value in zip(metricHeaders, values.get('values')):
          print(metricHeader.get('name') + ': ' + value)


def main():

  analytics = initialize_analyticsreporting()
  response = get_report(analytics)
  print_response(response)

if __name__ == '__main__':
  main()

The important part of this code is the get_report() function, which defines what data you get from Google Analytics.

In the code above, I created a query to get the stats of the posts that were viewed in the last 7 days, ordered by page view count. You will get data similar to the screenshot below when you run the code.

Example output when running this code for Designil.com
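
Printed text is fine for a first look, but for analysis it is usually handier to have one record per page. The sketch below flattens a response into a list of dicts. The function name response_to_rows is my own, the sample response is hand-written in the same shape that print_response() reads (the numbers are made up), and only the first date range is kept, since the query above uses a single one.

```python
def response_to_rows(response):
    """Flatten a Reporting API V4 response into a list of dicts, one per row."""
    rows_out = []
    for report in response.get('reports', []):
        header = report.get('columnHeader', {})
        dimension_names = header.get('dimensions', [])
        metric_entries = header.get('metricHeader', {}).get('metricHeaderEntries', [])
        metric_names = [entry.get('name') for entry in metric_entries]

        for row in report.get('data', {}).get('rows', []):
            record = dict(zip(dimension_names, row.get('dimensions', [])))
            # Keep only the first date range (the query above has just one).
            metric_sets = row.get('metrics', [])
            if metric_sets:
                record.update(zip(metric_names, metric_sets[0].get('values', [])))
            rows_out.append(record)
    return rows_out


# Hand-written sample for illustration; the values are not real stats.
sample_response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:pagePath'],
            'metricHeader': {'metricHeaderEntries': [
                {'name': 'ga:pageviews', 'type': 'INTEGER'},
                {'name': 'ga:bounces', 'type': 'INTEGER'},
            ]},
        },
        'data': {'rows': [
            {'dimensions': ['/'], 'metrics': [{'values': ['120', '30']}]},
            {'dimensions': ['/about/'], 'metrics': [{'values': ['45', '10']}]},
        ]},
    }]
}

for record in response_to_rows(sample_response):
    print(record)
```

The metric values stay as strings, just as they arrive in the response. If pandas is installed, the resulting list can be passed to pandas.DataFrame() to continue the analysis there.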

To construct the query, I recommend learning the basics of queries in Reporting API V4.
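
The request body in get_report() follows a regular pattern, so it can be assembled by a small helper instead of being typed out by hand each time. This is only a sketch under my own assumptions: build_report_request and its parameters are hypothetical names, it covers just the fields used in this guide, and '12345678' is a made-up view ID.

```python
def build_report_request(view_id, metrics, dimensions=(), order_by=None,
                         start_date='7daysAgo', end_date='today'):
    """Build one entry for the 'reportRequests' list of a batchGet body."""
    request = {
        'viewId': view_id,
        'dateRanges': [{'startDate': start_date, 'endDate': end_date}],
        'metrics': [{'expression': name} for name in metrics],
    }
    if dimensions:
        request['dimensions'] = [{'name': name} for name in dimensions]
    if order_by:
        # Sort by the given metric or dimension, largest value first.
        request['orderBys'] = [{'fieldName': order_by,
                                'sortOrder': 'DESCENDING'}]
    return request


# '12345678' is a placeholder view ID used only for illustration.
body = {'reportRequests': [
    build_report_request('12345678',
                         metrics=['ga:pageviews', 'ga:uniquePageviews'],
                         dimensions=['ga:pagePath'],
                         order_by='ga:pageviews'),
]}
print(body)
```

Calling build_report_request(VIEW_ID, ['ga:sessions']) gives the same request as the commented-out sessions query in get_report().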

If you have some ideas for queries in mind, you can look into the common queries for Reporting API V3, which will give you an idea of how to retrieve some common information. However, you will have to adapt the queries a little bit (the metric and dimension names are usually the same though), since API V4 uses a different syntax.

That’s it! I hope you can get started on connecting Python to Google Analytics data and create something fantastic.
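
To give a feeling for that adaptation, here is a rough sketch that translates the most common V3-style parameters (ids, start-date, end-date, metrics, dimensions, sort) into a V4 report request. The function name is my own, and filters, segments and other V3 parameters are deliberately not handled, so treat it as a starting point rather than a complete converter.

```python
def v3_params_to_v4_request(params):
    """Translate a dict of V3-style query parameters into a V4 report request."""
    ids = params['ids']
    request = {
        # V3 writes the view as 'ga:12345678'; this guide uses the bare number.
        'viewId': ids[3:] if ids.startswith('ga:') else ids,
        'dateRanges': [{'startDate': params['start-date'],
                        'endDate': params['end-date']}],
        # V3 uses comma-separated strings; V4 uses lists of small objects.
        'metrics': [{'expression': name.strip()}
                    for name in params['metrics'].split(',')],
    }
    if params.get('dimensions'):
        request['dimensions'] = [{'name': name.strip()}
                                 for name in params['dimensions'].split(',')]
    if params.get('sort'):
        order_bys = []
        for field in params['sort'].split(','):
            field = field.strip()
            # In V3 a leading '-' means descending order.
            descending = field.startswith('-')
            order_bys.append({
                'fieldName': field.lstrip('-'),
                'sortOrder': 'DESCENDING' if descending else 'ASCENDING',
            })
        request['orderBys'] = order_bys
    return request


# 'ga:12345678' is a made-up view ID used only for illustration.
v3_query = {
    'ids': 'ga:12345678',
    'start-date': '7daysAgo',
    'end-date': 'today',
    'metrics': 'ga:pageviews,ga:exits',
    'dimensions': 'ga:pagePath',
    'sort': '-ga:pageviews',
}
print(v3_params_to_v4_request(v3_query))
```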

Download: Jupyter Notebook file for this tutorial

Feel free to download the Jupyter Notebook file and test it out! I wrote clear instructions in there as well 🙂

If you prefer a .py file, you can download it from Google’s basic tutorial linked at the top of this article.


Also published on Medium.
