
GA4 Analytics Data API – Extract Data Using Python

The GA4 API, officially called the Google Analytics Data API, is still new to most of us. For Universal Analytics we used the Reporting API, but unfortunately it does not work with Google Analytics 4 (GA4) properties. The Data API currently ships its methods in alpha (v1alpha) and beta (v1beta) channels. It is therefore still under active development, but it already delivers promising results.

In this article, I will give a general overview of the Analytics Data API and demonstrate how to use it to extract GA4 data with Python. Let’s begin.


    What Is the Analytics Data API

    Basically, the Analytics Data API is an endpoint that we can use to extract report data from Google Analytics 4. If you’re familiar with the Universal Analytics Reporting API, the logic is very similar: you request a set of dimensions and metrics for a date range and get rows back, although GA4 uses its own dimension and metric names. You can retrieve the data behind the GA4 reports you actively use. To send requests, you can use one of Google’s client libraries, for example for Python, Java, Node.js, or PHP.

    The Data API also offers dedicated methods for other kinds of requests, such as pivot tables, funnels (currently in the alpha channel), real-time data, and batched reports. Google’s documentation explains each method and how to use it:

    https://developers.google.com/analytics/devguides/reporting/data/v1?hl=en#available_methods
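    Each of these methods is also exposed as a REST endpoint on the property resource, and the Python client offers the same methods in snake_case (for example, run_report and run_realtime_report). The sketch below builds the v1beta endpoint URLs for a property. It assumes the base URL https://analyticsdata.googleapis.com/v1beta and the method names from Google’s reference, leaves out the alpha-only funnel method, and uses a hypothetical helper name, endpoint_url.

```python
# Minimal sketch: map Analytics Data API v1beta methods to their REST endpoints.
# Assumes the public base URL and method names from Google's reference docs.
BASE_URL = "https://analyticsdata.googleapis.com/v1beta"

# Methods called with POST on the property resource, e.g. properties/123:runReport
POST_METHODS = {
    "runReport",
    "runPivotReport",
    "batchRunReports",
    "batchRunPivotReports",
    "runRealtimeReport",
    "checkCompatibility",
}


def endpoint_url(property_id: str, method: str) -> str:
    """Return the REST URL of a v1beta Data API method for one GA4 property."""
    resource = f"properties/{property_id}"
    if method == "getMetadata":
        # getMetadata is a GET on the property's metadata sub-resource.
        return f"{BASE_URL}/{resource}/metadata"
    if method in POST_METHODS:
        return f"{BASE_URL}/{resource}:{method}"
    raise ValueError(f"unknown or non-v1beta method: {method}")


print(endpoint_url("123456", "runReport"))
# https://analyticsdata.googleapis.com/v1beta/properties/123456:runReport
```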

    Why You Should Use It

    In brief, you can use the Data API for the same purposes as the Google Analytics add-on for Google Sheets. You can create:

    • Custom dashboards tailored to your requirements.
    • Cool and complex automated reports.
    • Aggregated tables that combine GA4 data with other business data.
    • Large datasets to be used for data analysis.

    I highly recommend it because, in my experience, it returns data much faster than the GA4 interface. I hate waiting for Google Analytics to build a complete report while staring at the “loading” icon when I am trying to put together a quick report. It is quite frustrating. Even attractive automated dashboards built with Google Looker Studio (formerly Data Studio) are not very fast either.

    Setting Up The Environment

    To begin using the Analytics Data API, you must first set up your environment. Create a project folder (for example, on your desktop) and open it in your code editor. Make sure that Python and the pip package manager are installed.

    Step 1: This step is crucial: you need to create a Google Cloud project and enable the Analytics Data API before you can request any data. Fortunately, Google simplifies the process. On the quickstart page linked below, you will find an “Enable the Google Analytics Data API v1” button.

    https://developers.google.com/analytics/devguides/reporting/data/v1/quickstart-client-libraries?hl=en

    If you click on it, it will ask you for a project name. You can name it “analytics-data-api”, and if everything goes smoothly, it will provide a JSON file (a service account key) that we will use for authentication. Save this file as “credentials.json” in your development folder.

    Step 2: You need to grant the service account access to your GA4 account. To do this, go to “Admin > Account Access Management” in GA4 and add the email address that appears as the value of “client_email” in the credentials.json file. Viewer access is enough for reading report data, although Editor access works as well.
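    If you are not sure which address to add, a few lines of Python can read it from the key file. This is a minimal sketch that assumes credentials.json is a standard service account key, containing "type": "service_account" and a "client_email" field; the function name service_account_email is hypothetical.

```python
import json


def service_account_email(path: str = "credentials.json") -> str:
    """Return the client_email stored in a service account key file.

    This is the address to add under Admin > Account Access Management in GA4.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account key files are marked with "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key["client_email"]
```

    Running print(service_account_email()) from your project folder prints the address to paste into GA4.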

    Step 3: Locate your property ID. If you go to “Admin > Property Settings”, you will find your property ID in the top-right corner of the page. Note it down, as we will need it later.
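    In the code, the property is referenced by a resource name of the form properties/<ID>. A common mistake is to paste the Measurement ID (G-XXXXXXX) instead of the numeric property ID. The small helper below is a hypothetical sketch, not part of the client library: it builds the resource name and rejects anything that is not purely numeric.

```python
def property_resource(property_id) -> str:
    """Turn a numeric GA4 property ID into a 'properties/<id>' resource name."""
    pid = str(property_id).strip()
    # Property IDs are numeric; Measurement IDs such as "G-XXXXXXX" are rejected.
    if not pid.isdigit():
        raise ValueError(f"expected a numeric property ID, got {pid!r}")
    return f"properties/{pid}"


print(property_resource(123456))  # properties/123456
```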

    Step 4: Create a Python file in your project directory and open a terminal there.

    Step 5: Install the “google-analytics-data” package into your project with pip.

    pip3 install google-analytics-data
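    To confirm that the installation worked, you can ask Python for the installed version of the package. This sketch uses only the standard library (importlib.metadata, Python 3.8+); installed_version is a hypothetical helper that returns None when the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "google-analytics-data"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "google-analytics-data is not installed")
```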

    Now you are ready to write some Python code and request data from the Analytics Data API.

    Demo Application

    First of all, import the necessary libraries into your Python code.

    import os, csv
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import Dimension, Metric, DateRange, RunReportRequest, OrderBy

    We have imported:

    • the “csv” module to write CSV files,
    • the “os” module to point the client at our credentials file for authentication,
    • the “google-analytics-data” client library to send requests to GA4.

    Next, define all the required variables with the code below:

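    # Point the Google client libraries at the service account key (path is relative to the working directory).
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'credentials.json'

    # The client picks up the credentials from the environment variable above.
    client = BetaAnalyticsDataClient()
    property = "properties/xxxxxx"  # replace xxxxxx with your GA4 property ID
    dimensions = [Dimension(name="pagePath"), Dimension(name="sessionSourceMedium")]
    metrics = [Metric(name="screenPageViews")]
    date_ranges = [DateRange(start_date="30daysAgo", end_date="yesterday")]
    # Sort the rows by the sessionSourceMedium dimension.
    order_bys = [OrderBy(dimension={'dimension_name': 'sessionSourceMedium'})]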

    We have specified the location of the “credentials.json” file that we created for authentication, our property ID (xxxxxx), and all the dimensions and metrics that we want to include in the report. Lastly, we have defined the date range and the sort order. Don’t forget to replace xxxxxx with your own property ID.
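    Under the hood, these objects become the JSON body of a runReport call. As a sketch of that mapping, assuming the field names from the v1beta REST reference (camelCase on the wire, snake_case in the Python client), the hypothetical helper below produces the equivalent body for the variables defined above.

```python
import json


def run_report_body(dimensions, metrics, start_date, end_date, order_by_dimension=None):
    """Build a runReport request body in its REST (camelCase) JSON form."""
    body = {
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"name": m} for m in metrics],
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
    }
    if order_by_dimension:
        # REST equivalent of OrderBy(dimension={'dimension_name': ...}) in Python.
        body["orderBys"] = [{"dimension": {"dimensionName": order_by_dimension}}]
    return body


body = run_report_body(
    ["pagePath", "sessionSourceMedium"],
    ["screenPageViews"],
    "30daysAgo",
    "yesterday",
    order_by_dimension="sessionSourceMedium",
)
print(json.dumps(body, indent=2))
```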

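    As you can see, dimension and metric names are camelCase API names (for example, pagePath and screenPageViews), while the Python fields that hold them use snake_case (dimension_name, start_date). These names are predefined by Google, so look them up in the Data API’s “API Dimensions & Metrics” reference before executing the request. Date ranges accept YYYY-MM-DD dates or relative values such as “30daysAgo”, “yesterday”, and “today”.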
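    A misspelled or Universal Analytics-style name (for example, pageViews instead of screenPageViews) makes the API reject the request. One way to catch this early, sketched here under assumptions: fetch the property’s metadata once with the client’s get_metadata method, whose response lists dimensions and metrics with an api_name each, and compare your names against it. The checker itself is plain Python and takes the known names as sets, so it runs offline; unknown_fields and the small hard-coded sets are illustrative only.

```python
def unknown_fields(requested, known):
    """Return the requested API names missing from the known set, in order."""
    return [name for name in requested if name not in known]


# In a real script the known names could come from the metadata endpoint, e.g.:
#   meta = client.get_metadata(name=f"{property}/metadata")
#   known_dims = {d.api_name for d in meta.dimensions}
#   known_mets = {m.api_name for m in meta.metrics}
# Here a tiny illustrative subset is hard-coded so the sketch runs offline.
known_dims = {"pagePath", "sessionSourceMedium", "date"}
known_mets = {"screenPageViews", "sessions"}

print(unknown_fields(["pagePath", "sessionSourceMedium"], known_dims))  # []
print(unknown_fields(["pageViews"], known_mets))  # ['pageViews']
```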

    Now we are ready to request the data. Run the following code, which wraps the variables in a RunReportRequest and sends it with run_report:

    request = RunReportRequest(property=property, dimensions=dimensions, metrics=metrics, date_ranges=date_ranges, order_bys=order_bys)
    response = client.run_report(request)
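    A single request does not necessarily return every row. According to Google’s reference at the time of writing, run_report returns 10,000 rows by default (adjustable through the request’s limit field, up to 250,000), and the response’s row_count tells you how many rows the query matched in total. The sketch below pages through a report with offset and limit. It assumes a fetch_page(offset, limit) callable, a hypothetical name that you would implement with RunReportRequest and client.run_report as shown in the comment.

```python
from typing import Callable, Iterator, Sequence, Tuple

# A page is (rows returned by one request, total row_count reported by the API).
Page = Tuple[Sequence, int]


def iter_all_rows(fetch_page: Callable[[int, int], Page], page_size: int = 10_000) -> Iterator:
    """Yield every row of a report by requesting successive offset/limit pages."""
    offset = 0
    while True:
        rows, row_count = fetch_page(offset, page_size)
        yield from rows
        offset += len(rows)
        # Stop on an empty page or once the reported total has been reached.
        if not rows or offset >= row_count:
            break


# Hypothetical fetch_page built on the objects from the demo:
#
# def fetch_page(offset, limit):
#     request = RunReportRequest(property=property, dimensions=dimensions,
#                                metrics=metrics, date_ranges=date_ranges,
#                                order_bys=order_bys, offset=offset, limit=limit)
#     response = client.run_report(request)
#     return response.rows, response.row_count
#
# all_rows = list(iter_all_rows(fetch_page))
```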
    

    If you run the code and print the response, you will see the response schema.

    All the requested dimensions and metrics are returned in the “rows” field, where each row holds a “dimension_values” list and a “metric_values” list. Next, we loop over the rows to read these values.

    header_row = [dim.name for dim in dimensions] + [metric.name for metric in metrics]
    
    data_rows = []
    
    for row in response.rows:
        dimension_values = [dim.value for dim in row.dimension_values]
        metric_values = [metric.value for metric in row.metric_values]
        data_row = dimension_values + metric_values
        data_rows.append(data_row)
    

    The first line builds the header row from the names of the dimensions and metrics we requested. The loop then collects each row’s dimension values followed by its metric values and appends them to the data_rows list, in the same order as the header.
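    A variation worth considering: the response carries its own dimension_headers and metric_headers, so you can take the header from the response itself, and metric values arrive as strings, so converting them to numbers makes later analysis easier. This is a sketch; response_to_table and to_number are hypothetical helpers, and the SimpleNamespace stand-in only mimics the attribute names of the real response so the example runs without credentials.

```python
from types import SimpleNamespace


def to_number(value: str):
    """Metric values arrive as strings: return an int when possible, else a float."""
    try:
        return int(value)
    except ValueError:
        return float(value)


def response_to_table(response):
    """Build a header row and typed data rows from a RunReportResponse-like object."""
    header = [h.name for h in response.dimension_headers] + [
        h.name for h in response.metric_headers
    ]
    rows = []
    for row in response.rows:
        dims = [v.value for v in row.dimension_values]
        mets = [to_number(v.value) for v in row.metric_values]
        rows.append(dims + mets)
    return header, rows


# Offline stand-in with the same attribute names as the real response (illustrative values).
fake = SimpleNamespace(
    dimension_headers=[SimpleNamespace(name="pagePath")],
    metric_headers=[SimpleNamespace(name="screenPageViews")],
    rows=[SimpleNamespace(dimension_values=[SimpleNamespace(value="/")],
                          metric_values=[SimpleNamespace(value="42")])],
)
header, rows = response_to_table(fake)
print(header, rows)  # ['pagePath', 'screenPageViews'] [['/', 42]]
```

    With the real client, response_to_table(response) replaces the header and loop code above.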

    Finally, we write everything to a well-structured CSV file that we can use later.

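    # newline='' is what the csv module expects; utf-8 keeps non-ASCII page paths intact.
    with open('file.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)  # one writer for the whole file
        writer.writerow(header_row)
        writer.writerows(data_rows)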

    This code will generate a file named “file.csv” where you can find all the results returned from the Data API. Please refer to the screenshot for a visual representation.

    Congratulations! You are now well-prepared to request data from the Analytics Data API using Python.

    Conclusion

    Learning how to use the Google Analytics Data API to retrieve data from your GA4 account gives you many opportunities and a high degree of flexibility. As mentioned earlier, it lets you build custom and automated reports and combine GA4 data with your other business data for deeper analysis. I hope you find this information valuable. Cheers!
