Tutorial: Use Google Sheets to store Python DataFrames

Google Sheets offers straightforward but occasionally clunky data storage for Python projects. Is a free spreadsheet a database replacement? Not quite, but I'll walk you through how you can use a Google Sheet to store your Python DataFrame for free in the cloud.

Step 0: Is this a good idea?

 
It depends on your use case. Google Sheets makes collaboration easy, is simple to use, integrates with other applications via its API, includes version history, and it's free. However, it's limited to 10 million cells, can be slow, and is susceptible to data disruptions from network issues.

Step 1: Get cloud credentials

To read or write to a Google Sheet, you'll first have to create a project on Google Cloud Console, enable the API, and then download a JSON key.

1. Create a new Cloud Console project

  1. Go to the Google Cloud Console.
  2. Create a project:
    • Click on the Select a project dropdown at the top of the page.
    • Click on New Project.
    • Name your project (e.g., "My Google Sheets Project").
    • Click Create.

2. Enable the Google Sheets API

  1. From the left-hand menu, select APIs & Services > Library.
  2. In the API Library, search for "Google Sheets API."
  3. Click on the Google Sheets API result and then click Enable.

3. Set up service account credentials

  1. Create a service account:

    • Go to APIs & Services > Credentials.
    • Click + Create Credentials and select Service Account.
    • Provide a name for your service account, and click Create and Continue.
    • Assign the role Editor or Owner, then click Continue.
    • Click Done to finish creating the service account.
  2. Create a JSON key:

    • On the Credentials page, find your service account under the Service Accounts tab.
    • Click on your service account, then go to the Keys tab.
    • Click Add Key > Create New Key.
    • Choose JSON and click Create. A JSON file with your credentials will download. 

4. Share your Google Sheet with the service account

  1. Create a new Google Sheet or open an existing one.
  2. Share the Sheet:
    • Click Share in the top-right corner of the sheet.
    • In the "Share with people and groups" field, paste the service account email (which looks something like [email protected]).
    • Set the permissions to Editor and click Send.

Step 2: Read and write Python DataFrame to the Google Sheet1. Install libraries

pip install gspread oauth2client

2. Identify key  variables

  • docid: This is the long string of numbers and letters in the URL of the Google Sheet after the "/d/" and before the "/edit." For example, the docid is 1NbE4VXFA1fhkwql2DjzeEwfgrfW7B3Uj4omyfOMwOYE for the following URL: https://docs.google.com/spreadsheets/d/1NbE4VXFA1fhkwql2DjzeEwfgrfW7B3Uj4omyfOMwOYE/edit

  • worksheet_name: The name of the tab in the Google Sheet. This is "Sheet1" by default.
  • json_credential: For this demo, I've copied the JSON from the file I download from Google Cloud Console into a string literal in the code below. Follow better security practices as you move your code to production.

3 . Review the complete example

Download the code and example data on GitHub.

import pandas as pd
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import json

class GoogleSheet:

    def __init__(self, docid, worksheet_name, credentials_string):
        credentials = json.loads(credentials_string, strict=False)
        client = gspread.service_account_from_dict(credentials)
        sh = client.open_by_key(docid)
        self.worksheet = sh.worksheet(worksheet_name)

    # Helper function to convert column number to letter
    def column_number_to_letter(self, n):
        result = ""
        while n > 0:
            n, remainder = divmod(n - 1, 26)
            result = chr(65 + remainder) + result
        return result

    def setGoogleSheet(self, data):
        print("Uploading Google Sheet.")
        data.fillna('', inplace=True)

        # Convert DataFrame to list of lists
        values = data.values.tolist()
        
        # Add headers to the values
        values.insert(0, data.columns.tolist())

        # Define the range to update
        num_rows = len(values)
        num_columns = len(values[0])
        last_column = self.column_number_to_letter(num_columns)
        cells_range = f"A1:{last_column}{num_rows}"
        
        # Update cells in the sheet
        self.worksheet.update(range_name=cells_range, values=values)

    def blankGoogleSheet(self):
        print("Blanking Google Sheet.")
        sheet = self.worksheet.get_all_values()
        headers = sheet.pop(0)
        sheet_df = pd.DataFrame(sheet, columns=headers)

        # Number of rows and columns
        num_rows = len(sheet_df.index) + 1  # +1 for the header row
        num_columns = len(sheet_df.columns)

        if num_columns > 0:
            # Define the range of cells to blank
            last_column = self.column_number_to_letter(num_columns)
            cells = f"A1:{last_column}{num_rows}"
            cell_list = self.worksheet.range(cells)

            # Blank out the cells
            for cell in cell_list:
                cell.value = ""

            # Update the cells in the sheet
            self.worksheet.update_cells(cell_list)

        else:
            print("Worksheet was already blank.")

    def getGoogleSheet(self):
        print ("Downloading Google Sheet.")
        #Get Form Responses tab as a list of lists
        sheet = self.worksheet.get_all_values()

        #Convert sheet to dataframe
        headers = sheet.pop(0)
        sheet = pd.DataFrame(sheet, columns=headers)

        return sheet

    def getNewRows(self, new, old):
        print ("Getting new rows.")
        return new[~new.index.isin(old.index)]
    
    def appendRow(self, row):
        self.worksheet.append_row(row)

json_credential = """
{
    "type": "service_account",
    "project_id": "",
    "private_key_id": "",
    "private_key": "",
    "client_email": "",
    "client_id": "",
    "auth_uri": "",
    "token_uri": "",
    "auth_provider_x509_cert_url": "",
    "client_x509_cert_url": ""
}"""

example = pd.read_csv("example.csv")

#Sample URL: https://docs.google.com/spreadsheets/d/1NbE4VXFA1fhkwql2DjzeEwfgrfW7B3Uj4omyfOMwOYE/edit
#docid is the string of numbers after the "/d/" and before the "/edit"
docid = "1NbE4VXFA1dgxwql2DjzeEwmkrfW7B3Uj4omyfOMwOYE"

#worksheet_name is the name of the tab
worksheet_name = "Class Data"

sheet = GoogleSheet(docid, worksheet_name, json_credential)

#Blank the Google Sheet
sheet.blankGoogleSheet()

#Write new dataframe to the Google Sheet
sheet.setGoogleSheet(example)

#Get the contents of the Google Sheet as a DataFrame
df = sheet.getGoogleSheet()
print (df)

Related post

Comments