Tutorial: Use Google Sheets to store Python DataFrames
Google Sheets offers straightforward but occasionally clunky data storage for Python projects. Is a free spreadsheet a database replacement? Not quite, but I'll walk you through how you can use a Google Sheet to store your Python DataFrame for free in the cloud.
Step 0: Is this a good idea?
Step 1: Get cloud credentials
To read or write to a Google Sheet, you'll first have to create a project on Google Cloud Console, enable the API, and then download a JSON key.
1. Create a new Cloud Console project
- Go to the Google Cloud Console.
- Create a project:
- Click on the Select a project dropdown at the top of the page.
- Click on New Project.
- Name your project (e.g., "My Google Sheets Project").
- Click Create.
2. Enable the Google Sheets API
- From the left-hand menu, select APIs & Services > Library.
- In the API Library, search for "Google Sheets API."
- Click on the Google Sheets API result and then click Enable.
3. Set up service account credentials
Create a service account:
- Go to APIs & Services > Credentials.
- Click + Create Credentials and select Service Account.
- Provide a name for your service account, and click Create and Continue.
- Assign the role
Editor
orOwner
, then click Continue. - Click Done to finish creating the service account.
Create a JSON key:
- On the Credentials page, find your service account under the Service Accounts tab.
- Click on your service account, then go to the Keys tab.
- Click Add Key > Create New Key.
- Choose JSON and click Create. A JSON file with your credentials will download.
4. Share your Google Sheet with the service account
- Create a new Google Sheet or open an existing one.
- Share the Sheet:
- Click Share in the top-right corner of the sheet.
- In the "Share with people and groups" field, paste the service account email (which looks something like
[email protected]
). - Set the permissions to Editor and click Send.
Step 2: Read and write Python DataFrame to the Google Sheet1. Install libraries
pip install gspread oauth2client
2. Identify key variables
docid: This is the long string of numbers and letters in the URL of the Google Sheet after the "/d/" and before the "/edit." For example, the docid is
1NbE4VXFA1fhkwql2DjzeEwfgrfW7B3Uj4omyfOMwOYE
for the following URL:https://docs.google.com/spreadsheets/d/1NbE4VXFA1fhkwql2DjzeEwfgrfW7B3Uj4omyfOMwOYE/edit
- worksheet_name: The name of the tab in the Google Sheet. This is "Sheet1" by default.
- json_credential: For this demo, I've copied the JSON from the file I download from Google Cloud Console into a string literal in the code below. Follow better security practices as you move your code to production.
3 . Review the complete example
Download the code and example data on GitHub.
import pandas as pd
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import json
class GoogleSheet:
def __init__(self, docid, worksheet_name, credentials_string):
credentials = json.loads(credentials_string, strict=False)
client = gspread.service_account_from_dict(credentials)
sh = client.open_by_key(docid)
self.worksheet = sh.worksheet(worksheet_name)
# Helper function to convert column number to letter
def column_number_to_letter(self, n):
result = ""
while n > 0:
n, remainder = divmod(n - 1, 26)
result = chr(65 + remainder) + result
return result
def setGoogleSheet(self, data):
print("Uploading Google Sheet.")
data.fillna('', inplace=True)
# Convert DataFrame to list of lists
values = data.values.tolist()
# Add headers to the values
values.insert(0, data.columns.tolist())
# Define the range to update
num_rows = len(values)
num_columns = len(values[0])
last_column = self.column_number_to_letter(num_columns)
cells_range = f"A1:{last_column}{num_rows}"
# Update cells in the sheet
self.worksheet.update(range_name=cells_range, values=values)
def blankGoogleSheet(self):
print("Blanking Google Sheet.")
sheet = self.worksheet.get_all_values()
headers = sheet.pop(0)
sheet_df = pd.DataFrame(sheet, columns=headers)
# Number of rows and columns
num_rows = len(sheet_df.index) + 1 # +1 for the header row
num_columns = len(sheet_df.columns)
if num_columns > 0:
# Define the range of cells to blank
last_column = self.column_number_to_letter(num_columns)
cells = f"A1:{last_column}{num_rows}"
cell_list = self.worksheet.range(cells)
# Blank out the cells
for cell in cell_list:
cell.value = ""
# Update the cells in the sheet
self.worksheet.update_cells(cell_list)
else:
print("Worksheet was already blank.")
def getGoogleSheet(self):
print ("Downloading Google Sheet.")
#Get Form Responses tab as a list of lists
sheet = self.worksheet.get_all_values()
#Convert sheet to dataframe
headers = sheet.pop(0)
sheet = pd.DataFrame(sheet, columns=headers)
return sheet
def getNewRows(self, new, old):
print ("Getting new rows.")
return new[~new.index.isin(old.index)]
def appendRow(self, row):
self.worksheet.append_row(row)
json_credential = """
{
"type": "service_account",
"project_id": "",
"private_key_id": "",
"private_key": "",
"client_email": "",
"client_id": "",
"auth_uri": "",
"token_uri": "",
"auth_provider_x509_cert_url": "",
"client_x509_cert_url": ""
}"""
example = pd.read_csv("example.csv")
#Sample URL: https://docs.google.com/spreadsheets/d/1NbE4VXFA1fhkwql2DjzeEwfgrfW7B3Uj4omyfOMwOYE/edit
#docid is the string of numbers after the "/d/" and before the "/edit"
docid = "1NbE4VXFA1dgxwql2DjzeEwmkrfW7B3Uj4omyfOMwOYE"
#worksheet_name is the name of the tab
worksheet_name = "Class Data"
sheet = GoogleSheet(docid, worksheet_name, json_credential)
#Blank the Google Sheet
sheet.blankGoogleSheet()
#Write new dataframe to the Google Sheet
sheet.setGoogleSheet(example)
#Get the contents of the Google Sheet as a DataFrame
df = sheet.getGoogleSheet()
print (df)
Comments