Tutorial: Use Python to store Google CrUX data
The Chrome User Experience Report (CrUX) tracks how Chrome users experience the web. Google uses it as a factor in search rankings. You can use it to analyze real-world load times, latency, and stability.
CrUX data focuses on three key metrics, known as the Core Web Vitals. Google assesses each metric at the 75th percentile of page loads, so your SEO target is a "good" score at that percentile. (A short sketch after the list shows how the thresholds translate into scores.)
- Largest Contentful Paint (LCP): measures loading performance. LCP reports the render time of the largest image, text block, or video visible in the viewport.
- Interaction to Next Paint (INP): measures interactivity. INP assesses a page's responsiveness by observing the latency of click, tap, and keyboard interactions. The final INP value is the longest interaction observed, ignoring outliers.
- Cumulative Layout Shift (CLS): measures visual stability. CLS is a measure of unexpected layout shifts. A layout shift occurs any time a visible element changes its position from one rendered frame to the next.
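To make those thresholds concrete, here is a minimal sketch of how a 75th-percentile value maps to a score. The classify_p75 helper is hypothetical, but the threshold values are Google's published "good"/"poor" boundaries for each metric:

def classify_p75(metric, p75):
    # "Good" and "poor" boundaries: LCP and INP in milliseconds, CLS unitless
    thresholds = {
        'largest_contentful_paint': (2500, 4000),
        'interaction_to_next_paint': (200, 500),
        'cumulative_layout_shift': (0.10, 0.25),
    }
    good, poor = thresholds[metric]
    if p75 <= good:
        return 'good'
    if p75 <= poor:
        return 'needs improvement'
    return 'poor'

print(classify_p75('largest_contentful_paint', 2300))  # good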
Google makes CrUX data available in Search Console and BigQuery, but what if you want to visualize it in Tableau or combine it with other sources of web data? Use the CrUX History API and Python.
First, get a CrUX API key:
- Go to the Google Cloud Console.
- Search for "Chrome UX Report API" and enable it for your project.
- Go to the "APIs & Services" section, select "Credentials," and create an API key.
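Before running the full script, you can smoke-test the key with a single request. This is a minimal sketch; the endpoint and payload shape match the code that follows:

import requests

api_key = 'YOUR_API_KEY_HERE'
endpoint = 'https://chromeuxreport.googleapis.com/v1/records:queryHistoryRecord?key=' + api_key
payload = {'url': 'https://example.com/', 'formFactor': 'PHONE'}
response = requests.post(endpoint, json=payload)
# 200 means the key works and CrUX has data for the URL; 404 means no data
print(response.status_code)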
Next, consider the following code:
import time

import requests
import pandas as pd

class Crux:
    def __init__(self, key):
        self.api_url = 'https://chromeuxreport.googleapis.com/v1/records:queryHistoryRecord?key=' + key

    def getData(self, url, form_factor):
        # An empty metrics list asks the History API for every available metric
        myobj = {'url': url, 'formFactor': form_factor, 'metrics': []}
        response = requests.post(self.api_url, json=myobj)
        # Keep a copy of the raw response for debugging
        with open('response.json', 'w') as file:
            file.write(response.text)
        return response

    def getDataframe(self, response):
        api = response.json()
        # Get URL and form factor
        form_factor = api['record']['key']['formFactor']
        url = api['record']['key']['url']
        print(url)
        # Get metrics: histogram densities first, skipping navigation types
        metrics_dict = {}
        metrics_keys_original = list(api['record']['metrics'].keys())
        metrics_keys = [item for item in metrics_keys_original if "navigation" not in item]
        for metric in metrics_keys:
            # round_trip_time has no histogram, only percentiles
            if metric != "round_trip_time":
                for series in api['record']['metrics'][metric]['histogramTimeseries']:
                    data_list = series['densities']
                    metrics_dict["histogramTimeseries:" + metric + ":" + str(series['start'])] = data_list
        # Then the p75 time series for each metric
        for metric in metrics_keys:
            data_list = api['record']['metrics'][metric]['percentilesTimeseries']['p75s']
            metrics_dict["percentilesTimeseries:" + metric] = data_list
        # Finally, the navigation-type fractions
        metrics_keys = [item for item in metrics_keys_original if "navigation" in item]
        for metric in metrics_keys:
            for series in api['record']['metrics'][metric]['fractionTimeseries']:
                data_list = api['record']['metrics'][metric]['fractionTimeseries'][series]['fractions']
                metrics_dict["navigation_types:" + series] = data_list
        # Get dates: one zero-padded YYYY-MM-DD string per collection period
        # (data_list still holds the last series, so its length equals the number of weeks)
        date_list = []
        for index, _ in enumerate(data_list):
            json_date = api['record']['collectionPeriods'][index]['firstDate']
            year = str(json_date['year'])
            month = str(json_date['month']).zfill(2)
            day = str(json_date['day']).zfill(2)
            date_list.append(year + '-' + month + '-' + day)
        # Repeat URL and form factor once per row, and add the date list
        info_dict = {}
        info_dict['url'] = [url] * len(data_list)
        info_dict['form_factor'] = [form_factor] * len(data_list)
        info_dict['date'] = date_list
        # Combine the info dict with the metrics dict
        final_dict = {**info_dict, **metrics_dict}
        # Convert the final, combined dict to a dataframe
        df = pd.DataFrame(final_dict)
        print("-----------------")
        return df

# List of URLs to query (or load them from a Google Sheet; see the end of this post)
input = ['https://example.com/products', 'https://example.com/services', 'etc']
crux_key = 'YOUR_API_KEY_HERE'
crux = Crux(crux_key)
form_factors = ["DESKTOP", "PHONE"]
output_df = pd.DataFrame()
for url in input:
    for form in form_factors:
        time.sleep(1)  # be polite to the API's rate limits
        api_response = crux.getData(url, form)
        # A 404 means CrUX has no data for this URL/form factor combination
        if api_response.status_code != 404:
            df = crux.getDataframe(api_response)
            output_df = pd.concat([output_df, df], axis=0)
output_df.to_csv('output.csv')
input is the list of URLs for which you want CrUX data. The script iterates over each URL, fetching data for both desktop and phone users, and appends the results to the output_df dataframe. When the list is exhausted, it writes everything to output.csv.
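For reference, getDataframe expects a response shaped roughly like the abridged sketch below. The field names are taken from what the code reads; the values are illustrative:

sample_response = {
    'record': {
        'key': {'url': 'https://example.com/products', 'formFactor': 'PHONE'},
        'metrics': {
            'largest_contentful_paint': {
                # One histogram series per bin; 'densities' holds one value per week
                'histogramTimeseries': [
                    {'start': 0, 'end': 2500, 'densities': [0.81, 0.83]},
                    {'start': 2500, 'end': 4000, 'densities': [0.12, 0.11]},
                    {'start': 4000, 'densities': [0.07, 0.06]},
                ],
                # One p75 value per week
                'percentilesTimeseries': {'p75s': [2100, 2050]},
            },
            'navigation_types': {
                'fractionTimeseries': {
                    'navigate': {'fractions': [0.45, 0.47]},
                    'reload': {'fractions': [0.12, 0.10]},
                },
            },
        },
        # One collection period per week of data
        'collectionPeriods': [
            {'firstDate': {'year': 2024, 'month': 1, 'day': 7}},
            {'firstDate': {'year': 2024, 'month': 1, 'day': 14}},
        ],
    },
}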
You'll get the following columns:
- url
- form_factor
- date
- histogramTimeseries:first_contentful_paint:0
- histogramTimeseries:first_contentful_paint:1800
- histogramTimeseries:first_contentful_paint:3000
- histogramTimeseries:interaction_to_next_paint:0
- histogramTimeseries:interaction_to_next_paint:200
- histogramTimeseries:interaction_to_next_paint:500
- histogramTimeseries:largest_contentful_paint:0
- histogramTimeseries:largest_contentful_paint:2500
- histogramTimeseries:largest_contentful_paint:4000
- histogramTimeseries:cumulative_layout_shift:0.00
- histogramTimeseries:cumulative_layout_shift:0.10
- histogramTimeseries:cumulative_layout_shift:0.25
- histogramTimeseries:experimental_time_to_first_byte:0
- histogramTimeseries:experimental_time_to_first_byte:800
- histogramTimeseries:experimental_time_to_first_byte:1800
- percentilesTimeseries:first_contentful_paint
- percentilesTimeseries:interaction_to_next_paint
- percentilesTimeseries:largest_contentful_paint
- percentilesTimeseries:round_trip_time
- percentilesTimeseries:cumulative_layout_shift
- percentilesTimeseries:experimental_time_to_first_byte
- navigation_types:restore
- navigation_types:back_forward
- navigation_types:back_forward_cache
- navigation_types:prerender
- navigation_types:navigate
- navigation_types:navigate_cache
- navigation_types:reload
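From there, loading output.csv back into pandas makes it easy to chart a metric over time before (or instead of) moving to Tableau. A small sketch, assuming the columns above:

import pandas as pd

df = pd.read_csv('output.csv')
# One column of p75 LCP values per URL/device combination, indexed by week
p75_lcp = df.pivot_table(
    index='date',
    columns=['url', 'form_factor'],
    values='percentilesTimeseries:largest_contentful_paint',
)
print(p75_lcp.tail())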
Check out my Python and Google Sheets tutorial for an easy way to store CrUX data. Consider setting up a Google Sheet tab to allow users to specify the URL list. Using the tutorial, you can modify my code above. For example:
input_gsheet = GoogleSheets("YOUR_SHEET_ID", "Input", google_sheets_json_key)
# Pull the "Input" tab into a dataframe and take its URL column as the input list
input = input_gsheet.getGoogleSheet()
input = input["URL"].tolist()