Tutorial: Use Python to store Google CrUX data

The Chrome User Experience Report (CrUX) tracks how Chrome users experience the web. Google uses it as a factor in search rankings. You can use it to analyze real-world load times, latency, and stability.

CrUX data focuses on three key metrics, the Core Web Vitals. Google evaluates each metric at the 75th percentile of user experiences, so getting that 75th-percentile score into the "good" range is what improves SEO.

  • Largest Contentful Paint (LCP): measures loading performance. LCP reports the render time of the largest image, text block, or video visible in the viewport.
  • Interaction to Next Paint (INP): measures interactivity. INP assesses a page's responsiveness by observing the latency of click, tap, and keyboard interactions. The final INP value is the longest interaction observed, ignoring outliers.
  • Cumulative Layout Shift (CLS): measures visual stability. CLS captures unexpected layout shifts: a layout shift occurs any time a visible element changes its position from one rendered frame to the next.
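
Those "good" boundaries (2.5 s for LCP, 200 ms for INP, 0.1 for CLS) are the same values that appear as histogram bin edges later in this tutorial. As a quick sketch (the helper function and thresholds dict are mine for illustration, not part of the CrUX API), classifying a p75 value looks like this:

# Hypothetical helper: classify a 75th-percentile value against the
# Core Web Vitals thresholds (LCP and INP values in milliseconds)
THRESHOLDS = {
    'largest_contentful_paint': (2500, 4000),
    'interaction_to_next_paint': (200, 500),
    'cumulative_layout_shift': (0.1, 0.25),
}

def classify(metric, p75):
    good, poor = THRESHOLDS[metric]
    if p75 <= good:
        return 'good'
    if p75 <= poor:
        return 'needs improvement'
    return 'poor'

print(classify('largest_contentful_paint', 2300))  # 'good'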

Google makes CrUX data available in Search Console and BigQuery, but what if you want to visualize it in Tableau or combine it with other sources of web data? Use the CrUX History API and Python.

First, get a CrUX API key:

  1. Go to the Google Cloud Console.
  2. Search for "Chrome UX Report API" and enable it for your project.
  3. Go to the APIs & Services section, select "Credentials," and create an API key.
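
Before building anything around it, you can sanity-check the key with a single request. This minimal sketch posts directly to the same History API endpoint used throughout this tutorial (swap in your own key and URL):

import requests

API_KEY = 'YOUR_API_KEY_HERE'
endpoint = 'https://chromeuxreport.googleapis.com/v1/records:queryHistoryRecord?key=' + API_KEY

# An empty 'metrics' list asks for every available metric
body = {'url': 'https://example.com/', 'formFactor': 'PHONE', 'metrics': []}
response = requests.post(endpoint, json=body)
print(response.status_code)  # 200 if CrUX has data for the URL, 404 if not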

Next, consider the following code:

import time

import pandas as pd
import requests


class Crux:
    def __init__(self, key):
        self.api_url = 'https://chromeuxreport.googleapis.com/v1/records:queryHistoryRecord?key=' + key

    def getData(self, url, form_factor):
        # An empty 'metrics' list requests every available metric
        payload = {'url': url, 'formFactor': form_factor, 'metrics': []}
        response = requests.post(self.api_url, json=payload)
        # Save the raw response for inspection (overwritten on each call)
        with open('response.json', 'w') as file:
            file.write(response.text)
        return response

    def getDataframe(self, response):
        api = response.json()

        # Get the URL and form factor from the record key
        form_factor = api['record']['key']['formFactor']
        url = api['record']['key']['url']
        print(url)

        # Get metrics
        metrics_dict = {}
        metrics_keys_original = api['record']['metrics'].keys()
        metrics_keys = [item for item in metrics_keys_original if "navigation" not in item]

        # Histogram time series: one column per bin, keyed by the bin's start
        # value (round_trip_time has no histogram, only percentiles)
        for metric in metrics_keys:
            if metric != "round_trip_time":
                for series in api['record']['metrics'][metric]['histogramTimeseries']:
                    metrics_dict["histogramTimeseries:" + metric + ":" + str(series['start'])] = series['densities']

        # 75th-percentile time series: one column per metric
        for metric in metrics_keys:
            metrics_dict["percentilesTimeseries:" + metric] = api['record']['metrics'][metric]['percentilesTimeseries']['p75s']

        # Navigation-type fractions (navigate, reload, back_forward, ...)
        metrics_keys = [item for item in metrics_keys_original if "navigation" in item]
        for metric in metrics_keys:
            for series in api['record']['metrics'][metric]['fractionTimeseries']:
                metrics_dict["navigation_types:" + series] = api['record']['metrics'][metric]['fractionTimeseries'][series]['fractions']

        # Build the date column from the collection periods (one per data point)
        date_list = []
        for period in api['record']['collectionPeriods']:
            json_date = period['firstDate']
            year = str(json_date['year'])
            month = str(json_date['month']).zfill(2)
            day = str(json_date['day']).zfill(2)
            date_list.append(year + '-' + month + '-' + day)

        # Repeat the URL and form factor to match the length of the date column
        info_dict = {
            'url': [url] * len(date_list),
            'form_factor': [form_factor] * len(date_list),
            'date': date_list,
        }

        # Combine the info dict with the metrics dict and convert to a dataframe
        final_dict = {**info_dict, **metrics_dict}
        df = pd.DataFrame(final_dict)
        print("-----------------")
        return df

# List of URLs you want CrUX data for (see the Google Sheets variant below)
input_urls = ['https://example.com/products','https://example.com/services','etc']

crux_key = 'YOUR_API_KEY_HERE'
crux = Crux(crux_key)
form_factors = ["DESKTOP", "PHONE"]
output_df = pd.DataFrame()

for url in input_urls:
    for form in form_factors:
        time.sleep(1)  # brief pause to stay under the API's rate limit
        api_response = crux.getData(url, form)
        # The API returns 404 when it has no data for a URL/form-factor pair
        if api_response.status_code == 200:
            df = crux.getDataframe(api_response)
            output_df = pd.concat([output_df, df], axis=0)

output_df.to_csv('output.csv', index=False)

input_urls is the list of URLs for which you want CrUX data. The script iterates over each URL, fetching data for both desktop and phone users, and appends the results to the output_df dataframe. When the list is exhausted, it writes the final data to output.csv.
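
The parsing in getDataframe is easier to follow if you can see the shape of the JSON the History API returns. Heavily abridged, and with illustrative values, a response looks roughly like this:

response_json = {
    'record': {
        'key': {'formFactor': 'PHONE', 'url': 'https://example.com/'},
        'metrics': {
            'largest_contentful_paint': {
                # One entry per histogram bin; each 'densities' list is a
                # week-by-week time series
                'histogramTimeseries': [
                    {'start': 0, 'end': 2500, 'densities': [0.81, 0.83]},
                    {'start': 2500, 'end': 4000, 'densities': [0.12, 0.11]},
                    {'start': 4000, 'densities': [0.07, 0.06]},
                ],
                'percentilesTimeseries': {'p75s': [2100, 2000]},
            },
            'navigation_types': {
                'fractionTimeseries': {
                    'navigate': {'fractions': [0.45, 0.47]},
                    # ...reload, back_forward, prerender, and so on
                },
            },
        },
        # One collection period per data point in every series above
        'collectionPeriods': [
            {'firstDate': {'year': 2024, 'month': 1, 'day': 1},
             'lastDate': {'year': 2024, 'month': 1, 'day': 28}},
        ],
    },
}

Each histogram bin, each p75 series, and each navigation-type fraction becomes one column in the dataframe, with the collection periods supplying the date column.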

You'll get the following columns:

  • url
  • form_factor
  • date
  • histogramTimeseries:first_contentful_paint:0
  • histogramTimeseries:first_contentful_paint:1800
  • histogramTimeseries:first_contentful_paint:3000
  • histogramTimeseries:interaction_to_next_paint:0
  • histogramTimeseries:interaction_to_next_paint:200
  • histogramTimeseries:interaction_to_next_paint:500
  • histogramTimeseries:largest_contentful_paint:0
  • histogramTimeseries:largest_contentful_paint:2500
  • histogramTimeseries:largest_contentful_paint:4000
  • histogramTimeseries:cumulative_layout_shift:0.00
  • histogramTimeseries:cumulative_layout_shift:0.10
  • histogramTimeseries:cumulative_layout_shift:0.25
  • histogramTimeseries:experimental_time_to_first_byte:0
  • histogramTimeseries:experimental_time_to_first_byte:800
  • histogramTimeseries:experimental_time_to_first_byte:1800
  • percentilesTimeseries:first_contentful_paint
  • percentilesTimeseries:interaction_to_next_paint
  • percentilesTimeseries:largest_contentful_paint
  • percentilesTimeseries:round_trip_time
  • percentilesTimeseries:cumulative_layout_shift
  • percentilesTimeseries:experimental_time_to_first_byte
  • navigation_types:restore
  • navigation_types:back_forward
  • navigation_types:back_forward_cache
  • navigation_types:prerender
  • navigation_types:navigate
  • navigation_types:navigate_cache
  • navigation_types:reload
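
From there, a few lines of pandas turn output.csv into answers. As a sketch (the 2,500 ms cutoff is the LCP "good" boundary from the histogram bins above), this pulls each URL and form factor's most recent p75 LCP and flags the slow ones:

import pandas as pd

df = pd.read_csv('output.csv')

# Keep the most recent collection period for each URL/form-factor pair
latest = df.sort_values('date').groupby(['url', 'form_factor']).tail(1)

# Flag pages whose latest 75th-percentile LCP misses the 2.5 s "good" threshold
slow = latest[latest['percentilesTimeseries:largest_contentful_paint'] > 2500]
print(slow[['url', 'form_factor', 'percentilesTimeseries:largest_contentful_paint']])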

Check out my Python and Google Sheets tutorial for an easy way to store CrUX data. Consider setting up a Google Sheet tab that lets users specify the URL list. Using that tutorial, you can modify the code above. For example:

input_gsheet = GoogleSheets("YOUR_SHEET_ID","Input",google_sheets_json_key)
input_urls = input_gsheet.getGoogleSheet()
input_urls = input_urls["URL"].tolist()
