Particulate Matter Across Socioeconomic Strata of Countries

Author

Juan F. Martinez

Overview

In this lesson, you will use NASA socioeconomic and environmental Earthdata available at NASA SEDAC to compare relationships between levels of socioeconomic deprivation agaisnts air quality data of particulate matter (PM) in different international administrative areas.

This lesson walks through the process of calculating and visualizing zonal statistics for a set of countries using raster data, focusing on GRDI country quintiles and PM2.5 concentration levels within these quintile areas. It begins by subsetting data by country and iterating over each country to extract relevant zonal statistics like mean, median, and various percentiles for each quintile. These statistics are stored in a GeoDataFrame, which is later used to create a choropleth map that visualizes specific GRDI metrics across countries. The lesson includes a detailed analysis of PM2.5 concentrations within different GRDI quartiles for selected countries. This involves clipping the raster data to each country’s geometry, filtering the data based on the GRDI quartiles, and calculating the mean PM2.5 levels for each quartile. The results are then visualized using customized plots to highlight the relationship between air quality and GRDI metrics across the selected countries.

Learning Objectives

After completing this lesson, you should be able to:

Gain a general understanding of what is particulate matter (PM) in the air and how it impacts human health.
Learn about global socioeconomic dimensions of deprivation and how they are spatially represented.
Find statistical thresholds in socioeconomic data.
Perform zonal statistics to summarize spatial data
Resample spatial data to harmoniza and compare socioeconomic data against environmental data.
Display data on a maps to get a general understanding of the spatial distribution of data.
Summarize spatial data into table plots to compare how air quality differs in different socioeconomic conditions of international administrative areas.

Introduction

Data Collection and Integration

The Global (GL) Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD), v4.03 (1998 – 2019) can can be downloaded from the Socioeconomic Data and Applications Center ([SEDAC](https://sedac.ciesin.columbia.edu/)) (Center For International Earth Science Information Network-CIESIN-Columbia University 2022a).

The Global Gridded Relative Deprivation Index (GRDI), v1 (2010 – 2020) dataset can be downloaded from SEDAC as well (Center For International Earth Science Information Network-CIESIN-Columbia University 2022b).

Gather comprehensive datasets from reliable sources such as the US EPA’s EJSCREEN tool and other public health databases like John Hopkins and County Health Rankings. Integrate relevant environmental data (e.g., air pollutant concentrations, pollution source proximity) with health outcomes data (e.g., COVID-19 prevalence, NSCLC incidence). Ensure data compatibility and quality through data cleaning and validation procedures.

Preparing Environment and Variables

Importing python packages required:

import xarray as xr
import rioxarray
import rasterstats
from rasterio.enums import Resampling
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
import pygadm

import plotly.graph_objects as go

Load the GRDIv1 and PM2.5 data from local sources:

# Load rasters
grdi_path = r"Z:\Sedac\GRDI\data\povmap-grdi-v1-geotiff\final data\povmap-grdi-v1.tif"
pm25_path = r"F:\TOPSSCHOOL\git\TOPSTSCHOOL-air-quality\data\sdei-global-annual-gwr-pm2-5-modis-misr-seawifs-aod-v4-gl-03-2019-geotiff\sdei-global-annual-gwr-pm2-5-modis-misr-seawifs-aod-v4-gl-03-2019.tif"

Using the package rasterio to load the data into memory. This allows us to read the data and use it for processing.

# Open the input and reference rasters
grdi_raster = rioxarray.open_rasterio(grdi_path, mask_and_scale=True)
pm25_raster = rioxarray.open_rasterio(pm25_path, mask_and_scale=True)

Matching Data Points using Bilinear Resample

The GRDI raster and PM2.5 rasters are incompatible in resolution. One method of harmonizing data is by using the Resampling bethod with a bilinear method. In this case, we reduce, or coarsen, the resolution of the GRDI raster to match the PM2.5 raster.

# Resample the input raster to match the reference raster
grdi_raster = grdi_raster.rio.reproject_match(pm25_raster,resampling=Resampling.bilinear)

Previewing Spatial Data in a Plot

# Plotting the rasters
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 20))

# Plot the original GRDI raster in the first subplot
im1 = ax1.imshow(grdi_raster.values[0], cmap='viridis', interpolation='nearest')
ax1.set_title('Original GRDI Raster')
fig.colorbar(im1, ax=ax1, orientation='horizontal', label='GRDI Values')

# Plot the PM2.5 raster in the second subplot
im2 = ax2.imshow(pm25_raster.values[0], cmap='hot', interpolation='nearest')
ax2.set_title('PM2.5 Raster')
fig.colorbar(im2, ax=ax2, orientation='horizontal', label='PM2.5 Values')


# Show the plots
plt.tight_layout()
plt.show()

Working with administrative Data

pygadm is a package that has international administrative units from levels 0 to 2. We can search the available countries by listing the Names.

country_table = gpd.GeoDataFrame(pygadm.Names())
len(country_table)

Some available areas with a unique GID_0 code share Names; therefore we drop the rows that contain digits.

country_table = country_table[~country_table['GID_0'].str.contains('\d', na=False)]
len(country_table)

Subset Data From a Table

Doing Zonal statistics for more than 200 countries may take a while, therefore, we can subset the data randomly with the .sample() method.

country_sample = country_table.sample(n=15)
country_sample

	NAME_0	GID_0
251	Samoa	WSM
76	Fiji	FJI
72	Spain	ESP
200	Senegal	SEN
183	Philippines	PHL
231	Trinidad and Tobago	TTO
7	United Arab Emirates	ARE
226	Tajikistan	TJK
166	New Caledonia	NCL
260	Zambia	ZMB
112	British Indian Ocean Territory	IOT
77	Falkland Islands	FLK
259	South Africa	ZAF
139	Lithuania	LTU
212	South Sudan	SSD

Zonal Statistics for Each Administrative Area

rasterstats has a funcion zonal_stats() which allows you to use vectors to summarize raster data. We summarize GRDIv1 data to calculate the following statistics: count, minimum, mean, max, median, standard deviation, range, and percentiles 20, 40, 60, and 80.

stats_results = gpd.GeoDataFrame()

for index, row in country_sample.iloc[:].iterrows():
    country = row['NAME_0']
    country_GID = row['GID_0']
    try:
        country_poly =  pygadm.Items(admin=country_GID, content_level=0)
    except:
        print(country, " skipped.")
        continue

    # Create a mask for the polygons
    grdi_country_zs= rasterstats.zonal_stats(country_poly, grdi_raster.values[0], affine=grdi_raster.rio.transform(), stats="count min mean max median std median range percentile_20 percentile_40 percentile_60 percentile_80")
    # # pm25_country_zs= rasterstats.zonal_stats(country_poly, pm25_arr, affine=pm25_transform, stats="count min mean max median std median range percentile_20 percentile_40 percentile_60 percentile_80", nodata=pm25_raster.nodata)
    # # Extract statistics into a dictionary
    country_stats = {
        'Country_Name': country,
        'Country_GID' : country_GID,
        'GRDI_Count': grdi_country_zs[0]['count'],
        'GRDI_Min': grdi_country_zs[0]['min'],
        'GRDI_Mean': grdi_country_zs[0]['mean'],
        'GRDI_Max': grdi_country_zs[0]['max'],
        'GRDI_Median': grdi_country_zs[0]['median'],
        'GRDI_Std': grdi_country_zs[0]['std'],
        'GRDI_Range': grdi_country_zs[0]['range'],
        'GRDI_P20': grdi_country_zs[0]['percentile_20'],
        'GRDI_P40': grdi_country_zs[0]['percentile_40'],
        'GRDI_P60': grdi_country_zs[0]['percentile_60'],
        'GRDI_P80': grdi_country_zs[0]['percentile_80'],
        #     # 'PM25_Count': pm25_country_zs[0]['count'],
        #     # 'PM25_Min': pm25_country_zs[0]['min'],
        # 'PM25_Mean': pm25_country_zs[0]['mean'],
        #     # 'PM25_Max': pm25_country_zs[0]['max'],
        #     # 'PM25_Median': pm25_country_zs[0]['median'],
        #     # 'PM25_Std': pm25_country_zs[0]['std'],
        #     # 'PM25_Range': pm25_country_zs[0]['range'],
        #     # 'PM25_P20': pm25_country_zs[0]['percentile_20'],
        #     # 'PM25_P40': pm25_country_zs[0]['percentile_40'],
        #     # 'PM25_P60': pm25_country_zs[0]['percentile_60'],
        #     # 'PM25_P80': pm25_country_zs[0]['percentile_80'],
        'geometry' : country_poly['geometry'].iloc[0]
    }
    country_stats_gdf = gpd.GeoDataFrame([country_stats], geometry='geometry')
    # stats_results.append(country_stats_gdf)
    stats_results = pd.concat([stats_results, country_stats_gdf], ignore_index=True)

Let’s use the .head() method from Pandas to check the top of our table

stats_results.head()

	Country_Name	Country_GID	GRDI_Count	GRDI_Min	GRDI_Mean	GRDI_Max	GRDI_Median	GRDI_Std	GRDI_Range	GRDI_P20	GRDI_P40	GRDI_P60	GRDI_P80	geometry
0	Samoa	WSM	893	16.779119	72.627572	82.130798	75.354385	9.305466	65.351679	69.140343	73.877106	76.735214	79.172623	MULTIPOLYGON (((-171.4079 -14.0746, -171.4055 ...
1	Fiji	FJI	4899	11.950370	72.009945	86.728622	74.518845	9.951100	74.778253	69.718742	73.071259	75.752144	77.892212	MULTIPOLYGON (((178.1063 -19.1592, 178.1064 -1...
2	Spain	ESP	302383	2.526078	53.366115	72.849083	58.745010	12.811808	70.323005	47.256508	56.190113	60.127953	62.182636	MULTIPOLYGON (((-17.9632 27.6904, -17.9657 27....
3	Senegal	SEN	32706	26.157717	85.125428	99.563820	88.179062	9.393603	73.406103	83.247147	86.728622	88.949966	89.592674	MULTIPOLYGON (((-15.9534 12.4426, -16.0378 12....
4	Philippines	PHL	134581	6.626353	72.399834	91.909012	75.517365	11.636341	85.282659	68.165047	73.975517	76.879448	79.703705	MULTIPOLYGON (((119.4706 4.5911, 119.4689 4.58...

Defining a Funtion

We can create a custom function that can allow us to use the zonal statistics process multiple times. A custom function can be created using the def FUNCTION_NAME(PARAMETER1, PARAMETER2): fuction to define what the fucntion will do.

def calculate_country_stats(country_sample, grdi_raster, pm25_raster=None):
    """
    Calculate statistics for each country in the sample.

    Parameters:
    - country_sample: A pandas DataFrame containing country information with 'NAME_0' and 'GID_0' columns, in this case the country_table.
    - grdi_raster: A raster object with which to perform the zonal statistics.
    - pm25_raster: (Optional) A raster object for PM2.5 data. If provided, statistics will also be calculated for this raster.

    Returns:
    - stats_results: A GeoDataFrame containing the statistics for each country.
    """
    stats_results = gpd.GeoDataFrame()

    for index, row in country_sample.iloc[:].iterrows():
        country = row['NAME_0']
        country_GID = row['GID_0']
        try:
            country_poly = pygadm.Items(admin=country_GID, content_level=0)
        except Exception as e:
            print(country, "skipped due to error:", e)
            continue

        # Create a mask for the polygons and perform zonal statistics on GRDI raster
        grdi_country_zs = rasterstats.zonal_stats(
            country_poly, grdi_raster.values[0], 
            affine=grdi_raster.rio.transform(), 
            stats="count min mean max median std range percentile_20 percentile_40 percentile_60 percentile_80"
        )

        # Uncomment and update the following lines if you want to include PM2.5 statistics
        # if pm25_raster is not None:
        #     pm25_country_zs = rasterstats.zonal_stats(
        #         country_poly, pm25_raster.values[0], 
        #         affine=pm25_raster.rio.transform(), 
        #         stats="count min mean max median std range percentile_20 percentile_40 percentile_60 percentile_80", 
        #         nodata=pm25_raster.nodata
        #     )

        # Extract statistics into a dictionary
        country_stats = {
            'Country_Name': country,
            'Country_GID' : country_GID,
            'GRDI_Count': grdi_country_zs[0]['count'],
            'GRDI_Min': grdi_country_zs[0]['min'],
            'GRDI_Mean': grdi_country_zs[0]['mean'],
            'GRDI_Max': grdi_country_zs[0]['max'],
            'GRDI_Median': grdi_country_zs[0]['median'],
            'GRDI_Std': grdi_country_zs[0]['std'],
            'GRDI_Range': grdi_country_zs[0]['range'],
            'GRDI_P20': grdi_country_zs[0]['percentile_20'],
            'GRDI_P40': grdi_country_zs[0]['percentile_40'],
            'GRDI_P60': grdi_country_zs[0]['percentile_60'],
            'GRDI_P80': grdi_country_zs[0]['percentile_80'],
            'geometry' : country_poly['geometry'].iloc[0]
        }

        # If PM2.5 statistics are calculated, add them to the dictionary
        # if pm25_raster is not None:
        #     country_stats.update({
        #         'PM25_Count': pm25_country_zs[0]['count'],
        #         'PM25_Min': pm25_country_zs[0]['min'],
        #         'PM25_Mean': pm25_country_zs[0]['mean'],
        #         'PM25_Max': pm25_country_zs[0]['max'],
        #         'PM25_Median': pm25_country_zs[0]['median'],
        #         'PM25_Std': pm25_country_zs[0]['std'],
        #         'PM25_Range': pm25_country_zs[0]['range'],
        #         'PM25_P20': pm25_country_zs[0]['percentile_20'],
        #         'PM25_P40': pm25_country_zs[0]['percentile_40'],
        #         'PM25_P60': pm25_country_zs[0]['percentile_60'],
        #         'PM25_P80': pm25_country_zs[0]['percentile_80'],
        #     })

        country_stats_gdf = gpd.GeoDataFrame([country_stats], geometry='geometry')
        stats_results = pd.concat([stats_results, country_stats_gdf], ignore_index=True)

    return stats_results

From the table above, we can choose an attribute, or column, to display it in a map plot. In this case, I’m choosing the GRDI Max

column_chosen = 'GRDI_Max' #GRDI_Max, GRDI_Min, GRDI_Median
# Plotting
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
stats_results.plot(column=column_chosen, ax=ax, legend=True,
    legend_kwds={'label': f"{column_chosen} per country.",
                      'orientation': "horizontal"})
ax.set_title('Choropleth Map Showing GRDI Mean per country')
ax.set_axis_off()  # Turn off the axis numbers and ticks
plt.show()

Selecting Data by Column

Start my creating a list of countries that you are interested in to Subset data from the DataFrame that match the values in the NAME_0 column. The .isin() mehthod checks each element in the DataFrame’s column for the item present in the list and returns matching rows.

# selected_countries = ["Algeria", "Somalia", "Colombia", "Timor Leste", "Finland", "Nicaragua", "United Kingdom", "Mali"]
# selected_countries = ["Anguilla", "Armenia", "Angola", "Argentina", "Albania", "United Arab Emirates", "American Samoa", "Australia" ]
selected_countries = ["Algeria", "Somalia", "Colombia", "Timor Leste", "Finland", "Nicaragua", "United Kingdom", "Mali", "Armenia", "Argentina",  "Albania", "United Arab Emirates", "Indonesia", "Qatar"]

#use the list above to subset the country_table DataFrame by the column NAME_0 
selected_countries = country_table[country_table['NAME_0'].isin(selected_countries)]

Using a Defined Custom Function

Recalling the defined fucntion calculate_country_stats, we can use our selected_countries list, and the GRDI and PM2.5 rasters, to create a new table of zonal statistics.

stats_results = calculate_country_stats(selected_countries, grdi_raster)

Show the head of the table again:

stats_results.head()

	Country_Name	Country_GID	GRDI_Count	GRDI_Min	GRDI_Mean	GRDI_Max	GRDI_Median	GRDI_Std	GRDI_Range	GRDI_P20	GRDI_P40	GRDI_P60	GRDI_P80	geometry
0	Albania	ALB	19076	8.272310	61.513866	75.395561	66.220139	11.644754	67.123251	55.861691	64.445587	67.104332	68.589424	MULTIPOLYGON (((20.0541 39.6917, 20.0389 39.69...
1	United Arab Emirates	ARE	18229	5.732072	42.347647	67.470955	45.034630	17.949307	61.738883	24.688160	38.623692	50.710217	61.909931	MULTIPOLYGON (((54.1541 22.7548, 53.3313 22.85...
2	Argentina	ARG	474297	7.572341	66.341158	81.701645	68.789696	10.416561	74.129304	66.774643	68.278297	69.335510	71.098465	MULTIPOLYGON (((-66.5458 -55.061, -66.5486 -55...
3	Armenia	ARM	9108	6.944193	59.871617	73.707901	64.272858	12.315193	66.763708	52.692970	61.288189	66.743027	69.004845	MULTIPOLYGON (((45.8319 39.8311, 45.8448 39.82...
4	Colombia	COL	223557	11.956444	71.523674	84.922409	73.350586	8.762370	72.965965	69.966934	72.190056	74.405403	76.502792	MULTIPOLYGON (((-77.491 4.1451, -77.4985 4.140...

Plot the map again choosing a column to plot:

column_chosen = 'GRDI_Max' #GRDI_Max, GRDI_Min, GRDI_Median
stats_results.plot(column=column_chosen, legend=True)
plt.show()

Creating a Table with Results

We can create a list of tuples that we can use to refer to the GRDI statistical values, and the name, color, and symbol we want to assign. In this case, we are using the GRDI zonal statistics of each country we selected that include the Mean, Minimum, Maximum, and interquartiles.

# List of GRDI values and their corresponding properties
#column, value name, color, symbol
grdi_data = [
    ('GRDI_Mean', 'Mean', 'orange', 'diamond'),
    ('GRDI_Min', 'Min', 'gray', '152'),
    ('GRDI_Max', 'Max', 'gray', '151'),
    ('GRDI_P20', 'Q20', 'blue', '142'),
    ('GRDI_P40', 'Q40', 'purple', '142'),
    ('GRDI_P60', 'Q60', 'green', '142'),
    ('GRDI_P80', 'Q80', 'red', '142')
]

We can create a figure to display the data based on the names colors and symbols we selected.

# Create a figure
fig = go.Figure()

# Add traces to the figure based on the data
for col, name, color, symbol in grdi_data:
    fig.add_trace(go.Scatter(
        x=stats_results[col],
        y=stats_results['Country_Name'],
        mode='markers',
        name=name,
        marker=dict(color=color, size=10, symbol=symbol)
    ))

# Customize layout
fig.update_layout(
    title='GRDI Statistics by Country',
    xaxis_title='GRDI Values',
    yaxis_title='Country Name',
    yaxis=dict(tickmode='linear'),
    legend_title='Statistics',
    yaxis_type='category',
    xaxis=dict(tickvals=[0, 20, 40, 60, 80, 100])
)

# Show plot
fig.show()

Summarizing PM2.5 Values by Socioeconomic Deprivation

Considering the GRDI quartile values as a level of socieoeconomic deprivation within each country, we can use the stats_results GeoDataFrame, the GRDI raster, and the PM2.5 raster to calculate the Mean PM.25 value within each of those areas in each country. This can describe how the air quality for different socioeconomic strata compare within the country, as well as against other countries.

The results will be added to the stats_results with the corresponting columns.

# iterate through the stats_results table rows
for index, row in stats_results.iloc[:].iterrows():
    #isolate each country's respective row
    row_df = gpd.GeoDataFrame([row], geometry='geometry').reset_index(drop=True)
    print(row_df.loc[0,'Country_GID'])
    try:
        #use rioxarray to clip the GRDI and PM2.5 rasters by the geometry of the respective country.
        grdi_country = grdi_raster.rio.clip(row_df.geometry, grdi_raster.rio.crs)
        pm25_country = pm25_raster.rio.clip(row_df.geometry, grdi_raster.rio.crs)
    except:
        print('Error in clip')
        continue

    #Applying squeeze() to this array removes the singleton dimension, reducing it to a 2D array with dimensions (rows, columns)
    grdi_country= grdi_country.squeeze()
    pm25_country= pm25_country.squeeze()


    # Subset the GRDI raster where values fall between each GRDI quintiles
    grdi_countryQ1 = grdi_country.where((grdi_country >= row_df.loc[0, 'GRDI_Min']) & (grdi_country <= row_df.loc[0, 'GRDI_P20']))
    grdi_countryQ2 = grdi_country.where((grdi_country >= row_df.loc[0, 'GRDI_P20']) & (grdi_country <= row_df.loc[0, 'GRDI_P40']))
    grdi_countryQ3 = grdi_country.where((grdi_country >= row_df.loc[0, 'GRDI_P40']) & (grdi_country <= row_df.loc[0, 'GRDI_P60']))
    grdi_countryQ4 = grdi_country.where((grdi_country >= row_df.loc[0, 'GRDI_P60']) & (grdi_country <= row_df.loc[0, 'GRDI_P80']))
    grdi_countryQ5 = grdi_country.where((grdi_country >= row_df.loc[0, 'GRDI_P80']) & (grdi_country <= row_df.loc[0, 'GRDI_Max']))


    # Mask the PM2.5 raster using the above GRDI quartile rasters, keeping only the cells that intersect
    pm25_countryQ1 = pm25_country.where(grdi_countryQ1.notnull())
    pm25_countryQ2 = pm25_country.where(grdi_countryQ2.notnull())
    pm25_countryQ3 = pm25_country.where(grdi_countryQ3.notnull())
    pm25_countryQ4 = pm25_country.where(grdi_countryQ4.notnull())
    pm25_countryQ5 = pm25_country.where(grdi_countryQ5.notnull())

    #Find the mean value of of the intersected PM2.5 rasters in each quartile
    pm25_countryQ1v = pm25_countryQ1.mean().item()
    pm25_countryQ2v = pm25_countryQ2.mean().item()
    pm25_countryQ3v = pm25_countryQ3.mean().item()
    pm25_countryQ4v = pm25_countryQ4.mean().item()
    pm25_countryQ5v = pm25_countryQ5.mean().item()

    #add the resuts to the stats_results table in the respective column
    stats_results.at[index, 'PM25_Q1'] = pm25_countryQ1v
    stats_results.at[index, 'PM25_Q2'] = pm25_countryQ2v
    stats_results.at[index, 'PM25_Q3'] = pm25_countryQ3v
    stats_results.at[index, 'PM25_Q4'] = pm25_countryQ4v
    stats_results.at[index, 'PM25_Q5'] = pm25_countryQ5v

ALB

ARE

ARG

ARM

COL

DZA

FIN

GBR

IDN

MLI

NIC

QAT

SOM

stats_results.head()

	Country_Name	Country_GID	GRDI_Count	GRDI_Min	GRDI_Mean	GRDI_Max	GRDI_Median	GRDI_Std	GRDI_Range	GRDI_P20	GRDI_P40	GRDI_P60	GRDI_P80	geometry	PM25_Q1	PM25_Q2	PM25_Q3	PM25_Q4	PM25_Q5
0	Albania	ALB	19076	8.272310	61.513866	75.395561	66.220139	11.644754	67.123251	55.861691	64.445587	67.104332	68.589424	MULTIPOLYGON (((20.0541 39.6917, 20.0389 39.69...	15.438293	15.031855	14.699892	14.612475	15.709009
1	United Arab Emirates	ARE	18229	5.732072	42.347647	67.470955	45.034630	17.949307	61.738883	24.688160	38.623692	50.710217	61.909931	MULTIPOLYGON (((54.1541 22.7548, 53.3313 22.85...	47.710175	47.893559	47.822292	48.220722	49.729328
2	Argentina	ARG	474297	7.572341	66.341158	81.701645	68.789696	10.416561	74.129304	66.774643	68.278297	69.335510	71.098465	MULTIPOLYGON (((-66.5458 -55.061, -66.5486 -55...	7.519970	5.924110	6.083218	7.562082	8.672854
3	Armenia	ARM	9108	6.944193	59.871617	73.707901	64.272858	12.315193	66.763708	52.692970	61.288189	66.743027	69.004845	MULTIPOLYGON (((45.8319 39.8311, 45.8448 39.82...	19.292490	16.728804	16.334721	16.337336	15.342737
4	Colombia	COL	223557	11.956444	71.523674	84.922409	73.350586	8.762370	72.965965	69.966934	72.190056	74.405403	76.502792	MULTIPOLYGON (((-77.491 4.1451, -77.4985 4.140...	27.425064	29.047859	24.132767	22.309097	20.367249

Plot Results of Mean PM2.5 in Socieceonomic Deprivation Quartiles per country

Similarly, we create a list of tuples of how we want to display the data, and create a figure based on the tuples. This plot would show each country in the y axis and the Log of Mean PM2.5 values in each country’s GRDI quarties.

# List of GRDI values and their corresponding properties
#column, value name, color, symbol
plot_data =[
    ('PM25_Q1', 'Q1', '#440154', '6'),  # Light Blue
    ('PM25_Q2', 'Q2', '#31688E', '5'),  # Light Green
    ('PM25_Q3', 'Q3', '#35B779', '7'),  # Yellow
    ('PM25_Q4', 'Q4', '#FDE725', '8'),  # Orange
    ('PM25_Q5', 'Q5', '#FF0000', '1')   # Red
]

# Create a figure
fig = go.Figure()

# Add traces to the figure.
for col, name, color, symbol in plot_data:
    xlog  = np.log(stats_results[col])
    fig.add_trace(go.Scatter(
        x=xlog,
        y=stats_results['Country_Name'],
        mode='markers+text',  # Add 'text' to mode
        text=[f'<b>{name}</b>' for _ in stats_results[col]],  # Repeat name for each point
        name=name,
        textfont=dict(color=color, size=12),
        textposition='top center',  # Position the text above the symbol
        marker_color=color,
        marker_line_color="midnightblue",
        marker_symbol=symbol,
        marker_size=14,
        marker_line_width=2,
        marker_opacity=0.6
        ))
fig.update_traces(textposition='top center')

    # Customize layout
fig.update_layout(
    title='Mean PM2.5 in each GRDI Quartile by Country',
    xaxis_title='Log of PM2.5 Mean Values',
    yaxis_title='Country Name',
    yaxis=dict(tickmode='linear'),
    legend_title='Statistics',
    yaxis_type='category',
    xaxis=dict(rangemode="tozero"),
    
    #xaxis=dict(tickvals=[0, 20, 40, 60, 80, 100])
    )

# Show plot
fig.show()

Use the plotly controls to take a closer look at the results.

With this shapely plot, We can examine differences between countires and PM2.5 values. The plot displays the coutnries on the Y axis and log values of the average PM2.5 value on the X axis. Each country displays PM2.5 values averaged within the quartile areas based on GRDI values of each country. A higher quartile (Q) implies a higher degree of deprivation, 1 being the lowest and 5 the highest.

Congratulations! …. Now you should be able to:

Test test…

Module 2: Air Quality Home

In this lesson, we explored ….

Module 2: Air Quality

References

Center For International Earth Science Information Network-CIESIN-Columbia University. 2022a. “Global Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD), 1998-2019, V4.GL.03.” Palisades, NY: NASA Socioeconomic Data; Applications Center (SEDAC). https://doi.org/10.7927/FX80-4N39.

———. 2022b. “Global Gridded Relative Deprivation Index (GRDI), Version 1.” Palisades, NY: NASA Socioeconomic Data; Applications Center (SEDAC). https://doi.org/10.7927/3XXE-AP97.