SVI, TRI, and Health Outcomes

Overview

In this lesson, you will use social vulnerability, pollution, and public health data to explore environmental justice questions in the Detroit metro area. You will work with the CDC/ATSDR Social Vulnerability Index (SVI), EPA Toxics Release Inventory (TRI) and ICIS-AIR facility data, and CDC PLACES health outcome estimates.

Learning Objectives

After completing this lesson, you should be able to:

  • Manipulate and process raster data, including loading and cropping.
  • Explore SVI raster sublayers (overall, minority, socioeconomic, housing, and household) by mapping and processing them.
  • Connect to APIs for pollution and public health data.
  • Geolocate tabular data with latitude and longitude.
  • Join tabular data to spatial polygon data.
  • Connect to OpenStreetMap tile services for basemaps.
  • Create regional basemaps.
  • Layer spatial data on basemaps, such as semi-transparent SVI rasters, TRI points, and PLACES boundaries.
  • Create maps with graduated point symbols for point-source pollution releases.
  • Create choropleths from census tract polygons for health outcomes data.
  • Compute zonal statistics over census tracts, including mean SVI values and health outcome summaries weighted by SVI.
  • Interpolate a raster surface from point data (optional).
  • Perform spatial correlation analysis between SVI, point-source pollution, and health outcomes.

Introduction

Air Quality and Environmental Justice:

In many urban areas, air quality issues disproportionately affect low-income communities and communities of color. This disparity is a key focus of environmental justice efforts. In cities like Detroit and Chicago, industrial facilities, highways, and other pollution sources are often concentrated in disadvantaged neighborhoods.

NIMBY and Environmental Justice:

The NIMBY phenomenon can sometimes conflict with environmental justice goals. While wealthier communities may successfully oppose new polluting facilities in their areas, this can lead to these facilities being placed in less affluent neighborhoods with less political clout, exacerbating environmental injustices.

Community-Driven Science:

In response to these challenges, community-driven science initiatives have emerged. These efforts involve local residents in collecting data on air quality and other environmental factors, often using low-cost sensors and mobile monitoring techniques. This approach helps fill gaps in official monitoring networks and empowers communities to advocate for themselves.

Detroit and Chicago Examples:

Detroit: The city has a history of industrial pollution and is working to address air quality issues in areas like Southwest Detroit, where residents face higher exposure to pollutants from nearby industries and heavy truck traffic.

Chicago: The city has seen community efforts to address air pollution in areas like the Southeast Side, where residents have fought against polluting industries and advocated for stricter regulations.

These topics are interconnected, as community-driven science often arises in response to environmental justice concerns, while NIMBY attitudes can influence the distribution of pollution sources across urban areas.

EPA (Environmental Protection Agency):

The EPA is the primary federal agency responsible for environmental protection in the United States. It sets and enforces air quality standards, including the National Ambient Air Quality Standards (NAAQS) for six criteria pollutants. The EPA also maintains the AirNow system, which provides real-time air quality data to the public.

CDC (Centers for Disease Control and Prevention):

While primarily focused on public health, the CDC plays a crucial role in understanding the health impacts of air pollution. It conducts research on the relationships between air quality and various health outcomes, and provides guidance on protecting public health from environmental hazards.

Importance of Open Data:

Open data is crucial for community-driven solutions in several ways:

  • Transparency: Open data allows communities to verify official reports and hold authorities accountable.
  • Accessibility: When air quality data is freely available, communities can use it to inform local decision-making and advocacy efforts.
  • Innovation: Open data enables researchers, activists, and tech developers to create new tools and analyses that can benefit communities.
  • Collaboration: Open data facilitates collaboration between communities, scientists, and policymakers, leading to more effective solutions.

Examples of Open Data in Action:

  • The EPA’s Air Quality System (AQS) database is publicly accessible, allowing researchers and community groups to analyze historical air quality trends.

  • In Chicago, the Array of Things project has installed sensors throughout the city, providing open data on various environmental factors including air quality.

  • Detroit’s Community Air Monitoring Project uses low-cost sensors to collect and share air quality data in areas underserved by official monitoring stations.

Community-Driven Solutions:

With access to open data, communities can:

  • Identify local air quality hotspots that may be missed by sparse official monitoring networks.

  • Correlate air quality data with health outcomes to strengthen advocacy efforts.

  • Develop targeted interventions, such as promoting indoor air filtration on high-pollution days.

  • Create custom alerts and information systems tailored to local needs.

Challenges and Opportunities: While open data provides many benefits, challenges remain:

  • Ensuring data quality and consistency, especially when integrating data from various sources.

  • Addressing the “digital divide” to ensure all community members can access and use the data.

  • Balancing the need for detailed local data with privacy concerns.

  • Building capacity within communities to effectively use and interpret complex environmental data.

The EPA, CDC, and other agencies are increasingly recognizing the value of community-driven science and open data. This has led to initiatives like the EPA’s Community-Focused Exposure and Risk Screening Tool (C-FERST) and the CDC’s Environmental Public Health Tracking Network, which aim to make environmental health data more accessible to communities.

EGLE and Environmental Justice in Detroit

Detroit, Michigan has seen significant environmental justice efforts, particularly in collaboration with EGLE (Michigan Department of Environment, Great Lakes, and Energy). Here are some examples:

  • Detroit Environmental Agenda: This community-led initiative works closely with EGLE to address environmental concerns in Detroit. It focuses on issues like air quality, water quality, and waste management.

  • 48217 Community Air Monitoring Project: Named after the zip code of a heavily industrialized area in Southwest Detroit, this project involves community members working with EGLE to monitor air quality using low-cost sensors.

  • Detroit Climate Action Plan: Developed in partnership with EGLE, this plan addresses climate change impacts on the city, with a focus on vulnerable communities.

  • Delray Neighborhood Initiatives: EGLE has been involved in efforts to address air quality concerns in the Delray neighborhood, which is impacted by industrial emissions and heavy truck traffic.

  • Green Door Initiative: This Detroit-based organization collaborates with EGLE on various environmental justice projects, including lead abatement and air quality improvement efforts.

  • Detroit River Sediment Cleanup: EGLE has been involved in efforts to clean up contaminated sediments in the Detroit River, which disproportionately affects nearby low-income communities.

  • Asthma Prevention Programs: EGLE supports community-based asthma prevention programs in Detroit, recognizing the link between air quality and asthma rates in disadvantaged neighborhoods.

These examples demonstrate the ongoing collaboration between community groups, local government, and EGLE to address environmental justice concerns in Detroit. However, as with many environmental justice efforts, challenges remain and work is ongoing.

Data Science Review

This lesson uses the pandas, geopandas, matplotlib, numpy, requests, contextily, pygris, rasterio, xarray, rioxarray, scipy, and shapely modules. If you’d like to learn more about the functions used in this lesson, you can refer to the documentation on their respective websites.

The pandas module is essential for data manipulation and analysis, while geopandas extends its functionality to handle geospatial data. matplotlib is used for creating static, animated, and interactive visualizations. numpy provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

The requests module allows you to send HTTP requests and interact with APIs easily. contextily adds basemaps to your plots, enhancing the visual context of your geospatial data. pygris simplifies the process of working with US Census Bureau TIGER/Line shapefiles.

rasterio and xarray are used for working with geospatial raster data. rasterio reads and writes geospatial raster datasets, while xarray introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays, making it easier to work with labeled multi-dimensional arrays. rioxarray extends xarray with rasterio-backed geospatial operations such as CRS handling, clipping, and reprojection; scipy supplies the statistical and interpolation routines used later in the lesson; and shapely provides the geometric primitives underlying geopandas.
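
To make xarray’s labeled arrays concrete, here is a minimal sketch (toy data, not part of the lesson) showing label-based selection, the same mechanism the lesson later uses to sample raster values at facility locations:

import numpy as np
import xarray as xr

# A tiny labeled 2-D array: rows are 'y', columns are 'x'
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"y": [0.5, 1.5], "x": [10.0, 20.0, 30.0]},
    dims=["y", "x"],
)

print(da.sel(x=20.0))                   # select a column by coordinate label
print(da.sel(y=1.0, method="nearest"))  # nearest-neighbor lookup by coordinate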

Make sure these modules are installed before you begin working with the code in this document. You can install them using pip:
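
pip install pandas geopandas matplotlib numpy requests contextily pygris rasterio xarray rioxarray scipy shapely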

Load the Data

SVI and Detroit Metro

SVI provides a primary overall score, but also sublayers for specific themes: minority status, socioeconomic status, housing, and household characteristics. The code below loads all five rasters, clips them to the Detroit metro boundary, and maps them side by side.

import xarray as xr
import rasterio
import rasterio.mask
import geopandas as gpd
import matplotlib.pyplot as plt
import pygris
import numpy as np

# Fetch Detroit metro area counties using pygris
metro_counties = pygris.counties(state="MI", year=2022)
detroit_metro = metro_counties[metro_counties['NAME'].isin([
    'Wayne', 'Oakland', 'Macomb', 'Livingston', 'St. Clair', 'Lapeer'
])]

# Dissolve the counties into a single polygon
detroit_metro = detroit_metro.dissolve(by='STATEFP')

# Ensure the result is a GeoDataFrame in the TIGER/Line CRS (NAD83)
detroit_metro = gpd.GeoDataFrame(detroit_metro, geometry='geometry', crs='EPSG:4269')

# Specify the TIF files
tif_files = [
    "data/svi/svi_2020_tract_overall_wgs84.tif",
    "data/svi/svi_2020_tract_minority_wgs84.tif",
    "data/svi/svi_2020_tract_socioeconomic_wgs84.tif",
    "data/svi/svi_2020_tract_housing_wgs84.tif",
    "data/svi/svi_2020_tract_household_wgs84.tif"
]

# Create an empty list to store the individual DataArrays
data_arrays = []

# Read each TIF file, clip it to Detroit metro's extent, and append it to the list
for file in tif_files:
    with rasterio.open(file) as src:
        # Reproject Detroit metro boundary to match the raster CRS
        metro_reprojected = detroit_metro.to_crs(src.crs)
        
        # Clip the raster to Detroit metro's geometry
        out_image, out_transform = rasterio.mask.mask(src, metro_reprojected.geometry, crop=True)
        out_meta = src.meta.copy()
        
        # Update the metadata
        out_meta.update({"driver": "GTiff",
                         "height": out_image.shape[1],
                         "width": out_image.shape[2],
                         "transform": out_transform})
        
        # Create coordinates
        height = out_meta['height']
        width = out_meta['width']
        cols, rows = np.meshgrid(np.arange(width), np.arange(height))
        xs, ys = rasterio.transform.xy(out_transform, rows, cols)
        
        # Convert lists to numpy arrays
        xs = np.array(xs)
        ys = np.array(ys)
        
        # Reshape coordinates to match dimensions of the raster
        xs = xs.reshape(height, width)
        ys = ys.reshape(height, width)
        
        # Create a DataArray from the clipped data
        da = xr.DataArray(out_image[0],  # Use the first band
                          coords={'y': ('y', ys[:, 0]),
                                  'x': ('x', xs[0, :])},
                          dims=['y', 'x'])
        da.attrs['crs'] = str(src.crs)  # Convert CRS to string
        da.attrs['transform'] = out_transform
        data_arrays.append(da)

# Stack all DataArrays along a new 'layer' dimension (the result is a DataArray)
ds = xr.concat(data_arrays, dim='layer')

# Label the layers along the new dimension
ds = ds.assign_coords(layer=('layer', ['Overall', 'Minority', 'Socioeconomic', 'Housing', 'Household']))

# Define the colorbar limits
vmin, vmax = 0, 1

# Create a multipanel plot
fig, axes = plt.subplots(3, 2, figsize=(15, 20))
axes = axes.flatten()

# Plot each layer
for i, layer in enumerate(ds.layer.values):
    # Plot with custom color limits
    im = ds.sel(layer=layer).plot(ax=axes[i], add_colorbar=False, vmin=vmin, vmax=vmax, cmap='viridis')
    axes[i].set_title(layer)
    
    # Plot Detroit metro boundary
    metro_reprojected.boundary.plot(ax=axes[i], color='red', linewidth=1)

# Remove the extra subplot
fig.delaxes(axes[5])

# Add a single colorbar
cbar_ax = fig.add_axes([0.92, 0.15, 0.02, 0.7])
cbar = fig.colorbar(im, cax=cbar_ax, label='SVI Score')

plt.tight_layout()
plt.show()
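
With the layers stacked along a named dimension, any sublayer can be pulled out by label. A quick sketch using the ds object built above (the nodata fill value is an assumption; check the source rasters):

# Pull out a single sublayer by its label
minority = ds.sel(layer='Minority')

# Mask presumed nodata fills (assumed to be negative here) before summarizing
print(float(minority.where(minority >= 0).mean()))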

TRI API Column Definitions

The TRI facility endpoint returns columns like the following:

  • FACILITY_NAME: The name of the facility reporting to TRI
  • TRI_FACILITY_ID: A unique identifier for the facility in the TRI database
  • STREET_ADDRESS: The street address of the facility
  • CITY_NAME: The city where the facility is located
  • COUNTY_NAME: The county where the facility is located
  • STATE_ABBR: The two-letter abbreviation for the state where the facility is located
  • ZIP_CODE: The ZIP code of the facility’s location
  • PREF_LATITUDE: The preferred latitude coordinate of the facility
  • PREF_LONGITUDE: The preferred longitude coordinate of the facility
  • PARENT_CO_NAME: The name of the parent company, if applicable
  • INDUSTRY_SECTOR_CODE: A code representing the industry sector of the facility
  • INDUSTRY_SECTOR: A description of the industry sector

Note: The availability and exact names of these columns may vary depending on the specific TRI API endpoint and query parameters used. Always refer to the official EPA TRI documentation for the most up-to-date and comprehensive information.

Important Considerations

  1. Not all columns may be present in every API response.
  2. Column names may have slight variations (e.g., with or without underscores); see the sketch after this list.
  3. The EPA occasionally updates their API and data structure.
  4. Some columns related to chemical releases and waste management may have additional variations or breakdowns.
  5. Numeric values (like release amounts) are typically reported in pounds, but always verify the units.
  6. For coordinates (PREF_LATITUDE and PREF_LONGITUDE), be aware that these are the preferred coordinates, which may have been adjusted for accuracy or privacy reasons.
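
Because column names can vary between endpoints (consideration 2 above), it can help to normalize them before use. A minimal, hypothetical sketch, assuming a raw DataFrame like the tri_df built in the next section:

import pandas as pd

def normalize_tri_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase column names and coerce coordinate columns to numeric."""
    out = df.copy()
    out.columns = [c.lower() for c in out.columns]
    for col in ('pref_latitude', 'pref_longitude'):
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors='coerce')
    return out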

ENVIROFACTS TRI Facilities w/ API

import xarray as xr
import rasterio
import rasterio.mask
import geopandas as gpd
import matplotlib.pyplot as plt
import pygris
import numpy as np
from shapely.geometry import box
import pandas as pd
import requests
import contextily as ctx

# Fetch Detroit metro area counties using pygris
metro_counties = pygris.counties(state="MI", year=2022)
detroit_metro = metro_counties[metro_counties['NAME'].isin([
    'Wayne', 'Oakland', 'Macomb', 'Livingston', 'St. Clair', 'Lapeer'
])]

# Dissolve the counties into a single polygon
detroit_metro = detroit_metro.dissolve(by='STATEFP')

# Get the bounding box
bbox = detroit_metro.total_bounds

# Print the bounding box coordinates
print("Bounding Box:")
print(f"Minimum X (Longitude): {bbox[0]}")
print(f"Minimum Y (Latitude): {bbox[1]}")
print(f"Maximum X (Longitude): {bbox[2]}")
print(f"Maximum Y (Latitude): {bbox[3]}")

# Create the bounding box as a polygon
bbox_polygon = gpd.GeoDataFrame(
    geometry=[box(*bbox)],
    crs=detroit_metro.crs
)

# Fetch TRI facility data from EPA API for each county
counties = ['Wayne', 'Oakland', 'Macomb', 'Livingston', 'St. Clair', 'Lapeer']
tri_data = []

for county in counties:
    api_url = f"https://data.epa.gov/efservice/tri_facility/state_abbr/MI/county_name/{county}/JSON"
    response = requests.get(api_url)
    if response.status_code == 200:
        county_data = response.json()
        tri_data.extend(county_data)
    else:
        print(f"Failed to fetch data for {county} County. Status code: {response.status_code}")

# Convert TRI data to a DataFrame
tri_df = pd.DataFrame(tri_data)

print(f"Number of facilities fetched: {len(tri_df)}")

# Create a copy of the dataframe to avoid SettingWithCopyWarning
tri_df_clean = tri_df.copy()

# Remove facilities with empty latitude or longitude values
tri_df_clean = tri_df_clean.dropna(subset=['pref_latitude', 'pref_longitude'])

print(f"Number of facilities after removing empty coordinates: {len(tri_df_clean)}")

# Convert latitude and longitude to numeric type
tri_df_clean['pref_latitude'] = pd.to_numeric(tri_df_clean['pref_latitude'], errors='coerce')
tri_df_clean['pref_longitude'] = pd.to_numeric(tri_df_clean['pref_longitude'], errors='coerce')

# Function to correct longitude
def correct_longitude(lon):
    if lon > 0:
        return -lon
    return lon

# Apply longitude correction
tri_df_clean['pref_longitude'] = tri_df_clean['pref_longitude'].apply(correct_longitude)

# Calculate IQR for longitude
Q1 = tri_df_clean['pref_longitude'].quantile(0.25)
Q3 = tri_df_clean['pref_longitude'].quantile(0.75)
IQR = Q3 - Q1

# Define bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Remove outliers
tri_df_clean = tri_df_clean[(tri_df_clean['pref_longitude'] >= lower_bound) & 
                            (tri_df_clean['pref_longitude'] <= upper_bound)]

print(f"Number of facilities after removing longitude outliers: {len(tri_df_clean)}")

# Create a GeoDataFrame from the cleaned TRI data
detroit_tri = gpd.GeoDataFrame(
    tri_df_clean, 
    geometry=gpd.points_from_xy(tri_df_clean.pref_longitude, tri_df_clean.pref_latitude),
    crs="EPSG:4326"
)

# Reproject data to Web Mercator for contextily
detroit_metro = detroit_metro.to_crs(epsg=3857)
bbox_polygon = bbox_polygon.to_crs(epsg=3857)
detroit_tri = detroit_tri.to_crs(epsg=3857)

# Create the plot
fig, ax = plt.subplots(figsize=(15, 15))

# Plot the metro area and bounding box
detroit_metro.plot(ax=ax, color='lightblue', edgecolor='black', alpha=0.5)
bbox_polygon.boundary.plot(ax=ax, color='red', linewidth=2)

# Plot TRI facilities
detroit_tri.plot(ax=ax, color='red', markersize=50, alpha=0.7)

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon.total_bounds[0], bbox_polygon.total_bounds[2])
ax.set_ylim(bbox_polygon.total_bounds[1], bbox_polygon.total_bounds[3])

# Remove axes
ax.set_axis_off()

plt.title("Detroit Metro Area TRI Facilities", fontsize=16)
plt.tight_layout()
plt.show()

print(f"Final number of TRI facilities in the Detroit metro area: {len(detroit_tri)}")

Regulated Facilities w/ ICIS-AIR

import pandas as pd
import requests

# Base URL for ECHO ICIS-AIR API
base_url = "https://echodata.epa.gov/echo/air_rest_services"

# Parameters for the initial API call
params = {
    "output": "JSON",
    "p_st": "MI"
}

def get_facilities():
    response = requests.get(f"{base_url}.get_facilities", params=params)
    if response.status_code == 200:
        data = response.json()
        if 'Results' in data:
            qid = data['Results']['QueryID']
            print(f"Query ID: {qid}")
            print(f"Total Facilities: {data['Results']['QueryRows']}")
            return qid
    print("Failed to get facilities and QID")
    return None

def get_facility_data(qid):
    all_facilities = []
    page = 1
    while True:
        params = {"qid": qid, "pageno": page, "output": "JSON"}
        response = requests.get(f"{base_url}.get_qid", params=params)
        if response.status_code == 200:
            data = response.json()
            if 'Results' in data and 'Facilities' in data['Results']:
                facilities = data['Results']['Facilities']
                if not facilities:  # No more facilities to retrieve
                    break
                all_facilities.extend(facilities)
                print(f"Retrieved page {page}")
                page += 1
            else:
                break
        else:
            print(f"Failed to retrieve page {page}")
            break
    return all_facilities

# Step 1: Get the Query ID
qid = get_facilities()

if qid:
    # Step 2: Use get_qid to retrieve all facility data
    print("Retrieving facility data...")
    facilities = get_facility_data(qid)
    
    # Convert to DataFrame
    df_icis_air = pd.DataFrame(facilities)
    
    print(f"\nSuccessfully retrieved {len(df_icis_air)} ICIS-AIR facilities for Michigan")
    print("\nColumns in the dataset:")
    print(df_icis_air.columns)
    
    # Display the first few rows
    print("\nFirst few rows of the data:")
    print(df_icis_air.head())
    
    # Save to CSV (example path; adjust as needed)
    df_icis_air.to_csv("icis_air_mi.csv", index=False)
else:
    print("Failed to retrieve facility data")

import pandas as pd

# List of Detroit metro counties
metro_counties = ['Wayne', 'Oakland', 'Macomb', 'Livingston', 'St. Clair', 'Lapeer']

# Subset the dataframe to include only the Detroit metro counties
df_detroit_metro = df_icis_air[df_icis_air['AIRCounty'].isin(metro_counties)]

# Print information about the subset
print(f"Total ICIS-AIR facilities in Michigan: {len(df_icis_air)}")
print(f"ICIS-AIR facilities in Detroit metro area: {len(df_detroit_metro)}")

# Display the count of facilities in each metro county
print("\nFacilities per county:")
print(df_detroit_metro['AIRCounty'].value_counts())

# Display the first few rows of the subset
print("\nFirst few rows of the Detroit metro ICIS-AIR facilities:")
print(df_detroit_metro.head())

# Additional information: unique values in AIRCounty
print("\nUnique values in AIRCounty column:")
print(df_icis_air['AIRCounty'].unique())

Next, create a GeoDataFrame from the ICIS-AIR records and plot them alongside the TRI facilities.

# Count records with missing coordinate values
missing_coords = df_detroit_metro[(df_detroit_metro['FacLat'].isnull()) | (df_detroit_metro['FacLong'].isnull())]
print(f"Number of ICIS-AIR records with missing coordinates: {len(missing_coords)}")

# Remove records with missing coordinates
df_detroit_metro = df_detroit_metro.dropna(subset=['FacLat', 'FacLong'])

# Create a GeoDataFrame for ICIS-AIR facilities
gdf_icis_air = gpd.GeoDataFrame(
    df_detroit_metro, 
    geometry=gpd.points_from_xy(df_detroit_metro.FacLong, df_detroit_metro.FacLat),
    crs="EPSG:4326"
)

# Reproject ICIS-AIR data to Web Mercator
gdf_icis_air = gdf_icis_air.to_crs(epsg=3857)

# Create the plot
fig, ax = plt.subplots(figsize=(15, 15))

# Plot the metro area and bounding box (reusing objects from earlier)
detroit_metro.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=2)
bbox_polygon.boundary.plot(ax=ax, color='red', linewidth=2)

# Plot ICIS-AIR facilities
gdf_icis_air.plot(ax=ax, color='cyan', markersize=50, alpha=0.7, label='ICIS-AIR Facilities')

# Plot TRI facilities (reusing the detroit_tri object from earlier)
detroit_tri.plot(ax=ax, color='purple', markersize=50, alpha=0.7, label='TRI Facilities')

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon.total_bounds[0], bbox_polygon.total_bounds[2])
ax.set_ylim(bbox_polygon.total_bounds[1], bbox_polygon.total_bounds[3])

# Remove axes
ax.set_axis_off()

# Add legend
ax.legend()

plt.title("Detroit Metro Area TRI and ICIS-AIR Facilities", fontsize=16)
plt.tight_layout()
plt.show()

print(f"Number of TRI facilities plotted: {len(detroit_tri)}")
print(f"Number of ICIS-AIR facilities plotted: {len(gdf_icis_air)}")

Air Release Vulnerability Index

import numpy as np
import xarray as xr
import rioxarray
import geopandas as gpd
import matplotlib.pyplot as plt
from rasterio.enums import Resampling
from scipy import stats

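# NOTE: this section assumes two objects created earlier in the workflow but
# not shown in this document: clipped_raster, an air-release raster clipped
# to the Detroit metro boundary, and gdf_tri_custom, a GeoDataFrame of TRI
# facilities with an AIR_TOTAL_RELEASE column.
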
# Select the 'Overall' layer
svi_overall = ds.sel(layer='Overall')

# Convert to rioxarray for geospatial operations
svi_overall = svi_overall.rio.write_crs("EPSG:4326")

# Reproject SVI to match the CRS of the air release raster
svi_reprojected = svi_overall.rio.reproject_match(clipped_raster)

# Clip SVI raster to the Detroit metro boundary
svi_clipped = svi_reprojected.rio.clip(detroit_metro.geometry.values, detroit_metro.crs, drop=True, all_touched=True)

# Disaggregate the air release data to match the resolution of the SVI data
air_release_disaggregated = clipped_raster.rio.reproject_match(
    svi_clipped,
    resampling=Resampling.bilinear
)

# Calculate raster correlation between SVI overall and air release within Detroit metro boundary
svi_flat = svi_clipped.values.flatten()
air_flat = air_release_disaggregated.values.flatten()

# Remove NaN values
mask = ~np.isnan(svi_flat) & ~np.isnan(air_flat)
svi_flat = svi_flat[mask]
air_flat = air_flat[mask]

correlation, p_value = stats.pearsonr(svi_flat, air_flat)

print(f"Raster Correlation between SVI Overall and Air Release within Detroit metro: {correlation:.4f}")
print(f"P-value: {p_value:.4f}")

# New correlation analysis
# Ensure gdf_tri_custom is in the same CRS as svi_clipped
gdf_tri_custom = gdf_tri_custom.to_crs(svi_clipped.rio.crs)

# Extract SVI values at TRI facility locations
svi_values = []
for point in gdf_tri_custom.geometry:
    # .item() extracts a plain float so the column is numeric rather than object dtype
    svi_value = svi_clipped.sel(x=point.x, y=point.y, method="nearest").item()
    svi_values.append(svi_value)

# Add SVI values to gdf_tri_custom
gdf_tri_custom['SVI_OVERALL'] = svi_values

# Perform correlation analysis
correlation_points, p_value_points = stats.pearsonr(gdf_tri_custom['AIR_TOTAL_RELEASE'], gdf_tri_custom['SVI_OVERALL'])

print(f"Point-based Correlation between SVI Overall and AIR_TOTAL_RELEASE: {correlation_points:.4f}")
print(f"P-value: {p_value_points:.4f}")

# Log1p transform the air release data and scale to 0-1
air_release_log = np.log1p(air_release_disaggregated)
air_release_scaled = (air_release_log - air_release_log.min()) / (air_release_log.max() - air_release_log.min())

# Multiply scaled air release data with SVI data
vulnerability_indicator = air_release_scaled * svi_clipped

# Create the plots
fig, axs = plt.subplots(3, 1, figsize=(15, 45))

# Plot SVI Overall
im1 = svi_clipped.plot(ax=axs[0], cmap='viridis', vmin=0, vmax=1, add_colorbar=False)
plt.colorbar(im1, ax=axs[0], label='SVI Overall')
axs[0].set_title('Social Vulnerability Index (Overall)', fontsize=16)
detroit_metro.boundary.plot(ax=axs[0], color='black', linewidth=2)

# Plot Original Air Release (log-transformed for better visualization)
im2 = np.log1p(air_release_disaggregated).plot(ax=axs[1], cmap='YlOrRd', add_colorbar=False)
plt.colorbar(im2, ax=axs[1], label='Log(Air Release + 1)')
axs[1].set_title('Air Release (Log-transformed)', fontsize=16)
detroit_metro.boundary.plot(ax=axs[1], color='black', linewidth=2)

# Plot Air Release Vulnerability Indicator
im3 = vulnerability_indicator.plot(ax=axs[2], cmap='YlOrRd', vmin=0, vmax=1, add_colorbar=False)
plt.colorbar(im3, ax=axs[2], label='Air Release Vulnerability Indicator')
axs[2].set_title('Air Release Vulnerability Indicator\n(Scaled Air Release * SVI)', fontsize=16)
detroit_metro.boundary.plot(ax=axs[2], color='black', linewidth=2)

for ax in axs:
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    ax.set_xlim(svi_clipped.x.min(), svi_clipped.x.max())
    ax.set_ylim(svi_clipped.y.min(), svi_clipped.y.max())

plt.tight_layout()
plt.show()

# Print some statistics
print(f"Minimum vulnerability indicator: {vulnerability_indicator.min().values:.4f}")
print(f"Maximum vulnerability indicator: {vulnerability_indicator.max().values:.4f}")
print(f"Mean vulnerability indicator: {vulnerability_indicator.mean().values:.4f}")

# Convert the vulnerability indicator to a pandas DataFrame
vulnerability_df = vulnerability_indicator.to_dataframe(name='index').reset_index()

# Sort by index value and get the top 10
top_10 = vulnerability_df.sort_values('index', ascending=False).head(10)

# Create points from the coordinates
top_10['geometry'] = gpd.points_from_xy(top_10.x, top_10.y)
top_10_gdf = gpd.GeoDataFrame(top_10, geometry='geometry', crs=vulnerability_indicator.rio.crs)

# Create the final map
fig, ax = plt.subplots(figsize=(15, 15))

# Plot the Detroit metro boundary
detroit_metro.boundary.plot(ax=ax, color='black', linewidth=2)

# Plot the top 10 points
top_10_gdf.plot(ax=ax, color='blue', markersize=100, alpha=0.7)

# Add rank labels to the points (reset the index so ranks run from 1 to 10)
top_10_gdf = top_10_gdf.reset_index(drop=True)
for idx, row in top_10_gdf.iterrows():
    ax.annotate(f"#{idx+1}", (row.geometry.x, row.geometry.y), 
                xytext=(3, 3), textcoords="offset points", 
                color='white', fontweight='bold')

# Add a basemap
ctx.add_basemap(ax, crs=vulnerability_indicator.rio.crs.to_string(), source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent to match the Detroit metro area
ax.set_xlim(vulnerability_indicator.x.min(), vulnerability_indicator.x.max())
ax.set_ylim(vulnerability_indicator.y.min(), vulnerability_indicator.y.max())

ax.set_title('Top 10 Areas with Highest Air Release Vulnerability Index', fontsize=16)
ax.set_axis_off()

plt.tight_layout()
plt.show()

# Print the coordinates of the top 10 points
print("Coordinates of the top 10 points:")
for idx, row in top_10_gdf.iterrows():
    print(f"#{idx+1}: ({row.geometry.x}, {row.geometry.y})")

PLACES

The CDC PLACES (Population Level Analysis and Community Estimates) dataset is a collaboration between the Centers for Disease Control and Prevention (CDC), the Robert Wood Johnson Foundation, and the CDC Foundation. It provides model-based population-level analysis and community estimates of health indicators for all counties, places (incorporated and census designated places), census tracts, and ZIP Code Tabulation Areas (ZCTAs) across the United States.

Key Points

  1. Geographic Coverage: Entire United States, including all 50 states, the District of Columbia, and Puerto Rico.
  2. Geographic Granularity: Multiple levels including counties, cities/towns, census tracts, and ZIP codes.
  3. Health Indicators: Wide range of chronic disease measures related to health outcomes, prevention, and health risk behaviors.
  4. Data Sources:
    • Behavioral Risk Factor Surveillance System (BRFSS)
    • U.S. Census Bureau’s American Community Survey (ACS)
  5. Methodology: Uses small area estimation methods for small geographic areas.
  6. Health Measures Include:
    • Chronic diseases: e.g., asthma, COPD, heart disease, diabetes
    • Health risk behaviors: e.g., smoking, physical inactivity, binge drinking
    • Prevention practices: e.g., health insurance coverage, dental visits, cholesterol screening
  7. Socioeconomic Data: Includes some socioeconomic and demographic variables.
  8. Annual Updates: Providing recent estimates for local areas.

This dataset is particularly valuable for:

  • Public health researchers
  • Policymakers
  • Community organizations

It provides a standardized way to compare health indicators across different geographic areas and can be used to inform targeted interventions and policy decisions, especially in addressing health disparities at a local level.
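
The tract-level PLACES data is published through the Socrata Open Data API, which accepts SoQL query parameters. A brief sketch of a narrower query (the measureid value 'CASTHMA' is an assumption; the code below instead filters on the human-readable measure field after downloading):

import requests

# Request only the current-asthma measure for Wayne County tracts
url = "https://data.cdc.gov/resource/cwsq-ngmh.geojson"
params = {
    "$where": "stateabbr = 'MI' AND countyname = 'Wayne' AND measureid = 'CASTHMA'",
    "$limit": 5000,
}
response = requests.get(url, params=params)
print(response.status_code)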

Processing

import io
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import requests
import contextily as ctx

# Define the GeoJSON API endpoint
url = "https://data.cdc.gov/resource/cwsq-ngmh.geojson"

# Define the Detroit metro area counties
detroit_counties = ['Wayne', 'Oakland', 'Macomb', 'Livingston', 'St. Clair', 'Lapeer']

# Create the county filter string
county_filter = " OR ".join([f"countyname = '{county}'" for county in detroit_counties])

# Define the query parameters
params = {
    "$where": f"stateabbr = 'MI' AND ({county_filter})",
    "$limit": 50000  # Adjust if necessary
}

# Make the API request
response = requests.get(url, params=params)

if response.status_code == 200:
    print("Successfully retrieved data")
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    print(response.text)

# Convert the GeoJSON response to a GeoDataFrame
gdf = gpd.read_file(io.BytesIO(response.content))

# Print available health measures
print("\nAvailable health measures:")
print(gdf['measure'].unique())

# Print the first few rows to see the structure of the data
print("\nFirst few rows of the GeoDataFrame:")
print(gdf.head())

# Print basic information about the GeoDataFrame
print("\nGeoDataFrame Info:")
print(gdf.info())

# Create a sample map for one health measure (e.g., Current asthma)
fig, ax = plt.subplots(figsize=(15, 15))

# Filter for the specific measure and ensure data_value is numeric
gdf_asthma = gdf[gdf['measure'] == 'Current asthma among adults'].copy()
gdf_asthma['data_value'] = pd.to_numeric(gdf_asthma['data_value'], errors='coerce')

# Plot the asthma data
gdf_asthma.plot(column='data_value', 
                ax=ax, 
                legend=True, 
                legend_kwds={'label': 'Asthma Prevalence (%)', 'orientation': 'horizontal'},
                cmap='YlOrRd',
                missing_kwds={'color': 'lightgrey'})

# Add county boundaries
#gdf_asthma.dissolve(by='countyname').boundary.plot(ax=ax, color='black', linewidth=0.5)

# Add basemap
ctx.add_basemap(ax, crs=gdf_asthma.crs.to_string(), source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent to match the Detroit metro area
ax.set_xlim(gdf_asthma.total_bounds[0], gdf_asthma.total_bounds[2])
ax.set_ylim(gdf_asthma.total_bounds[1], gdf_asthma.total_bounds[3])

plt.title('Asthma Prevalence in Detroit Metro Area', fontsize=16)
ax.axis('off')

plt.tight_layout()
plt.show()

# Print some statistics for the asthma data
print("\nAsthma Statistics:")
print(f"Average asthma prevalence: {gdf_asthma['data_value'].mean():.2f}%")
print(f"Minimum asthma prevalence: {gdf_asthma['data_value'].min():.2f}%")
print(f"Maximum asthma prevalence: {gdf_asthma['data_value'].max():.2f}%")

# Print the number of census tracts per county
print("\nNumber of census tracts per county:")
print(gdf_asthma['countyname'].value_counts())
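
Next, interpolate the tract-level asthma values (point records at tract centroids) to a continuous surface:
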
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
from scipy.interpolate import griddata
import rasterio
from rasterio.transform import from_origin

# Assuming gdf_asthma is already created and contains the asthma data

# Ensure gdf_asthma is in EPSG:4326
gdf_asthma = gdf_asthma.to_crs(epsg=4326)

# Extract coordinates and values
X = gdf_asthma.geometry.x.values
Y = gdf_asthma.geometry.y.values
Z = gdf_asthma['data_value'].values

# Remove any NaN values
mask = ~np.isnan(Z)
X, Y, Z = X[mask], Y[mask], Z[mask]

# Create a grid to interpolate over
grid_resolution = 0.025  # in degrees
x_min, y_min, x_max, y_max = gdf_asthma.total_bounds
grid_x = np.arange(x_min, x_max, grid_resolution)
grid_y = np.arange(y_min, y_max, grid_resolution)
grid_xx, grid_yy = np.meshgrid(grid_x, grid_y)

# Interpolate onto the grid. Note: griddata's 'linear' method performs
# barycentric linear interpolation, used here as a simple stand-in for true IDW.
points = np.column_stack((X, Y))
grid_z = griddata(points, Z, (grid_xx, grid_yy), method='linear')

# Create the plot
fig, ax = plt.subplots(figsize=(15, 15))

# Plot the interpolated data
im = ax.imshow(grid_z, extent=[x_min, x_max, y_min, y_max], 
               origin='lower', cmap='YlOrRd', alpha=0.7)

# Add colorbar
cbar = plt.colorbar(im, ax=ax, label='Asthma Prevalence')

# Add basemap
ctx.add_basemap(ax, crs=gdf_asthma.crs.to_string(), source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent to match the Detroit metro area
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)

plt.title('Interpolated Asthma Prevalence in Detroit Metro Area', fontsize=16)
ax.axis('off')

plt.tight_layout()
plt.show()

# Save the interpolated raster
transform = from_origin(x_min, y_max, grid_resolution, grid_resolution)
with rasterio.open('idw_asthma.tif', 'w', 
                   driver='GTiff', 
                   height=grid_z.shape[0], 
                   width=grid_z.shape[1], 
                   count=1, 
                   dtype=grid_z.dtype, 
                   crs='EPSG:4326',  # Explicitly set the CRS
                   transform=transform) as dst:
    # Flip rows so the first row is the northern edge (GeoTIFFs are written north-up)
    dst.write(np.flipud(grid_z), 1)

# Print some statistics about the interpolated data
print(f"Minimum interpolated value: {np.nanmin(grid_z):.2f}")
print(f"Maximum interpolated value: {np.nanmax(grid_z):.2f}")
print(f"Mean interpolated value: {np.nanmean(grid_z):.2f}")

Data Review

The Toxics Release Inventory (TRI) and the Integrated Compliance Information System for Air (ICIS-AIR) are two important but distinct environmental reporting systems maintained by the U.S. Environmental Protection Agency (EPA). They have several key differences:

  • Regulatory Basis
    • TRI: Established under the Emergency Planning and Community Right-to-Know Act (EPCRA) of 1986
    • ICIS-AIR: Part of the Clean Air Act (CAA) compliance and enforcement program
  • Focus
    • TRI: Tracks the management of certain toxic chemicals that may pose a threat to human health and the environment
    • ICIS-AIR: Focuses specifically on air quality and emissions from facilities regulated under the Clean Air Act
  • Reported Information
    • TRI: Facilities report on releases, waste management, and pollution prevention activities for specific toxic chemicals
    • ICIS-AIR: Tracks emissions data, compliance status, and enforcement actions related to air quality regulations
  • Facility Coverage
    • TRI: Covers facilities in specific industries that manufacture, process, or use TRI-listed chemicals above certain thresholds
    • ICIS-AIR: Includes a broader range of facilities that emit air pollutants, regardless of the specific chemicals involved
  • Reporting Thresholds
    • TRI: Has specific chemical thresholds that trigger reporting requirements
    • ICIS-AIR: Generally doesn’t have chemical-specific thresholds; requirements are based on overall emissions and facility type
  • Public Accessibility
    • TRI: Designed with a strong focus on public right-to-know, with data easily accessible to the public
    • ICIS-AIR: While public, it’s primarily designed for regulatory and enforcement purposes
  • Data Frequency
    • TRI: Annual reporting is required for covered facilities
    • ICIS-AIR: May involve more frequent reporting, depending on permit requirements and compliance status
  • Scope of Pollutants
    • TRI: Focuses on a specific list of toxic chemicals and chemical categories
    • ICIS-AIR: Covers a wider range of air pollutants, including criteria air pollutants and hazardous air pollutants
  • Use in Environmental Management
    • TRI: Often used for assessing long-term trends in toxic chemical releases and waste management practices
    • ICIS-AIR: More commonly used for day-to-day air quality management and enforcement activities
  • Geographic Coverage
    • TRI: Nationwide program with consistent reporting across states
    • ICIS-AIR: While national, implementation can vary more by state or local air quality management district

Deployment

Congratulations! In this lesson, you combined social vulnerability, pollution, and health outcomes data for the Detroit metro area. Now you should be able to:

  • Load, clip, and map SVI raster sublayers with rasterio and xarray.
  • Fetch facility and health data from the EPA Envirofacts, ECHO, and CDC PLACES APIs and geolocate the results.
  • Layer points, polygons, and rasters over OpenStreetMap basemaps with contextily.
  • Combine scaled pollution surfaces with SVI to build a vulnerability indicator and test its correlations.

Lesson 3

In this lesson, we explored the spatial relationships between social vulnerability, point-source air pollution, and health outcomes in the Detroit metro area, using open data from the CDC and EPA.