import arcgis
from arcgis.gis import GIS
# Standard libraries
import os
import glob
import re
from zipfile import ZipFile
from dotenv import load_dotenv
import pprint
import requests
# Data manipulation and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib.ticker import FuncFormatter
# Geospatial libraries
import geopandas as gpd
import arcgis
from arcgis.gis import GIS
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.raster import Raster
import arcgis.mapping # For using WebMap, MapView, etc.
# Earthdata access
import earthaccess as ea
# IPUMS API and DDI access
from ipumspy.api import IpumsApiClient
from ipumspy import AggregateDataExtract, readers, ddi
from ipumspy import readers, ddi
from ipumspy.api import IpumsApiClient
from ipumspy import AggregateDataExtract
from ipumspy.api.extract import NhgisDataset
import seaborn as sns
import matplotlib.pyplot as plt
Who is Exposed to Coastal Hazards in Puerto Rico?
Datasets:
- VIIRS Nighttime Lights
- LECZ
- NHGIS
- LiDAR SAR
Areas of Interest (AOIs):
- Puerto Rico (PRI)
Functions:
- Image segmentation
- Validation with SAR or LiDAR
Overview
In this lesson, you will use a dataset delineating Low Elevation Coastal Zones from the NASA Socioeconomic Data and Applications Center (SEDAC) website along with census data from the US Census Bureau from the IPUMS National Historical Geographic Information System (NHGIS).
You will perform various preprocessing tasks to combina tabular and spatial boundary data for analysis. These steps include exploring the dataset with visualizations and thematic mapping tools. You will then learn how to generate summary statistics based on combinations of these two layers, over two different time periods.
You will learn how to perform numerous data manipulations, create statistical summaries of population-at-risk (and related housing characteristics), and examine a decade of change throughout this lesson.
Learning Objectives
- Become familiar with Census Block and County census data for 5-year age groups from the U.S. Census.
- Become familiar with Low Elevation Coastal Zones (LECZs) in Puerto Rico and explain their significance (as well as limitations) in assessing coastal hazard exposure.
- Access, integrate, explore, and use LECZ data from NASA SEDAC and demographic data from the US Census Bureau for Puerto Rico.
- Assess decadal changes (2010–2020) in population and housing characteristics in coastal versus non-coastal zones in Puerto Rico.
- Create regional and local scale maps and statistical figures of exposure and decadal change.
- Identify venues for sharing output (for example, discussion board associated with data, policy briefs, op-eds).
Introduction
Low Elevation Coastal Zones (LECZs)
Low Elevation Coastal Zones (LECZs) have been defined globally, with population estimates for areas below 5m and 10m elevations (McGranahan, Balk, and Anderson 2007; MacManus et al. 2021).
In the continental U.S. (CONUS), 1 in 10 people live in the 10m LECZ, and studies highlight that urban residents, people of color, and older adults are disproportionately exposed. For instance, about 1 in 5 urban Black residents live in this zone (Tagtachian and Balk 2023; Hauer et al. 2020).
This is the sneak peak of what LECZ looks like:
You may wonder why studies of the “entire” US often restrict themselves to the CONUS? The simplest answer is limitations either data or computational power. For example, this happens because of incomplete coverage in one data set or another. Some US territories may not collect the full suite of census variables that are collected in CONUS.
For example the detail on housing characteristics is limited in Guam, Northern Mariana Islands, US Virgin Islands and American Samoa, and the Census’ American Community Survey is not conducted in any of the territories, though Puerto Rico conducts its own Community Survey. In some other cases, data collected from satellites (such as SRTM) have variable accuracy toward polar regions U.S Census Bureau.
Another possible reason for omission outside of CONUS could be computational challenges or limitations. For instance US territories are subject to different map projections, which implies the need for additional functions in processing algorithms to account for spatial variations and to unify spatial structures.
“What is a Map Projection?”
Map projections have existed for thousands of years. They help map makers render the spherical surface of the Earth flat – so it can be seen on a piece of paper or computer screen, and so that the unit of measure is uniform throughout the area of interest. As a result, map projections distort something – area, shape, distance, direction – to make this possible.
Here are some resources to learn more about map projections:
There are many resources to guide a new learner, so enjoy learning!
Accessing Data
This lesson uses the Python language. These are the equired packages to run this lesson impumspy
, pandas
, arcgis
, zipfile
, and numpy
. Optional packages include glob
, geopandas
, and earthaccess
.
Use import
to import the necessary packages:
This lesson uses arcgis version of 2.4.0 or higher:
# Check the arcgis version for mapping properly
arcgis.__version__
'2.4.1.1'
If arcgis version is lower, use pip install
:
==2.4.1.1 pip install arcgis
Using IPUMS API to pull U.S Census Data for Puerto Rico
Registering to IPUMS and the National Historical Geographic Information System (NHGIS)
In order to retrieve an IPUMS API Key, you will have to register for an account for IPUMS and request your API Key.
Additionally register, to The National Historical Geographic Information System (NHGIS)
After you requested your IPUMS API, to the NHGIS, store it in os.env
format. You will need your registration email and the API Key:
# Step 1: Load API key from .env file
load_dotenv()= os.getenv("EMAIL", "API_KEY")
IPUMS_API_KEY
if IPUMS_API_KEY is None:
raise ValueError("API key not found. Make sure IPUMS_API_KEY is defined in your .env file.")
# Step 2: Initialize IPUMS client
= IpumsApiClient(IPUMS_API_KEY) ipums
Getting shapefile metadata in order to get the filename for downloading the shapefile in the below chunk.
This block will be returning the shapefile name of interest so that we can download it in the next block
# After registering to NHGIS, please run this code
for page in ipums.get_metadata_catalog("nhgis", metadata_type="shapefiles"):
for shapefile in page["data"]:
if shapefile["extent"] == "Puerto Rico":
if shapefile["geographicLevel"] == "Block Group" and shapefile["year"] == "2010":
print( "Name: " + shapefile["name"] + " | Year: " + shapefile["year"])
Name: 720_blck_grp_2010_tl2010 | Year: 2010
Name: 720_blck_grp_2010_tl2020 | Year: 2010
Downlaoding shapefiles from IPUMS
With this API key, we can extract geospatial data from the IPUMS API. Using the geo level blck_grp
, we can speicify that we want to extract data at the Block Group level.
The AggregateDataExtract
function specifies the collection to use, in this case NHGIS, give it a human-readable label for the extract request, and requests the 2010 and 2020 Summary File 1 (SF1a) dataset with tables P12 (sex by age) and H3 (vacancy status) at the block group geographic level. It also limits the extract to geographic extent code “720” (Puerto Rico).
Getting data from 2010 and 2020 with the variables for 5-year Age Groups (P12) and the Housing (H3).
# Submit extraction data to IPUMS portal
= AggregateDataExtract(
extract ="nhgis", # Use NHGIS collection
collection="Puerto Rico 2010–2020 vacancy", # Extract label
description=[
datasets
NhgisDataset(="2010_SF1a", # 2010 dataset
name=["P12", "H3"], # Tables: sex by age, vacancy
data_tables=["blck_grp"] # At block group level
geog_levels
),
NhgisDataset(="2020_DHCa", # 2020 dataset
name=["P12", "H3"], # Same tables
data_tables=["blck_grp"] # Same level
geog_levels
),
],=["720"] # Puerto Rico only
geographic_extents# shapefiles=["720_blck_grp_2020_tl2020"] # Optional: include shapefile
)
This code sends the extract request to IPUMS, prints the unique extract ID so you can track it, and sets up to wait until the extract is finished.
# Submit the extract request
# Send request to IPUMS
ipums.submit_extract(extract) print(f"Extract ID: {extract.extract_id}") # Print the extract ID
# Wait for the extract to finish
This code sets up the folder where the extract will be saved, creates it if it doesn’t already exist, and downloads the extract from IPUMS to that location:
# Download the extract
= os.getcwd() # Get current working directory
current = os.path.join(f"{current}/data/ipums/block") # Set download path
DOWNLOAD_DIR
=True) # Create folder if needed
os.makedirs(DOWNLOAD_DIR, exist_ok
=DOWNLOAD_DIR) # Download files to folder ipums.download_extract(extract, download_dir
After downloading the extract, this code navigates to the download directory, identifies the ZIP file containing the CSV data, and inspects its contents. It then locates the specific CSV files for the years 2010 and 2020 using filename patterns, and reads them directly from the ZIP archive into pandas DataFrames—no need to manually unzip anything!
= os.getcwd() # Get current working directory
current = os.path.join(f"{current}/data/ipums/block") # Set path to downloaded extract
DOWNLOAD_DIR = os.listdir(DOWNLOAD_DIR) # List files in download folder
file_list = [f for f in file_list if f.endswith('_csv.zip')] # Find ZIP with CSVs
csv_zip = f"{DOWNLOAD_DIR}/{csv_zip[0]}" # Get full path to ZIP
csv
# Read zip data file in the extract
with ZipFile(csv) as z:
= z.namelist() # List files inside ZIP
csv_data print("Contents of zip: ", csv_data)
# Find the correct CSVs using filename patterns
= next(f for f in csv_data if '2020' in f and f.endswith('.csv')) # 2020 file
file_2020 = next(f for f in csv_data if '2010' in f and f.endswith('.csv')) # 2010 file
file_2010
# Read CSVs into DataFrames
with z.open(file_2020) as f:
= pd.read_csv(f) # Load 2020 data
df_2020
with z.open(file_2010) as f:
= pd.read_csv(f) # Load 2010 data df_2010
Contents of zip: ['nhgis0012_csv/nhgis0012_ds258_2020_blck_grp_codebook.txt', 'nhgis0012_csv/nhgis0012_ds258_2020_blck_grp.csv', 'nhgis0012_csv/nhgis0012_ds172_2010_blck_grp.csv', 'nhgis0012_csv/nhgis0012_ds172_2010_blck_grp_codebook.txt']
This section uses NHGIS Codebook file(s) that were automatically included in your data extract to rename cryptic column codes in the 2010 and 2020 datasets to human-readable labels. These codes correspond to census tables on sex by age and housing occupancy. Renaming makes analysis and visualization much easier later in your workflow.
Look for the .txt file(s) in the zipped file you downloaded, and they will shed some light on your data.
# The NHGIS codes are as follows in the documentation which is downloaded from the IPUMS API
# Rename columns for dataframe 2020
''' Table 1: Sex by Age for Selected Age Categories
Universe: Total population
Source code: P12
NHGIS code: U7S
U7S001: Total
U7S002: Male
U7S003: Male: Under 5 years
U7S004: Male: 5 to 9 years
U7S005: Male: 10 to 14 years
U7S006: Male: 15 to 17 years
U7S007: Male: 18 and 19 years
U7S008: Male: 20 years
U7S009: Male: 21 years
U7S010: Male: 22 to 24 years
U7S011: Male: 25 to 29 years
U7S012: Male: 30 to 34 years
U7S013: Male: 35 to 39 years
U7S014: Male: 40 to 44 years
U7S015: Male: 45 to 49 years
U7S016: Male: 50 to 54 years
U7S017: Male: 55 to 59 years
U7S018: Male: 60 and 61 years
U7S019: Male: 62 to 64 years
U7S020: Male: 65 and 66 years
U7S021: Male: 67 to 69 years
U7S022: Male: 70 to 74 years
U7S023: Male: 75 to 79 years
U7S024: Male: 80 to 84 years
U7S025: Male: 85 years and over
U7S026: Female
U7S027: Female: Under 5 years
U7S028: Female: 5 to 9 years
U7S029: Female: 10 to 14 years
U7S030: Female: 15 to 17 years
U7S031: Female: 18 and 19 years
U7S032: Female: 20 years
U7S033: Female: 21 years
U7S034: Female: 22 to 24 years
U7S035: Female: 25 to 29 years
U7S036: Female: 30 to 34 years
U7S037: Female: 35 to 39 years
U7S038: Female: 40 to 44 years
U7S039: Female: 45 to 49 years
U7S040: Female: 50 to 54 years
U7S041: Female: 55 to 59 years
U7S042: Female: 60 and 61 years
U7S043: Female: 62 to 64 years
U7S044: Female: 65 and 66 years
U7S045: Female: 67 to 69 years
U7S046: Female: 70 to 74 years
U7S047: Female: 75 to 79 years
U7S048: Female: 80 to 84 years
U7S049: Female: 85 years and over
Table 2: Occupancy Status
Universe: Housing units
Source code: H3
NHGIS code: U9X
U9X001: Total
U9X002: Occupied
U9X003: Vacant
'''
= {
rename_2020 "U7S001": "Total_Population",
"U7S002": "Male",
"U7S003": "Male: Under 5 years",
"U7S004": "Male: 5 to 9 years",
"U7S005": "Male: 10 to 14 years",
"U7S006": "Male: 15 to 17 years",
"U7S007": "Male: 18 and 19 years",
"U7S008": "Male: 20 years",
"U7S009": "Male: 21 years",
"U7S010": "Male: 22 to 24 years",
"U7S011": "Male: 25 to 29 years",
"U7S012": "Male: 30 to 34 years",
"U7S013": "Male: 35 to 39 years",
"U7S014": "Male: 40 to 44 years",
"U7S015": "Male: 45 to 49 years",
"U7S016": "Male: 50 to 54 years",
"U7S017": "Male: 55 to 59 years",
"U7S018": "Male: 60 and 61 years",
"U7S019": "Male: 62 to 64 years",
"U7S020": "Male: 65 and 66 years",
"U7S021": "Male: 67 to 69 years",
"U7S022": "Male: 70 to 74 years",
"U7S023": "Male: 75 to 79 years",
"U7S024": "Male: 80 to 84 years",
"U7S025": "Male: 85 years and over",
"U7S026": "Female",
"U7S027": "Female: Under 5 years",
"U7S028": "Female: 5 to 9 years",
"U7S029": "Female: 10 to 14 years",
"U7S030": "Female: 15 to 17 years",
"U7S031": "Female: 18 and 19 years",
"U7S032": "Female: 20 years",
"U7S033": "Female: 21 years",
"U7S034": "Female: 22 to 24 years",
"U7S035": "Female: 25 to 29 years",
"U7S036": "Female: 30 to 34 years",
"U7S037": "Female: 35 to 39 years",
"U7S038": "Female: 40 to 44 years",
"U7S039": "Female: 45 to 49 years",
"U7S040": "Female: 50 to 54 years",
"U7S041": "Female: 55 to 59 years",
"U7S042": "Female: 60 and 61 years",
"U7S043": "Female: 62 to 64 years",
"U7S044": "Female: 65 and 66 years",
"U7S045": "Female: 67 to 69 years",
"U7S046": "Female: 70 to 74 years",
"U7S047": "Female: 75 to 79 years",
"U7S048": "Female: 80 to 84 years",
"U7S049": "Female: 85 years and over",
"U9X001": "Total_Housing_Units",
"U9X002": "Occupied",
"U9X003": "Vacant"
}
#Rename columns for dataframe 2010
''' Table 1: Housing Units
Universe: Housing units
Source code: H1
NHGIS code: IFC
IFC001: Total
Table 2: Occupancy Status
Universe: Housing units
Source code: H3
NHGIS code: IFE
IFE001: Total
IFE002: Occupied
IFE003: Vacant'''
= {
rename_2010 "H76001": "Total_Population",
"H76002": "Male",
"H76003": "Male: Under 5 years",
"H76004": "Male: 5 to 9 years",
"H76005": "Male: 10 to 14 years",
"H76006": "Male: 15 to 17 years",
"H76007": "Male: 18 and 19 years",
"H76008": "Male: 20 years",
"H76009": "Male: 21 years",
"H76010": "Male: 22 to 24 years",
"H76011": "Male: 25 to 29 years",
"H76012": "Male: 30 to 34 years",
"H76013": "Male: 35 to 39 years",
"H76014": "Male: 40 to 44 years",
"H76015": "Male: 45 to 49 years",
"H76016": "Male: 50 to 54 years",
"H76017": "Male: 55 to 59 years",
"H76018": "Male: 60 and 61 years",
"H76019": "Male: 62 to 64 years",
"H76020": "Male: 65 and 66 years",
"H76021": "Male: 67 to 69 years",
"H76022": "Male: 70 to 74 years",
"H76023": "Male: 75 to 79 years",
"H76024": "Male: 80 to 84 years",
"H76025": "Male: 85 years and over",
"H76026": "Female",
"H76027": "Female: Under 5 years",
"H76028": "Female: 5 to 9 years",
"H76029": "Female: 10 to 14 years",
"H76030": "Female: 15 to 17 years",
"H76031": "Female: 18 and 19 years",
"H76032": "Female: 20 years",
"H76033": "Female: 21 years",
"H76034": "Female: 22 to 24 years",
"H76035": "Female: 25 to 29 years",
"H76036": "Female: 30 to 34 years",
"H76037": "Female: 35 to 39 years",
"H76038": "Female: 40 to 44 years",
"H76039": "Female: 45 to 49 years",
"H76040": "Female: 50 to 54 years",
"H76041": "Female: 55 to 59 years",
"H76042": "Female: 60 and 61 years",
"H76043": "Female: 62 to 64 years",
"H76044": "Female: 65 and 66 years",
"H76045": "Female: 67 to 69 years",
"H76046": "Female: 70 to 74 years",
"H76047": "Female: 75 to 79 years",
"H76048": "Female: 80 to 84 years",
"H76049": "Female: 85 years and over",
"IFC001": "Total_Housing",
"IFE001": "Total_Housing_Units",
"IFE002": "Occupied",
"IFE003": "Vacant"
}
# Apply renaming to both datasets
=rename_2010, inplace=True) # Rename 2010 columns
df_2010.rename(columns=rename_2020, inplace=True) # Rename 2020 columns df_2020.rename(columns
This step filters both datasets to include only records from Puerto Rico, which is identified in the IPUMS data by STATEA == 72
. It also removes any columns that are completely empty (all values are NaN), which helps clean up the data for analysis.
# Subset Puerto Rico (STATEA code 72)
= df_2010[df_2010["STATEA"] == 72] # Filter 2010 data for Puerto Rico
pr_df_2010 = df_2020[df_2020["STATEA"] == 72] # Filter 2020 data for Puerto Rico
pr_df_2020
# Drop columns with all missing values
= pr_df_2010.dropna(axis=1, how='all') # Clean 2010 data
pr_df_2010 = pr_df_2020.dropna(axis=1, how='all') # Clean 2020 data pr_df_2020
This step calculates the total population aged 60 and over by summing the relevant male and female age group columns. It then computes two new indicators for both 2010 and 2020:
AgedRatio: the share of the total population that is 60+
VacantRatio: the share of housing units that are vacant
These ratios help measure aging and housing vacancy patterns in Puerto Rico over time.
# Define age columns for population 60+
= [
pop60plus_cols "Female: 60 and 61 years",
"Female: 62 to 64 years",
"Female: 65 and 66 years",
"Female: 67 to 69 years",
"Female: 70 to 74 years",
"Female: 75 to 79 years",
"Female: 80 to 84 years",
"Female: 85 years and over",
"Male: 60 and 61 years",
"Male: 62 to 64 years",
"Male: 65 and 66 years",
"Male: 67 to 69 years",
"Male: 70 to 74 years",
"Male: 75 to 79 years",
"Male: 80 to 84 years",
"Male: 85 years and over"
]
# Calculate totals and ratios for 2010
"Pop60plus_total"] = pr_df_2010[pop60plus_cols].sum(axis=1) # Sum 60+ pop
pr_df_2010["AgedRatio"] = pr_df_2010["Pop60plus_total"] / pr_df_2010["Total_Population"] # 60+ share
pr_df_2010["VacantRatio"] = pr_df_2010["Vacant"] / pr_df_2010["Total_Housing_Units"] # Vacancy share
pr_df_2010[
# Calculate totals and ratios for 2020
"Pop60plus_total"] = pr_df_2020[pop60plus_cols].sum(axis=1)
pr_df_2020["AgedRatio"] = pr_df_2020["Pop60plus_total"] / pr_df_2020["Total_Population"]
pr_df_2020["VacantRatio"] = pr_df_2020["Vacant"] / pr_df_2020["Total_Housing_Units"] pr_df_2020[
Lets create a scatter plot to explore the relationship between aging and housing vacancy in Puerto Rico in 2010. Each point represents a geographic unit (e.g., block group), where the x-axis shows the Vacant Ratio (share of housing units that are vacant), and the y-axis shows the Aged Ratio (share of population aged 60+).
This kind of visualization helps reveal spatial patterns or clusters related to population aging and housing dynamics.
# Plot Aged Ratio vs. Vacant Ratio for 2010
"VacantRatio"], pr_df_2010["AgedRatio"], alpha=0.3) # Transparent points
plt.scatter(pr_df_2010[
"Aged Ratio (60+) over Vacancy Ratio") # Plot title
plt.title("Vacant Ratio") # X-axis label
plt.xlabel("Aged Ratio") # Y-axis label
plt.ylabel(True) # Show grid
plt.grid(# Adjust layout
plt.tight_layout() # Display plot plt.show()
This step prepares for comparison between the 2010 and 2020 datasets.
It selects key columns (like total population, aged population, and vacancy) from the 2010 data and merges them with the full 2020 data using GISJOIN as a unique geographic identifier.
The resulting merged_df contains side-by-side values for both years, using suffixes to distinguish 2010 and 2020 columns.
# Key columns to compare between years
= [
columns_used "Total_Population", "Pop60plus_total", "AgedRatio",
"Total_Housing_Units", "Occupied", "Vacant", "VacantRatio"
]
# Merge 2020 data with selected 2010 columns using GISJOIN
= pr_df_2020.merge(
merged_df "GISJOIN"] + columns_used], # Keep GISJOIN + key columns from 2010
pr_df_2010[[="GISJOIN", # Join on geographic ID
on="inner", # Keep only matching rows
how=("_2020", "_2010") # Label columns by year
suffixes )
Comparing Time Series
To understand demographic and housing trends over time, we compare key indicators from different time periods. By calculating the absolute changes we can assess patterns of population decline, aging, and housing dynamics at the local level.
This step calculates absolute changes between 2010 and 2020 for several key indicators: total population, population aged 60+, vacant housing units, and the ratios of aged population and vacancy. These new columns will help identify trends across geographic areas. Missing values in the AgedRatio_Change column are filled with 0 to ensure clean outputs.
# Calculate absolute changes between 2010 and 2020
"Total_Pop_Change"] = merged_df["Total_Population_2020"] - merged_df["Total_Population_2010"] # Change in total population
merged_df["Pop60plus_Change"] = merged_df["Pop60plus_total_2020"] - merged_df["Pop60plus_total_2010"] # Change in 60+ population
merged_df["Vacant_Change"] = merged_df["Vacant_2020"] - merged_df["Vacant_2010"] # Change in vacant units
merged_df["AgedRatio_Change"] = merged_df["AgedRatio_2020"] - merged_df["AgedRatio_2010"] # Change in aged ratio
merged_df["AgedRatio_Change"].fillna(0, inplace=True) # Fill missing with 0
merged_df["VacantRatio_Change"] = merged_df["VacantRatio_2020"] - merged_df["VacantRatio_2010"] # Change in vacancy ratio
merged_df[
# View merged dataset
print(merged_df.head())
GISJOIN YEAR STUSAB GEOID GEOCODE REGIONA \
0 G72000109563001 2020 PR 1500000US720019563001 720019563001 9
1 G72000109563002 2020 PR 1500000US720019563002 720019563002 9
2 G72000109564001 2020 PR 1500000US720019564001 720019564001 9
3 G72000109564002 2020 PR 1500000US720019564002 720019564002 9
4 G72000109565001 2020 PR 1500000US720019565001 720019565001 9
DIVISIONA STATE STATEA COUNTY ... AgedRatio_2010 \
0 0 Puerto Rico 72 Adjuntas Municipio ... 0.167204
1 0 Puerto Rico 72 Adjuntas Municipio ... 0.202309
2 0 Puerto Rico 72 Adjuntas Municipio ... 0.173866
3 0 Puerto Rico 72 Adjuntas Municipio ... 0.242345
4 0 Puerto Rico 72 Adjuntas Municipio ... 0.164762
Total_Housing_Units_2010 Occupied_2010 Vacant_2010 VacantRatio_2010 \
0 854 737 117 0.137002
1 881 751 130 0.147560
2 801 677 124 0.154806
3 493 418 75 0.152130
4 827 682 145 0.175333
Total_Pop_Change Pop60plus_Change Vacant_Change AgedRatio_Change \
0 -287 199 -33 0.131097
1 -181 130 -74 0.083981
2 -220 161 -50 0.122090
3 -130 93 -35 0.122907
4 -220 177 -32 0.113430
VacantRatio_Change
0 -0.034187
1 -0.077296
2 -0.051742
3 -0.059537
4 -0.037191
[5 rows x 93 columns]
Plotting Graphs from IPUMS data:
This visualization step uses a Seaborn pairplot
to explore the relationships between key change variables from 2010 to 2020.
By plotting scatterplots for every pair of variables and histograms along the diagonal we can quickly spot correlations or patterns in population decline, aging, and housing vacancy across Puerto Rico’s geographic units.
# Choose key variables to compare
= [
vars_to_plot "Total_Pop_Change",
"Pop60plus_Change",
"Vacant_Change",
"AgedRatio_Change",
"VacantRatio_Change"
]
# Subset DataFrame to those variables
= merged_df[vars_to_plot]
data_subset
# Create pairwise plots with histograms on the diagonal
="hist", corner=True)
sns.pairplot(data_subset, diag_kind
"Pairwise Scatterplots and Histograms", y=1.02) # Set overall title
plt.suptitle(
plt.tight_layout()# Display the plot plt.show()
This histogram shows the distribution of changes in the number of vacant housing units from 2010 to 2020 across Puerto Rico.
It helps identify whether most areas experienced increases or decreases in vacancy—and how large those shifts were.
"Vacant_Change"] .hist()
merged_df[
"Vacancy Ratio Change 2010-2020") plt.title(
Text(0.5, 1.0, 'Vacancy Ratio Change 2010-2020')
By looking more closely at the histogram and comparing vacancy differences between 2010 and 2020, it becomes evident that quantile-based classification may not accurately reflect true differences across areas. In cases with extreme outliers, quantile breaks can distort interpretation.
To better evaluate how unusual each area’s change is, we compute z-scores for each variable. This standardizes the values by subtracting the mean and dividing by the standard deviation, allowing for comparisons across indicators on a consistent scale.
A z-score tells us how many standard deviations a value is from the mean. A positive z-score means the value is above the mean. A negative z-score means it’s below the mean. Z-scores help us compare variables on different scales and identify outliers—typically anything above +2 or below –2 is considered unusually high or low.
# Compute means and standard deviations
= merged_df[vars_to_plot].mean()
means = merged_df[vars_to_plot].std()
stds
# Create z-score columns
for var in vars_to_plot:
= var.replace("_Change", "_dz")
z_col = (merged_df[var] - means[var]) / stds[var] merged_df[z_col]
This histogram displays the z-score distribution of vacant housing unit change from 2010 to 2020. Standardizing vacancy change in this way helps identify how extreme the change in each area is compared to the average:
"Vacant_dz"] .hist()
merged_df[
"Male/Female Ratio Total Change 2010-2020 (In thousands) Z-score") plt.title(
Text(0.5, 1.0, 'Male/Female Ratio Total Change 2010-2020 (In thousands) Z-score')
This pairplot
function uses the z-score standardized versions of the change variables to compare their relative behavior across Block Groups in a scatterplot matrix.
By putting all variables on the same scale (mean = 0, std = 1), we can asily spot strong linear relationships, identify clusters or outliers, and ompare change intensity across different domains (e.g., aging vs. vacancy).
This is especially useful when your original variables had different units or ranges.
Can you spot any linear trends in the scatterplots?
#
# Select all z-score columns (those ending in "_dz")
= [col for col in merged_df.columns if col.endswith("_dz")]
zscore_cols = merged_df[zscore_cols] # Subset for plotting
data_subset
# Create pairplot of standardized change variables
="hist", corner=True) # No duplicate plots, hist on diagonal
sns.pairplot(data_subset, diag_kind
"Pairwise Scatterplots and Histograms (Z-Scores)", y=1.02) # Title above plot
plt.suptitle(
plt.tight_layout()# Display the plot plt.show()
This scatterplot shows the relationship between changes in vacancy rates and aging ratios (ages 60+) using their z-scores, allowing for meaningful comparison on a standardized scale for the two variables.
This type of scaling:
Keeps the area near zero linear (for interpretability)
Compresses extreme values
Handles both positive and negative changes symmetrically.
This visualization helps identify whether increases in older populations are associated with changes in housing vacancy—and how extreme those shifts are across Puerto Rico.
# Create a scatter plot of z-score changes
= plt.subplots(figsize=(8, 6))
fig, ax
ax.scatter("VacantRatio_dz"], # X-axis: vacancy ratio z-score
merged_df["AgedRatio_dz"], # Y-axis: aging ratio z-score
merged_df[=0.6,
alpha="mediumseagreen",
color="black"
edgecolor
)
# Apply symmetric log scaling to both axes
"symlog", linthresh=1) # Linear near 0, log beyond ±1
ax.set_xscale("symlog", linthresh=1)
ax.set_yscale(
# Add labels and title
"Vacant Housing Ratio Change") # X-axis label
ax.set_xlabel("Age (60+) Ratio Change") # Y-axis label
ax.set_ylabel("Vacant Ratio Change vs Aged Ratio (60+) Change (in Z-Scores)")
ax.set_title(
# Grid and layout
True)
ax.grid(
plt.tight_layout() plt.show()
Generating Age Pyramids
Another useful way to analyze popylation is with population pyramids for Puerto Rico using mean percentages of male and female population by age group, based on census block-level data. The pyramid compares the age structure visually between sexes, helping identify trends like population aging or gender imbalances in specific cohorts.
Key Steps in the Code:
First, Extract age-by-sex columns from the dataset.
# Define the relevant columns
= [col for col in merged_df.columns if col.startswith("Male:") or col.startswith("Female:")]
joined_columns
# Create DataFrame for percentage values
= merged_df[["Total_Population_2020"] + joined_columns].copy() pct_df
Convert raw population counts into percentage values by age group, making them easier to compare across geographic units regardless of their total population size. Then, it separates the age group columns into male and female categories to prepare for a population pyramid:
# 4. Compute percentage for each age group column
for col in joined_columns:
f"{col}_pct"] = pct_df[col] / pct_df["Total_Population_2020"] * 100
pct_df[
# 5. Drop original count columns, keep only percent columns
= pct_df[[col for col in pct_df.columns if col.endswith("_pct")]]
pct_df
# 6. Separate by sex
= []
male_age_percentages = [] female_age_percentages
This following part calculates the average percent of total population for each age group, separately for males and females, across all geographic units (e.g., blocks).
It also creates a clean DataFrame that aligns age group labels with their corresponding male and female percentages, setting the foundation for the population pyramid.
for col in pct_df.columns:
= round(pct_df[col].mean(), 2)
mean_pct if col.startswith("Male:"):
male_age_percentages.append(mean_pct)elif col.startswith("Female:"):
female_age_percentages.append(mean_pct)
# 7. Extract age labels
= [col.split(": ")[1].replace("_pct", "") for col in pct_df.columns if col.startswith("Male:")]
new_age_columns
# 8. Create the base DataFrame
= pd.DataFrame({
population_df "Age": new_age_columns,
"Male": male_age_percentages,
"Female": female_age_percentages
})= population_df.copy() population
We can cambine narrow age bands into broader, more interpretable age groups (like 15–19 or 20–24) to improve the readability of the population pyramid.
In the plot, we manually add percentages for adjacent rows as the age group label.
After combining, it prepares the data for bidirectional horizontal bar plotting by making male values negative (so they show on the left), keeping female values positive (on the right), and creating helper columns for width and placement of bars.
# Combine select age groups
# Combine 15-17 and 18-19 into 15-19
3, 0] = "15 to 19 years"
population.iloc[3, 1] += population.iloc[4, 1]
population.iloc[3, 2] += population.iloc[4, 2]
population.iloc[= population.drop(index=4).reset_index(drop=True)
population
# Combine 20, 21, 22–24 into 20–24
4, 0] = "20 to 24 years"
population.iloc[4, 1] += population.iloc[5, 1] + population.iloc[6, 1]
population.iloc[4, 2] += population.iloc[5, 2] + population.iloc[6, 2]
population.iloc[= population.drop(index=[5, 6]).reset_index(drop=True)
population
# Combine 60–61 and 62–64 into 60–64
12, 0] = "60 to 64 years"
population.iloc[12, 1] += population.iloc[13, 1]
population.iloc[12, 2] += population.iloc[13, 2]
population.iloc[= population.drop(index=13).reset_index(drop=True)
population
# Combine 65–66 and 67–69 into 65–69
13, 0] = "65 to 69 years"
population.iloc[13, 1] += population.iloc[14, 1]
population.iloc[13, 2] += population.iloc[14, 2]
population.iloc[= population.drop(index=14).reset_index(drop=True)
population
# Prepare for plotting
"Female_Left"] = 0
population["Female_Width"] = population["Female"]
population["Male_Left"] = -population["Male"]
population["Male_Width"] = population["Male"]
population[
# Round for labels
"Male"] = population["Male"].round(2)
population["Female"] = population["Female"].round(2) population[
The horizontal population pyramid displays the average percent of Puerto Rico’s population in each age group, broken down by sex. Male bars extend left using negative values while Female bars extend right using positive values.
A custom title, axis formatting, and color scheme are applied for clarity and polish.
# 11. Plot the population pyramid
= plt.figure(figsize=(15, 10)) # Set figure size
fig
# Plot female bars to the right
plt.barh(=population["Age"],
y=population["Female_Width"],
width="#ee7a87",
color="Female"
label
)
# Plot male bars to the left
plt.barh(=population["Age"],
y=population["Male_Width"],
width=population["Male_Left"],
left="#4682b4",
color="Male"
label
)
# Add gender labels to plot
-5, len(population) - 1, "Male", fontsize=25, fontweight="bold")
plt.text(4, len(population) - 1, "Female", fontsize=25, fontweight="bold")
plt.text(
# Add percentage labels to each bar
for idx in range(len(population)):
# Male label (on the left)
plt.text(=population["Male_Left"][idx] - 0.1,
x=idx,
y="{}%".format(population["Male"][idx]),
s="right", va="center",
ha=15, color="#4682b4"
fontsize
)# Female label (on the right)
plt.text(=population["Female_Width"][idx] + 0.1,
x=idx,
y="{}%".format(population["Female"][idx]),
s="left", va="center",
ha=15, color="#ee7a87"
fontsize
)
# Format axes and labels
-7, 7) # Set x-axis limits
plt.xlim(range(-7, 8), ["{}%".format(i) for i in range(-7, 8)]) # Format tick labels as percents
plt.xticks(="best") # Add legend
plt.legend(loc"Percent (%)", fontsize=16, fontweight="bold")
plt.xlabel("Age Range", fontsize=16, fontweight="bold")
plt.ylabel(
# Add title
plt.title("Puerto Rico Mean Age Distribution (Census Blocks) 2020",
="left", pad=20,
loc=25, fontweight="bold"
fontsize
)
# Optimize spacing
plt.tight_layout() # Display the plot plt.show()
Select an individual census block
We can subset the dataframe to select a single census block and generate an age pyramid.
# --- STEP 1: Select a census block by GEOID ---
= "G72002100310212" # Replace with any valid GEOID
block_id = merged_df[merged_df["GISJOIN"] == block_id].copy()
block_df
assert len(block_df) == 1, f"Expected one row for GEOID {block_id}, but got {len(block_df)}"
# --- STEP 2: Identify male and female age columns ---
= [col for col in block_df.columns if col.startswith("Male:")]
male_cols = [col for col in block_df.columns if col.startswith("Female:")]
female_cols = [col.split(": ")[1] for col in male_cols] # Extract age ranges
age_labels
# --- STEP 3: Get values (choose percent or raw counts) ---
= True # Set to False if you want raw counts
use_percent
if use_percent:
= block_df[male_cols + female_cols].sum(axis=1).values[0]
total_pop = (block_df[male_cols].values[0] / total_pop) * 100
male_vals = (block_df[female_cols].values[0] / total_pop) * 100
female_vals = "Percent (%)"
value_label else:
= block_df[male_cols].values[0]
male_vals = block_df[female_cols].values[0]
female_vals = "Population Count"
value_label
# --- STEP 4: Build plotting dataframe ---
= pd.DataFrame({
population "Age": age_labels,
"Male": male_vals,
"Female": female_vals
})
"Female_Left"] = 0
population["Female_Width"] = population["Female"]
population["Male_Left"] = -population["Male"]
population["Male_Width"] = population["Male"]
population[
# --- STEP 5: Plot the pyramid ---
= plt.figure(figsize=(12, 8))
fig
=population["Age"], width=population["Female_Width"], color="#ee7a87", label="Female")
plt.barh(y=population["Age"], width=population["Male_Width"], left=population["Male_Left"],
plt.barh(y="#4682b4", label="Male")
color
-5, len(population) - 1, "Male", fontsize=20, fontweight="bold")
plt.text(4, len(population) - 1, "Female", fontsize=20, fontweight="bold")
plt.text(
for idx in range(len(population)):
=population["Male_Left"][idx] - 0.1, y=idx, s=f"{round(population['Male'][idx], 1)}",
plt.text(x="right", va="center", fontsize=12, color="#4682b4")
ha=population["Female_Width"][idx] + 0.1, y=idx, s=f"{round(population['Female'][idx], 1)}",
plt.text(x="left", va="center", fontsize=12, color="#ee7a87")
ha
= max(population[["Male", "Female"]].max()) * 1.2
xlim_val -xlim_val, xlim_val)
plt.xlim(
plt.xticks([])="best")
plt.legend(loc
=14)
plt.xlabel(value_label, fontsize"Age Range", fontsize=14)
plt.ylabel(f"Census Block {block_id} Age Distribution 2020", fontsize=18, fontweight="bold")
plt.title(
plt.tight_layout() plt.show()
Select a large Block Group
Age pyramids can also be generated by subsetting a Census Block Group by name, for example NAME == "Block Group 1"
or “Block Group 3”:
= "Block Group 3"
bg_name
= merged_df[merged_df["NAME"] == bg_name ]
df_gb
# 1. Define the relevant columns
= [col for col in df_gb.columns if col.startswith("Male:") or col.startswith("Female:")]
joined_columns
# 2. Ensure Total_Population exists (if not, create it)
if "Total_Population" not in df_gb.columns:
"Total_Population"] = df_gb[joined_columns].sum(axis=1)
df_gb[
# 3. Create DataFrame for percentage values
= df_gb[["Total_Population"] + joined_columns].copy()
pct_df
# 4. Compute percentage for each age group column
for col in joined_columns:
f"{col}_pct"] = pct_df[col] / pct_df["Total_Population"] * 100
pct_df[
# 5. Drop original count columns, keep only percent columns
= pct_df[[col for col in pct_df.columns if col.endswith("_pct")]]
pct_df
# 6. Separate by sex
= []
male_age_percentages = []
female_age_percentages
for col in pct_df.columns:
= round(pct_df[col].mean(), 2)
mean_pct if col.startswith("Male:"):
male_age_percentages.append(mean_pct)elif col.startswith("Female:"):
female_age_percentages.append(mean_pct)
# 7. Extract age labels
= [col.split(": ")[1].replace("_pct", "") for col in pct_df.columns if col.startswith("Male:")]
new_age_columns
# 8. Create the base DataFrame
= pd.DataFrame({
population_df "Age": new_age_columns,
"Male": male_age_percentages,
"Female": female_age_percentages
})= population_df.copy()
population
# 9. Combine select age groups
# Combine 15-17 and 18-19 into 15-19
3, 0] = "15 to 19 years"
population.iloc[3, 1] += population.iloc[4, 1]
population.iloc[3, 2] += population.iloc[4, 2]
population.iloc[= population.drop(index=4).reset_index(drop=True)
population
# Combine 20, 21, 22–24 into 20–24
4, 0] = "20 to 24 years"
population.iloc[4, 1] += population.iloc[5, 1] + population.iloc[6, 1]
population.iloc[4, 2] += population.iloc[5, 2] + population.iloc[6, 2]
population.iloc[= population.drop(index=[5, 6]).reset_index(drop=True)
population
# Combine 60–61 and 62–64 into 60–64
12, 0] = "60 to 64 years"
population.iloc[12, 1] += population.iloc[13, 1]
population.iloc[12, 2] += population.iloc[13, 2]
population.iloc[= population.drop(index=13).reset_index(drop=True)
population
# Combine 65–66 and 67–69 into 65–69
13, 0] = "65 to 69 years"
population.iloc[13, 1] += population.iloc[14, 1]
population.iloc[13, 2] += population.iloc[14, 2]
population.iloc[= population.drop(index=14).reset_index(drop=True)
population
# 10. Prepare for plotting
"Female_Left"] = 0
population["Female_Width"] = population["Female"]
population["Male_Left"] = -population["Male"]
population["Male_Width"] = population["Male"]
population[
# Round for labels
"Male"] = population["Male"].round(2)
population["Female"] = population["Female"].round(2)
population[
# 11. Plot the population pyramid
= plt.figure(figsize=(15, 10))
fig
=population["Age"], width=population["Female_Width"], color="#ee7a87", label="Female")
plt.barh(y=population["Age"], width=population["Male_Width"], left=population["Male_Left"],
plt.barh(y="#4682b4", label="Male")
color
-5, len(population) - 1, "Male", fontsize=25, fontweight="bold")
plt.text(4, len(population) - 1, "Female", fontsize=25, fontweight="bold")
plt.text(
# Add labels
for idx in range(len(population)):
=population["Male_Left"][idx] - 0.1, y=idx, s="{}%".format(population["Male"][idx]),
plt.text(x="right", va="center", fontsize=15, color="#4682b4")
ha=population["Female_Width"][idx] + 0.1, y=idx, s="{}%".format(population["Female"][idx]),
plt.text(x="left", va="center", fontsize=15, color="#ee7a87")
ha
-7, 7)
plt.xlim(range(-7, 8), ["{}%".format(i) for i in range(-7, 8)])
plt.xticks(="best")
plt.legend(loc"Percent (%)", fontsize=16, fontweight="bold")
plt.xlabel("Age Range", fontsize=16, fontweight="bold")
plt.ylabel(f"Puerto Rico Mean Age Distribution ({bg_name}) 2020", loc="left", pad=20, fontsize=25, fontweight="bold")
plt.title(
plt.tight_layout() plt.show()
Interactinv with ArcGIS Online Portal
ArcGIS Online or Enterprise portal are tools that retrieve a specific hosted feature layers. We use the arcgis
package to find features of U.S. counties (2020), and filter it to include only features where STATEFP = ‘72’, which corresponds to Puerto Rico. The resulting spatial data is converted into a Spatially Enabled DataFrame (sedf) using ArcGIS’ Python API.
# Get the layer from the published data
# Connect to ArcGIS Online (anonymous session)
= GIS()
gis
# Access a specific hosted feature layer by its Item ID
= gis.content.get("3132216944b249a08d13b1aa0ee6fda2").layers[0] # PR counties 2020 layer
layer
# Query Puerto Rico counties using the state FIPS code '72'
= layer.query(where="STATEFP = '72'").sdf # Convert to Spatially Enabled DataFrame sedf
We can merge the spatial county feature layer for Puerto Rico (2020) with your demographic data (2010–2020) using the GISJOIN
field as a common geographic identifier. The result, pr_sedf_ipums
, is a Spatially Enabled DataFrame that contains both geometry and attribute data—making it ready for mapping or spatial analysis!
#Merge the feature to the merged 2010-2020 df
= sedf.merge(merged_df, on = "GISJOIN", how = "inner") pr_sedf_ipums
Summarizing and Plotting DAta
Let’s combine the Block Groups into Counties and Map the results.
HIEU SUMMARIZE merged_df TO COUNTY (pd_county_df) using “sum” to add: “Total_Population”, “Male”, “Female”, “Pop60plus_total”, “Total_Housing_Units”, “Occupied”, “Vacant”, for all have _2010 and _2020 at the end.
Hieu Recalculate CHange
# pr_county_df["AgedRatio_2010"] = pr_county_df["Pop60plus_total_2010"] / pr_county_df["Total_Population_2010"]
# pr_county_df["VacantRatio_2010"] = pr_county_df["Vacant_2010"] / pr_county_df["Total_Housing_Units_2010"]
# pr_county_df["AgedRatio_2020"] = pr_county_df["Pop60plus_total_2020"] / pr_county_df["Total_Population_2020"]
# pr_county_df["VacantRatio_2020"] = pr_county_df["Vacant_2020"] / pr_county_df["Total_Housing_Units_2020"]
# pr_county_df["Total_Pop_Change"] = pr_county_df["Total_Population_2020"] - pr_county_df["Total_Population_2010"]
# pr_county_df["Pop60plus_Change"] = pr_county_df["Pop60plus_total_2020"] - pr_county_df["Pop60plus_total_2010"]
# pr_county_df["Vacant_Change"] = (pr_county_df["Vacant_2020"] - pr_county_df["Vacant_2010"])
# pr_county_df["VacantRatio_Change"] = pr_county_df["VacantRatio_2020"] - pr_county_df["VacantRatio_2010"]
# pr_county_df["AgedRatio_Change"] = (pr_county_df["AgedRatio_2020"] - pr_county_df["AgedRatio_2010"])
# pr_county_df
# # Compute means and standard deviations
# means = pr_county_df[vars_to_plot].mean()
# stds = pr_county_df[vars_to_plot].std()
# # Create z-score columns
# for var in vars_to_plot:
# z_col = var.replace("_Change", "_dz")
# pr_county_df[z_col] = (pr_county_df[var] - means[var]) / stds[var]
# Filter for lowest and highest vacancy quantile groups
# lowest = pr_county_df[pr_county_df["AgedRatio_dz"] == 1]
# highest = pr_county_df[pr_county_df["AgedRatio_dz"] == 5]
# # Use COUNTYA for grouping (could be COUNTY if preferred)
# # Step 1: Find common counties between the two groups
# common_counties = sorted(
# set(lowest["COUNTY"]).intersection(set(highest["COUNTY"]))
# )
# # Step 2: Group and sum vacant housing units for 2010 and 2020, reindexed by common counties
# vacant_low_2010 = lowest[lowest["COUNTY"].isin(common_counties)] \
# .groupby("COUNTY")["VacantRatio_2010"].sum().reindex(common_counties)
# vacant_low_2020 = lowest[lowest["COUNTYA"].isin(common_counties)] \
# .groupby("COUNTY")["VacantRatio_2020"].sum().reindex(common_counties)
# vacant_high_2010 = highest[highest["COUNTY"].isin(common_counties)] \
# .groupby("COUNTY")["VacantRatio_2010"].sum().reindex(common_counties)
# vacant_high_2020 = highest[highest["COUNTY"].isin(common_counties)] \
# .groupby("COUNTY")["VacantRatio_2020"].sum().reindex(common_counties)
# # Step 3: Plot
# x = np.arange(len(common_counties)) # consistent X positions
# fig, ax = plt.subplots(figsize=(12, 6))
# ax.scatter(x, vacant_low_2010.values, label='Lowest Age Ratio 2010', color='skyblue', marker='o', s=60)
# ax.scatter(x, vacant_low_2020.values, label='Lowest Age Ratio 2020', color='blue', marker='o', s=60)
# ax.scatter(x, vacant_high_2010.values, label='Highest Age Ratio 2010', color='orange', marker='^', s=60)
# ax.scatter(x, vacant_high_2020.values, label='Highest Age Ratio 2020', color='darkred', marker='^', s=60)
# # Formatting
# ax.set_ylabel('Vacant Housing Ratio')
# ax.set_xlabel('County (COUNTYA)')
# ax.set_title('Vacant Housing Units by County: Lowest vs Highest Age Ratio Change Quantile (2010 & 2020)')
# ax.set_xticks(x)
# ax.set_xticklabels(common_counties, rotation=90)
# ax.legend()
# ax.grid(True, axis='y')
# plt.tight_layout()
# plt.show()
Mapping data
Start preprocessing the Census data downloaded from IPUMS API
To map the shapefile to the map, we have to publish the shapefile as feature layer on MapViewer on ArcGIS Online by adding layer from your shapefile zipped folder.
This lesson is public facing so we already published the layers for the learners.
# # Explicit interactive authentication
# gis = GIS("https://www.arcgis.com", authenticate='interactive')
# # Get the layer from the published data
# results = gis.content.search("Census County Esri_US_Federal_Data", item_type="Feature Layer")
# for r in results:
# print(r)
Select the position of the “Cansus Zip Code Tabulation Areas” layer starting with position the first position as 0
.
# #select the appropiate item
# zip_item = results[3]
# item_id = zip_item.id
# print(f"Title: {zip_item.title}")
# print(f"ID: {item_id}")
With the item’s ID (ccfe5a9e3eea4ec68b7945b610422b0a) from ablove, we can use the layers for analysis
# from arcgis.features import FeatureLayer
# # Authenticate interactively
# gis = GIS("https://www.arcgis.com", authenticate='interactive')
# layer = gis.content.get(item_id).layers[0] # The layer of PR county 2020 published to the portal
# # Print all field names
# field_names = [field['name'] for field in layer.properties.fields]
# print(field_names)
# print(layer.url)
# layer = FeatureLayer(layer.url)
Hieu Change this for Change this for County to display the “COUNTY” name in the label and either VacancyRatio_dz
# # Build unique value info entries
# unique_value_infos = []
# for i in range(5):
# unique_value_infos.append({
# "value": i + 1,
# "label": quantile_labels[i],
# "symbol": symbols[i]
# })
# # Define the unique value renderer
# pr_uvr = {
# "type": "uniqueValue",
# "field1": "Total_Population",
# "uniqueValueInfos": unique_value_infos
# }
# # Define labeling info
# pr_labeling_info = [
# {
# "labelExpression": "[NAME_x]",
# "labelPlacement": "esriServerPolygonPlacementAlwaysHorizontal",
# "repeatLabel": True,
# "symbol": {
# "type": "esriTS",
# "color": [0, 0, 0, 255],
# "font": {
# "family": "Arial",
# "size": 12
# },
# "horizontalAlignment": "center",
# "kerning": True
# }
# }
# ]
# # Layer display options
# pr_options_dict = {
# "showLabels": True,
# "layerDefinition": {
# "drawingInfo": {
# "labelingInfo": pr_labeling_info,
# "renderer": pr_uvr
# }
# },
# "opacity": 0.5,
# "title": "Vacant Housing Units Decadal Change"
# }
# gis = GIS()
# m1 = gis.map("Puerto Rico")
# m1.content.add(pr_sedf_ipums, options = pr_options_dict)
# # update the legend label for LECZ layer in Notebooks
# m1.legend.enabled = True
# m1
Search for LECZ MERIT-DEM layer on ArcGIS Online’s Living Atlas portal.
With the arcgis package, we can search online for the polygons of the LECZ areas in Puerto Rico