Overview
Water distribution infrastructure in urban environments faces increasing challenges from aging pipes, material degradation, soil conditions, and topographic stress. Seattle's water system includes thousands of pipe segments varying in age, diameter, pressure class, and material — all factors that influence the likelihood of a main break, which can disrupt service, damage property, and require costly emergency response.
This project developed a reproducible, automated Python-based script tool for ArcGIS Pro that generates a composite failure risk index (RISK_IDX) for every water main segment in Seattle. The tool reads pipe attributes directly from Seattle Public Utilities' open dataset, normalizes each attribute into a risk metric, and computes a weighted composite score — producing a spatially interpretable risk layer across the entire city network.
The tool is designed to be highly portable: because it relies exclusively on attributes present in most utility datasets (age, material, diameter, pipe class, relining history), it can be adapted to any municipality that maintains comparable infrastructure records.
Study Area & Output Maps
Figure 1 — Study area: Seattle's water main network loaded from Seattle Public Utilities open data. Pipe segments vary in age, material, diameter, and pressure class.
Figure 2 — Water mains prior to risk scoring. Attribute fields reviewed: INSTALL_YR, DIAMETER, PIPE_CLASS, MATERIAL, RELINED_YR.
Figure 3 — Final RISK_IDX output. Older neighborhoods (Capitol Hill, Ballard, Fremont) show elevated risk from cast iron pipes. Relined central corridors appear lower risk.
The Risk Index Formula
The composite risk index is a weighted sum of five normalized component scores, each derived from pipe attribute fields. Weights reflect the relative importance of each factor to infrastructure failure probability.
$$
\mathrm{RISK\_IDX} = 0.30\,\mathrm{AGE\_RISK} + 0.25\,\mathrm{MAT\_RISK} + 0.20\,\mathrm{DIAM\_RISK} + 0.15\,\mathrm{CLASS\_RISK} + 0.10\,\mathrm{RELINE\_RISK}
$$
- AGE_RISK: pipe age normalized over 0–100 years
- MAT_RISK: material type lookup (CI = 0.8, DI = 0.6, PVC = 0.3)
- DIAM_RISK: smaller diameter = higher risk (4–36 in)
- CLASS_RISK: lower pressure class = higher risk (50–300)
- RELINE_RISK: relined pipe = 0.2, unrelined = 1.0
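The sketch below is a quick arithmetic check on the formula. It confirms that the five weights sum to 1.0 and computes the lowest and highest RISK_IDX the formula can produce. It assumes that MAT_RISK only takes the lookup-table values (0.3 to 0.8) and that RELINE_RISK only takes 0.2 or 1.0. The other three components are assumed to span the full clamped 0–1 range.

```python
# Weight and assumed (min, max) range for each component of RISK_IDX.
# Assumption: MAT_RISK is limited to lookup-table values (0.3-0.8) and
# RELINE_RISK to 0.2 or 1.0; the other components are clamped to 0-1.
COMPONENTS = {
    "AGE_RISK":    (0.30, 0.0, 1.0),
    "MAT_RISK":    (0.25, 0.3, 0.8),
    "DIAM_RISK":   (0.20, 0.0, 1.0),
    "CLASS_RISK":  (0.15, 0.0, 1.0),
    "RELINE_RISK": (0.10, 0.2, 1.0),
}

weight_sum = sum(w for w, _, _ in COMPONENTS.values())
idx_min = sum(w * lo for w, lo, _ in COMPONENTS.values())  # every component at its minimum
idx_max = sum(w * hi for w, _, hi in COMPONENTS.values())  # every component at its maximum

print(f"weights sum to {weight_sum:.2f}")
print(f"theoretical RISK_IDX range: {idx_min:.3f} to {idx_max:.3f}")
```

The observed Seattle range of 0.372 to 0.794 sits well inside these theoretical bounds of 0.095 and 0.950.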
| Component | Source Field(s) | Weight | Logic |
|---|---|---|---|
| AGE_RISK | INSTALL_YR / INSTALL_DT | 30% | Age in years ÷ 100, clamped to 0–1. Falls back from the year field to the datetime field. |
| MAT_RISK | MATERIAL | 25% | Lookup table maps material code to risk value; cast iron is highest. |
| DIAM_RISK | DIAMETER | 20% | Inverted normalization over 4–36 inches: 4" = highest risk, 36" = lowest, clamped. |
| CLASS_RISK | PIPE_CLASS | 15% | Reverse-normalized over the 50–300 psi range; lower pressure class = higher risk. |
| RELINE_RISK | RELINED_YR / RELINED_DT | 10% | Binary: relined pipe = 0.2 (lower risk), unrelined = 1.0 (default). |
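The normalization rules in the table can be written as small pure-Python functions. This is a sketch, not the tool's own code. The function names are hypothetical, and the reference year is passed in explicitly. MAT_RISK is supplied as a number (cast iron = 0.8) rather than looked up.

```python
# Hypothetical scalar versions of the normalization rules in the table above.

def clamp01(x: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))

def age_risk(install_year: int, ref_year: int) -> float:
    # Age in years divided by 100, clamped: pipes 100+ years old score 1.0
    return clamp01((ref_year - install_year) / 100.0)

def diam_risk(diameter_in: float) -> float:
    # Inverted over 4-36 inches: 4" -> 1.0 (highest risk), 36" -> 0.0
    return clamp01((36.0 - diameter_in) / (36.0 - 4.0))

def class_risk(pressure_class: float) -> float:
    # Reversed over 50-300: class 50 -> 1.0, class 300 -> 0.0
    return clamp01((300.0 - pressure_class) / (300.0 - 50.0))

def reline_risk(relined: bool) -> float:
    # Binary: relined pipes score 0.2, unrelined pipes 1.0
    return 0.2 if relined else 1.0

def risk_idx(age: float, mat: float, diam: float, cls: float, reline: float) -> float:
    # Weighted composite; the weights sum to 1.0
    return 0.30 * age + 0.25 * mat + 0.20 * diam + 0.15 * cls + 0.10 * reline

# Hypothetical pipe: installed 1925, cast iron (0.8), 6 inch, class 150, unrelined
score = risk_idx(age_risk(1925, 2025), 0.8, diam_risk(6), class_risk(150),
                 reline_risk(False))
print(round(score, 4))  # 0.8775
```

This hypothetical pipe scores 0.8775, which is above the 0.794 maximum observed in Seattle. It illustrates the arithmetic rather than any real segment.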
Material Risk Lookup
Pipe material is the second most heavily weighted risk factor. The lookup table encodes known corrosion and failure susceptibility across material types commonly found in Seattle's aging network.
| Code | Material | Risk Value | Category |
|---|---|---|---|
| CI / CAST IRON | Cast Iron | 0.80 | High |
| AC | Asbestos Cement | 0.75 | Med-High |
| DI | Ductile Iron | 0.60 | Med-High |
| STEEL | Steel | 0.50 | Moderate |
| PVC | PVC | 0.30 | Low |
| HDPE | HDPE | 0.30 | Low |
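Raw MATERIAL values often vary in case and spacing, so the lookup is safer if each code is normalized first. The sketch below assigns a fallback score to unlisted or blank materials, using a hypothetical DEFAULT_MAT_RISK of 0.5. How the original tool treats such values is not stated here.

```python
# Case- and whitespace-insensitive lookup built from the table above.
# Assumption: DEFAULT_MAT_RISK (hypothetical) is used for unlisted or blank codes.
MATERIAL_RISK_LOOKUP = {
    "CI": 0.80, "CAST IRON": 0.80,
    "AC": 0.75,
    "DI": 0.60,
    "STEEL": 0.50,
    "PVC": 0.30,
    "HDPE": 0.30,
}
DEFAULT_MAT_RISK = 0.5

def mat_risk(code) -> float:
    """Map a raw MATERIAL value to its risk score."""
    # Upper-case and collapse internal whitespace: " cast   iron" -> "CAST IRON"
    key = " ".join(str(code or "").upper().split())
    return MATERIAL_RISK_LOOKUP.get(key, DEFAULT_MAT_RISK)

for raw in ["CI", "cast iron ", "Ductile", None]:
    print(repr(raw), "->", mat_risk(raw))
```

A spelled-out value such as "Ductile" silently receives the default instead of the DI score of 0.6. It is therefore worth listing the distinct MATERIAL values in the source data before relying on the lookup.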
Output Risk Scale — RISK_IDX Results
The final RISK_IDX ranges from 0.372 to 0.794 across Seattle's water main network. Five natural-breaks (Jenks) classes, taken from the output legend, define the risk tiers shown in the map in Figure 3.
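Natural-breaks classification groups values so that variance within each class is minimized. The sketch below shows an exact dynamic-programming version of that idea (Fisher's method) in plain Python. It is for illustration only: ArcGIS Pro's own Natural Breaks implementation may differ in details, so its class breaks are not guaranteed to match. The sample values are made up and do not come from the Seattle output.

```python
def natural_breaks(values, k):
    """Return the upper bound of each of k classes that minimize the total
    within-class sum of squared deviations (Fisher/Jenks style, O(k * n^2))."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any run data[i:j] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total SSD when data[:j] is split into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is data[i:j], never empty
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the stored split points to collect class upper bounds
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = start[c][j]
    return breaks[::-1]

sample = [0.38, 0.40, 0.41, 0.55, 0.56, 0.58, 0.71, 0.74, 0.79]
print(natural_breaks(sample, 3))  # [0.41, 0.58, 0.79]
```

The O(k·n²) loop is fine for a demonstration. For a full city network it would be slow, which is one reason GIS software may sample the data or use optimized routines.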
Methodology — Development Workflow
01. Data preparation in ArcGIS Pro
Water_Mains.shp was loaded into a file geodatabase and its attribute fields were reviewed: INSTALL_YR, INSTALL_DT, DIAMETER (confirmed in inches), PIPE_CLASS, MATERIAL, RELINED_YR, and RELINED_DT. Manual exploratory calculations confirmed that age could be derived consistently from either the year field or the datetime field.
02. Jupyter notebook prototyping
The risk logic was first developed and tested in a Jupyter notebook using pandas DataFrames. This allowed rapid iteration on the normalization functions, material lookup values, and weight combinations before the logic was translated to the ArcGIS Pro environment.
03. Translation to an ArcGIS Pro script tool
The notebook logic was re-implemented with arcpy: arcpy.GetParameterAsText() reads the inputs, arcpy.management.CopyFeatures() creates the output, and arcpy.da.UpdateCursor() writes the fields row by row. A key fix was the helper function add_double_field(). It adds a field only if that field does not already exist, so repeated runs do not try to re-add existing fields.
04. Debugging: parameter indexing and unit errors
Two critical bugs were resolved.
(1) A parameter count mismatch caused the error "Error in getting parameter as text". The fix was to define exactly two tool parameters (input FC and output FC), matching the indices the script reads.
(2) Diameter was being normalized over a millimeter scale, but the DIAMETER field is in inches, so every Seattle pipe was flagged as high risk. The normalization was corrected to clamp over 4–36 inches.
05. Output generation and cartography
The tool produces the Water_MainsRisk feature class with seven new fields: AGE_YRS, AGE_RISK, DIAM_RISK, MAT_RISK, CLASS_RISK, RELINE_RISK, and RISK_IDX. The results were exported and styled in ArcGIS Pro using five natural-breaks classes over the RISK_IDX range.
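The sketch below shows how the prototype logic from steps 01, 02 and 04 might look in pandas. The rows are hypothetical. A zero or missing INSTALL_YR is assumed to mean "unknown". The 100–900 mm bounds that reproduce the unit bug are also an assumption, since the original millimeter range is not given.

```python
import pandas as pd

def age_years(df: pd.DataFrame, ref_year: int) -> pd.Series:
    """Age from INSTALL_YR, falling back to the year of INSTALL_DT.
    Assumption: a zero or missing INSTALL_YR means 'unknown'."""
    year = df["INSTALL_YR"].where(df["INSTALL_YR"] > 0)
    year = year.fillna(pd.to_datetime(df["INSTALL_DT"]).dt.year)
    return ref_year - year

def inverse_minmax(s: pd.Series, lo: float, hi: float) -> pd.Series:
    """Inverted min-max normalization clamped to 0-1: lo -> 1.0, hi -> 0.0."""
    return ((hi - s) / (hi - lo)).clip(0.0, 1.0)

# Hypothetical rows; DIAMETER is in inches, as in the Seattle data.
df = pd.DataFrame({
    "INSTALL_YR": [1925, 0, 1998],
    "INSTALL_DT": [None, "1950-06-01", None],
    "DIAMETER": [6, 12, 30],
})
df["AGE_YRS"] = age_years(df, ref_year=2025)

# Unit bug: bounds in millimeters (assumed 100-900 mm here) put every inch
# value below the lower bound, so every pipe clamps to 1.0 (high risk).
df["DIAM_RISK_MM_BUG"] = inverse_minmax(df["DIAMETER"], 100.0, 900.0)
# Fix: clamp over the 4-36 inch range instead.
df["DIAM_RISK"] = inverse_minmax(df["DIAMETER"], 4.0, 36.0)
print(df[["AGE_YRS", "DIAM_RISK_MM_BUG", "DIAM_RISK"]])
```

With millimeter bounds, every inch value falls below the lower limit, so DIAM_RISK_MM_BUG is 1.0 for all three rows. That matches the "every pipe is high risk" symptom described in step 04.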
Python — Core Script Tool
The complete tool runs as an ArcGIS Pro script tool with exactly two parameters. The core logic uses arcpy.da.UpdateCursor to iterate over every pipe segment and write the age in years, the five component risk values, and the composite index.
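```python
"""
Water main risk index (attribute-only).
Param 0: in_mains (Feature Class / Layer)
Param 1: out_mains (Feature Class)
"""
import datetime
import arcpy

material_risk_lookup = {
    "CI": 0.8, "CAST IRON": 0.8,
    "AC": 0.75, "DI": 0.6,
    "STEEL": 0.5, "PVC": 0.3, "HDPE": 0.3,
}
IN_FIELDS = ["INSTALL_YR", "INSTALL_DT", "DIAMETER", "PIPE_CLASS",
             "MATERIAL", "RELINED_YR", "RELINED_DT"]
OUT_FIELDS = ["AGE_YRS", "AGE_RISK", "DIAM_RISK", "MAT_RISK",
              "CLASS_RISK", "RELINE_RISK", "RISK_IDX"]

# Helper: add DOUBLE field only if not already present (safe on re-runs)
def add_double_field(fc, name):
    if name not in [f.name for f in arcpy.ListFields(fc)]:
        arcpy.management.AddField(fc, name, "DOUBLE")

def clamp01(x):  # limit a score to the 0-1 range
    return max(0.0, min(1.0, x))

in_mains = arcpy.GetParameterAsText(0)
out_mains = arcpy.GetParameterAsText(1)
arcpy.management.CopyFeatures(in_mains, out_mains)
for name in OUT_FIELDS:
    add_double_field(out_mains, name)

this_year = datetime.date.today().year
# Assumption: null/zero inputs score as highest risk (1.0); unknown materials get 0.5
with arcpy.da.UpdateCursor(out_mains, IN_FIELDS + OUT_FIELDS) as cursor:
    for row in cursor:
        inst_yr, inst_dt, diam, pclass, mat, rel_yr, rel_dt = row[:7]
        year = inst_yr or (inst_dt.year if inst_dt else None)  # year field, else datetime
        age_yrs = this_year - year if year else None
        age_risk = clamp01(age_yrs / 100.0) if year else 1.0
        mat_risk = material_risk_lookup.get(str(mat or "").strip().upper(), 0.5)
        diam_risk = clamp01((36.0 - diam) / 32.0) if diam else 1.0  # 4-36 inches
        class_risk = clamp01((300.0 - pclass) / 250.0) if pclass else 1.0  # 50-300
        reline_risk = 0.2 if (rel_yr or rel_dt) else 1.0
        # Composite risk index (weights sum to 1.0)
        risk_idx = (0.30 * age_risk + 0.25 * mat_risk + 0.20 * diam_risk
                    + 0.15 * class_risk + 0.10 * reline_risk)
        row[7:] = [age_yrs, age_risk, diam_risk, mat_risk,
                   class_risk, reline_risk, risk_idx]
        cursor.updateRow(row)
```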
Dataset Inventory
| Dataset | Source | Type | CRS | Status |
|---|---|---|---|---|
| Water_Mains.shp | Seattle Public Utilities (data.seattle.gov) | Shapefile | NAD83 HARN WA North | Active |
| Break / Outage Events | Seattle Public Utilities Outage Viewer | Feature Service | WGS84 Web Mercator | Planned |
| SSURGO Soil Polygons | USDA NRCS (websoilsurvey.sc.egov.usda.gov) | Shapefile | NAD83 | Planned |
| 1m DTM (King County West 2021) | WA LiDAR Portal (lidarportal.dnr.wa.gov) | Raster (GeoTIFF) | NAD83 HARN WA North | Planned |
| WSDOT Traffic Sections (AADT) | WA Spatial Data Hub (geo.wa.gov) | Shapefile | NAD83 HARN WA North | Planned |
| City_Limits.shp | Seattle Open Data (data.seattle.gov) | Shapefile | NAD83 HARN WA North | Active |
Limitations
- Denver and Los Angeles were originally targeted, but neither had a publicly accessible water main dataset with the required attributes (installation year, material, diameter, relining history). Seattle was selected for its unusually complete open infrastructure data.
- Slope and soil environmental predictors were acquired and conceptually integrated but excluded from the operational tool. The 1m DTM raster is too large for in-tool raster sampling without performance instability, and SSURGO requires multi-table relational joins before a corrosivity index can be derived.
- The composite index uses linearly normalized, human-defined weights rather than statistically calibrated failure probabilities. It reflects the priorities embedded in the model but is not yet a validated predictive model.
Future Work
- Preprocess the DTM using Add Surface Information to create a SLOPE_MEAN field per pipe, then normalize it to represent topographic pressure stress as an additional risk component.
- Derive a corrosivity index from the SSURGO component tables and spatially join it to the water mains. This would account for environmental degradation processes affecting buried iron pipe.
- Combine historical break records with attribute and environmental variables to train a gradient boosting or random forest model using scikit-learn and ArcGIS Spatially Enabled DataFrames.
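Adding slope or corrosivity as a sixth component means the existing weights must shrink so that the total stays at 1.0. One simple option, assumed here rather than taken from the project, is to rescale the current weights proportionally, which preserves their relative ratios. The SLOPE_RISK name and its 10% weight are hypothetical.

```python
def add_component(weights: dict, name: str, new_weight: float) -> dict:
    """Return a new weight dict with `name` added at `new_weight`, scaling the
    existing weights proportionally so the total still sums to 1.0."""
    if not 0.0 < new_weight < 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    scale = 1.0 - new_weight
    updated = {k: w * scale for k, w in weights.items()}
    updated[name] = new_weight
    return updated

current = {"AGE_RISK": 0.30, "MAT_RISK": 0.25, "DIAM_RISK": 0.20,
           "CLASS_RISK": 0.15, "RELINE_RISK": 0.10}
# Hypothetical: give a slope component 10% of the total weight
proposed = add_component(current, "SLOPE_RISK", 0.10)
print({k: round(w, 3) for k, w in proposed.items()})
```

Proportional rescaling is only a starting point. Once historical break records are available, the weights could instead be estimated from data, as the modeling item above proposes.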
The tool's modular structure and its reliance on widely available pipe attributes (age, material, diameter, relining, pressure class) make it directly replicable in any municipality that maintains comparable infrastructure records. This positions it for broader use in both academic and professional utility contexts.