Usability of Municipal AI Policy Documents: A Heuristic Evaluation and NLP Analysis Across 20 U.S. Cities
Nikhil Mehra
Affiliation: Ethical Culture Fieldston School
IJSCAR Vol. 3, Issue 2 (2026) · pp. 23–39
DOI: 10.67149/yhjs2024.5/r4d2w9ky
Abstract
As artificial intelligence tools become embedded in government operations, U.S. municipalities are rapidly publishing policies to govern their use, with more than twenty cities and counties now maintaining public-facing AI governance documents. However, no systematic evaluation has examined whether these documents are actually usable by the employees, contractors, and members of the public they are meant to guide. This gap is consequential: even a substantively strong policy fails if its intended audience cannot find what they need, understand what they find, or act on what they understand. We evaluate 20 municipal AI policy documents spanning large, medium, and small cities across five geographic regions, using a 30-heuristic framework grounded in HCI usability principles and government plain-language standards, complemented by an NLP-based complexity analysis we call the Composite Legal Readability Score (CLRS). Our central finding is a systematic infrastructure-interface gap: cities build stronger governance scaffolding (organizational structure and visual design) than user-facing communication (plain language, findability, audience awareness, actionability). The gap is statistically significant (Delta=0.63, p<.001, Cohen’s d=2.53), appears in all 20 cities, and survives sensitivity analysis under heuristic reweighting, category reassignment, and leave-one-out perturbation. Actionability is the worst-performing category (M=2.28, SD=0.30), more than a full severity point above the next-worst. Decomposed into five sub-dimensions, the actionability deficit is uniform: every document in our corpus has a severity of at least 2 on procedural, temporal, implementation, and enforcement clarity; only norm clarity (must/should/can language) is largely solved. Readability and actionability correlate strongly (r=0.87), indicating that complex language and missing compliance guidance co-occur as a compounding problem rather than a tradeoff.
We complement these findings with a worked before/after redesign of a Baltimore-style passage and a task-based walkthrough connecting heuristic scores to predicted user friction; both are illustrations of the rubric’s internal logic, not validations. All quantitative claims rest on single-evaluator scoring of 30 heuristics across 20 documents; the limits this places on every claim in the paper are addressed in Section 6.
Keywords: HCI, heuristic evaluation, AI governance, document usability, plain language, municipal policy, NLP, readability, legal text complexity