SDWA MCL Reference Mapping for Automated Compliance Pipelines

SDWA Maximum Contaminant Level (MCL) reference mapping is the governed translation layer that converts heterogeneous utility telemetry into deterministic federal compliance decisions. This set of pages covers how to model, version, and evaluate MCL thresholds so that every reading from a SCADA historian, laboratory information management system (LIMS), or legacy compliance archive is judged against the exact rule that was in force at the moment it was sampled. It is written for water utility operations staff, environmental compliance teams, and the municipal Python developers who build the automation between them, and it sits inside the broader Core Architecture & SDWA Compliance Taxonomy, which defines the shared vocabulary and lineage rules the rest of this reference consumes. Regulatory baselines are sourced from the EPA National Primary Drinking Water Regulations and mirrored into local reference tables without manual transcription.

Telemetry sources feed a governed MCL reference-mapping layer that emits auditable compliance decisions.

Regulatory & Standards Foundation

An MCL is not a single number; it is a number plus an averaging rule, a unit of measure, a monitoring frequency, and an effective date, each of which can change under a rulemaking amendment. A reference-mapping layer that stores only the numeric limit will silently mis-evaluate the moment the EPA revises a rule or a contaminant moves from an interim to a final standard. The mapping must therefore capture the full regulatory tuple for every contaminant and preserve superseded values as historical rows rather than overwriting them.

The averaging period is the single most consequential attribute because it dictates what is compared to the limit. A single-sample MCL (for example, acute nitrate) compares an individual result directly. A running annual average (RAA) — used for total trihalomethanes and haloacetic acids — compares the mean of four consecutive quarterly averages. For a locational running annual average (LRAA), the same calculation is applied per monitoring location. The RAA that the evaluation engine must reproduce exactly is:

$$\text{RAA}_{q} = \frac{1}{4}\sum_{i=q-3}^{q} \bar{C}_{i}$$

where $\bar{C}_{i}$ is the arithmetic mean of the valid samples in quarter $i$ and $q$ is the current quarter. Binding this formula to the correct contaminant is a mapping concern, not an application concern: the reference table declares which averaging period applies, and the evaluator dispatches on that declaration. The statutory monitoring cadence that determines when each $\bar{C}_{i}$ is even complete is owned by the Monitoring Frequency Scheduling module, so the reference table and the scheduler share the same contaminant keys.
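As a rough illustration of the formula, the sketch below computes each quarterly mean and then the RAA with Decimal arithmetic. The sample results and the (year, quarter) keys are invented for the example; the 0.080 mg/L figure is the federal limit for total trihalomethanes.

```python
from decimal import Decimal

# Hypothetical TTHM results in mg/L, keyed by (year, quarter). Illustrative only.
samples_by_quarter = {
    (2023, 2): [Decimal("0.070"), Decimal("0.074")],
    (2023, 3): [Decimal("0.085")],
    (2023, 4): [Decimal("0.090"), Decimal("0.092")],
    (2024, 1): [Decimal("0.078")],
}

TTHM_MCL = Decimal("0.080")  # mg/L


def quarterly_mean(values: list[Decimal]) -> Decimal:
    """Arithmetic mean of the valid samples in one quarter."""
    return sum(values, Decimal(0)) / Decimal(len(values))


# Sorting the labels puts the four quarters in chronological order.
means = [quarterly_mean(samples_by_quarter[q]) for q in sorted(samples_by_quarter)]
raa = sum(means, Decimal(0)) / Decimal(4)
exceeds = raa > TTHM_MCL

print(means, raa, exceeds)
```

Two of the quarterly means are individually above 0.080 mg/L and two are below; the compliance comparison uses only the 0.0815 mg/L average.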

Because primacy agencies may adopt limits stricter than the federal floor, the reference table must support jurisdictional overrides keyed to the reporting zone. The mapping layer resolves the more stringent of the federal and state value at evaluation time rather than baking a single number into code.

Architecture & Design Decisions

The governing design decision is a strict boundary between measurement ingestion and regulatory evaluation. Raw sensor payloads carry no regulatory context; the reference-mapping layer is where context is attached. Conflating the two — for example, hardcoding 0.010 as the arsenic limit inside a telemetry parser — is the anti-pattern that produces schema drift and un-auditable history. The relational modeling that enforces this separation is detailed in How to Map EPA MCLs to Relational Database Schemas, which specifies the slowly-changing-dimension tables the evaluation logic described here reads from.

Three data contracts cross this boundary. Entering the layer is a normalized reading: a contaminant key, a value, a unit, a monitoring-point identifier, a UTC sample timestamp, and a quality flag. Leaving the layer is a compliance decision: the resolved MCL version, the computed statistic, the pass/fail status, and a hash of the inputs. The layer itself owns the third contract — the versioned reference table — which it treats as a released artifact, not mutable configuration.
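One way to pin down the inbound and outbound contracts is a pair of frozen dataclasses. This is only a sketch: the class names and field names below are hypothetical, chosen to mirror the fields listed above, and are not taken from the reference implementation.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class NormalizedReading:
    """Inbound contract (hypothetical field names)."""
    contaminant_id: str
    value: Decimal
    unit: str
    monitoring_point_id: str
    sample_ts: datetime      # expected to be tz-aware UTC
    quality_flag: str


@dataclass(frozen=True)
class ComplianceDecision:
    """Outbound contract (hypothetical field names)."""
    contaminant_id: str
    mcl_value: Decimal            # resolved MCL version: the limit...
    mcl_effective_date: date      # ...and the date it took effect
    statistic: Decimal            # the computed value that was compared
    status: str                   # e.g. COMPLIANT, VIOLATION, INSUFFICIENT_DATA
    input_hash: str


reading = NormalizedReading(
    contaminant_id="arsenic",
    value=Decimal("0.004"),
    unit="mg/L",
    monitoring_point_id="EP-01",
    sample_ts=datetime(2024, 5, 14, 13, 30, tzinfo=timezone.utc),
    quality_flag="VALID",
)
```

Freezing both types means a reading or decision cannot be altered after it crosses the boundary, which fits the audit requirement.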

Timestamps arriving from field instrumentation are frequently irregular and locally-zoned; they must be normalized before any temporal comparison. That normalization is performed upstream by the Time-Series Alignment Strategies module, so the reference-mapping layer can assume every incoming timestamp is already tz-aware UTC. Downstream, a pass/fail status that resolves to a violation is handed to Violation Code Classification for statutory coding and, ultimately, to the MCL Exceedance Logic Implementation engine that drives real-time alerting. Keeping these responsibilities in separate modules means a rule amendment touches only the reference table, never the surrounding code.
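Because the layer assumes rather than performs UTC normalization, a cheap guard at the entry point can make a skipped upstream step fail loudly. The helper below is a hypothetical sketch, not part of the modules named above.

```python
from datetime import datetime, timedelta, timezone


def require_utc(ts: datetime) -> datetime:
    """Return ts unchanged, or raise if it is naive or not at UTC offset zero."""
    offset = ts.utcoffset()  # None for naive datetimes
    if offset is None:
        raise ValueError("naive timestamp: upstream alignment appears to have been skipped")
    if offset != timedelta(0):
        raise ValueError(f"timestamp is not UTC (offset {offset})")
    return ts


require_utc(datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc))  # passes
```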

Phase-by-Phase Implementation

The pipeline is built in five phases: model the reference records, resolve the record in force for a given sample, normalize units, evaluate the averaging rule, and emit an audit record. The stage sequence below maps directly onto those phases.

Five-stage MCL reference-mapping pipeline from ingestion to continuous verification.

Phase 1 — Model the reference record

Every regulated contaminant is represented by a validated record that carries the full regulatory tuple. Modeling this with Pydantic gives type coercion, range validation, and a serializable schema for free, and rejects malformed rows at load time rather than at evaluation time.

Define an enum of averaging periods so the evaluator can dispatch on a closed set.
Model the record with Decimal limits (never float, to avoid rounding drift on regulatory numbers).
Validate that every limit is positive and that expiration follows effective date.

from datetime import date
from decimal import Decimal
from enum import Enum
from typing import Optional

from pydantic import BaseModel, model_validator, field_validator


class AveragingPeriod(str, Enum):
    SINGLE = "single_sample"
    RAA = "running_annual_average"
    LRAA = "locational_running_annual_average"
    P90 = "ninetieth_percentile"


class MCLRecord(BaseModel):
    contaminant_id: str            # canonical internal key
    epa_regulatory_id: str         # EPA contaminant code, e.g. "1005"
    name: str
    unit: str                      # canonical unit: "mg/L", "NTU", "pCi/L"
    mcl_value: Decimal
    averaging_period: AveragingPeriod
    effective_date: date
    expiration_date: Optional[date] = None
    reporting_zone: Optional[str] = None   # set for jurisdictional overrides

    @field_validator("mcl_value")
    @classmethod
    def _positive(cls, v: Decimal) -> Decimal:
        if v <= 0:
            raise ValueError("MCL value must be positive")
        return v

    @model_validator(mode="after")
    def _ordered_dates(self) -> "MCLRecord":
        if self.expiration_date is not None and self.expiration_date <= self.effective_date:
            raise ValueError("expiration_date must be after effective_date")
        return self

Phase 2 — Resolve the record in force

For any sample, exactly one MCL should govern the decision: the record whose validity window contains the sample date. When windows overlap because of a late-loaded amendment, the most recently effective record wins.

from datetime import date
from typing import Iterable, Optional


def resolve_active_mcl(
    records: Iterable[MCLRecord],
    contaminant_id: str,
    as_of: date,
    reporting_zone: Optional[str] = None,
) -> Optional[MCLRecord]:
    """Return the single MCL in force for a contaminant on a given date.

    The latest federal record and the latest record for the requested zone are
    resolved separately, and the more stringent of the two is returned. Limits
    are assumed to be stored in the same canonical unit.
    """
    in_force = [
        r for r in records
        if r.contaminant_id == contaminant_id
        and r.effective_date <= as_of
        # Half-open window: a record stops applying on its expiration date.
        and (r.expiration_date is None or as_of < r.expiration_date)
    ]

    def latest(zone: Optional[str]) -> Optional[MCLRecord]:
        # Most recently effective record wins on overlap; then the stricter limit.
        pool = [r for r in in_force if r.reporting_zone == zone]
        return min(
            pool,
            key=lambda r: (-r.effective_date.toordinal(), r.mcl_value),
            default=None,
        )

    candidates = [latest(None)]  # federal record
    if reporting_zone is not None:
        candidates.append(latest(reporting_zone))
    resolved = [r for r in candidates if r is not None]
    # A zone override displaces the federal record only when it is stricter.
    return min(resolved, key=lambda r: r.mcl_value, default=None)
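The validity-window test used by the resolver is half-open: a record applies from its effective date up to, but not including, its expiration date. The sketch below isolates that check using a small stand-in dataclass instead of MCLRecord so it runs without Pydantic; the `Row` type, the limits, and the dates are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Row:
    """Stand-in for MCLRecord with only the fields the window check needs."""
    mcl_value: Decimal
    effective_date: date
    expiration_date: Optional[date] = None


def in_force(row: Row, as_of: date) -> bool:
    """Half-open window: effective_date <= as_of < expiration_date."""
    return row.effective_date <= as_of and (
        row.expiration_date is None or as_of < row.expiration_date
    )


# Illustrative history: a limit is tightened on 2006-01-23 and the old row is kept.
old = Row(Decimal("0.050"), date(1976, 1, 1), date(2006, 1, 23))
new = Row(Decimal("0.010"), date(2006, 1, 23))


def governing(as_of: date) -> list[Row]:
    return [r for r in (old, new) if in_force(r, as_of)]


print(governing(date(2006, 1, 22)))  # only the superseded row
print(governing(date(2006, 1, 23)))  # only the current row
```

With half-open windows, the old record's expiration date can equal the new record's effective date without the two ever overlapping.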

Phase 3 — Normalize units against the canonical unit

The reference record declares the canonical unit; incoming readings must be converted into it before comparison. Conversions are explicit and allow-listed: an unknown unit pair raises rather than guessing, which is what stops a raw µg/L value from being compared against an mg/L limit as though it were 1000× larger than it is.

from decimal import Decimal

# Only dimensionally valid, pre-approved conversions are permitted.
UNIT_CONVERSIONS: dict[tuple[str, str], Decimal] = {
    ("ug/L", "mg/L"): Decimal("0.001"),
    ("ppb", "mg/L"): Decimal("0.001"),
    ("mg/L", "ug/L"): Decimal("1000"),
}


def to_canonical(value: Decimal, from_unit: str, to_unit: str) -> Decimal:
    if from_unit == to_unit:
        return value
    try:
        factor = UNIT_CONVERSIONS[(from_unit, to_unit)]
    except KeyError as exc:
        raise ValueError(
            f"No approved conversion from {from_unit} to {to_unit}"
        ) from exc
    return value * factor
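The conversion table above is keyed on ASCII labels such as "ug/L", while laboratory exports often spell the same unit with a micro sign or a lowercase litre. A possible pre-step, sketched here with a hypothetical alias table and helper name, maps reviewed spellings onto the allow-listed labels and rejects everything else.

```python
import unicodedata

# Hypothetical alias table: reviewed spellings only, no case folding,
# so that "Mg/L" (megagrams) is never mistaken for "mg/L".
UNIT_ALIASES: dict[str, str] = {
    "ug/L": "ug/L",
    "ug/l": "ug/L",
    "mg/L": "mg/L",
    "mg/l": "mg/L",
    "ppb": "ppb",
}


def normalize_unit_label(raw: str) -> str:
    """Map a raw unit spelling onto an allow-listed label, or raise."""
    # NFKC folds the micro sign (U+00B5) into Greek small mu (U+03BC).
    text = unicodedata.normalize("NFKC", raw).strip()
    text = text.replace("\u03bc", "u")
    try:
        return UNIT_ALIASES[text]
    except KeyError as exc:
        raise ValueError(f"Unrecognized unit label: {raw!r}") from exc


print(normalize_unit_label("\u00b5g/L"))  # ug/L
```

Like the conversion table, the alias table fails closed: a spelling that has not been reviewed raises instead of being guessed at.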

Phase 4 — Evaluate the averaging rule

The evaluator dispatches on the record’s declared averaging period. Single-sample limits compare directly; RAA and LRAA reproduce the four-quarter mean shown above. The evaluation of a completed statistic against the limit is deliberately trivial — the difficulty lives in assembling a valid, complete window, which the next section covers. Precise quarter boundaries are computed with Python’s datetime module so rolling windows align with regulatory periods rather than calendar approximations.

from decimal import Decimal


def running_annual_average(quarterly_means: list[Decimal]) -> Decimal:
    if len(quarterly_means) != 4:
        raise ValueError("RAA requires exactly four consecutive quarters")
    return sum(quarterly_means, Decimal(0)) / Decimal(4)


def evaluate(
    value: Decimal,
    mcl: MCLRecord,
    quarterly_history: list[Decimal],
) -> str:
    """Return COMPLIANT, VIOLATION, or INSUFFICIENT_DATA for one decision."""
    if mcl.averaging_period is AveragingPeriod.SINGLE:
        return "VIOLATION" if value > mcl.mcl_value else "COMPLIANT"

    if mcl.averaging_period in (AveragingPeriod.RAA, AveragingPeriod.LRAA):
        if len(quarterly_history) < 4:
            return "INSUFFICIENT_DATA"
        statistic = running_annual_average(quarterly_history[-4:])
        return "VIOLATION" if statistic > mcl.mcl_value else "COMPLIANT"

    raise NotImplementedError(f"Unhandled averaging period: {mcl.averaging_period}")
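The quarter boundaries mentioned above can be derived from the standard library alone. The following is a minimal sketch that assumes compliance quarters are calendar quarters evaluated in UTC; the function names are hypothetical.

```python
from datetime import datetime, timezone


def quarter_label(ts: datetime) -> tuple[int, int]:
    """Return (year, quarter) for a tz-aware timestamp, evaluated in UTC."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp cannot be assigned to a quarter")
    utc = ts.astimezone(timezone.utc)
    return utc.year, (utc.month - 1) // 3 + 1


def quarter_bounds(year: int, quarter: int) -> tuple[datetime, datetime]:
    """Return the half-open [start, end) UTC bounds of a calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    start_month = 3 * (quarter - 1) + 1
    start = datetime(year, start_month, 1, tzinfo=timezone.utc)
    if quarter == 4:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, start_month + 3, 1, tzinfo=timezone.utc)
    return start, end


start, end = quarter_bounds(2024, 1)
print((end - start).days)  # 91 in a leap year
```

Because the bounds are built from month arithmetic rather than a fixed day count, the first quarter is 91 days long in 2024 and 90 days long in 2023 without any special casing.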

Phase 5 — Emit an immutable audit record

Every decision produces a record that ties the raw inputs to the applied rule version, so a primacy reviewer can reconstruct the calculation. Hashing the canonicalized inputs gives a tamper-evident anchor.

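import hashlib
import json
from datetime import datetime, timezone
from decimal import Decimal


def audit_record(value: Decimal, mcl: MCLRecord, status: str, sample_ts: datetime) -> dict:
    # astimezone() would silently treat a naive timestamp as local time.
    if sample_ts.tzinfo is None:
        raise ValueError("sample_ts must be timezone-aware")
    # Every field is serialized to a string so the JSON form is stable.
    payload = {
        "contaminant_id": mcl.contaminant_id,
        "mcl_value": str(mcl.mcl_value),
        "mcl_effective_date": mcl.effective_date.isoformat(),
        "value": str(value),
        "status": status,
        "sample_timestamp": sample_ts.astimezone(timezone.utc).isoformat(),
    }
    # sort_keys gives a canonical key order, so equal payloads hash identically.
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    # evaluated_at is wall-clock metadata and is deliberately left out of the hash.
    return {
        **payload,
        "input_hash": digest,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }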
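A reviewer-side check can recompute the digest from a stored record to confirm nothing was altered. The sketch below assumes the record layout produced by audit_record above, where input_hash and evaluated_at are the only fields outside the hashed payload; the helper name is hypothetical.

```python
import hashlib
import json

# Fields added after hashing; everything else is part of the hashed payload.
NON_HASHED_FIELDS = {"input_hash", "evaluated_at"}


def verify_audit_record(record: dict) -> bool:
    """Recompute the SHA-256 digest and compare it with the stored input_hash."""
    payload = {k: v for k, v in record.items() if k not in NON_HASHED_FIELDS}
    expected = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return expected == record.get("input_hash")
```

A bare hash detects accidental or unsophisticated edits only; anyone who can rewrite the record can also recompute the digest, so the hashes would need to be stored or signed somewhere the writer cannot reach for stronger guarantees.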

Validation, Quality Flags & Edge Cases

A reading is only eligible for compliance evaluation after it clears a quality gate. The mapping layer honours the quality flag set by the upstream parser and refuses to average SUSPECT or OFFLINE values into a quarterly mean. The state machine below shows how a decision transitions from a raw reading through evaluation to a final, auditable status.

Lifecycle of a single MCL decision, from a received reading to a routed violation.
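The quality gate itself can be as small as a filter applied before any mean is taken. In this sketch the SUSPECT and OFFLINE flags come from the text above, while the "VALID" flag, the tuple layout, and the function names are assumptions made for illustration.

```python
from decimal import Decimal
from typing import Iterable, Optional

# Flags that must never contribute to a quarterly mean.
EXCLUDED_FLAGS = {"SUSPECT", "OFFLINE"}


def eligible_values(readings: Iterable[tuple[Decimal, str]]) -> list[Decimal]:
    """Keep the values of (value, quality_flag) pairs that pass the gate."""
    return [value for value, flag in readings if flag not in EXCLUDED_FLAGS]


def gated_quarterly_mean(readings: Iterable[tuple[Decimal, str]]) -> Optional[Decimal]:
    """Mean of the eligible values, or None when the gate leaves nothing to average."""
    values = eligible_values(readings)
    if not values:
        return None
    return sum(values, Decimal(0)) / Decimal(len(values))


quarter = [
    (Decimal("0.070"), "VALID"),
    (Decimal("0.500"), "SUSPECT"),  # excluded from the mean
    (Decimal("0.074"), "VALID"),
]
print(gated_quarterly_mean(quarter))  # 0.072
```

A block list lets an unrecognized flag through; a stricter variant would accept only an explicit set of known-good flags.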

The edge cases that break naive implementations are almost all temporal:

Partial windows. An RAA computed from fewer than four completed quarters is not “compliant by default” — it is INSUFFICIENT_DATA. Treating an incomplete window as passing hides genuine exceedances until enough data accrues.
Quarter boundaries across DST. Quarterly means are bucketed on UTC sample timestamps. Local wall-clock time is ambiguous during the autumn DST transition and its UTC offset changes twice a year, so bucketing on it can assign samples taken near a quarter boundary to a different quarter than the UTC rule would; always normalize to UTC before assigning a sample to a compliance period.
Leap years. Rolling twelve-month windows defined by day arithmetic drift on leap years. Anchor windows to quarter labels (year, quarter) rather than to a fixed number of days.
Rule changes mid-window. When an MCL amendment takes effect partway through an averaging period, each sample is evaluated against the record in force on its own sample date, and the window statistic is compared against the record in force at window close. resolve_active_mcl is therefore called per sample, not once per report.
Detection-limit substitution. A non-detect must be substituted per the analytical method’s rule (typically zero or one-half the reporting limit) before it enters a mean; substituting the reporting limit itself inflates the average and manufactures phantom violations.
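The partial-window and leap-year points above both come down to assembling the window from quarter labels and refusing to proceed when one is missing. A minimal sketch, with hypothetical function names and invented values:

```python
from decimal import Decimal
from typing import Optional

QuarterLabel = tuple[int, int]  # (year, quarter)


def previous_quarter(label: QuarterLabel) -> QuarterLabel:
    year, quarter = label
    return (year - 1, 4) if quarter == 1 else (year, quarter - 1)


def trailing_window(
    means: dict[QuarterLabel, Decimal], closing: QuarterLabel
) -> Optional[list[Decimal]]:
    """Return the four quarterly means ending at `closing`, oldest first.

    Returns None when any of the four consecutive quarters has no mean,
    which the caller should report as INSUFFICIENT_DATA.
    """
    labels = [closing]
    for _ in range(3):
        labels.append(previous_quarter(labels[-1]))
    labels.reverse()
    if any(label not in means for label in labels):
        return None
    return [means[label] for label in labels]


# Four means are present, but 2023 Q3 is missing, so the window is incomplete.
gappy = {
    (2023, 1): Decimal("0.060"),
    (2023, 2): Decimal("0.072"),
    (2023, 4): Decimal("0.091"),
    (2024, 1): Decimal("0.078"),
}
print(trailing_window(gappy, (2024, 1)))  # None
```

Counting entries alone would accept the gappy history above, since it holds four values; checking the labels is what distinguishes four consecutive quarters from any four quarters.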

Handling of missing readings so that data gaps do not themselves trigger false violations is developed further in Monitoring Gap Detection Algorithms.

Deployment & Integration Patterns

The reference table is deployed as a governed artifact, not as editable runtime config. It lives in version control, is diffed against the EPA’s published tables on a fixed schedule, and is promoted through the same validation pipeline as application code. A typical deployment ships the table as an immutable, checksummed data file baked into the evaluator’s container image, with the container filesystem mounted read-only so no process can mutate the loaded rules in place.
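The checksum on the shipped data file is only useful if the evaluator verifies it before loading. One way to do that at startup is sketched below; the function name is hypothetical, and how the expected digest reaches the service (build argument, manifest, signed release note) is left open.

```python
import hashlib
import hmac


def verify_reference_table(data: bytes, expected_sha256: str) -> bytes:
    """Return the table bytes unchanged, or raise if the digest does not match."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise RuntimeError(
            f"reference table checksum mismatch: expected {expected_sha256}, got {actual}"
        )
    return data
```

Refusing to start on a mismatch keeps a partially copied or hand-edited table from ever producing a compliance decision.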

The evaluator itself is best run as a stateless service behind a message broker: normalized readings arrive on an ingestion topic, decisions and audit records are published to a compliance topic, and quarterly aggregates are materialized separately. Statelessness lets the service scale horizontally and lets a reference-table upgrade roll out as a blue/green image swap rather than a live mutation. Reference-table versions are pinned per deployment so that a replay of historical data reproduces the exact decisions originally made — a hard requirement for defensible recordkeeping under the Core Architecture & SDWA Compliance Taxonomy lineage rules. Because the evaluator sits on the compliance side of the network boundary described in Security Boundary Design, it consumes already-sanitized, read-only measurement data and never reaches back into the OT control network.

Production Validation Checklist

Failure Modes & Gotchas

The single most consequential failure is a reference table that drifts out of sync with the current Code of Federal Regulations. When an EPA amendment lowers a limit or changes an averaging rule and the local table is not updated, the pipeline keeps returning COMPLIANT against a stale threshold — a missed violation that no exception is raised for and no alert fires on, because from the code’s perspective everything is working. It surfaces only at a primacy review, as a finding. The mirror-image failure is inconsistent unit handling, which produces phantom violations that erode operator trust in the automation.

Both are caught the same way: treat the reference table as governed data. Schedule an automated diff of the local table against the EPA’s published values, fail the build when they diverge, and require an explicit, reviewed promotion to change a limit. The release process that governs application code must also govern regulatory data — the two are equally load-bearing, and only one of them is enforced by a compiler.
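The scheduled diff can start as a plain comparison of two keyed tables that returns a list of discrepancies for the build to fail on. In the sketch below the contaminant keys and the function name are hypothetical, only the numeric limit is compared, and obtaining the published values is out of scope; a fuller check would compare the whole regulatory tuple.

```python
from decimal import Decimal


def diff_reference(
    local: dict[str, Decimal], published: dict[str, Decimal]
) -> list[str]:
    """List every contaminant whose local limit is missing or differs."""
    problems = []
    for key in sorted(set(local) | set(published)):
        if key not in local:
            problems.append(f"{key}: missing from local table")
        elif key not in published:
            problems.append(f"{key}: not in published table")
        elif local[key] != published[key]:
            problems.append(f"{key}: local {local[key]} != published {published[key]}")
    return problems


# Limits in mg/L; the local TTHM value is deliberately stale for the example.
published = {"arsenic": Decimal("0.010"), "nitrate": Decimal("10"), "tthm": Decimal("0.080")}
local = {"arsenic": Decimal("0.010"), "nitrate": Decimal("10"), "tthm": Decimal("0.100")}

problems = diff_reference(local, published)
print(problems)
```

In a CI job, a non-empty list would translate into a non-zero exit code so the divergence blocks promotion until someone reviews it.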
