Hi, just want to flag an issue noticed in the nhs-number Python package in case fixing it will create any issues for people.
opened 12:01PM - 27 Apr 26 UTC
## Summary
The boundary between the first `UNALLOCATED` band and the Scotland C… HI number range is **off by an order of magnitude** between `nhs_number/constants.py`'s module docstring and the actual `Range` definitions.
External references on the CHI number format (`DDMMYYxxxc`, where the first 6 digits are the date of birth) confirm the **docstring** is correct: the smallest plausible CHI is `0101000000` (1 Jan 1900), so Scotland CHI numbers begin around `100_000_000`, not `10_000_000`. The code value puts the start of Scotland an order of magnitude too low and silently classifies a band of unallocated numbers (`9_999_999 < n < 100_000_000`) as Scotland CHI numbers.
## Discrepancy
**Module docstring** ([`nhs_number/constants.py:15-16`](https://github.com/uk-fci/nhs-number/blob/a6d0aa7cc7ab506ad9eb5ce97fee921706199e4f/nhs_number/constants.py#L15-L16)):
```
0000000010 - 0099999999 Unreserved (not in use)
0100000000 - 3112999999 Scotland CHI numbers (DDMMYYxxxn)
```
→ boundary at **99,999,999 / 100,000,000** (~100M) ✅ matches CHI spec.
**Range definitions** ([`nhs_number/constants.py:72-82`](https://github.com/uk-fci/nhs-number/blob/a6d0aa7cc7ab506ad9eb5ce97fee921706199e4f/nhs_number/constants.py#L72-L82)):
```python
RANGE_UNALLOCATED_1 = Range(
start=10,
end=9_999_999,
label="Unallocated 1 (not in use)",
)
RANGE_SCOTLAND = Range(
start=10_000_000,
end=3_112_999_999,
label="Scotland CHI numbers",
)
```
→ boundary at **9,999,999 / 10,000,000** (~10M) ❌ ten times too low.
## Why it matters
`Range.contains_number` (used by [`Region.contains_number`](https://github.com/uk-fci/nhs-number/blob/a6d0aa7cc7ab506ad9eb5ce97fee921706199e4f/nhs_number/constants.py#L65-L69) and by both `is_valid(..., for_region=...)` and `NhsNumber`) will:
- Return `True` for Scotland CHI on numbers in `[10_000_000, 99_999_999]` that are actually unallocated.
- Return `False` for Scotland CHI on numbers in `[100_000_000, ?` that... actually still match because the code's range extends all the way down. So the *real* effect is: the code's Scotland range is **a strict superset** of the spec's Scotland range — it falsely includes the band `[10_000_000, 99_999_999]`.
Concretely:
```python
>>> from nhs_number import is_valid
>>> from nhs_number.constants import REGION_SCOTLAND
>>> # 0050000000 — should be in the unreserved band per CHI spec
>>> is_valid("0050000003", for_region=REGION_SCOTLAND) # checksum-valid example
True # bug: should be False (this number is in the unreserved band)
```
## Suspected cause
Likely a typo when transcribing the documented `0099999999` / `0100000000` 10-digit boundary into Python ints — one zero was dropped from each. Other boundaries in the file (3112999999, 3199999999, 3200000000, etc.) all match the docstring exactly, so this is a localised error.
## Proposed fix
Update the two values in code to match the docstring (and the CHI spec):
```python
RANGE_UNALLOCATED_1 = Range(
start=10,
end=99_999_999, # was 9_999_999
label="Unallocated 1 (not in use)",
)
RANGE_SCOTLAND = Range(
start=100_000_000, # was 10_000_000
end=3_112_999_999,
label="Scotland CHI numbers",
)
```
## Risk / behaviour change
This is a **behaviour change**. Any caller currently relying on the buggy behaviour (treating numbers in `[10_000_000, 99_999_999]` as Scotland CHI) will see a result flip from `True` to `False` for `is_valid(..., for_region=REGION_SCOTLAND)` and `NhsNumber.region` will become `REGION_UNALLOCATED` instead of `REGION_SCOTLAND` for those numbers.
In practice this band is unlikely to contain any real-world CHIs (CHIs derived from real dates of birth start with at least `0101…` = ~100M+), so the blast radius is probably small — but flagging here for visibility before changing it.
## Suggested actions
1. Confirm the docstring/spec value is correct (cross-check with another CHI source).
2. Apply the two-line fix above.
3. Add boundary tests for every Range edge (already planned as item #2 of `test-improvements.md` — those tests should pin the **corrected** values once this issue is resolved).
4. Mention the behaviour change in the release notes / next version bump.
## Notes
- No external bug report exists for this — discovered while planning boundary tests for the test-coverage improvements (cf. `test-improvements.md`).
- All other boundaries in the file appear correct against the docstring.
It’s my belief that since no valid CHI numbers are in the range affected, the fix will not cause any behaviour change in downstream programs. But I am sense-checking that with the community first.