Data Quality Metrics

Transparent metrics on scraping accuracy, data freshness, validation processes, and quality control procedures.

Last updated: June 1, 2026

Scrape Success Rate: 97.2%
Percentage of scheduled scrapes that complete without errors

Verified (7 Days): 94%
Rates verified within the last 7 days

Fresh (<24h): 95%
Rates observed within the last 24 hours

Open Anomalies: <3
Flagged records pending review

Scraping Frequency

RateAPI scrapes each source on a recurring schedule that keeps data fresh while limiting the load placed on source websites.

Source Type               | Frequency   | Coverage
Credit Union Rate Pages   | Daily       | 4,300+ institutions
PDF Rate Sheets           | Daily       | ~800 institutions
Embedded Widgets          | Daily       | ~200 institutions
High-Traffic Institutions | Twice Daily | Top 100 by volume

Validation Process

Every scraped rate passes through multiple validation layers before being served via the API.

1. Schema Validation

Every extracted record must conform to our schema: rate (number), apr (number or null), points (number), term (number, in months), productType (canonical category).

2. Range Checks

Rates must fall within expected ranges: mortgages 3-12%, auto loans 2-25%, HELOCs 5-15%, personal loans 5-36%. Out-of-range values are flagged for review.

3. Consistency Checks

APR must be greater than or equal to the rate. 15-year rates should be lower than 30-year rates, and jumbo rates should be close to conforming rates.

4. Historical Comparison

Changes exceeding 50 basis points (0.50 percentage points) in 24 hours trigger a review. A rate that disappears entirely triggers an investigation.

5. Cross-Source Verification

When possible, rates are verified against multiple pages within the same institution's website.
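
The first four layers can be sketched in code. The example below is a minimal illustration, not RateAPI's actual implementation: the function names, flag strings, and productType keys are hypothetical, and only the field names, ranges, and the 50 basis point threshold come from the descriptions above. Jumbo-versus-conforming and cross-source checks are left out because no tolerance is stated for them.

```python
from dataclasses import dataclass
from typing import Optional

# Expected ranges in percent, taken from the Range Checks layer.
# The category keys are hypothetical names, not confirmed canonical values.
RATE_RANGES = {
    "mortgage": (3.0, 12.0),
    "auto_loan": (2.0, 25.0),
    "heloc": (5.0, 15.0),
    "personal_loan": (5.0, 36.0),
}
MAX_DAILY_CHANGE_BPS = 50  # Historical Comparison threshold


@dataclass
class RateRecord:
    rate: float            # percent
    apr: Optional[float]   # percent, may be missing
    points: float
    term: int              # months
    productType: str


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def schema_errors(raw: dict) -> list[str]:
    """Layer 1: check field presence and types on a raw scraped dict."""
    errors = []
    for name in ("rate", "points", "term"):
        if not _is_number(raw.get(name)):
            errors.append(f"{name} must be a number")
    if raw.get("apr") is not None and not _is_number(raw["apr"]):
        errors.append("apr must be a number or null")
    if raw.get("productType") not in RATE_RANGES:
        errors.append("productType is not a known category")
    return errors


def validation_flags(record: RateRecord,
                     previous_rate: Optional[float] = None) -> list[str]:
    """Layers 2-4: return review flags for a schema-valid record."""
    flags = []
    low, high = RATE_RANGES[record.productType]
    if not low <= record.rate <= high:
        flags.append("out_of_range")
    if record.apr is not None and record.apr < record.rate:
        flags.append("apr_below_rate")
    if previous_rate is not None:
        # One percentage point equals 100 basis points.
        change_bps = round(abs(record.rate - previous_rate) * 100, 6)
        if change_bps > MAX_DAILY_CHANGE_BPS:
            flags.append("large_daily_change")
    return flags


def term_ordering_flags(rate_15y: float, rate_30y: float) -> list[str]:
    """Layer 3: a 15-year rate is expected to be below the 30-year rate."""
    return ["term_inversion"] if rate_15y >= rate_30y else []
```

In this sketch a record with any schema error would be rejected outright, while a record with flags would be held for review rather than discarded.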

Error Handling

When errors occur, we follow a systematic process to minimize impact on data quality.

Scrape Failures

If a scrape fails, we retry at increasing backoff intervals (1h, 4h, 12h). After 3 failures, the source is marked for manual review. Previous valid data is retained with a staleness flag.
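
As a rough sketch, the retry schedule above (1h, 4h, 12h) can be expressed as a lookup on the number of retries that have already failed. The function name and return values are hypothetical, and the sketch assumes "3 failures" means three failed retries after the initial failed scrape.

```python
from datetime import datetime, timedelta, timezone

# Delays before each retry, in hours, as listed in the text.
RETRY_DELAYS_HOURS = (1, 4, 12)


def next_step(failed_retries: int, last_failure: datetime):
    """Decide what happens after a failed scrape.

    failed_retries is the number of retries that have already failed
    (0 right after the initial scrape fails). Returns ("retry", when)
    or ("manual_review", None) once all retries are used up.
    """
    if failed_retries >= len(RETRY_DELAYS_HOURS):
        return ("manual_review", None)
    delay = timedelta(hours=RETRY_DELAYS_HOURS[failed_retries])
    return ("retry", last_failure + delay)


failed_at = datetime(2026, 6, 1, 8, 0, tzinfo=timezone.utc)
action, when = next_step(0, failed_at)  # first retry, one hour later
```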

Parsing Errors

When a page structure change breaks one of our parsers, we detect it through empty or malformed results. AI-assisted extraction then attempts recovery, and human review follows if needed.

Data Anomalies

Anomalous data (rate jumps, impossible values) is quarantined and excluded from API responses until reviewed. We never serve unverified anomalous data.
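
One way to picture the quarantine rule is a filter applied before records reach the API. This is an illustrative sketch only: the anomaly_flags and reviewed field names are assumptions, not documented RateAPI fields.

```python
def split_for_serving(records):
    """Separate records that may be served from those held for review.

    A record is quarantined when it carries at least one anomaly flag
    and has not yet been reviewed (both field names are hypothetical).
    """
    servable, quarantined = [], []
    for record in records:
        if record.get("anomaly_flags") and not record.get("reviewed", False):
            quarantined.append(record)
        else:
            servable.append(record)
    return servable, quarantined
```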

Historical Accuracy

We track accuracy by comparing our scraped rates against manually verified samples.

Metric                 | Value | Period
Rate Accuracy          | 99.2% | Last 30 days
APR Accuracy           | 98.7% | Last 30 days
Product Classification | 99.5% | Last 30 days
False Anomaly Rate     | 2.1%  | Last 30 days

Accuracy is measured by random sampling and manual verification against source websites. Discrepancies are investigated and corrected.
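
The measurement described above amounts to drawing a random sample and computing the share of sampled records whose scraped value matches the manually verified one. The sketch below assumes exact matching and uses hypothetical function names; the actual sample sizes and matching rules are not stated here.

```python
import random


def sample_for_review(record_ids, sample_size, seed=None):
    """Pick a random subset of record IDs for manual verification."""
    ids = list(record_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


def accuracy(scraped: dict, verified: dict) -> float:
    """Fraction of verified records whose scraped value matches exactly.

    Both arguments map a record ID to a value; only IDs present in
    `verified` (the manually checked sample) are counted.
    """
    if not verified:
        raise ValueError("verified sample is empty")
    matches = sum(1 for key, value in verified.items()
                  if scraped.get(key) == value)
    return matches / len(verified)
```

For example, if three of four manually checked rates match the scraped values, accuracy() returns 0.75.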

Quality Guarantees

No Stale Data

Data older than 7 days is marked with a staleness flag. Data older than 14 days is excluded from default API responses.
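
The two thresholds can be read as a simple classification on a record's age. The sketch below is illustrative: the status labels and function name are hypothetical, and it treats "older than" as strictly greater than the threshold.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=7)     # flagged as stale beyond this age
EXCLUDE_AFTER = timedelta(days=14)  # dropped from default responses


def freshness_status(observed_at: datetime, now: datetime) -> str:
    """Classify a record by the age of its observed_at timestamp."""
    age = now - observed_at
    if age > EXCLUDE_AFTER:
        return "excluded"
    if age > STALE_AFTER:
        return "stale"
    return "current"


now = datetime(2026, 6, 15, tzinfo=timezone.utc)
status = freshness_status(now - timedelta(days=8), now)  # "stale"
```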

Source Attribution

Every rate includes the source URL where it was observed, allowing independent verification.

Timestamp Transparency

Every response includes an observed_at timestamp showing exactly when the data was collected.

Correction Lineage

When corrections are made, we maintain full audit history. Original values are preserved with correction reason codes.
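
An append-only history is one common way to keep this kind of lineage. The following sketch is an assumption about how such a record could be modeled, not a description of RateAPI's storage; the class names, fields, and the reason code shown are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Correction:
    original_value: float
    corrected_value: float
    reason_code: str
    corrected_at: datetime


@dataclass
class RateValue:
    value: float
    history: list[Correction] = field(default_factory=list)

    def correct(self, new_value: float, reason_code: str,
                when: Optional[datetime] = None) -> None:
        """Apply a correction while preserving the value it replaces."""
        when = when or datetime.now(timezone.utc)
        self.history.append(
            Correction(self.value, new_value, reason_code, when)
        )
        self.value = new_value


rate = RateValue(6.5)
rate.correct(6.25, "PARSE_ERROR")  # hypothetical reason code
```

After the call, rate.value holds the corrected figure and rate.history[0] still records the original 6.5 together with the reason code.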

Questions About Data Quality?

We're committed to transparency. If you have questions about our data collection or find discrepancies, please reach out.