OSHA intelligence that transforms public employer safety data into actionable underwriting decisions for workers' compensation.
MarisRisk is a web-based underwriting decision-support platform that consolidates publicly available U.S. employer safety data into searchable, scored employer profiles. It is purpose-built for workers' compensation underwriters, loss-control consultants, and safety analysts who need to evaluate an employer's safety posture quickly and accurately.
Instead of requiring you to search OSHA's website manually, cross-reference inspection records with injury reports, and estimate peer comparisons in spreadsheets, MarisRisk does all of this automatically. Every employer in the database has a single page showing its composite risk score, enforcement timeline, injury rate trends, peer benchmarks, hazard categorization, ML-powered forward-risk predictions, and auto-generated underwriter questions.
MarisRisk ingests data from multiple authoritative public sources. Each source is pulled on a regular cadence, hashed for idempotency, and tracked in an immutable audit trail (source snapshots) so you always know exactly where the data came from and when it was last updated.
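As a minimal sketch of hash-based idempotency, assuming the hash is taken over the raw downloaded bytes, the Python below records a SHA-256 digest per source and skips any payload whose digest has already been logged. `SnapshotLog`, `content_hash`, and `ingest` are hypothetical names, and the in-memory list stands in for whatever append-only table backs the real source snapshots.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SnapshotLog:
    """Hypothetical append-only record of every payload ingested, per source."""
    entries: list = field(default_factory=list)

    def seen(self, source: str, digest: str) -> bool:
        return any(e["source"] == source and e["sha256"] == digest for e in self.entries)

    def append(self, source: str, digest: str) -> None:
        # Entries are only ever appended, never updated or deleted.
        self.entries.append({
            "source": source,
            "sha256": digest,
            "fetched_at": datetime.now(timezone.utc).isoformat(),
        })


def content_hash(payload: bytes) -> str:
    """SHA-256 of the raw downloaded bytes: identical files always produce the same digest."""
    return hashlib.sha256(payload).hexdigest()


def ingest(source: str, payload: bytes, log: SnapshotLog) -> bool:
    """Return True if the payload is new and was recorded, False if it was already loaded."""
    digest = content_hash(payload)
    if log.seen(source, digest):
        return False  # re-running a sync on unchanged data is a no-op
    log.append(source, digest)
    # A real loader would parse and upsert the payload's rows here.
    return True
```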
Source: U.S. Department of Labor API v4 (apiprod.dol.gov)
The enforcement dataset is the backbone of MarisRisk. It contains every OSHA inspection conducted at U.S. workplaces — both by federal OSHA and by the 29 state plan agencies — along with every citation (violation) issued during those inspections.
What's in an inspection record:
What's in a violation record:
Volume: Over 5.2 million inspections and 13.2 million violations dating back decades. Updated via nightly incremental syncs with rate limiting and exponential backoff to respect DOL API limits.
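The retry policy is not documented in detail; the sketch below shows one standard form of exponential backoff (capped, with full jitter) under assumed parameters. `RateLimited` is a hypothetical exception a DOL API client might raise on HTTP 429, and the `sleep` argument is injectable so the schedule can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a DOL API client would raise on HTTP 429 or a transient 5xx."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait ceiling after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: let the sync job record the failure
            ceiling = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... capped
            sleep(random.uniform(0, ceiling))  # full jitter keeps clients from retrying in lockstep
    raise AssertionError("unreachable")
```

Full jitter trades a slightly longer average wait for fewer synchronized retries when many requests hit the limit at once.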
Source: OSHA SIR dataset (osha.gov ZIP download)
Since 2015, employers have been required to report every work-related fatality to OSHA within 8 hours and every in-patient hospitalization, amputation, or loss of an eye within 24 hours. These reports are separate from inspections: an SIR can be filed even if no inspection follows.
What's in an SIR record:
Volume: 103,000+ events. Coverage caveat: SIR data is federal-jurisdiction only — state plan SIR data is not included in the federal database.
Source: OSHA ITA annual files (osha.gov, 2016–present)
The ITA dataset captures establishment-level injury and illness summaries that certain employers must submit annually. This is the only source for TRIR and DART rate calculations.
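Both rates follow the glossary definitions. A minimal sketch, assuming a summary record with Form 300A-style totals (the `ItaSummary` field names are modeled on ITA column names but should be treated as assumptions):

```python
from dataclasses import dataclass
from typing import Optional

HOURS_BASE = 200_000  # 100 full-time workers x 40 h/week x 50 weeks


@dataclass
class ItaSummary:
    """Form 300A totals for one establishment-year (field names assumed)."""
    total_deaths: int
    total_dafw_cases: int   # cases with days away from work
    total_djtr_cases: int   # cases with job transfer or restriction
    total_other_cases: int  # other recordable cases
    total_hours_worked: float


def trir(s: ItaSummary) -> Optional[float]:
    """(Total recordable cases x 200,000) / hours worked; None if hours are missing or zero."""
    if s.total_hours_worked <= 0:
        return None
    recordable = (s.total_deaths + s.total_dafw_cases
                  + s.total_djtr_cases + s.total_other_cases)
    return recordable * HOURS_BASE / s.total_hours_worked


def dart(s: ItaSummary) -> Optional[float]:
    """((DAFW cases + DJTR cases) x 200,000) / hours worked."""
    if s.total_hours_worked <= 0:
        return None
    return (s.total_dafw_cases + s.total_djtr_cases) * HOURS_BASE / s.total_hours_worked
```

For example, `ItaSummary(0, 2, 1, 3, 200_000)` (roughly 100 full-time workers) gives a TRIR of 6.0 and a DART rate of 3.0. Returning `None` for missing hours keeps a blank submission from being mistaken for a zero rate.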
What's in an ITA record:
Volume: 1.4 million+ summaries. Coverage caveat: ITA participation is not universal. Only establishments that meet OSHA's size and industry criteria must submit, and many industries are exempt, so absence of ITA data does not mean zero injuries.
Source: FMCSA MCMIS bulk CSV files
For employers that operate commercial motor vehicles, MarisRisk ingests the full FMCSA carrier census plus inspection, violation, crash, and BASIC safety score records. This dataset is linked to OSHA establishments via DOT number, DUNS, EIN, or fuzzy matching.
What's included:
Every data source follows the same ingestion pipeline framework:
Key design decisions:
The same employer can appear with different names, addresses, and identifiers across OSHA enforcement, ITA, SIR, FMCSA, and credit data. MarisRisk uses a multi-stage entity resolution engine to link all of these records to a single canonical employer profile.
If a source record shares a unique business identifier with an existing establishment, it is linked immediately:
Before fuzzy matching, all text is normalized:
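The exact normalization rules are not spelled out here. As an illustration only, the sketch below applies common steps (ASCII folding, uppercasing, punctuation stripping, legal-suffix removal, and street-type abbreviation); the `LEGAL_SUFFIXES` and `STREET_ABBREVIATIONS` tables are hypothetical and deliberately short.

```python
import re
import unicodedata

# Hypothetical lookup tables; a production list would be much longer.
LEGAL_SUFFIXES = {"INC", "INCORPORATED", "LLC", "LLP", "LP", "CORP",
                  "CORPORATION", "CO", "COMPANY", "LTD"}
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
                        "BOULEVARD": "BLVD", "DRIVE": "DR", "SUITE": "STE"}


def _tokens(text: str) -> list[str]:
    # Fold accents to ASCII, uppercase, spell out '&', and turn punctuation into spaces.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    ascii_text = ascii_text.upper().replace("&", " AND ")
    return re.sub(r"[^A-Z0-9 ]+", " ", ascii_text).split()


def normalize_name(name: str) -> str:
    """Drop trailing legal-entity suffixes so 'Acme Roofing, Inc.' matches 'ACME ROOFING LLC'."""
    tokens = _tokens(name)
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_address(address: str) -> str:
    """Map street-type words to one canonical abbreviation."""
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in _tokens(address))
```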
PostgreSQL's pg_trgm extension then compares the normalized name with existing establishment names by trigram similarity. A GIN index keeps this lookup fast even at 4 million+ establishments, and the top 20 candidates with similarity above 0.15 are returned.
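To show what the similarity number means, here is a pure-Python approximation of pg_trgm's `similarity()`: each word is lowercased and padded with two leading spaces and one trailing space, trigrams are collected into a set, and the score is shared trigrams divided by total distinct trigrams. It treats only ASCII letters and digits as word characters, and `top_candidates` is a hypothetical in-memory stand-in for the indexed SQL query.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction: 'cat' -> {'  c', ' ca', 'cat', 'at '}."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams / distinct trigrams across both strings (Jaccard on trigram sets)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_candidates(query: str, names: list[str], limit: int = 20, floor: float = 0.15):
    """Keep names scoring above `floor`, best first, at most `limit` of them."""
    scored = [(similarity(query, n), n) for n in names]
    return sorted((s for s in scored if s[0] > floor), reverse=True)[:limit]
```

A typo such as `ACME ROOFNG` still shares 10 of 15 distinct trigrams with `ACME ROOFING` (about 0.67), which is why misspelled names survive the 0.15 floor.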
Each candidate is scored on a 0.0–1.0 scale using six weighted components:
| Component | Weight | Method |
|---|---|---|
| Name similarity | 35% | Jaro-Winkler with token Jaccard floor (if token overlap < 50%, cap JW at 0.5 to prevent false positives) |
| Address similarity | 25% | Token overlap with street number exact-match bonus (+0.3) or penalty (different street number = 0.1) |
| Geo distance | 15% | Haversine: <100m=1.0, <500m=0.8, <1km=0.6, <5km=0.3, <20km=0.1, else 0.0 |
| ZIP code | 10% | Exact 5-digit match |
| City | 10% | Exact string match |
| State | 5% | Exact 2-letter code match |
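The name component can be sketched as standard Jaro-Winkler (prefix scale 0.1, common prefix of up to 4 characters) with the token-overlap cap from the table applied on top. Measuring token overlap as the Jaccard similarity of whitespace tokens is an assumption, as is the expectation that inputs are already normalized.

```python
def jaro(s1: str, s2: str) -> float:
    """Classic Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)  # how far apart matching characters may sit
    used1, used2 = [False] * n1, [False] * n2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(n1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost Jaro by the length of the common prefix (at most 4 characters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def name_similarity(a: str, b: str) -> float:
    """Jaro-Winkler, capped at 0.5 when fewer than half of the distinct tokens are shared."""
    score = jaro_winkler(a, b)
    ta, tb = set(a.split()), set(b.split())
    if ta and tb and len(ta & tb) / len(ta | tb) < 0.5:
        score = min(score, 0.5)
    return score
```

With this rule, `ACME ROOFING` vs. `ACME PLUMBING` scores about 0.86 on raw Jaro-Winkler thanks to the shared `ACME ` prefix, but only one of three distinct tokens overlaps, so the name component is capped at 0.5.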
If both records lack geolocation data, the 15% geo weight is redistributed proportionally to the other components.
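Putting the weights together, a sketch of the composite score might look like the following. It assumes the other five component scores are already computed on a 0-1 scale and that `geo` is passed as `None` whenever no distance can be computed. Dividing the remaining weights by their sum (0.85) is what spreads the 15% geo weight proportionally. `WEIGHTS`, `geo_score`, and `composite` are hypothetical names.

```python
import math
from typing import Optional

WEIGHTS = {"name": 0.35, "address": 0.25, "geo": 0.15,
           "zip": 0.10, "city": 0.10, "state": 0.05}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres."""
    r = 6_371_000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def geo_score(distance_m: float) -> float:
    """Step function from the table: <100 m = 1.0 ... <20 km = 0.1, else 0.0."""
    for limit, score in ((100, 1.0), (500, 0.8), (1_000, 0.6), (5_000, 0.3), (20_000, 0.1)):
        if distance_m < limit:
            return score
    return 0.0


def composite(components: dict[str, Optional[float]]) -> float:
    """Weighted sum of 0-1 component scores; a None geo score drops out."""
    weights = dict(WEIGHTS)
    if components.get("geo") is None:
        weights.pop("geo")
        total = sum(weights.values())  # 0.85
        weights = {k: w / total for k, w in weights.items()}
    return sum(w * components[k] for k, w in weights.items())
```

For instance, name 0.9, address 0.8, and exact ZIP, city, and state with no geo data gives (0.315 + 0.200 + 0.250) / 0.85 = 0.90.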
After all data is loaded, a two-phase batch dedup job runs:
Every match decision (auto or human) is recorded in an audit trail with confidence scores, method, and rationale.
Every establishment receives a risk score from 0 (lowest risk) to 100 (highest risk), powered by a three-stage actuarial model grounded in credibility theory. The score reflects how risky an employer is relative to its NAICS industry peers, with appropriate shrinkage for employers with limited data.
| Stage | Model | Purpose |
|---|---|---|
| 1. Negative Binomial GLM | Generalized Linear Model with NAICS-2 fixed effects and log(hours) exposure offset | Establishes baseline injury frequency prediction accounting for industry risk structure and employer size (exposure hours). |
| 2. Buhlmann-Straub Credibility | Z = hours / (hours + k), where k ≈ 50,000 hours (~25 FTE-years) | Blends individual employer experience with the NAICS-4 group rate. Small employers (Z ≈ 0.17) lean heavily on class priors; large employers (Z > 0.8) are predominantly experience-rated. This is the actuarial standard for balancing credibility vs. class rating. |
| 3. LightGBM Residual Booster | Gradient-boosted model on Pearson residuals from Stage 1 | Captures non-linear interactions across 27 features: prior TRIR/DART, enforcement history, SIR events, violation severity, penalty trends, illness ratios, and more. Optuna-tuned hyperparameters with temporal validation. |
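The Stage 1 and Stage 2 arithmetic can be illustrated with a few helper functions. This is a sketch, not the production model: the log-hours offset and the credibility formula follow the table, while the NB2 variance form `mu + alpha * mu**2` used for the Pearson residual, and the dispersion parameter `alpha`, are assumptions.

```python
import math

K_HOURS = 50_000  # credibility constant k from the table (~25 FTE-years at 2,000 h/yr)


def credibility_weight(hours: float, k: float = K_HOURS) -> float:
    """Buhlmann-Straub Z = hours / (hours + k)."""
    return hours / (hours + k) if hours > 0 else 0.0


def credibility_rate(observed_rate: float, group_rate: float, hours: float) -> float:
    """Blend the employer's own rate with its NAICS-4 group rate: Z*own + (1 - Z)*group."""
    z = credibility_weight(hours)
    return z * observed_rate + (1 - z) * group_rate


def expected_cases(linear_predictor: float, hours: float) -> float:
    """Stage 1 mean with a log(hours) offset: mu = exp(eta + log(hours)) = hours * exp(eta)."""
    return math.exp(linear_predictor + math.log(hours))


def pearson_residual(observed: int, mu: float, alpha: float) -> float:
    """NB2 Pearson residual (y - mu) / sqrt(mu + alpha * mu^2), the Stage 3 target."""
    return (observed - mu) / math.sqrt(mu + alpha * mu ** 2)
```

At 10,000 hours (about 5 FTE-years), Z = 10,000 / 60,000 ≈ 0.17, so an employer with an observed TRIR of 8.0 in a NAICS-4 group averaging 3.0 blends to roughly 3.8. At 250,000 hours, Z ≈ 0.83 and the employer's own experience dominates.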
Single-location organizations inherit their establishment score directly. Multi-location organizations use an exposure-weighted average of child location scores (weighted by hours worked where ITA data is available, equal weight otherwise), plus additive overlay adjustments:
| Overlay | Adjustment | Trigger |
|---|---|---|
| Systemic Violations | +5 to +15 | Serious+ violations at 3, 5, or 8+ locations in last 3 years |
| Concentration Risk | +0 to +10 | Worst location score > 2x the average |
| Financial Stress | +5 or +10 | Credit score below 50 or below 30 |
| Bankruptcy | +10 | Bankruptcy filing on record |
| Active Litigation | +5 | 5+ active court cases |
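A sketch of the organization roll-up follows, under several assumptions the text does not settle: hours weighting is used only when every location reports ITA hours, the systemic tiers map 3/5/8+ locations to +5/+10/+15, the concentration overlay scales linearly within +0 to +10 (a hypothetical rule), the "average" in that trigger is the simple mean of location scores, and the final score is clamped at 100. `Location` and `OrgFacts` are hypothetical types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    score: float                    # establishment risk score, 0-100
    hours: Optional[float] = None   # ITA hours worked, if reported
    serious_3y: bool = False        # serious-or-worse violation in the last 3 years


@dataclass
class OrgFacts:
    credit_score: Optional[float] = None
    bankruptcy: bool = False
    active_cases: int = 0


def base_score(locs: list[Location]) -> float:
    # Assumption: hours weighting applies only when every location reports hours.
    if all(loc.hours for loc in locs):
        total = sum(loc.hours for loc in locs)
        return sum(loc.score * loc.hours for loc in locs) / total
    return sum(loc.score for loc in locs) / len(locs)


def overlay_points(locs: list[Location], facts: OrgFacts) -> float:
    points = 0.0
    n_serious = sum(loc.serious_3y for loc in locs)
    points += 15 if n_serious >= 8 else 10 if n_serious >= 5 else 5 if n_serious >= 3 else 0
    mean = sum(loc.score for loc in locs) / len(locs)
    worst = max(loc.score for loc in locs)
    if mean > 0 and worst > 2 * mean:
        # Hypothetical scaling inside the documented +0 to +10 range.
        points += min(10.0, 5.0 * (worst / mean - 2))
    if facts.credit_score is not None:
        points += 10 if facts.credit_score < 30 else 5 if facts.credit_score < 50 else 0
    if facts.bankruptcy:
        points += 10
    if facts.active_cases >= 5:
        points += 5
    return points


def org_score(locs: list[Location], facts: OrgFacts) -> float:
    if len(locs) == 1:
        return locs[0].score  # single-location orgs inherit the establishment score
    # Assumption: the result is clamped to the 0-100 scale after overlays.
    return min(100.0, base_score(locs) + overlay_points(locs, facts))
```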
The credibility weight Z determines how much the score relies on employer-specific experience vs. NAICS group priors. Each scored employer is tagged with a data sufficiency tier:
The core scoring model predicts next-year Total Recordable Incident Rate (TRIR) for each establishment using the three-stage pipeline described in Section 6. It is trained on multi-year ITA panel data joined with enforcement and SIR history.
Training pipeline:
Features (27):
Beyond the core score, each employer profile includes 12-month event probabilities for three outcomes:
Each probability includes a confidence level (high/moderate/low) and a data basis statement explaining the underlying evidence.
The forward risk panel identifies recurrent OSHA standards — specific regulations cited across multiple inspections — and computes inter-inspection interval trends (accelerating, stable, or decelerating enforcement frequency).
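The panel's exact rules are not given here. One simple way to implement both ideas, offered as a hypothetical sketch: count a standard as recurrent when it appears in at least two distinct inspections, and classify the interval trend by comparing the mean gap between inspections in the later half of the history with the earlier half, using an assumed ±20% tolerance band.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def recurrent_standards(citations: list[tuple[str, str]], min_inspections: int = 2) -> dict[str, int]:
    """citations: (inspection_id, standard) pairs; keep standards cited in enough inspections."""
    inspections_by_standard: dict[str, set[str]] = defaultdict(set)
    for inspection_id, standard in citations:
        inspections_by_standard[standard].add(inspection_id)
    return {s: len(ids) for s, ids in inspections_by_standard.items() if len(ids) >= min_inspections}


def interval_trend(inspection_dates: list[date], tolerance: float = 0.2) -> str:
    """Compare the mean gap (in days) in the later half of the history with the earlier half."""
    ds = sorted(inspection_dates)
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    if len(gaps) < 2:
        return "insufficient data"
    half = len(gaps) // 2
    earlier, recent = mean(gaps[:half]), mean(gaps[half:])
    if earlier == 0:
        return "stable" if recent == 0 else "decelerating"
    ratio = recent / earlier
    if ratio < 1 - tolerance:
        return "accelerating"  # inspections arriving closer together
    if ratio > 1 + tolerance:
        return "decelerating"  # inspections spreading out
    return "stable"
```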
Beyond the numeric score, MarisRisk generates defensible flag indicators that highlight specific risk signals an underwriter should investigate.
The main search bar accepts employer names, EINs, addresses, or DOT numbers. Results are powered by PostgreSQL trigram similarity on normalized text, meaning typos, abbreviations, and partial names still return relevant results.
Two search modes:
Filters: State, city, ZIP code. All filters are combinable.
Match confidence: Every result shows a confidence percentage indicating how closely the search query matches the employer name or address. Results below 30% similarity are filtered out.
Recent searches: Your last several searches are saved locally for quick access (logged-in users only).
If you enter a 9-digit number (with or without the dash), MarisRisk performs an exact EIN match first, bypassing fuzzy search entirely. This is the fastest way to find a specific legal entity.
Enter a 5-7 digit DOT number to find the FMCSA carrier and all OSHA establishments linked to it.
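The routing between EIN, DOT, and free-text search can be sketched with two regular expressions. The patterns assume an EIN is entered as nine digits with an optional dash after the first two (the `XX-XXXXXXX` form) and apply the 5-7 digit DOT range as stated; `classify_query` is a hypothetical name.

```python
import re

EIN_RE = re.compile(r"\d{2}-?\d{7}")  # 9 digits, optional dash after the 2-digit prefix
DOT_RE = re.compile(r"\d{5,7}")       # 5-7 digit DOT number


def classify_query(q: str) -> tuple[str, str]:
    """Return ('ein', digits), ('dot', digits), or ('text', whitespace-collapsed query)."""
    s = q.strip()
    if EIN_RE.fullmatch(s):
        return "ein", s.replace("-", "")  # exact EIN lookup, no fuzzy search
    if DOT_RE.fullmatch(s):
        return "dot", s
    return "text", " ".join(s.split())
```

Because the EIN check runs first and requires exactly nine digits, it never collides with the shorter DOT range; an 8-digit string falls through to fuzzy text search.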
The employer profile page is the core of MarisRisk. It consolidates everything known about a single establishment into a multi-section dashboard. Here is what each section shows:
When an employer operates multiple locations, MarisRisk groups them under a parent organization. The organization view provides a portfolio-level risk assessment:
For employers that operate commercial motor vehicles, the carrier search and detail pages provide a complete FMCSA safety profile:
The NAICS browser lets you explore employers by industry classification. Navigate through a three-level hierarchy:
The Energy/Mining category also includes MSHA mine data with coal vs. metal/nonmetal toggles and status filters (active, abandoned, etc.).
Your portfolio is your book of quoted and bound accounts. Add any employer or organization directly from their profile page — this creates a policy record with pre-matched locations.
Click "Add to Portfolio" on any employer or organization page to create a quoted policy. The form pre-fills the insured name and automatically links all matching establishment locations. Your portfolio is accessible from the "Portfolio" link in the navigation bar.
All quoted and bound accounts are automatically monitored daily. When new OSHA activity is detected, it is flagged on your portfolio dashboard. Monitored events include:
Policies past their expiration date are automatically moved to expired status.
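The daily job is not specified beyond this. A minimal sketch, assuming a `Policy` record with a status, an expiration date, and linked establishment IDs, plus a feed of new events keyed by establishment, could look like the following. All names are hypothetical, and skipping monitoring for expired policies is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date

ACTIVE = ("quoted", "bound")


@dataclass
class Policy:
    policy_id: str
    status: str                     # "quoted", "bound", or "expired"
    expiration: date
    establishment_ids: set[str]
    flags: list[str] = field(default_factory=list)


def daily_sweep(policies: list[Policy], new_activity: dict[str, list[str]], today: date) -> None:
    """new_activity maps establishment_id -> event descriptions detected since the last run."""
    for p in policies:
        if p.status not in ACTIVE:
            continue  # expired policies are no longer monitored
        if p.expiration < today:
            p.status = "expired"
            continue
        for est in sorted(p.establishment_ids):
            p.flags.extend(f"{est}: {event}" for event in new_activity.get(est, []))
```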
MarisRisk supports two export formats from any employer profile:
Organization-level PDF exports are also available, aggregating data across all child locations.
Admins can create, edit, and deactivate user accounts. Three roles control access: User (standard access), Manager (expanded access), and Admin (full access including user management and system configuration). Admins can reset passwords, toggle active status, and change roles inline.
When the entity resolution engine produces a low-confidence match (0.75–0.84), it is queued for human review. The review page shows side-by-side comparisons of the source record and candidate establishment (name, address, city, state, ZIP, confidence score). Admins approve or reject each match, and all decisions are logged in the audit trail.
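The confidence bands translate into a simple routing rule. This sketch assumes scores of 0.85 and above are linked automatically and anything below 0.75 is left unmatched; only the 0.75-0.84 review band is stated explicitly. `MatchDecision` and `route` are hypothetical names, and the `rationale` string mirrors the audit-trail fields described for entity resolution.

```python
from dataclasses import dataclass
from typing import Optional

AUTO_LINK = 0.85     # assumed: at or above the review band, link automatically
REVIEW_FLOOR = 0.75  # bottom of the 0.75-0.84 human-review band


@dataclass
class MatchDecision:
    source_id: str
    candidate_id: Optional[str]
    confidence: float
    action: str      # "auto_link", "review", or "no_match"
    rationale: str


def route(source_id: str, candidates: list[tuple[str, float]]) -> MatchDecision:
    """candidates: (establishment_id, composite score); the best candidate decides the action."""
    if not candidates:
        return MatchDecision(source_id, None, 0.0, "no_match", "no candidates above trigram floor")
    best_id, best = max(candidates, key=lambda c: c[1])
    if best >= AUTO_LINK:
        return MatchDecision(source_id, best_id, best, "auto_link", f"score {best:.2f} >= {AUTO_LINK}")
    if best >= REVIEW_FLOOR:
        return MatchDecision(source_id, best_id, best, "review", f"score {best:.2f} in review band")
    return MatchDecision(source_id, None, best, "no_match", f"best score {best:.2f} < {REVIEW_FLOOR}")
```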
A real-time admin dashboard (auto-refreshes every 30 seconds) showing:
MarisRisk includes enforcement data from both federal OSHA and all 29 state plan agencies (22 covering private + public sector, 7 covering public sector only). The Coverage page on the website shows real-time status for each data source and state plan jurisdiction.
Important caveats to understand:
| Term | Definition |
|---|---|
| TRIR | Total Recordable Incident Rate. (Total recordable cases / total hours worked) x 200,000. Industry-standard measure of workplace injury frequency. |
| DART Rate | Days Away, Restricted, or Transferred rate. ((DAFW cases + DJTR cases) / total hours worked) x 200,000. Measures more serious injuries that impact work capacity. |
| SIR | Severe Injury Report. OSHA-mandated reporting of work-related fatalities (within 8 hours) and in-patient hospitalizations, amputations, and losses of an eye (within 24 hours). |
| ITA | Injury Tracking Application. OSHA's system for collecting establishment-level annual injury/illness summaries. |
| NAICS | North American Industry Classification System. 6-digit code classifying businesses by industry (e.g., 238210 = Electrical Contractors). |
| SIC | Standard Industrial Classification. Older 4-digit industry code system, still used by some OSHA data sources. |
| EIN | Employer Identification Number. IRS-assigned 9-digit number uniquely identifying a legal business entity. |
| DOT Number | Department of Transportation number assigned to commercial motor carriers by FMCSA. |
| DUNS | Data Universal Numbering System. Dun & Bradstreet's 9-digit identifier for business entities. |
| FMCSA | Federal Motor Carrier Safety Administration. Regulates commercial motor vehicle safety. |
| BASIC | Behavior Analysis and Safety Improvement Categories. FMCSA's 7 safety performance categories used in its Safety Measurement System (SMS). |
| OOS | Out of Service. FMCSA designation when a driver or vehicle is removed from service for safety violations. |
| Willful Violation | The most serious OSHA citation: the employer intentionally violated a standard or showed plain indifference to employee safety. Penalties up to $156,259 per violation (2023 inflation-adjusted maximum). |
| Repeat Violation | Employer was previously cited for a substantially similar hazard. Penalties up to $156,259 per violation (2023 inflation-adjusted maximum). |
| Serious Violation | Hazard that could cause death or serious physical harm that the employer knew or should have known about. Penalties up to $15,625 per violation (2023 inflation-adjusted maximum). |
| Failure to Abate | Employer did not correct a previously cited hazard by the abatement deadline. Penalties up to $15,625 per day beyond the deadline (2023 inflation-adjusted maximum). |
| Fat/Cat Inspection | Fatality/Catastrophe inspection triggered by a workplace death or hospitalization of 3+ employees. |
| Jaro-Winkler | String similarity algorithm (0.0–1.0) optimized for short strings like names. Used in entity resolution to score match candidates. |
| Trigram Similarity | Similarity measure from PostgreSQL's pg_trgm extension that compares strings by the sets of 3-character substrings (trigrams) they contain. Used for fuzzy search indexing. |
| Buhlmann-Straub | Actuarial credibility method that blends individual experience with group priors. Z = exposure / (exposure + k), where k is estimated from within- vs. between-entity variance. |
| Credibility Weight (Z) | 0-1 weight determining how much an employer's score relies on its own data vs. NAICS class rate. Z=0 means pure class rating; Z=1 means pure experience rating. |
| SHAP Values | SHapley Additive exPlanations. ML explainability method showing how each feature contributes to an individual prediction. |
| MSHA | Mine Safety and Health Administration. Federal agency regulating mine safety. |
| VADER | Valence Aware Dictionary and sEntiment Reasoner. Rule-based sentiment analysis tool used for scoring news article sentiment. |
Employer Safety Intelligence for Workers' Comp Underwriting
Updated March 2026. Features and data volumes are updated continuously.