IT/OT SOC Integration Playbook
Getting OT visibility into an enterprise SOC sounds straightforward until you try it. Raw industrial telemetry fed into an IT SIEM produces one of two failure modes: alert fatigue from normal engineering activity that looks like attacks to an analyst who has never seen a Modbus write command, or silence because someone filtered the OT data too aggressively to avoid the noise. Neither produces useful detections. This guide covers the full integration arc — translating OT protocol behavior into security-relevant context, building triage logic that distinguishes maintenance from malicious, choosing the right SOC model for your environment, and measuring whether the integration is actually working.
Why direct OT-to-SIEM integration fails
Most OT monitoring platforms can forward syslog or CEF events to a SIEM. The challenge is context.
An IT analyst looking at a SIEM alert sees an event type, a source, a destination, and a severity score. They apply learned pattern recognition: this looks like lateral movement, this looks like credential stuffing, this looks like a port scan. That pattern recognition was built on IT network behavior.
OT networks behave differently in ways that break IT pattern recognition at every level. A PLC that sends the same command to the same register every 100 milliseconds is not executing a brute force attack — it is running a control loop. A historian that polls 400 devices simultaneously is not conducting reconnaissance — it is doing its job. An engineering workstation that connects to 30 PLCs in sequence is not spreading malware — it is running a maintenance script.
Every one of those behaviors generates alerts in an IT SIEM with default rules. Every one of those alerts is a false positive that erodes analyst trust in the OT data feed. Within weeks, analysts learn to ignore OT alerts entirely — which means the integration has made your security posture worse, not better.
Tuning SIEM rules alone will not fix this, because the SIEM still lacks the OT context needed to tell normal from abnormal. The solution is to build a translation and filtering layer between OT monitoring and the SIEM that converts raw industrial telemetry into security-relevant context before it ever reaches an analyst queue.
The jargon translation layer
The translation layer maps OT protocol behavior to the security event taxonomy your analysts already understand. Without it, an analyst seeing a raw OT alert has no framework for deciding whether it matters.
Protocol anomaly to security alert mapping
| OT event | Raw SIEM interpretation | Correct security framing |
|---|---|---|
| Modbus FC 05 — Force Single Coil | Unknown write command to field device | Unauthorized actuator command — potential physical impact |
| Modbus FC 08 — Diagnostics | Diagnostic query to PLC | Reconnaissance of device capabilities |
| DNP3 Unsolicited Response outside normal cycle | Unexpected outbound communication from RTU | Possible C2 callback or device compromise |
| EtherNet/IP — Unconnected Message to new device | Connection attempt to unregistered asset | Asset discovery or lateral movement attempt |
| Multiple failed auth attempts to HMI | Brute force | Brute force: same interpretation, but with potential impact on the physical process |
| Engineering workstation connecting to PLC outside maintenance window | Lateral movement | Unauthorized configuration access — requires immediate triage |
| New device appearing on OT segment | Rogue device | Rogue device — same interpretation, elevated severity given OT context |
| Firmware download to field device | Software installation | Firmware modification — potential persistent implant |
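The mapping above can be sketched as a simple lookup table. This is a minimal illustration, not a vendor schema: the `(protocol, event)` keys and the `translate` helper are hypothetical names, and the values restate framings from the table.

```python
from typing import Dict, Optional, Tuple

# Keys are (protocol, event) identifiers as a hypothetical OT monitoring
# platform might emit them; the values restate framings from the table.
TRANSLATION_TABLE: Dict[Tuple[str, str], str] = {
    ("modbus", "fc05_force_single_coil"): "Unauthorized actuator command",
    ("modbus", "fc08_diagnostics"): "Reconnaissance of device capabilities",
    ("dnp3", "unsolicited_response_off_cycle"): "Possible C2 callback or device compromise",
    ("enip", "unconnected_message_new_device"): "Asset discovery or lateral movement attempt",
    ("any", "firmware_download"): "Firmware modification",
}


def translate(protocol: str, event: str) -> Optional[str]:
    """Return the analyst-facing framing for an OT event, or None if unmapped."""
    key = (protocol.lower(), event)
    if key in TRANSLATION_TABLE:
        return TRANSLATION_TABLE[key]
    # Fall back to protocol-independent entries such as firmware downloads.
    return TRANSLATION_TABLE.get(("any", event))
```

Returning `None` for unmapped events keeps the gap visible: an event type with no framing is a signal to extend the table, not something to forward raw.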
Building the baseline
The translation layer requires a behavioral baseline for your specific environment. You cannot distinguish a malicious Modbus write from a legitimate one without knowing what legitimate looks like in your environment.
Baseline collection requires a minimum of four weeks of passive monitoring before any alerting is configured. During that period:
- Document every device, every communication pair, and every protocol in use
- Map scheduled maintenance windows and the engineering activity that occurs during them
- Identify every historian poll, every control loop, and every vendor connection
- Note anomalies that appear during baseline — these are candidates for immediate investigation before alerting goes live
The baseline is the foundation of your suppression rules. Any event that matches a baselined behavior pattern is a candidate for suppression or reduced severity. Any event that deviates from the baseline is a candidate for escalation.
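As a rough sketch of how a baseline can drive that decision, the snippet below counts observed flows during the passive period and classifies later events by how often they were seen. The `Flow` fields and the `rare_threshold` value are assumptions for illustration; a real platform will track far more than a simple count.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    protocol: str
    function: str  # e.g. a Modbus function code label (hypothetical naming)


def build_baseline(flows: Iterable[Flow]) -> Counter:
    """Count how often each flow was seen during passive monitoring."""
    return Counter(flows)


def classify_deviation(flow: Flow, baseline: Counter, rare_threshold: int = 5) -> str:
    """Label a flow as First seen, Rare, or Common relative to the baseline."""
    count = baseline[flow]  # Counter returns 0 for flows never observed
    if count == 0:
        return "First seen"
    if count < rare_threshold:
        return "Rare"
    return "Common"
```

For example, a historian poll seen thousands of times during the baseline comes back as `"Common"`, while an engineering workstation writing to a PLC it never touched before comes back as `"First seen"`.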
Alert triage architecture
The triage architecture defines what the SIEM receives, what analysts see, and what gets automatically suppressed or escalated. It has three layers.
The first layer is suppression at the source. Events that are definitively normal OT behavior with no security relevance are suppressed at the OT monitoring platform before they are forwarded to the SIEM. These are not logged at reduced severity; they are not forwarded at all.
Suppression candidates:
- Scheduled historian polls matching the baseline pattern
- Control loop traffic on known good communication pairs
- Vendor connections during approved maintenance windows
- Routine diagnostic traffic from known engineering workstations to known assets
Suppression rules require regular review. A suppressed communication pair that changes behavior — new timing, new destination, new protocol — should trigger a rule exception and forward the anomaly.
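One way to express the rule exception is to suppress only when an observation matches a rule on every dimension, so any change in destination, protocol, or timing falls through and is forwarded. The sketch below assumes hypothetical `SuppressionRule` and `Observation` records and a fractional timing tolerance; none of these names come from a specific product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SuppressionRule:
    src: str
    dst: str
    protocol: str
    expected_interval_s: float  # baselined time between messages
    tolerance: float            # allowed drift as a fraction of the interval
    justification: str          # every rule should document why it exists


@dataclass(frozen=True)
class Observation:
    src: str
    dst: str
    protocol: str
    interval_s: float


def matches(rule: SuppressionRule, obs: Observation) -> bool:
    """True only if the pair, protocol, and timing all match the rule."""
    if (obs.src, obs.dst, obs.protocol) != (rule.src, rule.dst, rule.protocol):
        return False
    allowed_drift = rule.tolerance * rule.expected_interval_s
    return abs(obs.interval_s - rule.expected_interval_s) <= allowed_drift


def should_forward(obs: Observation, rules: Iterable[SuppressionRule]) -> bool:
    """Forward unless some rule fully matches the observation."""
    return not any(matches(rule, obs) for rule in rules)
```

Because suppression requires a full match, a historian that starts polling a new destination or at a very different rate is forwarded without anyone having to write an explicit exception.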
The second layer is context enrichment. Events that are forwarded to the SIEM carry enrichment that gives analysts the context to triage without OT expertise. At minimum, every forwarded OT event should include:
| Field | Content |
|---|---|
| Asset criticality | High / Medium / Low — based on your asset classification |
| Physical impact potential | Yes / No — does compromise of this asset affect the physical process? |
| Baseline deviation | First seen / Rare / Common — how often does this event type occur? |
| Maintenance window status | In window / Out of window — is this during an approved maintenance period? |
| Owning team | OT engineering / IT security / Vendor — who should be notified? |
An analyst who sees "EtherNet/IP connection to High criticality PLC, physical impact: Yes, out of maintenance window, first seen" has enough context to triage correctly without understanding EtherNet/IP.
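A minimal enrichment step might look like the following. It assumes the raw event is a dictionary with a `timestamp` key and that asset metadata comes from your own inventory; the `Asset` record, field names, and `enrich` function are illustrative, not a platform API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Asset:
    name: str
    criticality: str       # "High", "Medium", or "Low"
    physical_impact: bool  # does compromise affect the physical process?
    owning_team: str       # e.g. "OT engineering"


Window = Tuple[datetime, datetime]  # (start, end) of an approved maintenance window


def in_maintenance_window(ts: datetime, windows: Iterable[Window]) -> bool:
    """True if the timestamp falls inside any approved window."""
    return any(start <= ts < end for start, end in windows)


def enrich(event: dict, asset: Asset, deviation: str, windows: Iterable[Window]) -> dict:
    """Return a copy of the event with the five context fields attached."""
    in_window = in_maintenance_window(event["timestamp"], windows)
    return {
        **event,
        "asset_criticality": asset.criticality,
        "physical_impact_potential": "Yes" if asset.physical_impact else "No",
        "baseline_deviation": deviation,  # "First seen", "Rare", or "Common"
        "maintenance_window_status": "In window" if in_window else "Out of window",
        "owning_team": asset.owning_team,
    }
```

The original event is left unmodified, so the raw record can still be retained alongside the enriched one if your evidence requirements call for it.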
The third layer is the analyst playbook. Every OT alert category that reaches the analyst queue needs a playbook. The playbook defines:
- What the event means in OT context
- What questions to ask before escalating
- Who to contact for OT-specific expertise
- What constitutes a confirmed incident versus a false positive
- What the response action is
The most important playbook element is the escalation path to OT engineering. IT analysts should be able to identify and triage OT alerts, but confirmation of whether an alert represents malicious behavior versus legitimate OT activity requires an OT engineer in the loop. The IT analyst should never be the final decision-maker on an OT alert.
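If playbooks are tracked as structured records rather than free-form documents, it becomes easy to check that no alert category is enabled without one. The sketch below is one possible shape; the `Playbook` fields mirror the list above, and the category names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Playbook:
    meaning: str                         # what the event means in OT context
    pre_escalation_questions: List[str]  # questions to ask before escalating
    ot_contact: str                      # who to contact for OT expertise
    incident_criteria: str               # confirmed incident vs false positive
    response_action: str                 # what the response action is


def categories_without_playbook(
    enabled: Iterable[str], playbooks: Dict[str, Playbook]
) -> List[str]:
    """Return enabled alert categories that have no playbook, sorted by name."""
    return sorted(category for category in enabled if category not in playbooks)
```

Running a check like this before enabling forwarding for a new category is a cheap guard against analysts receiving an alert type they have no guidance for.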
SOC model options
Three models exist for OT security operations. None is universally correct.
In a converged SOC, OT alerts flow into the enterprise SIEM alongside IT alerts. A single analyst team handles both, supported by OT-specific playbooks and an on-call OT engineering escalation path.
Best fit: organizations with a mature IT SOC, limited OT security budget, and OT environments that are already well-segmented and baselined.
Risk: IT analysts who are not trained on OT context will mishandle OT alerts regardless of playbook quality. This model requires sustained training investment and clear escalation discipline.
In a dedicated OT SOC, a separate SOC function handles OT alerts exclusively, with OT-trained analysts and OT-specific tooling. It may share infrastructure with the IT SOC but operates with separate queues, separate playbooks, and separate escalation paths.
Best fit: large industrial operators with significant OT footprint, high-consequence environments (nuclear, utilities, critical manufacturing), and organizations under strict compliance frameworks requiring segregated evidence chains.
Risk: dedicated OT SOC teams are expensive and difficult to staff. The skills gap in OT security operations is real. Most organizations that attempt a dedicated OT SOC end up with a team of one or two people who are overwhelmed.
In a co-managed MDR model, a managed detection and response provider with OT specialization handles OT alert monitoring and initial triage, while the internal team handles escalations and response. Examples: Dragos Managed Services, Claroty MDR, Fortinet FortiGuard OT MDR.
Best fit: organizations that need OT SOC coverage but cannot staff it internally. Particularly useful during the transition period when OT monitoring is first deployed and alert volumes are high and unpredictable.
Risk: MDR providers vary significantly in OT depth. A provider with strong IT MDR capabilities but limited OT protocol knowledge produces the same context gap problem you were trying to solve. Require OT-specific references and ask about analyst training on industrial protocols before contracting.
Decision framework
| Factor | Converged SOC | Dedicated OT SOC | Co-managed MDR |
|---|---|---|---|
| Budget | Low | High | Medium |
| OT environment complexity | Low-medium | High | Any |
| Compliance requirements | Standard | Strict (NERC CIP High/Medium) | Any |
| Internal OT expertise | Available for escalation | Full team required | Minimal required |
| Time to operational | Fast | Slow | Fast |
Implementation sequence
Stand up OT visibility in a SIEM in this order. Skipping steps is how you end up with alert fatigue or a failed integration.
Step 1. Before any monitoring is deployed, complete and classify your asset inventory. The translation layer and suppression rules cannot be built without knowing what is on the network and how critical each asset is. Use the asset scoping methodology as the starting point.
Step 2. Deploy your OT monitoring platform in passive, detection-only mode. No forwarding to SIEM yet. Run for a minimum of four weeks to build the behavioral baseline.
Step 3. From the baseline data, define the normal behavior patterns for your environment. Build suppression rules for definitively normal traffic. Document every rule with its justification: suppression rules that cannot be justified are suppression rules that may be hiding real events.
Step 4. Configure context enrichment for events that will be forwarded. Map OT event types to the security taxonomy your analysts use. Test enrichment output with your analyst team before go-live.
Step 5. Write a playbook for every OT alert category that will reach the analyst queue before the first alert arrives. An analyst who receives an unfamiliar alert type with no playbook will either ignore it or escalate inappropriately. Both outcomes damage the integration.
Step 6. Enable forwarding for two or three high-confidence alert categories first, such as unauthorized firmware modifications, new devices on OT segments, and connections outside maintenance windows. Validate that analysts are triaging correctly before expanding coverage.
Step 7. Add alert categories one at a time. For each new category, validate the suppression rules, confirm the playbook, and review the first two weeks of alerts before treating the category as operational.
Step 8. Review suppression rules, false positive rates by alert category, and playbook effectiveness every month. The integration is not a one-time deployment; it requires ongoing tuning.
Metrics that matter
Most OT SOC integrations are declared successful at go-live and never measured again. These are the metrics that tell you whether the integration is actually working.
What percentage of your OT assets are generating telemetry that reaches the SIEM? An integration that covers 40% of your OT environment is not providing OT visibility — it is providing a false sense of it.
Track false positives per alert category, not in aggregate. An aggregate false positive rate of 20% looks manageable. A false positive rate of 80% on your highest-volume alert category means analysts have learned to ignore that category entirely.
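The difference between the two views is easy to see in a small calculation. The sketch below assumes each closed alert is recorded as a `(category, is_false_positive)` pair; the function name and input shape are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def false_positive_rates(
    alerts: Iterable[Tuple[str, bool]]
) -> Tuple[Dict[str, float], float]:
    """Return (per-category false positive rate, aggregate rate)."""
    totals: Dict[str, int] = defaultdict(int)
    false_positives: Dict[str, int] = defaultdict(int)
    for category, is_false_positive in alerts:
        totals[category] += 1
        false_positives[category] += int(is_false_positive)
    per_category = {c: false_positives[c] / totals[c] for c in totals}
    all_alerts = sum(totals.values())
    aggregate = sum(false_positives.values()) / all_alerts if all_alerts else 0.0
    return per_category, aggregate
```

With 10 alerts in one category, 8 of them false positives, and 30 alerts in another with none, the aggregate rate is 8 / 40 = 20% while the first category sits at 80%. Only the per-category view shows which queue analysts have stopped trusting.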
How long does it take an analyst to make a triage decision on an OT alert? A mean time above 30 minutes suggests the enrichment and playbooks are insufficient — analysts are spending time researching OT context they should have been given.
Of the alerts escalated to OT engineering, what percentage actually required OT engineering involvement? A low rate means analysts are over-escalating. A rate close to 100% can mean analysts are under-escalating: OT alerts that need specialist review are being closed without it.
What percentage of alerts fire during approved maintenance windows? A high percentage suggests suppression rules are insufficient. Normal maintenance activity should not be generating analyst alerts.
For confirmed OT security incidents, how long between the first indicator in the OT monitoring data and analyst detection? This is the metric that tells you whether the integration is achieving its security purpose.
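The remaining metrics reduce to simple ratios and time differences once the underlying data is collected. The helpers below are a sketch under assumed inputs (counts, timestamp pairs, and per-escalation flags exported from your SIEM or ticketing system); the function names are illustrative, and none of them handle empty inputs.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Tuple


def coverage(assets_reporting: int, assets_total: int) -> float:
    """Share of OT assets whose telemetry reaches the SIEM."""
    return assets_reporting / assets_total


def mean_time_to_triage(alerts: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Average time from alert creation to the analyst's triage decision."""
    seconds = [(decided - created).total_seconds() for created, decided in alerts]
    return timedelta(seconds=mean(seconds))


def escalation_accuracy(required_ot_engineering: Iterable[bool]) -> float:
    """Share of escalated alerts that genuinely needed OT engineering."""
    flags = list(required_ot_engineering)
    return sum(flags) / len(flags)


def maintenance_window_alert_rate(alerts_in_window: int, alerts_total: int) -> float:
    """Share of analyst-facing alerts that fired during approved windows."""
    return alerts_in_window / alerts_total


def detection_latency(first_indicator: datetime, analyst_detection: datetime) -> timedelta:
    """Time between the first OT indicator and analyst detection of an incident."""
    return analyst_detection - first_indicator
```

For instance, two alerts triaged in 10 and 50 minutes give a mean time to triage of 30 minutes, which sits right at the threshold discussed above.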