
IT/OT SOC Integration Playbook

Getting OT visibility into an enterprise SOC sounds straightforward until you try it. Raw industrial telemetry fed into an IT SIEM produces one of two failure modes: alert fatigue from normal engineering activity that looks like attacks to an analyst who has never seen a Modbus write command, or silence because someone filtered the OT data too aggressively to avoid the noise. Neither produces useful detections. This guide covers the full integration arc: translating OT protocol behavior into security-relevant context, building triage logic that distinguishes maintenance activity from malicious activity, choosing the right SOC model for your environment, and measuring whether the integration is actually working.

Contents

Why direct OT-to-SIEM integration fails
The jargon translation layer
Alert triage architecture
SOC model options
Implementation sequence
Metrics that matter

Why direct OT-to-SIEM integration fails

Most OT monitoring platforms can forward syslog or CEF events to a SIEM. The challenge is context.

An IT analyst looking at a SIEM alert sees an event type, a source, a destination, and a severity score. They apply learned pattern recognition: this looks like lateral movement, this looks like credential stuffing, this looks like a port scan. That pattern recognition was built on IT network behavior.

OT networks behave differently in ways that break IT pattern recognition at every level. A PLC that sends the same command to the same register every 100 milliseconds is not executing a brute force attack — it is running a control loop. A historian that polls 400 devices simultaneously is not conducting reconnaissance — it is doing its job. An engineering workstation that connects to 30 PLCs in sequence is not spreading malware — it is running a maintenance script.

Every one of those behaviors generates alerts in an IT SIEM with default rules. Every one of those alerts is a false positive that erodes analyst trust in the OT data feed. Within weeks, analysts learn to ignore OT alerts entirely — which means the integration has made your security posture worse, not better.

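Tuning the SIEM's correlation rules won't fix this on its own, because the SIEM has no way to know that a given register write is part of a control loop. The solution is to build a translation and filtering layer between OT monitoring and the SIEM that converts raw industrial telemetry into security-relevant context before it ever reaches an analyst queue.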

The jargon translation layer

The translation layer maps OT protocol behavior to the security event taxonomy your analysts already understand. Without it, an analyst seeing a raw OT alert has no framework for deciding whether it matters.

Protocol anomaly to security alert mapping

OT event	Raw SIEM interpretation	Correct security framing
Modbus FC 05 — Force Single Coil	Unknown write command to field device	Unauthorized actuator command — potential physical impact
Modbus FC 08 — Diagnostics	Diagnostic query to PLC	Reconnaissance of device capabilities; some sub-functions (e.g. 04, Force Listen Only Mode) can also silence the device
DNP3 Unsolicited Response outside normal cycle	Unexpected outbound communication from RTU	Possible C2 callback or device compromise
EtherNet/IP — Unconnected Message to new device	Connection attempt to unregistered asset	Asset discovery or lateral movement attempt
Multiple failed auth attempts to HMI	Brute force	Brute force; same interpretation, but a successful login can hand over control of the physical process
Engineering workstation connecting to PLC outside maintenance window	Lateral movement	Unauthorized configuration access — requires immediate triage
New device appearing on OT segment	Rogue device	Rogue device — same interpretation, elevated severity given OT context
Firmware download to field device	Software installation	Firmware modification — potential persistent implant
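One way to make the mapping table operational is a lookup that the forwarding pipeline consults before an event leaves the OT monitoring platform. The sketch below is a minimal Python illustration, assuming the platform labels each event with a (protocol, event) pair. The keys such as "fc05" and "unsolicited_response_off_cycle", the TRANSLATIONS table, and translate() are hypothetical names, not any vendor's schema. Unmapped events come back with an explicit label rather than being dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    raw_interpretation: str  # what a default IT rule would call the event
    security_framing: str    # what the analyst should read instead


# Hypothetical (protocol, event) labels; values paraphrase rows of the table above.
TRANSLATIONS = {
    ("modbus", "fc05"): Translation(
        "Unknown write command to field device",
        "Unauthorized actuator command: potential physical impact"),
    ("dnp3", "unsolicited_response_off_cycle"): Translation(
        "Unexpected outbound communication from RTU",
        "Possible C2 callback or device compromise"),
    ("enip", "unconnected_message_new_device"): Translation(
        "Connection attempt to unregistered asset",
        "Asset discovery or lateral movement attempt"),
    # Protocol-agnostic entry: firmware downloads matter whatever carries them.
    ("any", "firmware_download"): Translation(
        "Software installation",
        "Firmware modification: potential persistent implant"),
}


def translate(protocol: str, event: str) -> str:
    """Return the security framing for an OT event, falling back to a protocol-agnostic entry."""
    t = TRANSLATIONS.get((protocol, event)) or TRANSLATIONS.get(("any", event))
    if t is None:
        # Surface unmapped events instead of dropping them, so gaps in the
        # translation layer show up in review rather than disappearing.
        return f"Unmapped OT event ({protocol}/{event}): needs a translation rule"
    return t.security_framing


print(translate("modbus", "fc05"))
# Unauthorized actuator command: potential physical impact
print(translate("s7comm", "firmware_download"))
# Firmware modification: potential persistent implant
```

Returning an "Unmapped" label instead of None is a deliberate choice: a new event type that nobody has translated yet is exactly the kind of thing the review cadence should see.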

Building the baseline

The translation layer requires a behavioral baseline specific to your environment. You cannot distinguish a malicious Modbus write from a legitimate one without knowing what legitimate writes look like on your network.

Baseline collection requires a minimum of four weeks of passive monitoring before any alerting is configured. During that period:

Document every device, every communication pair, and every protocol in use
Map scheduled maintenance windows and the engineering activity that occurs during them
Identify every historian poll, every control loop, and every vendor connection
Note anomalies that appear during baseline — these are candidates for immediate investigation before alerting goes live

The baseline is the foundation of your suppression rules. Any event that matches a baselined behavior pattern is a candidate for suppression or reduced severity. Any event that deviates from the baseline is a candidate for escalation.
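At its simplest, a baseline is a count of every (source, destination, protocol, function) tuple seen during the passive monitoring period. The Python sketch below assumes flows have already been parsed into that shape; Flow, build_baseline(), classify(), and the rare_threshold of 5 are hypothetical choices for illustration. It produces the First seen / Rare / Common labels used for enrichment. A production baseline would also track timing and volume; this sketch only records which conversations occur and how often.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str       # e.g. historian or engineering workstation
    dst: str       # e.g. PLC or RTU
    protocol: str  # e.g. "modbus", "dnp3", "enip"
    function: str  # protocol operation, e.g. Modbus "fc03" (Read Holding Registers)


def build_baseline(observed: list[Flow]) -> Counter:
    """Count each (src, dst, protocol, function) tuple seen during passive monitoring."""
    return Counter(observed)


def classify(flow: Flow, baseline: Counter, rare_threshold: int = 5) -> str:
    """Label a live flow against the baseline."""
    count = baseline[flow]  # Counter returns 0 for unseen keys
    if count == 0:
        return "First seen"  # deviation: candidate for escalation
    if count < rare_threshold:
        return "Rare"        # seen, but seldom: forward with context
    return "Common"          # matches baseline: candidate for suppression


poll = Flow("historian01", "plc07", "modbus", "fc03")
write = Flow("ews02", "plc07", "modbus", "fc05")
baseline = build_baseline([poll] * 1000 + [write] * 2)
print(classify(poll, baseline))   # Common
print(classify(write, baseline))  # Rare
print(classify(Flow("laptop99", "plc07", "modbus", "fc05"), baseline))  # First seen
```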

Alert triage architecture

The triage architecture defines what the SIEM receives, what analysts see, and what gets automatically suppressed or escalated. It has three layers.

Layer 1: Suppression at source

Events that are definitively normal OT behavior with no security relevance are suppressed at the OT monitoring platform before they are forwarded to the SIEM. These are not logged at reduced severity — they are not forwarded at all.

Suppression candidates:

Scheduled historian polls matching the baseline pattern
Control loop traffic on known good communication pairs
Vendor connections during approved maintenance windows
Routine diagnostic traffic from known engineering workstations to known assets

Suppression rules require regular review. A suppressed communication pair that changes behavior — new timing, new destination, new protocol — should trigger a rule exception and forward the anomaly.
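The drift check described above can be expressed as a rule that suppresses only while the destination, protocol, and timing all still match the baseline. This is a minimal sketch assuming one rule per source asset and a measured interval between successive events from that source; SuppressionRule, decide(), and the 20% jitter tolerance are hypothetical, not settings from any particular OT monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionRule:
    src: str
    dst: str
    protocol: str
    expected_interval_s: float  # baselined period between events on this pair
    tolerance: float = 0.2      # hypothetical: accept +/-20% jitter


def decide(rule: SuppressionRule, src: str, dst: str, protocol: str,
           interval_s: float) -> tuple[str, str]:
    """Return ("suppress" or "forward", reason) for one event from the rule's source."""
    if src != rule.src:
        return "forward", "no suppression rule for this source"
    if dst != rule.dst:
        return "forward", f"new destination: {dst}"
    if protocol != rule.protocol:
        return "forward", f"new protocol: {protocol} (baseline {rule.protocol})"
    low = rule.expected_interval_s * (1 - rule.tolerance)
    high = rule.expected_interval_s * (1 + rule.tolerance)
    if not low <= interval_s <= high:
        return "forward", f"new timing: {interval_s}s, baseline {low:.1f}-{high:.1f}s"
    return "suppress", "matches baselined pattern"


rule = SuppressionRule("historian01", "plc07", "modbus", expected_interval_s=10.0)
print(decide(rule, "historian01", "plc07", "modbus", 10.4))  # suppressed
print(decide(rule, "historian01", "plc07", "modbus", 0.5))   # forwarded: timing drift
print(decide(rule, "historian01", "plc12", "modbus", 10.0))  # forwarded: new destination
```

Every branch that fails to match forwards the event with a reason, so the exception path is the default and suppression has to be earned.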

Layer 2: Context enrichment before forwarding

Events that are forwarded to the SIEM carry enrichment that gives analysts the context to triage without OT expertise. At minimum, every forwarded OT event should include:

Field	Content
Asset criticality	High / Medium / Low — based on your asset classification
Physical impact potential	Yes / No — does compromise of this asset affect the physical process?
Baseline deviation	First seen / Rare / Common — how often does this event type occur?
Maintenance window status	In window / Out of window — is this during an approved maintenance period?
Owning team	OT engineering / IT security / Vendor — who should be notified?

An analyst who sees "EtherNet/IP connection to High criticality PLC, physical impact: Yes, out of maintenance window, first seen" has enough context to triage correctly without understanding EtherNet/IP.
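The enrichment fields can be attached by a small function that joins each event against the asset inventory and the maintenance calendar. The sketch assumes those lookups have already been done: the Asset record, the list of window start/end times, enrich(), and one_line() are hypothetical inputs and names. It reproduces the example summary line above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Asset:
    asset_type: str        # e.g. "PLC", "RTU", "HMI"
    criticality: str       # "High" / "Medium" / "Low" from asset classification
    physical_impact: bool  # does compromise affect the physical process?
    owning_team: str       # "OT engineering" / "IT security" / "Vendor"


def enrich(event: dict, asset: Asset, baseline_label: str,
           windows: list[tuple[datetime, datetime]]) -> dict:
    """Attach the minimum context fields to an OT event before forwarding."""
    in_window = any(start <= event["timestamp"] < end for start, end in windows)
    return {
        **event,
        "asset_type": asset.asset_type,
        "asset_criticality": asset.criticality,
        "physical_impact": "Yes" if asset.physical_impact else "No",
        "baseline_deviation": baseline_label,  # "First seen" / "Rare" / "Common"
        "maintenance_window": "In window" if in_window else "Out of window",
        "owning_team": asset.owning_team,
    }


def one_line(e: dict) -> str:
    """Render the analyst-facing summary from the enriched fields."""
    window = ("in maintenance window" if e["maintenance_window"] == "In window"
              else "out of maintenance window")
    return (f"{e['description']} to {e['asset_criticality']} criticality "
            f"{e['asset_type']}, physical impact: {e['physical_impact']}, "
            f"{window}, {e['baseline_deviation'].lower()}")


event = {"timestamp": datetime(2024, 5, 14, 2, 30), "description": "EtherNet/IP connection"}
plc = Asset("PLC", "High", True, "OT engineering")
windows = [(datetime(2024, 5, 18, 6, 0), datetime(2024, 5, 18, 18, 0))]
print(one_line(enrich(event, plc, "First seen", windows)))
# EtherNet/IP connection to High criticality PLC, physical impact: Yes, out of maintenance window, first seen
```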

Layer 3: Playbook-driven triage

Every OT alert category that reaches the analyst queue needs a playbook. The playbook defines:

What the event means in OT context
What questions to ask before escalating
Who to contact for OT-specific expertise
What constitutes a confirmed incident versus a false positive
What the response action is

The most important playbook element is the escalation path to OT engineering. IT analysts should be able to identify and triage OT alerts, but confirmation of whether an alert represents malicious behavior versus legitimate OT activity requires an OT engineer in the loop. The IT analyst should never be the final decision-maker on an OT alert.
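Storing playbooks as structured records makes it easy to check that every alert category has all five elements, and the rule that an IT analyst never makes the final call can be enforced in tooling rather than left to habit. The sketch below assumes a simple in-house alert object; Playbook, OTAlert, close_alert(), and the sample firmware playbook values are illustrative, not a real SOAR platform's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Playbook:
    category: str
    ot_meaning: str         # what the event means in OT context
    questions: list[str]    # what to ask before escalating
    ot_contact: str         # who provides OT-specific expertise
    incident_criteria: str  # confirmed incident vs false positive
    response_action: str


@dataclass
class OTAlert:
    category: str
    status: str = "open"
    decided_by: Optional[str] = None  # OT engineer who made the final call


def close_alert(alert: OTAlert, verdict: str, ot_engineer: Optional[str]) -> None:
    """Close an OT alert; refuses unless an OT engineer has confirmed the verdict."""
    if not ot_engineer:
        raise PermissionError("OT alert closure requires OT engineering confirmation")
    alert.decided_by = ot_engineer
    alert.status = f"closed:{verdict}"


# Illustrative values only.
FIRMWARE_PLAYBOOK = Playbook(
    category="firmware_download",
    ot_meaning="New firmware written to a field device; changes can persist across restarts",
    questions=["Is there an approved change record?",
               "Did it come from a known engineering workstation?"],
    ot_contact="OT engineering on-call",
    incident_criteria="No matching change record, or the source is not a known workstation",
    response_action="Escalate to OT engineering before touching the device",
)
```

An analyst can still triage, annotate, and escalate; only the closing verdict is gated on an OT engineer's name.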

SOC model options

Three models exist for OT security operations. None is universally correct.

Model 01

Converged IT/OT SOC

OT alerts flow into the enterprise SIEM alongside IT alerts. A single analyst team handles both, supported by OT-specific playbooks and an on-call OT engineering escalation path.

Best fit: organizations with a mature IT SOC, limited OT security budget, and OT environments that are already well-segmented and baselined.

Risk: IT analysts who are not trained on OT context will mishandle OT alerts regardless of playbook quality. This model requires sustained training investment and clear escalation discipline.

Model 02

Dedicated OT SOC

A separate SOC function handles OT alerts exclusively, with OT-trained analysts and OT-specific tooling. It may share infrastructure with the IT SOC but operates with separate queues, separate playbooks, and separate escalation paths.

Best fit: large industrial operators with significant OT footprint, high-consequence environments (nuclear, utilities, critical manufacturing), and organizations under strict compliance frameworks requiring segregated evidence chains.

Risk: dedicated OT SOC teams are expensive and difficult to staff. The skills gap in OT security operations is real. Many organizations that attempt a dedicated OT SOC end up with a team of one or two people who are overwhelmed.

Model 03

Co-managed MDR

A managed detection and response provider with OT specialization handles OT alert monitoring and initial triage. The internal team handles escalations and response. Examples: Dragos Managed Services, Claroty MDR, Fortinet FortiGuard OT MDR.

Best fit: organizations that need OT SOC coverage but cannot staff it internally. Particularly useful during the transition period when OT monitoring is first deployed and alert volumes are high and unpredictable.

Risk: MDR providers vary significantly in OT depth. A provider with strong IT MDR capabilities but limited OT protocol knowledge produces the same context gap problem you were trying to solve. Require OT-specific references and ask about analyst training on industrial protocols before contracting.

Decision framework

Factor	Converged SOC	Dedicated OT SOC	Co-managed MDR
Budget	Low	High	Medium
OT environment complexity	Low-medium	High	Any
Compliance requirements	Standard	Strict (NERC CIP High/Medium)	Any
Internal OT expertise	Available for escalation	Full team required	Minimal required
Time to operational	Fast	Slow	Fast
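The decision framework can be encoded as per-model requirements, which makes the trade-offs explicit when more than one model fits. This is one hypothetical reading of the table, not a scoring method the guide prescribes: budget and internal OT expertise are treated as minimums the organization must meet, complexity and compliance as the most each model handles, and time to operational is left out. All names and level scales are assumptions.

```python
from dataclasses import dataclass

BUDGET = {"low": 0, "medium": 1, "high": 2}
EXPERTISE = {"minimal": 0, "escalation": 1, "full_team": 2}
COMPLEXITY = {"low": 0, "medium": 1, "high": 2}
COMPLIANCE = {"standard": 0, "strict": 1}


@dataclass(frozen=True)
class ModelNeeds:
    min_budget: int      # budget the model needs
    min_expertise: int   # internal OT expertise the model needs
    max_complexity: int  # most complex environment the model handles well
    max_compliance: int  # strictest compliance regime the model fits


# One interpretation of the decision-framework table.
MODELS = {
    "Converged SOC": ModelNeeds(BUDGET["low"], EXPERTISE["escalation"],
                                COMPLEXITY["medium"], COMPLIANCE["standard"]),
    "Co-managed MDR": ModelNeeds(BUDGET["medium"], EXPERTISE["minimal"],
                                 COMPLEXITY["high"], COMPLIANCE["strict"]),
    "Dedicated OT SOC": ModelNeeds(BUDGET["high"], EXPERTISE["full_team"],
                                   COMPLEXITY["high"], COMPLIANCE["strict"]),
}


def viable_models(budget: str, expertise: str, complexity: str, compliance: str) -> list[str]:
    """Return the models whose needs the organization meets, cheapest first."""
    b, e = BUDGET[budget], EXPERTISE[expertise]
    c, r = COMPLEXITY[complexity], COMPLIANCE[compliance]
    fits = [name for name, n in MODELS.items()
            if b >= n.min_budget and e >= n.min_expertise
            and c <= n.max_complexity and r <= n.max_compliance]
    return sorted(fits, key=lambda name: MODELS[name].min_budget)


print(viable_models("low", "escalation", "medium", "standard"))  # ['Converged SOC']
print(viable_models("high", "full_team", "high", "strict"))
# ['Co-managed MDR', 'Dedicated OT SOC']
```

An empty result means no model fits without changing budget or staffing; for example, a low budget with minimal internal OT expertise meets none of the three rows.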

Implementation sequence

Stand up OT visibility in a SIEM in this order. Skipping steps is how you end up with alert fatigue or a failed integration.

Step 01

Asset inventory and classification

Before any monitoring is deployed, complete and classify your asset inventory. The translation layer and suppression rules cannot be built without knowing what is on the network and how critical each asset is. Use the asset scoping methodology as the starting point.

Step 02

Deploy OT monitoring in passive mode

Deploy your OT monitoring platform in passive, detection-only mode. No forwarding to SIEM yet. Run for a minimum of four weeks to build the behavioral baseline.

Step 03

Build the baseline and suppression rules

From the baseline data, define the normal behavior patterns for your environment. Build suppression rules for definitively normal traffic. Document every rule with the justification — suppression rules that cannot be justified are suppression rules that may be hiding real events.

Step 04

Build the translation and enrichment layer

Configure context enrichment for events that will be forwarded. Map OT event types to the security taxonomy your analysts use. Test enrichment output with your analyst team before go-live.

Step 05

Deploy playbooks before enabling forwarding

Every OT alert category that will reach the analyst queue needs a playbook before the first alert arrives. An analyst who receives an unfamiliar alert type with no playbook will either ignore it or escalate inappropriately. Both outcomes damage the integration.

Step 06

Pilot forwarding with limited alert categories

Enable forwarding for two or three high-confidence alert categories first — unauthorized firmware modifications, new devices on OT segments, connections outside maintenance windows. Validate that analysts are triaging correctly before expanding coverage.

Step 07

Expand coverage incrementally

Add alert categories one at a time. For each new category, validate the suppression rules, confirm the playbook, and review the first two weeks of alerts before treating the category as operational.
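Steps 02 through 07 amount to a gate per alert category: forwarding stays off until the baseline, playbook, and suppression rules are in place, and the category counts as operational only after two weeks of reviewed alerts. A minimal sketch, assuming you track these facts per category; Category, blockers(), and is_operational() are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MIN_BASELINE = timedelta(weeks=4)  # Step 02: at least four weeks of passive monitoring
PILOT_REVIEW = timedelta(weeks=2)  # Step 07: review the first two weeks of alerts


@dataclass
class Category:
    name: str
    has_playbook: bool                 # Step 05
    suppression_rules_validated: bool  # Steps 03 and 07
    forwarding_since: Optional[date] = None
    first_alerts_reviewed: bool = False


def blockers(cat: Category, baseline_started: date, today: date) -> list[str]:
    """Reasons forwarding must stay off for this category; an empty list means go."""
    reasons = []
    if today - baseline_started < MIN_BASELINE:
        reasons.append("baseline shorter than four weeks")
    if not cat.has_playbook:
        reasons.append("no playbook")
    if not cat.suppression_rules_validated:
        reasons.append("suppression rules not validated")
    return reasons


def is_operational(cat: Category, today: date) -> bool:
    """Operational only after two weeks of forwarded alerts have been reviewed."""
    if cat.forwarding_since is None:
        return False
    return today - cat.forwarding_since >= PILOT_REVIEW and cat.first_alerts_reviewed


cat = Category("new_device_on_ot_segment", has_playbook=True, suppression_rules_validated=True)
print(blockers(cat, baseline_started=date(2024, 1, 1), today=date(2024, 1, 20)))
# ['baseline shorter than four weeks']
```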

Step 08

Establish review cadence

Monthly review of suppression rules, false positive rates by alert category, and playbook effectiveness. The integration is not a one-time deployment — it requires ongoing tuning.
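The monthly review in Step 08 and the justification requirement in Step 03 can be checked mechanically against the suppression rule inventory. The sketch assumes each rule is recorded with an ID, a free-text justification, and a last-review date; RuleRecord, audit(), the SUP-nnn IDs, and reading "monthly" as 31 days are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_INTERVAL = timedelta(days=31)  # hypothetical reading of "monthly"


@dataclass
class RuleRecord:
    rule_id: str
    justification: str
    last_reviewed: Optional[date]


def audit(rules: list[RuleRecord], today: date) -> dict[str, list[str]]:
    """Flag rules with no justification (Step 03) or overdue for review (Step 08)."""
    findings: dict[str, list[str]] = {"unjustified": [], "overdue": []}
    for r in rules:
        if not r.justification.strip():
            findings["unjustified"].append(r.rule_id)
        if r.last_reviewed is None or today - r.last_reviewed > REVIEW_INTERVAL:
            findings["overdue"].append(r.rule_id)
    return findings


rules = [
    RuleRecord("SUP-001", "Historian poll, matches four-week baseline", date(2024, 3, 1)),
    RuleRecord("SUP-002", "", date(2024, 3, 20)),
    RuleRecord("SUP-003", "Control loop on known pair", None),
]
print(audit(rules, date(2024, 4, 10)))
# {'unjustified': ['SUP-002'], 'overdue': ['SUP-001', 'SUP-003']}
```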

Metrics that matter

Most OT SOC integrations are declared successful at go-live and never measured again. These are the metrics that tell you whether the integration is actually working.

Detection coverage

What percentage of your OT assets are generating telemetry that reaches the SIEM? An integration that covers 40% of your OT environment is not providing OT visibility — it is providing a false sense of it.

False positive rate by alert category

Track false positives per alert category, not in aggregate. An aggregate false positive rate of 20% looks manageable. A false positive rate of 80% on your highest-volume alert category means analysts have learned to ignore that category entirely.
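Per-category rates fall out of the same triage records used for the aggregate. The sketch below assumes each triaged alert is reduced to a (category, is_false_positive) pair; fp_rates() is a hypothetical helper, and the category names and counts are made up to show a 20% aggregate coexisting with an 80% rate on the highest-volume category.

```python
from collections import defaultdict


def fp_rates(triaged: list[tuple[str, bool]]) -> tuple[float, dict[str, float]]:
    """Return (aggregate FP rate, per-category FP rates) from (category, is_fp) pairs."""
    totals: dict[str, int] = defaultdict(int)
    fps: dict[str, int] = defaultdict(int)
    for category, is_fp in triaged:
        totals[category] += 1
        fps[category] += is_fp  # True counts as 1
    aggregate = sum(fps.values()) / sum(totals.values())
    return aggregate, {c: fps[c] / totals[c] for c in totals}


# Illustrative counts: "new_device" is the largest category (25 of 100 alerts).
alerts = [("new_device", True)] * 20 + [("new_device", False)] * 5
for category, n in [("off_window_access", 20), ("firmware_download", 20),
                    ("unsolicited_dnp3", 20), ("modbus_diagnostics", 15)]:
    alerts += [(category, False)] * n

aggregate, per_category = fp_rates(alerts)
print(aggregate)                   # 0.2
print(per_category["new_device"])  # 0.8
```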

Mean time to triage

How long does it take an analyst to make a triage decision on an OT alert? A mean time above 30 minutes suggests the enrichment and playbooks are insufficient — analysts are spending time researching OT context they should have been given.

Escalation accuracy

Of alerts escalated to OT engineering, what percentage required OT engineering involvement? A low rate means analysts are over-escalating. A rate near 100% is not automatically good: it can mean analysts escalate only the obvious cases while OT alerts that need specialist review are closed without it. Sample alerts closed without escalation to check for under-escalation.

Maintenance window alignment

What percentage of alerts fire during approved maintenance windows? A high percentage suggests suppression rules are insufficient. Normal maintenance activity should not be generating analyst alerts.

Mean time to detect (OT-specific)

For confirmed OT security incidents, how long between the first indicator in the OT monitoring data and analyst detection? This is the metric that tells you whether the integration is achieving its security purpose.
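The remaining metrics can be computed from basic alert and asset records. This is a minimal sketch assuming fields your case management system may or may not expose under these names: AlertRecord, the ot_needed flag set by OT engineering, and all function names are hypothetical. Mean time to triage can then be compared against the 30-minute threshold discussed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

# All functions below assume non-empty inputs.


@dataclass
class AlertRecord:
    raised_at: datetime
    triaged_at: datetime
    in_maintenance_window: bool
    escalated: bool
    ot_needed: Optional[bool] = None  # set by OT engineering on escalated alerts


def detection_coverage(reporting: set[str], inventory: set[str]) -> float:
    """Share of inventoried OT assets whose telemetry reaches the SIEM."""
    return len(reporting & inventory) / len(inventory)


def mean_time_to_triage(alerts: list[AlertRecord]) -> timedelta:
    return timedelta(seconds=mean((a.triaged_at - a.raised_at).total_seconds()
                                  for a in alerts))


def escalation_accuracy(alerts: list[AlertRecord]) -> float:
    """Of escalated alerts, the share OT engineering says needed their involvement."""
    escalated = [a for a in alerts if a.escalated]
    return sum(bool(a.ot_needed) for a in escalated) / len(escalated)


def maintenance_window_share(alerts: list[AlertRecord]) -> float:
    return sum(a.in_maintenance_window for a in alerts) / len(alerts)


def mean_time_to_detect(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """incidents: (first OT indicator, analyst detection) per confirmed incident."""
    return timedelta(seconds=mean((detected - first).total_seconds()
                                  for first, detected in incidents))
```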

Companion tool

OT SOC Integration Health Scorecard

Enter your current metrics across the six dimensions above. The scorecard benchmarks each one against typical OT SOC integration performance and identifies the specific areas where your integration needs improvement.
