
Cybersecurity

Automotive Penetration Testing: A Methodology for ECUs, Gateways, and V2X

By Shreyansh, Founder & CTO, Agnile Technologies·April 30, 2026·12 min read

Key Takeaways

TL;DR — A modern automotive Penetration Testing engagement runs through six explicit phases — scoping, target analysis, interface exercise, exploitation, post-exploitation, and reporting with remediation verification — across four attack surfaces: physical, wireless, backend and cloud, and supply chain. Done well, it produces evidence that maps directly into ISO/SAE 21434 Clause 11 (verification) and feeds Clause 8 (continual cybersecurity and vulnerability management). The 2015 Jeep Cherokee, 2019–2020 keyless relay attacks on premium EVs, the 2020 Volvo XC60 CAN-injection disclosure, the 2022 Hyundai/Kia immobiliser-absence cases, and the 2022 Honda rolling-code vulnerability remain the canonical reference points the methodology must be able to reproduce.

  1. A defensible pentest splits into six explicit phases — scoping is where most engagements succeed or fail, because it determines whether findings will be admissible as ISO/SAE 21434 Clause 11 evidence.
  2. The four attack surfaces — physical (OBD-II, JTAG, CAN/LIN/FlexRay/Ethernet), wireless (BLE, Wi-Fi, NFC, UWB, V2X, cellular, GNSS), backend and cloud, and supply chain — should each have a documented test plan and a tool mapping.
  3. The 2015 Jeep Cherokee remote-to-CAN attack remains the canonical illustration that a single internet-exposed component can pivot to safety-critical buses if isolation is not enforced architecturally.
  4. Severity scoring should adapt CVSS v3.1 or v4.0 with automotive context — asset CIA on the vehicle network, safety impact, and CAL or ASIL relevance — rather than using IT CVSS unchanged.
  5. UDS (ISO 14229) diagnostic abuse, secure-boot bypass, and HSM key extraction continue to be the highest-yield ECU-level attack classes in published research.
  6. V2X testing must verify IEEE 1609.2 certificate validation, message replay protection, pseudonym handling, and misbehaviour detection — not just signal injection on the radio link.
  7. Pentest findings are evidence for ISO/SAE 21434 Clause 11 (verification) and are the principal input to Clause 8 (continual cybersecurity activities, including vulnerability management).

At a Glance

One-Sentence Answer
Automotive penetration testing should be scoped from architecture, interfaces, trust boundaries, assets, and safety constraints — not treated as generic IT testing.
Who This Is For
Vehicle cybersecurity teams, penetration testers, ECU teams, suppliers, validation teams, and product security engineers.
Last Reviewed
May 2026
Primary References
ISO/SAE 21434 verification activities, automotive security testing, embedded system testing, vehicle interfaces.
Practical Use
Use this guide to plan security testing that is safe, scoped, evidence-driven, and relevant to real vehicle architectures.

Automotive Penetration Testing is no longer optional. UNECE R155 and ISO/SAE 21434 require verification of cybersecurity-critical items, and a structured pentest is the most common technique programmes use to discharge that obligation. But the discipline is genuinely different from IT pentesting: targets have wheels, consequences are physical, attack surfaces include radios that never appear in an enterprise threat model, and the tester cannot simply pivot to the next box and try again. This post describes the six-phase methodology our team uses, the four attack surfaces a credible engagement must cover, and how the evidence produced lines up with the standards that Type Approval authorities will eventually look at.

Why automotive Penetration Testing is different from IT pentest

IT Penetration Testing assumes a network-reachable target, software that can be patched in days, and consequences that are measured in dollars. Automotive Penetration Testing assumes physical access to the target, software that may not be patched for months, and consequences that include loss of vehicle control on a public road. Three differences shape the engagement:

  • Lifetime. Vehicles are in service for fifteen to twenty years. A vulnerability in a 2026 ECU may still be exploitable in 2041. Findings must be documented with that horizon in mind.
  • Physical access is realistic. Threat models that exclude the OBD-II port, the door handle, or the charging inlet do not survive contact with reality. The 2022 Hyundai and Kia “TikTok challenge” cases were a reminder that immobiliser absence and physical-key attack chains still matter.
  • Safety couples to security. A successful exploit can violate Safety Goals defined under ISO 26262. Reporting must reach the safety team, not just the security team, and Risk Treatment decisions must consider both.

The six-phase methodology

Every credible automotive Penetration Testing engagement runs through the same six phases. Skipping any one of them is the fastest way to produce findings that the OEM's product cybersecurity team will reject.

1. Scoping

Scoping defines the item under test using the same item definition the TARA used (ISO/SAE 21434 Clause 15 alignment), identifies the cybersecurity goals and CAL of each asset, records the test boundary (component, ECU, gateway, vehicle), fixes the access level (white-box, grey-box, black-box), and names the rules of engagement. Findings produced outside an agreed scope are not Clause 11 evidence.
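The scope record above can be captured as a small data structure that the engagement carries from kickoff to reporting. The sketch below is illustrative — the field names, CAL values, and rule strings are assumptions, not a prescribed ISO/SAE 21434 format — but it shows the key property: a finding is only admissible as evidence if its target was inside the agreed scope.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EngagementScope:
    """Scope record agreed before testing starts (names are illustrative)."""
    item: str                      # item definition, same as the TARA's
    cal: int                       # Cybersecurity Assurance Level of the item
    boundary: str                  # "component" | "ecu" | "gateway" | "vehicle"
    access: str                    # "white-box" | "grey-box" | "black-box"
    targets: frozenset = field(default_factory=frozenset)
    rules_of_engagement: tuple = ()

    def admits(self, finding_target: str) -> bool:
        # A finding counts as Clause 11 evidence only if its target was in scope.
        return finding_target in self.targets

scope = EngagementScope(
    item="Body Domain Controller",       # hypothetical item under test
    cal=3,
    boundary="ecu",
    access="grey-box",
    targets=frozenset({"UDS", "CAN", "BLE"}),
    rules_of_engagement=("no fault injection on a live vehicle",),
)
assert scope.admits("UDS")
assert not scope.admits("OTA backend")   # out of scope, not admissible
```

Freezing the record matters in practice: scope changes mid-engagement should produce a new, versioned record, not a silent mutation of the original.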

2. Target analysis

Architecture review, data-flow diagrams, attack-surface enumeration, and a working hypothesis of the most attractive attack paths. This phase also confirms that the inputs from Threat Analysis and Risk Assessment are still valid; if the architecture has drifted since TARA, the test plan adjusts.

3. Interface exercise

Every documented interface is exercised against its specification — UDS services, CAN message authentication, BLE pairing, OTA update endpoints. The objective is not yet exploitation; it is to confirm that the implemented behaviour matches the specified behaviour and to record the deltas.
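For the UDS surface, interface exercise reduces to encoding requests per ISO 14229 and classifying what actually comes back. The sketch below covers SecurityAccess requestSeed only; the transport (ISO-TP or DoIP) is omitted, and the "all-zero seed means already unlocked" heuristic is an assumption based on commonly reported ECU behaviour, not a rule from the standard.

```python
# UDS (ISO 14229) SecurityAccess requestSeed exercise — minimal
# encoder/classifier sketch; transport (ISO-TP / DoIP) is out of scope here.

SECURITY_ACCESS = 0x27
REQUEST_SEED_L1 = 0x01

def encode_request_seed(level: int = REQUEST_SEED_L1) -> bytes:
    return bytes([SECURITY_ACCESS, level])

def classify_response(resp: bytes) -> str:
    """Compare implemented behaviour against the spec and record the delta."""
    if not resp:
        return "no-response"
    if resp[0] == SECURITY_ACCESS + 0x40:           # positive: SID + 0x40
        seed = resp[2:]                             # resp[1] echoes the level
        # An all-zero seed conventionally signals "already unlocked" — flag it.
        return "unlocked" if set(seed) == {0} else "seed"
    if resp[0] == 0x7F and resp[1] == SECURITY_ACCESS:
        return f"nrc-0x{resp[2]:02x}"               # negative response code
    return "unexpected"

assert encode_request_seed() == b"\x27\x01"
assert classify_response(b"\x67\x01\xde\xad\xbe\xef") == "seed"
assert classify_response(b"\x67\x01\x00\x00\x00\x00") == "unlocked"
assert classify_response(b"\x7f\x27\x22") == "nrc-0x22"
```

The classifier's job at this phase is purely descriptive: every response that deviates from the interface specification is a recorded delta, whether or not it is exploitable.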

4. Exploitation

Where deltas or weaknesses are found, the team attempts to turn them into a working exploit. This is where fuzzing, glitching, chip-off firmware extraction, key-recovery research, and protocol abuse happen. Each successful exploit is reproduced on a clean target so the writeup is repeatable.
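The reproducibility requirement shapes how fuzzing is run: mutations must be deterministic so a crashing input can be replayed on a clean target. A minimal seeded mutator sketch, assuming classic 8-byte CAN payloads (the seed frame and byte-flip strategy are illustrative):

```python
import random

def mutate_payload(seed_frame: bytes, n: int, rng_seed: int = 0):
    """Yield n mutated CAN payloads derived from a known-good frame.
    Seeded RNG makes every case reproducible on a clean target."""
    rng = random.Random(rng_seed)
    for _ in range(n):
        buf = bytearray(seed_frame)
        for _ in range(rng.randint(1, 3)):          # flip 1-3 random bytes
            buf[rng.randrange(len(buf))] = rng.randrange(256)
        yield bytes(buf)

seed = b"\x02\x10\x01\x00\x00\x00\x00\x00"          # hypothetical UDS-over-CAN frame
cases = list(mutate_payload(seed, 5))
assert len(cases) == 5
assert cases == list(mutate_payload(seed, 5))       # same seed, same corpus
```

Logging the RNG seed and case index alongside each observed fault is what turns a bench anomaly into a repeatable writeup.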

5. Post-exploitation

Lateral movement, persistence, and the safety-impact chain. The question is not “can we run code on this ECU?” but “from this ECU, what else can we reach, what can we make the vehicle do, and how durable is our access across reboots and OTAs?” The 2015 Jeep Cherokee attack is the canonical illustration of post-exploitation: the initial foothold was on the head unit; the consequence was control of the CAN buses that drive the vehicle.

6. Reporting and remediation verification

A defensible report contains: an executive summary suitable for programme leadership, a per-finding writeup with reproduction steps and CVSS-adapted scoring, a mapping of findings to ISO 21434 Clause 11 verification objectives, and a remediation tracker that re-tests each fix. Engagements that stop at “we delivered the report” produce findings that never close. Remediation verification is part of the scope.

Test scope pyramid: from component to vehicle

Engagements scale up. A typical roadmap moves from component tests on the bench through ECU and gateway integration up to the assembled vehicle. The pyramid below is the planning artefact we use to decide how much of each layer to budget for.

[Figure: test scope pyramid, base to apex]
  • Component (silicon, BootROM, HSM) — chip-level analysis
  • ECU integration (multi-bus, real BSW) — interface exercise
  • Gateway / domain controller / central compute — cross-domain pivot
  • Vehicle (assembled, on dyno or proving ground) — vehicle-level pentest
  Axes: cost rises toward the vehicle layer; depth toward the component layer.
Test scope pyramid — component up through vehicle. Each layer inherits findings from the layers below; engagements that start at the top without component-level evidence rarely survive Type Approval review.

Attack-surface taxonomy

The four surfaces below cover everything a credible engagement must touch. Each requires a different lab discipline; each maps to different Threat Scenarios in the TARA.

Physical

OBD-II, JTAG/SWD, UART, SPI, I2C, CAN, CAN-FD, LIN, FlexRay, and Automotive Ethernet (100BASE-T1, 1000BASE-T1). Physical testing includes electrical interface fuzzing, secure-debug bypass attempts, fault injection (voltage, clock, EM), and decapsulation/chip-off when justified.

Wireless

BLE pairing and GATT abuse, Wi-Fi infrastructure attacks, NFC tag cloning, UWB ranging manipulation, V2X (DSRC and C-V2X), cellular (LTE, 5G), and GNSS spoofing. The 2019–2020 keyless relay attacks on premium EVs and the 2022 Honda rolling-code vulnerability are the public reference points; relay attacks on passive entry remain a high-yield class.

Backend and cloud

Telematics backend APIs, OTA infrastructure, mobile companion apps, and the identity systems behind them. The 2015 Jeep Cherokee disclosure and the 2015 BMW ConnectedDrive disclosure both began on the backend; treating the cloud surface as out-of-scope is a common engagement defect.

Supply chain

Firmware dependencies, third-party libraries, and the SBOM that describes them. Supply-chain testing is largely document-and-build review plus targeted analysis of high-risk components, but it is the surface most often skipped and the one ISO 21434 Clause 15 increasingly expects evidence on.

ECU-level testing

At the ECU level the highest-yield attack classes in published research remain: firmware extraction (chip-off, BootROM exploits, debug-port abuse), Secure Boot bypass (rollback to a vulnerable signed image, signature-verification logic flaws, fault injection through the verification routine), cryptography implementation review (constant-time properties, key handling across the AUTOSAR Crypto Stack), UDS abuse (unauthenticated SecurityAccess, RoutineControl services that expose privileged operations, RequestDownload sequences that accept unsigned blobs), and MPU/TrustZone analysis to verify isolation between trusted and untrusted execution contexts.
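A concrete example of the UDS-abuse class above is weak SecurityAccess seed generation: collect seeds across repeated ECU resets and look for repeats or constant values. The checker below is a sketch with illustrative finding labels; the defect classes it flags (constant, repeating, low-entropy seeds) are well represented in published automotive research.

```python
from collections import Counter

def seed_weaknesses(seeds: list) -> list:
    """Flag classic SecurityAccess seed defects: constant seeds, seeds that
    repeat across resets, and seeds built from a single repeated byte."""
    findings = []
    counts = Counter(seeds)
    if len(counts) == 1:
        findings.append("constant-seed")
    elif any(c > 1 for c in counts.values()):
        findings.append("repeating-seed")
    if all(len(set(s)) <= 1 for s in seeds):    # every seed is one byte repeated
        findings.append("low-entropy-bytes")
    return findings

assert seed_weaknesses([b"\x12\x34"] * 4) == ["constant-seed"]
assert "repeating-seed" in seed_weaknesses([b"\x01\x02", b"\x01\x02", b"\x09\x08"])
assert seed_weaknesses([b"\x11\x11", b"\x22\x22"]) == ["low-entropy-bytes"]
assert seed_weaknesses([b"\xd3\x1f", b"\x84\xa2", b"\x0c\x77"]) == []
```

On a real bench the seed list comes from scripted power cycles; statistical entropy estimation over a large sample would follow, but repeats alone are often enough to justify an exploitation attempt.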

Where the ECU integrates a Hardware Security Module, a separate line of testing exercises the host-to-HSM interface for command injection, key-handle confusion, and side-channel leakage. See the companion post on HSM integration for automotive ECUs for the architectural baseline this testing assumes.

Gateway and central-compute testing

The gateway is the unit of cross-domain isolation. Testing focuses on CAN routing rules (which IDs are allowed to traverse which buses), security-gateway tunnelling (whether a permitted channel can carry forbidden traffic), firewall rules on the Ethernet domain, and the integrity of the diagnostic routing table. The 2020 Volvo XC60 disclosure showed how a permitted path through a gateway component can carry unintended payloads if the parser does not validate the inner content.
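The routing-rule test amounts to an oracle: every frame observed on a destination bus must be explained by an allow rule, or the gateway leaked it. A minimal sketch, with hypothetical bus names and CAN IDs:

```python
# Gateway routing-rule oracle — bus names and CAN IDs below are illustrative.
ALLOW = {   # (source bus, CAN ID) -> set of allowed destination buses
    ("infotainment", 0x7DF): {"diagnostic"},
    ("powertrain",   0x1A0): {"body"},
}

def leaked(src: str, can_id: int, observed_on: str) -> bool:
    """True if a frame from (src, can_id) seen on observed_on violates the rules."""
    return observed_on not in ALLOW.get((src, can_id), set())

assert not leaked("infotainment", 0x7DF, "diagnostic")   # permitted path
assert leaked("infotainment", 0x7DF, "powertrain")        # crossed domains
assert leaked("infotainment", 0x123, "powertrain")        # no rule at all
```

The Volvo XC60 lesson extends this: even on a permitted path, the inner payload must be validated, so a second oracle inspecting tunnelled content belongs alongside the ID-level check.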

V2X testing

V2X is more than radio fuzzing. The protocol-level checks that matter are IEEE 1609.2 certificate validation (chain construction, expiry, revocation), message replay protection (sequence numbers and timestamp tolerances), pseudonym certificate rotation behaviour, misbehaviour detection and report generation, and the host stack's response when the radio delivers malformed certificates at line rate. Position spoofing via GNSS is the second axis and is exercised against the navigation stack rather than the V2X radio.
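The replay-protection check reduces to a simple acceptance predicate the tester tries to violate with captured traffic. The sketch below models it with a freshness window and a monotonic sequence number; the one-second tolerance is an illustrative test parameter, not a figure from IEEE 1609.2.

```python
def accept_v2x_message(msg_time: float, now: float, msg_seq: int,
                       last_seq: int, tolerance_s: float = 1.0) -> bool:
    """Replay-protection oracle: accept only messages whose generation time
    is within tolerance of local time AND whose sequence number advances.
    Thresholds are illustrative engagement parameters."""
    fresh = abs(now - msg_time) <= tolerance_s
    advances = msg_seq > last_seq
    return fresh and advances

assert accept_v2x_message(100.2, 100.5, msg_seq=8, last_seq=7)
assert not accept_v2x_message(95.0, 100.5, msg_seq=8, last_seq=7)   # stale replay
assert not accept_v2x_message(100.2, 100.5, msg_seq=7, last_seq=7)  # repeated seq
```

A stack that accepts either the stale or the repeated case has a reportable replay finding regardless of whether its certificate validation is sound.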

Wireless testing patterns

Three patterns recur across engagements: relay attacks against passive-entry/passive-start (PEPS) systems, where the attacker extends the apparent range of the key fob; rolling-code replay, where weaknesses in the synchronisation window allow re-use of captured codes; and BLE pairing downgrades, where companion apps fall back to insecure modes that disclose the link key. Each pattern has a published reference incident and each should be in the test plan whenever the relevant radio is present.
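The rolling-code pattern can be made concrete with the receiver-side acceptance model the tester probes: only counters strictly ahead of the synchronised value, within a finite resync window, should unlock. The window size below is illustrative; the 2022 Honda class of bug corresponds to this check being absent or resettable, so captured codes replay.

```python
def rolling_code_accepts(received_counter: int, synced_counter: int,
                         window: int = 256) -> bool:
    """Receiver-side rolling-code model (window size is illustrative).
    A sound receiver rejects counters at or below the synced value."""
    return synced_counter < received_counter <= synced_counter + window

assert rolling_code_accepts(1001, 1000)          # next press unlocks
assert not rolling_code_accepts(1000, 1000)      # captured code replayed
assert not rolling_code_accepts(900, 1000)       # older captured code
assert not rolling_code_accepts(5000, 1000)      # outside resync window
```

The test plan exercises each boundary: replay of the last code, replay of older captures, and whether any sequence of presses can drag the synchronised counter backwards.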

Tool categories by surface

The matrix below is the planning checklist we use to confirm the lab is provisioned before the engagement starts. Specific brands are not the point — the right category, validated against the silicon and the bus in scope, is the point.

| Tool category | Physical | Wireless | Backend | Supply chain |
| --- | --- | --- | --- | --- |
| CAN/CAN-FD/LIN/FlexRay analysers | ✓ | | | |
| Protocol/UDS fuzzing rigs | ✓ | | | |
| JTAG/SWD debuggers, logic analysers | ✓ | | | |
| Hardware analysis (chip-off, decap, fault injection) | ✓ | | | |
| SDR (sub-GHz, 2.4 GHz, V2X bands) | | ✓ | | |
| BLE/Wi-Fi/NFC/UWB toolchains | | ✓ | | |
| OTA / cloud / mobile-app pentest stack | | | ✓ | |
| SBOM/SCA, binary diffing, build reproducibility | | | | ✓ |
Tool-category matrix by attack surface. Categories listed neutrally — the lab inventory is selected against the silicon and protocols in scope, not the brand.

Reporting: severity, mapping, and fixability

Severity scoring on automotive findings is where IT pentest templates fail loudest. Default CVSS v3.1 base scoring uses network attack vector and confidentiality weighting that do not describe a CAN-bus injection well. Two adjustments make the score useful: first, calibrate Attack Vector and User Interaction against the realistic access required (physical versus adjacent network versus internet); second, add a safety-impact dimension and a CAL or ASIL relevance modifier so the result aligns with the OEM's product cybersecurity risk register. CVSS v4.0 (FIRST, Nov 2023) handles some of this natively with the Safety supplemental metrics; engagements that have moved to v4.0 carry less custom-mapping baggage.
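As an illustration of the second adjustment, the helper below scales an already access-calibrated CVSS base score by a safety-impact factor and a CAL modifier. The multipliers are engagement-defined assumptions, not values from CVSS v4.0 or ISO/SAE 21434 — the point is that the mapping is explicit and repeatable, not ad hoc per finding.

```python
def automotive_priority(cvss_base: float, safety_impact: str, cal: int) -> float:
    """Adapted severity sketch: scale a CVSS base score (already calibrated
    for realistic access) by safety impact and CAL relevance, capped at 10.0.
    All multipliers are illustrative engagement parameters."""
    safety = {"none": 1.0, "degraded": 1.15, "loss-of-control": 1.4}[safety_impact]
    cal_mod = 1.0 + 0.05 * max(cal - 1, 0)      # CAL 1 neutral, CAL 4 adds 15%
    return round(min(cvss_base * safety * cal_mod, 10.0), 1)

# A CAN injection needing physical access but violating a safety goal:
assert automotive_priority(6.8, "loss-of-control", 4) == 10.0
# The same base score with no safety coupling stays where CVSS put it:
assert automotive_priority(6.8, "none", 1) == 6.8
```

Whatever multipliers a programme chooses, they should be fixed in the engagement plan before testing starts, so two testers scoring the same finding arrive at the same number.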

Each finding writeup should answer five questions: what was tested, what was found, how it was reproduced, what the safety and privacy impact is, and how the team verified the fix. The fifth question is the one that distinguishes a defensible report from a one-shot deliverable.

Mapping pentest results to ISO/SAE 21434

The pentest report is Clause 11 (verification) evidence. The individual findings are Clause 8 (continual cybersecurity activities, vulnerability management) inputs. The remediation tracker is the bridge: each finding moves through the vulnerability management workflow, accumulates a fix release, and closes only after re-test verification. Programmes that treat pentest output as a static deliverable accumulate aged findings; programmes that treat it as a feed into the vulnerability management pipeline close them. See our companion guide on the 42 ISO/SAE 21434 Work Products and the ISO/SAE 21434 implementation guide for the full work-product map.
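The remediation tracker's core is a small state machine: a finding only closes after a passed re-test. A minimal sketch with illustrative state names:

```python
# Finding lifecycle sketch — state and event names are illustrative, chosen
# to mirror the vulnerability-management flow described above.
TRANSITIONS = {
    "open":          {"fix-released"},
    "fix-released":  {"retest-passed", "retest-failed"},
    "retest-failed": {"fix-released"},            # fix again, re-test again
    "retest-passed": {"closed"},
    "closed":        set(),                        # terminal
}

def advance(state: str, event: str) -> str:
    """Move a finding through the workflow; illegal shortcuts raise."""
    if event not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {event}")
    return event

s = "open"
for e in ("fix-released", "retest-failed", "fix-released",
          "retest-passed", "closed"):
    s = advance(s, e)
assert s == "closed"
```

Note that there is deliberately no edge from "open" or "fix-released" straight to "closed": the structure itself encodes the rule that a report-only engagement cannot retire a finding.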

Public case studies the methodology must reproduce

Any methodology worth using must be able to find the classes of defect that the public record contains. The reference list our engagements rehearse against:

  • 2015 Jeep Cherokee (Miller and Valasek): remote-to-CAN pivot via the head unit; the canonical illustration of post-exploitation lateral movement on an under-segmented vehicle network.
  • 2016–2017 Tesla Model S/X (Tencent Keen Security Lab): chained browser, kernel, and gateway vulnerabilities to reach the CAN bus from the head unit; worked example of chained ECU-to-gateway-to-CAN exploitation.
  • 2019–2020 Tesla Model 3 keyless relay (NCC Group): BLE relay against passive entry; the reference case for PEPS-relay testing.
  • 2020 Volvo XC60 (Pen Test Partners): CAN injection through a permitted path; the reference case for gateway parser and routing-rule testing.
  • 2022 Hyundai/Kia (NHTSA recall 23V-021): immobiliser absence on certain trims, weaponised socially via the “TikTok challenge”; reminder that physical-key threat models are not legacy.
  • 2022 Honda rolling code: rolling-code synchronisation weakness allowing replay; reference case for rolling-code analysis.
  • 2015 BMW ConnectedDrive and 2015 FCA: backend and cellular-channel disclosures that re-frame cloud as a vehicle attack surface.
  • 2020–2022 Daimler key-fob disclosures: a running reminder that wireless entry remains an attractive target.
  • Tesla Autopilot adversarial-example research: worked examples of perception-stack manipulation, relevant to ADAS pentest scopes.

An engagement that cannot reproduce the relevant subset of these findings on its own equipment is not yet ready to certify evidence to Type Approval auditors.

Take-aways

Automotive Penetration Testing earns its place in the cybersecurity lifecycle when the methodology is explicit, the scope is anchored in a current TARA, and the output feeds the vulnerability management process rather than ending in a PDF. For deeper context on the threat-modelling step that frames the test plan, see STRIDE Threat Modeling for automotive ECUs; for the TARA process pentest evidence verifies, see What is TARA in automotive cybersecurity.

Agnile Technologies runs structured automotive Penetration Testing engagements for global OEMs and Tier-1 suppliers against ECUs, gateways, V2X stacks, and full vehicles. Reports map findings to ISO/SAE 21434 Clause 11 evidence and to the vulnerability management process under Clause 8.

Frequently Asked Questions

Is automotive Penetration Testing required by ISO/SAE 21434?

ISO/SAE 21434 Clause 11 requires verification of cybersecurity-critical items, and Penetration Testing is the most common technique used to satisfy that requirement for items rated CAL 2 and above. The standard does not mandate penetration testing by name, but the work products produced by a structured pentest — verification specifications, test results, anomaly reports — are exactly what Clause 11 expects to see.

What is the difference between an ECU pentest and a vehicle-level pentest?

Scope. An ECU pentest exercises one component on the bench across every reachable surface — physical interfaces, firmware, and the protocols the ECU implements. A vehicle-level pentest treats the assembled vehicle as the target, which adds gateways, multi-bus lateral movement, wireless attack chains, and backend interaction. Both are valid; vehicle-level work usually depends on already-cleared ECU pentests of the cybersecurity-critical components.

How long does an ECU pentest typically take?

Four to ten weeks for a single ECU is the common range, depending on the access provided. White-box engagements with firmware, schematics, and source can finish at the lower end. Black-box engagements where the team must extract firmware and reverse-engineer protocols sit at the upper end. Vehicle-level engagements with multiple ECUs and wireless surfaces routinely run twelve weeks or more.

What tools are most important for an automotive pentest?

Categories matter more than brand names. A capable lab needs CAN/CAN-FD/LIN/FlexRay analysers, JTAG/SWD debug probes for the silicon families in scope, software-defined radios for sub-GHz and 2.4 GHz protocols, logic analysers, fuzzers (CAN, UDS, HTTP, BLE), chip-off and decapsulation rigs for advanced firmware extraction, and OTA simulators for backend-side testing.

Do you need source code to run a pentest?

Helpful for white-box work, not strictly required. Most automotive engagements run grey-box: the team gets schematics, JTAG access, and partial documentation but not source. Black-box engagements take longer and find fewer logic flaws but better simulate the resources of a determined external attacker.

How are pentest findings rated?

Use CVSS v3.1 or v4.0 adapted with automotive context. The default IT-centric base score under-weights physical-access requirements and over-weights confidentiality on a vehicle network. The adapted score combines the standard CVSS metrics with safety impact, asset CIA on the vehicle network, and a CAL or ASIL relevance modifier so that the result lines up with how the OEM's product team triages defects.

Where do pentest findings go after the engagement closes?

Into the cybersecurity Vulnerability Management process under ISO/SAE 21434 Clause 8. Each finding becomes a tracked defect with traceable remediation, an owner, a target fix release, and a re-test verification step. The pentest report itself is retained as Clause 11 verification evidence; the post-fix re-test memo closes the loop and is filed alongside the original report.

Need Help Applying This to a Real Programme?

Agnile supports engineering teams from architecture and requirements through implementation, validation, release, and evidence preparation.