From Pass/Fail to Defensible Evidence: What the EU Cyber Resilience Act Means for Your Test Workflows

§1. Five Questions CRA Asks That Your Test Evidence Probably Cannot Answer

Every embedded product team with a hardware-in-the-loop (HIL) rig today can tell you whether their last test passed or failed. The green checkmark appears, the log file is timestamped, and the team moves on to the next sprint. The Cyber Resilience Act makes that workflow obsolete.

The CRA raises the evidence bar from “we have a pass/fail report” to “we can trace every run, every version, and every toolchain component.” This is not a subtle shift. It is a fundamental change in what counts as admissible evidence during a market-surveillance audit. A regulator will not ask whether your product works. They will ask: Which version of the firmware was under test? Which compiler flags built that binary? Which test bench configuration exercised those inputs? Which operator ran the sequence? And where is the audit trail that connects all of them?

Roughly 90% of products under the CRA will self-declare under Module A – no third-party certification is required. This sounds like a relief to most engineering teams. It should not be. Self-declaration means you, the manufacturer, must assemble a technical documentation package that satisfies regulatory scrutiny without a third party holding your hand through the process.

Here is the paradox that catches most teams: there is no mandated SBOM format and no mandated set of test coverage metrics in the regulation. The regulation is principles-based, not prescriptive. That gives flexibility – but flexibility in regulation means ambiguity in compliance. Without a checklist to follow, engineering teams must translate legal language into engineering practice on their own. Most are discovering that their existing evidence pipelines answer none of the five questions a regulator will ask first.

Pick one embedded test workflow and try to answer these five questions:

1. Which firmware version was tested on which run? If the answer is “the engineer knows” or “it’s in a Slack message” – that is a gap. The CRA expects version identity to survive beyond the person who ran the test.

2. Which tool versions were used in each run? LabVIEW, TestStand, Vector CANoe, Python scripts – all have versions. In our experience, most teams we work with cannot answer this question for a single run, let alone across a regression suite. If the tool versions are not logged per run, a reviewer cannot confirm the test environment was correct.

3. Can you generate an SBOM from the test evidence you already have? If the answer is no, the regulatory deadline is likely to arrive before a toolchain migration could finish. The SBOM requirement (per Article 3(39) and Annex I Part II of the CRA) applies to the software in the product, but the test system itself is part of the secure development evidence chain. An auditor will want to see that the test environment is reproducible from version records.

4. Can a reviewer reconstruct the run six months later? Evidence that only makes sense to the person who ran the test is not defensible evidence. In our experience, most teams we work with can answer zero of these questions without manual effort – and manual effort does not scale to a regulatory review.

5. Can you prove which artifacts belonged to the same test run? A pass/fail report lives in one system. Bus traffic logs live in another. Temperature readings live in a third. Firmware build artifacts live in a fourth. If nothing ties them together, they are separate facts, not a chain of evidence. The CRA expects you to show that the test was coherent – that the firmware, the environment, the stimulus, and the result all came from the same execution.
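One way to make these answers survive the person who ran the test is to write a small per-run manifest at execution time. The sketch below is a minimal illustration, not a CRA-mandated format: the class name, the field names (`run_id`, `firmware_commit`, `bench_serial`, and so on), and the example values are assumptions chosen for this note.

```python
import json
import platform
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class RunManifest:
    """Hypothetical per-run record; field names are illustrative, not mandated."""
    firmware_commit: str   # commit hash the CI pipeline tagged the binary with
    bench_serial: str      # identifies the physical test bench
    operator: str          # who started the sequence
    tool_versions: dict    # e.g. {"TestStand": "2024", "CANoe": "16"}
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def new_manifest(firmware_commit, bench_serial, operator, tool_versions):
    # Record the interpreter running this script alongside the declared tools.
    versions = dict(tool_versions)
    versions.setdefault("Python", platform.python_version())
    return RunManifest(firmware_commit, bench_serial, operator, versions)


manifest = new_manifest(
    firmware_commit="0123abcd",   # placeholder value
    bench_serial="BENCH-01",      # placeholder value
    operator="operator-id",       # placeholder value
    tool_versions={"TestStand": "2024", "CANoe": "16"},
)
print(manifest.to_json())
```

If every tool output from the run carries the same `run_id`, questions 1, 2, and 5 become lookups rather than interviews.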

If any of these questions made you pause, keep reading. The gap is not in your tools. It is in what connects them.

For source context, keep the European Commission’s Cyber Resilience Act overview open alongside this note. For SBOM format context, see ECMA-424 for CycloneDX and the SPDX project, which describes SPDX as ISO/IEC 5962:2021. These links are reference anchors, not legal advice or a compliance claim.
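Because no SBOM format is mandated, the link between an SBOM and a test run is yours to define. As a rough sketch, the snippet below builds a minimal CycloneDX-style JSON document and attaches a run identifier as a metadata property. The top-level keys follow the CycloneDX JSON layout as we understand it; the property name `example:test-run-id`, the component names, and the chosen spec version are assumptions, and the output is not validated against the official schema.

```python
import json


def build_sbom(components, run_id):
    """Return a minimal CycloneDX-style dict; not schema-validated."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",   # assumption: adjust to the version you target
        "version": 1,
        "metadata": {
            "properties": [
                # Hypothetical property name linking this SBOM to one test run.
                {"name": "example:test-run-id", "value": run_id},
            ]
        },
        "components": [
            {"type": "library", "name": name, "version": version}
            for name, version in components
        ],
    }


sbom = build_sbom(
    components=[("example-tls-lib", "1.2.3"), ("example-rtos", "4.5.6")],  # placeholders
    run_id="run-0001",
)
print(json.dumps(sbom, indent=2))
```

In a real pipeline the component list would come from the build system rather than a hand-written list; the point of the sketch is only the link between the SBOM and the run.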

§2. Safety Does Not Cover Security

A common reaction we hear is: “We already have ISO 26262 certification. Our test process is safe. Surely that covers us?”

It does not.

Functional safety standards like ISO 26262 (automotive) or IEC 61508 (industrial) focus on systematic and random hardware failures. They ask: did the system behave correctly under known fault conditions? The CRA asks a different question: can you show that the software in your product is free of known exploitable vulnerabilities, and that your test evidence can survive independent scrutiny?

Safety compliance proves the system does not fail dangerously. It does not prove that your test evidence is traceable, versioned, and defensible after the engineer who ran it has left the company.

In our experience, the labs that are most confident about their safety-certified processes are often the ones with the biggest evidence gaps. They have thick binders for hazard analysis and risk assessment. They have thin – or nonexistent – records of which tool version ran which test step on which day.

A device that passes functional-safety testing may still fail CRA evidence requirements because security demands traceability of every software component, not just hazard-related functions. Under ISO 26262, your traceability effort is proportional to the Automotive Safety Integrity Level (ASIL) of each function. A non-safety-critical audio codec receives minimal traceability investment. Under the CRA, that same codec is an attack surface. You must trace its version, its provenance, and its test coverage.
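The difference in scope can be made concrete with a small check. The sketch below assumes a hypothetical component inventory in which each entry carries an ASIL label, a version, and a provenance field; it flags every component that lacks a version or provenance, regardless of its ASIL. The inventory structure and the component names are illustrative assumptions.

```python
def untraced_components(inventory):
    """Return names of components missing a version or provenance.

    The ASIL label is deliberately ignored: under the reading in this
    note, a non-safety component is still part of the attack surface.
    """
    required = ("version", "provenance")
    return [
        item["name"]
        for item in inventory
        if any(not item.get(key) for key in required)
    ]


inventory = [
    {"name": "brake-controller", "asil": "D", "version": "2.1.0",
     "provenance": "internal repo"},
    {"name": "audio-codec", "asil": "QM", "version": None,
     "provenance": None},
]
print(untraced_components(inventory))  # prints ['audio-codec']
```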

The tooling ecosystem has not caught up. At the time of writing, we found no publicly available CRA guidance pages from vendors such as NI, dSPACE, and LDRA, so testing teams cannot rely on tool vendors to solve this for them. These are excellent tools for what they were designed to do. But none of them was designed to produce a CRA technical documentation package.

Engineering teams must therefore treat CRA evidence as a separate architectural concern – not an extension of their safety process, not a feature request to their tool vendor, but a first-class engineering requirement with its own data model, its own pipeline, and its own verification criteria.

§3. The Gap Is Not in Your Tools – It’s Between Them

If the problem is not your safety process and not any single tool, where is it? The answer is uncomfortable for teams that have invested heavily in test infrastructure: the gap is in the seams.

No single tool in the chain is broken, but the links that connect test execution, version control, and artifact storage rarely provide the traceability the CRA expects.

In our experience, most labs already have the right tools. They do not have the evidence chain that connects setup, versions, run data, and review across them.

Consider a concrete example. Imagine a HIL regression for an automotive ECU. The test sequence runs in TestStand 2024, orchestrating a series of stimulus-response steps. Vector CANoe 16 captures CAN bus traffic throughout the run. Separately, a Python 3.11 script logs temperature readings from a chamber controller via serial. The firmware under test was built with GCC 12.2 from a CI pipeline that tags the binary with a commit hash.

The test passes. All four tools produced output.

Now prove those four artifacts belonged to the same run.

A pass/fail log from LabVIEW or TestStand proves only that a test ran – not which version of the firmware, which compiler flags, or which test bench configuration was in use. The log file might contain a timestamp. It rarely contains a firmware build ID, a test bench serial number, a compiler version string, or an operator identifier.

The TestStand log says the sequence completed at 14:32:11. But does it record which version of the TestStand sequence file ran?
The CANoe trace file covers 14:30:00 to 14:35:00. But does its metadata include the DBC version, the CANoe configuration version, or a run ID that links back to TestStand?
The Python temperature log records “14:30–14:35, 23.4°C mean.” But does it capture which script commit was used, or which serial port configuration was active?
The GCC 12.2 binary carries a commit hash. But does the test evidence link that hash to the pass/fail result?

In practice, the answer to most of these questions is no. The four artifacts sit in four independent silos. When an auditor or a reviewer asks “was this run valid?”, someone must manually assemble the chain – assuming anyone still remembers where the pieces are.

This is the handoff problem. Every tool does its job. Nobody owns the boundary between them.
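One way to own that boundary is an evidence index written at the end of each run: a single record that lists every artifact under a shared run ID together with a content hash, so that later changes or missing files are detectable. The sketch below is a minimal illustration using only the Python standard library; the function names, the index layout, and the file roles are assumptions, and the demo files are throwaway placeholders rather than real tool output.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large trace files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(run_id: str, firmware_commit: str, artifacts: dict) -> dict:
    """Tie artifact files to one run by recording their hashes under a shared run ID."""
    return {
        "run_id": run_id,
        "firmware_commit": firmware_commit,
        "artifacts": {
            role: {"file": str(path), "sha256": sha256_of(path)}
            for role, path in artifacts.items()
        },
    }


def verify_index(index: dict) -> list:
    """Return the roles whose files are missing or no longer match the recorded hash."""
    changed = []
    for role, entry in index["artifacts"].items():
        path = Path(entry["file"])
        if not path.exists() or sha256_of(path) != entry["sha256"]:
            changed.append(role)
    return changed


# Demonstration with throwaway files standing in for real tool outputs.
with tempfile.TemporaryDirectory() as tmp:
    files = {}
    for role in ("sequence_report", "can_trace", "temperature_log"):
        path = Path(tmp) / f"{role}.txt"
        path.write_text(f"placeholder content for {role}\n")
        files[role] = path

    index = build_index("run-0001", "0123abcd", files)
    print(json.dumps(index, indent=2))
    print("changed:", verify_index(index))   # nothing modified yet

    files["can_trace"].write_text("edited after the run\n")
    print("changed:", verify_index(index))   # can_trace no longer matches
```

The index does not replace any of the four tools. It only records, in one place, which of their outputs belonged together and whether they are still the files that were produced during the run.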

§4. The Public Timeline Is Shorter Than It Sounds

The most dangerous sentence an engineering leader can hear in 2026 is: “We have until 2027, let’s focus on shipping.”

The public CRA timeline matters: reporting obligations apply from 11 September 2026, and the main obligations apply from 11 December 2027. By the time you read this, the reporting date may already be close, and the full obligations are near enough that evidence architecture decisions need to start now.

The gap between reporting obligations and full obligations creates a dangerous planning trap: teams delay architecture changes, ignoring that the vulnerability reporting pipeline needs evidence about versions, provenance, and tested artifacts before the broader compliance package is due. You cannot report a vulnerability you cannot trace. You cannot trace a vulnerability you cannot identify. And you cannot identify a vulnerability in a component whose version and provenance you do not know.
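That dependency chain can be sketched as a lookup. The example assumes each test run record already lists the components and versions in the firmware it exercised; the record layout, the run IDs, and the component names are hypothetical. Given an advisory for one component, it separates runs that are known to be affected from runs where nothing was recorded for that component.

```python
def triage(runs, component, bad_versions):
    """Split runs into (affected, unknown) for one advisory.

    'unknown' means the run record has no entry for the component. Without
    a complete inventory, that cannot be told apart from 'not recorded'.
    """
    affected, unknown = [], []
    for run in runs:
        version = run["components"].get(component)
        if version is None:
            unknown.append(run["run_id"])
        elif version in bad_versions:
            affected.append(run["run_id"])
    return affected, unknown


runs = [
    {"run_id": "run-0001", "firmware": "fw-1.0",
     "components": {"example-tls-lib": "1.2.3"}},
    {"run_id": "run-0002", "firmware": "fw-1.1",
     "components": {"example-tls-lib": "1.2.4"}},
    {"run_id": "run-0003", "firmware": "fw-1.2",
     "components": {}},   # versions were never recorded
]

affected, unknown = triage(runs, "example-tls-lib", {"1.2.3"})
print("affected:", affected)
print("unknown:", unknown)
```

The `unknown` list is the uncomfortable part: every run in it is a run for which a reporting decision cannot be made from the evidence alone.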

Penalties for non-compliance can reach €15 million or 2.5% of annual worldwide turnover – whichever is higher. For a mid-market industrial automation company with €500M revenue, 2.5% of turnover is €12.5M, so the €15 million figure is the higher of the two and sets the ceiling. That is the maximum exposure for a product line that has not yet implemented basic software-component traceability.
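The “whichever is higher” rule is a one-line calculation. A minimal sketch, using only the two figures quoted above; the function name is ours, and the result is an upper bound, not a prediction of any actual fine.

```python
def max_fine_eur(annual_turnover_eur: int) -> int:
    """Upper bound as quoted above: the higher of EUR 15 million
    or 2.5% of annual worldwide turnover (integer arithmetic)."""
    percentage_based = annual_turnover_eur * 25 // 1000
    return max(15_000_000, percentage_based)


for turnover in (500_000_000, 600_000_000, 1_000_000_000):
    print(f"turnover {turnover:>13,} -> ceiling {max_fine_eur(turnover):>11,}")
```

For a turnover of €500M the percentage-based figure is €12.5M, so the €15 million figure is the ceiling. The percentage only becomes the higher figure above €600M of turnover.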

The market is already signalling that existing workflows are insufficient. Organisations such as TÜV SÜD and Vector now offer CRA gap analysis and consulting services – an indicator that existing workflows are unlikely to pass without targeted investment. When certification bodies and tool vendors are building practices around a gap they see in the field, the gap is real.

Consider what needs to happen in the available time:

1. Map at least one representative workflow to understand where evidence currently breaks.
2. Identify the handoffs that lose version, identity, or context.
3. Decide which gaps to fix with process changes and which need tool support.
4. Implement the changes across the workflow.
5. Test that the new evidence chain actually survives a simulated review.

Most test engineering teams are already running at capacity. Release cycles do not pause for compliance upgrades. The available window is not a leisurely timeline – it is the minimum lead time to retrofit evidence traceability into workflows that were never designed for it.

Our diagnostic kit helps you map one workflow in 30–45 minutes. It is free. It does not ask you to share confidential data. It gives you a gap map, not a shopping list.

§5. Map One Workflow – Before You Buy Anything

A common instinct when facing regulatory pressure is to go shopping: a new ALM system, a new test management platform, a new requirements tool. These purchases are expensive, disruptive, and often premature.

Before buying any tool, map one end-to-end workflow and ask: can I prove which version of every component was in that test? The reason is that you do not yet know where your evidence actually breaks. The gap may be in a handoff between two tools you already own. It may be a missing version tag in a CI pipeline. It may be a review process that accepts a Slack screenshot as proof of a test run.

Here is a five-step process:

1. Pick one test workflow that your team runs regularly – ideally one that creates recurring confusion during release reviews.
2. List every tool that touches it – sequence engine, bus monitor, data logger, report generator, version control system.
3. For each tool, note what version information it captures and what it passes to the next step.
4. Identify the handoffs where context is lost – the manual setup step, the unlabelled log file, the script commit that nobody recorded.
5. Decide which gaps are worth fixing and in what order.
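The third and fourth steps can be sketched as a small data exercise. The example below assumes you have written down, for each tool, the set of context fields it captures and the set it passes on; the tool names, field names, and data layout are illustrative assumptions, not the diagnostic kit’s templates. It lists what each handoff loses.

```python
def lost_at_handoffs(steps):
    """For each consecutive pair of steps, list context the upstream step
    captured but did not pass on to the next one."""
    gaps = []
    for upstream, downstream in zip(steps, steps[1:]):
        lost = sorted(upstream["captures"] - upstream["passes_on"])
        if lost:
            gaps.append((upstream["tool"], downstream["tool"], lost))
    return gaps


workflow = [
    {"tool": "CI build",
     "captures": {"commit_hash", "compiler_version"},
     "passes_on": {"commit_hash"}},
    {"tool": "sequence engine",
     "captures": {"timestamp", "result"},
     "passes_on": {"timestamp", "result"}},
    {"tool": "report generator",
     "captures": {"result"},
     "passes_on": {"result"}},
]

for upstream, downstream, lost in lost_at_handoffs(workflow):
    print(f"{upstream} -> {downstream}: loses {', '.join(lost)}")
```

Even on paper, filling in the two sets per tool is usually enough to show where the chain breaks. The code only makes the comparison mechanical.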

The mapping exercise reveals something important: products that use self-declaration (Module A) still need a technical documentation package that traces every OSS component, every compiler flag, and every test result back to a specific firmware version. There is no “light” version of CRA compliance for self-declared products. The evidence bar is the same – the difference is only whether a third party audits it before market placement or after.

A common source of confusion is the open-source exemption. Non-commercial open-source software distributed outside the course of a commercial activity is exempt – but if your product embeds open-source, the exemption does not apply to you as the product manufacturer. If you ship an embedded Linux device with a GPL-licensed kernel, a dozen MIT-licensed libraries, and a BSD-licensed networking stack, every one of those components must be traced, versioned, and included in your vulnerability-reporting pipeline. The exemption protects the open-source developer. It does not protect the commercial product that incorporates their work.

Similarly, domain-specific exemptions are narrower than most teams assume. The CRA explicitly excludes medical devices (MDR), aviation, and automotive type-approved products – but if your test system spans multiple regulatory domains, the CRA still covers the non-exempt portion. A HIL rig that tests both an ISO 26262 safety function and a non-safety connectivity stack falls under the CRA for the connectivity stack.

In our experience, the output of this mapping exercise is almost always the same: the evidence chain breaks at handoffs, not inside tools. The fix is rarely a tool replacement.

Our diagnostic kit gives you templates for exactly this exercise: a workflow map, a toolchain handoff map, and an evidence gap scorecard. It takes 30–45 minutes. No vendor involvement. No data shared. Just a map of where your evidence stands today.

If after mapping one workflow you still see a clear need for a new tool, you will know exactly what to buy and why. That decision is far more defensible – to your manager, your procurement team, and eventually to an auditor – than a purchase made from fear of a deadline.

The Bottom Line

The CRA raises the bar from “we passed the test” to “we can prove the test was valid, reproducible, and complete.” The gap is not in your tools – it is in the connections between them. The deadline is closer than it feels. Start with one workflow, map it, and decide from evidence, not vendor pressure.

If you want structured help, the diagnostic kit is free and takes under an hour. If you want a deeper dive, our workflow review is a paid, 30-minute fit assessment. Either way, the first step is the same: look at what your evidence actually looks like today, not what you assume it looks like.

CRA Evidence Self-Check

#	Question	✅ Yes	🔶 Partial	❌ No	🚩 What’s Missing
1	Which firmware version was tested on which run?	Version is auto-logged per test run	Version is in a commit message, not per run	Version is in Slack / engineer’s head	Per-run firmware version + audit trail
2	Which tool versions were used in each run?	Every tool logs its version to the evidence pack	Versions recorded in a setup document, not per run	Versions change without tracking	Tool version snapshots per run
3	Can you generate an SBOM from existing test evidence?	SBOM is produced automatically from the test run	SBOM exists separately (e.g. from CI build) not linked to test data	No SBOM at all	Automated SBOM generation tied to test execution
4	Can a reviewer reconstruct the run six months later?	Evidence pack contains setup, versions, run data, pass/fail, and sign-off	Review possible with help from original engineer	Only the original engineer can explain it	Self-documenting evidence chain
5	Is vulnerability tracking linked to tested versions?	Vulnerability scan results are timestamped and linked to firmware version	VEX/CVE data exists but not correlated with test runs	No vulnerability tracking connected to test evidence	Correlation between vulnerability state and test evidence

CRA Phased Enforcement Timeline

2022-09-15 – CRA proposal published by the European Commission.
2024-12-10 – CRA enters into force. Manufacturers begin gap analysis.
2026-09-11 – Reporting obligations apply.
2027-12-11 – Main obligations apply.

Workflow Evidence Chain – HIL Test with Handoff Gaps

Figure: A workflow-level view of one HIL evidence chain, showing build context, station setup, execution, review, and four CRA evidence handoff gaps. The diagnostic target is the handoff context, not a replacement for TestStand, CANoe, Python scripts, or any existing tool.

Marcin, June 14, 2026

