
Training Effectiveness Agent

Measure L&D impact - beyond satisfaction scores.

Measures training impact across multiple levels: satisfaction, knowledge transfer, behavioural change, and business outcomes.


Evaluation level set by rules, feedback extraction via AI, effectiveness decisions with humans in the loop

The agent sets the evaluation level per training type via rules (Kirkpatrick levels 1 to 4), extracts feedback and knowledge-transfer data from surveys and assessments via AI, and correlates behavioural change with performance data - strategic learning decisions remain human-in-the-loop.

Outcome: According to ATD, 75 percent of companies evaluate training only at level 1 (satisfaction), even though 10 to 20 percent of the training budget regularly flows into demonstrably ineffective formats.

0% Rules Engine
100% AI Agent
0% Human

The architecture turns training budget into an auditable investment with measurable learning return:

1,347 euros per head, only 10 to 20 percent reach the workplace

According to the IW Further Education Survey 2023, German companies invested an average of 1,347 euros per employee per year in further education (46 billion euros across the economy as a whole). At the same time, only 10 to 20 percent of what is learned actually reaches the workplace. Between these two numbers sits not a quality problem but a measurement problem. Anyone who does not know what happens after the training cannot steer whether the investment pays off.

The evaluation gap is well documented: 95 percent of L&D organisations cannot link their learning data to business outcomes. ROI is measured in only 5 to 8 percent of all training programmes. Between those two extremes sit the levels that matter - knowledge transfer, behaviour change, business impact - and that is exactly where almost no one measures.

Why satisfaction says nothing about effectiveness

After every training, participants fill in a feedback form. Was the instructor good? Was the catering acceptable? Would you recommend the training? These so-called smiley sheets are nearly universal and nearly universally worthless for the question L&D actually has to answer: did this training change behaviour at work?

The Kirkpatrick model has described four evaluation levels since 1959: reaction, learning, behaviour, results. These levels are not theory; they are the industry standard. And yet three of the four remain unmeasured in most organisations.

Level         What is measured                Who typically measures it
────────────  ─────────────────────────────   ────────────────────────────
Reaction      Feedback forms after training    Almost all organisations
Learning      Tests, practical assignments     Few, mostly compliance
Behaviour     Behaviour change at work         Almost no one systematically
Results       Correlation with business KPIs   5-8 percent of programmes

The reason is not lack of interest. The reason is structural. Every level requires a different method, a different timing, and a different owner. Reaction can be captured on the day of the training. Knowledge transfer needs a defined interval - two to four weeks - and a test format. Behaviour change only shows after 60 to 90 days and needs the manager’s observation. Business impact requires linking training data with performance metrics that live in an entirely different system.

No L&D team with three learning specialists for 1,500 employees can manage this manually. So the smiley sheet remains.

The transfer problem is a timing problem

Hermann Ebbinghaus documented in the 1880s what every learning professional knows instinctively: within 24 hours, humans forget on average 70 percent of new information. After a week, 90 percent. The forgetting curve is not a weakness of individual learners. It is a neurological constant.
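As a worked aside (an idealisation, not the agent's model): the textbook exponential form of the forgetting curve is R(t) = exp(-t/S). Fitting the stability parameter S to the 24-hour figure above shows how steep the decay is - and why a single exponential cannot also match the one-week figure, which is one reason real forgetting curves flatten out.

import math

# Fit memory stability S so that 30 percent is retained after 1 day.
S = -1 / math.log(0.30)   # ≈ 0.83 days

def retained(days: float) -> float:
    """Share of new material still retained after `days` (idealised)."""
    return math.exp(-days / S)

print(f"day 1: {retained(1):.0%} retained")   # 30% by construction
print(f"day 7: {retained(7):.2%} retained")   # ~0.02%, far below the cited 10%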

The consequence for training effectiveness is radical. When the first knowledge check happens three months after the training - because no one thought of it earlier or had the capacity - it is not measuring knowledge transfer. It is measuring the forgetting curve. And when the behaviour observation never happens because the manager has no structured feedback format, the decisive question stays open: did the employee transfer what they learned into daily work or not?

The transfer problem is not an educational problem. It is an architectural problem. Who measures what, when, and how - these responsibilities are not defined in most organisations. That is why it does not happen.

Staged measurement as architecture

The Decision Layer solves this not through a better tool but through a different decision architecture. Each evaluation level is treated as an independent process step - with a defined time, a defined method, and a defined owner.

Day 0              Week 2-4           Month 2-3          Quarter+
Training ends      Knowledge check    Behaviour FB       Correlation
                                                         with KPIs
┌──────────┐     ┌──────────┐      ┌──────────┐      ┌──────────┐
│ Feedback │     │ Test or  │      │ Manager  │      │ Training │
│ collected│────▶│ practical│─────▶│ rates    │─────▶│ vs perf- │
│ automa-  │     │ assign-  │      │ behaviour│      │ ormance  │
│ tically  │     │ ment     │      │ change   │      │ compared │
└──────────┘     └──────────┘      └──────────┘      └──────────┘
     A                A                  H                 A

A = Agent / automation     H = Human decides
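Expressed as configuration, that architecture is compact. A minimal sketch in Python - stage names, intervals, and owners mirror the diagram but are illustrative assumptions, not the product's actual rule table:

from dataclasses import dataclass

@dataclass
class EvaluationStage:
    level: str        # Kirkpatrick level
    method: str       # how this level is measured
    days_after: int   # days after the training ends
    owner: str        # "agent" or "human"

SCHEDULE = [
    EvaluationStage("reaction",  "feedback survey",       0,   "agent"),
    EvaluationStage("learning",  "test or assignment",    21,  "agent"),
    EvaluationStage("behaviour", "manager questionnaire", 75,  "human"),
    EvaluationStage("results",   "KPI correlation",       120, "agent"),
]

def due_stages(days_since_training: int) -> list[EvaluationStage]:
    """Return every stage whose measurement window has been reached."""
    return [s for s in SCHEDULE if days_since_training >= s.days_after]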

Reaction capture on the day of the training runs fully automatically. The agent aggregates feedback forms, identifies outliers, and produces a report. No human has to read free-text answers to see that an instructor is systematically rated poorly.
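What "identifies outliers" can mean in practice - a hedged sketch, not the agent's actual method: compare each instructor's mean rating against the distribution across all instructors and flag those far below it.

from statistics import mean, stdev

def flag_low_rated_instructors(ratings: dict[str, list[float]],
                               z_cut: float = -1.5) -> list[str]:
    """Return instructors whose mean rating sits well below the rest."""
    means = {name: mean(r) for name, r in ratings.items()}
    mu, sigma = mean(means.values()), stdev(means.values())
    if sigma == 0:
        return []
    return [n for n, m in means.items() if (m - mu) / sigma < z_cut]

ratings = {
    "Instructor A": [4.5, 4.8, 4.2, 4.6],
    "Instructor B": [4.1, 4.4, 4.3, 4.0],
    "Instructor C": [2.1, 2.6, 2.4, 2.2],   # systematically low
    "Instructor D": [4.4, 4.5, 4.2, 4.5],
    "Instructor E": [4.3, 4.2, 4.4, 4.3],
}
print(flag_low_rated_instructors(ratings))   # ['Instructor C']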

The knowledge check two to four weeks later follows a rule set: training type determines the test format. A compliance training gets a multiple-choice test, a leadership development programme a reflection assignment, a technical training a practical exercise. The agent triggers the right check at the right time and analyses the results.
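The rule set itself is deliberately unglamorous - a mapping, sketched here with the three examples from the paragraph above (the surrounding code is assumed):

CHECK_FORMAT = {
    "compliance": "multiple-choice test",
    "leadership": "reflection assignment",
    "technical":  "practical exercise",
}

def knowledge_check_for(training_type: str) -> str:
    """Deterministic: training type decides the test format."""
    if training_type not in CHECK_FORMAT:
        # Unknown types escalate to a human instead of guessing a format.
        raise ValueError(f"no check format defined for {training_type!r}")
    return CHECK_FORMAT[training_type]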

Behaviour observation after 60 to 90 days stays with humans. This is where the manager comes in - not with an elaborate assessment, but with three to five targeted questions: is the employee applying the new methods? In which situations? What has changed? The agent asks the questions at the right time, collects the answers, and attributes them to the training programme.

ROI calculation at the programme level is correlation analysis, not a causation claim. The agent links training participation with performance metrics - error rates, customer satisfaction, productivity trends - and shows relationships. Whether a relationship is causal is for a human with contextual knowledge to judge. If the error rate in the warehouse drops after a safety training, that can come from the training - or from the new shelving system introduced at the same time.
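A minimal sketch of that correlation step, with invented numbers: the standard library's Pearson correlation links participation to a performance delta, and the interpretation stays with a human.

from statistics import correlation  # Python 3.10+

# Per team: 1 = attended the safety training, 0 = did not
participation = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
# Per team: change in warehouse error rate over the following quarter
error_delta   = [-0.8, -0.5, -0.1, -0.9, 0.2, -0.2, -0.6, 0.1, -0.4, -0.7]

r = correlation(participation, error_delta)
print(f"correlation: {r:.2f}")   # strongly negative here - but is it causal?
# The human question remains: the training, or the new shelving system?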

What this changes for budget conversations

When a Head of L&D stands in front of the executive team and has to defend next year’s budget, the data foundation determines the negotiating position. With smiley sheets, the conversation goes like this: participants were satisfied. We believe the trainings were useful. Please do not cut us.

With staged evaluation, it looks different: the leadership programme shows measurable behaviour change in 72 percent of participants after 90 days. The correlation between participation and customer satisfaction in sales is 0.4. The compliance training has 85 percent knowledge retention after four weeks. The SAP fundamentals training shows no measurable effect on processing times - we need a different format here.

This is no longer budget defence. This is an investment decision based on evidence.

The difference is not in the precision of the numbers. It is in the ability to distinguish between trainings that work and trainings that merely keep people busy. Organisations that reach this maturity shift budget by effectiveness, not by broad distribution. And they can be transparent with worker representatives about how evaluation data is used - in aggregate, at the programme level, without inferences about individual participants. (US: similar evidence demands are emerging under SEC human-capital disclosure rules, where L&D investment is increasingly expected to show workforce outcomes.)

Micro-Decision Table

Who decides in this agent?

5 decision steps, split by decider

0% (0/5) Rules Engine - deterministic
100% (5/5) AI Agent - model-based with confidence
0% (0/5) Human - explicitly assigned

Each row below is a decision, with its decision record and whether it can be challenged.
Collect reaction data - distribute and aggregate post-training satisfaction surveys (AI Agent)

Automated survey distribution and response collection

Decision Record

Model version and confidence score
Input data and classification result
Decision rationale (explainability)
Audit trail with full traceability

Challengeable: Yes - fully documented, reviewable by humans, objection via formal process.

Collect learning data - aggregate assessment results and certification outcomes (AI Agent)

Automated data collection from LMS and assessment systems

Decision Record

Model version and confidence score
Input data and classification result
Decision rationale (explainability)
Audit trail with full traceability

Challengeable: Yes - fully documented, reviewable by humans, objection via formal process.

Collect behaviour data - gather follow-up observations and manager feedback (AI Agent)

Automated survey and feedback collection at defined intervals

Decision Record

Model version and confidence score
Input data and classification result
Decision rationale (explainability)
Audit trail with full traceability

Challengeable: Yes - fully documented, reviewable by humans, objection via formal process.

Correlate with performance metrics - analyse relationship between training completion and outcomes (AI Agent)

Statistical correlation analysis controlling for confounding factors

Decision Record

Model version and confidence score
Input data and classification result
Decision rationale (explainability)
Audit trail with full traceability

Challengeable: Yes - fully documented, reviewable by humans, objection via formal process.

Generate effectiveness report - produce multi-level evaluation per programme (AI Agent)

Automated report generation with statistical summaries

Decision Record

Model version and confidence score
Input data and classification result
Decision rationale (explainability)
Audit trail with full traceability

Challengeable: Yes - fully documented, reviewable by humans, objection via formal process.

Decision Record and Right to Challenge

Every decision this agent makes or prepares is documented in a complete decision record. Affected employees can review, understand, and challenge every individual decision.

Which rule in which version was applied?
What data was the decision based on?
Who (human, rules engine, or AI) decided - and why?
How can the affected person file an objection?
How the Decision Layer enforces this architecturally →

Does this agent fit your process?

We analyse your specific HR process and show how this agent fits into your system landscape. 30 minutes, no preparation needed.

Analyse your process

Governance Notes

EU AI Act: Not High Risk
Not classified as high-risk under the EU AI Act - the agent evaluates programmes, not individuals. GDPR applies to individual-level training and performance data used in the analysis. Aggregation should be applied when programme-level rather than individual-level insight is the goal. Works council information rights may apply to the collection of behaviour change and performance data linked to training attendance.

Assessment

Agent Readiness 54-61%
Governance Complexity 31-38%
Economic Impact 44-51%
Lighthouse Effect 48-55%
Implementation Complexity 38-45%
Transaction Volume Quarterly

Prerequisites

  • Learning management system with completion and assessment data
  • Post-training survey infrastructure
  • Follow-up observation or feedback collection capability
  • Performance metrics accessible for correlation analysis
  • Multi-level evaluation framework definition
  • Statistical analysis capability for correlation and significance testing

Infrastructure Contribution

The Training Effectiveness Agent closes the L&D investment loop: needs analysis identifies gaps, learning path recommendations guide individual development, and effectiveness measurement validates that the investment produced results. This creates the evidence base for L&D budget decisions. Builds Decision Logging and Audit Trail used by the Decision Layer for traceability and challengeability of every decision.

What this assessment contains: 9 slides for your leadership team

Personalised with your numbers. Generated in 2 minutes directly in your browser. No upload, no login.

  1. Title slide - Process name, decision points, automation potential
  2. Executive summary - FTE freed, cost per transaction before/after, break-even date, cost of waiting
  3. Current state - Transaction volume, error costs, growth scenario with FTE comparison
  4. Solution architecture - Human - rules engine - AI agent with specific decision points
  5. Governance - EU AI Act, works council, audit trail - with traffic light status
  6. Risk analysis - 5 risks with likelihood, impact and mitigation
  7. Roadmap - 3-phase plan with concrete calendar dates and Go/No-Go
  8. Business case - 3-scenario comparison (do nothing/hire/automate) plus 3×3 sensitivity matrix
  9. Discussion proposal - Concrete next steps with timeline and responsibilities

Includes: 3-scenario comparison

Do nothing vs. new hire vs. automation - with your salary level, your error rate and your growth plan. The one slide your CFO wants to see first.

Calculation methodology

Hourly rate: Annual salary (your input) × 1.3 employer burden ÷ 1,720 annual work hours

Savings: Transactions × 12 × automation rate × minutes/transaction × hourly rate × economic factor

Quality ROI: Error reduction × transactions × 12 × EUR 260/error (APQC Open Standards Benchmarking)

FTE: Saved hours ÷ 1,720 annual work hours

Break-Even: Benchmark investment ÷ monthly combined savings (efficiency + quality)

New hire: Annual salary × 1.3 + EUR 12,000 recruiting per FTE
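Transcribed into code, the methodology reads as follows - a sketch with one assumption made explicit: minutes per transaction are converted to hours before applying the hourly rate.

EUR_PER_ERROR = 260.0       # APQC Open Standards Benchmarking
ANNUAL_WORK_HOURS = 1720
EMPLOYER_BURDEN = 1.3

def hourly_rate(annual_salary: float) -> float:
    return annual_salary * EMPLOYER_BURDEN / ANNUAL_WORK_HOURS

def annual_efficiency_savings(tx_per_month: float, automation_rate: float,
                              minutes_per_tx: float, rate: float,
                              economic_factor: float = 1.0) -> float:
    # minutes/transaction converted to hours (assumption, see above)
    return (tx_per_month * 12 * automation_rate
            * (minutes_per_tx / 60) * rate * economic_factor)

def annual_quality_roi(error_reduction: float, tx_per_month: float) -> float:
    return error_reduction * tx_per_month * 12 * EUR_PER_ERROR

def fte_freed(saved_hours_per_year: float) -> float:
    return saved_hours_per_year / ANNUAL_WORK_HOURS

def break_even_months(investment: float, annual_efficiency: float,
                      annual_quality: float) -> float:
    return investment / ((annual_efficiency + annual_quality) / 12)

def new_hire_cost(annual_salary: float) -> float:
    return annual_salary * EMPLOYER_BURDEN + 12_000   # recruiting per FTE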

All data stays in your browser. Nothing is transmitted to any server.

Training Effectiveness Agent

Initial assessment for your leadership team

A thorough initial assessment in 2 minutes - with your numbers, your risk profile and industry benchmarks. No vendor logo, no sales pitch.


All data stays in your browser. Nothing is transmitted.

Frequently Asked Questions

How does the agent measure 'behaviour change' after training?

Through a combination of follow-up surveys (asking participants and managers about application on the job), observable metric changes (where applicable), and longitudinal tracking. Behaviour measurement is imperfect - but even imperfect measurement is better than no measurement.

Can the agent prove causation between training and performance improvement?

The agent measures correlation, not causation. However, by controlling for confounding factors and comparing trained vs. untrained groups where possible, it provides the closest approximation to causal inference that is feasible in a workplace context.

What Happens Next?

1. Initial call (30 minutes)

   We analyse your process and identify the optimal starting point.

2. Discover (1 week)

   Mapping your decision logic. Rule sets documented, Decision Layer designed.

3. Build (3-4 weeks)

   Production agent in your infrastructure. Governance, audit trail, cert-ready from day 1.

4. Self-sufficient (12-18 months)

   Full access to source code, prompts and rule versions. No vendor lock-in.

Implement This Agent?

We assess your process landscape and show how this agent fits into your infrastructure.