Open-Source Next Gen Endpoint Detection & Response

Ahmed Khan

Genos v1.0 · IEEE AIIoT 2026 · Research Whitepaper

Abstract

Existing security solutions are expensive and often rely on kernel-level agents, which by design enlarge the attack surface and consume significant resources, making them impractical for constrained environments. This paper proposes and validates an open-source, kernel-driverless (Ring 3) intrusion detection and mitigation agent designed as a low-overhead alternative. We integrated a RoBERTa-base classifier fine-tuned with LoRA (Low-Rank Adaptation), which achieved 99.75% accuracy in real-world tests. The system analyzes selected data gathered from Windows Event Logs (WEL), and the classifier performs 4-class syntactic analysis to distinguish malicious from benign commands. We demonstrate that it detects and mitigates three high-priority MITRE ATT&CK techniques, T1003.002 (OS Credential Dumping), T1562 (Impair Defenses), and T1134 (Access Token Manipulation), and separates them from benign commands with near-perfect fidelity. The result is an effective, resource-efficient security monitoring approach that, for the evaluated techniques, matches the detection rigor of kernel-level solutions while keeping system overhead and deployment complexity low.

Keywords: Kernel-driverless agent, Windows Event Logs, MITRE ATT&CK, intrusion detection, RoBERTa, LoRA, Syntactic Analysis, Adversarial Training.

Contents

Introduction
Proposed NextGen EDR Framework
Methodology
Results
Discussions & Future Work
Publication Reference

1 Introduction

This paper addresses the challenge of detecting and mitigating threats effectively and efficiently without compromising either system stability or security. Existing solutions in this space are known as endpoint detection and response (EDR) solutions.

Three prominent EDR solutions, in terms of both effectiveness and market share, are CrowdStrike's "Falcon Sensor", IBM's "QRadar EDR", and SentinelOne's "Singularity Platform". These solutions are expensive and rely on agents that reside within the kernel, where they can monitor key security events such as process creation, process mutation, and raw network traffic.

The kernel has the highest level of access on a computer, so these EDR solutions place their agents there to collect the data needed for the highest possible level of detection. Because drivers operate within the kernel, a bug in a driver can cause a system-wide failure. A kernel driver also expands the host attack surface, since any flaw in it is exposed at the highest privilege level in the system. Our agent mitigates this risk by running only as an Administrator on a Windows computer.

In July 2024, a bug in one of CrowdStrike's Windows kernel drivers, a logic flaw caused by a trivial programming error, resulted in the blue screen of death on approximately 8.5 million computers worldwide.

In summary, existing kernel-mode solutions provide the necessary fidelity, but at the cost of reduced stability and an expanded TCB (Trusted Computing Base). This work addresses that gap by proposing and validating a kernel-driverless agent that achieves high detection and automated mitigation efficacy through WEL stream analysis, while keeping the attack surface tightly constrained and the performance overhead on the host computer low. The main verifiable contributions of this work are:

A Novel, Kernel-Driverless Architecture: A security monitoring agent that runs solely in user space (Ring 3) with elevated privileges, significantly reducing the attack surface.
Effective Event Log Processing: A lightweight algorithm for real-time analysis of the Windows Event Log stream, demonstrating that high-priority threats can be accurately identified without requiring deep kernel hooks.
Open-Source Feasibility: Validation of an entirely open-source software stack that meets industrial detection efficacy standards, offering a cost-effective alternative for small and medium-sized enterprises.
Accurate labelling of both malicious and benign security events: A RoBERTa classifier trained with 4 labels (Benign, T1003.002, T1134, and T1562), achieving a near-perfect success rate of 99.5% during real-world stress tests.

2 Proposed NextGen EDR Framework

To address the risks associated with an expanded Trusted Computing Base and performance overhead, we built a three-part security solution: the 'Next Generation End-point Detection Agent', a telemetry server, and a RoBERTa text classifier model. The agent is implemented as a Python service operating entirely in user space on the Windows endpoint, requiring only administrative privileges. The agent sends a command to a Flask server that has loaded the trained model; the model assigns a label and a confidence score to the command and pushes this data to the database; the telemetry server then renders that data so that each attack is presented in detail.

Windows Event Log: events 4688, 4950, 4946, 4947, 4948
→ Ring 3 Agent (Event Subscription & Detection): filter · extract command-line · lowercase normalization · send to model server for inference
→ Flask Model Server (RoBERTa Classifier, LoRA): 4-class Benign / T1003.002 / T1134 / T1562 · output: label + confidence score → MongoDB
→ React Telemetry Server (Dashboard & Mitigation): incident timeline · Cyber Kill Chain · process ancestry · remote mitigation commands

2.1 System Architecture

The agent runs as a Python process in user space (Ring 3) with elevated privileges. This design inherently avoids the risks of kernel-mode drivers. The model classifies a command as malicious or benign and assigns a confidence score. The server reads detection and mitigation data sent from the agent via MongoDB, then renders it in the telemetry dashboard.
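The data flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Genos implementation: the field names (`command`, `label`, `confidence`, `host`) and the helper names are hypothetical, and the HTTP and MongoDB calls are left out so that the example runs with the standard library alone.

```python
import json
from dataclasses import dataclass, asdict

LABELS = ("Benign", "T1003.002", "T1134", "T1562")


@dataclass
class Detection:
    """One classified command, shaped like a document the server could store."""
    host: str
    command: str
    label: str
    confidence: float  # probability in [0, 1]


def build_request(command: str) -> str:
    """Serialize the command the agent would send to the model server."""
    return json.dumps({"command": command})


def parse_response(host: str, command: str, body: str) -> Detection:
    """Turn the model server's JSON reply into a record for the database."""
    reply = json.loads(body)
    label = reply["label"]
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return Detection(host, command, label, float(reply["confidence"]))


if __name__ == "__main__":
    cmd = "reg save hklm\\sam c:\\temp\\sam.hiv"
    request_body = build_request(cmd)
    # Stand-in for the model server's reply; a real agent would receive it over HTTP.
    fake_reply = json.dumps({"label": "T1003.002", "confidence": 0.99})
    print(asdict(parse_response("win11-vm", cmd, fake_reply)))
```

Rejecting labels outside the four known classes keeps a malformed reply from reaching the dashboard as if it were a valid detection.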

2.2 Detection Methodology

On startup, the agent subscribes to the Windows Event Log, so it sees only events generated after it has initialized. It sends the command-line data from specific events to the model for inference and watches for the following event IDs:

Event ID	Description
4688	A new process was created
4950	A firewall setting has changed
4946	A firewall rule was added
4947	A firewall rule was modified
4948	A firewall rule was deleted

When an event with ID 4688 is caught, the agent checks whether the command for that event contains any indicators of a registry hive being copied. Events 4950, 4946, 4947, and 4948 are forwarded without a filter, since any firewall change may be part of an attack. The agent sends the command-line data to the model server for a label and confidence score. The label is one of the four classes: the three MITRE codes or Benign.

Attack	MITRE ATT&CK Technique ID
Credential Dumping	T1003.002
Access Token Manipulation	T1134
Impair Defenses	T1562
Benign	N/A
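The filtering rule described in this section can be sketched as follows. The event IDs come from the table above; the indicator strings and the function name are hypothetical, chosen only to illustrate what "indicators of a registry hive being copied" might look like.

```python
PROCESS_CREATION = 4688
FIREWALL_EVENTS = {4946, 4947, 4948, 4950}

# Hypothetical indicators of a registry hive being copied (an assumption,
# not the list used by the agent).
HIVE_COPY_INDICATORS = (
    "reg save",
    "reg export",
    "hklm\\sam",
    "hklm\\system",
    "hklm\\security",
)


def should_send_for_inference(event_id: int, command_line: str) -> bool:
    """Return True if the event's command line should go to the model server."""
    if event_id in FIREWALL_EVENTS:
        # Firewall events are forwarded without any filter.
        return True
    if event_id == PROCESS_CREATION:
        lowered = command_line.lower()
        return any(indicator in lowered for indicator in HIVE_COPY_INDICATORS)
    # Any other event ID is outside the subscription and is ignored.
    return False
```

A substring pre-filter like this only decides what reaches the model; the classification itself is still left to the classifier.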

2.3 RoBERTa Classifier

The classifier is the RoBERTa model developed by the University of Washington and Facebook AI, built on BERT (Bidirectional Encoder Representations from Transformers). RoBERTa builds on the performance of BERT by training longer, with bigger batches over more data; removing the next sentence prediction objective; training on longer sequences; and dynamically changing the masking pattern applied to training data. This enhanced version proved useful as the commands being parsed can be lengthy, and the model must be ready to classify long commands.

Command labelling: The model classifies commands as malicious or benign and assigns a label in the form of a MITRE ATT&CK code if malicious, or Benign if the command is not a potential threat.
Confidence score: The classifier assigns a confidence score (percentage) indicating how confident the model is that an event is malicious or benign.
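As an illustration of how a label and a confidence score can be derived from a 4-class classifier, the sketch below applies a softmax to the raw logits and reports the probability of the winning class. It assumes the confidence score is that softmax probability, which is the usual convention for sequence classifiers but is not stated explicitly here; the label order is also an assumption.

```python
import math

# Assumed label order; the real index-to-label mapping comes from the
# trained model's configuration.
LABELS = ("Benign", "T1003.002", "T1134", "T1562")


def label_and_confidence(logits):
    """Apply softmax to the logits and return (label, confidence)."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per class")
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Multiplying the returned confidence by 100 gives the percentage form described above.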

2.4 Telemetry Server

Incident Timeline: A timeline of the detection of an attack followed by the mitigation steps.
Process Ancestry (Kill Chain): After a malicious process is caught, the program recursively checks for parent processes until the root process is found. This visualizes the attacker's full chain of execution.
Event Telemetry: The raw JSON data for the attack, formatted for readability.
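The process ancestry walk can be sketched as a loop over a PID-to-parent mapping. This is a minimal sketch: the `parent_of` dictionary is a hypothetical stand-in for whatever process data the agent collects, and the walk is written iteratively rather than recursively, which yields the same chain.

```python
def process_ancestry(pid, parent_of):
    """Return the chain [pid, parent, grandparent, ...] up to the root process.

    parent_of maps each PID to its parent PID. A process whose parent is
    missing from the map (or is itself) is treated as the root.
    """
    chain = []
    seen = set()
    current = pid
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)  # guards against cycles caused by PID reuse
        parent = parent_of.get(current)
        current = parent if parent != current else None
    return chain


if __name__ == "__main__":
    # Hypothetical PIDs: a malicious process, its shell, and their ancestors.
    tree = {4321: 1200, 1200: 600, 600: 4}
    print(process_ancestry(4321, tree))  # [4321, 1200, 600, 4]
```

Reversing the returned list gives the chain in execution order, from the root process down to the flagged one.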

2.5 Mitigation Techniques

Locking the computer: Temporary measure to lock out an attacker currently logged into the computer.
Shutting down the computer: Computer is shut down remotely to prevent any further misuse.
Resetting firewall to its default state: Any modifications to the firewall are voided by resetting the firewall to its default state in Windows.
Changing the account password: Prevents future breaches should an administrator choose this option.
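One way to map the four mitigation options onto standard Windows commands is sketched below. The specific commands are an assumption, since the paper does not name the ones the agent uses; the function only builds the argument list and does not execute anything.

```python
def mitigation_command(action, username=None, new_password=None):
    """Return the argument list for a mitigation action without executing it."""
    if action == "lock":
        # Locks the workstation session in which the command runs.
        return ["rundll32.exe", "user32.dll,LockWorkStation"]
    if action == "shutdown":
        # /s shuts down, /t 0 removes the default delay.
        return ["shutdown", "/s", "/t", "0"]
    if action == "reset_firewall":
        # Restores Windows Defender Firewall to its default policy.
        return ["netsh", "advfirewall", "reset"]
    if action == "change_password":
        if not username or not new_password:
            raise ValueError("change_password needs a username and a new password")
        return ["net", "user", username, new_password]
    raise ValueError(f"unknown mitigation action: {action!r}")
```

On the endpoint, the returned list could be passed to `subprocess.run(args, check=True)` from a process running with administrative privileges.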

3 Methodology

The methodology focuses on confirming the system's core functional reliability: its ability to assign the four classes in Table II (three attack techniques and Benign) with a high degree of accuracy.

3.1 Data Generation

The data generation protocol was designed to address the inherent syntactic ambiguity of Windows command-line analysis by producing a highly balanced and structurally isolated training corpus.

Corpus Synthesis: The final dataset comprises 200,000 synthesized command-line entries, partitioned into training (80%), validation (10%), and test (10%) subsets.

Structural Isolation Ratios: The training data enforces a stringent density ratio to reinforce critical decision boundaries: 70% Malicious TTPs / 20% Benign Hard Negatives / 10% Benign Pure Noise.
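The arithmetic behind these figures can be checked with a short script. It assumes that the 70/20/10 density ratio applies to the training subset only; if it instead applies to the whole corpus, the same function can be called with 200,000.

```python
def partition(total, shares):
    """Split total into integer counts according to percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    return {name: total * pct // 100 for name, pct in shares.items()}


SPLIT = {"train": 80, "validation": 10, "test": 10}
MIX = {"malicious_ttps": 70, "benign_hard_negatives": 20, "benign_pure_noise": 10}

splits = partition(200_000, SPLIT)
# Assumption: the density ratio is applied to the training subset only.
train_mix = partition(splits["train"], MIX)

print(splits)     # {'train': 160000, 'validation': 20000, 'test': 20000}
print(train_mix)  # {'malicious_ttps': 112000, 'benign_hard_negatives': 32000, 'benign_pure_noise': 16000}
```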

Toxin Removal Policy: To specifically eliminate the Catastrophic Forgetting failure mode observed in preliminary models (where the model confuses safe and malicious registry commands), a Toxin Removal policy was implemented. All ambiguous reg save/export structures were surgically eliminated from the Hard Negative class and replaced with placeholder commands (e.g., backup-utility) or safe queries (get-registry-value), ensuring that the reg save/export syntax is exclusively associated with T1003.002 payload delivery.

Preprocessing: All raw command input was subjected to a mandatory normalization step prior to tokenization: redundant whitespace was collapsed to a single space, and all characters were converted to lowercase, matching the agent's pre-inference preparation.

3.2 Input Preprocessing

By default, every command sent to the model for inference is normalized by converting each character to lowercase. An attacker who changes the case of characters within a command therefore cannot affect the model's prediction.
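A minimal sketch of the normalization step, covering both the lowercasing described here and the whitespace collapsing described for the training corpus. The function name is hypothetical, and trimming leading and trailing whitespace is an extra assumption beyond what the text specifies.

```python
def normalize_command(command: str) -> str:
    """Collapse runs of whitespace to one space and lowercase the command."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so joining with " " collapses every run.
    return " ".join(command.split()).lower()


if __name__ == "__main__":
    print(normalize_command("  REG   SaVe HKLM\\SAM   C:\\Temp\\sam.hiv "))
    # reg save hklm\sam c:\temp\sam.hiv
```

Applying the same function at training time and in the agent keeps the tokenizer input identical in both settings.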

3.3 Training Hyperparameters

Parameter	Value	Rationale
Base Model	RoBERTa-base	Optimal balance of robustness and efficiency for syntactic analysis
PEFT Method	LoRA (Low-Rank Adaptation)	Minimizes computational overhead and memory footprint required for training
LoRA Rank	16	Determined empirically to provide sufficient capacity for TTP semantic learning
Target Modules	query, value, key, dense	Targets both the attention mechanism and the final classification layers
Batch Size (Train/Eval)	32 / 32	Optimized for GPU memory utilization while allowing effective gradient accumulation
Learning Rate	2.0×10⁻⁴	Standard optimal rate for fine-tuning dense transformer models
Precision	bf16 (BFloat16)	Reduces memory usage and increases throughput on compatible hardware
Stopping Mechanism	Early stopping callback (Patience: 5)	Prevents overfitting and ensures the model stabilizes at its best F1 score
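To make the memory argument for LoRA concrete, the sketch below counts the trainable weights that a rank-16 adapter adds to a single 768 × 768 projection matrix, the shape of the query, key, and value projections in RoBERTa-base. It covers one matrix only; the total across all targeted modules is not computed here.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA freezes the original matrix W and learns an update B @ A,
    where B is d_out x rank and A is rank x d_in.
    """
    return rank * (d_out + d_in)


HIDDEN = 768  # RoBERTa-base hidden size
RANK = 16     # LoRA rank from the table above

full = HIDDEN * HIDDEN                               # 589,824 weights in one projection
added = lora_trainable_params(HIDDEN, HIDDEN, RANK)  # 24,576 trainable weights
print(added, full, round(100 * added / full, 2))     # 24576 589824 4.17
```

For this matrix shape, the adapter trains roughly 4% as many weights as full fine-tuning of the same projection would.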

3.4 Experimental Environment

Host running the agent: Windows 11 version 25H2 running on a Virtual Machine (VMware hypervisor).
Server running the model: GPU droplet (Virtual Private Server) hosted by DigitalOcean, equipped with an NVIDIA RTX 4000 Ada GPU.
Telemetry server: React application that retrieves information from the MongoDB database and sends mitigation commands.
Database: MongoDB for storing incident data.

3.5 Model Evaluation

There are two components to assess how well the model performs: (1) the post-training evaluation, which measures precision, recall, F1-score, and support; and (2) real-world tests in the form of randomly generated commands that resemble the four labels.
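For reference, the four post-training metrics can be computed from true and predicted labels as sketched below. This is a plain-Python illustration of the standard definitions, not the evaluation code used for the paper; the function name is hypothetical.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        support = tp[label] + fn[label]  # number of true examples of this class
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support)
    return report
```

Support is a count of true examples per class, so across the four classes it sums to the size of the evaluated set.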

4 Results

4.1 Classification Performance

The model was evaluated against a held-out test set of 20,000 samples (10% of the total corpus). The system achieved an overall accuracy of 100% on post-training evaluation, with the agent successfully identifying all three malicious TTPs and distinguishing them from benign administrative activity.

Classification Label	Precision	Recall	F1-Score	Support
Benign	1.0000	1.0000	1.0000	1.0000
T1003.002	1.0000	1.0000	1.0000	1.0000
T1562	1.0000	1.0000	1.0000	1.0000
T1134	1.0000	1.0000	1.0000	1.0000

The resulting confusion matrix is perfectly diagonal, validating that the structural hardening and toxin removal policies successfully eliminated the overlap between high-privilege administrative tasks and malicious credential access payloads.

Key results: 100% post-training accuracy · 99.5% stress test success · 5 ms inference latency · 255 MB VRAM (stress test) · 1,200 adversarial cases

4.2 Adversarial Robustness and Stress Testing

To verify the model's reliability in a production-simulated environment, a rigorous adversarial stress test was performed. This involved 50 iterations of the "V7 Scenario Suite," comprising 1,200 individual test cases designed to represent critical boundary conditions, built around dynamic fuzzing to generate real-world edge cases and accurately check for possible overfitting.

A successful result is defined as a correct label with a confidence score of at least 80%.

Result	Number of Examples	Percentage (out of 1,200)
Success	1,194	99.5%
Failure	3	0.25%
Low Confidence	3	0.25%
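The success criterion and the percentages in the table can be expressed as a short sketch. It assumes that "Low Confidence" means a correct label below the 80% threshold and that "Failure" means an incorrect label regardless of confidence; the paper defines only the success case explicitly.

```python
THRESHOLD = 0.80  # minimum confidence for a result to count as a success


def outcome(expected, predicted, confidence):
    """Classify one stress-test case as Success, Low Confidence, or Failure."""
    if predicted != expected:
        return "Failure"
    return "Success" if confidence >= THRESHOLD else "Low Confidence"


def percentage(count, total=1200):
    """Share of the stress-test cases, in percent."""
    return 100 * count / total


print(percentage(1194), percentage(3))  # 99.5 0.25
```

Under these assumptions, the 1,194 successes and 3 low-confidence cases together account for 1,197 correct labels out of 1,200.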

Stress test failure cases (3 unique failures)

Scenario	Predicted	Confidence
Exporting SYSTEM Hive (Malicious)	Benign	0.98
PowerShell Wrapped Security Dump	Benign	0.51
Searching Logs (Safe Pattern)	T1562	0.65

4.3 Low Performance Overhead

The model used 255 MB of VRAM during the stress test, making it suitable for edge deployment and low-end hardware.
The inference time was roughly 5 ms per inference task, making the fine-tuned model suitable for real-time threat detection.
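A latency figure of this kind can be reproduced with a small timing harness such as the one sketched below. The `classify` callable is a hypothetical stand-in for the model's inference call. When timing a GPU model, the device would also need to be synchronized before each clock reading (for example with `torch.cuda.synchronize()` in PyTorch), which this standard-library sketch omits.

```python
import statistics
import time


def median_latency_ms(classify, commands, warmup=3):
    """Return the median wall-clock time of classify(command) in milliseconds."""
    for command in commands[:warmup]:
        classify(command)  # warm-up calls are not timed
    samples = []
    for command in commands:
        start = time.perf_counter()
        classify(command)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Reporting the median rather than the mean keeps a single slow call, such as the first request after model load, from skewing the result.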

5 Discussions & Future Work

The experimental data confirms that high-fidelity security classification is primarily a data engineering challenge. While preliminary model architectures often struggle with "Confident Failures" or unoptimized latency, the fine-tuned RoBERTa-base model achieved 100% accuracy on evaluation and a 99.5% success rate under rigorous adversarial stress.

A critical takeaway was the necessity of Structural Hardening and Toxin Removal. By surgically isolating malicious syntax from benign administrative structures in the training corpus, the model moved from keyword-based pattern matching to a deeper syntactic understanding of command-line intent. This allowed the system to resolve critical boundary cases — such as distinguishing legitimate registry queries from malicious hive dumps — with confidence scores exceeding 0.90 in the majority of production-simulated scenarios.

The performance profiling validates the system's suitability for resource-constrained environments. With a median inference latency of 5 ms and a peak memory footprint of 255 MB, the solution offers a viable alternative to commercial kernel-level agents. This approach significantly reduces the host attack surface by operating entirely in the application layer while maintaining detection rigor that matches or exceeds traditional driver-based tools.

Future development will focus on expanding the classification ontology to include broader persistence and lateral movement TTPs. Additionally, recursive training loops will be implemented to resolve the remaining 0.25% "Low Confidence" edge cases, and the integration of automated remediation orchestration directly into the host agent's response logic will be explored. These directions are addressed in Genos v1.1 and v1.2.

6 Publication Reference

IEEE AIIoT 2026 — Paper [2]

Open-Source Next Gen Endpoint Detection & Response

Ahmed Khan · IEEE World AI IoT Congress (AIIoT 2026) · Seattle, USA

PDF and BibTeX will be linked here upon publication in the IEEE digital library. The paper presents the full architecture, training methodology, evaluation results, adversarial stress test data, and dashboard implementations.
