Open-Source Next Gen Endpoint Detection & Response

Ahmed Khan

Genos v1.0  ·  IEEE AIIoT 2026  ·  Research Whitepaper

Abstract

Existing endpoint security solutions are expensive and often rely on kernel-level agents which, by design, enlarge the attack surface and consume significant resources, making them impractical for constrained environments. This paper proposes and validates an open-source, kernel-driverless (Ring 3) intrusion detection and mitigation agent designed as a low-overhead alternative. We integrated a RoBERTa-base classifier fine-tuned using LoRA (Low-Rank Adaptation), achieving 99.75% accuracy in real-world tests. The system analyzes selected data gathered from Windows Event Logs (WEL); the classifier performs 4-class syntactic analysis to distinguish malicious from benign commands. We demonstrate its ability to detect and mitigate three high-priority MITRE ATT&CK techniques, T1003.002 (OS Credential Dumping), T1562 (Impair Defenses), and T1134 (Access Token Manipulation), and to separate them from benign commands with near-perfect fidelity. For these techniques, the work establishes an effective and resource-efficient monitoring approach that aims to match the detection rigor of kernel-level solutions while keeping system overhead and deployment complexity low.

Keywords: Kernel-driverless agent, Windows Event Logs, MITRE ATT&CK, intrusion detection, RoBERTa, LoRA, Syntactic Analysis, Adversarial Training.

Contents
  1. Introduction
  2. Proposed NextGen EDR Framework
    1. System Architecture
    2. Detection Methodology
    3. RoBERTa Classifier
    4. Telemetry Server
    5. Mitigation Techniques
  3. Methodology
    1. Data Generation
    2. Input Preprocessing
    3. Training Hyperparameters
    4. Experimental Environment
    5. Model Evaluation
  4. Results
    1. Classification Performance
    2. Adversarial Robustness and Stress Testing
    3. Low Performance Overhead
  5. Discussions & Future Work
  6. Publication Reference

1. Introduction

This paper addresses the challenge of building an effective and efficient threat detection and mitigation solution without compromising system stability or security. Existing solutions of this kind are known as endpoint detection and response (EDR) solutions.

Three prominent EDR solutions, in terms of both effectiveness and market share, are CrowdStrike's "Falcon Sensor", IBM's "QRadar EDR" and SentinelOne's "Singularity Platform". These solutions are expensive and rely on agents that reside within the kernel. They run in the kernel in order to monitor key security events such as process creation and process mutation, and to inspect raw network traffic.

The kernel has the highest level of access on a computer, so these EDR solutions place their agents there to obtain the most complete telemetry and therefore the highest level of detection possible. The cost is twofold. Because drivers execute within the kernel, a bug in a driver can cause a system-wide failure. A kernel-resident agent also expands the host attack surface, since any flaw in it is exposed at the highest privilege level in the system. Our agent mitigates this risk by running entirely in user space, requiring only Administrator privileges on a Windows computer.

In July 2024, a faulty update to CrowdStrike's Falcon sensor for Windows triggered a bug in its kernel-mode driver. The bug was a logic flaw caused by a trivial programming error. It resulted in the blue screen of death on approximately 8.5 million computers worldwide.

In summary, existing kernel-mode solutions provide the necessary fidelity, but at the cost of reduced stability and an expanded TCB (Trusted Computing Base). This work addresses that gap by proposing and validating a kernel-driverless agent that achieves high detection and automated mitigation efficacy through WEL stream analysis, while keeping the attack surface tightly constrained and the performance overhead on the host computer low. The main verifiable contributions of this work are:

2. Proposed NextGen EDR Framework

Given the risks of an expanded Trusted Computing Base and high performance overhead, we built a three-part security solution: the 'Next Generation End-point Detection Agent', a telemetry server, and a RoBERTa text classifier. The agent is implemented as a Python service operating entirely in user space on the Windows endpoint, requiring only administrative privileges. The agent sends each captured command to a Flask server that hosts the trained model. The model assigns a label and a confidence score to the command, and the result is written to the database. The telemetry server then renders that data so that each attack is presented with its relevant details.
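The hand-off from agent to model server can be sketched with the Python standard library alone. This is a minimal illustration, not the Genos implementation: the server address, the `/classify` route, and the JSON field names (`event_id`, `command`, `label`, `confidence`) are all hypothetical, since the paper does not specify the wire format.

```python
import json
import urllib.request

# Hypothetical endpoint: the paper does not give the server address or route.
MODEL_SERVER_URL = "http://127.0.0.1:5000/classify"


def build_inference_request(command_line: str, event_id: int) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one command line."""
    payload = {"event_id": event_id, "command": command_line}
    return urllib.request.Request(
        MODEL_SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_inference_response(body: bytes) -> tuple[str, float]:
    """Extract the label and confidence score from an assumed JSON response."""
    doc = json.loads(body)
    return doc["label"], float(doc["confidence"])
```

In a running agent the request would be sent with `urllib.request.urlopen(request, timeout=...)` and the response body passed to `parse_inference_response`.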

Pipeline overview (four stages):
  1. Windows Event Log: Events 4688, 4950, 4946, 4947, 4948
  2. Ring 3 Agent (Event Subscription & Detection): filter · extract command-line · lowercase normalization · send to model server for inference
  3. Flask Model Server (RoBERTa Classifier, LoRA): 4-class: Benign / T1003.002 / T1134 / T1562 · output: label + confidence score → MongoDB
  4. React Telemetry Server (Dashboard & Mitigation): incident timeline · Cyber Kill Chain · Process Ancestry · remote mitigation commands

2.1 System Architecture

The agent runs as a Python process in user space (Ring 3) with elevated privileges. This design inherently avoids the risks of kernel-mode drivers. The model assigns each command one of four labels (one of the three malicious techniques, or Benign) together with a confidence score. The server reads the detection and mitigation data that the agent writes to MongoDB, then renders it in the telemetry dashboard.

2.2 Detection Methodology

The agent subscribes to the Windows Event Log on startup, so it only sees events generated after it was initialized. It sends command-line data from the following event IDs to the model for inference:

Event ID | Description
4688 | A new process was created
4950 | A firewall setting has changed
4946 | A firewall rule was added
4947 | A firewall rule was modified
4948 | A firewall rule was deleted
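One way to express a subscription limited to these event IDs is an XPath filter of the form accepted by the Windows Event Log query syntax. The sketch below only builds the filter string; it assumes the agent filters at subscription time, which the paper does not state, and the function name is hypothetical. Note also that event 4688 carries the command line only when command-line auditing for process creation events is enabled on the host.

```python
# Event IDs taken from the table above.
WATCHED_EVENT_IDS = (4688, 4950, 4946, 4947, 4948)


def build_event_query(event_ids=WATCHED_EVENT_IDS) -> str:
    """Return an XPath filter that matches any of the given event IDs."""
    clauses = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clauses})]]"
```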

When an event with ID 4688 is captured, the agent checks whether its command line contains any indicator of a registry hive being copied. Events 4950, 4946, 4947, and 4948 are forwarded without a filter, due to the nature of firewall attacks. The agent sends the command-line data to the model server, which returns a label and a confidence score. The label is one of four classes: one of the three MITRE ATT&CK technique IDs, or Benign.
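The forwarding rule described above can be sketched as a single predicate. The indicator strings are hypothetical stand-ins, since the agent's actual indicator list is not given in the paper, and `should_forward` is an illustrative name.

```python
PROCESS_CREATION = 4688
FIREWALL_EVENTS = {4950, 4946, 4947, 4948}

# Hypothetical indicators of a registry hive copy; the agent's real list is not given.
HIVE_COPY_INDICATORS = ("reg save", "reg export")


def should_forward(event_id: int, command_line: str) -> bool:
    """Decide whether an event's command line is sent to the model server."""
    if event_id in FIREWALL_EVENTS:
        return True  # firewall events are forwarded without filtering
    if event_id == PROCESS_CREATION:
        # Lowercase and collapse whitespace so spacing and case tricks do not hide a match.
        normalized = " ".join(command_line.lower().split())
        return any(indicator in normalized for indicator in HIVE_COPY_INDICATORS)
    return False  # any other event ID is ignored
```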

Attack | MITRE ATT&CK Technique ID
Credential Dumping | T1003.002
Access Token Manipulation | T1134
Impair Defenses | T1562
Benign | N/A

2.3 RoBERTa Classifier

The classifier is the RoBERTa model developed by the University of Washington and Facebook AI, built on BERT (Bidirectional Encoder Representations from Transformers). RoBERTa builds on the performance of BERT by training longer, with bigger batches over more data; removing the next sentence prediction objective; training on longer sequences; and dynamically changing the masking pattern applied to training data. This enhanced version proved useful as the commands being parsed can be lengthy, and the model must be ready to classify long commands.

2.4 Telemetry Server

2.5 Mitigation Techniques

3. Methodology

The methodology focuses on confirming the system's core functional reliability: its ability to classify the four classes in Table II (three attack techniques and Benign) with a high degree of accuracy.

3.1 Data Generation

The data generation protocol was designed to address the inherent syntactic ambiguity of Windows command-line analysis by producing a highly balanced and structurally isolated training corpus.

Corpus Synthesis: The final dataset comprises 200,000 synthesized command-line entries, partitioned into training (80%), validation (10%), and test (10%) subsets.

Structural Isolation Ratios: The training data enforces a stringent density ratio to reinforce critical decision boundaries: 70% Malicious TTPs / 20% Benign Hard Negatives / 10% Benign Pure Noise.
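As a worked example of these proportions, the sketch below turns the stated percentages into sample counts. It assumes the 70/20/10 composition applies to the training subset only, as the text says "the training data"; the dictionary keys and the `partition` helper are illustrative names.

```python
TOTAL_SAMPLES = 200_000

# Percent of the whole corpus assigned to each subset.
SPLITS = {"train": 80, "validation": 10, "test": 10}

# Percent of the training subset in each category (assumed to apply to training only).
COMPOSITION = {"malicious_ttp": 70, "benign_hard_negative": 20, "benign_noise": 10}


def partition(total: int, percentages: dict[str, int]) -> dict[str, int]:
    """Split `total` into integer counts according to whole-number percentages."""
    assert sum(percentages.values()) == 100
    return {name: total * pct // 100 for name, pct in percentages.items()}


split_counts = partition(TOTAL_SAMPLES, SPLITS)
train_counts = partition(split_counts["train"], COMPOSITION)
```

Under these assumptions the training subset holds 160,000 entries (112,000 malicious, 32,000 hard negatives, 16,000 pure noise), and the validation and test subsets hold 20,000 entries each.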

Toxin Removal Policy: To specifically eliminate the Catastrophic Forgetting failure mode observed in preliminary models (where the model confuses safe and malicious registry commands), a Toxin Removal policy was implemented. All ambiguous reg save/export structures were surgically eliminated from the Hard Negative class and replaced with placeholder commands (e.g., backup-utility) or safe queries (get-registry-value), ensuring that the reg save/export syntax is exclusively associated with T1003.002 payload delivery.
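A minimal sketch of how such a policy could be applied to the Hard Negative class is shown below. The regular expression and the choice of a single replacement command are assumptions for illustration; the paper does not describe how replacements were selected.

```python
import re

# Matches "reg save" / "reg export" (also "reg.exe save"), case-insensitively.
REG_DUMP_PATTERN = re.compile(r"\breg(?:\.exe)?\s+(?:save|export)\b", re.IGNORECASE)

# Placeholder named in the text; how replacements were chosen is an assumption.
PLACEHOLDER_COMMAND = "backup-utility"


def remove_toxins(hard_negatives: list[str]) -> list[str]:
    """Replace benign samples that use reg save/export syntax with a placeholder."""
    return [
        PLACEHOLDER_COMMAND if REG_DUMP_PATTERN.search(command) else command
        for command in hard_negatives
    ]
```

Benign registry queries such as `reg query` pass through unchanged, so only the save/export syntax ends up exclusive to the T1003.002 class.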

Preprocessing: All raw command input was subjected to a mandatory normalization step prior to tokenization: redundant whitespace was collapsed to a single space, and all characters were converted to lowercase, matching the agent's pre-inference preparation.

3.2 Input Preprocessing

By default, every command sent to the model for inference is normalized by converting all characters to lowercase. This ensures that an attacker cannot degrade the model's classification performance simply by changing the case of characters within a command.
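The normalization step can be sketched in a few lines. The function name is hypothetical, and the sketch combines lowercasing with the whitespace collapsing described for the training corpus; stripping leading and trailing whitespace is a side effect of this particular implementation rather than something the paper states.

```python
def normalize_command(raw: str) -> str:
    """Collapse runs of whitespace to one space and lowercase the command."""
    # str.split() with no arguments splits on any whitespace run and drops empty pieces.
    return " ".join(raw.split()).lower()
```

For example, `normalize_command("  ReG   SaVe HKLM\\SAM ")` returns `"reg save hklm\\sam"`.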

3.3 Training Hyperparameters

Parameter | Value | Rationale
Base Model | RoBERTa-base | Optimal balance of robustness and efficiency for syntactic analysis
PEFT Method | LoRA (Low-Rank Adaptation) | Minimizes computational overhead and memory footprint required for training
LoRA Rank | 16 | Determined empirically to provide sufficient capacity for TTP semantic learning
Target Modules | query, value, key, dense | Targets both the attention mechanism and the final classification layers
Batch Size (Train/Eval) | 32 / 32 | Optimized for GPU memory utilization while allowing effective gradient accumulation
Learning Rate | 2.0 × 10^-4 | Standard optimal rate for fine-tuning dense transformer models
Precision | bf16 (BFloat16) | Reduces memory usage and increases throughput on compatible hardware
Stopping Mechanism | Early stopping callback (Patience: 5) | Prevents overfitting and ensures the model stabilizes at its best F1 score
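To illustrate why LoRA reduces the training footprint, the worked example below counts trainable parameters for a single 768 × 768 projection, 768 being the hidden size of RoBERTa-base, at the rank listed in the table. It covers one weight matrix only and is not a count for the full Genos adapter, whose exact set of adapted modules is not enumerated in the paper.

```python
HIDDEN_SIZE = 768  # hidden dimension of RoBERTa-base
LORA_RANK = 16     # rank from the hyperparameter table


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


full = HIDDEN_SIZE * HIDDEN_SIZE  # weights in one frozen projection matrix
lora = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
ratio = lora / full               # fraction of the matrix that LoRA trains
```

For this one matrix, LoRA trains 24,576 parameters in place of 589,824, which is 1/24 (about 4.2%) of the original weights.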

3.4 Experimental Environment

3.5 Model Evaluation

There are two components to assess how well the model performs: (1) the post-training evaluation, which measures precision, recall, F1-score, and support; and (2) real-world tests in the form of randomly generated commands that resemble the four labels.
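For reference, the per-class metrics named above can be computed from paired true and predicted labels as in the sketch below. It is a plain-Python illustration, not the evaluation code used for the paper, and `per_class_report` is a hypothetical name. Support here is the number of true samples of each class.

```python
from collections import Counter


def per_class_report(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, float]]:
    """Compute precision, recall, F1 and support for every label in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    report = {}
    for label in sorted(set(y_true)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": actual}
    return report
```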

4. Results

4.1 Classification Performance

The model was evaluated against a held-out test set of 20,000 samples (10% of the total corpus). The system achieved an overall accuracy of 100% on post-training evaluation, with the agent successfully identifying all three malicious TTPs and distinguishing them from benign administrative activity.

Classification Label | Precision | Recall | F1-Score | Support
Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000
T1003.002 | 1.0000 | 1.0000 | 1.0000 | 1.0000
T1562 | 1.0000 | 1.0000 | 1.0000 | 1.0000
T1134 | 1.0000 | 1.0000 | 1.0000 | 1.0000

The resulting confusion matrix is perfectly diagonal, indicating that the structural hardening and toxin removal policies eliminated the overlap between high-privilege administrative tasks and malicious credential access payloads.

Key results: 100% post-training accuracy  ·  99.5% stress test success  ·  5 ms inference latency  ·  255 MB VRAM (stress test)  ·  1,200 adversarial cases

4.2 Adversarial Robustness and Stress Testing

To verify the model's reliability in a production-simulated environment, an adversarial stress test was performed. It consisted of 50 iterations of the "V7 Scenario Suite," comprising 1,200 individual test cases designed to represent critical boundary conditions. The suite uses dynamic fuzzing to generate real-world edge cases and to check for possible overfitting.

A successful result is defined as a correct label with a confidence score of at least 80%.

Result | Number of Examples | Percentage (out of 1,200)
Success | 1,194 | 99.5%
Failure | 3 | 0.25%
Low Confidence | 3 | 0.25%

Stress test failure cases (3 unique failures)
Scenario | Prediction | Confidence
Exporting SYSTEM Hive (Malicious) | Predicted: Benign | 0.98
PowerShell Wrapped Security Dump | Predicted: Benign | 0.51
Searching Logs (Safe Pattern) | Predicted: T1562 | 0.65
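The grading rule and the percentages in the results table can be reproduced with the sketch below. The 80% threshold comes from the success definition above; reading "Low Confidence" as a correct label scored below that threshold is an assumption, and the function names are illustrative.

```python
CONFIDENCE_THRESHOLD = 0.80  # success threshold stated in the text


def grade(expected: str, predicted: str, confidence: float) -> str:
    """Grade one stress-test case as Success, Low Confidence, or Failure."""
    if predicted != expected:
        return "Failure"
    if confidence < CONFIDENCE_THRESHOLD:
        return "Low Confidence"  # assumed meaning: right label, weak score
    return "Success"


def percentage(count: int, total: int = 1200) -> float:
    """Share of the 1,200 test cases, as a percentage."""
    return 100 * count / total
```

With the reported counts, `percentage(1194)` gives 99.5 and `percentage(3)` gives 0.25, matching the table.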

4.3 Low Performance Overhead

5. Discussions & Future Work

The experimental data confirms that high-fidelity security classification is primarily a data engineering challenge. While preliminary model architectures often struggle with "Confident Failures" or unoptimized latency, the fine-tuned RoBERTa-base model achieved 100% accuracy on evaluation and a 99.5% success rate under rigorous adversarial stress.

A critical takeaway was the necessity of Structural Hardening and Toxin Removal. By surgically isolating malicious syntax from benign administrative structures in the training corpus, the model moved from keyword-based pattern matching to a deeper syntactic understanding of command-line intent. This allowed the system to resolve critical boundary cases — such as distinguishing legitimate registry queries from malicious hive dumps — with confidence scores exceeding 0.90 in the majority of production-simulated scenarios.

The performance profiling supports the system's suitability for resource-constrained environments. With a median inference latency of 5 ms and a peak memory footprint of 255 MB, the solution offers a viable alternative to commercial kernel-level agents. This approach significantly reduces the host attack surface by operating entirely in user space, while aiming for detection rigor comparable to that of traditional driver-based tools on the techniques evaluated here.

Future development will focus on expanding the classification ontology to include broader persistence and lateral movement TTPs. Additionally, recursive training loops will be implemented to resolve the remaining 0.25% "Low Confidence" edge cases, and the integration of automated remediation orchestration directly into the host agent's response logic will be explored. These directions are addressed in Genos v1.1 and v1.2.

6. Publication Reference

IEEE AIIoT 2026 — Paper [2]

Open-Source Next Gen Endpoint Detection & Response

Ahmed Khan  ·  IEEE World AI IoT Congress (AIIoT 2026)  ·  Seattle, USA

PDF and BibTeX will be linked here upon publication in the IEEE digital library. The paper presents the full architecture, training methodology, evaluation results, adversarial stress test data, and dashboard implementations.