PENS
POLITEKNIK ELEKTRONIKA NEGERI SURABAYA
IDS
FINAL PROJECT PROPOSAL
ML · IDS · IoT · IIoT
Wahyu Ikbal Maulana
3323600056 · 3 SDT B
SUPERVISOR
Ferry Astika Saputra
SUPERVISOR
Tita Karlita, S.Kom, M.Kom
FP 01 · TOPIC: CYBER
01 / 09
Why this research exists · 5 interconnected reasons
1
New Dataset
CIC IIoT 2025 Released
Latest IIoT benchmark — synchronized sensor time-series and network traffic. 50 attack types, 7 categories, 40 industrial devices. More representative than its predecessor.
unb.ca/cic · IIoT-Dataset-2025
2
Threat Landscape
Problem
820K Attacks / Day
Up 46% from the previous year (ORDR). The IoT ecosystem keeps expanding — more devices, wider attack surface, older datasets no longer representative enough.
ordr.net · IoT Security Statistics
3
Dataset Bias
Problem
97.7% Class Imbalance
Attack traffic makes up 97.7% of CICIoT2023. A model that always predicts "attack" already scores about 97.7% accuracy, so a reported 99% says little about whether benign traffic or minority classes are actually detected.
CICIoT2023 · class analysis
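The accuracy trap described above can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical sample of 1,000 flows with the 97.7% attack share quoted above (977 attack, 23 benign) and a trivial model that always predicts "attack". The counts are illustrative, not read from the dataset files.

```python
def recall(y_true, y_pred, label):
    """Fraction of samples of class `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else 0.0

# Hypothetical sample mirroring the 97.7% attack share (assumption).
y_true = ["attack"] * 977 + ["benign"] * 23
y_pred = ["attack"] * len(y_true)  # trivial majority-class "model"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = (recall(y_true, y_pred, "attack")
                     + recall(y_true, y_pred, "benign")) / 2

print(f"accuracy          = {accuracy:.3f}")           # 0.977
print(f"balanced accuracy = {balanced_accuracy:.3f}")  # 0.500
```

Plain accuracy rewards the majority class, while balanced accuracy (or macro-averaged F1) exposes that the benign class is never detected.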
4
ML Implementation
Problem
Dataset → IDS Model
CIC IIoT 2025 is a relevant ML benchmark for IDS, but IoT security data is complex: multi-class performance falls far below binary classification.
Feature Selection · IDS · Benchmark
5
Generalization
Cross-Dataset Eval
A model that performs well on one dataset may not generalize to another domain. Cross-dataset evaluation tests whether the model has learned general attack patterns rather than memorized its training data.
Cross-domain · Generalization · IDS
02 / 09
Goals & Benefits
Research objectives · expected contributions
Objectives
Research Goals
1. Evaluate the cross-domain generalization capability of ML-based IDS between IoT and IIoT environments.
2. Quantify the generalization gap between in-dataset and cross-dataset evaluation scenarios.
3. Identify robust network traffic features that remain effective across different datasets.
4. Establish a cross-dataset benchmark using Decision Tree, Random Forest, and XGBoost.
Contributions
Research Benefits
Provides a baseline benchmark for the newly released DataSense (CIC IIoT 2025) dataset.
Supports the development of more reliable and generalizable IDS models for real-world IIoT deployment.
Reduces the risk of overestimating IDS performance caused by single-dataset evaluation.
Contributes empirical evidence on the impact of domain shift between IoT and IIoT datasets.
Serves as a reference for future research on cross-dataset intrusion detection evaluation.
03 / 09
Dataset
CICIoT2023 vs CIC IIoT 2025
Attribute | CICIoT2023 | CIC IIoT 2025 · DataSense
Domain | Consumer IoT · smart home / campus | Industrial IoT · factory / plant floor
Devices | 105 heterogeneous IoT devices | 40 IIoT + OT industrial devices
Modality | Network traffic (flow / packet) | Sensor time-series + network traffic
Attacks | 33 types · 7 categories | 50 types · 7 categories
Features | ~48 features per flow | Multi-modal: sensor + network features
Labels | Benign + 33 attack classes | Benign + 50 attack types
Challenge | 97.7% class imbalance | Domain shift from IoT to IIoT
04 / 09
Related Works
Prior workflows · researchers · CICIoT2023 approaches
Feature Engineering
Raw traffic → variance threshold → correlation filter → 48-feature subset. PCA for dimensionality reduction before classification. Focus on computational efficiency.
INPUT: Raw packet capture
SELECTION: Variance + Correlation
REDUCTION: PCA · Subset
CLASSIFIER: Random Forest · SVM
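The selection stage of this workflow (variance threshold, then correlation filter) can be sketched in plain Python. This is a minimal illustration, not the method of any specific cited study: the feature names, the thresholds `min_variance` and `max_abs_corr`, and the rule "keep the first feature of a correlated pair" are all assumptions, and the PCA step is left out.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, min_variance=1e-6, max_abs_corr=0.95):
    """columns: dict of feature name -> list of values. Returns the kept names."""
    # Step 1: drop near-constant features.
    candidates = [n for n, v in columns.items() if pvariance(v) > min_variance]
    # Step 2: keep a feature only if it is not highly correlated
    # with a feature that has already been kept.
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical flow features (names and values are illustrative).
flows = {
    "pkt_count":  [10, 20, 30, 40, 50],
    "byte_count": [100, 200, 300, 400, 500],  # perfectly correlated with pkt_count
    "ttl":        [64, 64, 64, 64, 64],       # constant, zero variance
    "iat_mean":   [0.5, 0.1, 0.9, 0.2, 0.4],
}
print(select_features(flows))  # ['pkt_count', 'iat_mean']
```

Thresholds should be fitted on the training split only, so that the selection does not leak information from the test data.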
Ensemble Learning
Multiple base classifiers (RF, DT, SVM, NB) → majority voting or stacking → ensemble decision. Class imbalance handled with SMOTE or class weighting.
BASE: RF · DT · SVM · NB
COMBINE: Voting · Stacking
IMBALANCE: SMOTE · Class weight
EVAL: Precision · Recall · F1
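Two pieces of this workflow are simple enough to show directly: hard majority voting over base classifiers, and class weighting as an alternative to SMOTE. The sketch below is illustrative only. The predictions are made-up labels, and the weighting formula `n_samples / (n_classes * class_count)` is the common "balanced" heuristic (the one scikit-learn uses for `class_weight="balanced"`), not necessarily what the surveyed studies applied.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one label list per classifier, all of the same length.
    Ties go to the label that appears first among the votes."""
    voted = []
    for sample_votes in zip(*predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

# Hypothetical outputs of three base classifiers on three flows.
rf = ["ddos", "benign", "scan"]
dt = ["ddos", "scan",   "scan"]
nb = ["benign", "benign", "scan"]
print(majority_vote([rf, dt, nb]))  # ['ddos', 'benign', 'scan']

# Hypothetical label distribution with a 97.7% attack share.
weights = balanced_class_weights(["attack"] * 977 + ["benign"] * 23)
print({c: round(w, 3) for c, w in weights.items()})
```

With this distribution the benign class receives a weight roughly 42 times larger than the attack class, which is what pushes a classifier to stop ignoring it.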
Deep Learning
Normalized flow → LSTM/CNN for sequential detection. Autoencoder for anomaly detection on encrypted or unlabeled IoT traffic.
INPUT: Time-series network flow
MODEL: LSTM · CNN · Autoencoder
TASK: Anomaly · Classification
EDGE: Encrypted traffic
Cross-Validation
Stratified k-fold CV with per-class metrics. Some studies add cross-dataset testing to validate generalization to other environments.
EVAL: Stratified k-fold CV
METRICS: Precision · Recall · F1
EXTRA: Cross-dataset validation
REPORT: Confusion matrix · ROC
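Stratification matters here because of the class imbalance: a plain random split can leave a fold with almost no benign samples. A minimal sketch of the idea, assuming a round-robin assignment within each class and no shuffling (a real run would shuffle with a fixed seed first):

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, round-robin within each class,
    so every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical labels: 12 attack flows and 4 benign flows.
labels = ["attack"] * 12 + ["benign"] * 4
folds = stratified_folds(labels, k=4)
for i, test_idx in enumerate(folds):
    n_benign = sum(labels[j] == "benign" for j in test_idx)
    print(f"fold {i}: {len(test_idx)} samples, {n_benign} benign")
```

Each fold serves once as the test set while the remaining folds form the training set, and per-class precision, recall, and F1 are then averaged over the k runs.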
05 / 09
Proposed Workflow · CICIoT2023 Related Research
Proposed workflow overview
[Figures: workflow supporting images 1 to 4]
06 / 09
System Design
CICIoT2023 training flow · CIC IIoT 2025 testing flow
System design diagram for the cross-dataset evaluation pipeline
Train on CICIoT2023, then align features and test on CIC IIoT 2025 to measure the generalization gap.
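The "align features" step can be sketched as follows. This is a rough illustration under stated assumptions: the column names are hypothetical (neither dataset's real schema is reproduced here), alignment is done by exact column-name match, and labels are collapsed to benign/attack because the two datasets define different attack classes.

```python
def shared_features(source_columns, target_columns):
    """Features present in both datasets, in source order."""
    target = set(target_columns)
    return [c for c in source_columns if c in target]

def to_binary(label, benign_name="benign"):
    """Collapse dataset-specific attack names into a shared benign/attack label."""
    return "benign" if label.strip().lower() == benign_name else "attack"

def project(rows, features):
    """Keep only the shared features, in a fixed order, for each row dict."""
    return [[row[f] for f in features] for row in rows]

# Hypothetical column names for the source (training) and target (test) datasets.
source_cols = ["flow_duration", "protocol", "syn_count", "header_length"]
target_cols = ["protocol", "flow_duration", "sensor_temp", "syn_count"]
features = shared_features(source_cols, target_cols)
print(features)  # ['flow_duration', 'protocol', 'syn_count']

# One hypothetical target row, reduced to the shared feature order.
target_rows = [{"protocol": 6, "flow_duration": 1.2, "sensor_temp": 40.1, "syn_count": 3}]
X_target = project(target_rows, features)
y_target = [to_binary("Benign")]
print(X_target, y_target)  # [[1.2, 6, 3]] ['benign']
```

A matching column name does not guarantee matching semantics. Units, extraction tools, and aggregation windows may differ between datasets and should be checked before a feature is treated as shared.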
07 / 09
Final Outputs
Three deliverables from the study
The final result is a three-part answer to the research problem.
Generalization Gap
Measures how far performance drops when the model moves from in-dataset testing to cross-dataset testing.
Robust Feature Analysis
Identifies which features stay useful across both datasets and which ones are too dataset-specific.
Baseline Recommendation
Gives a practical starting point for the best model-feature combination to carry forward.
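One way to express the generalization gap from the first deliverable is as the difference between an in-dataset score and a cross-dataset score. The sketch below assumes macro-averaged F1 as the score. The proposal does not fix the metric, so this choice and the toy labels are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def generalization_gap(in_score, cross_score):
    """Positive values mean performance dropped on the unseen dataset."""
    return in_score - cross_score

# Toy predictions (illustrative only, not experimental results).
truth      = ["attack", "attack", "benign", "benign"]
in_pred    = ["attack", "attack", "benign", "benign"]  # in-dataset test split
cross_pred = ["attack", "attack", "attack", "benign"]  # cross-dataset test

f1_in = macro_f1(truth, in_pred)
f1_cross = macro_f1(truth, cross_pred)
print(f"in-dataset F1    = {f1_in:.3f}")     # 1.000
print(f"cross-dataset F1 = {f1_cross:.3f}")  # 0.733
print(f"gap              = {generalization_gap(f1_in, f1_cross):.3f}")  # 0.267
```

Reporting the gap per model (DT, RF, XGBoost) and per feature subset would connect this number to the other two deliverables.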
Research Timeline
4 simple phases · Mar – Oct 2025
from data prep to defense
Phase 01 · Mar - Apr
Understand the data
Exploration, cleaning, and a quick look at both datasets before any modeling starts.
Phase 02 · Apr - May
Prepare features
Normalization, imbalance handling, and selecting features that are worth keeping.
Phase 03 · Jun - Jul
Train and compare
Build DT, RF, and XGBoost models, then test them in-dataset and cross-dataset.
Phase 04 · Aug - Oct
Analyze and write
Measure the generalization gap, decide the baseline, and finish the final report and defense.
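For the normalization step in the "Prepare features" phase, one detail matters in a cross-dataset setup: scaling parameters should come from the training dataset only and then be reused unchanged on the test dataset. A minimal min-max sketch with made-up values (the choice of min-max scaling is an assumption, not something the plan specifies):

```python
def fit_min_max(rows):
    """Return one (min, max) pair per column, computed from the training rows."""
    cols = list(zip(*rows))
    return [(min(c), max(c)) for c in cols]

def apply_min_max(rows, params):
    """Scale rows with previously fitted (min, max) pairs; constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, params)
        ])
    return scaled

# Hypothetical feature rows from the training and target datasets.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
target = [[12.0, 15.0]]

params = fit_min_max(train)            # fitted on the training data only
scaled_target = apply_min_max(target, params)
print(scaled_target)  # [[1.2, 0.25]]
```

The first target value lands outside [0, 1] because it exceeds the training maximum. Such out-of-range values are themselves a visible symptom of domain shift between the two datasets.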
08 / 09
Closing

Thank You

09 / 09