Exploring AI-driven solutions to enhance network reliability
Our research focuses on leveraging neural networks to predict network failures. By analyzing traffic patterns and system metrics, we aim to enhance the reliability of network infrastructure and prevent downtime.
Title: Neural Network for Predicting Network Failure
Objective: To develop a predictive model using neural networks that can analyze network traffic and system metrics to identify potential failure points before they occur.
Approach: Our approach includes traffic pattern analysis, system metric analysis, failure simulations using ns-3 or Mininet, and hybrid modeling techniques. We will implement the solution in Python using TensorFlow or PyTorch.
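As a sketch of the planned implementation, a minimal PyTorch binary classifier for per-sample network metrics might look like the following. The feature count (16) and hidden width are illustrative placeholders, not values from the SOFI dataset:

```python
import torch
import torch.nn as nn

class FailurePredictor(nn.Module):
    """Minimal feed-forward classifier for per-sample network metrics.

    The input size (16 features) and hidden width are placeholders;
    the real model would match the SOFI feature set.
    """
    def __init__(self, n_features: int = 16, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single logit; sigmoid gives P(failure)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = FailurePredictor()
logits = model(torch.randn(4, 16))  # batch of 4 hypothetical samples
probs = torch.sigmoid(logits)       # failure probabilities in (0, 1)
```

Training such a model with `nn.BCEWithLogitsLoss` (optionally with `pos_weight` set to the inverse failure frequency) is one standard way to handle the rare-failure setting described below.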
Expected Outcome: A prototype system capable of accurately predicting network failures, contributing to proactive maintenance and improved network reliability.
Dataset Selection: We have switched from our initial plan of using generic public datasets to adopting the SOFI dataset (“Symptom-fault relationship for IP-network”). This dataset provides a labeled collection of normal (“NE”) versus faulty (“F”) network states captured from a large, emulated IP network. It includes SNMP-based performance metrics and covers a range of artificially induced failures such as link down events, line card failures, and high link utilization.
Reason for the Change: The SOFI dataset’s clear labeling and rich feature set (e.g., inbound/outbound packets, error rates, operational status) make it ideal for training our predictive model. It also aligns with real-world conditions where failure instances can be rare.
Progress Overview:
Over the past two weeks, we have made substantial progress in building and evaluating machine learning models for network failure prediction using the SOFI (Symptom-Fault relationship for IP-Network) dataset. This dataset, developed to capture symptom-fault causal relationships in IP-based enterprise networks, provided a rich and well-labeled foundation for our experiments.
Our initial focus was on implementing two baseline models using PyTorch in Google Colab.
Updated Project Direction:
Based on our results, we have expanded the scope of the project to compare four different machine learning approaches.
Next Steps:
Challenges:
Our primary challenges involve preparing the data in a format compatible with each model type and ensuring fair comparisons between them. Time-series preprocessing, in particular, is a focus for the coming week. Additionally, handling the dataset’s class imbalance across model types continues to be an area we’re actively addressing.
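One common way to frame the time-series preprocessing mentioned above is a sliding window over per-interval metrics, where each window of past observations is paired with the state of the following interval. The window length of 5 here is an illustrative assumption, not a value we have settled on:

```python
import numpy as np

def make_windows(metrics: np.ndarray, labels: np.ndarray, window: int = 5):
    """Turn a (T, F) metric series into (N, window, F) input windows,
    each paired with the label of the interval right after the window."""
    X, y = [], []
    for t in range(len(metrics) - window):
        X.append(metrics[t:t + window])
        y.append(labels[t + window])  # predict the next interval's state
    return np.stack(X), np.array(y)

# 20 intervals with 3 hypothetical metrics and binary normal/fault states
series = np.random.rand(20, 3)
states = np.random.randint(0, 2, size=20)
X, y = make_windows(series, states)  # X: (15, 5, 3), y: (15,)
```

Framing the data this way lets the same windows feed both sequence models and flattened-input classifiers, which helps keep the cross-model comparison fair.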
Overall, we’re excited by our progress and believe the comparative evaluation of these models will provide valuable insights for real-world network failure prediction applications.
Dataset: We used the SOFI CoreSwitch-II dataset, which captures over 12,000 labeled network states with performance metrics and artificially induced failures. The dataset is highly imbalanced, with failures comprising less than 2% of records.
Preprocessing: We cleaned the data by removing placeholder values, dropped low-variance features, engineered error-based ratios, and scaled inputs. To address class imbalance, we used class weighting and SMOTE for supervised models.
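A condensed sketch of these preprocessing steps is shown below. The placeholder marker (-1), the variance threshold, and the column names used for the error ratio are all illustrative assumptions, not the actual SOFI schema:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, label_col: str = "label"):
    # Drop rows containing placeholder values (marker assumed to be -1)
    df = df[(df.drop(columns=[label_col]) != -1).all(axis=1)]
    y = df[label_col].to_numpy()
    X = df.drop(columns=[label_col])
    # Drop near-constant (low-variance) features
    X = X.loc[:, X.var() > 1e-6].copy()
    # Engineer an error-based ratio (column names are hypothetical)
    if {"in_errors", "in_packets"}.issubset(X.columns):
        X["in_error_ratio"] = X["in_errors"] / X["in_packets"].clip(lower=1)
    # Scale inputs to zero mean / unit variance
    X = StandardScaler().fit_transform(X)
    # Inverse-frequency class weights to counter the <2% failure rate
    counts = np.bincount(y)
    weights = len(y) / (len(counts) * counts)
    return X, y, weights

df = pd.DataFrame({
    "in_errors": [0, 1, 2, 0, -1, 3],
    "in_packets": [10, 20, 30, 40, 50, 60],
    "const": [1, 1, 1, 1, 1, 1],   # low-variance feature, gets dropped
    "label": [0, 0, 0, 1, 0, 1],
})
X, y, weights = preprocess(df)
```

SMOTE (from the `imbalanced-learn` package) would be applied after this step, on the training split only, to oversample the minority failure class for the supervised models.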
Models Implemented:
Key Takeaways:
Conclusion: This project gave us hands-on experience with a wide range of modeling strategies for anomaly detection. Our findings reinforce that model selection, data handling, and evaluation strategy all contribute significantly to real-world performance.