Neural Network for Predicting Network Failure

Exploring AI-driven solutions to enhance network reliability

About the Project

Our research focuses on using neural networks to predict network failures. By analyzing traffic patterns and system metrics, we aim to improve the reliability of network infrastructure and reduce downtime.

Project Proposal

Title: Neural Network for Predicting Network Failure

Objective: To develop a predictive model using neural networks that can analyze network traffic and system metrics to identify potential failure points before they occur.

Approach: Our approach includes traffic pattern analysis, system metric analysis, failure simulations using ns-3 or Mininet, and hybrid modeling techniques. We will implement the solution in Python using TensorFlow or PyTorch.
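
To make this concrete, below is a minimal sketch of the kind of PyTorch model we have in mind: a small feedforward network that maps a fixed-length vector of traffic and system metrics to a single failure logit. The layer sizes, feature count, and loss weighting are illustrative assumptions, not final design decisions.

    import torch
    import torch.nn as nn

    class FailurePredictor(nn.Module):
        """Feedforward classifier over a fixed-length vector of network metrics."""
        def __init__(self, n_features: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, hidden),
                nn.ReLU(),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),  # single logit; sigmoid gives P(failure)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    model = FailurePredictor(n_features=20)  # feature count is a placeholder
    # BCEWithLogitsLoss combines sigmoid + binary cross-entropy; pos_weight
    # up-weights the rare failure class (the value here is an assumption).
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([50.0]))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)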

Expected Outcome: A prototype system capable of accurately predicting network failures, contributing to proactive maintenance and improved network reliability.

Midterm Update

Dataset Selection: We have switched from our initial plan of using generic public datasets to adopting the SOFI dataset (“Symptom-fault relationship for IP-network”). This dataset provides a labeled collection of normal (“NE”) versus faulty (“F”) network states captured from a large, emulated IP network. It includes SNMP-based performance metrics and covers a range of artificially induced failures such as link down events, line card failures, and high link utilization.

Reason for the Change: The SOFI dataset’s clear labeling and rich feature set (e.g., inbound/outbound packets, error rates, operational status) make it ideal for training our predictive model. It also aligns with real-world conditions where failure instances can be rare.
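
To illustrate the label mapping, here is a short sketch of the binarization step in pandas. The file name and column names are assumptions for illustration; only the "NE"/"F" state labels come from the dataset itself.

    import pandas as pd

    # Hypothetical file and column names; the real dataset ships SNMP-derived
    # counters (inbound/outbound packets, error rates, operational status)
    # plus a state label that is "NE" (normal) or "F" (faulty).
    df = pd.read_csv("sofi_coreswitch.csv")

    # Binarize the target: normal -> 0, faulty -> 1.
    df["label"] = df["state"].map({"NE": 0, "F": 1})

    # Failures should be rare, mirroring real-world conditions.
    print(df["label"].value_counts(normalize=True))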

Progress: See the biweekly updates below.

Biweekly Update #3

Progress Overview:
Over the past two weeks, we have made substantial progress in building and evaluating machine learning models for network failure prediction using the SOFI ("Symptom-fault relationship for IP-network") dataset. This dataset, developed to capture symptom-fault causal relationships in IP-based enterprise networks, provided a rich and well-labeled foundation for our experiments.

Our initial focus was on implementing two baseline models using PyTorch in Google Colab: a supervised classifier and an unsupervised anomaly detector, corresponding to the first two approaches listed below.
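
For reference, below is a minimal sketch of the kind of training loop we run in Colab for the supervised baseline. The names model, loss_fn, and optimizer follow the earlier sketch, and train_loader is an assumed torch DataLoader over (features, label) batches.

    import torch

    def train_one_epoch(model, train_loader, loss_fn, optimizer, device="cpu"):
        """One pass over the training set; returns the mean per-sample loss."""
        model.train()
        total = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device).float().unsqueeze(1)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item() * x.size(0)
        return total / len(train_loader.dataset)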

Updated Project Direction:
Based on our results, we’ve expanded the scope of our project to compare four different machine learning approaches:

  1. Supervised Learning (Completed)
  2. Unsupervised Learning (Completed)
  3. Semi-Supervised Learning (In Progress) – Combines labeled and unlabeled data to improve generalization while reducing reliance on fully labeled datasets. We plan to experiment with pseudo-labeling and consistency regularization techniques (a sketch of the pseudo-labeling step follows this list).
  4. Time-Dependent (Temporal) Model (Upcoming) – Will involve sequential models such as RNNs or LSTMs to account for temporal dependencies in network behavior. This model aims to capture how patterns evolve over time, which is essential for forecasting failures.
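
As referenced in item 3, here is a minimal sketch of the pseudo-labeling (self-training) step under our assumptions: model is a trained binary classifier with a single-logit output, and the confidence threshold is a tunable guess.

    import torch

    CONF_THRESHOLD = 0.95  # assumed value; to be tuned

    @torch.no_grad()
    def pseudo_label(model, unlabeled_x):
        """Keep only unlabeled samples the model is highly confident about."""
        model.eval()
        probs = torch.sigmoid(model(unlabeled_x)).squeeze(1)
        confident = (probs >= CONF_THRESHOLD) | (probs <= 1 - CONF_THRESHOLD)
        hard_labels = (probs > 0.5).float()
        return unlabeled_x[confident], hard_labels[confident]

The confidently labeled samples are then merged with the labeled training set and the model is retrained, repeating until few new pseudo-labels are added.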

Next Steps:
Complete the semi-supervised experiments, implement the time-dependent model, and evaluate all four approaches under a common protocol so the results are directly comparable.

Challenges:
Our primary challenges are preparing the data in a format compatible with each model type and ensuring fair comparisons between them. Time-series preprocessing for the temporal model, in particular, is a focus for the coming week; a sketch of the windowing step is shown below. Handling the dataset's class imbalance consistently across model types also remains an active focus.
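
As a concrete example of that time-series preprocessing, below is a sketch of the sliding-window step that turns a chronologically ordered metrics matrix into sequence/label pairs for an RNN or LSTM. The window length is an assumption we will tune.

    import numpy as np

    def make_windows(features, labels, window=10):
        """features: (T, n_features) array ordered by time; labels: (T,) binary.

        Returns X of shape (N, window, n_features) and y of shape (N,),
        where each window of past snapshots predicts the state at its end.
        """
        xs, ys = [], []
        for t in range(window, len(features)):
            xs.append(features[t - window:t])  # the preceding `window` snapshots
            ys.append(labels[t])               # the state we want to forecast
        return np.stack(xs), np.array(ys)

Batches of this shape feed directly into torch.nn.LSTM(input_size=n_features, hidden_size=..., batch_first=True).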

Overall, we’re excited by our progress and believe the comparative evaluation of these models will provide valuable insights for real-world network failure prediction applications.

Final Report Summary

Dataset: We used the SOFI CoreSwitch-II dataset, which captures over 12,000 labeled network states with performance metrics and artificially induced failures. The dataset is highly imbalanced, with failures comprising less than 2% of records.

Preprocessing: We cleaned the data by removing placeholder values, dropped low-variance features, engineered error-based ratios, and scaled inputs. To address class imbalance, we used class weighting and SMOTE for supervised models.
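
A condensed sketch of this pipeline is shown below. The sentinel value, the engineered ratio, and the column names are illustrative assumptions; SMOTE (from the imbalanced-learn package) is applied to the training split only, so the test set keeps its natural imbalance.

    import numpy as np
    import pandas as pd
    from imblearn.over_sampling import SMOTE
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    def preprocess(df: pd.DataFrame):
        # Drop rows containing the dataset's placeholder value (-1 assumed here).
        df = df.replace(-1, np.nan).dropna()

        # Hypothetical engineered feature: errors as a fraction of inbound traffic.
        df["error_ratio"] = df["in_errors"] / (df["in_packets"] + 1)

        X = df.drop(columns=["state", "label"])
        y = df["label"]

        # Drop constant (zero-variance) features, then split with stratification
        # so the rare failure class appears in both splits.
        X = VarianceThreshold(threshold=0.0).fit_transform(X)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        scaler = StandardScaler()
        X_tr, X_te = scaler.fit_transform(X_tr), scaler.transform(X_te)

        # Oversample failures in the training split only.
        X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
        return X_tr, X_te, y_tr, y_te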

Models Implemented: The four approaches outlined in our midterm update: supervised, unsupervised, semi-supervised (pseudo-labeling), and a time-dependent sequential model.

Key Takeaways:

Conclusion: This project gave us hands-on experience with a wide range of modeling strategies for anomaly detection. Our findings reinforce that model selection, data handling, and evaluation strategy all contribute significantly to real-world performance.

📄 View Full Final Report (PDF)