Optimization for Data Collection in ML-Driven Network Control

Deployed ML systems often face environment or domain shifts (different traffic, devices, content, or networking conditions), leading to degraded performance if they are not trained on representative data. For example, in ML-based control algorithms for video streaming, models trained and evaluated on data from one environment may perform poorly when deployed in another environment with different network conditions and user behaviors. In past work, we addressed this issue by building ABR-Arena, a platform for collecting data and for testing and training models in multiple environments simultaneously, enabling better generalization across diverse conditions. However, data collection is expensive and time-consuming, and simply collecting more data uniformly across all regions is inefficient.

Building on this premise, the goal of this thesis is to design a planning or optimization algorithm that, given a target (e.g., improve performance in environment A, or improve generalization across all environments) and a time or money budget, decides where to collect training data next. The algorithm should balance the expected gain from collecting data in a given region (e.g., how much the model is expected to improve) against the cost of collecting that data (e.g., time, money, or resource usage). The algorithm should also be adaptive, updating its plan as new data is collected and the model is retrained.
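
The gain-versus-cost trade-off under a budget can be illustrated with a minimal greedy sketch. All names and numbers below are illustrative assumptions (not ABR-Arena APIs), and the per-region gain estimates are taken as given; estimating them is precisely the hard part the thesis addresses.

```python
# Minimal sketch (hypothetical names): greedily pick the region with the
# best estimated gain per unit cost until the budget is exhausted.
def greedy_plan(regions, est_gain, cost, budget):
    """regions: list of region ids; est_gain/cost: dicts region -> float."""
    plan = []
    remaining = budget
    # Visit regions in order of estimated marginal gain per unit cost.
    for r in sorted(regions, key=lambda r: est_gain[r] / cost[r], reverse=True):
        if cost[r] <= remaining:
            plan.append(r)
            remaining -= cost[r]
    return plan

plan = greedy_plan(
    regions=["eu", "us", "asia"],
    est_gain={"eu": 0.02, "us": 0.05, "asia": 0.05},
    cost={"eu": 1.0, "us": 4.0, "asia": 2.0},
    budget=3.0,
)
```

Here "asia" is chosen before "us" despite equal estimated gain, because its lower cost yields a better gain-per-cost ratio; an adaptive planner would re-estimate the gains after each collection round rather than fixing them up front.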

Milestones (suggested)

  1. Familiarization: study related work on ML-based control in networking (particularly video streaming), active learning, data valuation, and adaptive data collection, and become familiar with the ABR-Arena platform.
  2. Define objective: formalize the optimization objective (gain per cost), and choose targets, costs, and evaluation metrics (ID/OOD error, QoE).
  3. Build baseline pipeline: build the loop (plan → collect → retrain → evaluate) and implement simple baseline planning approaches (uniform, proportional, diversity heuristic).
  4. Implement planner prototypes (offline): implement 2–3 candidate planners, run offline/retrospective tests on already collected data.
  5. Evaluate online: test the best planners in the real system and compare against baselines, improving the approaches as needed.
  6. Ablations & write-up: evaluate the planner’s sensitivity to specific budgets/targets, fairness, proxy-metric choices, deployment regions, etc., and write the thesis.
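
The baseline pipeline of milestone 3 could be sketched as follows. Every function here is a hypothetical placeholder (not an ABR-Arena API); the toy stubs at the bottom only show how the pieces fit together.

```python
# Illustrative skeleton of the plan -> collect -> retrain -> evaluate loop.
import random

def uniform_planner(regions, budget):
    # Simplest baseline: split the round's budget evenly across regions.
    share = budget / len(regions)
    return {r: share for r in regions}

def run_loop(regions, total_budget, rounds, collect, retrain, evaluate):
    dataset, history = [], []
    per_round = total_budget / rounds
    for _ in range(rounds):
        plan = uniform_planner(regions, per_round)       # plan
        for region, amount in plan.items():
            dataset.extend(collect(region, amount))      # collect
        model = retrain(dataset)                         # retrain
        history.append(evaluate(model))                  # evaluate
    return history

# Toy stubs: "collecting" draws n samples from a region, "retraining"
# returns the dataset size, and "evaluating" reports it as a proxy metric.
history = run_loop(
    regions=["regionA", "regionB"],
    total_budget=8, rounds=2,
    collect=lambda region, amount: [(region, random.random())
                                    for _ in range(int(amount))],
    retrain=lambda data: len(data),
    evaluate=lambda model: model,
)
```

Swapping `uniform_planner` for a smarter, adaptive planner (milestones 4-5) leaves the rest of the loop unchanged, which makes baselines and candidate planners directly comparable.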

Candidate approaches (suggestions only, non-exhaustive)

Below we list some possible approaches to tackle the problem. Note, however, that these are only ideas; we have not tested any of them. You are, of course, free to explore other ideas or combinations of ideas.

  • Cost-sensitive contextual bandit: select regions with LinUCB or Thompson sampling (e.g., with reward = Δ(metric)/cost)
  • Submodular coverage: greedy facility-location over (RTT, throughput) bins to cover under-represented conditions (or any other diversity metric and “coverage maximization” approach)
  • Bayesian experimental design: allocate to maximize expected uncertainty/info-gain reduction per unit cost
  • Learning-curve + knapsack: fit per-region learning curves to allocate the next budget chunk optimally
  • Robust (min–max) allocation: hedge against uncertain deployment mixes and maximize worst-case generalization
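
To make the first bullet concrete, here is a simplified, non-contextual Thompson-sampling sketch with reward = Δ(metric)/cost. It is an untested illustration with hypothetical names; a contextual variant (LinUCB) would replace the per-region posteriors with a linear model over region features.

```python
import random

class CostAwareThompson:
    """Simplified (non-contextual) Thompson sampling over regions.

    Reward is the improvement in the target metric divided by the
    collection cost; each region keeps a crude Gaussian posterior over
    its mean reward. Hypothetical sketch, not a tested design.
    """
    def __init__(self, regions):
        self.stats = {r: {"n": 0, "mean": 0.0} for r in regions}

    def select(self):
        # Sample a plausible mean reward per region; pick the best.
        def sample(r):
            s = self.stats[r]
            sigma = 1.0 / (s["n"] + 1) ** 0.5  # shrinks with observations
            return random.gauss(s["mean"], sigma)
        return max(self.stats, key=sample)

    def update(self, region, delta_metric, cost):
        reward = delta_metric / cost
        s = self.stats[region]
        s["n"] += 1
        s["mean"] += (reward - s["mean"]) / s["n"]  # running mean
```

In the planning loop, `select()` would choose where to collect the next batch, and `update()` would be called after retraining with the observed metric improvement and the incurred cost.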

Requirements

  • Coding experience in Python
  • Knowledge of ML concepts and experience with ML frameworks (mainly PyTorch)
  • Experience with optimization/planning algorithms
  • An interest in networking and learning-based control algorithms

Note

This is a Master’s Thesis – depending on the results and your motivation, you will have the option to participate in writing a scientific publication on this topic.

Supervisors