References of "Wassermann, Sarah"
BIGMOMAL - Big Data Analytics for Mobile Malware Detection
Wassermann, Sarah ULiege; Casas, Pedro

Poster (2017, November)

Mobile malware is on the rise. Due to their popularity, smartphones represent an attractive target for cybercriminals, especially regarding unauthorized access to private user data; smartphones hold a lot of sensitive information about users, even more than a personal computer. Indeed, besides personal information such as documents, accounts, passwords, and contacts, smartphone sensors centralize other sensitive data such as user location and physical activities. In this paper, we study the problem of malware detection in smartphones, using supervised machine learning models and big data analytics frameworks. Using a publicly available dataset for smartphone data analysis (the SherLock data collection, see http://bigdata.ise.bgu.ac.il/sherlock/), we train and benchmark different supervised machine learning models to detect malware app activity. The SherLock data collection is a crowdsourcing-based smartphone dataset in which hundreds of features from many different "sensors" or vantage points within the device are monitored, using a tailored smartphone agent. The data were collected during a long-term (2 years, 2015/16) field trial on 50 smartphones used as the primary device by 50 different participants. The monitoring agent collects a wide variety of network, software, and sensor data at a high sample rate (sampling intervals as low as 5 seconds); in addition, participant devices include a sandbox-like smartphone agent which runs controlled malware apps, perpetrating attacks on the user's device (such as contacts theft, spyware, and phishing), while creating labels for the SherLock dataset. The complete labeled dataset contains more than 10 billion data records, with a total of about 4 TB of data. We additionally complement the labels for malicious apps which might have been installed by participants by analyzing the installed apps' hashes in VirusTotal (https://www.virustotal.com), a well-known multi-antivirus online scanning system.
From the complete dataset, we keep two specific feature categories: all features related to the network traffic generated by the apps, and all features corresponding to the footprint of the app on the CPU and internal running processes (e.g., statistics on CPUs, memory usage, and Linux-level process information). The rationale is that some malware activity would be more visible at the network traffic level, whereas other activity would be better identified at the local process level. Using this dataset, we train different machine learning models (e.g., decision trees, neural networks, and SVMs) and verify their accuracy in automatically spotting malicious apps running on the users' devices. We also apply feature selection strategies to improve results and reduce computation times. Given the size of the dataset, we rely on big data platforms (such as Spark) to perform the analysis, complementing the machine-learning-based analysis with scikit-learn-like pipelines. We evaluate three different aspects: (i) overall model performance (using multi-fold cross-validation on the complete dataset), (ii) generalization of the learned models across different users (train on N-1 users and test on the remaining user), and (iii) detection accuracy drift over time (train on the first month, test the resulting model on the subsequent months). Initial results are very promising, especially regarding overall model performance for decision-tree-based models.
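
Evaluation protocols (ii) and (iii) from the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields "user" and "month", the function names, and the toy records are all hypothetical, and plain Python lists stand in for the Spark-scale data.

```python
from collections import defaultdict


def leave_one_user_out(records):
    """Yield (held_out_user, train, test): train on N-1 users, test on the remaining one.

    Each record is a dict with a hypothetical 'user' key.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append(rec)
    for held_out in sorted(by_user):
        train = [r for user, recs in by_user.items() if user != held_out for r in recs]
        test = by_user[held_out]
        yield held_out, train, test


def temporal_split(records, first_test_month):
    """Train on records before first_test_month, test on all later records.

    'month' is a hypothetical integer index counted from the start of the trial.
    """
    train = [r for r in records if r["month"] < first_test_month]
    test = [r for r in records if r["month"] >= first_test_month]
    return train, test


# Toy records for illustration only (label 1 = malicious, 0 = benign).
records = [
    {"user": "u1", "month": 0, "label": 0},
    {"user": "u1", "month": 1, "label": 1},
    {"user": "u2", "month": 0, "label": 0},
    {"user": "u3", "month": 2, "label": 1},
]

splits = list(leave_one_user_out(records))
drift_train, drift_test = temporal_split(records, first_test_month=1)
```

Splitting by user rather than by record keeps one participant's data from appearing in both the training and the test set, which is what makes protocol (ii) a test of generalization across users.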

Anycast on the Move - A First Look at Mobile Anycast Performance
Wassermann, Sarah ULiege; Rula, John P.; Bustamante, Fabian

Poster (2017, November)

Service providers rely on replication to improve service performance and reliability, placing instances in multiple locations and redirecting clients to nearby copies. Anycast is a common mechanism used for redirecting clients in a variety of domains, from naming to CDNs and video streaming. IP anycast offers a method for making a service IP address available to a routing system from several locations at once, and clients' requests are directed based on BGP routing policies. For operators, IP anycast offers an economic, scalable, and simple approach to replicated services; BGP provides considerable robustness, adapting to changes in service and network availability. For clients, however, the mapping can be suboptimal, unstable, and seemingly chaotic, as routing policies are not driven by technical motivations alone, and routing changes can silently shift traffic from one site to another with a consequent loss of shared state and potential performance impact. Given its wide deployment and interesting tradeoffs, IP anycast has been the focus of much recent measurement work. All prior studies have, nevertheless, focused on wired networks despite the growing dominance of mobile Internet. Today, the number of mobile subscriptions is over 7.4 billion, and users spend over twice as many hours browsing on their smartphones as on any other device, with a corresponding increase in cellular traffic. We present early results on the first study of anycast performance for mobile users. Our evaluation focuses on two distinct anycast services, K- and F-Root, each providing part of the DNS root zone. Both services are widely replicated, with publicly available site locations and unicast IP addresses that allow us to evaluate the performance of anycast routing relative to its "optimal" (in terms of unicast) site location.
We collected active measurements from geographically distributed clients on both cellular and WiFi networks from September 2016 until April 2017, using Aqualab's ALICE engine [1]. In each experiment, clients launched ping and traceroute measurements at an hourly rate towards the root servers' anycast addresses, as well as towards five unicast addresses determined to be the closest to the client in terms of geographic distance. Clients also recorded their geographic location, anonymized to a 10 km² area. Our findings show that mobile clients are routed to suboptimal replicas in terms of geographical distance, more frequently while on a cellular connection than on WiFi, with a significant impact on perceived service performance. The phenomenon seems to be more pronounced for K-Root than for F-Root. A possible explanation for the long distances would be that our cellular clients are simply far away from all the available replicas. However, our investigations demonstrate that this is not necessarily the case. Finally, we start to explore the root causes of anycast anomalies in cellular networks. We reveal three classes of anomalies: distant client packet gateways, poor anycast routing within Tier-1 networks, and improper routing out of cellular networks. [1] http://aqualab.cs.northwestern.edu/projects/261-alice
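
The comparison between the replica reached through anycast and the geographically closest replica, described in the abstract above, can be illustrated with a great-circle distance computation. This is a hedged sketch only: the haversine formula, the spherical Earth radius of 6371 km, and the function names are assumptions introduced for illustration, and the abstract does not state how the authors computed distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extra_distance_km(client, anycast_site, unicast_sites):
    """Distance to the site chosen by anycast minus distance to the closest known site.

    All arguments are (lat, lon) tuples; unicast_sites is a non-empty list of them.
    """
    d_anycast = haversine_km(*client, *anycast_site)
    d_closest = min(haversine_km(*client, *site) for site in unicast_sites)
    return d_anycast - d_closest
```

A value near zero means anycast picked the geographically closest listed replica; a large positive value flags a client routed to a distant site even though a nearer one exists.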

Full Text
Peer Reviewed
Improving QoE Prediction in Mobile Video through Machine Learning
Casas, Pedro; Wassermann, Sarah ULiege

in Proc. 8th International Conference on Network of the Future (2017, November)

Despite the massive adoption of HTTP adaptive streaming technology, buffering is still the most harmful event for QoE in video streaming. Previous studies have shown that buffering is not only detrimental to the overall user experience, but is also highly correlated with viewer engagement. The occurrence of buffering is particularly critical in cellular networks and mobile video deployments, as network conditions are less stable and network resources more limited. In this context, monitoring and properly predicting the QoE of video streaming services becomes paramount to cellular network operators, who need to offer high quality levels to reduce the risk of customers churning due to quality dissatisfaction. In this paper, we present a novel approach to multi-dimensional QoE prediction in mobile video using machine learning models. Contrary to previous models for QoE prediction in video streaming, which are generally uni- or low-dimensional and model the impact of single video descriptors independently, we use a high-dimensional input space to model the impact of buffering and initial delay on QoE. We train and test the proposed models on a publicly available mobile video dataset, generated from subjective QoE tests with real viewers. Besides improving prediction performance, the proposed models show a clear influence of other buffering pattern descriptors generally neglected in previous models, in particular those linked to the occurrence of the last stalling event, shedding light on new KPIs to monitor for better QoE assessment in video streaming.

Full Text
Peer Reviewed
NETPerfTrace – Predicting Internet Path Dynamics and Performance with Machine Learning
Wassermann, Sarah ULiege; Casas, Pedro; Cuvelier, Thibaut ULiege et al

in Proceedings of Big-DAMA ’17 (2017, August)

We study the problem of predicting Internet path changes and path performance using traceroute measurements and machine learning models. Path changes are frequently linked to path inflation and performance degradation, hence the relevance of the problem. We introduce NETPerfTrace, an Internet Path Tracking system to forecast path changes and path latency variations. By relying on decision trees and using empirical distribution-based input features, we show that NETPerfTrace can predict (i) the remaining lifetime of a path before it actually changes and (ii) the number of path changes in a certain time period with relatively high accuracy. Through extensive evaluation, we demonstrate that NETPerfTrace substantially outperforms DTRACK, a previous system with the same prediction targets. NETPerfTrace also offers path performance forecasting capabilities. In particular, our tool can predict path latency metrics, providing a system which can not only predict path changes, but also forecast their impact in terms of performance variations. We release NETPerfTrace as open software to the networking community, as well as all evaluation datasets.

Full Text
Anycast-based DNS in Mobile Networks
Wassermann, Sarah ULiege

Master's dissertation (2017)

Anycast offers a method for making a service IP address available to a routing system from several locations at once. It is used today to provide important services, such as naming and content delivery, in an economic, scalable, and simple-to-operate manner. The appeal and clear benefits of anycast to service providers have motivated a number of recent experimental studies on its potential performance impact. All studies have, to the best of our knowledge, focused on wired networks, despite the growing dominance of mobile as the most common and sometimes only form of Internet access. In this thesis, we present the first study of anycast performance for mobile users. In particular, our evaluation focuses on three distinct anycast services: K- and F-Root, each providing part of the DNS root zone, and Google DNS. Our research revolves around three axes. First, we show that mobile clients are frequently routed to suboptimal replicas in terms of latency and that this issue is not limited to specific regions or ASes of the world. Second, we find that clients are often redirected to a DNS server hosted very far away from them. This happens more frequently while on a cellular connection than on WiFi, with a significant impact on performance. Our study reveals that this is not simply an issue of not having better alternatives, and that the problem is not localised to particular geographic areas or particular ASes. We investigate root causes of this phenomenon and describe three of the major detected classes of anycast anomalies. Third and finally, we explore IP assignment dynamics of mobile clients and find that recurrent IP changes on the client side lead to significant perceived variations of anycast latency.

Full Text
Predicting Internet Path Dynamics and Performance with Machine Learning
Wassermann, Sarah ULiege; Casas, Pedro; Cuvelier, Thibaut ULiege et al

Report (2017)

In this paper, we study the problem of predicting Internet path changes and path performance using traceroute measurements and machine learning models. Path changes are frequently linked to path inflation and performance degradation; therefore, predicting their occurrence is highly relevant for performance monitoring and dynamic traffic engineering. We introduce NETPerfTrace, an Internet Path Tracking system capable of forecasting path changes and path latency variations. By relying on decision trees and using empirical distribution-based input features, we show that NETPerfTrace can predict (i) the remaining lifetime of a path before it actually changes and (ii) the number of path changes in a certain time slot with high accuracy. Through extensive evaluation, we demonstrate that NETPerfTrace substantially outperforms DTRACK, a previous system with the same prediction targets. NETPerfTrace also offers path performance forecasting capabilities. In particular, it can predict path latency metrics, providing a system which can not only predict path changes but also forecast their impact in terms of performance variations. As an additional contribution, we release NETPerfTrace as open software to the networking community.

Full Text
Peer Reviewed
Machine Learning based Prediction of Internet Path Dynamics
Wassermann, Sarah ULiege; Casas, Pedro; Donnet, Benoît ULiege

in ACM CoNEXT Student Workshop: Irvine, 12 December 2016 (2016, December)

We study the problem of predicting Internet path changes and path performance using traceroute and machine-learning techniques. Path changes are frequently linked to path inflation and performance degradation. Therefore, predicting their occurrence could improve the analysis of path dynamics using traceroute. By relying on neural networks and using empirical distribution-based input features, we show that we are able to predict (i) the remaining lifetime of a path before it actually changes, and (ii) the number of path changes in a certain time slot with relatively high accuracy. We also show that it is possible to predict path performance in terms of latency, opening the door to novel, machine-learning-based approaches for RTT prediction.

Full Text
Peer Reviewed
On the Analysis of Internet Paths with DisNETPerf, a Distributed Paths Performance Analyzer
Wassermann, Sarah ULiege; Casas, Pedro; Donnet, Benoît ULiege et al

in Proc. 10th IEEE Workshop on Network Measurements (WNM) (2016, November)

Traceroute is the most widely used Internet path analysis tool today to study the topology of the Internet and to diagnose routing failures as well as poor performance events. A major limitation of traceroute when the destination is not controllable by the user is its inability to measure reverse paths, i.e., the path from any given destination back to the source. This is a major drawback for ISPs, who need to understand the performance of the Internet paths connecting popular services (e.g., YouTube and Facebook) to their customers. Even if public servers and distributed measurement platforms can provide partial reverse path visibility through ad-hoc measurements, there is still a need for a structured approach capable of analyzing the performance of Internet paths connecting any pair of nodes (servers, routers, hosts, etc.). While the problem of reverse traceroute has been addressed in the past, proposed techniques rely on IP address spoofing, which might lead to security concerns, and assume the availability of certain route-tracking options, which might not be available. In this paper, we introduce and evaluate DisNETPerf, a new tool which provides exactly the same type of information as traceroute, but for paths connecting arbitrarily selected nodes. DisNETPerf works by first locating the probes (i.e., measurement points) that are closest to a given target node, and then using them to perform traceroute measurements from the target's point of view to a given destination for path performance monitoring and troubleshooting purposes. We propose two techniques for probe location, and demonstrate that the reverse path (from server to users) can be measured with very high accuracy in certain scenarios. We also analyze relevant characteristics of Internet paths and distributed measurement platforms, which reinforce the applicability and relevance of DisNETPerf in the current Internet.

Full Text
Reverse Traceroute with DisNETPerf, a Distributed Internet Paths Performance Analyzer
Wassermann, Sarah ULiege; Casas, Pedro

in Proc. Demonstrations of the 41st Annual IEEE Conference on Local Computer Networks (LCN-Demos 2016) (2016)

Traceroute is the most widely used Internet path diagnosis tool today. A major limitation of traceroute when the destination is not controllable by the user is its inability to measure reverse paths, i.e., the path from a destination back to the source. In this demo session, we showcase DisNETPerf, a new tool to perform reverse traceroute measurements. DisNETPerf is able to collect measurements from the server to the user for path performance monitoring and troubleshooting purposes, even when the server is not under the control of the experimenter. DisNETPerf uses RIPE Atlas, a widely distributed active measurement platform, to perform traceroute measurements from any arbitrarily selected server in the Internet.

Full Text
Peer Reviewed
Unveiling Network and Service Performance Degradation in the Wild with mPlane
Casas, Pedro; Fiadino, Pierdomenico; Wassermann, Sarah ULiege et al

in IEEE Communications Magazine - Network Testing Series (2016)

Unveiling network and service performance issues in complex and highly decentralized systems such as the Internet is a major challenge. Indeed, the Internet is based on decentralization and diversity. However, its distributed nature leads to operational brittleness and difficulty in identifying the root causes of performance degradation. In such a context, network measurements are a fundamental pillar for shedding light on and unveiling design and implementation defects. To tackle this fragmentation and visibility problem, we have recently conceived mPlane, a distributed measurement platform which runs, collects, and analyses traffic measurements to study the operation and functioning of the Internet. In this paper, we show the potential of the mPlane approach to unveil network and service degradation issues in live, operational networks, involving both fixed-line and cellular networks. In particular, we combine active and passive measurements to troubleshoot problems in end-customer Internet access connections, or to automatically detect and diagnose anomalies in Internet-scale services (e.g., YouTube) which impact a large number of end-users.

Full Text
Peer Reviewed
Towards DisNETPerf: a Distributed Internet Paths Performance Analyzer
Wassermann, Sarah ULiege; Casas, Pedro; Donnet, Benoît ULiege

in ACM CoNEXT Student Workshop: Heidelberg, 1 December 2015 (2015, December)

For more than 25 years now, traceroute has demonstrated its supremacy for network-path measurement, becoming the most widely used Internet path diagnosis tool today. A major limitation of traceroute when the destination is not controllable by the user is its inability to measure reverse paths, i.e., the path from a destination back to the source. Proposed techniques to address this issue rely on IP address spoofing, which might lead to security concerns. In this paper, we introduce and evaluate DisNETPerf, a new tool for locating the probes that are closest to a distant server. Those probes are then used to collect data from the server's point of view to the service user for path performance monitoring and troubleshooting purposes. We propose two techniques for probe location, and demonstrate that the reverse path can be measured with very high accuracy in certain scenarios.
