Research

Current Projects

Fairness in Machine Learning with Human in the Loop (NSF/Amazon FAI, 2021-2024)

Abstract
Despite early successes and significant potential, algorithmic decision-making systems often inherit and encode biases that exist in the training data and/or the training process.  It is thus important to understand the consequences of deploying and using machine learning models and provide algorithmic treatments to ensure that such techniques will ultimately serve a social good. While recent works have looked into the fairness issues in AI, the long-term impact of automated decision making remains unclear. The understanding of the long-term impact of a fair decision provides guidelines to policy-makers when deploying an algorithmic model in practice and is critical to its trustworthiness and adoption; it will also drive the design of algorithms with an eye toward the welfare of both the makers and the users of these algorithms, with an ultimate goal of achieving more equitable outcomes.  This project aims to understand the long-term impact of fair decisions made by automated machine learning algorithms via establishing an analytical, algorithmic, and experimental framework that captures the sequential learning and decision process, the actions and dynamics of the underlying user population, and its welfare. This knowledge will help design the right fairness criteria and intervention mechanisms throughout the life cycle of the decision-action loop to ensure long-term equitable outcomes. Central to this project’s intellectual inquiry is the focus on human in the loop, i.e., an AI-human feedback loop with automated decision-making that involves human participation in its life cycle. Our focus on long-term impacts of fair algorithmic decision-making while explicitly modeling and incorporating human agents in the loop provides a theoretically rigorous framework to understand how an algorithmic decision-maker fares in the foreseeable future.

A Deep Learning Framework for Intelligent Active and Passive Measurements in the Age of Internet of Things (NSF SaTC, 2020-2023) 

Abstract
The rapid proliferation of Internet-connected devices, especially Internet-of-Things (IoT) devices, has led to mounting concerns regarding their security and the security of the Internet. This project seeks to harness the power of big data analytics and machine/deep learning to enhance Internet measurement techniques and associated information processing, making them more scalable and efficient and enabling them to produce more actionable information. It aims to develop techniques for automated monitoring and data analysis to gain insight into the range of Internet-connected devices, their security vulnerabilities, and the ever-changing activities of malicious entities on the public Internet. The ensuing information will help software/hardware vendors and Internet-connected entities identify vulnerabilities and protect themselves against cyber-attacks, and move toward a more secure and transparent Internet. Technologically, this project aims to significantly advance the state of the art in using active and passive measurements to (1) effectively monitor and track Internet devices, (2) accelerate scans and improve their efficacy, and (3) design and develop an intelligent honeypot that can learn responses mimicking a wide range of vulnerable devices, in order to fool attackers into engaging and revealing their attack vectors. The project seeks to develop software, as well as deep learning and other machine learning models, to build new Internet measurement capabilities and process datasets captured from passive/active measurements to distill data consumable by machine learning algorithms and instrumental in security analysis and network monitoring. The resulting automated tools can monitor the Internet in a continuous manner, to maintain an up-to-date view of the devices/machines that comprise the Internet, susceptible and infected devices, and vulnerabilities that are being actively exploited in the wild.
The final result of this project is a generalized framework of interconnected components that applies deep learning to active/passive network measurements to gain actionable insights with respect to the Internet and its security, a set of scalable tools that model and enable real-time decision making regarding Internet addresses and network traffic, and a large number of raw and curated datasets shared with the research community, while protecting the privacy and security of all parties involved. Automatically detecting software/hardware vulnerabilities as exploits are observed by these techniques allows vendors and network administrators to address critical vulnerabilities, while enhancing intrusion detection and DDoS mitigation techniques. The data can also transform risk assessment techniques for gauging the security of networks by exposing host-level risk factors, for self-assessment as well as assisting third-party assessment, e.g., by security vendors and cyber insurance underwriters.

Network-Level Security Posture Assessment and Predictive Analytics: From Theory to Practice (NSF SaTC TTP, 2016-2020)

Abstract
This project aims to address the following two questions: (1) how to assess the security condition of a network, and (2) to what extent we can predict data breaches or other cyber security incidents for an organization. The ability to do so has far-reaching social and economic impact: data has become an ever more important asset in any business, and recent data breaches such as those at Target, JP Morgan, Home Depot, Office of Personnel Management (OPM), and Anthem Healthcare, to name just a few, highlight the increasing social and economic impact of such cyber incidents. Often, by the time a breach is detected, it is too late and the damage has already occurred. Consequently, being able to predict such incidents accurately can greatly enhance an organization’s ability to put preventative and proactive measures in place and make much more judicious and effective resource allocation decisions in doing so. In addition, the answer to these questions has enormous implications for policy design, not only security policies, but also various incentive mechanisms aimed at encouraging the adoption of better security policies and cybersecurity frameworks, including cyber insurance, liability limitation, and rate recovery, among others; see, e.g., the Presidential Policy Directive PPD-21. This project follows a comprehensive agenda aimed at transitioning to practice technologies developed by the research team over the past few years in the domain of quantitative assessment of security posture on a network or organizational level, and the use of such assessment for the purpose of forecasting cyber security incidents.
The technological innovation is a sound quantitative framework that combines a large collection of cybersecurity data, novel data processing methods, advanced machine learning techniques, and extensive cybersecurity domain expertise to produce accurate predictions of security incidents for a given organization, thereby providing tangible information and crucial input for decision makers such as an insurance underwriter or an enterprise customer seeking to validate a vendor.

EAGER: Theory and Practice of Risk-Informed Cyber Insurance Policies: Risk Dependency, Risk Aggregation, and Active Threat Landscape (NSF, 2019-2021)

Abstract
This project aims to tackle some of the most significant challenges facing the design and adoption of risk-informed cyber insurance policies; these challenges include cyber risk interdependence, correlated risk and value-at-risk, and a fast-changing threat landscape. The proposed research has the potential to bring about a paradigm shift in the design of cyber insurance policies so that they are used as effective economic and incentive mechanisms consistent with cyber risk realities; in doing so it also introduces new ways of thinking about cybersecurity in a holistic, risk management context. Consequently, the proposed research has a direct impact on the current practice of cyber insurance carriers and thus the potential to dramatically change the status quo. It also has broader impacts on public policy and incentive mechanism design aimed at encouraging the adoption of better cybersecurity frameworks.

Connected Testbeds for Connected Vehicles (NSF CPS Synergy)

Abstract
This research team envisions that connected testbeds, i.e., remotely accessible testbeds integrated over a network in closed loop, will provide an affordable, repeatable, scalable, and high-fidelity solution for early cyber-physical evaluation of connected automated vehicle (CAV) technologies. Engineering testbeds are critical for empirical validation of new concepts and transitioning new theory to practice. However, the high cost of establishing new testbeds or scaling up existing ones hinders their wide utilization. Enabling high-fidelity cyber-integration of existing but geographically dispersed testbeds can dramatically increase accessibility to engineering experimentation, just as the Internet dramatically increased accessibility to information. This project aims to develop a scientific foundation to support this vision and demonstrate its utility for developing CAV technologies. This application is significant because a synergistic combination of connected vehicles and automated driving technologies is poised to transform the sustainability of our transportation system; automated driving technologies can leverage the information available from vehicle-to-vehicle (V2V) connectivity in optimal ways to dramatically reduce fuel consumption and emissions. However, state-of-the-art simulation and experimental capabilities fall short of addressing the need for realistic, repeatable, scalable, and affordable means to evaluate new CAV concepts and technologies. On the one hand, purely simulation-based studies could be off by as much as 27% in terms of fuel economy and as much as 350% in terms of emissions. On the other hand, experimental studies with fleets of vehicles are very expensive and not easily repeatable. In addition, the literature extensively exploits connectivity to improve traffic flow, but there is also a vast untapped potential for leveraging the information available from connectivity at the powertrain level to increase sustainability.
Thus, the goal of this project is to enable a high-fidelity integration of geographically dispersed powertrain testbeds and use this novel experimental capability to develop and test powertrain-level strategies to increase sustainability benefits of CAVs.

Other current projects:

Past Projects