SPACE Lab - Boston University


Applications must abide by a variety of security, privacy, and data use policies. These policies may reflect business requirements, codify end-user terms of service, or reflect requirements from data protection laws, such as the GDPR.

Currently, application developers often rely on ad-hoc approaches to achieve this, for example by adding explicit policy checks or by implicitly reflecting these requirements in their code. However, this approach is tedious and error-prone, and it often causes serious violations that result in reputational harm or heavy fines.

The goal of this research direction is to assist application developers in ensuring that their applications meet their desired policies. We achieve this by building new off-the-shelf systems and abstractions, such as databases, distributed systems, web frameworks, and compilation toolchains, that provide compliance and enforcement guarantees by design. We carefully design these systems to be easy to use, expressive, and efficient.

Scalable web applications are built on complex distributed architectures using remote services and microservices, which makes privacy and security properties hard to reason about. This leads to oversights, inconsistent enforcement, and data leaks. Existing enforcement and compliance solutions fail to accommodate the needs of common distributed deployments, such as flexibility and performance.

Tahini is a framework that provides end-to-end policy enforcement in distributed applications. It builds on local process-level enforcement systems, such as Sesame, and extends their enforcement guarantees to the entire distributed application end-to-end.

Tahini upholds end-to-end guarantees while keeping services decoupled by providing a safe and explicit policy transformation abstraction, and a lightweight, non-intrusive attestation protocol that ensures that caller-approved policy configurations are used by the remote servers at runtime.
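The idea of a caller checking that a remote server runs a caller-approved policy configuration can be illustrated with a toy sketch. This is not Tahini's protocol: the function names, the JSON configuration format, and the shared key (standing in for a real attestation root of trust) are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def config_digest(config: dict) -> str:
    """Hash a policy configuration in a canonical (key-sorted) form."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def server_attest(config: dict, nonce: bytes, key: bytes) -> str:
    """Hypothetical server side: bind the running config's digest to a fresh nonce."""
    message = nonce + config_digest(config).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def caller_verify(approved_digest: str, nonce: bytes, key: bytes, report: str) -> bool:
    """Hypothetical caller side: accept only if the report matches the approved digest."""
    message = nonce + approved_digest.encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, report)


# Example: the caller approved one configuration; a server running a
# different one fails verification. All values here are made up.
KEY = b"demo-shared-key"            # assumption: stands in for attestation hardware
NONCE = b"fresh-nonce-0001"         # assumption: would be random per request
approved = {"email": "owner-only", "age": "aggregate-only"}
tampered = {"email": "public", "age": "aggregate-only"}

approved_digest = config_digest(approved)
ok = caller_verify(approved_digest, NONCE, KEY, server_attest(approved, NONCE, KEY))
bad = caller_verify(approved_digest, NONCE, KEY, server_attest(tampered, NONCE, KEY))
```

The nonce keeps a stale report from being replayed; a real deployment would derive trust from something stronger than a shared key.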

Systems like Resin and Sesame track policies and data within the application process and ensure data leaves the process only when the policy allows it. This works well for simple applications, but is not ideal for applications where data frequently leaves the boundary of a process (but not the application), for example when the application relies on underlying databases or processing systems to store and query data.

We aim to extend the protections of policy enforcement systems to the entire application, including databases and query engines, rather than just a single application process. This provides stronger guarantees, ensures policies are persisted alongside their data in long-term storage, and radically improves ergonomics for application developers.

Using user data to train machine learning models complicates user expectations and legal requirements around data deletion. Even if the data is deleted, its effects in (and the potential to reconstruct it from) machine learning models remain. Recent advances such as SISA and selective scrubbing limit how much influence a single data item in the training set has on a model and enable "unlearning" it after training.
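The sharding idea behind approaches like SISA can be sketched in a few lines: train one model per disjoint shard, aggregate their predictions, and on deletion retrain only the shard that held the deleted item. The sketch below is a deliberate simplification. The "model" is just a per-shard mean, slicing and checkpointing are omitted, and the class and method names are hypothetical.

```python
from statistics import mean


class ShardedEnsemble:
    """Toy sharded training: each shard has its own isolated 'model'."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]  # item_id -> value
        self.models = [None] * num_shards
        self.retrain_count = 0  # counts per-shard training runs

    def _shard_of(self, item_id: int) -> int:
        return item_id % len(self.shards)

    def add(self, item_id: int, value: float) -> None:
        self.shards[self._shard_of(item_id)][item_id] = value

    def _train_shard(self, idx: int) -> None:
        values = list(self.shards[idx].values())
        self.models[idx] = mean(values) if values else None
        self.retrain_count += 1

    def train_all(self) -> None:
        for idx in range(len(self.shards)):
            self._train_shard(idx)

    def predict(self) -> float:
        # Aggregate by averaging the models of all non-empty shards.
        return mean(m for m in self.models if m is not None)

    def unlearn(self, item_id: int) -> None:
        # Only the shard that saw the item is retrained.
        idx = self._shard_of(item_id)
        del self.shards[idx][item_id]
        self._train_shard(idx)


ensemble = ShardedEnsemble(num_shards=3)
for i in range(6):
    ensemble.add(i, float(i))
ensemble.train_all()
before = ensemble.predict()
ensemble.unlearn(4)
after = ensemble.predict()
```

Because shards are isolated, the deleted item can only have influenced one constituent model, so the cost of unlearning is one shard's retraining rather than a full retrain.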

Building on our earlier work on K9db, we are building an end-to-end system that automatically manages and tracks user data used in model training, and integrates unlearning into data deletion and subject access request workflows, while retaining good model performance and utility.

We are interested in the interplay of unlearning with anonymization and statistical privacy techniques, such as differential privacy, and how we can use our work to influence and inform the attitudes of end-users, companies, and regulators on data deletion in practice.

Recent work surveyed several existing compliance and enforcement systems, and concluded that no system supports the totality of the requirements imposed by the GDPR (e.g., data deletion and subject access requests, purpose limitation, consent handling, transparency, etc.).

Is it possible for a system to cover all these requirements? We plan to investigate this by creating a system that tightly couples K9db and Sesame, offering a single easy-to-use interface for specifying all compliance-related policies.

We will use this system as a starting point, and identify and add new components that may be needed to cover important compliance requirements that are left unresolved. Evaluating such a system likely requires conducting a serious developer study to measure how much of the GDPR it effectively helps developers cover in practice.

Applications are required to transparently inform end users of the scope and purpose for which data is collected. They also must acquire and track informed consent as well as other privacy and security preferences from end users. This is true whether applications use ad-hoc approaches or compliance systems, such as K9db or Sesame.

Manually designing human-facing privacy policy descriptions and implementing code to collect and manage end-user consent is error-prone. Human-facing policies may become obsolete after application changes, and bugs in the collection or management of consent and preferences may cause users' preferences to be violated. Furthermore, end-users have few guarantees that the application indeed abides by the privacy policy or that it respects the stated user preferences.

We aim to build end-user-facing extensions for server-side compliance systems. These systems take on the responsibility of managing and collecting user preferences and generating human-readable privacy policies corresponding to the formal policies governing the application. We will also investigate extending policy enforcement to client-side application code and using remote attestation to give the end-user strong guarantees about the application server-side code and behavior.

Web applications are governed by privacy policies, but developers lack practical abstractions to ensure that their code actually abides by these policies. This leads to frequent oversights, bugs, and costly privacy violations.

Sesame is a practical framework for end-to-end privacy policy enforcement. Sesame wraps data in policy containers that associate data with policies that govern its use. Policy containers force developers to use privacy regions when operating on the data, and Sesame combines sandboxing and a novel static analysis to prevent privacy regions from leaking data. Sesame enforces a policy check before externalizing data, and it supports custom I/O via reviewed, signed code.
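The policy-container idea can be illustrated with a minimal sketch. This is not Sesame's API: the class names, the `map` and `release` methods, and the context dictionary are hypothetical. Python also cannot actually prevent a closure from leaking data, which is the part Sesame addresses with sandboxing and static analysis.

```python
from dataclasses import dataclass
from typing import Callable


class PolicyViolation(Exception):
    """Raised when data would be externalized against its policy."""


@dataclass(frozen=True)
class Policy:
    name: str
    allows: Callable[[dict], bool]  # decides based on a release context


class PolicyContainer:
    """Wraps a value so it can only be used via regions or a checked release."""

    def __init__(self, value, policy: Policy):
        self.__value = value  # name-mangled to discourage direct access
        self.policy = policy

    def map(self, region: Callable):
        # A 'privacy region': the closure sees the raw value, but its result
        # stays wrapped under the same policy.
        return PolicyContainer(region(self.__value), self.policy)

    def release(self, context: dict):
        # Policy check at the point where data leaves the container.
        if not self.policy.allows(context):
            raise PolicyViolation(f"policy {self.policy.name!r} denies release")
        return self.__value

    def __repr__(self) -> str:
        return f"PolicyContainer(<redacted>, policy={self.policy.name!r})"


owner_only = Policy("owner-only", lambda ctx: ctx.get("viewer") == "alice")
email = PolicyContainer("alice@example.com", owner_only)
domain = email.map(lambda e: e.split("@")[1])  # result is still protected
```

Here `domain.release({"viewer": "alice"})` yields the derived value, while a release with any other viewer raises `PolicyViolation`.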

Experience with four web applications shows that Sesame's automated guarantees cover 95% of application code, with the remaining 5% needing manual review. Sesame achieves this with reasonable application developer effort and imposes 3–10% performance overhead (10–55% with sandboxes).

Data privacy laws like the EU's GDPR grant users new rights, such as the right to request access to and deletion of their data. Manual compliance with these requests is error-prone and imposes costly burdens, especially on smaller organizations, while non-compliance risks steep fines.

K9db is a new, MySQL-compatible database that complies with privacy laws by construction. The key idea is to make the data ownership and sharing semantics explicit in the storage system. This requires K9db to capture and enforce applications' complex data ownership and sharing semantics, but in exchange simplifies privacy compliance. Using a small set of schema annotations, K9db infers storage organization, generates procedures for data retrieval and deletion, and reports compliance errors if an application risks violating the GDPR.
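To illustrate how explicit ownership information can drive generated retrieval and deletion procedures, here is a minimal in-memory sketch. It does not use K9db's annotation syntax or storage layout: the `OWNED_BY` mapping, the table names, and both functions are hypothetical, and the sketch allows only one owner per table, ignoring the shared-ownership cases K9db handles.

```python
# Hypothetical ownership annotations: table -> (foreign key column, owning table).
OWNED_BY = {
    "posts": ("author_id", "users"),
    "comments": ("post_id", "posts"),
}


def owned_rows(db: dict, schema: dict, subject_table: str, subject_id: int) -> dict:
    """Collect every row transitively owned by one data subject (access request)."""
    result = {subject_table: [r for r in db[subject_table] if r["id"] == subject_id]}
    frontier = [subject_table]
    while frontier:
        parent = frontier.pop()
        parent_ids = {r["id"] for r in result[parent]}
        for table, (fk, target) in schema.items():
            if target == parent and table not in result:
                result[table] = [r for r in db[table] if r[fk] in parent_ids]
                frontier.append(table)
    return result


def forget(db: dict, schema: dict, subject_table: str, subject_id: int) -> dict:
    """Delete everything the subject owns and return what was removed."""
    removed = owned_rows(db, schema, subject_table, subject_id)
    for table, rows in removed.items():
        ids = {r["id"] for r in rows}
        db[table] = [r for r in db[table] if r["id"] not in ids]
    return removed


db = {
    "users": [{"id": 1}, {"id": 2}],
    "posts": [{"id": 10, "author_id": 1}, {"id": 11, "author_id": 2}],
    "comments": [{"id": 100, "post_id": 10}, {"id": 101, "post_id": 11}],
}
removed = forget(db, OWNED_BY, "users", 1)
```

The same traversal serves both rights: returning `owned_rows` answers a subject access request, and `forget` implements deletion over the identical set of rows.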

Our K9db prototype successfully expresses the data sharing semantics of real web applications, and guides developers to getting privacy compliance right. K9db also matches or exceeds the performance of existing storage systems, at the cost of a modest increase in state size.

Privacy Enforcement and Compliance Web Systems

Active Projects

Tahini: End-to-end policy enforcement in Distributed Applications Systems

SesameBun: Policy Tracking and Permanence in SQL-backed Web Applications Systems Databases

Data Deletion and Machine Unlearning Systems Databases Machine Learning

Future Ideas

Extending Policy Enforcement Systems to End-Users Systems Usable Security & Privacy

Past Projects

Sesame: Practical End-to-End Privacy Compliance with Policy Containers and Privacy Regions Systems Static Analysis

K9db: Privacy-Compliant Storage For Web Applications By Construction Systems Databases