You are all cordially invited to the AMLab seminar on Tuesday December 12 at 16:00 in C3.163, where Giorgio Patrini will give a talk titled “Federated learning on vertically partitioned data via entity resolution and homomorphic encryption”. Afterwards there are the usual drinks and snacks!
Consider two data providers, each maintaining private records of different feature sets about common entities. They aim to learn a linear model jointly in a federated setting: the data stays local, and a shared model is trained from locally computed updates. In contrast with most work on distributed learning, in this scenario (i) data is split vertically, i.e. by features, (ii) only one data provider knows the target variable, and (iii) entities are not linked across the data providers. Hence, on top of the challenge of private learning, we must deal with the potentially negative consequences of mistakes in entity resolution.
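To make the vertically partitioned setting concrete, here is a minimal sketch of one logistic-regression gradient computation split across two providers. It is only an illustration under strong assumptions: the toy data, the variable names (`X_a`, `X_b`, `w_a`, `w_b`) and the helper functions are hypothetical, the rows are assumed to be already aligned by entity, and all values are exchanged in the clear, which is exactly what the system described in the talk avoids.

```python
import math

# Hypothetical toy data: row i of X_a and row i of X_b describe the same entity.
X_a = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]  # features held by provider A
X_b = [[0.2], [1.0], [-0.7]]                  # features held by provider B
y = [1, 0, 0]                                 # labels, known to provider A only
w_a = [0.1, -0.2]                             # A's block of the linear model
w_b = [0.3]                                   # B's block of the linear model


def partial_scores(X, w):
    """Each provider computes w_block . x_block on its own features."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(X, residuals):
    """Gradient of the mean logistic loss w.r.t. one provider's weights."""
    n = len(X)
    return [
        sum(r * row[j] for r, row in zip(residuals, X)) / n
        for j in range(len(X[0]))
    ]


# The full linear score is the sum of the two partial scores.
scores = [a + b for a, b in zip(partial_scores(X_a, w_a), partial_scores(X_b, w_b))]

# The residuals need the labels, so only the label holder can form them.
residuals = [sigmoid(s) - t for s, t in zip(scores, y)]

# Given the residuals, each provider needs only its own features.
grad_a = local_gradient(X_a, residuals)
grad_b = local_gradient(X_b, residuals)
```

Concatenating `grad_a` and `grad_b` gives the same gradient a centralized learner would compute on the joined feature matrix. The partial scores and residuals are the quantities that would have to be protected in a private protocol.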
Our contribution is twofold. First, we describe a three-party end-to-end solution in two phases (privacy-preserving entity resolution, followed by federated logistic regression over messages encrypted with an additively homomorphic scheme) that is secure against an honest-but-curious adversary. The system allows learning without either exposing data in the clear or sharing which entities the data providers have in common. Our implementation is as accurate as a naive non-private solution that brings all data into one place, and it scales to problems with millions of entities and hundreds of features. Second, we provide a formal analysis of the impact of entity resolution on learning.
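For readers unfamiliar with additively homomorphic encryption, the property can be sketched with a toy version of the Paillier cryptosystem. The abstract does not name the scheme used in the talk, so Paillier is only a well-known example here. The function names are hypothetical, and the tiny hard-coded primes make this completely insecure; it is meant purely to show that ciphertexts can be added, and multiplied by plaintext constants, without decrypting. It assumes Python 3.8 or later for the modular inverse via `pow(x, -1, n)`.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation; p and q are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, with L(x) = (x - 1) // n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer m in [0, n) with fresh randomness r."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale(pub, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod n)."""
    n, _ = pub
    return pow(c, k % n, n * n)


pub, priv = keygen()
total = add(pub, encrypt(pub, 20), encrypt(pub, 22))
scaled = scale(pub, encrypt(pub, 7), 6)
print(decrypt(pub, priv, total), decrypt(pub, priv, scaled))
```

Additions and multiplications by known constants are exactly the operations needed to combine linear-model terms, which is why a party without the private key can aggregate encrypted values without seeing them. Handling real-valued features would additionally require an encoding of floats as integers, which this sketch omits.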