Social Graph Anonymization

Huu-Hiep Nguyen

Résumé

Privacy is a serious concern of users in daily usage of social networks. Social networks are a valuable data source for large-scale studies on social organization and evolution and are usually published in anonymized forms. This thesis addresses three privacy problems of social networks: graph anonymization, private community detection and private link exchange. First, we tackle the problem of graph anonymization via uncertainty semantics and differential privacy. As for uncertainty semantics, we propose a general obfuscation model called Uncertain Adjacency Matrix (UAM) that keep expected node degrees equal to those in the unanonymized graph. We analyze two recently proposed schemes and show their fitting into the model. We also present our scheme Maximum Variance (MaxVar) to fill the gap between them. Using differential privacy, the problem is very challenging because of the huge output space of noisy graphs. A large body of existing schemes on differentially private release of graphs are not consistent with increasing privacy budgets as well as do not clarify the upper bounds of privacy budgets. In this thesis, such a bound is provided. We introduce the new linear scheme Top-m-Filter (TmF) and improve the existing technique EdgeFlip. Thorough comparative evaluation on a wide range of graphs provides a panorama of the state-of-the-art's performance as well as validates our proposed schemes. Second, we present the problem of community detection under differential privacy. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation (LouvainDP) and algorithm perturbation (ModDivisive)

La vie privée est une préoccupation des utilisateurs des réseaux sociaux. Les réseaux sociaux sont une source de données précieuses pour des analyses scientifiques ou commerciales. Cette thèse aborde trois problèmes de confidentialité des réseaux sociaux: l'anonymisation de graphes sociaux, la détection de communautés privées et l'échange de liens privés. Nous abordons le problème d'anonymisation de graphes via la sémantique de l'incertitude et l'intimité différentielle. Pour la première, nous proposons un modèle général appelé Uncertain Adjacency Matrix (UAM) qui préserve dans le graphe anonymisé les degrés des nœuds du graphe non-anonymisé. Nous analysons deux schémas proposés récemment et montrons leur adaptation dans notre modèle. Nous aussi présentons notre approche dite MaxVar. Pour la technique d'intimité différentielle, le problème devient difficile en raison de l'énorme espace des graphes anonymisés possibles. Un grand nombre de systèmes existants ne permettent pas de relâcher le budget contrôlant la vie privée, ni de déterminer sa borne supérieure. Dans notre approche nous pouvons calculer cette borne. Nous introduisons le nouveau schéma Top-m-Filter de complexité linéaire et améliorons la technique récente EdgeFlip. L'évaluation de ces algorithmes sur une large gamme de graphes donne un panorama de l'état de l'art. Nous présentons le problème original de la détection de la communauté dans le cadre de l'intimité différentielle. Nous analysons les défis majeurs du problème et nous proposons quelques approches pour les aborder sous deux angles: par perturbation d'entrée (schéma LouvainDP) et par perturbation d'algorithme (schéma ModDivisive)

Social Graph Anonymization

Anonymisation de graphes sociaux

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager