Infrastructure and Algorithms for Information Retrieval Based On Social Network Analysis/Mining

Mohamed Reda Bouadjenek

Abstract

Nowadays, the Web has evolved from a static Web, where users were only able to consume information, to a Web where users are also able to produce information. This evolution is commonly known as the Social Web or Web 2.0. Social platforms and networks are certainly the most widely adopted technologies in this new era. These platforms are commonly used to interact with peers, exchange messages, share resources, etc. These collaborative tasks, which make users more active in generating content, are thus one of the most important factors behind the ever-growing quantity of available data. From a research perspective, this brings important and interesting challenges for many research fields. In such a context, a crucial problem is to enable users to find relevant information with respect to their interests and needs. This task is commonly referred to as Information Retrieval (IR). IR is performed every day on the Web, typically through a search engine. However, classic IR models do not consider the social dimension of the Web. They model web pages as a mixture of static, homogeneous terms generated by the same creators. Ranking algorithms are then often based on: (i) the textual similarity between the query and the document and (ii) the existing hypertext links that connect web pages, e.g. PageRank. Therefore, classic IR models, and even the IR paradigm itself, should be adapted to the socialization of the Web in order to fully leverage the social context that surrounds web pages and users. This thesis presents several approaches that go in this direction. In particular, it introduces three methods: (i) a Personalized Social Query Expansion (PSQE) framework, which achieves social and personalized expansions of a query with respect to each user, i.e. for the same query, different users obtain different expanded queries; (ii) a Personalized Social Document Representation (PSDR) framework, which uses social information to enhance documents and provide each user with a personalized social representation of them; and (iii) a Social Personalized Ranking function called SoPRa, which takes into account social features related to users and documents. All these approaches scale to large datasets, are flexible and adaptable to the high dynamicity of social data, and are efficient; they have been extensively evaluated and compared to the closest related work. From a practical point of view, this thesis led to the development of an experimental social Web search engine called LAICOS, which includes all the algorithms developed throughout this thesis.
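The classic ranking that the abstract contrasts with social IR combines (i) query-document text similarity and (ii) link analysis such as PageRank. As a hedged illustration of that classic baseline only (not of PSQE, PSDR, or SoPRa), the sketch below scores documents with TF-IDF cosine similarity and power-iteration PageRank, then mixes the two linearly. The mixing weight `alpha`, the smoothed IDF, the max-normalization of PageRank, and the toy documents and links are all assumptions chosen for the example; production engines combine many more signals.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each page to its outgoing links.

    Every link target is assumed to also be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_pr = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * pr[page] / len(outgoing)
                for target in outgoing:
                    new_pr[target] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for target in pages:
                    new_pr[target] += damping * pr[page] / n
        pr = new_pr
    return pr


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (raw term frequency x smoothed IDF)."""
    n = len(docs)
    tokens = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(term for words in tokens.values() for term in set(words))
    idf = {t: math.log(n / count) + 1.0 for t, count in df.items()}
    vectors = {
        d: {t: tf * idf[t] for t, tf in Counter(words).items()}
        for d, words in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classic_rank(query, docs, links, alpha=0.7):
    """Rank documents by alpha * text similarity + (1 - alpha) * link score.

    `alpha` is a hypothetical mixing weight; PageRank is rescaled by its
    maximum so that both signals lie in [0, 1].
    """
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection carry no IDF and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(query.lower().split()).items() if t in idf}
    pr = pagerank(links)
    top = max(pr.values())
    scores = {
        d: alpha * cosine(q, vectors[d]) + (1 - alpha) * pr[d] / top
        for d in docs
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection (hypothetical data, for illustration only).
docs = {
    "a": "social web search engine",
    "b": "personalized query expansion with social tags",
    "c": "classic information retrieval models",
}
links = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
print(classic_rank("social search", docs, links))  # ['a', 'b', 'c']
```

Document "a" wins on text similarity, while "b" edges out "c" thanks to its higher PageRank. Note that every user issuing "social search" gets the same ranking here: this is exactly the non-social, non-personalized behavior the thesis sets out to move beyond.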

With the emergence of the Social Web, the Web has evolved from a static Web, where users were only able to consume information, to a Web where users are also able to produce information. This evolution is known as the Social Web or Web 2.0. Web 2.0 has thus given users new freedoms in their relationship with the Web, allowing them to interact with other users who share the same interests. Social platforms and networks (such as MySpace, Facebook, or LinkedIn) and collaborative tagging platforms (such as CiteULike, Flickr, or delicious) are certainly the most widely adopted technologies in this new context. These platforms allow users to interact, exchange messages, share resources (photos and videos), comment on information, create and maintain profiles, interact through applications, etc. Beyond these social platforms dedicated to interaction between users, traditional websites dedicated to providing information (such as online newspapers) tend to become more social by giving users ways to share, comment on, build, and link documents [AYLY09, AYHY09], e.g. via Facebook's Like button. This has also been facilitated by initiatives such as OpenID and OpenSocial. These collaborative tasks, which let users be more active in generating content, are thus one of the most important factors in the constant growth of data. From a research perspective, this poses important and interesting challenges for many research fields, such as information retrieval, databases, and data mining, where research directions are mainly driven by: (i) the huge amount of available data and (ii) the potentially useful knowledge latent in these data.

French title: Infrastructure et Algorithmes pour la Recherche d’Information Basés sur l’Analyse des Réseaux Sociaux.