Domain-based Latent Personal Analysis (LPA) and its uses
Updated: Jun 8, 2021
LPA is a fast, easy-to-use, domain-based spectral signature tailored for big-data computations. It creates a strong and unique signature of both over-used and missing items that identifies a user or a component in a domain. It can be applied in a variety of domains, from textual to computational.
Paper: Mokryn, O., & Ben-Shoshan, H. (2020). Domain-based Latent Personal Analysis and its use for impersonation detection in social media. Accepted to User Modeling and User-Adapted Interaction (UMUAI), May 2021. ArXiv preprint arXiv:2004.02346. (PDF)
Domain-based Latent Personal Analysis and its use for impersonation detection in social media
Zipf's law defines an inverse proportion between a word's ranking in a given corpus and its frequency in it, roughly dividing the vocabulary into frequent (popular) words and infrequent ones. Here, we stipulate that within a domain an author's signature can be derived from, in loose terms, the author's missing popular words and frequently used infrequent words. We devise a method, termed Latent Personal Analysis (LPA), for finding such domain-based personal signatures. LPA determines which words contributed most to the distance between a user's vocabulary and the domain's. We identify the most suitable distance metric for the method among several and construct a personal signature for authors. We validate the correctness and power of the signatures in identifying authors and utilize LPA to identify two types of impersonation in social media: (1) authors with sockpuppet (multiple) accounts; (2) front-user accounts, operated by several authors. We validate the algorithms and employ them over a large-scale dataset obtained from a social media site with over 4000 accounts, and corroborate the results employing temporal rate analysis. LPA can be used to devise personal signatures in a wide range of scientific domains in which the constituents have a long-tail distribution of elements.
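The core idea of the abstract, ranking terms by how much each contributes to the distance between a user's vocabulary and the domain's, can be illustrated with a minimal Python sketch. The abstract does not name the distance metric that was chosen, so this sketch assumes a symmetrized KL divergence with a small `epsilon` mass for missing terms. The function names, `top_k`, `epsilon`, and the toy counts are all hypothetical and are not taken from the paper's implementation.

```python
import math
from collections import Counter


def term_distribution(counts, vocabulary, epsilon=1e-6):
    """Turn raw counts into probabilities over a shared vocabulary.

    Terms the entity never used get a small epsilon mass (an assumption of
    this sketch) so that the log ratio below is always defined.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one term")
    raw = {t: (counts.get(t, 0) / total) or epsilon for t in vocabulary}
    norm = sum(raw.values())
    return {t: p / norm for t, p in raw.items()}


def lpa_signature(user_counts, domain_counts, top_k=5, epsilon=1e-6):
    """Return (distance, signature) for one user against the domain.

    Each term's contribution is its share of the symmetrized KL divergence
    (assumed metric). The sign is positive for over-used terms and negative
    for under-used or missing ones.
    """
    vocabulary = set(user_counts) | set(domain_counts)
    p = term_distribution(user_counts, vocabulary, epsilon)
    q = term_distribution(domain_counts, vocabulary, epsilon)

    contributions = {}
    for term in vocabulary:
        magnitude = (p[term] - q[term]) * math.log(p[term] / q[term])  # >= 0
        sign = 1.0 if p[term] >= q[term] else -1.0
        contributions[term] = sign * magnitude

    distance = sum(abs(c) for c in contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return distance, ranked[:top_k]


# Toy data: the user never writes the popular word "the" and over-uses "sublime".
domain = Counter({"the": 50, "movie": 30, "good": 15, "sublime": 1})
user = Counter({"movie": 5, "sublime": 4, "good": 1})

distance, signature = lpa_signature(user, domain, top_k=2)
print(distance, signature)
```

In this toy example the missing popular word "the" receives a negative weight and the over-used rare word "sublime" a positive one, which mirrors the "missing popular words and frequently used infrequent words" described above.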
LPA for music: Spotify 2017 streaming
Authorship attribution applications:
Impersonation on social media - Sockpuppet accounts
LPA is fast and easy to implement at large scale. We deployed it over 4000 IMDb reviewer accounts to find sockpuppet accounts operated by a single author. We then computed a 4000x4000 similarity measure between all accounts' LPA signatures.
The full list of suspected Sockpuppets
User 59775972 (joshuadrake-39480) and User 62431316 (joshuadrake-91275) have 502 common terms in their LPA-signatures and the distance between their signatures is 0.013.
User 24051675 (jpachar82) and User 53564354 (jasonpachar) have 584 common terms in their LPA-signatures and the distance between their signatures is 0.2965892.
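The two quantities reported for each pair above, the number of common signature terms and the distance between signatures, can be sketched for a set of accounts as follows. This is only an illustration: the page does not say how the distance between two signatures is computed, so the sketch assumes an L1 distance over sparse signatures stored as `term -> signed weight` dictionaries. The account names, the `max_distance` threshold, and all function names are hypothetical.

```python
from itertools import combinations


def signature_distance(sig_a, sig_b):
    """L1 distance between two sparse signatures (assumed metric).

    A term absent from one signature counts as weight 0.0 there.
    """
    terms = set(sig_a) | set(sig_b)
    return sum(abs(sig_a.get(t, 0.0) - sig_b.get(t, 0.0)) for t in terms)


def common_terms(sig_a, sig_b):
    """Terms that appear in both signatures."""
    return set(sig_a) & set(sig_b)


def rank_suspect_pairs(signatures, max_distance):
    """Compare every pair of accounts once and keep the close ones.

    The comparison is symmetric, so only the upper triangle of the
    account-by-account matrix is visited. Returns rows of
    (account_a, account_b, number_of_common_terms, distance),
    sorted by increasing distance.
    """
    rows = []
    for a, b in combinations(sorted(signatures), 2):
        d = signature_distance(signatures[a], signatures[b])
        if d <= max_distance:
            shared = len(common_terms(signatures[a], signatures[b]))
            rows.append((a, b, shared, d))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical signatures for three accounts.
signatures = {
    "acct_a": {"the": -0.5, "sublime": 0.3},
    "acct_b": {"the": -0.5, "sublime": 0.25},
    "acct_c": {"plot": 0.9},
}

for row in rank_suspect_pairs(signatures, max_distance=0.1):
    print(row)
```

Because the comparison is symmetric, a 4000x4000 matrix requires 4000 * 3999 / 2 = 7,998,000 distinct pair comparisons rather than 16 million.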