Learning to detect spyware using end user license agreements

Document type: Journal Articles
Article type: Original article
Peer reviewed: Yes
Author(s): Niklas Lavesson, Martin Boldt, Paul Davidsson, Andreas Jacobsson
Title: Learning to detect spyware using end user license agreements
Translated title: Detektion av spionprogram genom inlärning av slutanvändarlicenser
Journal: Knowledge and Information Systems
Year: 2011
Volume: 26
Issue: 2
Pagination: 285-307
ISSN: 0219-1377
Publisher: Springer London
URI/DOI: 10.1007/s10115-009-0278-z
ISI number: 000286211500005
Organization: Blekinge Institute of Technology
Department: School of Computing (Sektionen för datavetenskap och kommunikation)
School of Computing S-371 79 Karlskrona
+46 455 38 50 00
Authors' e-mail: Niklas.Lavesson@bth.se, Martin.Boldt@bth.se, Paul.Davidsson@bth.se, Andreas.Jacobsson@mah.se
Language: English
Abstract: The amount of software that hosts spyware has increased dramatically. To avoid legal repercussions, the vendors need to inform users about the inclusion of spyware via end user license agreements (EULAs) during the installation of an application. However, this information is intentionally written in a way that is hard for users to comprehend. We investigate how to automatically discriminate between legitimate software and spyware-associated software by mining EULAs. For this purpose, we compile a data set consisting of 996 EULAs, out of which 9.6% are associated with spyware. We compare the performance of 17 learning algorithms with that of a baseline algorithm on two data sets based on a bag-of-words model and a metadata model. The majority of learning algorithms significantly outperform the baseline regardless of which data representation is used. However, a non-parametric test indicates that bag-of-words is more suitable than the metadata model. Our conclusion is that automatic EULA classification can be applied to assist users in making informed decisions about whether to install an application without having read the EULA. We therefore outline the design of a spyware prevention tool and suggest how to select suitable learning algorithms for the tool by using a multi-criteria evaluation approach.
Subject: Computer Science\Artificial Intelligence
Computer Science\Electronic security
Computer Science\General
Keywords: End user license agreement, Document classification, Spyware
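
Illustrative sketch: The abstract describes classifying EULAs from a bag-of-words representation and comparing learners against a baseline. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a multinomial Naive Bayes learner with add-one smoothing as a stand-in for the 17 algorithms, a majority-class predictor as the baseline, and a handful of invented EULA snippets in place of the 996-document data set. All function names and example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs, labels):
    """Collect per-class bag-of-words counts (multinomial Naive Bayes)."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = [tok for tok in tokenize(text) if tok in vocab]
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


def majority_baseline(labels):
    """Baseline that always predicts the most frequent training class."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data, invented for illustration only.
train_docs = [
    "This software may display advertisements and collect browsing data "
    "for third party partners",
    "You agree that the program may install additional components that "
    "track usage and show pop-up ads",
    "Licensor grants you a non-exclusive license to install and use one "
    "copy of the software",
    "The software is provided as is without warranty of any kind",
    "You may not reverse engineer decompile or disassemble the software",
    "This agreement is governed by the laws of the country of the licensor",
]
train_labels = ["spyware", "spyware", "legit", "legit", "legit", "legit"]

model = train_naive_bayes(train_docs, train_labels)

unseen = (
    "The application will collect data and display targeted "
    "advertisements from partners"
)
print("classifier:", predict(model, unseen))
print("baseline:  ", majority_baseline(train_labels))
```

Because the classes are imbalanced, as in the data set described in the abstract, the majority baseline never flags spyware, which is why a learner has to be judged against it rather than by raw accuracy alone.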