Data Sets

End User License Agreements (EULA) Database

The EULA database features 996 instances of end user license agreement documents classified as either good (associated with legitimate software) or bad (associated with spyware). The good EULAs have been extracted from applications downloaded from and the bad applications have been downloaded by following links provided by If you use this database, please cite the following article:

Niklas Lavesson, Martin Boldt, Paul Davidsson, and Andreas Jacobsson, "Learning to Detect Spyware Using End User License Agreements," Knowledge and Information Systems, vol. 26, no. 2, pp. 285-307, DOI: 10.1007/s10115-009-0278-z, Springer, 2011

eula-raw.arff - The 'raw' data set, which features the complete EULA text documents and their classification. More information is provided in the actual arff file.

eula-freq.arff - The word frequencies of the documents included in eula-raw.arff. More information is provided in the actual arff file.

NOTICE: The 'raw' data set features complete documents in the sense that the actual order of words is preserved. However, all special characters except for spaces have been removed. The collection of original end user license agreement documents can be downloaded by following this link.
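The relationship between the 'raw' text and the word frequencies can be illustrated with a minimal sketch. It rests on assumptions that are not confirmed by the data set description: that "special characters" means anything other than ASCII letters, digits, and spaces, that words are separated by spaces, and that frequencies are plain case-insensitive counts per document. The function names and the sample sentence are hypothetical, and the actual arff files may have been produced differently.

```python
import re
from collections import Counter


def strip_special_characters(text: str) -> str:
    """Keep ASCII letters, digits, and spaces; drop everything else.

    Assumption: this is what "special characters" means for the raw data set.
    """
    return re.sub(r"[^A-Za-z0-9 ]", "", text)


def word_frequencies(text: str) -> Counter:
    """Count space-separated words, ignoring case (an assumed convention)."""
    return Counter(strip_special_characters(text).lower().split())


# Hypothetical sample sentence, not taken from the database.
sample = "You agree to the terms. You may not copy the Software!"

cleaned = strip_special_characters(sample)  # word order is preserved
freqs = word_frequencies(sample)

print(cleaned)
print(freqs.most_common(3))
```

Because only characters are removed and nothing is reordered, the cleaned text keeps the original word order, while the counts discard it.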

The EULA Analyzer Meta Model is a data set that features all of the 996 instances mentioned above, represented by 10 metrics provided by the EULA Analyzer web service.




Open Data for Anomaly Detection in Maritime Surveillance

Click here to download a collection of links to open data in the maritime domain.

User-oriented Understandability Survey Appendices

Click here to download the survey's classification model and questionnaire appendices.


