Using Normalized Compression Distance for Classifying File Fragments

Document type: Conference Papers
Peer reviewed: Yes
Author(s): Stefan Axelsson
Title: Using Normalized Compression Distance for Classifying File Fragments
Conference name: 5th International Conference on Availability, Reliability and Security
Year: 2010
Pagination: 641-646
ISBN: 978-0-7695-3965-2
Publisher: IEEE
City: Cracow
URI/DOI: 10.1109/ARES.2010.100
ISI number: 000278197800098
Organization: Blekinge Institute of Technology
Department: School of Computing (Sektionen för datavetenskap och kommunikation)
School of Computing S-371 79 Karlskrona
+46 455 38 50 00
Language: English
Abstract: We have applied the generalised and universal distance measure NCD (Normalised Compression Distance) to the problem of determining the types of file fragments by example. A corpus of files that can be redistributed to other researchers in the field was developed, and NCD, with k-nearest-neighbour as the classification algorithm, was applied to a random selection of file fragments. The experiment covered circa 2000 fragments from 17 different file types. Although the overall accuracy of the n-valued classification improved only from the prior class probability of approximately 6% to circa 50%, the classifier reached accuracies of 85%-100% for the most successful file types.
Subject: Software Engineering\General
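
Illustrative sketch (not part of the catalogued record): the approach summarised in the abstract, NCD combined with k-nearest-neighbour classification, can be outlined in a few lines of Python. The NCD formula used is the standard one, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. Everything else here is an assumption made for illustration: the choice of zlib as the compressor, the function names csize, ncd and classify, and the majority-vote rule. The abstract does not state which compressor, fragment size or value of k the paper used.

```python
import zlib
from collections import Counter


def csize(data: bytes) -> int:
    """Compressed size in bytes; zlib is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance between two byte strings."""
    cx, cy = csize(x), csize(y)
    cxy = csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(fragment: bytes, examples, k: int = 1) -> str:
    """Label a fragment by majority vote among its k nearest examples.

    `examples` is a list of (label, bytes) pairs (hypothetical layout).
    """
    ranked = sorted(examples, key=lambda ex: ncd(fragment, ex[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]
```

A call such as classify(fragment, examples, k=3) would return the label that is most common among the three labelled examples closest to the fragment under NCD. In practice the example set would hold many fragments per file type.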