Using Data Mining for Static Code Analysis of C

Document type: Conference Papers
Peer reviewed: Yes
Author(s): Hannes Tribus, Irene Morrigl, Stefan Axelsson
Title: Using Data Mining for Static Code Analysis of C
Conference name: 8th International Conference on Advanced Data Mining and Applications (ADMA 2012)
Year: 2012
Publisher: Springer
City: Nanjing, China
Organization: Blekinge Institute of Technology
Department: School of Computing (Sektionen för datavetenskap och kommunikation)
School of Computing S-371 79 Karlskrona
+46 455 38 50 00
Language: English
Abstract: Static analysis of source code is one way to find bugs and problems in large software projects. Many approaches to static analysis have been proposed. We have previously proposed a novel way of performing static analysis: instead of methods based on semantic/logic analysis, we apply machine learning directly to the problem. This has many benefits. To name a few, learning by example makes the analysis trivially adaptable by programmers (a problem with many other approaches), and learning systems also have the advantage of being able to generalise and find problematic source code constructs that differ from what the programmer initially had in mind. Given the general interest in code quality and the availability of large open source code bases as test and development data, we believe this problem should be of interest to the wider data mining community. In this work we extend our previous approach, investigate a new way of doing feature selection, and test the suitability of many different learning algorithms on a selection of problems we adapted from large, publicly available open source projects. Many algorithms were much more successful than our previous proof-of-concept and deliver practical levels of performance. This is clearly an interesting and minable problem.
Subject: Computer Science\Artificial Intelligence
Computer Science\General
Keywords: software engineering, static analysis, application