Waqas Mahmood; Muhammad Faheem Akhtar MSC-2009:22, pp. 62. COM/School of Computing, 2009.
Software security has always been an afterthought in software development, which results in insecure software. Companies rely on penetration testing to detect security vulnerabilities in their software. However, incorporating security at an early stage of development reduces cost and overhead. Static code analysis can be applied in the implementation phase of the software development life cycle. Applying machine learning and visualization to static code analysis is a novel idea. The technique learns patterns using the normalized compression distance (NCD) and, on the basis of training instances, classifies source code as correct or faulty usage. Visualization also helps to classify code fragments according to their associated colors. A prototype implementing this technique, called Code Distance Visualizer (CDV), was developed. Empirical validation is required to test the efficiency of this technique, so in this research we conduct a series of experiments to test it. We use real-life open-source software as our test subjects. From the projects' bug-reporting repositories we collected bugs, together with the faulty and correct versions of the corresponding source code. We train CDV by marking the correct and faulty versions of code fragments; on the basis of this training, CDV classifies other code fragments as correct or faulty. We measured its fault detection ratio and its false-negative and false-positive ratios. The results show that this technique is efficient at defect detection and produces a low number of false alarms.
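As a rough illustration of the NCD-based classification step described above, the following Python sketch computes the standard normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length, and labels a new fragment with the label of its nearest marked training fragment. This is not CDV's implementation: the choice of zlib as the compressor, the 1-nearest-neighbour rule, and the example C fragments (`CORRECT`, `FAULTY`, `TRAINING`) are all assumptions made for this sketch, since the abstract does not say how CDV computes or thresholds the distance.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed input (level 9).
    The compressor is an assumption of this sketch."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two code fragments.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 indicate similar fragments; values near 1 indicate
    unrelated ones (real compressors can slightly exceed 1).
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(fragment: str, training: list[tuple[str, str]]) -> str:
    """Return the label of the training fragment with the smallest NCD
    to `fragment` (a 1-nearest-neighbour rule, assumed for illustration)."""
    _, label = min(training, key=lambda pair: ncd(fragment, pair[0]))
    return label


# Hypothetical training instances marked by a reviewer: the faulty
# version allocates one byte too few for the string terminator and
# does not check the result of malloc.
CORRECT = (
    "char *copy = malloc(strlen(src) + 1);\n"
    "if (copy == NULL) return -1;\n"
    "strcpy(copy, src);\n"
)
FAULTY = (
    "char *copy = malloc(strlen(src));\n"
    "strcpy(copy, src);\n"
)
TRAINING = [(CORRECT, "correct"), (FAULTY, "faulty")]

if __name__ == "__main__":
    # An unseen fragment that reuses the faulty allocation pattern.
    query = FAULTY + "return copy;\n"
    print(classify(query, TRAINING))
    print(round(ncd(query, FAULTY), 3), round(ncd(query, CORRECT), 3))
```

A smaller distance means the two fragments share more compressible structure, so a fragment that repeats a known faulty pattern tends to land closest to the faulty training instance. In practice the distances depend on the compressor and on fragment length: very short fragments compress poorly and give noisier distances, which is one reason a real tool would need more training instances than this toy example.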