Sampling program quality

September, 2010

Abstract

Many modern software systems are large, consisting of hundreds or even thousands of programs (source files). Understanding the overall quality of these programs is a resource and time-consuming activity. It is desirable to have a quick yet accurate estimation of the overall program quality in a cost-effective manner. In this paper, we propose a sampling based approach - for a large software project, we only sample a small percentage of source files, and then estimate the quality of the entire programs in the project based on the characteristics of the sample. Through experiments on public defect datasets, we show that we can successfully estimate the total number of defects, proportions of defective programs, defect distributions, and defect-proneness - all from a small sample of programs. Our experiments also show that small samples can achieve similar prediction accuracies as larger samples do.

Type

Conference paper

Publication

In IEEE International Conference on Software Maintenance

Rongxin Wu

Associate Professor

I am currently an associate professor in the department of computer science and technology at Xiamen University. My research interests include software security, program analysis, and software engineering.