As our world becomes ever more dependent on computers, and on the code that runs like an invisible thread through every part of our society, our generation has a unique ability to connect, learn, explore, build, and create at an exponential pace. We at QbitLogic are investing in the future of technology, and in all that humanity has yet to accomplish, by building a safer, more secure, and more reliable world for coding.
Given the advances in the field of Machine Learning, the proliferation of open source software (so-called "Big Code"), and the increased availability of computational resources, it is now possible for Machine Learning algorithms to solve problems related to the quality of software itself, including automatically finding and fixing security vulnerabilities. In order to train Machine Learning architectures on Big Code, the patterns of security vulnerabilities in the source code of open source software repositories need to be filtered and labeled. This goal can be achieved either by analyzing the metadata of the repositories (e.g. commit messages, bug tracking systems, and email archives) or by directly analyzing the source code itself.
Your challenge is to mine the commit log of the Apache Httpd project, which can be obtained either by downloading the provided archive file here or by running the following commands in a bash shell:
$ svn checkout http://svn.apache.org/repos/asf/httpd/httpd/branches/trunk/
$ cd trunk/
$ svn log -l1736592 -v http://svn.apache.org/repos/asf/httpd/httpd/trunk/ > ../apache_httpd_commit_messages.txt
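Before any mining can start, each log entry has to be split into its revision, author, date, changed paths, and message. The following is a minimal Python sketch of that step. It assumes the default plain-text layout of `svn log -v`: a header line such as `r123 | author | date | 2 lines`, an optional `Changed paths:` block, a blank line, and then the message. The function name `parse_svn_log` and the dictionary keys are illustrative choices, not part of the challenge.

```python
import re
from typing import Dict, List

# Header line of one log entry, e.g. "r42 | alice | 2016-03-25 ... | 2 lines".
HEADER = re.compile(r"^r(\d+) \| ([^|]+) \| ([^|]+) \| (\d+) lines?$")


def parse_svn_log(text: str) -> List[Dict[str, object]]:
    """Split plain-text `svn log -v` output into one dict per commit."""
    lines = text.splitlines()
    commits = []
    i = 0
    while i < len(lines):
        match = HEADER.match(lines[i])
        if match is None:
            i += 1  # separator rule or stray line: skip it
            continue
        revision, author, date, count = match.groups()
        i += 1
        # Everything up to the first blank line is the changed-paths block.
        paths = []
        while i < len(lines) and lines[i].strip():
            if lines[i].strip() != "Changed paths:":
                paths.append(lines[i].strip())
            i += 1
        i += 1  # skip the blank line that precedes the message
        # The header states how many lines the message spans, so a message
        # that happens to contain a dashed rule is still read correctly.
        n = int(count)
        message = "\n".join(lines[i:i + n]).strip()
        i += n
        commits.append({
            "revision": int(revision),
            "author": author.strip(),
            "date": date.strip(),
            "paths": paths,
            "message": message,
        })
    return commits
```

Under these assumptions, the saved log could be loaded with `parse_svn_log(open("apache_httpd_commit_messages.txt", encoding="utf-8", errors="replace").read())`. Relying on the line count in the header, rather than splitting on the dashed separator, is a deliberate choice to tolerate commit messages that contain dashes of their own.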
By leveraging various data mining techniques, identify clusters of bug-fixing commits in the Apache Httpd commit log that fix security vulnerabilities listed in the C Common Weakness Enumeration. Your submission should include the following results:
1. A categorized list of commit IDs and associated commit messages that fix security vulnerabilities in the C Common Weakness Enumeration. (required)
2. A graphical representation of the commonly occurring security-vulnerability-related strings in the Apache Httpd commit log. (optional)
3. Any source code written to produce results 1 and 2. (required)
4. A ReadMe file describing how to run the source code on the Apache commit log to reproduce results 1 and 2. The ReadMe should also contain a discussion of the important design decisions involved in producing the prototype. (required)
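As one possible starting point for the first two results, the sketch below groups commit messages under a few CWE identifiers and counts how often each matching phrase occurs. It is a keyword-matching baseline, not a clustering method, and the regular expressions are hypothetical and would need tuning against the real log. The input is assumed to be a list of dictionaries with `revision` and `message` keys; the names `CWE_PATTERNS`, `categorize`, and `term_frequencies` are illustrative.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical phrase patterns per CWE entry; extend and tune on real data.
# Only non-capturing groups are used so that findall() returns whole matches.
CWE_PATTERNS = {
    "CWE-120 buffer overflow": re.compile(r"buffer\s+over(?:flow|run)", re.I),
    "CWE-476 NULL pointer dereference": re.compile(r"null[\s-]+pointer|null\s+deref", re.I),
    "CWE-416 use after free": re.compile(r"use[\s-]+after[\s-]+free", re.I),
    "CWE-190 integer overflow": re.compile(r"integer\s+overflow", re.I),
    "CWE-401 memory leak": re.compile(r"memory\s+leak", re.I),
    "CWE-134 format string": re.compile(r"format\s+string", re.I),
}


def categorize(commits: List[dict]) -> Dict[str, List[Tuple[int, str]]]:
    """Map each CWE label to the (revision, message) pairs that match it."""
    result = {label: [] for label in CWE_PATTERNS}
    for commit in commits:
        for label, pattern in CWE_PATTERNS.items():
            if pattern.search(commit["message"]):
                result[label].append((commit["revision"], commit["message"]))
    return result


def term_frequencies(commits: List[dict]) -> Counter:
    """Count every matched phrase, lowercased with whitespace collapsed."""
    counts = Counter()
    for commit in commits:
        for pattern in CWE_PATTERNS.values():
            for hit in pattern.findall(commit["message"]):
                counts[" ".join(hit.lower().split())] += 1
    return counts
```

A commit can land in several categories if its message mentions more than one weakness. The counter returned by `term_frequencies` could then be passed to any plotting tool to produce the optional chart; a message that merely mentions a phrase without fixing the issue will still match, so manual review or a stronger model remains necessary.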
Please email your results to firstname.lastname@example.org.
Submissions will be evaluated based upon the quantity and quality of bug-fixing examples and the creativity of the solution. If you have any further questions, please do not hesitate to contact us at email@example.com.