Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data
Introduction
Issue reports document discussions around required changes in issue-tracking systems, while commits contain the actual code changes in version control systems. Recovering links between issues and commits facilitates many software evolution tasks such as bug localization, defect prediction, software quality measurement, and documentation.
A previous study of over half a million GitHub issues found that developers manually link only about 42.2% of issues to their related commits. Automatically recovering the missing commit-issue links could therefore benefit these maintenance tasks considerably. However, current state-of-the-art approaches suffer from low precision, which makes their results unreliable, and they perform poorly when commits or issues contain little textual information.
This article presents Hybrid-Linker, an enhanced approach that overcomes these limitations by exploiting both textual and non-textual data channels:
- A non-textual component that operates on metadata automatically recorded for commit-issue pairs
- A textual component that analyzes the textual content of commits and issues
Hybrid-Linker combines the outputs of these two classifiers to make the final prediction, so that each component can compensate when the other falls short.
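The two-channel idea can be illustrated with a minimal Python sketch. It assumes that each component outputs a score in [0, 1] and that the two scores are merged by a weighted average with a weight `alpha` and a decision `threshold`. These parameters, the bag-of-words cosine similarity used for the textual channel, and the metadata features (time gap between the commit and the issue closing, assignee/author match) are all hypothetical illustrations, not the actual features, classifiers, or combination rule of Hybrid-Linker.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Issue:
    text: str            # title + description (textual channel)
    closed_at: datetime  # recorded by the issue tracker (non-textual channel)
    assignee: str        # hypothetical metadata field


@dataclass
class Commit:
    message: str            # commit message (textual channel)
    committed_at: datetime  # recorded by version control (non-textual channel)
    author: str             # hypothetical metadata field


def _bag_of_words(text: str) -> Counter:
    # Lower-case alphanumeric tokens with their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def textual_score(issue: Issue, commit: Commit) -> float:
    """Cosine similarity of bag-of-words vectors, in [0, 1]."""
    a, b = _bag_of_words(issue.text), _bag_of_words(commit.message)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    # An empty issue text or commit message gives no textual evidence.
    return dot / norm if norm else 0.0


def non_textual_score(issue: Issue, commit: Commit) -> float:
    """Hypothetical metadata score in [0, 1]; never looks at any text."""
    gap_hours = abs((issue.closed_at - commit.committed_at).total_seconds()) / 3600
    time_score = math.exp(-gap_hours / 24)  # 1.0 at zero gap, decays as the gap grows
    same_person = 1.0 if issue.assignee == commit.author else 0.0
    return 0.7 * time_score + 0.3 * same_person  # hypothetical feature weights


def hybrid_link(issue: Issue, commit: Commit,
                alpha: float = 0.5, threshold: float = 0.5) -> tuple[float, bool]:
    """Weighted average of both channel scores, then a yes/no link decision."""
    p = alpha * textual_score(issue, commit) + (1 - alpha) * non_textual_score(issue, commit)
    return p, p >= threshold


if __name__ == "__main__":
    issue = Issue("Crash when saving a file with a unicode name",
                  datetime(2021, 5, 1, 12, 0), "alice")
    commit = Commit("", datetime(2021, 5, 1, 11, 0), "alice")  # empty commit message
    print(hybrid_link(issue, commit, alpha=0.3))
```

For a commit with an empty message, `textual_score` is 0, so the combined score comes entirely from the metadata channel, scaled by `1 - alpha`. Tuning `alpha` shifts how much the final decision relies on each channel.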