Recovering Trace Links Between Software Documentation And Code

by Jan Keim, Sophie Corallo, Dominik Fuchß, Tobias Hey, Tobias Telge, and Anne Koziolek


Published at the 46th International Conference on Software Engineering (ICSE 2024), April 14-20, 2024.

Also presented at Software Engineering 2025 (SE25), the symposium of the German Computer Science Society (Gesellschaft für Informatik, GI).

Figure: TransArC overview

Abstract

Introduction: Software development involves creating various artifacts at different levels of abstraction, and establishing relationships between them is essential. Traceability link recovery (TLR) automates this process and improves software quality by supporting tasks such as maintenance and evolution. However, automating TLR is challenging because of the semantic gaps that arise between different levels of abstraction. While automated TLR approaches exist for requirements and code, there are no tailored solutions for architecture documentation, which hinders the preservation of architecture knowledge and design decisions.

Methods: This paper presents our approach TransArC for TLR between architecture documentation and code, which uses component-based architecture models as intermediate artifacts to bridge the semantic gap. We create transitive trace links by combining the existing approach ArDoCo, which links architecture documentation to models, with our novel approach ArCoTL, which links architecture models to code.
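The transitive step can be pictured as a join on the intermediate model elements: a documentation sentence linked to a component inherits that component's links to code. The following is a minimal sketch under simplifying assumptions. Links are plain (source, target) pairs of hypothetical IDs, every composed pair is kept, and the function name `transitive_links` is illustrative. It is not the TransArC implementation, which operates on the outputs of ArDoCo and ArCoTL and may filter or weight links differently.

```python
from collections import defaultdict
from typing import Iterable

# A trace link is modelled as a (source, target) pair of artifact IDs.
# Sentence IDs, model element names, and file paths used here are hypothetical
# placeholders, not the data model of ArDoCo or ArCoTL.
Link = tuple[str, str]


def transitive_links(doc_to_model: Iterable[Link],
                     model_to_code: Iterable[Link]) -> set[Link]:
    """Compose documentation->model and model->code links into
    documentation->code links by joining on the shared model element."""
    # Index the model->code links by model element for fast lookup.
    code_by_element: dict[str, set[str]] = defaultdict(set)
    for element, code_file in model_to_code:
        code_by_element[element].add(code_file)

    result: set[Link] = set()
    for sentence, element in doc_to_model:
        # Every code artifact linked to the model element becomes a
        # documentation->code link for this sentence; elements without
        # code links contribute nothing.
        for code_file in code_by_element.get(element, ()):
            result.add((sentence, code_file))
    return result


if __name__ == "__main__":
    doc_to_model = [("S1", "Persistence"), ("S2", "UI")]
    model_to_code = [("Persistence", "db/Store.java"),
                     ("Persistence", "db/Cache.java")]
    print(sorted(transitive_links(doc_to_model, model_to_code)))
    # [('S1', 'db/Cache.java'), ('S1', 'db/Store.java')]
```

In this toy example, sentence S2 yields no links because its model element has no code links. This illustrates why the quality of the documentation-to-code result depends on both underlying approaches.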

Results: We evaluate our approaches on five open-source projects and compare the results to baseline approaches. The model-to-code TLR approach achieves an average F1-score of 0.98, and the documentation-to-code TLR approach achieves a promising average F1-score of 0.82, significantly outperforming the baselines.
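The F1-score is the harmonic mean of precision (the share of recovered links that are correct) and recall (the share of gold-standard links that were recovered). The sketch below shows how these values can be computed for one project by comparing a set of recovered links with a gold standard. The link data and the function name `precision_recall_f1` are made up for illustration. How the paper averages scores across projects is not specified here.

```python
# Trace links as (source, target) pairs; the example links below are made up.
Link = tuple[str, str]


def precision_recall_f1(predicted: set[Link],
                        gold: set[Link]) -> tuple[float, float, float]:
    """Compare recovered trace links against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        # No correct links at all: F1 is defined as 0 here.
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    predicted = {("S1", "A.java"), ("S1", "B.java"), ("S2", "C.java")}
    gold = {("S1", "A.java"), ("S2", "C.java"),
            ("S3", "D.java"), ("S3", "E.java")}
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high. For example, a tool that links every sentence to every file would reach perfect recall but a poor F1.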

Conclusion: Combining two specialized approaches with an intermediate artifact shows promise for bridging the semantic gap. In future research, we will explore further possibilities for such transitive approaches.

Links

Paper (Open Access) on ACM or KITopen
Replication Package on Zenodo and the corresponding GitHub repository
Slides as pptx or pdf
Slides (SE25)

Cite this paper

Recovering Trace Links Between Software Documentation And Code

Jan Keim, Sophie Corallo, Dominik Fuchß, Tobias Hey, Tobias Telge, and Anne Koziolek

In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, Apr 2024

DOI: 10.1145/3597503.3639130

@inproceedings{keim_recovering_2024,
  author = {Keim, Jan and Corallo, Sophie and Fuch\ss{}, Dominik and Hey, Tobias and Telge, Tobias and Koziolek, Anne},
  title = {Recovering Trace Links Between Software Documentation And Code},
  year = {2024},
  isbn = {9798400702174},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3597503.3639130},
  doi = {10.1145/3597503.3639130},
  booktitle = {Proceedings of the IEEE/ACM 46th International Conference on Software Engineering},
  articleno = {215},
  numpages = {13},
  location = {Lisbon, Portugal},
  series = {ICSE '24},
}