ARDoCo - Automating Requirements and Documentation Comprehension

In this research project, we aim to provide traceability link recovery and consistency analyses between different kinds of software artifacts. Our recent approaches, such as LiSSA, leverage Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to enable more generic and effective traceability link recovery across various artifact types. These methods combine information retrieval with LLMs to find and suggest trace links, making them adaptable to different tasks like requirements-to-code, documentation-to-code, and more. You can find our different approaches, including LiSSA and others, on the approaches page or read more about them using the info button on the publications page.
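
The retrieve-then-judge idea described above can be illustrated with a minimal sketch. It is not the LiSSA implementation: the bag-of-words cosine retrieval, the function names (`retrieve`, `recover_links`, `keyword_judge`), and the example artifacts are all assumptions made for illustration, and the `judge` callback is only a stand-in for the point where an LLM would be asked whether two artifacts are related.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(source_text, targets, k=2):
    """Retrieval step: return the ids of the k most similar target artifacts."""
    query = Counter(tokenize(source_text))
    scored = [
        (cosine(query, Counter(tokenize(text))), tid)
        for tid, text in targets.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tid for score, tid in scored[:k] if score > 0]


def recover_links(sources, targets, judge, k=2):
    """For each source, retrieve candidates and keep those the judge accepts."""
    links = []
    for sid, stext in sources.items():
        for tid in retrieve(stext, targets, k):
            if judge(stext, targets[tid]):
                links.append((sid, tid))
    return links


def keyword_judge(source_text, target_text):
    """Hypothetical placeholder for an LLM call: accept pairs sharing >= 2 tokens."""
    shared = set(tokenize(source_text)) & set(tokenize(target_text))
    return len(shared) >= 2


# Made-up example artifacts (not taken from any ARDoCo benchmark).
requirements = {"R1": "The system shall encrypt user passwords before storage"}
code = {
    "PasswordEncryptor.java": "class PasswordEncryptor encrypt user passwords with bcrypt",
    "Logger.java": "class Logger write log messages to file",
}

links = recover_links(requirements, code, keyword_judge)
print(links)
```

In a real pipeline, the retrieval step would typically use embeddings instead of token counts, and `judge` would wrap a prompt to an LLM. Keeping the judge as a plain callback makes it easy to swap either part without touching the other.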

Documenting the architecture of a software system is important, especially to capture reasoning and design decisions. However, documentation is often incomplete, outdated, or missing, leading to loss of crucial knowledge and increased risks. Our long-term vision is to persist information from various sources, such as whiteboard discussions, to avoid losing essential system knowledge. A key challenge is ensuring consistency between formal artifacts (e.g., models) and informal documentation. We address this by applying natural language understanding and knowledge bases to analyze consistency and create traceability links between models and textual artifacts.

ARDoCo is actively developed by researchers of the Modelling for Continuous Software Engineering (MCSE) group of KASTEL - Institute of Information Security and Dependability at the Karlsruhe Institute of Technology (KIT).

Important Links

Open student theses
People who are involved in the project
Central code repository ardoco/ardoco

Relevant and Recent Publications

The links will lead you to pages that contain details about the corresponding publications.

Paper at AIRE 2025: “Beyond Retrieval: A Study of Using LLM Ensembles for Candidate Filtering in Requirements Traceability” by Dominik Fuchß, Stefan Schwedt, Jan Keim, and Tobias Hey
Paper at ICSE 2025: “LiSSA: Toward Generic Traceability Link Recovery through Retrieval-Augmented Generation” by Dominik Fuchß, Tobias Hey, Jan Keim, Haoyu Liu, Niklas Ewald, Tobias Thirolf, and Anne Koziolek
Paper at REFSQ 2025: “Requirements Traceability Link Recovery via Retrieval-Augmented Generation” by Tobias Hey, Dominik Fuchß, Jan Keim, and Anne Koziolek
Paper at ICSA 2025: “Enabling Architecture Traceability by LLM-based Architecture Component Name Extraction” by Dominik Fuchß, Haoyu Liu, Tobias Hey, Jan Keim, and Anne Koziolek
🇩🇪 Presentation (in German) at the annual meeting of the GI-FG Architekturen 2024: “LLM-gestützte Softwarearchitektur: Eine neue Ära?” by Jan Keim and Tobias Hey
Paper at ICSE 2024 with additional presentation at SE25: “Recovering Trace Links Between Software Documentation And Code” by Jan Keim, Sophie Corallo, Dominik Fuchß, Tobias Hey, Tobias Telge, and Anne Koziolek
Paper at ICSA 2023 with additional presentation at SE24: “Detecting Inconsistencies in Software Architecture Documentation Using Traceability Link Recovery” by Jan Keim, Sophie Corallo, Dominik Fuchß, and Anne Koziolek
Paper at ECSA 2021: “Trace Link Recovery for Software Architecture Documentation” by Jan Keim, Sophie Corallo, Dominik Fuchß, Claudius Kocher, Janek Speit, and Anne Koziolek
Poster with the initial idea from the ICSA 2019 NEMI track