LLM-based Generation of Weakest Preconditions and Precise Array Invariants (FormaliSE 2025 - Research Track)

Who

Daragh King, Vasileios Koutavas, Laura Kovács

Track

FormaliSE 2025 Research Track

When

Mon 28 Apr 2025 11:00 - 11:30 at 203 - Session 4 – Generative AI and Fuzzy Logic Chair(s): Lina Marsso

Abstract

The weakest precondition of a program describes the largest set of initial states from which all terminating executions of the program satisfy a given postcondition. Generating weakest preconditions is an important task with practical applications in areas such as software verification and runtime error-checking. For programs containing loops, weakest precondition generation critically depends on synthesizing inductive loop invariants; the resulting weakest preconditions then essentially embed the inductive properties required for program verification. This paper investigates the use of Large Language Models (LLMs) to generate weakest preconditions (and accompanying loop invariants) in order to prove the correctness of non-deterministic programs containing arrays, loops, and arithmetic. Specifically, we employ several ChatGPT models to derive weakest preconditions and invariants for these programs. We then compare the LLM-derived weakest preconditions and invariants to their provable counterparts obtained by the MaxPrANQ tool. We find that the quality of the LLM-derived results varies greatly and depends heavily on the underlying model used by ChatGPT. This variance in performance motivates us to outline directions for future work and to discuss how LLMs and formal tools can complement one another in generating valid weakest preconditions and strong invariants.
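
To make the definition of a weakest precondition concrete, the following is a minimal illustrative sketch, not the paper's method and not how MaxPrANQ works. It assumes a toy deterministic array loop, a tiny bounded value domain, and brute-force enumeration; the names `program`, `post`, `weakest_precondition`, and `classify` are hypothetical. It shows how a candidate precondition (for instance, one proposed by an LLM) could be labelled as weakest, valid but too strong, or invalid.

```python
from itertools import product

DOMAIN = range(-2, 3)  # assumed small value range for array cells
LENGTH = 2             # assumed array length


def program(a):
    """Toy loop: b[i] = a[i] + 1 for every index i."""
    b = list(a)
    i = 0
    while i < len(a):
        b[i] = a[i] + 1
        i += 1
    return b


def post(b):
    """Postcondition: every cell of b is strictly positive."""
    return all(x > 0 for x in b)


def weakest_precondition():
    """All initial arrays (within the bounded domain) whose run satisfies post."""
    return {a for a in product(DOMAIN, repeat=LENGTH) if post(program(a))}


def classify(candidate):
    """Compare a candidate precondition with the brute-force weakest precondition."""
    states = {a for a in product(DOMAIN, repeat=LENGTH) if candidate(a)}
    wp = weakest_precondition()
    if states == wp:
        return "weakest"
    if states <= wp:
        return "valid but not weakest"
    return "invalid"


if __name__ == "__main__":
    print(classify(lambda a: all(x >= 0 for x in a)))
    print(classify(lambda a: all(x >= 1 for x in a)))
    print(classify(lambda a: True))
```

In this bounded setting, a candidate is valid when every state it admits lies inside the brute-force set, and weakest when it admits exactly that set. Enumeration only works for tiny domains; unbounded arrays and non-deterministic programs need the inductive invariants the abstract refers to.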

Daragh King

Trinity College Dublin

Vasileios Koutavas

Trinity College Dublin

Ireland

Laura Kovács

TU Wien

Austria