New dataset for the SE community: P3—a curated collection of incomplete bug fixes from real-world C projects. Each case documents multiple partial fixes leading to a complete fix, enabling reproducible studies on patch correctness, fix evolution, and automated repair.
Meet gfixr: the first system to automatically repair context-free grammars. Using spectrum-based fault localization + targeted patches, it iteratively fixes faulty rules until all tests pass—shown effective on student grammars & even Pascal dialects.
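As a rough illustration of the localize-patch-retest loop described above (not gfixr's actual code), here is a Python sketch; `run_tests`, `rank_suspicious_rules`, and `candidate_patches` are hypothetical helpers standing in for the real components.

```python
# Sketch of an iterative localize-patch-retest loop in the spirit of gfixr.
# The three callables are hypothetical stand-ins for the real tool's parts.

def repair(grammar, tests, run_tests, rank_suspicious_rules, candidate_patches,
           max_iterations=100):
    """Iteratively patch the most suspicious rule until all tests pass."""
    for _ in range(max_iterations):
        failures = run_tests(grammar, tests)
        if not failures:
            return grammar  # every test passes: repair complete
        improved = False
        # Spectrum-based fault localization: rank rules by suspiciousness.
        for rule in rank_suspicious_rules(grammar, tests, failures):
            # Try targeted patches for the rule; keep the first candidate
            # that reduces the number of failing tests.
            for patched in candidate_patches(grammar, rule):
                if len(run_tests(patched, tests)) < len(failures):
                    grammar, improved = patched, True
                    break
            if improved:
                break
        if not improved:
            return grammar  # no candidate helped; give up
    return grammar
```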
In testing 61 student compilers, we found that no single grammar-based method matched the instructor’s suite, but combining systematic and random suites did better. By adding semantic mark-up tokens to grammars—encoding scope and type constraints—we could automatically generate tests that respect context, match instructor coverage, and expose more real bugs, including in the reference compiler.
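To make "tests that respect context" concrete, here is a toy Python sketch (illustrative only, not the paper's notation or tooling): identifier uses are drawn from names already declared in the enclosing scope, so generated programs satisfy a declare-before-use constraint.

```python
import random

# Toy illustration of context-aware generation: identifier *uses* are drawn
# from names already declared in scope, mimicking the effect of semantic
# mark-up on a plain grammar-based generator. Names and shapes are made up.

def generate_block(depth=0, scope=None, rng=random.Random(0)):
    scope = list(scope or [])
    lines = []
    for _ in range(rng.randint(1, 3)):
        if scope and rng.random() < 0.5:
            # A "use" is constrained to a name that is already declared.
            lines.append(f"{rng.choice(scope)} = {rng.choice(scope)} + 1;")
        else:
            name = f"v{depth}_{len(scope)}"
            scope.append(name)  # a "decl" extends the current scope
            lines.append(f"int {name} = 0;")
    if depth < 2 and rng.random() < 0.5:
        lines.append("{\n" + generate_block(depth + 1, scope, rng) + "\n}")
    return "\n".join(lines)

print(generate_block())
```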
gtutr: an interactive feedback system for learning context-free grammars with ANTLR. It clusters failing tests via sequence alignment, avoids overload, adds light gamification, and visualizes progress—helping students debug smarter, not harder.
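A minimal sketch of the clustering idea, with Python's difflib standing in for the sequence alignment used by gtutr; the similarity threshold and the choice of cluster representative are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Group failing test inputs by textual similarity so feedback can be given
# per cluster rather than per test. difflib approximates the alignment score.

def cluster_failures(failing_inputs, threshold=0.7):
    clusters = []
    for text in failing_inputs:
        for cluster in clusters:
            # Compare against the cluster's representative (its first member).
            if SequenceMatcher(None, cluster[0], text).ratio() >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters

failures = ["int x = ;", "int y = ;", "while (x < 10 { }", "while (y > 0 { }"]
for group in cluster_failures(failures):
    print(group)
```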
New approach to debugging grammars! We present the first spectrum-based fault localization for context-free grammars—ranking suspicious rules via grammar spectra from LL/LR parsers. Evaluated on real & student grammars, it pinpoints faults in up to 40% of cases.
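For intuition, a small Python sketch of spectrum-based ranking over grammar rules: each test contributes the set of rules the parser exercised plus a pass/fail verdict, and rules are scored with the Ochiai formula, one standard SBFL metric (shown for illustration; not necessarily the formula the paper uses).

```python
from math import sqrt

# Rank grammar rules by suspiciousness from per-test rule coverage
# ("grammar spectra") and pass/fail verdicts, using the Ochiai score.

def rank_rules(spectra):
    """spectra: list of (covered_rules: set[str], failed: bool)."""
    total_failed = sum(1 for _, failed in spectra if failed)
    rules = set().union(*(covered for covered, _ in spectra))
    scores = {}
    for rule in rules:
        ef = sum(1 for covered, failed in spectra if failed and rule in covered)
        ep = sum(1 for covered, failed in spectra if not failed and rule in covered)
        denom = sqrt(total_failed * (ef + ep))
        scores[rule] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

spectra = [
    ({"expr", "term"}, False),
    ({"expr", "stmt"}, True),
    ({"stmt"}, True),
]
print(rank_rules(spectra))  # "stmt" ranks as most suspicious
```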
We explore two mutation-based methods for generating programs with guaranteed syntax errors, word mutation (token edits) and rule mutation (grammar edits), to complement traditional grammar-based testing, which focuses only on valid programs.
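A deliberately simplified Python sketch of the word-mutation idea: apply a single token-level edit to a valid program's token stream. Note that a naive random edit only makes invalidity likely; the paper's methods construct mutations that guarantee a syntax error. Rule mutation, which edits the grammar before generating, is not shown.

```python
import random

# Word mutation: one token-level edit (delete, duplicate, or swap) applied
# to a valid token stream, yielding an input for negative parser testing.

def word_mutate(tokens, rng=random.Random(1)):
    tokens = list(tokens)
    op = rng.choice(["delete", "duplicate", "swap"])
    i = rng.randrange(len(tokens))
    if op == "delete":
        del tokens[i]
    elif op == "duplicate":
        tokens.insert(i, tokens[i])
    else:  # swap with a neighbouring token
        j = i + 1 if i + 1 < len(tokens) else i - 1
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

valid = ["int", "main", "(", ")", "{", "return", "0", ";", "}"]
print(" ".join(word_mutate(valid)))
```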
Automated Program Repair is powerful, but can developers trust the fixes? We present AVICENNAPATCH: a tool that uses differential testing and grammar-based constraints to explain which inputs a patch actually fixes. More transparency, better trust in APR!
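A minimal Python sketch of the differential-testing half (the grammar-based constraint learning is not shown): run the buggy and patched versions on a pool of inputs and record which inputs change behavior. `run_buggy` and `run_patched` are hypothetical stand-ins for the two program versions.

```python
# Collect the inputs on which the patch changes observable behavior; these
# are the concrete evidence from which an explanation can be derived.

def behavior_diff(inputs, run_buggy, run_patched):
    fixed, unchanged = [], []
    for inp in inputs:
        before, after = run_buggy(inp), run_patched(inp)
        (fixed if before != after else unchanged).append(inp)
    return fixed, unchanged

# Toy usage: a "bug" that crashes on negative numbers, removed by the patch.
run_buggy = lambda x: "crash" if x < 0 else "ok"
run_patched = lambda x: "ok"
fixed, unchanged = behavior_diff(range(-3, 4), run_buggy, run_patched)
print("patch changes behavior on:", fixed)  # the negative inputs
```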
Grammar-based test generators can produce huge suites, but running them is costly. We show that simple prioritization strategies (e.g., input length, novel rule coverage, rule frequency) can improve fault detection vs. random ordering—though performance varies across suites.
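As one concrete example, a Python sketch of the novel-rule-coverage strategy: greedily pick the test that covers the most grammar rules not yet covered. Coverage maps are assumed to come from instrumented parsing; the other strategies sort by input length or rule frequency instead.

```python
# Prioritize tests by novel rule coverage: repeatedly choose the test that
# exercises the most not-yet-covered grammar rules (ties keep original order).

def prioritize_by_novel_coverage(tests, coverage):
    """tests: list of test ids; coverage: test id -> set of covered rules."""
    remaining = list(tests)
    covered = set()
    ordered = []
    while remaining:
        best = max(remaining, key=lambda t: len(coverage[t] - covered))
        ordered.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return ordered

coverage = {
    "t1": {"expr", "term"},
    "t2": {"expr", "stmt", "decl"},
    "t3": {"decl"},
}
print(prioritize_by_novel_coverage(["t1", "t2", "t3"], coverage))  # ['t2', 't1', 't3']
```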