Colored commentary

Under submission, 2026

This work studies the robustness and correctness of LLM-based assembly transpilers for CISC→RISC translation of code compiled at the -O2 optimization level. Using a curated, synthetically augmented benchmark spanning x86, ARMv8, and RISC-V, we:

  • Build a taxonomy of recurrent failure modes, including mishandled calling conventions, broken control flow, and incorrect flag/condition semantics.
  • Introduce a program generation pipeline that produces diverse C programs whose compiled assembly stresses specific instruction patterns and optimization interactions.
  • Show that targeted augmentation with these synthetic programs can improve cross-ISA translation accuracy on held-out benchmarks, especially for code shaped by -O2 optimizations such as loop unrolling, strength reduction, and instruction fusion.
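
The program generation step in the second bullet can be illustrated with a minimal sketch. The summary above does not describe how the pipeline is implemented, so everything here is an assumption for illustration only: the template-based approach, the names `TEMPLATES`, `generate_program`, and `generate_corpus`, and the two example patterns are all hypothetical. Each template emits a small C function whose loop body is intended to invite one specific -O2 transformation.

```python
import random

# Hypothetical templates (not from the paper): each loop body is meant to
# invite one specific optimization when the C source is compiled at -O2.
TEMPLATES = {
    # Multiplying the induction variable by a constant invites strength reduction.
    "strength_reduction": "for (int i = 0; i < n; i++) acc += i * {k};",
    # A small constant trip count invites loop unrolling.
    "loop_unrolling": "for (int i = 0; i < {k}; i++) acc += a[i];",
}


def generate_program(pattern, seed):
    """Return C source for one kernel that stresses the given pattern."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    k = rng.choice([3, 5, 8, 12, 16])
    body = TEMPLATES[pattern].format(k=k)
    return (
        "int kernel(const int *a, int n) {\n"
        "    int acc = 0;\n"
        f"    {body}\n"
        "    return acc;\n"
        "}\n"
    )


def generate_corpus(per_pattern, seed=0):
    """Return a list of (pattern, c_source) pairs, per_pattern items for each pattern."""
    corpus = []
    for p_index, pattern in enumerate(sorted(TEMPLATES)):
        for j in range(per_pattern):
            # Derive a distinct, deterministic seed for every program.
            corpus.append((pattern, generate_program(pattern, seed * 1000 + p_index * 100 + j)))
    return corpus
```

In a setup like this, each generated source would then be compiled to assembly for every target ISA (for example with `gcc -O2 -S` and the appropriate cross-compiler) to obtain paired training examples; that step is omitted here because it depends on the installed toolchains.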