Two related changes for diagnosing hung RSP ucodes:
1. Per-ucode pre-task hook. The legacy wrapper now calls
recomp::rsp::run_pre_task_hook(rdram, &persistent_ctx,
ucode_name, ucode_addr) before invoking impl. Lets games
register hook callbacks keyed by ucode name to replicate
parts of rspboot's setup that the static recompilation can't
infer (initial GPRs, DMA-engine residue, pre-loaded command
data in DMEM). No-op when no hook is registered (one branch).
Used this session to unstick Stadium's aspMain dispatch by
pre-loading audio commands at DMEM[0x2B0] and seeding $29 =
0x2B0, $30 = chunk_size. Confirmed via watchdog trail: the
dispatch now lands on real handlers (r26 = first audio cmd
word 0x020004E0, r28/r27/r30 advance through one command)
instead of looping on dispatch-table residue.
2. Watchdog trip dump now includes r1, r2, r3, r25, r26, r27,
r28, r29, r30, r31, jump_target, dma_mem_address,
dma_dram_address. The earlier dump (r1/r28/r29/r31 only) was too
sparse to localize the next blocker: Stadium's hang now occurs
at L_11B4 ↔ L_10EC because r3 is uninitialized going into the
dispatched handler (the handler does `r3 -= 1` and then DMAs,
but no upstream path sets r3 for the first command). Richer
state lets future hangs in any consumer game be diagnosed
without recompiling the recompiler each time.
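For reviewers, a minimal sketch of how a name-keyed pre-task hook
registry along the lines of item 1 could look. Only the call shape
(rdram, context pointer, ucode name, ucode address) and the Stadium
seed values come from this change. The RspContext layout, the `sketch`
namespace, register_pre_task_hook, the bool return value, the "aspMain"
key string and register_stadium_audio_hook are assumptions made for
illustration, and the DMEM pre-load step is left out.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the recompiler's RspContext; only the GPR
// file is modeled here.
struct RspContext {
    uint32_t gpr[32] = {};
};

namespace sketch {

// Assumed hook signature: raw RDRAM pointer, persistent context, ucode address.
using PreTaskHook =
    std::function<void(uint8_t* rdram, RspContext* ctx, uint32_t ucode_addr)>;

// Registry keyed by ucode name.
inline std::unordered_map<std::string, PreTaskHook>& hook_table() {
    static std::unordered_map<std::string, PreTaskHook> table;
    return table;
}

inline void register_pre_task_hook(const std::string& ucode_name, PreTaskHook hook) {
    hook_table()[ucode_name] = std::move(hook);
}

// Called by the wrapper before impl. Returns true if a hook ran; with no
// hook registered this is a lookup followed by a single branch.
inline bool run_pre_task_hook(uint8_t* rdram, RspContext* ctx,
                              const char* ucode_name, uint32_t ucode_addr) {
    auto it = hook_table().find(ucode_name);
    if (it == hook_table().end()) {
        return false;
    }
    it->second(rdram, ctx, ucode_addr);
    return true;
}

// Illustrative registration mirroring the Stadium fix: seed $29 = 0x2B0
// and $30 = chunk_size. Pre-loading the audio commands into DMEM is omitted.
inline void register_stadium_audio_hook(uint32_t chunk_size) {
    register_pre_task_hook("aspMain",
        [chunk_size](uint8_t*, RspContext* ctx, uint32_t) {
            ctx->gpr[29] = 0x2B0;
            ctx->gpr[30] = chunk_size;
        });
}

} // namespace sketch
```

A game would call the registration function once at startup; a ucode
with no registered hook is left untouched.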
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-label tick to the recompiler's basic-block emit:
increments ctx->watchdog_count and stores the current label PC
into ctx->pc_trail (32-entry ring). If the count exceeds 100M
basic-block transitions (roughly 1.6s wall-clock at the RSP's native
62.5 MHz, assuming one transition per cycle), the function returns
RspExitReason::Watchdog with the
last 32 PCs and key GPRs dumped to stderr.
Cost is ~5 cheap ops per label, <1ms over a typical 50ms audio
frame. The mechanism is the canonical "always-on ring buffer"
shape from CLAUDE.md global rules: no arming, no instrumentation
toggling — recording is continuous, probes query backward.
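A minimal sketch of the per-label tick and 32-entry trail described
above, for readers who want the shape without opening the emitter. The
field names watchdog_count and pc_trail and the 32 / 100M constants are
from this commit. The WatchdogState struct, the trail_head index, the
uint16_t PC type and both function names are assumptions; the real
fields live in the recompiler's context and the emitted code may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTrailSize = 32;            // ring size from the commit
constexpr uint64_t kWatchdogLimit = 100000000ULL; // 100M transitions

// Hypothetical container for the watchdog fields.
struct WatchdogState {
    uint64_t watchdog_count = 0;
    std::array<uint16_t, kTrailSize> pc_trail{};
    std::size_t trail_head = 0; // next slot to overwrite
};

// Per-label tick: record the label PC, advance the ring, bump the counter.
// Returns true once the count exceeds the limit, at which point the caller
// would dump state and return the Watchdog exit reason.
inline bool watchdog_tick(WatchdogState& w, uint16_t label_pc,
                          uint64_t limit = kWatchdogLimit) {
    w.pc_trail[w.trail_head] = label_pc;
    w.trail_head = (w.trail_head + 1) % kTrailSize;
    return ++w.watchdog_count > limit;
}

// Backward query: copy the trail out oldest-first so a dump reads in
// execution order. Slots never written still hold their initial zero.
inline std::array<uint16_t, kTrailSize> trail_oldest_first(const WatchdogState& w) {
    std::array<uint16_t, kTrailSize> out{};
    for (std::size_t i = 0; i < kTrailSize; ++i) {
        out[i] = w.pc_trail[(w.trail_head + i) % kTrailSize];
    }
    return out;
}
```

Recording never has to be armed: the tick always writes, and the trail
is only read when the limit trips.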
Pairs with the librecomp ultra_trace ring (separate commit) so a
hung ucode shows up immediately as `rsp_run_task_enter` with no
matching `_exit`. Used this session to localize the Stadium
audio aspMain hang at PC cycle 1048→10EC→1038→103C — see
project_pokestadium_status.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real RSP hardware retains GPRs across task switches. rspboot only
writes $1/$2/$3/$4/$7 before jumping to the loaded ucode; everything
else is whatever the previous task left. The previous emit
zero-initialized all 32 GPRs at every entry, breaking any ucode that
depends on inherited state. For example, Pokemon Stadium's libultra
aspMain reads $29 on its first dispatch iteration and expects the
value left by a prior run.
RSPRecomp/src/rsp_recomp.cpp:
* All GPRs (and dma_*, jump_target, rsp) are now emitted as C++
references into *ctx — writes auto-persist through to the backing
RspContext, no manual store-back at exit points.
* No-overlay case (Stadium aspMain shape) emits an `_impl(rdram,
ctx)` function plus a legacy-ABI wrapper `(rdram, ucode_addr)`
that owns a `static thread_local RspContext`. The wrapper preserves
the runtime ABI (no librecomp change needed) while the static
thread_local delivers cross-run_task GPR retention.
* Overlay-swap function's stack-local `RspContext ctx{}` promoted to
`static thread_local` for the same reason.
* write_overlay_swap_return reduced from 9 lines of manual store-back
to just `return RspExitReason::SwapOverlay` — references handle it.
(NOTE: original local commit 5c6c654 also bundled per-register cop0
dispatch + cop0_regs[32] storage. That portion is split out for a
separate follow-up PR — it depends on a runtime API not yet in
upstream N64ModernRuntime.)
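A minimal sketch of the emitted shape for the no-overlay case, to show
why references plus a `static thread_local` context give retention
without store-back. The `(rdram, ctx)` impl and `(rdram, ucode_addr)`
wrapper signatures follow the description above. The function names,
the RspContext layout, the `Broke` enumerator and the write to rdram[0]
(there only so the retained value is observable) are assumptions for
illustration, not the generated code.

```cpp
#include <cstdint>

// Hypothetical subset of the exit reasons; SwapOverlay and Watchdog are
// named in these commits, Broke is a placeholder for a normal exit.
enum class RspExitReason { Broke, SwapOverlay, Watchdog };

// Hypothetical stand-in for RspContext; only the GPR file is modeled.
struct RspContext {
    uint32_t gpr[32] = {};
};

// Stand-in for an emitted ucode body. GPRs are bound as references into
// *ctx, so every write lands in the backing context immediately.
inline RspExitReason ucode_impl(uint8_t* rdram, RspContext* ctx) {
    uint32_t& r29 = ctx->gpr[29];
    r29 += 8;                              // depends on the inherited value
    rdram[0] = static_cast<uint8_t>(r29);  // expose it for observation
    return RspExitReason::Broke;           // no manual store-back needed
}

// Legacy-ABI wrapper: same signature the runtime already calls. It owns
// the per-thread context, so GPRs survive from one task to the next.
inline RspExitReason ucode(uint8_t* rdram, uint32_t ucode_addr) {
    (void)ucode_addr; // unused in this sketch
    static thread_local RspContext persistent_ctx{};
    return ucode_impl(rdram, &persistent_ctx);
}
```

Two consecutive calls on the same thread see r29 carried over from the
first call to the second; with a stack-local context the second call
would start from zero again.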
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Implement function hook insertion
* Fix recompiled code indentation
* Add _matherr to renamed_funcs
* Replace after_vram by before_vram
* Emit dummy value if relocatable_sections_ordered is empty