Commit graph

5 commits

Matthew Stanley
bd5f42fb22 analysis: discover_function_bounds — real CFG walk with jump-table support
Adds a public N64Recomp::discover_function_bounds() in src/analysis.h
that performs a BFS-based control-flow walk of a function's body,
following:
  - Conditional branches (target + fall-through)
  - Unconditional j/jal targets when intra-body
  - jr $ra returns (block ends after delay slot)
  - jr-via-jump-table dispatches: the existing register-state
    simulator from analyze_function detects the lui+addiu+addu+lw+jr
    pattern and records the jtbl base; we then read entries out of
    the body bytes and feed targets back into the BFS until
    convergence.
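
A minimal C++ sketch of the jump-table step: read entries out of the body bytes starting at the jtbl base, convert each link-time vram entry to a body offset, and queue only the targets the BFS hasn't seen yet. The jtbl base offset is assumed to come from the register-state simulator. The commit doesn't say how the table length is found, so this sketch stops at the first entry that isn't an aligned address inside the body. All names here are hypothetical, not the analysis.h API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <initializer_list>
#include <set>
#include <vector>

// Pack 32-bit words into a big-endian byte buffer (usage/test helper).
std::vector<uint8_t> be_words(std::initializer_list<uint32_t> words) {
    std::vector<uint8_t> out;
    for (uint32_t w : words)
        for (int shift = 24; shift >= 0; shift -= 8) out.push_back(uint8_t(w >> shift));
    return out;
}

static uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 4 <= b.size());
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// Hypothetical: read jump-table entries at body offset `jtbl_offset` and return
// body-relative code offsets. Assumption: stop at the first entry that is not a
// 4-byte-aligned vram inside the body (or at the end of the body).
std::vector<size_t> read_jump_table_targets(const std::vector<uint8_t>& body,
                                            size_t jtbl_offset, uint32_t body_vram) {
    std::vector<size_t> targets;
    for (size_t off = jtbl_offset; off + 4 <= body.size(); off += 4) {
        uint32_t entry = read_be32(body, off);
        bool in_body = entry >= body_vram && entry - body_vram < body.size();
        if (!in_body || (entry & 3) != 0) break;
        targets.push_back(entry - body_vram);
    }
    return targets;
}

// Queue targets the walk hasn't seen yet. Returns how many were new; the outer
// BFS has converged once a pass over every discovered table adds nothing.
size_t enqueue_new_targets(std::deque<size_t>& work, std::set<size_t>& seen,
                           const std::vector<size_t>& targets) {
    size_t added = 0;
    for (size_t t : targets)
        if (seen.insert(t).second) {
            work.push_back(t);
            ++added;
        }
    return added;
}
```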

Returns the function's byte size (max-reachable + 4 to cover the
delay slot of the last instruction). On failure, populates a specific
error message with the offending offset and reason — caller treats
this as a build error, NOT a graceful skip (per the project's
no-stubs principle).

Wires into decompressed.cpp's pattern path, replacing the prior
inline BFS that had a TODO for jump-table handling. The pattern
caller now propagates failures via `synthesize_decompressed_patterns`
returning false, which surfaces in main.cpp's exit_failure path.

Concrete behavior change: activating a pattern that includes a
fragment with computed jumps now produces a build error pointing at
the specific section name + offset + the analyzer's failure reason,
instead of silently producing a partial binary. Tested on Stadium's
0x8FF00000 slot: the first failing wrapper is at ROM 0x8CC400, with an
indirect jr at offset 0x827C that the simulator doesn't pattern-match.
The static [[input.decompressed_section]] path for fragment78 is
unaffected (still recompiles cleanly, no regression on boot logo +
PIKA jingle).

Future work surfaced by this change: the simulator's
lui+addiu+addu+lw+jr pattern doesn't cover every jump-table shape
Stadium uses. Each gap surfaces as a specific build-error offset; the
resolution is to extend analyze_instruction to recognize the
additional pattern (or, when the jr is a true tail call rather than a
jtbl dispatch, to distinguish the two at the jr site).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
8320bb902b recomp: replace pattern-section graceful-skip with real CFG-based bounds discovery
The prior "pattern-synthesized recompile failures are best-effort:
log + skip" path was a stub by another name — it produced binaries
where some fragment bodies silently didn't exist, and the failure
deferred to a runtime lookup-miss when Stadium tried to dispatch
into them. That violates the project's no-stubs principle.

Three changes here:

1. **Remove the soft-skip in main.cpp's recompile loops.** Recompile
   failures revert to fatal `std::exit(EXIT_FAILURE)` regardless of
   whether the section is pattern-synthesized. Build-time errors
   surface; the user has to make a real choice about how to
   resolve them.

2. **Replace the "scan to first jr ra" heuristic in decompressed.cpp
   with a real BFS-based control-flow walker.** The walker:
   - Starts at impl entry (+0x20).
   - Follows conditional branches (target + fall-through).
   - Follows j/jal targets when intra-function.
   - Treats jr $ra as a return; ends the basic block.
   - Returns max-reachable-offset + 4 as the function's true size.

   For functions with computed jumps (jr <reg> rather than jr $ra,
   i.e. jump-table dispatches), the walker reports a build-time error
   with a specific offset and a list of options for the user
   (declare via single-block form, or extend the walker to follow
   jump-table targets). NOT a skip.

3. **Pattern-caller propagates synthesis failures as build aborts.**
   `synthesize_decompressed_patterns` returns false when any section
   fails to add, and main.cpp's exit_failure path runs.
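
A self-contained C++ sketch of the walker in item 2, assuming the body is big-endian and offset 0 is the impl entry. `walk_bounds`, `BoundsResult` and the helpers are hypothetical names, not N64Recomp's code. It follows conditional branches (target + fall-through), intra-body j/jal targets, ends a block at jr $ra, and fails on any other jr. Treating beq $zero,$zero as an unconditional branch is a refinement the commit doesn't mention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <initializer_list>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

std::vector<uint8_t> be_words(std::initializer_list<uint32_t> words) {
    std::vector<uint8_t> out;
    for (uint32_t w : words)
        for (int shift = 24; shift >= 0; shift -= 8) out.push_back(uint8_t(w >> shift));
    return out;
}

static uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 4 <= b.size());
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// Body-relative target of a conditional branch, or nullopt. Covers beq/bne/blez/bgtz,
// their "likely" forms, regimm bltz/bgez(l) and bc1f/bc1t(l).
static std::optional<int64_t> branch_target(uint32_t w, size_t off) {
    uint32_t op = w >> 26, rs = (w >> 21) & 0x1F, rt = (w >> 16) & 0x1F;
    bool is_branch = (op >= 4 && op <= 7) || (op >= 20 && op <= 23) ||
                     (op == 1 && rt <= 3) || (op == 0x11 && rs == 8);
    if (!is_branch) return std::nullopt;
    return int64_t(off) + 4 + int64_t(int16_t(w & 0xFFFF)) * 4;
}

struct BoundsResult {
    bool ok = false;
    size_t size = 0;          // function size in bytes when ok
    size_t error_offset = 0;  // offending instruction when !ok
    std::string error;
};

BoundsResult walk_bounds(const std::vector<uint8_t>& body, uint32_t body_vram) {
    std::set<size_t> visited;
    std::deque<size_t> work{0};
    size_t max_reached = 0;  // highest offset known to execute, delay slots included
    auto fail = [](size_t off, std::string why) {
        BoundsResult r;
        r.error_offset = off;
        r.error = std::move(why);
        return r;
    };
    while (!work.empty()) {
        size_t pc = work.front();
        work.pop_front();
        while (!visited.count(pc)) {
            if (pc + 8 > body.size()) return fail(pc, "walk ran past end of body");
            visited.insert(pc);
            // pc+4 always executes next: either fall-through or a delay slot.
            max_reached = std::max(max_reached, pc + 4);
            uint32_t w = read_be32(body, pc);
            uint32_t op = w >> 26, rs = (w >> 21) & 0x1F, rt = (w >> 16) & 0x1F;
            if (op == 0 && (w & 0x3F) == 0x08) {  // jr
                if (rs == 31) break;               // jr $ra: block ends after its delay slot
                return fail(pc, "computed jump via jr $" + std::to_string(rs));
            }
            if (op == 2 || op == 3) {              // j / jal
                uint32_t pc_vram = uint32_t(body_vram + pc);
                uint32_t target = ((pc_vram + 4) & 0xF0000000u) | ((w & 0x03FFFFFFu) << 2);
                if (target >= body_vram && target - body_vram < body.size())
                    work.push_back(target - body_vram);  // intra-body target
                if (op == 2) break;                // j never falls through
            } else if (auto t = branch_target(w, pc)) {
                if (*t < 0 || size_t(*t) >= body.size()) return fail(pc, "branch target outside body");
                work.push_back(size_t(*t));
                if (op == 4 && rs == 0 && rt == 0) break;  // beq $zero,$zero == unconditional b
            }
            pc += 4;  // delay slot, then fall-through
        }
    }
    BoundsResult r;
    r.ok = true;
    r.size = max_reached + 4;  // max-reachable-offset + 4
    return r;
}
```

On a body of `beq a0,zero,+3; nop; jr ra; nop; addiu; jr ra; nop; <data>`, the walk reaches both returns and reports 0x1C bytes, leaving the trailing data word out.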

Net effect on Stadium today: the static [[input.decompressed_section]]
for fragment78 still recompiles cleanly (boot logo + PIKA jingle
unaffected). Activating the pattern would now fail loudly on the
first fragment with computed jumps, instead of silently shipping a
binary missing those bodies. That's the principle: build errors
surface, runtime stubs don't.

The "extend the walker to follow jump-table targets" work is
documented in the error message and is the next step if/when
pattern activation matters more than fragment78's single case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
5f2ae6e4f7 recomp: emit content_hash on pattern-synthesized sections (Shape A, build side)
Adds Section::content_hash, populates it on pattern-synthesized
sections with FNV-1a-64 of the first 0x100 bytes of the decompressed
body, and emits it into recomp_overlays.inl's SectionTableEntry. The
runtime side hashes the same window over the bytes Stadium loads at
fragment_ptr and looks up the matching section by hash.

Build-time and runtime use:
  - SAME hash algorithm: FNV-1a-64
  - SAME window: 0x100 bytes (unique for ~95% of Stadium's 282
    distinct fragment bodies; the residual ~5% fall back to the
    first matching candidate)
  - SAME byte source: pre-relocation decompressed bytes (link-time
    form, before Stadium's R_MIPS_32 patches run)
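
For reference, a C++ sketch of FNV-1a-64 (standard 64-bit offset basis 0xcbf29ce484222325 and prime 0x100000001b3) applied to the 0x100-byte window. The function names are hypothetical, and hashing the whole body when it is shorter than the window is an assumption the commit doesn't state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kFnvOffsetBasis = 0xcbf29ce484222325ULL;
constexpr uint64_t kFnvPrime = 0x100000001b3ULL;
constexpr size_t kHashWindow = 0x100;

// FNV-1a: xor each byte into the state, then multiply by the prime.
uint64_t fnv1a64(const uint8_t* data, size_t len) {
    uint64_t h = kFnvOffsetBasis;
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= kFnvPrime;
    }
    return h;
}

// Hash the first 0x100 bytes of the pre-relocation body (or all of it if shorter).
uint64_t content_hash(const std::vector<uint8_t>& body) {
    return fnv1a64(body.data(), std::min(body.size(), kHashWindow));
}
```

Because the build and runtime sides must agree byte-for-byte, the same window and the same pre-relocation bytes have to feed this function on both sides.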

Section table emit gains the .content_hash field. Non-pattern sections
get hash=0, and the runtime-side condition `sec.content_hash != 0`
filters them out of the candidate set.
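
A sketch of that runtime-side filter in C++, using a hypothetical trimmed-down row type rather than the real SectionTableEntry. Matching on ram_addr as well as the hash is an assumption; returning the first match is what provides the first-candidate fallback when a window isn't unique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row; the real SectionTableEntry carries more fields.
struct SectionRow {
    uint32_t ram_addr;
    uint64_t content_hash;  // 0 for non-pattern sections
};

// Index of the first pattern section at `ram_addr` whose hash equals `hash`, or -1.
int find_section_by_hash(const std::vector<SectionRow>& table, uint32_t ram_addr, uint64_t hash) {
    for (size_t i = 0; i < table.size(); ++i) {
        const SectionRow& sec = table[i];
        if (sec.content_hash != 0 && sec.ram_addr == ram_addr && sec.content_hash == hash)
            return int(i);
    }
    return -1;
}
```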

Pairs with the runtime-side change in
lib/N64ModernRuntime/librecomp/src/overlays.cpp.

Activation in PokemonStadiumRecomp's game.toml is gated on a
follow-up: pattern-synthesized impl bodies currently get a basic
forward-CFG-walked size which produces invalid C for fragments with
internal jump tables (data interpreted as code). Future fix: emit
pattern-section impl bodies as runtime-dispatched stubs instead of
trying to statically recompile each body. Until then, fragment78
stays declared as a single static [[input.decompressed_section]];
the engine's pattern infrastructure is in place, ready to be flipped
on once the impl-body emit is reshaped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
5b42a76748 recomp: pattern auto-discovery for dynamic-asset slot fragments (Shape A)
Adds [[input.decompressed_section_pattern]] for slots where many
fragments share a link vram (e.g. Stadium streams 279+ different
fragments through vram 0x8FF00000 across the game). Per-fragment
[[input.decompressed_section]] entries don't scale to that cardinality
and miss the runtime-swap dispatch problem entirely.

Engine pipeline:
  1. Scan baserom.z64 for every Yay0 wrapper.
  2. For each, decompress 0x40 bytes and check whether the prefix
     matches the expected J <vram + 0x20> trampoline + FRAGMENT magic.
     Wrappers in PERS-SZP form are detected by the -0x18 prefix.
  3. For matches, fully decompress and FNV-1a-64 hash the body.
  4. Deduplicate by content hash (Stadium has ~11 byte-identical
     duplicates across its 293 wrappers).
  5. Synthesize one Section per unique content. Sections are named
     <base_name>__rom_<wrapper_offset>; functions become
     func_<vram>__rom_<offset> via the existing collision-suffix
     machinery (the default for pattern-discovered sections, since
     collisions are the EXPECTED case here).
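
Steps 2, 4 and 5 as a C++ sketch with hypothetical names. It checks only the `j vram+0x20; nop` trampoline; the FRAGMENT magic offset isn't given here, so that check is left out. The uppercase-hex spelling of `<wrapper_offset>` in section names is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

static uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 4 <= b.size());
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// MIPS `j target`: opcode 2 in the top 6 bits, target word index in the low 26.
constexpr uint32_t encode_j(uint32_t target) {
    return (2u << 26) | ((target >> 2) & 0x03FFFFFFu);
}

// Step 2: the decompressed prefix must open with `j vram+0x20; nop`.
bool has_trampoline_prefix(const std::vector<uint8_t>& prefix, uint32_t vram) {
    return prefix.size() >= 8 && read_be32(prefix, 0) == encode_j(vram + 0x20) &&
           read_be32(prefix, 4) == 0;  // nop
}

struct Candidate {
    uint32_t wrapper_rom;  // ROM offset of the Yay0 wrapper
    uint64_t body_hash;    // FNV-1a-64 of the fully decompressed body
};

// Step 4: keep the first wrapper per distinct body hash, preserving scan order.
std::vector<Candidate> dedupe_by_hash(const std::vector<Candidate>& in) {
    std::unordered_set<uint64_t> seen;
    std::vector<Candidate> out;
    for (const Candidate& c : in)
        if (seen.insert(c.body_hash).second) out.push_back(c);
    return out;
}

// Step 5 naming: <base_name>__rom_<wrapper_offset>, offset in uppercase hex (assumed).
std::string pattern_section_name(const std::string& base, uint32_t wrapper_rom) {
    char hex[16];
    std::snprintf(hex, sizeof hex, "%X", unsigned(wrapper_rom));
    return base + "__rom_" + hex;
}
```

For the 0x8FF00000 slot the expected first word is `encode_j(0x8FF00020)`, i.e. 0x0BFC0008.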

Implementation function (the +0x20 entry) gets a basic forward CFG
walk to determine its size:
  - Walk instructions tracking forward branch targets within the func.
  - Stop at jr $ra IF no tracked forward branches still need to be
    reached.
  - Fall back to the first-jr-ra heuristic if the walk is inconclusive.
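
A C++ sketch of this forward walk and its fallback, with hypothetical names. Only the beq/bne/blez/bgtz family (and their likely forms) is decoded to keep it short, and a return value of 0 stands in for "inconclusive".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

std::vector<uint8_t> be_words(std::initializer_list<uint32_t> words) {
    std::vector<uint8_t> out;
    for (uint32_t w : words)
        for (int shift = 24; shift >= 0; shift -= 8) out.push_back(uint8_t(w >> shift));
    return out;
}

static uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 4 <= b.size());
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

constexpr uint32_t kJrRa = 0x03E00008u;  // jr $ra

// Track the furthest in-body forward branch target; stop at the first jr $ra
// with no pending target beyond its delay slot. Returns 0 if inconclusive.
size_t forward_walk_size(const std::vector<uint8_t>& body) {
    size_t furthest = 0;
    for (size_t pc = 0; pc + 8 <= body.size(); pc += 4) {
        uint32_t w = read_be32(body, pc);
        uint32_t op = w >> 26;
        if ((op >= 4 && op <= 7) || (op >= 20 && op <= 23)) {
            int64_t t = int64_t(pc) + 4 + int64_t(int16_t(w & 0xFFFF)) * 4;
            if (t > int64_t(pc) && size_t(t) < body.size()) furthest = std::max(furthest, size_t(t));
        }
        if (w == kJrRa && furthest <= pc + 4) return pc + 8;  // include the delay slot
    }
    return 0;
}

// Fallback: end the function after the first jr $ra's delay slot.
size_t first_jr_ra_size(const std::vector<uint8_t>& body) {
    for (size_t pc = 0; pc + 8 <= body.size(); pc += 4)
        if (read_be32(body, pc) == kJrRa) return pc + 8;
    return 0;
}

size_t impl_size(const std::vector<uint8_t>& body) {
    size_t s = forward_walk_size(body);
    return s != 0 ? s : first_jr_ra_size(body);
}
```

A linear scan like this can't tell that a region after an early return is only reachable through a jump table, which is why it misjudges functions with computed jumps.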

Pattern-synthesized recompile failures are non-fatal: pattern sections
have rom_addr in the synthetic 0xFE000000 range, and main.cpp's
recompile loop logs and skips them instead of calling std::exit. This
lets the build proceed even when the basic CFG walk misjudges an
unusually shaped function (e.g. computed jumps through jump tables we
don't analyze). Stadium's Path-3 single-fragment case (fragment78
wrapper at ROM 0x9E93F0) still recompiles cleanly; ~225 of 282
dynamic-slot fragments recompile, and ~57 fail and are skipped.

Validation on Stadium's 0x8FF00000 slot:
  - 293 Yay0 wrappers found (vs 279 from the prior validation script,
    which undercounted due to a tight 1KB decode window).
  - 282 sections after dedupe (11 collapsed as content-identical).
  - Build proceeds to completion; no Stadium boot regression
    (logo + PIKA jingle still render).

Outstanding for next session — runtime side:
  - Modify register_runtime_fragment in librecomp/src/overlays.cpp
    to read bytes at fragment_ptr (first 0x40 → fall back to full
    body for the residual ~5%), hash, and look up the matching
    section. Currently it picks by id alone, so for slot 0x8FF00000
    only ONE of the 282 sections gets bound to func_map at any time
    (the most-recently registered).
  - Refactor cross-section R_MIPS_32 retargeting to use a vram
    hashmap (currently O(N²), which gets expensive at 282 sections).
  - fragment78's prior single-fragment block can stay; it works
    alongside patterns and serves as the "I know exactly which one I
    want" form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
b517a7195a recomp: build-time decompression of CPU-decompressed-at-runtime fragments
Adds [[input.decompressed_section]] toml block + Yay0/PERS-SZP wrapper
decoders + an in-memory section synthesis pass. Required for games
like Pokemon Stadium, whose CPU-side decompressor materializes
fragment bytes at runtime, so the static recompiler can't see them in
the ELF/ROM-direct path.

User-facing config:
    [[input.decompressed_section]]
    name = "fragment78"
    vram = 0x8FF00000
    rom_wrapper = 0x9E93F0
    wrapper_format = "pers_szp_yay0"

Pipeline:
  1. compression/{yay0,pers_szp}.{h,cpp} decode the wrapper.
  2. decompressed.cpp parses the FRAGMENT-format header (relocOffset,
     sizeInRam) + Stadium-format reloc table, translates it to
     N64Recomp::Reloc entries (R_MIPS_32/26/HI16/LO16) with paired
     HI16/LO16 immediate computation, and synthesizes a Section
     handed to the existing recompilation pipeline. Stores
     decompressed bytes into context.rom at synthetic_rom =
     0xFE000000 | rom_wrapper to keep them out of real-ROM addr space.
  3. Two functions per fragment: the +0x00 entry trampoline (J + nop)
     and the +0x20 implementation (runs to first jr ra in body).
  4. After all decompressed sections are added, retargets each
     R_MIPS_32 reloc to whichever existing section's vram range
     contains its target address (cross-section pointer support).
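
Parts of steps 2 and 4 as a C++ sketch with hypothetical names: the synthetic ROM address, the HI16/LO16 split that pairs a lui with a sign-extended low immediate, and the containment lookup used to retarget R_MIPS_32 relocs. The linear scan mirrors "whichever existing section's vram range contains its target"; it is not the project's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Keeps decompressed bytes out of the real-ROM address space.
constexpr uint32_t synthetic_rom(uint32_t rom_wrapper) { return 0xFE000000u | rom_wrapper; }

// The CPU sign-extends LO16 (addiu/lw), so HI16 rounds up when bit 15 is set.
struct HiLo { uint16_t hi, lo; };
constexpr HiLo split_hi_lo(uint32_t addr) {
    return {uint16_t((addr + 0x8000u) >> 16), uint16_t(addr & 0xFFFFu)};
}
constexpr uint32_t join_hi_lo(uint16_t hi, uint16_t lo) {
    return (uint32_t(hi) << 16) + uint32_t(int32_t(int16_t(lo)));
}

struct SectionRange { std::string name; uint32_t vram; uint32_t size; };

// Index of the first section whose [vram, vram + size) contains `target`, or -1.
// One linear scan per reloc, so total cost grows with relocs x sections.
int find_containing_section(const std::vector<SectionRange>& sections, uint32_t target) {
    for (size_t i = 0; i < sections.size(); ++i) {
        uint64_t start = sections[i].vram, end = start + sections[i].size;
        if (target >= start && target < end) return int(i);
    }
    return -1;
}
```

Without the round-up, an address like 0x8FF18000 would come back from a lui/addiu pair as 0x8FF08000.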

Adds [output] collision_policy:
  "error"  (default) — abort the build if two emitted symbols collide
                       on name; print both colliders + how to opt in.
  "suffix"           — auto-disambiguate by appending __rom_<rom_addr>
                       to colliding symbols. Suffix only appears where
                       collisions exist.
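
A C++ sketch of both policies, with hypothetical type names and error wording. Under "suffix" only names that actually collide are renamed; the uppercase-hex spelling of `<rom_addr>` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

struct Symbol { std::string name; uint32_t rom_addr; };
enum class CollisionPolicy { Error, Suffix };

static std::string to_hex(uint32_t v) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%X", unsigned(v));
    return buf;
}

// Error: return false and describe the first collision found (both colliders).
// Suffix: rename every member of each colliding group to <name>__rom_<rom_addr>.
bool resolve_collisions(std::vector<Symbol>& syms, CollisionPolicy policy, std::string& err) {
    std::unordered_map<std::string, std::vector<size_t>> by_name;
    for (size_t i = 0; i < syms.size(); ++i) by_name[syms[i].name].push_back(i);
    for (const auto& [name, idxs] : by_name) {
        if (idxs.size() < 2) continue;  // unique names are left untouched
        if (policy == CollisionPolicy::Error) {
            err = "symbol '" + name + "' emitted for rom 0x" + to_hex(syms[idxs[0]].rom_addr) +
                  " and rom 0x" + to_hex(syms[idxs[1]].rom_addr) +
                  "; set [output] collision_policy = \"suffix\" to opt in";
            return false;
        }
        for (size_t i : idxs) syms[i].name = name + "__rom_" + to_hex(syms[i].rom_addr);
    }
    return true;
}
```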

Validated end-to-end on Stadium's fragment78 (wrapper at ROM 0x9E93F0,
decomp_size=0x25340, 319 relocs). Recompiled func_8FF00020 dispatches
to runtime_addr+0x24DC0 correctly; Stadium boots past the prior
crash point, no regression on the N64 logo + PIKA jingle.

Future work: pattern form ([[input.decompressed_section_pattern]]) for
slots like vram 0x8FF00000 where Stadium streams 279 different
fragments at the same link addr. Validation script
(tools/_validate_dynfrag.py in the consumer repo) confirms 268 distinct
content-hashes, 23MB total payload — feasible as engine work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:39 -07:00