Commit graph

8 commits

Author SHA1 Message Date
Matthew Stanley
6f9649c7e7 decompressed: per-variant synthetic link identities for pattern fragments
Path 2 of the pattern-fragment dispatch architecture: each variant of
a [[input.decompressed_section_pattern]] now gets a unique link-time
ram_addr from a synthetic vram pool (0xC0000000+, KSEG2/KSEG3 — unused
by N64 software so it can't collide with engine-resident sections like
RSP at 0xA4000000+).

Why: when multiple variants share a single canonical link bucket
(e.g. all stadium_models pattern variants at 0x8FF00000), runtime
fragment-vaddr resolution via gFragments[id] is single-pointer and
ambiguous when more than one variant is host-resident at the same
time. Per-variant synthetic ram_addrs make each variant's RELOC_HI16
/ RELOC_LO16 emit produce a unique 0xCXXXXXXX literal at runtime,
giving variant-internal references unambiguous identity without
depending on caller PC, host stack walks, or data-context tracking.
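
To illustrate how a synthetic link address ends up as a literal in the instruction stream, the sketch below splits a 32-bit address into the two immediates of a MIPS `lui`/`addiu` pair and recombines them. This is the standard HI16/LO16 arithmetic. The function names are hypothetical and are not taken from N64Recomp.

```cpp
#include <cstdint>

// Upper immediate for `lui`: rounded up when the low half is "negative",
// so that adding the sign-extended low half reproduces the address.
constexpr uint16_t hi16(uint32_t addr) {
    return static_cast<uint16_t>((addr + 0x8000u) >> 16);
}

// Lower immediate for `addiu`/`lw`; the CPU sign-extends it.
constexpr uint16_t lo16(uint32_t addr) {
    return static_cast<uint16_t>(addr & 0xFFFFu);
}

// What the CPU computes at runtime: (hi << 16) + sign_extend(lo).
constexpr uint32_t recombine(uint16_t hi, uint16_t lo) {
    return (static_cast<uint32_t>(hi) << 16) +
           static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(lo)));
}
```

For an address in the synthetic pool such as 0xC0108004, the pair is (0xC011, 0x8004): the high half is bumped by one because the low half sign-extends to a negative value.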

Implementation:

- add_decompressed_section accepts an override_link_ram_addr param.
  The bytes-encoded `vram` (= canonical link bucket) is passed to
  parse_fragment_relocs and discover_function_bounds (so jump tables
  resolve correctly against the body's encoded references), while
  section.ram_addr is set to the override. The two roles of vram are
  cleanly separated.

- New original_pattern_id field on Section. Populated for synthetic-
  link variants with the original game-side fragment id derived from
  the pattern's canonical bucket (e.g. 0xEF for stadium_models).
  Lets the runtime candidate filter know which game id should
  include this synthetic section as a candidate, eliminating cross-
  pattern hash-collision misregistration.

- main.cpp emit: section_load_table now writes original_pattern_id
  into the SectionTableEntry initializer.

- decompressed.cpp pattern loop: every unique variant now gets
  synthetic ram_addr = 0xC0000000 + variant_idx * 0x100000 (1 MB
  stride, ~286 KB largest observed variant). For Stadium's 279
  unique variants the pool occupies 0xC0000000..0xCDB00000, well
  within the runtime-side 512-bucket capacity.
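
The address formula in the last bullet can be written as a one-line helper. This is a minimal sketch: the base and stride come from the message above, while the constant and function names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kSyntheticBase   = 0xC0000000u;  // start of the synthetic vram pool
constexpr uint32_t kSyntheticStride = 0x00100000u;  // 1 MB per variant

// Link-time ram_addr for the variant with the given zero-based index.
constexpr uint32_t synthetic_ram_addr(uint32_t variant_idx) {
    return kSyntheticBase + variant_idx * kSyntheticStride;
}
```

Because the stride is larger than the largest observed variant, no two variants' address ranges can overlap.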

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:45 -07:00
Matthew Stanley
0d6bc043d2 decompressed: cumulative synthetic_rom allocator — fixes overlap corruption
The previous formula `synthetic_rom = 0xFE000000 | rom_wrapper` assumed
wrapper offsets were spaced apart by at least their decompressed body
sizes. They are NOT — Stadium's wrappers are densely packed (often
within 0x100-0x10000 bytes of each other) while their decompressed
bodies are 0x500-0x50000 bytes. This caused later sections' memcpy
into context.rom to OVERWRITE earlier sections' bytes, corrupting
their jump-table entries and any other content addressed by
relative offsets.

Concrete repro before the fix: pattern-activate Stadium's 0x8FF00000
slot. Section frag_8FF00000__rom_56E900 has impl_size=0xC2F4 (correctly
bounded). Its jump table at body offset 0xC300 has 5 entries pointing
to body offsets 0x48..0x74. After the section was added, frag_*__rom_574A50
(wrap_off=0x574A50, synthetic_rom=0xFE574A50) memcpy'd 0x58 bytes
starting at 0xFE574A50 — INSIDE the first section's range
[0xFE56E900, 0xFE57AC20). The jtbl bytes at offset 0xC300 (rom 0xFE57AC00)
got clobbered with garbage from the second section's body. analyze_function
then read jtbl entries that didn't decode to in-function vrams and
reported "Failed to determine size of jump table" — a real symptom
caused by silent data corruption.

The fix: cumulative allocator. A static counter starts at 0xFE000000;
each new section claims a fresh, 4-byte-aligned chunk sized to its
reloc_offset. No two sections ever share a byte range. The 0xFE000000
prefix is preserved for traceability (synthetic ranges live above any
real ROM offset). The build fails cleanly if cumulative usage would pass
0x100000000, i.e. 32 MB of synthesized payload above the 0xFE000000
base; Stadium's 0x8FF00000 slot, at ~23 MB total, fits under that.
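
A cumulative allocator of this kind can be sketched as a bump pointer with an alignment step and an overflow check. The class and constant names are hypothetical. The base and ceiling are the values quoted in the message; returning false is an assumption that stands in for however the real build reports the failure.

```cpp
#include <cstdint>

constexpr uint64_t kSyntheticRomBase  = 0xFE000000ull;
constexpr uint64_t kSyntheticRomLimit = 0x100000000ull;

class SyntheticRomAllocator {
public:
    // On success, stores the start of a fresh range in `out` and returns true.
    // Returns false and leaves the counter untouched if the range would pass
    // the limit, so the caller can abort the build.
    bool allocate(uint32_t size, uint32_t& out) {
        // Round the request up to a multiple of 4 bytes.
        uint64_t aligned = (static_cast<uint64_t>(size) + 3) & ~static_cast<uint64_t>(3);
        if (next_ + aligned > kSyntheticRomLimit) return false;
        out = static_cast<uint32_t>(next_);
        next_ += aligned;
        return true;
    }

private:
    uint64_t next_ = kSyntheticRomBase;  // first unclaimed synthetic ROM offset
};
```

Each returned range starts where the previous one ended, so a later section's memcpy cannot land inside an earlier section's bytes regardless of how densely the wrappers are packed in the real ROM.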

Verified: pattern-activated Stadium's 0x8FF00000 slot. After the fix,
ZERO analyze_function failures and ZERO bounds-discovery failures
(was 57+ before). The build now hits a different class of failure:
discover_function_bounds walks past real function ends by following
j/jal instructions in the body that are tail calls, not intra-function
jumps. That's a separate analyzer bug, surfaced by this fix and tracked
as the next layer of work. Still principle-clean: the build aborts with
specific instruction offsets.

Static [[input.decompressed_section]] for fragment78 still
recompiles cleanly. No regression on Stadium boot logo + PIKA jingle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:45 -07:00
Matthew Stanley
4f5fb0b64b decompressed: detect data-only fragments by absence of jr $ra
Stadium's dynamic-asset slot at vram 0x8FF00000 contains a mix of
fragment shapes:

  Code fragments: real MIPS function at +0x20 ending in jr $ra
                  (and possibly more functions). Stadium dispatches
                  the +0x00 J trampoline to invoke them.

  Data fragments: pure data starting at +0x20 (tables of (tag,
                  pointer) records, animation curves, etc.). The
                  +0x00 J trampoline is a dormant placeholder that
                  Stadium NEVER actually calls. Stadium reads the
                  data directly via R_MIPS_32 pointers from elsewhere.

The previous code path attempted to recompile a function at +0x20
in EVERY synthesized section, which (a) was incorrect for data
fragments, and (b) reliably produced invalid C from data words
decoded as instructions.

Detection heuristic: scan the first 0x100 instructions of the body
for any jr $ra (encoded as 0x03E00008). If absent, the fragment is
data-only: register the section + R_MIPS_32 relocs but emit NO
FuncEntry rows. If Stadium ever does dispatch the +0x00 J for one
of these (which shouldn't happen), the runtime LOOKUP_FUNC reports
the miss loudly. That's the correct surface, NOT a stub.
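
The heuristic amounts to a bounded scan for one instruction word. A minimal sketch follows, assuming the body is held as big-endian bytes and that the scan starts at the +0x20 implementation entry. The start offset is an assumption, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kJrRa = 0x03E00008u;  // encoding of `jr $ra`

// Reads a big-endian 32-bit word; the caller guarantees off + 4 <= b.size().
inline uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// True if any of the first `max_insns` words from `start` is `jr $ra`.
// A false result classifies the fragment as data-only.
inline bool contains_jr_ra(const std::vector<uint8_t>& body,
                           size_t start = 0x20, size_t max_insns = 0x100) {
    for (size_t i = 0; i < max_insns; ++i) {
        size_t off = start + i * 4;
        if (off + 4 > body.size()) break;  // ran out of bytes before the window ended
        if (read_be32(body, off) == kJrRa) return true;
    }
    return false;
}
```

A data word can coincidentally equal 0x03E00008, so this check can only misclassify data as code, never code that returns within the window as data.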

Tested on Stadium's 0x8FF00000 slot via [[input.decompressed_section_pattern]]:
  - 282 wrappers attempted
  - 62 classified as data-only (registered without impl function)
  - 220 attempted as code; first failure surfaces an analyze_function
    jump-table sizing gap (separate issue, distinct from data-only
    classification)

Static [[input.decompressed_section]] for fragment78 is unaffected
(still recompiles cleanly; boot logo + PIKA jingle still play).
The pattern stays inactive in Stadium's game.toml until the
analyze_function jtbl gap is addressed; build correctly refuses to
proceed if activated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:45 -07:00
Matthew Stanley
bd5f42fb22 analysis: discover_function_bounds — real CFG walk with jump-table support
Adds a public N64Recomp::discover_function_bounds() in src/analysis.h
that performs a BFS-based control-flow walk of a function's body,
following:
  - Conditional branches (target + fall-through)
  - Unconditional j/jal targets when intra-body
  - jr $ra returns (block ends after delay slot)
  - jr-via-jump-table dispatches: the existing register-state
    simulator from analyze_function detects the lui+addiu+addu+lw+jr
    pattern and records the jtbl base; we then read entries out of
    the body bytes and feed targets back into the BFS until
    convergence.

Returns the function's byte size (max-reachable + 4 to cover the
delay slot of the last instruction). On failure, populates a specific
error message with the offending offset and reason — caller treats
this as a build error, NOT a graceful skip (per the project's
no-stubs principle).
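
The jump-table step, reading entries out of the body bytes and feeding them back as new walk targets, might look like the sketch below. It assumes that entries are absolute link-time addresses and that a table ends at the first word that does not decode to an in-function address. Both are assumptions for illustration, and the names are hypothetical rather than N64Recomp's.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a big-endian 32-bit word; the caller guarantees off + 4 <= b.size().
inline uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// Returns jump-table targets as body offsets. Reading stops at the first word
// that is not a 4-byte-aligned address inside [func_start, func_end).
inline std::vector<uint32_t> read_jtbl_targets(const std::vector<uint8_t>& body,
                                               size_t jtbl_off,
                                               uint32_t section_vram,
                                               uint32_t func_start,
                                               uint32_t func_end) {
    std::vector<uint32_t> targets;
    for (size_t off = jtbl_off; off + 4 <= body.size(); off += 4) {
        // Unsigned subtraction wraps for addresses below the section,
        // which then fail the range check.
        uint32_t target = read_be32(body, off) - section_vram;
        if (target < func_start || target >= func_end || (target & 3) != 0) break;
        targets.push_back(target);
    }
    return targets;
}
```

An empty result for a detected table would correspond to the "Failed to determine size of jump table" class of error, which the caller should treat as a build failure.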

Wires into decompressed.cpp's pattern path, replacing the prior
inline BFS that had a TODO for jump-table handling. The pattern
caller now propagates failures via `synthesize_decompressed_patterns`
returning false, which surfaces in main.cpp's exit_failure path.

Concrete behavior change: activating a pattern that includes a
fragment with computed jumps now produces a build error pointing at
the specific section name + offset + the analyzer's failure reason,
instead of silently producing a partial binary. Tested on Stadium's
0x8FF00000 slot — first failing wrapper is at ROM 0x8CC400 with an
indirect jr at offset 0x827C the simulator doesn't pattern-match.
The static [[input.decompressed_section]] path for fragment78 is
unaffected (still recompiles cleanly, no regression on boot logo +
PIKA jingle).

Future work surfaced by this change: the simulator's lui+addiu
+addu+lw+jr pattern doesn't cover every jump-table shape Stadium
uses. Each gap surfaces as a specific build-error offset; resolution
is to extend analyze_instruction to recognize the additional pattern
(or, when it's a true tail-call rather than a jtbl, distinguish
those at the jr site).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
8320bb902b recomp: replace pattern-section graceful-skip with real CFG-based bounds discovery
The prior "pattern-synthesized recompile failures are best-effort:
log + skip" path was a stub by another name — it produced binaries
where some fragment bodies silently didn't exist, and the failure
deferred to a runtime lookup-miss when Stadium tried to dispatch
into them. That violates the project's no-stubs principle.

Three changes here:

1. **Remove the soft-skip in main.cpp's recompile loops.** Recompile
   failures revert to fatal `std::exit(EXIT_FAILURE)` regardless of
   whether the section is pattern-synthesized. Build-time errors
   surface; the user has to make a real choice about how to
   resolve them.

2. **Replace the "scan to first jr ra" heuristic in decompressed.cpp
   with a real BFS-based control-flow walker.** The walker:
   - Starts at impl entry (+0x20).
   - Follows conditional branches (target + fall-through).
   - Follows j/jal targets when intra-function.
   - Treats jr $ra as a return; ends the basic block.
   - Returns max-reachable-offset + 4 as the function's true size.

   For functions with computed jumps (jr <reg> not jr $ra — i.e.
   jump-table dispatches), the walker reports a build-time error
   with a specific offset and a list of options for the user
   (declare via single-block form, or extend the walker to follow
   jump-table targets). NOT a skip.

3. **Pattern-caller propagates synthesis failures as build aborts.**
   `synthesize_decompressed_patterns` returns false when any section
   fails to add, and main.cpp's exit_failure path runs.
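
A BFS walker with the rules listed in item 2 can be sketched as follows. This is an illustration under stated assumptions, not the code in decompressed.cpp: all names are hypothetical, "intra-function" is approximated as "inside the body at or after the entry", and the size is measured from the entry offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <set>
#include <string>
#include <vector>

struct BoundsResult {
    bool ok = false;
    uint32_t size = 0;   // bytes from the entry offset to the end of the function
    std::string error;   // filled in when ok is false
};

inline uint32_t read_be32(const std::vector<uint8_t>& b, size_t off) {
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// True for MIPS conditional branches with a 16-bit PC-relative offset.
inline bool is_cond_branch(uint32_t insn) {
    uint32_t op = insn >> 26;
    if ((op >= 0x04 && op <= 0x07) || (op >= 0x14 && op <= 0x17)) return true;
    if (op == 0x01) return ((insn >> 16) & 0x0C) == 0;     // REGIMM bltz/bgez family
    if (op == 0x11) return ((insn >> 21) & 0x1F) == 0x08;  // bc1f/bc1t
    return false;
}

inline BoundsResult walk_function(const std::vector<uint8_t>& body,
                                  uint32_t vram, uint32_t entry) {
    BoundsResult r;
    std::set<uint32_t> visited;
    std::queue<uint32_t> work;
    uint32_t max_off = entry;
    work.push(entry);
    while (!work.empty()) {
        uint32_t off = work.front();
        work.pop();
        for (;;) {  // walk one basic block
            if (size_t(off) + 8 > body.size()) {  // need the insn and its delay slot
                r.error = "walk ran off the body at offset " + std::to_string(off);
                return r;
            }
            if (!visited.insert(off).second) break;  // already explored
            max_off = std::max(max_off, off);
            uint32_t insn = read_be32(body, off);
            uint32_t op = insn >> 26;
            if (is_cond_branch(insn)) {
                int32_t disp = int32_t(int16_t(insn & 0xFFFF)) * 4;
                uint32_t target = off + 4 + uint32_t(disp);
                if (target < entry || target >= body.size()) {
                    r.error = "branch leaves the body at offset " + std::to_string(off);
                    return r;
                }
                work.push(target);  // taken path; fall-through continues below
            } else if (op == 0x02 || op == 0x03) {  // j / jal
                uint32_t target = ((vram + off + 4) & 0xF0000000u) |
                                  ((insn & 0x03FFFFFFu) << 2);
                if (target >= vram + entry && target - vram < body.size())
                    work.push(target - vram);
                if (op == 0x02) {  // j never falls through past its delay slot
                    max_off = std::max(max_off, off + 4);
                    break;
                }
            } else if ((insn & 0xFC00003Fu) == 0x00000008u) {  // jr <reg>
                if (((insn >> 21) & 0x1F) != 31) {
                    r.error = "computed jump (jr <reg>) at offset " + std::to_string(off);
                    return r;
                }
                max_off = std::max(max_off, off + 4);  // return: cover the delay slot
                break;
            }
            off += 4;
        }
    }
    r.ok = true;
    r.size = max_off + 4 - entry;
    return r;
}
```

The sketch treats `beq $zero, $zero` like any other conditional branch, so it may overestimate a function that ends in an unconditional branch. Erring toward a larger size or an explicit error is consistent with the build-error-over-skip policy described here.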

Net effect on Stadium today: the static [[input.decompressed_section]]
for fragment78 still recompiles cleanly (boot logo + PIKA jingle
unaffected). Activating the pattern would now fail loudly on the
first fragment with computed jumps, instead of silently shipping a
binary missing those bodies. That's the principle: build errors
surface, runtime stubs don't.

The "extend the walker to follow jump-table targets" work is
documented in the error message and is the next step if/when
pattern activation matters more than fragment78's single case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
5f2ae6e4f7 recomp: emit content_hash on pattern-synthesized sections (Shape A, build side)
Adds Section::content_hash, populates it on pattern-synthesized
sections with FNV-1a-64 of the first 0x100 bytes of the decompressed
body, and emits it into recomp_overlays.inl's SectionTableEntry. The
runtime side hashes the same window over the bytes Stadium loads at
fragment_ptr and looks up the matching section by hash.

Build-time and runtime use:
  - SAME hash algorithm: FNV-1a-64
  - SAME window: 0x100 bytes (95% uniqueness across Stadium's 282
    distinct fragment bodies; falls back to first-candidate on the
    residual ~5%)
  - SAME byte source: pre-relocation decompressed bytes (link-time
    form, before Stadium's R_MIPS_32 patches run)

Section table emit gains the .content_hash field; non-pattern sections
get hash=0, runtime-side condition `sec.content_hash != 0` filters
them out of the candidate set.
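
For reference, FNV-1a-64 over a fixed window is only a few lines. The sketch below uses the standard 64-bit FNV offset basis and prime. The function names are hypothetical, and clamping the window for bodies shorter than 0x100 bytes is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kHashWindow = 0x100;  // bytes hashed on both build and runtime side

// Standard FNV-1a, 64-bit variant: XOR each byte in, then multiply by the prime.
inline uint64_t fnv1a64(const uint8_t* data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ull;   // FNV-1a 64-bit offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ull;            // FNV 64-bit prime
    }
    return h;
}

// Hash of the first kHashWindow bytes of a decompressed body
// (or of the whole body if it is shorter than the window).
inline uint64_t content_hash(const uint8_t* body, size_t body_len) {
    return fnv1a64(body, body_len < kHashWindow ? body_len : kHashWindow);
}
```

The build side and the runtime side must agree on all three of algorithm, window, and byte source. Hashing bytes after relocation patches have run would produce a different value for the same fragment.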

Pairs with the runtime-side change in
lib/N64ModernRuntime/librecomp/src/overlays.cpp.

Activation in PokemonStadiumRecomp's game.toml is gated on a
follow-up: pattern-synthesized impl bodies currently get a basic
forward-CFG-walked size which produces invalid C for fragments with
internal jump tables (data interpreted as code). Future fix: emit
pattern-section impl bodies as runtime-dispatched stubs instead of
trying to statically recompile each body. Until then, fragment78
stays declared as a single static [[input.decompressed_section]];
the engine's pattern infrastructure is in place, ready to be flipped
on once the impl-body emit is reshaped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
5b42a76748 recomp: pattern auto-discovery for dynamic-asset slot fragments (Shape A)
Adds [[input.decompressed_section_pattern]] for slots where many
fragments share a link vram (e.g. Stadium streams 279+ different
fragments through vram 0x8FF00000 across the game). Per-fragment
[[input.decompressed_section]] entries don't scale to that cardinality
and miss the runtime-swap dispatch problem entirely.

Engine pipeline:
  1. Scan baserom.z64 for every Yay0 wrapper.
  2. For each, decompress 0x40 bytes and check whether the prefix
     matches the expected J <vram + 0x20> trampoline + FRAGMENT magic.
     Wrappers in PERS-SZP form are detected by the -0x18 prefix.
  3. For matches, fully decompress and FNV-1a-64 hash the body.
  4. Deduplicate by content hash (Stadium has ~11 byte-identical
     duplicates across its 279 wrappers).
  5. Synthesize one Section per unique content. Sections are named
     <base_name>__rom_<wrapper_offset>; functions become
     func_<vram>__rom_<offset> via the existing collision-suffix
     machinery (default for pattern-discovered sections, since
     collisions are the EXPECTED case here).
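
Steps 4 and 5 can be sketched as a single pass that keeps the first wrapper seen for each content hash and names the resulting section. The struct and function names are hypothetical. Uppercase, unpadded hex for the offset is an assumption based on section names such as frag_8FF00000__rom_56E900.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

struct Wrapper {
    uint32_t rom_offset;     // where the compressed wrapper sits in the ROM
    uint64_t content_hash;   // hash of the fully decompressed body
};

// Builds "<base_name>__rom_<wrapper_offset>" with the offset in uppercase hex.
inline std::string section_name(const std::string& base, uint32_t wrapper_offset) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%X", static_cast<unsigned>(wrapper_offset));
    return base + "__rom_" + buf;
}

// Returns one section name per unique content hash, in discovery order.
inline std::vector<std::string> unique_section_names(
        const std::string& base, const std::vector<Wrapper>& wrappers) {
    std::unordered_set<uint64_t> seen;
    std::vector<std::string> names;
    for (const Wrapper& w : wrappers) {
        if (!seen.insert(w.content_hash).second) continue;  // byte-identical duplicate
        names.push_back(section_name(base, w.rom_offset));
    }
    return names;
}
```

Keying the section name on the wrapper offset rather than on the shared link vram is what keeps names distinct when many fragments link at the same address.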

Implementation function (the +0x20 entry) gets a basic forward CFG
walk to determine its size:
  - Walk instructions tracking forward branch targets within the func.
  - Stop at jr $ra IF no tracked forward branches still need to be
    reached.
  - Falls back to first-jr-ra heuristic if walk is inconclusive.

Pattern-synthesized recompile failures are non-fatal: pattern sections
have rom_addr in the synthetic 0xFE000000 range, and main.cpp's recompile
loop logs and skips them instead of calling std::exit. This lets the build
proceed even when our basic CFG walk misjudges an oddly shaped function
(e.g. computed jumps through jump tables we don't analyze). Stadium's
Path-3 single-fragment case (fragment78 wrapper at ROM 0x9E93F0)
still recompiles cleanly; ~225 of 282 dynamic-slot fragments
recompile, ~57 fail and are skipped.

Validation on Stadium's 0x8FF00000 slot:
  - 293 Yay0 wrappers found (vs 279 from the prior validate script;
    the earlier scan undercounted due to a tight 1KB decode window).
  - 282 sections after dedupe (11 collapsed as content-identical).
  - Build proceeds to completion; no Stadium boot regression
    (logo + PIKA jingle still render).

Outstanding for next session — runtime side:
  - Modify register_runtime_fragment in librecomp/src/overlays.cpp
    to read bytes at fragment_ptr (first 0x40 → fall back to full
    body for the residual ~5%), hash, and look up the matching
    section. Currently it picks by id alone, so for slot 0x8FF00000
    only ONE of the 282 sections gets bound to func_map at any time
    (the most-recently registered).
  - Refactor cross-section R_MIPS_32 retargeting to use a vram
    hashmap (currently O(N²) which gets expensive at 282 sections).
  - fragment78's prior single-fragment block can stay; it works
    alongside patterns and serves as the "I know exactly which
    one I want" form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:44 -07:00
Matthew Stanley
b517a7195a recomp: build-time decompression of CPU-decompressed-at-runtime fragments
Adds [[input.decompressed_section]] toml block + Yay0/PERS-SZP wrapper
decoders + an in-memory section synthesis pass. Required for games
like Pokemon Stadium where Stadium's CPU-side decompressor materializes
fragment bytes at runtime and the static recompiler can't see them in
the ELF/ROM-direct path.

User-facing config:
    [[input.decompressed_section]]
    name = "fragment78"
    vram = 0x8FF00000
    rom_wrapper = 0x9E93F0
    wrapper_format = "pers_szp_yay0"

Pipeline:
  1. compression/{yay0,pers_szp}.{h,cpp} decode the wrapper.
  2. decompressed.cpp parses the FRAGMENT-format header (relocOffset,
     sizeInRam) + Stadium-format reloc table, translates it to
     N64Recomp::Reloc entries (R_MIPS_32/26/HI16/LO16) with paired
     HI16/LO16 immediate computation, and synthesizes a Section
     handed to the existing recompilation pipeline. Stores
     decompressed bytes into context.rom at synthetic_rom =
     0xFE000000 | rom_wrapper to keep them out of real-ROM addr space.
  3. Two functions per fragment: the +0x00 entry trampoline (J + nop)
     and the +0x20 implementation (runs to first jr ra in body).
  4. After all decompressed sections are added, retargets each
     R_MIPS_32 reloc to whichever existing section's vram range
     contains its target address (cross-section pointer support).
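
The retargeting in step 4 reduces to a range lookup: find the section whose vram range contains the reloc's target address. A minimal linear-scan sketch follows, with hypothetical names. The real pass operates on N64Recomp's own Section and Reloc types, which are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SectionRange {
    uint32_t vram;   // link address of the section
    uint32_t size;   // size in bytes
};

// Returns the index of the first section whose [vram, vram + size) range
// contains `target`, or -1 if no section does.
inline int find_target_section(const std::vector<SectionRange>& sections,
                               uint32_t target) {
    for (size_t i = 0; i < sections.size(); ++i) {
        const SectionRange& s = sections[i];
        // 64-bit end address avoids wraparound for ranges near 0xFFFFFFFF.
        uint64_t end = static_cast<uint64_t>(s.vram) + s.size;
        if (target >= s.vram && target < end) return static_cast<int>(i);
    }
    return -1;
}
```

Running this scan once per reloc costs time proportional to the number of sections, which is fine for a handful of sections but adds up once hundreds of synthesized sections exist.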

Adds [output] collision_policy:
  "error"  (default) — abort the build if two emitted symbols collide
                       on name; print both colliders + how to opt in.
  "suffix"           — auto-disambiguate by appending __rom_<rom_addr>
                       to colliding symbols. Suffix only appears where
                       collisions exist.

Validated end-to-end on Stadium's fragment78 (wrapper at ROM 0x9E93F0,
decomp_size=0x25340, 319 relocs). Recompiled func_8FF00020 dispatches
to runtime_addr+0x24DC0 correctly; Stadium boots past the prior
crash point, no regression on the N64 logo + PIKA jingle.

Future work: pattern form ([[input.decompressed_section_pattern]]) for
slots like vram 0x8FF00000 where Stadium streams 279 different
fragments at the same link addr. Validation script
(tools/_validate_dynfrag.py in the consumer repo) confirms 268 distinct
content-hashes, 23MB total payload — feasible as engine work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:47:39 -07:00