FPGA LR35902-Based CISC CPU Digital System
A complete FPGA-based digital system built around an LR35902-inspired CISC CPU architecture. The CPU is based on the Sharp LR35902 used in the original Game Boy family, combining an 8-bit register-oriented instruction model with memory-mapped I/O, interrupt handling, stack operations, and tight timing requirements. The system also includes an integrated pixel processing unit, audio processing unit, HDMI output, external memory experiments, and hardware-level debugging.
CPU Architecture
The core of the project is an LR35902-inspired CISC CPU implemented on the FPGA. My implementation takes the Game Boy's LR35902 as the reference point for the register model, instruction behavior, memory-mapped I/O organization, interrupt flow, and timing expectations. The CPU follows a classic fetch, decode, and execute flow, with control logic responsible for instruction sequencing, register updates, memory accesses, and flag behavior.
The architecture includes the main LR35902-style register file, arithmetic and logic operations, program counter control, stack behavior, memory-mapped I/O access, and interrupt-related control paths. A major part of the work was not only implementing the visible behavior of the CPU, but also matching the timing constraints expected by the rest of the system, since the PPU, APU, timers, and memory bus all depend on precise CPU sequencing.
Because the LR35902-style CPU interacts directly with video, audio, memory, and peripheral logic, debugging it required cycle-level observation. Small mistakes in instruction timing, flag updates, interrupt servicing, or memory access order could create bugs much later in the system, especially in the video pipeline or during boot.
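To make the interrupt flow concrete, here is a minimal Python model of interrupt selection and servicing. It is a behavioral sketch and not the project's HDL: the `state` dictionary and the function names are hypothetical, and the model ignores cycle timing. The register addresses (IE at 0xFFFF, IF at 0xFF0F), the vector addresses, and the lowest-bit-first priority follow the documented Game Boy behavior.

```python
# Behavioral sketch of LR35902-style interrupt servicing (names are hypothetical).
IE_ADDR = 0xFFFF  # interrupt enable register
IF_ADDR = 0xFF0F  # interrupt flag (request) register

# Bit position -> vector address; bit 0 has the highest priority.
INTERRUPT_VECTORS = {
    0: 0x0040,  # VBlank
    1: 0x0048,  # LCD STAT
    2: 0x0050,  # Timer
    3: 0x0058,  # Serial
    4: 0x0060,  # Joypad
}


def pending_interrupt(ime, ie_reg, if_reg):
    """Return (bit, vector) for the interrupt to service, or None."""
    if not ime:
        return None
    pending = ie_reg & if_reg & 0x1F
    for bit in range(5):
        if pending & (1 << bit):
            return bit, INTERRUPT_VECTORS[bit]
    return None


def service_interrupt(state, memory):
    """Acknowledge the highest-priority interrupt: push PC, jump to the vector."""
    hit = pending_interrupt(state["ime"], memory[IE_ADDR], memory[IF_ADDR])
    if hit is None:
        return False
    bit, vector = hit
    state["ime"] = False                      # further interrupts are masked
    memory[IF_ADDR] &= ~(1 << bit) & 0xFF     # clear the serviced request bit
    pc = state["pc"]
    state["sp"] = (state["sp"] - 1) & 0xFFFF
    memory[state["sp"]] = pc >> 8             # high byte is pushed first
    state["sp"] = (state["sp"] - 1) & 0xFFFF
    memory[state["sp"]] = pc & 0xFF
    state["pc"] = vector
    return True
```

A model like this is mainly useful as a reference to compare against waveforms: if the hardware pushes the return address in a different order or clears the wrong request bit, the mismatch shows up immediately.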
Instruction Set
The instruction set follows the LR35902-style programming model, with data transfer instructions, arithmetic operations, logical operations, jumps, calls, returns, stack operations, immediate instructions, register-to-register operations, and memory-addressed instructions.
A large part of the implementation effort was spent validating how each instruction affects registers, flags, memory, and timing. Instructions that appear simple at first can become difficult when they involve side effects such as flag updates, stack pointer changes, conditional branching, or memory access ordering.
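As an example of how a "simple" instruction carries side effects, the sketch below models the flag results of an 8-bit add into the accumulator. It is a Python reference model written for illustration, not the project's implementation; the function name `add_a` is hypothetical. The flag bit positions (Z, N, H, C in the upper nibble of F) and the half-carry rule follow the documented LR35902 behavior.

```python
# Flag bit masks in the F register.
FLAG_Z = 0x80  # zero
FLAG_N = 0x40  # subtract (always cleared by an add)
FLAG_H = 0x20  # half carry: carry out of bit 3
FLAG_C = 0x10  # carry: carry out of bit 7


def add_a(a, operand, carry_in=0):
    """Return (result, flags) for ADD A,operand, or ADC when carry_in is 1."""
    total = a + operand + carry_in
    result = total & 0xFF
    flags = 0
    if result == 0:
        flags |= FLAG_Z
    if (a & 0x0F) + (operand & 0x0F) + carry_in > 0x0F:
        flags |= FLAG_H
    if total > 0xFF:
        flags |= FLAG_C
    return result, flags


# 0x3A + 0xC6 wraps to 0x00 and sets Z, H, and C together.
print(add_a(0x3A, 0xC6))  # (0, 176), i.e. flags 0xB0
```

Checking the hardware against a table generated by a model like this catches half-carry mistakes that are easy to miss when only the result byte is compared.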
The project also required testing immediate values, register operands, indirect addressing, and control-flow instructions in a way compatible with LR35902 software expectations. This made the CPU much more than a simple academic processor: it had to behave consistently enough to run real software on top of the FPGA system.
Integrated GPU: Pixel Processing Unit
The system includes an integrated GPU-like module, closer to a Pixel Processing Unit than a modern programmable GPU. Its role is to generate the video output pixel by pixel, using timing-sensitive logic synchronized with the rest of the system.
The PPU handles background rendering, tile and pixel fetching, scrolling behavior, sprite processing, and video timing. It is responsible for producing the raw pixel stream that later becomes the HDMI image.
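The tile fetching and scrolling steps can be sketched in a few lines of Python. This is an illustrative model based on the documented Game Boy tile format (two bytes per row, low bit plane first, bit 7 as the leftmost pixel) and the 32x32-tile wrapping background map; the function names are hypothetical and the real PPU does this work as a timed pipeline.

```python
def decode_tile_row(low_byte, high_byte):
    """Decode one 8-pixel tile row into color indices 0..3, leftmost pixel first."""
    return [
        (((high_byte >> bit) & 1) << 1) | ((low_byte >> bit) & 1)
        for bit in range(7, -1, -1)
    ]


def bg_tile_map_index(x, y, scx, scy):
    """Return the index into the 32x32 background map for screen pixel (x, y).

    The background is 256x256 pixels and wraps around, so the scroll
    registers are added modulo 256 before dividing by the tile size.
    """
    bg_x = (x + scx) & 0xFF
    bg_y = (y + scy) & 0xFF
    return (bg_y // 8) * 32 + (bg_x // 8)


print(decode_tile_row(0x3C, 0x7E))  # [0, 2, 3, 3, 3, 3, 2, 0]
```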
One of the main challenges was aligning the PPU timing with memory behavior. Video logic is very sensitive to latency: a one-cycle delay in the wrong place can shift pixels, break scrolling, corrupt sprites, or create visual artifacts. This made the PPU one of the most important debugging targets of the project.
Audio Processing Unit
The Audio Processing Unit generates the digital audio output of the system. It includes timing-sensitive audio channels, sequencing logic, mixing behavior, and output formatting for the HDMI audio path.
Audio bugs are harder to inspect than video bugs because nothing shows up on screen. A small synchronization issue can produce distorted, unstable, or inconsistent sound, especially when crossing from internal audio logic to the external HDMI or I2S output path.
The APU required careful handling of clocks, counters, channel state, sample timing, and reset behavior. One important part of the project was comparing simulation behavior with real FPGA behavior, because audio problems sometimes appeared only after synthesis or after restarting the board.
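To illustrate the channel timing involved, here is a simplified Python model of one pulse channel. It assumes the documented Game Boy relationship between the 11-bit period value and the output frequency, 131072 / (2048 - period) Hz, and the four duty ratios of 12.5%, 25%, 50%, and 75%. The 48 kHz sample rate, the function names, and the phase of the waveform are assumptions made for the sketch; envelope, sweep, and length counters are left out.

```python
# Number of high steps out of 8 for each duty setting (phase is simplified).
DUTY_HIGH_STEPS = {0: 1, 1: 2, 2: 4, 3: 6}


def square_frequency_hz(period_value):
    """Output frequency of a pulse channel for an 11-bit period value."""
    return 131072 / (2048 - period_value)


def square_samples(period_value, duty, sample_rate=48000, count=480, amplitude=15):
    """Generate `count` samples of a pulse wave at the given duty setting."""
    freq = square_frequency_hz(period_value)
    high_steps = DUTY_HIGH_STEPS[duty]
    samples = []
    for n in range(count):
        # Position inside the 8-step waveform at sample n.
        step = int(n * freq * 8 / sample_rate) % 8
        samples.append(amplitude if step < high_steps else 0)
    return samples


print(round(square_frequency_hz(1750), 2))  # 439.84, close to A4
```

Comparing a generated sequence like this with samples captured from the board is one way to tell a counter bug from a problem in the output path.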
FPGA Integration Process on Quartus
The design was integrated and synthesized using Intel Quartus for a Cyclone V FPGA board. The FPGA integration process involved connecting the generated CPU, PPU, APU, memory interfaces, HDMI driver, clocks, reset logic, and physical board pins into a single top-level hardware design.
Quartus integration required working with synthesis reports, timing analysis, pin assignments, resource usage, and generated IP blocks. The project also required debugging issues that only appear after place-and-route, such as timing violations, clock-domain problems, inferred memory behavior, or differences between simulation and real hardware.
This part of the project was especially important because a design can be logically correct in simulation and still fail on the FPGA if the clocks, memories, or I/O paths are not handled correctly.
Saleae Logic Analyzer Debugging
A Saleae logic analyzer was used to debug real hardware signals directly from the FPGA board. This was essential for observing behavior that could not be fully understood from simulation alone.
For video debugging, the logic analyzer made it possible to inspect pixel clocks, synchronization signals, data enable behavior, and RGB output timing. This helped identify problems such as shifted pixels, unstable video timing, or incorrect signal sequencing.
For audio debugging, the Saleae was used to inspect I2S-style signals, including bit clock, word select, and serial audio data. This made it possible to compare expected audio timing with the actual signals produced by the FPGA.
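A captured trace can also be decoded offline to check the sample values themselves. The sketch below assumes standard Philips I2S framing (data sampled on the rising bit-clock edge, MSB first, word select low for the left channel, and the MSB arriving one bit clock after the word select edge) and 16-bit samples. The capture format, a list of `(bclk, ws, sd)` tuples, and both function names are hypothetical; a real Saleae export would first need to be converted into that shape.

```python
def decode_i2s(capture, bits_per_sample=16):
    """Decode (bclk, ws, sd) samples into a list of (channel, signed value)."""
    words = []
    prev_bclk = prev_ws = None
    channel, value, nbits = None, 0, 0
    for bclk, ws, sd in capture:
        rising = prev_bclk == 0 and bclk == 1
        prev_bclk = bclk
        if not rising:
            continue
        # Shift in the current bit of the word in progress, if any.
        if channel is not None and nbits < bits_per_sample:
            value = (value << 1) | sd
            nbits += 1
            if nbits == bits_per_sample:
                if value >= 1 << (bits_per_sample - 1):
                    value -= 1 << bits_per_sample  # two's complement
                words.append((channel, value))
        # A word select change means the next rising edge carries a new MSB.
        if prev_ws is not None and ws != prev_ws:
            channel, value, nbits = ("R" if ws else "L"), 0, 0
        prev_ws = ws
    return words


def encode_i2s(frames, bits_per_sample=16):
    """Build an idealized capture from (left, right) pairs, for testing only."""
    mask = (1 << bits_per_sample) - 1
    stream = []  # (ws, bit) for every data bit, MSB first
    for left, right in frames:
        for ws, sample in ((0, left), (1, right)):
            for i in range(bits_per_sample - 1, -1, -1):
                stream.append((ws, ((sample & mask) >> i) & 1))
    cycles = [(1, 0)]  # idle cycle so the first word select edge is visible
    for c in range(len(stream) + 1):
        ws = stream[c][0] if c < len(stream) else 0  # ws leads data by one bit
        sd = stream[c - 1][1] if c >= 1 else 0
        cycles.append((ws, sd))
    capture = []
    for ws, sd in cycles:
        capture.append((0, ws, sd))  # clock low: lines settle
        capture.append((1, ws, sd))  # clock high: receiver samples here
    return capture


print(decode_i2s(encode_i2s([(1000, -2)])))  # [('L', 1000), ('R', -2)]
```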
HDMI Driver
The HDMI output path is built around an ADV7513 HDMI transmitter. The FPGA generates the video timing, synchronization signals, data enable signal, and RGB pixel data that are sent to the HDMI transmitter.
The HDMI driver is responsible for converting the internal video output of the system into a format accepted by a standard display. This includes producing stable horizontal and vertical synchronization, a correct pixel clock, and a valid active video region.
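The relationship between the counters and the output signals can be sketched as follows. The write-up does not state the output resolution, so this model assumes the common 640x480 timing (800x525 total, active-low syncs); the constants and the function name are illustrative and may differ from the values used in the actual design.

```python
# Assumed 640x480 timing, in pixels (horizontal) and lines (vertical).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 800
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # 525


def video_signals(h, v):
    """Return (hsync, vsync, de) for counter position (h, v).

    Sync outputs are active low; de is high only in the visible region.
    """
    de = h < H_VISIBLE and v < V_VISIBLE
    hsync_active = H_VISIBLE + H_FRONT <= h < H_VISIBLE + H_FRONT + H_SYNC
    vsync_active = V_VISIBLE + V_FRONT <= v < V_VISIBLE + V_FRONT + V_SYNC
    return int(not hsync_active), int(not vsync_active), int(de)


print(video_signals(0, 0))    # (1, 1, 1): first visible pixel
print(video_signals(656, 0))  # (0, 1, 0): start of the horizontal sync pulse
```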
Audio integration also required connecting the APU output to the HDMI path, making the HDMI driver not only a video component but also part of the complete audio and video output system.
PLL 25 MHz Clock
The project uses a 25 MHz pixel clock for the HDMI video pipeline. This clock is required to generate a standard video output timing compatible with the external HDMI transmitter and display.
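A quick calculation shows what a 25 MHz pixel clock means for the display. The sketch assumes 640x480 timing with 800 total pixels per line and 525 total lines per frame, which is an assumption and not stated in this write-up. Under that assumption the frame rate lands slightly below the nominal rate obtained with the standard 25.175 MHz clock, a difference most displays tolerate.

```python
H_TOTAL = 800  # assumed total pixels per line, including blanking
V_TOTAL = 525  # assumed total lines per frame, including blanking


def refresh_rate_hz(pixel_clock_hz):
    """Frame rate produced by a pixel clock with the assumed totals."""
    return pixel_clock_hz / (H_TOTAL * V_TOTAL)


def line_rate_hz(pixel_clock_hz):
    """Horizontal line rate produced by a pixel clock."""
    return pixel_clock_hz / H_TOTAL


print(round(refresh_rate_hz(25_000_000), 2))  # 59.52
print(round(refresh_rate_hz(25_175_000), 2))  # 59.94
print(line_rate_hz(25_000_000))               # 31250.0
```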
The PLL configuration is important because the video output depends on stable and predictable timing. If the pixel clock is unstable, incorrect, or not properly related to the rest of the system, the display output can become corrupted or fail entirely.
Managing the 25 MHz PLL also introduced clock-domain considerations. The system logic, video pipeline, audio logic, and memory interfaces do not always operate under the same timing constraints, so clean synchronization between domains was necessary.
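One common building block for such crossings is the two-flop synchronizer. The Python class below is only a behavioral model of that general technique, written to show the latency it adds; it is not taken from the project, and the class name is hypothetical. It is suitable for single-bit signals only: multi-bit data needs a handshake or a dual-clock FIFO instead.

```python
class TwoFlopSynchronizer:
    """Behavioral model of a two-stage synchronizer in the destination domain."""

    def __init__(self):
        self.stage1 = 0  # first flop: may go metastable in real hardware
        self.stage2 = 0  # second flop: the value the destination logic uses

    def clock(self, async_in):
        """Advance one destination-clock edge and return the synchronized bit."""
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2


sync = TwoFlopSynchronizer()
print([sync.clock(bit) for bit in [1, 1, 1, 0, 0]])  # [0, 1, 1, 1, 0]
```

The output follows the input one destination-clock edge later than a single register would, which is the latency cost of giving the first flop time to settle.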
LPDDR2 SDRAM and Its Limits
LPDDR2 SDRAM was explored as a way to support memory requirements larger than what fits inside FPGA block RAM. This was especially relevant for larger programs, ROM data, or memory regions that exceed the practical limits of internal FPGA memory.
However, LPDDR2 SDRAM introduces significant complexity. Unlike internal block RAM, which responds with a short, fixed latency, SDRAM access has longer and variable latency, requires a controller, and often runs in a different clock domain from the CPU and video logic.
The main limitation is that timing-sensitive CPU or video accesses cannot always wait for unpredictable external memory latency. Some early boot or frequently accessed regions may need to be mirrored into local FPGA memory, cached, or carefully buffered. This made the SDRAM integration useful but not a simple replacement for fast internal memory.
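One way to picture this split is an address decoder that routes each CPU access to a backing store. The region boundaries below follow the documented Game Boy memory map, but the choice of which regions live in block RAM and which go to SDRAM is a hypothetical policy written for illustration, not a description of the project's final design. The names `REGIONS` and `route` are also made up for the sketch.

```python
# (start, end, region name, hypothetical backing store)
REGIONS = [
    (0x0000, 0x7FFF, "cartridge ROM", "sdram"),
    (0x8000, 0x9FFF, "video RAM", "bram"),
    (0xA000, 0xBFFF, "external RAM", "sdram"),
    (0xC000, 0xDFFF, "work RAM", "bram"),
    (0xE000, 0xFDFF, "echo RAM", "bram"),
    (0xFE00, 0xFE9F, "OAM", "bram"),
    (0xFEA0, 0xFEFF, "unusable", "none"),
    (0xFF00, 0xFF7F, "I/O registers", "registers"),
    (0xFF80, 0xFFFE, "high RAM", "bram"),
    (0xFFFF, 0xFFFF, "interrupt enable", "registers"),
]


def route(address, boot_rom_mapped=False):
    """Return (region name, backing store) for a 16-bit CPU address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    # While the boot ROM is mapped it overlays the first 256 bytes, and it is
    # kept in block RAM here so boot does not depend on SDRAM being ready.
    if boot_rom_mapped and address < 0x0100:
        return "boot ROM", "bram"
    for start, end, name, backing in REGIONS:
        if start <= address <= end:
            return name, backing
    raise AssertionError("memory map has a gap")  # not reachable with REGIONS


print(route(0x0000, boot_rom_mapped=True))  # ('boot ROM', 'bram')
print(route(0x0150))                        # ('cartridge ROM', 'sdram')
```

In this sketch the timing-critical regions read by the PPU stay in block RAM, while only the large, CPU-only regions pay the SDRAM latency.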