    Add FPGA Optimized Register File Version · c69ebadc
    ganoam authored
    
    Add a register file, optimized for synthesis on FPGAs supporting
    distributed RAM.
    
    Principle:
    
    The baseline implementation builds the register file from an array of
    flip-flops with large multiplexers for read and write accesses. FPGAs
    offer a more efficient way to store data: distributed RAM can hold up
    to 64 bits in a single LUT (depending on the memory layout and FPGA
    device) and comes with integrated address decoders. The register file
    instantiates one distributed RAM block per implemented synchronous
    write port, each with the parametrized number of asynchronous read
    ports. Read access is arbitrated depending on which block was last
    written to. For this purpose, an additional array of *NUM_WORDS*
    registers keeps track of write accesses.
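
The arbitration scheme described above can be sketched as a small behavioral model. This is an illustration only, not the RTL added by this commit: the class name `DistRamRegFile`, the method names, and the rule that the higher-numbered port wins when two ports write the same address in one cycle are all assumptions made for the sketch.

```python
class DistRamRegFile:
    """Behavioral sketch: one RAM bank per write port, plus a per-word
    record of which bank was written last (hypothetical model)."""

    def __init__(self, num_words=32, num_write_ports=2):
        self.num_words = num_words
        # One storage bank per synchronous write port.
        self.banks = [[0] * num_words for _ in range(num_write_ports)]
        # For each word, the index of the bank holding the newest value.
        # This plays the role of the NUM_WORDS tracking registers.
        self.last_written = [0] * num_words

    def clock(self, writes):
        """Apply one clock edge. `writes` maps port index -> (addr, data).

        Assumption: if two ports write the same address in the same
        cycle, the higher-numbered port wins.
        """
        for port in sorted(writes):
            addr, data = writes[port]
            self.banks[port][addr] = data
            self.last_written[addr] = port

    def read(self, addr):
        """Asynchronous read: select the bank that was written last."""
        return self.banks[self.last_written[addr]][addr]


rf = DistRamRegFile(num_words=32, num_write_ports=2)
rf.clock({0: (5, 111)})   # port 0 writes word 5
rf.clock({1: (5, 222)})   # port 1 overwrites word 5 in its own bank
print(rf.read(5))         # the read follows the most recent writer
```

Note that the stale value stays in the other bank; only the tracking entry decides which bank a read returns, so no wide write multiplexer is needed in front of the storage.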
    
    Since both FFs and multiplexers are expensive structures on FPGA
    technology, the achieved savings are considerable. The register file
    is used for both the FPU and the general-purpose register files.
    
    Concrete Savings: (Xilinx Kintex-7, xc7k325tffg900-2)
    
    ```
                LUT    FF      LUTRAM
    ---------------------------------
    baseline:   40499  22799   0
    optimized:  36350  18806   440
    ---------------------------------
    Diff        -4149  -3993   +440
                -10.2% -17.5%
    ```
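
The differences and percentages in the table can be reproduced with a few lines of arithmetic. The sketch below uses only the counts quoted above; the helper name `savings` is hypothetical, and no percentage is computed for LUTRAM because its baseline is zero.

```python
# Resource counts copied from the table above.
baseline = {"LUT": 40499, "FF": 22799, "LUTRAM": 0}
optimized = {"LUT": 36350, "FF": 18806, "LUTRAM": 440}


def savings(before, after):
    """Return {resource: (absolute diff, percent change or None)}."""
    rows = {}
    for key in before:
        diff = after[key] - before[key]
        # A relative change is undefined when the baseline count is zero.
        pct = round(100.0 * diff / before[key], 1) if before[key] else None
        rows[key] = (diff, pct)
    return rows


report = savings(baseline, optimized)
for resource, (diff, pct) in report.items():
    pct_text = "n/a" if pct is None else f"{pct:+.1f}%"
    print(f"{resource:<7}{diff:+6d}  {pct_text}")
```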
    
    Signed-off-by: ganoam <gnoam@live.com>