



2384-29

#### ICTP Latin-American Advanced Course on FPGA Design for Scientific Instrumentation

19 November - 7 December, 2012

Clock domains - multiple FPGA design

KLUGE Alexander

PH ESE FE Division

CERN

385, rte Meyrin, CH-1211 Geneva 23

SWITZERLAND

# Clock domains – multiple FPGA design

#### Clock distribution: multiple FPGAs



#### **Clock distribution**



#### Clock distribution: t<sub>co</sub> & t<sub>s</sub>, board 0 -> 1






#### Clock distribution: t<sub>co</sub> & t<sub>s</sub>, board 1 -> 0






#### Clock distribution: slow output, board 0 -> 1



#### Clock distribution: fast output, board 0 -> 1






#### Clock distribution: fast output, board 1 -> 0



#### Clock distribution: slow output, board 1 -> 0



#### Constraints

- Fulfilling the FPGA-internal constraints is not sufficient.
- Perform system simulations.
- Logic can be too fast: an output that changes too early can violate the hold time of the receiving FPGA.
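
The point that logic can be too fast can be written as the usual pair of board-to-board timing checks. This is a sketch under assumptions, not taken from the slides: $T_{clk}$ is the clock period, $t_{co}$ the clock-to-output delay of the sending FPGA and $t_s$ the setup time of the receiving FPGA. The symbols $t_h$ (hold time of the receiver), $t_{pcb}$ (trace delay between the boards) and $t_{skew}$ (clock arrival at the receiver minus clock arrival at the sender) are introduced here for illustration.

$$
\begin{aligned}
\text{setup:}\quad & t_{co,\max} + t_{pcb,\max} + t_s \le T_{clk} + t_{skew} \\
\text{hold:}\quad  & t_{co,\min} + t_{pcb,\min} \ge t_h + t_{skew}
\end{aligned}
$$

A slow output threatens the setup check, a fast output the hold check. With this sign convention $t_{skew}$ changes sign when sender and receiver swap roles, so each direction (board 0 -> 1 and board 1 -> 0) has to be checked separately.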



- Data (20 bits) every 100 ns
- Collision -> L0 (1 µs)
- Collision -> L2y or L2n (100 µs)




#### Options:

Data pipeline until L2 with a FIFO based on shift registers @ 10 MHz:

20 bits \* 100 µs / 100 ns = 20 bits \* 1000 = 20 000 bits
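
A shift-register pipeline of this size could look roughly like the following. This is a minimal sketch, not the author's code: the entity name `shift_delay` and the generics `width` and `depth` are hypothetical. With the default generics it holds 1000 samples of 20 bits, which is the 20 000 bits computed above, and it uses one flip-flop per stored bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity and generic names, chosen for illustration only.
entity shift_delay is
  generic (width : positive := 20;     -- bits per sample
           depth : positive := 1000);  -- samples held: 100 us / 100 ns
  port (clk      : in  std_logic;
        enable   : in  std_logic;
        data_in  : in  std_logic_vector(width-1 downto 0);
        data_out : out std_logic_vector(width-1 downto 0));
end shift_delay;

architecture behavioral of shift_delay is
  type stage_array is array (0 to depth-1) of std_logic_vector(width-1 downto 0);
  signal stages : stage_array := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if enable = '1' then
        -- the new sample enters stage 0 ...
        stages(0) <= data_in;
        -- ... and every other stage takes the old value of its predecessor
        for i in 1 to depth-1 loop
          stages(i) <= stages(i-1);
        end loop;
      end if;
    end if;
  end process;

  -- the oldest sample leaves the pipeline 'depth' enabled clock cycles later
  data_out <= stages(depth-1);
end behavioral;
```

Every sample moves on every clock cycle, so all `width * depth` registers toggle continuously. That cost is what motivates the RAM-based alternative.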




Data pipeline with a FIFO based on dual-port RAM @ 10 MHz: 20 bits \* 1000 = 20 000 bits



FPGAs have RAM cells in addition to logic blocks



```
-- fastor_extractor.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fifo_fastor is
  generic (fifo_depth     : integer;
           fifo_ptr_width : integer;
           fifo_width     : integer
          );
  port (reset_i  : in  std_logic;
        clk      : in  std_logic;
        write    : in  std_logic;
        read     : in  std_logic;
        data_in  : in  std_logic_vector (fifo_width-1 downto 0);
        data_out : out std_logic_vector (fifo_width-1 downto 0);
        delay    : in  unsigned (fifo_ptr_width-1 downto 0);
        enable   : in  std_logic
       );
end fifo_fastor;

architecture behavioral of fifo_fastor is
  type mem_array is array (integer range <>) of std_logic_vector(fifo_width - 1 downto 0);
  signal mem : mem_array(0 to (fifo_depth-1)); -- synthesis syn_ramstyle = "BLOCK_RAM"
  attribute syn_ramstyle : string;
  attribute syn_ramstyle of mem : signal is "BLOCK_RAM";
  signal read_pointer  : unsigned (fifo_ptr_width-1 downto 0);
  signal write_pointer : unsigned (fifo_ptr_width-1 downto 0);
begin
process (clk, reset_i)
begin
  -- write port: store data_in, or clear the current cell when write is low
  if (clk'event and clk = '1') then
    if (write = '0') then
      mem(to_integer(write_pointer)) <= (others => '0');
    elsif (enable = '1') then
      mem(to_integer(write_pointer)) <= data_in;
    end if;
  end if;
  -- read port: registered read at read_pointer
  if (clk'event and clk = '1') then
    if (enable = '1') then
      data_out <= mem(to_integer(read_pointer));
    end if;
  end if;
  -- write pointer: synchronous, active-low reset to 0
  if (clk'event and clk = '1') then
    if (reset_i = '0') then
      write_pointer <= (others => '0');
    elsif (write = '1' and enable = '1') then
      write_pointer <= write_pointer + 1;
    end if;
  end if;
  -- read pointer: reset to 'delay', which sets its offset to the write pointer
  if (clk'event and clk = '1') then
    if (reset_i = '0') then
      read_pointer <= delay;
    elsif (read = '1' and enable = '1') then
      read_pointer <= read_pointer + 1;
    end if;
  end if;
end process;
end behavioral;
```

```
-- fifo_fastor, upper half of the listing: entity and declarations
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fifo_fastor is
  generic (fifo_depth     : integer;
           fifo_ptr_width : integer;
           fifo_width     : integer
          );
  port (reset_i  : in  std_logic;
        clk      : in  std_logic;
        write    : in  std_logic;
        read     : in  std_logic;
        data_in  : in  std_logic_vector (fifo_width-1 downto 0);
        data_out : out std_logic_vector (fifo_width-1 downto 0);
        delay    : in  unsigned (fifo_ptr_width-1 downto 0);
        enable   : in  std_logic
       );
end fifo_fastor;

architecture behavioral of fifo_fastor is
  type mem_array is array (integer range <>) of std_logic_vector(fifo_width - 1 downto 0);
  signal mem : mem_array(0 to (fifo_depth-1)); -- synthesis syn_ramstyle = "BLOCK_RAM"
  attribute syn_ramstyle : string;
  attribute syn_ramstyle of mem : signal is "BLOCK_RAM";
  signal read_pointer  : unsigned (fifo_ptr_width-1 downto 0);
  signal write_pointer : unsigned (fifo_ptr_width-1 downto 0);
begin
process (clk, reset_i)
begin
```

```
-- fifo_fastor, lower half of the listing: process body
  signal read_pointer  : unsigned (fifo_ptr_width-1 downto 0);
  signal write_pointer : unsigned (fifo_ptr_width-1 downto 0);
begin
process (clk, reset_i)
begin
  if (clk'event and clk = '1') then
    if (write = '0') then
      mem(to_integer(write_pointer)) <= (others => '0');
    elsif (enable = '1') then
      mem(to_integer(write_pointer)) <= data_in;
    end if;
  end if;
  if (clk'event and clk = '1') then
    if (enable = '1') then
      data_out <= mem(to_integer(read_pointer));
    end if;
  end if;
  if (clk'event and clk = '1') then
    if (reset_i = '0') then
      write_pointer <= (others => '0');
    elsif (write = '1' and enable = '1') then
      write_pointer <= write_pointer + 1;
    end if;
  end if;
  if (clk'event and clk = '1') then
    if (reset_i = '0') then
      read_pointer <= delay;
    elsif (read = '1' and enable = '1') then
      read_pointer <= read_pointer + 1;
    end if;
  end if;
end process;
end behavioral;
```

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fastor_extractor is
  generic (fifo_depth     : integer := 16;
           fifo_ptr_width : integer := 4;
           fifo_width     : integer := 20
          );
  port (reset_i         : in  std_logic;
        clk             : in  std_logic;
        fastor0         : in  std_logic_vector (9 downto 0);
        fastor1         : in  std_logic_vector (9 downto 0);
        l0              : in  std_logic := '0';
        l2y             : in  std_logic := '0';
        l2n             : in  std_logic := '0';
        delay_l0        : in  unsigned (3 downto 0) := "1111";
        fastor_delayed0 : out std_logic_vector (9 downto 0);
        fastor_delayed1 : out std_logic_vector (9 downto 0);
        enable          : in  std_logic
       );
end fastor_extractor;

architecture behavioral of fastor_extractor is
  component fifo_fastor is
    generic (fifo_depth     : integer;
             fifo_ptr_width : integer;
             fifo_width     : integer
            );
    port (reset_i  : in  std_logic;
          clk      : in  std_logic;
          write    : in  std_logic;
          read     : in  std_logic;
          data_in  : in  std_logic_vector (fifo_width-1 downto 0);
          data_out : out std_logic_vector (fifo_width-1 downto 0);
          delay    : in  unsigned (fifo_ptr_width-1 downto 0);
          enable   : in  std_logic
         );
  end component;
  signal fastor    : std_logic_vector (fifo_width-1 downto 0);
  signal fastor_l0 : std_logic_vector (fifo_width-1 downto 0);
  signal fastor_l2 : std_logic_vector (fifo_width-1 downto 0);
  signal l2yn      : std_logic;
begin
  -- pack both 10-bit inputs into one 20-bit word
  fastor (19 downto 10) <= fastor1;
  fastor (9 downto 0)   <= fastor0;
  l2yn                  <= l2y or l2n;
  fastor_delayed1       <= fastor_l2(19 downto 10);
  fastor_delayed0       <= fastor_l2(9 downto 0);

  -- first FIFO: written and read every clock cycle, delays the data until L0
  fifo_fastor_l0: fifo_fastor generic map(fifo_depth, fifo_ptr_width, fifo_width)
                  port map (reset_i  => reset_i,
                            clk      => clk,
                            write    => '1',
                            read     => '1',
                            data_in  => fastor,
                            data_out => fastor_l0,
                            delay    => delay_l0,
                            enable   => enable
                           );
  -- second FIFO: written on L0, read on L2y or L2n
  fifo_fastor_l2: fifo_fastor generic map(4,2,20)
                  port map (reset_i  => reset_i,
                            clk      => clk,
                            write    => l0,
                            read     => l2yn,
                            data_in  => fastor_l0,
                            data_out => fastor_l2,
                            delay    => (others => '0'),
                            enable   => enable
                           );
end behavioral;
```



[Figure: simulation waveforms between about 365,000 ns and 370,000 ns, followed by a zoomed view from about 365,200 ns to 365,700 ns. Signals in the zoomed view: clk, fastor, l0, l2yn, read_pointer, write_pointer, fastor_l0, fastor_l2. The fastor input changes 00000 -> 55555 -> 00000.]











# System level simulation



[Figure: system-level simulation setup; surviving labels: 6 x 10, 3 x]



- 60 ASICs: simplified behavioral
- 40 ASICs: full behavioral
- 5 FPGAs: full behavioral
- 7 SRAMs: full behavioral
- 4 PCBs

#### What happens if we have speed problems?

- Often because of an inadequate logic architecture or coding style
  - evaluate the logic architecture
  - rewrite the HDL code to adapt the structure for better data throughput
  - insert pipeline stages; one more clock cycle of latency often does not matter
  - understand the specifications
  - look for systematics that can help simplify the logic
  - adapt the architecture and the schematics/code
  - only then optimize placement and routing
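
The pipelining advice can be illustrated with a small example. This is a sketch under assumptions, not code from the course: the entity `pipelined_sum4`, its generic `width` and its ports are hypothetical. A four-operand sum is split into two registered stages, so the longest combinational path between registers is one adder instead of two, at the cost of one extra clock cycle of latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; names and widths are illustrative only.
entity pipelined_sum4 is
  generic (width : positive := 16);
  port (clk        : in  std_logic;
        a, b, c, d : in  unsigned(width-1 downto 0);
        sum        : out unsigned(width+1 downto 0));
end pipelined_sum4;

architecture behavioral of pipelined_sum4 is
  -- intermediate results, one bit wider to hold the carry
  signal sum_ab, sum_cd : unsigned(width downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- stage 1: two independent additions, registered
      sum_ab <= resize(a, width+1) + resize(b, width+1);
      sum_cd <= resize(c, width+1) + resize(d, width+1);
      -- stage 2: add the registered partial sums from the previous cycle
      sum    <= resize(sum_ab, width+2) + resize(sum_cd, width+2);
    end if;
  end process;
end behavioral;
```

The result appears two clock edges after the operands are applied. A new set of operands can still be accepted every cycle, so throughput is unchanged while the achievable clock frequency goes up.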

#### What happens if we have speed problems?

- Often because the component is too small and the routing is congested
  - timing constraints
  - routing constraints, placement constraints
  - use a bigger/faster component

#### Conclusion

- FPGA applications at CERN
  - data selection/trigger (muon track finder trigger)
  - data processing (pixel detector)
- Design cycle
- Defining Specifications
- Clock domains
- Data delay

#### Additional slides

Alexander.kluge@cern.ch

http://akluge.web.cern.ch/akluge