# Data Acquisition in Particle Physics

#### Igor Konorov

Institute for Hadronic Structure and Fundamental Symmetries (E18)

**TUM Department of Physics** 

Technical University of Munich

Advanced Workshop on FPGA based System-on-Chip for Scientific Instrumentation and Reconfigurable Computing ICTP Trieste





### **CERN** Accelerator and Experiments




# **CERN** Accelerators

#### LHC Experiments

- CMS
- ATLAS
- LHCb
- ALICE
- TOTEM, LHCf, MoEDAL

#### **Fixed Target Experiments**

- COMPASS
- NA61/SHINE
- NA62
- DIRAC
- CLOUD
- ...



# LHC CMS Experiment





CMS Experiment at the LHC, CERN. Sun 2011-Aug-07 05:00:32 CET, Run 172822, Event 2554393033, C.O.M. Energy 7.00 TeV, H>ZZ>4mu candidate







### CMS Experiment



#### 3 other LHC experiments

- ATLAS
- LHCb
- ALICE



### **COMPASS** Experiment




# Data Acquisition System

- The process of sampling detector signals
- Conversion to digital form
- Data processing
- Transmission to PC for further processing





TRIGGER – defines the time at which the amplitude value is copied. The data path delay should equal the trigger delay, corrected for time of flight.

# Triggered DAQ



At certain time

- when something interesting happened





#### Take time measurement => TRIGGERED DAQ

#### Photo shooting => TRIGGERED DAQ






# **Trigger-less DAQ**

Take everything

The time when something interesting happens is not easy to define





#### Taking video => TRIGGERLESS DAQ




Why Triggered and not continuous ?

- Feasibility to handle continuous stream
- No need to collect all data



TRIGGER – defines the time at which a value is measured.

The data path delay should equal the trigger delay, corrected for time of flight.

How much data should the system be able to take?

Probability mass function of the Poisson distribution:

$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

where $\lambda$ is the average number of events and $k$ is the number of events that occurred.

The system should be able to accept triggers as often as the maximum trigger frequency.
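To get a feeling for the Poisson formula, the short sketch below evaluates $P(k)$ for a chosen time window. The trigger rate and window length are assumed illustration values, not numbers from any experiment, and the helper names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(k) = lam**k * exp(-lam) / k! for a mean of lam events."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def prob_more_than(k_max: int, lam: float) -> float:
    """Probability of observing more than k_max events."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_max + 1))


# Assumed example: 100 kHz mean trigger rate, 10 us observation window.
rate_hz = 100e3
window_s = 10e-6
lam = rate_hz * window_s  # mean number of triggers per window (about 1)

for k in range(4):
    print(f"P({k}) = {poisson_pmf(k, lam):.4f}")
print(f"P(k > 3) = {prob_more_than(3, lam):.4f}")
```

Even when the mean is one trigger per window, windows with two or more triggers are not rare, which is why the system has to cope with short bursts and not only with the average rate.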




### **DAQ Architecture in Particle Physics**








# Efficiency of data taking

Readout sequence: Trigger -> Busy -> Read Out -> Release Busy (ready for next event). The time from trigger to release of busy is $T_{busy}$.

The arrival of triggers is described by the Poisson distribution:

$$q_j(t) = \frac{r(rt)^{j-1} e^{-rt}}{(j-1)!}$$

where $j$ is the number of triggers and $r$ is the trigger rate.
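The formula for $q_j(t)$ has the form of the Erlang density for the arrival time of the $j$-th trigger. The sketch below evaluates it and, as an added assumption not stated on the slide, the standard result for a readout without buffering that is blocked for a fixed $T_{busy}$ after each accepted trigger: the accepted fraction is $1/(1 + r\,T_{busy})$. Function names are hypothetical.

```python
import math


def q(j: int, r: float, t: float) -> float:
    """Density of the arrival time t of the j-th trigger at trigger rate r."""
    return r * (r * t) ** (j - 1) * math.exp(-r * t) / math.factorial(j - 1)


def efficiency_unbuffered(r: float, t_busy: float) -> float:
    """Accepted fraction of triggers when each accepted trigger blocks
    the readout for t_busy and nothing is buffered (assumed model)."""
    return 1.0 / (1.0 + r * t_busy)


# Assumed example: 10 kHz trigger rate, 100 us busy time per event.
print(efficiency_unbuffered(10e3, 100e-6))
```

In this model a busy time equal to the mean time between triggers already costs half of the events.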





# **Pipe Line Front Ends**
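A pipelined front end can be pictured as a circular buffer that is written on every clock cycle, so that the data are still available when the trigger decision arrives some clock cycles later. The following is a minimal software sketch of that idea; the class name, buffer depth and latency are assumptions chosen for illustration.

```python
class PipelineBuffer:
    """Circular buffer holding the last `depth` samples, one per clock."""

    def __init__(self, depth: int):
        self.depth = depth
        self.mem = [0] * depth
        self.wr = 0  # position of the next write

    def clock(self, sample: int) -> None:
        """Store one sample per clock cycle, overwriting the oldest one."""
        self.mem[self.wr] = sample
        self.wr = (self.wr + 1) % self.depth

    def read_at_latency(self, latency: int) -> int:
        """Return the sample written `latency` clocks before the latest one."""
        if not 0 <= latency < self.depth:
            raise ValueError("latency must be smaller than the buffer depth")
        return self.mem[(self.wr - 1 - latency) % self.depth]


# Each sample carries its own clock number, so the result is easy to check.
buf = PipelineBuffer(depth=16)
for t in range(40):
    buf.clock(t)
print(buf.read_at_latency(10))  # sample taken 10 clocks before the latest
```

The buffer depth has to exceed the trigger latency expressed in clock cycles, otherwise the sample of interest is overwritten before the trigger arrives.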






# DAQ efficiency vs FIFO Depth
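The dependence of efficiency on FIFO depth can be explored with a small Monte Carlo sketch. The model is an assumption made for illustration: triggers arrive randomly at a mean rate, every event needs a fixed readout time, and a trigger is lost when the FIFO already holds `depth` events (including the one being read out).

```python
import random
from collections import deque


def fifo_efficiency(rate, t_read, depth, n_triggers=100_000, seed=1):
    """Fraction of accepted triggers for a derandomizing FIFO (assumed model)."""
    rng = random.Random(seed)
    t = 0.0
    done_times = deque()  # readout completion times of events still buffered
    accepted = 0
    for _ in range(n_triggers):
        t += rng.expovariate(rate)            # next random trigger time
        while done_times and done_times[0] <= t:
            done_times.popleft()              # these events have been read out
        if len(done_times) < depth:           # free slot: accept the trigger
            start = done_times[-1] if done_times else t
            done_times.append(start + t_read)
            accepted += 1
    return accepted / n_triggers


# Assumed example: mean trigger spacing equal to the readout time.
for depth in (1, 2, 4, 8):
    print(depth, round(fifo_efficiency(1.0, 1.0, depth), 3))
```

With a depth of 1 the model reduces to the unbuffered case; each additional FIFO slot absorbs more of the random fluctuations in trigger spacing.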





# **Data Flow DAQ Architecture**



# **DAQ** Architectures

[Figure: block diagrams of DAQ architectures built from Trigger Logic, PCs and Data Storage/CDR]

# **DAQ Elements**

- Front-end electronics, detector specific
  - Conversion of detector analog signal to digital form
  - Derandomization
  - Data processing: signal detection, extraction of signal parameters, Time and/or Amp...
- Trigger Logic
  - reduce the amount of stored data
  - define the time of an interesting event
- Trigger Distribution System => Time Distribution System
- Slow Control System
  - Control and monitoring of PS, gas system, temperature, humidity, ...
  - Programming of front-ends
- Acquisition System => Event builder
  - Data acquisition: moving data from FE to PCs
  - Data flow control
  - Real-time software
  - Run control




#### **Time Distribution and Time Measurement**

### Time measurement



Classical method:

- TRIGGER is a reference
- SIGNAL time is measured relative to the TRIGGER signal

#### Alternative method for big experiments:

- Distribute CLOCK. Why clock?
  - Easier to distribute with very low jitter
- Measure absolute time relative to CLOCK phase

$$T_{sig} = N_s T_{clk} + t_{sig}, \qquad T_{trg} = N_t T_{clk} + t_{trg}$$
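As a small worked example of the clock-based method, the sketch below builds the absolute times from a coarse clock count and a fine time within the clock period, then subtracts the trigger time from the signal time. The 25 ns clock period and all numbers are assumed illustration values; integer picoseconds are used to avoid rounding.

```python
T_CLK_PS = 25_000  # assumed clock period in ps (illustration only)


def absolute_time_ps(n_clk: int, fine_ps: int, t_clk_ps: int = T_CLK_PS) -> int:
    """T = N * T_clk + t, with N the clock count and t the fine time."""
    return n_clk * t_clk_ps + fine_ps


def signal_minus_trigger_ps(n_s, t_sig, n_t, t_trg, t_clk_ps=T_CLK_PS) -> int:
    """Signal time relative to the trigger, both measured against the clock."""
    return (absolute_time_ps(n_s, t_sig, t_clk_ps)
            - absolute_time_ps(n_t, t_trg, t_clk_ps))


# Signal in clock cycle 1002 at +3.1 ns, trigger in cycle 1000 at +24.0 ns.
print(signal_minus_trigger_ps(1002, 3_100, 1000, 24_000))
```

Because both times refer to the same distributed clock, the difference is independent of where in the experiment each of them was measured.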

Clock and Data are encoded and transmitted from single source to multiple destinations

#### NA48, LHC->TTC, COMPASS->TCS





# **TDC** types



- Time stretching:
  - Time measurement between START and STOP
  - Fast charging of capacitor with reference current => slow discharging
- Time to amplitude converters:
  - charging capacitor with reference current => ADC to measure amplitude
- ASIC TDC: Delay Locked Loop based TDC
- FPGA Counter as a simple TDC
- FPGA TDC

# Counter as TDC
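A counter clocked at a fixed frequency is the simplest TDC: the time stamp of a hit is the counter value at the moment the hit is latched. The sketch below is a software model of this quantization; the 400 MHz clock is an assumed example value. The RMS error of an ideal uniform bin is the bin width divided by $\sqrt{12}$.

```python
import math


def bin_width_ns(f_clk_mhz: float) -> float:
    """Width of one counter bin, equal to the clock period."""
    return 1e3 / f_clk_mhz


def counter_tdc(t_hit_ns: float, f_clk_mhz: float) -> int:
    """Counter value latched for a hit at time t_hit_ns (counter starts at 0)."""
    return math.floor(t_hit_ns / bin_width_ns(f_clk_mhz))


def rms_resolution_ns(f_clk_mhz: float) -> float:
    """RMS quantization error of an ideal counter TDC."""
    return bin_width_ns(f_clk_mhz) / math.sqrt(12.0)


# Assumed example: 400 MHz counter clock, hit at 12.6 ns.
print(counter_tdc(12.6, 400.0), round(rms_resolution_ns(400.0), 3))
```

The resolution of this approach is limited by the highest clock frequency the device can run at, which motivates the finer interpolation techniques.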








# Delay Locked Loop TDC, HPTDC





Transistor parameters vary from chip to chip and even within one chip

DLL compensates variation of transistor parameters from chip to chip and due to voltage and temperature

- Resolution 25 ps
- 32 channels/chip

#### Wu, Jinyuan, Fermilab

# FPGA based TDC1





# Tapped Delay TDCs



Jinyuan Wu and Zonghan Shi



#### Fermilab

# Tapped TDC





### Time Resolution is 0 ps using Virtex 4 chip



# FPGA based TDC, next step

#### Jinyuan Wu and Zonghan Shi Fermilab

- ALTERA Cyclone II EP2C8T144C6
- Tap delay: 60 ps
- Ultra-wide bin: 165 ps
- Main clock: 400 MHz





# FPGA based TDC next step



Wavelet launcher:

- Input pulse unleashes a bit pattern
- Multiple measurements








# **Tapped TDC Resource Utilization**



Device utilization summary and power consumption.

|                           |                   | 1024-Unit TDC |             | 512-Unit TDC                  |             |
|---------------------------|-------------------|---------------|-------------|-------------------------------|-------------|
|                           | Available         | Used          | Utilization | Used                          | Utilization |
| Slice Registers           | 69,120            | 1410          | 2%          | 602                           | 0.9%        |
| Slice LUTs                | 69,120            | 666           | 1%          | 327                           | 0.5%        |
| Occupied Slices           | 17,280            | 1265          | 7%          | 652                           | 3.7%        |
| Bonded IOBs               | 640               | 25            | 3%          | 25                            | 3%          |
| Block RAM/FIFO            | 148               | 2             | 1%          | 2                             | 1%          |
| Clock Resources           | 32                | 4             | 12%         | 2                             | 6%          |
| Number of routed lines    |                   | 13,127        |             | 4937                          |             |
| Dynamic Power Consumption |                   | 23 mW         |             | 9 mW                          |             |
| Total Power Consumption   |                   | 1.113 W       |             | 1.087 W                       |             |



## Problems of Tapped FPGA TDC

- Detailed analysis of FPGA circuit layout
- Advanced usage of placement constraints
- Variation of bin sizes due to differences in propagation delays through logic elements
- Consumes quite a lot of FPGA fabric resources
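Unequal bin sizes are commonly handled with a code-density calibration, a technique that is not described on these slides and is shown here only as an illustration. Hits that are uncorrelated with the clock populate each bin in proportion to its width, so a histogram of bin occupancies gives the width of every bin. The function name and the numbers are hypothetical.

```python
def calibrate_bins(hit_counts, t_clk_ps):
    """Estimate bin widths and bin centres from a code-density histogram.

    hit_counts[i] is the number of random hits recorded in delay-line bin i;
    all bins together are assumed to cover exactly one clock period.
    """
    total = sum(hit_counts)
    widths = [t_clk_ps * c / total for c in hit_counts]
    centers = []
    edge = 0.0
    for w in widths:
        centers.append(edge + w / 2)  # time assigned to a hit in this bin
        edge += w
    return widths, centers


# Assumed example: 4 bins covering a 1000 ps clock period.
widths, centers = calibrate_bins([100, 300, 200, 400], 1000)
print(widths)
print(centers)
```

A hit is then assigned the centre of its calibrated bin instead of a nominal, equally spaced value.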

References:

1. Jinyuan Wu et al. The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay. IEEE Nuclear Science Symposium Conference Record, November 2008. DOI: 10.1109/NSSMIC.2008.4775079
2. Min Zhang et al. A 7.4 ps FPGA-based TDC with a 1024-unit measurement matrix. Sensors 17(4):865, April 2017. DOI: 10.3390/s17040865

## **DeSerializer as TDC**







Features:

- LVDS input
- 64 taps for delay adjusting, one tap 7
- Delay controlled by Reference Clock





# Clustering

### **DEPFET PXD Detector for Belle2 Experiment**



### **Detector Readout**



### Simple requirements:

Merging direct neighbors :



- Pixel array 768x250
- Real time processing
  - 4 streams × 50·10^6 pixel/s = 2·10^8 pixel/s
- Latency is not important
- Cluster data processing :
  - Center of gravity for ROI
  - Marking clusters created by low momentum particles
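The center of gravity of a cluster is the amplitude-weighted mean of its pixel coordinates. A minimal sketch, assuming each hit is given as a hypothetical `(row, column, amplitude)` tuple:

```python
def center_of_gravity(hits):
    """Amplitude-weighted mean position of a cluster.

    hits: list of (row, col, amplitude) tuples belonging to one cluster.
    Returns (row_cog, col_cog) as floats.
    """
    total = sum(a for _, _, a in hits)
    if total == 0:
        raise ValueError("cluster has zero total amplitude")
    row = sum(r * a for r, _, a in hits) / total
    col = sum(c * a for _, c, a in hits) / total
    return row, col


# Two neighbouring pixels in row 10; the second carries three times the charge.
print(center_of_gravity([(10, 5, 1), (10, 6, 3)]))
```

The result lies between the pixels, closer to the one with the larger amplitude, which gives a position estimate finer than the pixel pitch.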



- Most common case: clustering for calorimeters with predefined shape 2x2, 3x3, 5x5...
- General purpose real time DCE3 clustering algorithm
  - Parallel clustering
- General purpose clustering (A. Annovi, M. Beretta) for ATLAS pixel detector

Algorithm:

- Each detector pixel is represented by an FSM (Finite State Machine)
- A detector of NxM pixels requires NxM FSMs

Clustering procedure:

- Initialization: loading the FSMs with hit information (EMPTY, HIT)
- Readout:
  - an external FSM selects the first non-empty pixel and reads it
  - the SELECT signal propagates to neighboring FSMs for further readout
  - this procedure is repeated until all neighboring pixels are read out
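In software, the propagation of SELECT from pixel to pixel behaves like a flood fill over the hit map. The sketch below models that readout order; it is an analogy and not the hardware design, and the choice of 8-connected neighbors is an assumption.

```python
from collections import deque


def read_out_clusters(hits, n_rows, n_cols):
    """Group hits into clusters by propagating a 'select' to neighbours.

    hits: set of (row, col) pixels in state HIT. Returns a list of clusters,
    each a list of pixels in the order they were read out.
    """
    # One state per pixel: True = HIT, False = EMPTY (or already read out).
    state = [[(r, c) in hits for c in range(n_cols)] for r in range(n_rows)]
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not state[r0][c0]:
                continue
            # The external FSM selects the first non-empty pixel.
            state[r0][c0] = False
            queue = deque([(r0, c0)])
            cluster = []
            while queue:
                r, c = queue.popleft()
                cluster.append((r, c))
                # SELECT propagates to all neighbours that hold a hit.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n_rows and 0 <= cc < n_cols
                                and state[rr][cc]):
                            state[rr][cc] = False
                            queue.append((rr, cc))
            clusters.append(cluster)
    return clusters


print(read_out_clusters({(0, 0), (0, 1), (1, 1), (3, 3)}, 4, 4))
```

The per-pixel state array plays the role of the NxM FSMs, which makes the linear growth of hardware with the number of pixels easy to see.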



Problem: the amount of hardware scales linearly with the number of pixels and quickly uses up all FPGA resources.

Table 1: Algorithm performance on an xc5vlx330 FPGA.

| Grid size | Clock period | Area usage |
|-----------|--------------|------------|
| 8x8       | 6 ns         | 1%         |
| 120x8     | 13 ns        | 5%         |
| 32x32     | 13 ns        | 6%         |
| 64x32     | 16 ns        | 11%        |
| 256x8     | 15 ns        | 11%        |
| 328x8     | 17 ns        | 16%        |
| 120x32    | 20 ns        | 21%        |

#### Solution to the problem: "Sliding Window". The window is bigger than any cluster

- FPGA: XC5VLX155
- Window: 328x8
- 30% of FPGA resources
- Speed: >20 Mhits/s

## New clustering algorithm for FPGA

#### Algorithm takes advantage of detector readout feature:

- sequential data transmission
- limited data rate: not more than 4 x 76 MPix/s
- Ordered hits readout sequence, almost row wise:
  data mixed within 4 consecutive rows
- no latency requirements

#### Clustering algorithm features:

- Hit information is analyzed once and cluster number assigned
- Following processing steps shuffle hits using cluster number information
- Clustering algorithm reconstructs any cluster shape within half ladder
- Pipelined design for real-time operation



### **Clustering FSMs**





#### What do **FSMs** do?

- Each FSM is responsible for the hits of two columns
- Processes one hit in one clock cycle
- Evaluates the hit cluster number
- Writes the hit together with its cluster number to hit memory
- Stores the cluster number in cluster memory
- When two clusters touch each other, the lowest cluster number is taken over

### FSM behavior is described for all cases

1. FSM is not active, hit arrives
2. FSM is active, no hit
3. FSM is active and new hit arrives
4. FSM is active, there was no hit belonging to any of these two rows within this column and the current hit is the first belonging to a new column
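The bookkeeping with a next-cluster counter and a cluster memory resembles one-pass connected-component labeling. The sketch below is a software analogy only: it does not model the per-column FSMs, and it assumes strictly row-wise ordered hits and 8-connected neighbors. All names are hypothetical.

```python
def label_hits(hits):
    """Assign cluster numbers to hits in a single pass.

    hits: iterable of (row, col). Returns (labels, memory), where labels maps
    each hit to its final cluster number and memory is the cluster memory
    (address = cluster number, value = the number it was merged into).
    """
    labels = {}
    memory = {}
    next_cluster = 1  # the "next cluster counter"

    def root(n):
        # Follow the cluster memory down to the lowest merged number.
        while memory[n] != n:
            n = memory[n]
        return n

    for r, c in sorted(hits):  # row-wise order
        # Neighbours that were already processed in this ordering.
        neighbours = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
        found = sorted({root(labels[p]) for p in neighbours if p in labels})
        if not found:
            n = next_cluster          # new cluster
            memory[n] = n
            next_cluster += 1
        else:
            n = found[0]              # lowest number is taken over
            for other in found[1:]:
                memory[other] = n     # touching clusters are merged
        labels[(r, c)] = n

    # During readout the numbers are updated using the cluster memory.
    return {p: root(n) for p, n in labels.items()}, memory


# A U-shaped cluster starts as two clusters and is merged in the second row.
labels, memory = label_hits([(0, 0), (0, 2), (1, 0), (1, 1), (1, 2), (5, 5)])
print(labels)
print(memory)
```

Each hit is looked at once; merges are recorded only in the cluster memory and applied when the hits are read out.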


## **Examples of FSMs actions 1**



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | -     |
| 2     | -     |
| 3     | -     |
| 4     | -     |
| 5     | -     |


### **Examples of FSMs actions 1**



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | -     |
| 3     | -     |
| 4     | -     |
| 5     | -     |


## **Examples of FSMs actions 1**



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | -     |
| 4     | -     |
| 5     | -     |



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | -     |
| 5     | -     |


## **Examples of FSMs actions 1**



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | -     |
| 5     | -     |


## **Examples of FSMs actions 1**



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | 4     |
| 5     | -     |



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | 4     |
| 5     | 5     |


### **Examples of FSMs actions 1**



#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | 4     |
| 5     | 5     |

#### Next cluster counter




#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | 4     |
| 5     | 4     |



Cluster numbers are updated using cluster memory during hit readout.





#### Next cluster counter



#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 2     |
| 3     | 3     |
| 4     | 4     |
| 5     | 5     |



### **Extreme case**





#### Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | -     |
| 2     | -     |
| 3     | -     |
| 4     | -     |
| 5     | -     |
| 6     | -     |
| 7     | -     |
| 8     | -     |
| 9     | -     |





Cluster memory content has to be updated to map all cluster numbers to smallest number



Cluster memory

| Addr. | Value |
|-------|-------|
| 1     | 1     |
| 2     | 1     |
| 3     | 2     |
| 4     | 3     |
| 5     | 4     |
| 6     | 5     |
| 7     | 6     |
| 8     | 7     |
| 9     | 8     |



The update runs from cluster #3 up to the maximum cluster number. The maximum look-down length is 1 cluster.



One update takes 3 clock cycles. Using dual-port (DP) memory makes it possible to reach 2 clock cycles per update.
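The update of the cluster memory can be sketched as a single ascending pass in which every entry is replaced by the entry it points to. This relies on the assumption, taken from the rule that the lowest cluster number is taken over, that an entry never points to a higher number, so the entry it points to has already been resolved.

```python
def update_cluster_memory(memory):
    """Map every cluster number to the smallest number of its merged group.

    memory: list where memory[n] is the number cluster n was merged into
    (index 0 is unused). Assumes memory[n] <= n for all n.
    """
    # Clusters 1 and 2 can only point to 1 or to themselves, so start at 3.
    for n in range(3, len(memory)):
        memory[n] = memory[memory[n]]  # one look-down step is sufficient
    return memory


# Extreme case: every cluster was merged into its predecessor.
print(update_cluster_memory([0, 1, 1, 2, 3, 4, 5, 6, 7, 8]))
```

After the pass, all nine clusters of the extreme case map to cluster 1.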

## Broadcast Cluster Number change (1)



## Broadcast Cluster Number change (2)



## Broadcast Cluster Number change (3)



## Broadcast Cluster Number change (4)



## Broadcast Cluster Number change (5)



## Broadcast Cluster Number change (6)





### Clustering FSMs 64 columns and 4k clusters:

- 7% of Slices of XC6VLX130T
- 1% of Memory blocks (6 out of 264)
- 100 Mhits/s clock

### Clustering FSMs 128 columns and 4k clusters:

- 13% of Slices
- 1% of Memory
- 80 Mhits/s

### Clustering for 250 columns and 8k clusters:

- 30% of slices
- 30% memory blocks
- 100 Mhits/s



#### Compare hardware and software results

[Figure: pixel hit maps with assigned cluster numbers, one labeled "Software:" and one labeled "Hardware:"]

#### Test of algorithm





## **Clustering FSMs**




