#### INTERNATIONAL ATOMIC ENERGY AGENCY
United Nations Educational, Scientific and Cultural Organization
INTERNATIONAL CENTRE FOR THEORETICAL PHYSICS
I.C.T.P., P.O. BOX 586, 34100 TRIESTE, ITALY, CABLE: CENTRATOM TRIESTE

#### UNITED NATIONS INDUSTRIAL DEVELOPMENT ORGANIZATION

#### INTERNATIONAL CENTRE FOR SCIENCE AND HIGH TECHNOLOGY
c/o INTERNATIONAL CENTRE FOR THEORETICAL PHYSICS, 34100 TRIESTE (ITALY), VIA GRIGNANO, 9 (ADRIATICO PALACE), P.O. BOX 586, TELEX 40449

SMR/474 - 22

#### COLLEGE ON "THE DESIGN OF REAL-TIME CONTROL SYSTEMS"
1 - 26 October 1990

#### DIGITAL SIGNAL PROCESSING

D. CROSETTO
C.E.R.N., SPS Division
CH-1211 Geneva 23
Switzerland

These are preliminary lecture notes, intended only for distribution to participants.

### **DIGITAL SIGNAL PROCESSORS**

- Digital Signal Processors are special purpose microprocessors optimized for the execution of digital signal processing algorithms. They are traditionally designed for performance, not for extensive functionality or programmer convenience.

#### 1. Historical evolution of DSP.

| Year | Processor | Notes |
|------|-----------|-------|
| 1978 | AMI S2811 | 1st DSP |
| 1979 | Intel 2920/21 | telecommunications |
| 1979 | Bell Labs DSP1 | not marketed |
| 1980 | NEC µPD7720 | |
| 1980 | Analog Devices ADSP-2100 | |
| 1981 | Hermes | not marketed |
| 1982 | Hitachi 61810 | |
| 1982 | Texas TMS32010 | |

The last line (Texas TMS32010) marked a great widening of the range of applications, because the chip could be reprogrammed externally, which is ideal for low-volume applications.
In the beginning most DSPs could be distinguished from other microprocessors by the following characteristics:

- 1) Harvard architecture (separate Program and Data memories)
- 2) small internal Program and Data memory areas
- 3) small instruction sets, with most instructions executable in one cycle (in this respect similar to RISC)
- 4) special hardware units for processing digital signals (such as a parallel multiplier, a barrel shifter, and auxiliary registers for single-cycle manipulation of data tables)

*Figure: Characteristics of different chips.*

#### 2. Basic elements of a real-time processor.

- ALU (one or more, for addresses and data)
- (Floating Point Unit)
- Control Unit
- Program RAM or ROM
- Data RAM
- Parallel I/O Controller (DMA)
- Serial I/O Controller (DMA)
- (A/D and D/A Converters)
- Interrupts

- After comparing performance with other components, it will become clearer why the DSP is better suited to several types of applications.
- There are several ways to combine the basic elements listed above into a concurrent system; each gives a different throughput and favors one aspect over another.
- The most important step in selecting a component for a given application is to know the characteristics of all the components that can solve the same problem, in order to make a balanced judgement.

#### 2.1. Characteristics of DSPs.

- In recent years the characteristics of DSPs have been improving very rapidly:
  - none of the features of the past has been dropped (hardware multiplier, special instructions, etc.);
  - in addition, today's DSPs use extensive pipelining, several independent memories with large address capability, parallel function units (single-cycle floating-point instructions), and a hardwired (not microprogrammed) design.
- Applications in this field are growing so rapidly that a classification of DSP hardware is now necessary (section 4).
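Item 4 of the list of early DSP characteristics above (parallel multiplier, auxiliary registers that step through data tables) is exactly what an FIR filter needs: each output sample is a sum of products of coefficients and delayed inputs, one multiply-accumulate per tap. The following is a minimal Python sketch of that inner loop, not code for any particular DSP. It assumes a circular delay line indexed modulo the number of taps, standing in for auxiliary-register table addressing; the class `FIRFilter` and its methods are hypothetical names chosen for illustration.

```python
class FIRFilter:
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n-k] with a circular delay line (sketch)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)      # h[0] .. h[N-1]
        if not self.coeffs:
            raise ValueError("at least one coefficient is required")
        self.n = len(self.coeffs)
        self.delay = [0.0] * self.n     # circular buffer holding the last N inputs
        self.pos = 0                    # slot that receives the newest sample

    def step(self, x):
        """Push one input sample and return one output sample."""
        self.delay[self.pos] = x
        acc = 0.0
        idx = self.pos                  # start at x[n], then walk back in time
        for k in range(self.n):
            # one multiply-accumulate per tap: acc <- h[k] * x[n-k] + acc
            acc += self.coeffs[k] * self.delay[idx]
            # modulo address update, like an auxiliary register wrapping around a table
            idx = (idx - 1) % self.n
        self.pos = (self.pos + 1) % self.n
        return acc

    def process(self, samples):
        """Filter a whole sequence, one sample at a time."""
        return [self.step(x) for x in samples]
```

For an N-tap filter the inner loop costs N multiply-accumulates per output sample. On a DSP that issues one MAC per cycle with pointer updates in parallel, the filter time is therefore roughly N cycles plus loop overhead.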
- Among the many DSPs on the market today, I will describe one from the "General Purpose DSP" family. This does not imply any preference: since I have used Texas and AT&T general purpose DSPs in the past, I will take the Motorola DSP96000 as an example.
- The best DSP for your application will certainly be the one with the best performance/price ratio.
- Some of the characteristics that make DSPs particularly suitable for processing discrete signals are found in their instruction sets. Several DSPs available today:
  - have hardware "DO LOOP" instructions;
  - have bit-reverse addressing;
  - can perform a simple operation y = ax + b in one cycle (75 nsec) while at the same time operating on addresses by updating pointers.

For example, the single line of Motorola DSP96000 assembly code `FMPY D9,D7,D1 FADDSUB.S D5,D2 D4.S,X:(R5)+ Y:(R1)+,D7.S` (ideal for the butterfly of an FFT), executed in one instruction cycle, produces the result of a multiplication and of an addition and a subtraction between two terms, and updates the pointers to the data in memory.

*Figure: Simplified block diagram of a General Purpose DSP.*

#### 2.2. MICROCONTROLLER versus DSP.

- A "microcontroller" contains all the necessary components of a complete system on one piece of silicon (e.g. Intel 8051, Motorola MC6804, MC6805, MC68HC11, etc.). Compared with a DSP, the microcontroller:
  - has lower performance and a 4-, 8- or 16-bit word;
  - has an instruction set more like that of a CISC processor (using more than one cycle per instruction);
  - has some extra programmable peripherals on chip, such as A/D converters, which are not available on DSPs;
  - is not designed for building concurrent systems;
  - is intended for economical applications in embedded systems where only the capability of one of the most common 8-bit or 16-bit microprocessor instruction sets is needed.

#### **Applications:**

- industrial control, device controllers (printers, plotters, etc.)
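Returning to the DSP96000 example of section 2.1: bit-reverse addressing and the butterfly (one multiplication, then an addition and a subtraction) are the two ingredients of a radix-2 FFT. The following Python sketch is a plain iterative decimation-in-time Cooley-Tukey FFT, not the DSP96000 code itself. It first reorders the input in bit-reversed order, then applies log2(N) stages of butterflies. The function names `bit_reverse` and `fft_radix2` are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i (what bit-reverse addressing does in hardware)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # 1) reorder the input in bit-reversed order
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    # 2) log2(n) stages of butterflies
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)   # twiddle factor increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                top = a[start + k]
                t = w * a[start + k + half]          # multiply
                a[start + k] = top + t               # add
                a[start + k + half] = top - t        # subtract
                w *= w_step
        size *= 2
    return a
```

In software the permutation is a separate pass. On a DSP with bit-reverse addressing, the reordered addresses can be generated during the data moves. The multiply/add/subtract of each butterfly corresponds to the arithmetic part of the single DSP96000 instruction line quoted in section 2.1.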
#### 2.3. TRANSPUTER versus DSP.

- A Transputer contains in a single chip:
  - an integer processor
  - a Floating Point Unit
  - 4 Kbyte of memory
  - 4 high-speed serial links (20 Mbit/sec)
- The Transputer is designed as a programmable component for building systems with a much higher degree of concurrency than is currently common. The formal rules of Occam provide the design methodology for this family of concurrent systems. Special instructions divide the processor time between the concurrent processes and perform interprocessor communication.

With the Transputer it is easier to build concurrent systems because of the close coordination between hardware and software (Occam), and it is easy to port software between concurrent systems with different numbers of Transputers. DSPs have a performance of 20 to 40 Mflops, whereas the T800 Transputer has 4.5 Mflops.

#### 2.4. RISC versus DSP.

- The initial simple concepts of a register-intensive CPU design come from Seymour Cray in 1960 for the CDC 6600.
- The modern notion of RISC architectures emerged from John Cocke's project at IBM in the 1970s. The goal of Cocke's team was to design the best CPU architecture for an optimizing compiler:
  - 1) the machine should be register-to-register, with only load and store instructions accessing memory;
  - 2) the architecture eliminated microcode and microsequencers in favor of a simple, hardwired, pipelined, one-instruction-per-cycle CPU design.
- RISC technology created an almost insatiable demand for memory speed:
  - 1) the answer to this problem came with a high-performance memory hierarchy, including general purpose registers and cache memories;
  - 2) the instruction set is regular and simple, with few addressing modes (indexed and PC-relative).
- Variations on this common theme:
- IBM, 1975: the 801 minicomputer
- BERKELEY, 1980: RISC I and RISC II
- STANFORD, 1981: MIPS (Microprocessor without Interlocked Pipeline Stages)

IBM and Stanford pushed the state of the art in compiler technology to maximize the use of registers. The BERKELEY team did not include compiler experts, so a hardware solution was implemented to keep operands in registers: to reduce the cost of procedure calls they defined many sets, or windows, of registers (global and local), so that registers do not have to be saved on every procedure call. The disadvantage of register windows is that they use more chip area.

- The Clipper designers recognized the growing demand for memory bandwidth; their solution was to separate the instruction and data buses.
- The MC88000 follows the dogma of simple, one-cycle, fixed-length instructions and a load/store architecture. Its system architecture can incorporate new, specialized execution units.

*Table: Comparison of RISC processors (IBM RISC, CLIPPER, 29000, R3000, SPARC): type, instructions per cycle, clock frequency (20 to 33 MHz), architecture (von Neumann or Harvard), number of registers, and comments such as register windows and delayed branches, branch cache, bit-field instructions, on-chip instruction and data caches and TLB, and support for big-endian and little-endian byte ordering.*

Other RISC vendors:

- MIPS: LSI, Performance Semiconductor, Integrated Device Technology
- HARRIS RTX2000 (highly integrated FORTH-executing microcontroller)
- Acorn, Sanyo for the VL86C010
- Hewlett-Packard with the Apollo Domain 10000
- AMD 29000 family
- SPARC from Fujitsu, Bipolar Integrated Technology, Cypress

*Figure: Simplified block diagram of a CISC (CRISP) processor: von Neumann architecture; 32-bit CPU with an 8-, 16- or 32-bit bus; RAM, ROM, DMA, PIO, external interrupts; MMU, floating point, cache; multitasking and multiprocessor support; complex, microcoded instructions taking several cycles per instruction (fewer for CRISP).*

#### 2.5. CISC (CRISP) versus DSP.

- CISC (Complex Instruction Set Computer) processors use a large amount of hardware complexity to provide a high degree of instruction set capability:
  - large instruction set (some very complex instructions);
  - instructions of different lengths and execution times;
  - instructions can manipulate bits, bytes, words and long words;
  - a dynamic bus interface allows simple, highly efficient access to devices with different data bus widths;
  - direct support, via bus monitoring, for multimaster and multiprocessor applications.
- RISC and CISC may become more alike in the future. RISC is a technology, a design philosophy, not a product. Some design techniques that have been applied to RISC machines can be applied to CISC architectures to improve performance.
- Processors such as the 32532, the Intel 80486 and the Motorola 68040 incorporate more RISC-like features to push the number of cycles for most instructions below 2. These new features will probably characterize the new type of processor as CRISP (Complexity-Reduced Instruction Set Processor).

#### High performance EMBEDDED CONTROLLERS versus DSP.

This architecture has been designed to meet the needs of embedded applications (machine control, robotics, process control, avionics, and instrumentation).
- These types of applications require high integration, low power consumption, fast interrupt response and high performance.
- Intel chips (80960) are based on a RISC core architecture. Each processor in the series adds its own special set of functions to the core to satisfy the needs of a specific application or range of applications in the embedded market. For example, future processors may include DMA controllers, timers, or an A/D converter.
- Other characteristics are: a large register set, fast instruction execution, a load/store architecture, a simple instruction format, overlapped instruction execution, and integer execution optimization.
- The Motorola MC683xx family combines the high performance of the M68000 family microprocessors with intelligent data-handling peripherals on a single chip.
- In one chip (32-bit), besides the CPU, there are a DMA controller, a timer module, a serial I/O module, a system interface module, and a 16-bit data port. Instructions are similar to those of the M68000 family and need several cycles per instruction.

#### 3. Future trends in microprocessors.

*Fig. 1: The multiply-accumulate (MAC) times for the DSPs match the execution times for the filters (floating-point and fixed-point panels).*

| Tech., cycle (ns) | I-s | Prog. mem. | Data mem. | Buses | ALU | Other features |
|---|---|---|---|---|---|---|
| CMOS 80 / CMOS | 24 | 32K | 16K | 1 | 16-Fix | |
| CMOS / CMOS 15 / CMOS 40 / CMOS 20 | 16 / 32 / 32 | 2k | 4k | 2 | 16-Fix / 32-Flo / 32-Flo | SIO, PIO, DMA |
| CMOS 100 / CMOS | 24 | 1k | 1k | 1 | 26-Fix | Like 8764 fewer-p |
| CMOS 250 / CMOS 50 | 16 / 16 | | | | 16-Flo | Fast I/O |
| CMOS 100 | 16 | | | | 16-Fix | Filters |
| HCMOS 97 | 8 | 256 / 256 / 512 | 256 / 256 / 2k | 7 / 1 / 8 | 24-Fix / 16-Fix / 32-Flo | Filters |
| CMOS 100 | 28 | | | 7 | 32-Fix | |
| CMOS / CMOS 150 / CMOS 100 | 32 / 24 | 2k | 2k | 2 | 55-Flo / 24-Fix | Data Flow / subset of 77230 |
| CMOS 100 | 22 | | | | 22-Flo | |
| CMOS 125 | 16 | | | 2 | 16-Fix | |
| CMOS 360 | 32 | | | | 32-Fix | |
| 250 | 16 | | | | 16-Fix | |
| 160 / 100 | 16 / 16 / 32 | 144 | | 1 | 32-Fix / 32-Fix / 32-Flo | |
| 50 | 16 | | | 1 | 16-Fix | |
| - | | | | 1 | 16-Flo / 32-Flo / 32-Flo | IEEE-Flo |

#### 4. Classification of DSP hardware.

Among all the DSP chips available on the market, there are four main classes of hardware:

#### 4.1. High performance general purpose DSP.

These processors have an architecture similar to an MPU/MCU, but in addition may include an on-chip multiplier, RAM, ROM, DMA, I/O peripherals, a hardware DO loop, pipelining, and several internal and external buses. Some examples of this DSP type are:

- AT&T DSP16, DSP16A, DSP32, DSP32C
- Motorola DSP5600x, DSP9600x
- Texas TMS320Cxx
- Analog Devices ADSP-2100

#### 4.2. Algorithm specific DSP.

The architecture is configured for the optimum processing of a specific algorithm. DSPs designed for executing digital filter algorithms (FIR, IIR) include the INMOS A100, LSI64240 and Motorola DSP56200. DSPs designed for executing the FFT include the TRW2310, HDSP66110, UT69532 and Zoran.

#### 4.3. Application specific DSP.

DSPs of this type are designed to implement specific applications, such as a modem or a voice encoder/decoder.

#### 4.4. Building blocks.

Multipliers, adders, registers, RAM, ROM, I/O peripherals, etc. can be used as building-block components to configure a complete DSP system with very high performance, but at higher cost (e.g. MaxVideo).

#### 6. DSP software support.
- Assembly language may be convenient for optimizing a fast algorithm, but it is a limitation for large programs. The principal firms (AT&T, Motorola, Philips and Texas Instruments) already provide "C" compilers for their DSPs.
- Software development support is given by the firms themselves and also by:
  - TEKTRONIX, which offers the Signal Processor Workstation (SPW), running on VAX or Apollo Domain computers;
  - DATACUBE, which offers Euclid Tools and the DSP-1000;
  - DSP Development, which introduced DADiSP, menu-driven software for displaying and analyzing digital waveforms;
  - STEP Engineering, which offers Step-4 SDT running on the IBM PC AT;
  - BURR-BROWN.

*Figure: DSP application areas (by unit volume).*

#### 3.7. Applications overview

Low cost and high speed favor the use of DSPs in these applications:

- instrumentation
- telecommunications (high-speed modems)
- image processing and pattern recognition
- speech recognition, musical synthesizers
- direction finding in radar
- target tracking (closed-loop systems)
- ultrasound medical imaging, image processing
- automobiles: antiskid braking systems, adaptive suspension, engine control and instrumentation
- vibration analysis
- medical electronics
- digital video
- disk drives, tape drives
- printers, plotters and consumer products
- digital filters
- digital HiFi, digital AM/FM radio
- workstations
- robotics
- spectrum analysis

# Algorithm 1

Each FDPP:

- finds the local maximum.

# Algorithm 2

Each FDPP:

- scans 100 channels;
- finds the local maximum;
- calculates E, I/C and O/C in 59 μsec (electron events), where C is the central channel, I the 8 inner channels and O the 16 outer channels:

$$E = C + \sum_{i=1}^{8} I_i + \sum_{i=1}^{16} O_i$$

$$I/C = \left(\frac{1}{8}\sum_{i=1}^{8} I_i\right)/C$$

$$O/C = \left(\frac{1}{16}\sum_{i=1}^{16} O_i\right)/C$$

# Algorithm 3

Each FDPP:

- scans 100 channels;
- finds the type of hit;
- finds the local maximum;
- calculates E, I/C and O/C for the different types of hit in 62 μsec (electron events).

The same event (N. 32 from RUN 22531 on SPACAL, 1990):

$$E_1 = 27.25\ \text{GeV}, \quad E_2 = 28.25\ \text{GeV}$$

$$I/C_1 = 0.1667, \quad I/C_2 = 0.2874, \quad I/C_3 = 0.1544, \quad I/C_4 = 0.0977$$

# Algorithm 4

Each FDPP calculates $R_p$ in 58 μsec (electron events):

$$R_{p} = \frac{\sum_{i=1}^{18} r_{i} E_{i}^{0.4}}{\sum_{i=1}^{18} E_{i}^{0.4}}$$

#### SUMMARY

**CALORIMETER 100 x 100 channels** (with square or hexagonal elements)

- A parallel architecture of 256 FDPPs (with a granularity of 100 channels/FDPP) finds CLUSTERS in the WHOLE DETECTOR (calculating energy and cluster shape factors) **within 60 μsec**.

Detailed timing for a granularity of 100 channels/FDPP:

- scanning channels: 42 - 48 μsec
- E, I/C, O/C or $R_p$: 15 - 17 μsec (in floating point)

Detailed timing for a granularity of 49 channels/FDPP:

- scanning channels: 21 - 24 μsec
- E, I/C, O/C or $R_p$: 15 - 17 μsec (in floating point)

Lowest cost per Mflop available on the market (AT&T DSP32C = 7 US $ per Mflop).
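The per-cluster arithmetic that each FDPP performs in the algorithms above can be written out directly. The following Python sketch is not the FDPP code and rests on several assumptions. It assumes square calorimeter cells stored in a 2-D list. It takes the 8 cells of the 3 x 3 neighborhood around the central cell C as the inner channels I and the 16 cells of the surrounding 5 x 5 ring as the outer channels O, an interpretation consistent with the 8 and 16 terms in the formulas. It treats a cell as a local maximum when it is strictly larger than all its neighbors. For $R_p$ it assumes that the distances $r_i$ and energies $E_i$ are already given as lists. All function names and the `threshold` parameter are hypothetical.

```python
def is_local_max(grid, row, col):
    """True if grid[row][col] is strictly larger than its (up to 8) neighbors."""
    c = grid[row][col]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, k = row + dr, col + dc
            if 0 <= r < len(grid) and 0 <= k < len(grid[0]) and grid[r][k] >= c:
                return False
    return True


def local_maxima(grid, threshold=0.0):
    """Scan all channels and return the (row, col) of local maxima above a hypothetical threshold."""
    return [(r, k) for r in range(len(grid)) for k in range(len(grid[0]))
            if grid[r][k] > threshold and is_local_max(grid, r, k)]


def ring_sum(grid, row, col, radius):
    """Sum of the cells at Chebyshev distance exactly `radius` from (row, col); off-grid cells are skipped."""
    total = 0.0
    for r in range(row - radius, row + radius + 1):
        for k in range(col - radius, col + radius + 1):
            if max(abs(r - row), abs(k - col)) == radius:
                if 0 <= r < len(grid) and 0 <= k < len(grid[0]):
                    total += grid[r][k]
    return total


def cluster_shape(grid, row, col):
    """Return (E, I/C, O/C) for a cluster centered on (row, col)."""
    c = grid[row][col]
    inner = ring_sum(grid, row, col, 1)   # the 8 inner channels
    outer = ring_sum(grid, row, col, 2)   # the 16 outer channels
    e = c + inner + outer
    # the divisors 8 and 16 are fixed, as in the formulas
    return e, (inner / 8) / c, (outer / 16) / c


def r_p(radii, energies, power=0.4):
    """Energy-weighted radius R_p = sum(r_i * E_i**0.4) / sum(E_i**0.4); energies assumed non-negative."""
    weights = [e ** power for e in energies]
    return sum(r * w for r, w in zip(radii, weights)) / sum(weights)
```

The divisors 8 and 16 are kept fixed even for a cluster at the detector edge, where some neighbors are missing. How the FDPP handles edge clusters is not stated in these notes.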