# Task 10: IEEE 754 FP16 Adder (Hard)

## Objective

Implement a combinational IEEE 754 half-precision (FP16) floating-point adder.
FP16 arithmetic is the compute primitive of every modern GPU tensor core and
AI accelerator — getting it right is non-trivial and formally verifiable.

## Interface

```verilog
module fp16_adder (
    input  wire [15:0] a,
    input  wire [15:0] b,
    output wire [15:0] result
);
```

## FP16 Format (IEEE 754-2008)

```
Bit 15  : sign       (s)
Bits 14:10 : exponent (e, biased with bias=15)
Bits  9:0  : mantissa (m, implicit leading 1 for normal numbers)

Value = (-1)^s × 2^(e−15) × 1.m    for normal numbers (e = 1..30)
Value = 0                            for e = 0, m = 0  (zero)
```

## Scope (What You Must Handle)

| Case                   | Requirement                                  |
|------------------------|----------------------------------------------|
| Normal + Normal        | Correct result, normalized                   |
| x + 0  or  0 + x       | Return x                                     |
| x + (−x) (cancellation)| Return +0.0 (`16'h0000`)                    |
| Overflow to infinity   | Return `16'h7C00` (+Inf) or `16'hFC00` (−Inf)|
| NaN input (e=31,m≠0)   | Propagate: return `16'h7E00`                 |
| Infinity input (e=31,m=0)| Propagate or handle ∞±∞ as NaN             |

**Rounding**: truncate (round toward zero). Round-to-nearest is not required but earns full area score.

## Algorithm

```
1.  Extract fields: sign, exp (5-bit), mantissa (10-bit)
2.  Prepend implicit 1: full_m = {1, mantissa}  (11 bits; 0 for zero/subnormal)
3.  If |a| < |b|: swap so that |a| >= |b|
4.  Compute alignment shift d = exp_a − exp_b  (≥ 0 after swap)
5.  Shift full_m_b right by d (with 3 guard bits for rounding)
6.  If signs equal:  sum_m = full_m_a + shifted_m_b
    If signs differ: sum_m = full_m_a − shifted_m_b
7.  Normalize: count leading zeros in sum_m, left-shift, adjust exponent
8.  Handle exponent overflow → ±Inf
9.  Pack result: {sign_result, exp_result[4:0], sum_m[9:0]}
```

## Scoring

- Correct compilation: 5%
- Passing simulation tests (normal numbers + zero + special cases): 60%
- Formal verification (SymbiYosys, if available): 15%
- Area efficiency vs reference: 20%

## Useful Constants

```verilog
localparam BIAS    = 15;
localparam INF     = 16'h7C00;
localparam NEG_INF = 16'hFC00;
localparam QNAN    = 16'h7E00;
```

## Hint: Alignment and Normalization

```verilog
// Extended mantissa with guard bits
wire [13:0] m_b_shifted = {1'b1, man_b, 3'b0} >> d;   // 14 bits: 1 hidden + 10 + 3 guard

// After subtraction, find first 1 in result (clz):
// Use a priority encoder or a generate loop to count leading zeros
```