ULP-related papers from ISLPED96


Several papers at ISLPED96 dealt with issues related to ULP.

Among those was a new modeling paper (9.3, p.197) by Kai Chen, Yuhua
Cheng, and Chenming Hu of U.C. Berkeley showing how they extended the
BSIM3 model to fit weak inversion better.

Another was a paper (14.2, p.309) by Tadahiro Kuroda et al. from Toshiba
reporting measurements that show very few substrate ties are needed in
a back-biased design.  They implemented a PLL, SRAM, and DCT, and
showed little difference between a fully contacted version and a
version contacted only at the edge of the chip. They are including
their back-bias circuits on a gate array which they are going to offer
as a product: in effect, threshold-compensated silicon for ASICs. The
win is tighter design margins, better worst-case performance, and
higher yields, even when practiced with fairly standard thresholds.

Another was a circuits paper (10.3, p.237) by M. Eisele et al. of Siemens
on the impact of intra-die parameter variations on path delays. The
message is that the central limit theorem causes the propagation delay
through a long chain of devices to converge toward the mean. Though he
didn't say so, his results suggest sigma_path = sigma_gate/(ld*Vdd/Vt).
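
The averaging effect can be illustrated with a small Monte Carlo sketch.
It assumes every gate delay is an independent Gaussian with the same mean
and sigma (my assumption for illustration, not the paper's model), in which
case sigma/mean of the path delay falls off roughly as 1/sqrt(n). It is not
meant to reproduce the sigma_gate/(ld*Vdd/Vt) relation suggested above.

```python
import random
import statistics


def relative_path_sigma(n_gates, gate_sigma=0.2, trials=4000, seed=1):
    """Estimate sigma/mean of the total delay through a chain of n_gates.

    Assumption (hypothetical): each gate delay is an independent Gaussian
    with mean 1.0 and standard deviation gate_sigma, in arbitrary units.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(1.0, gate_sigma) for _ in range(n_gates))
        for _ in range(trials)
    ]
    return statistics.pstdev(totals) / statistics.fmean(totals)


if __name__ == "__main__":
    # Relative spread should shrink by about 2x for every 4x in chain length.
    for n in (1, 4, 16, 64):
        print(n, round(relative_path_sigma(n), 4))
```

Correlated (die-to-die) variation would not average out this way; only
the independent intra-die component does.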

Another paper I liked, even though I don't agree with the conclusion,
was David Frank's (17.4, p.377) comparison of high speed voltage-scaled
conventional and adiabatic circuits. He claims that a micropipelined
0.1um TSPC circuit can run at 2GHz at Vdd=0.950V, Vt=0.175V. When it
runs at 500MHz, it is most energy-efficient at Vdd=0.45V, Vt=0.23V,
where it dissipates 200uW.  2N-2N2P, an adiabatic circuit style, on
the other hand, is most efficient at Vdd=0.60V, Vt=0.25V, where it
dissipates only 53uW, about 4x more energy efficient. His analysis
assumes worst-case performance and power conditions. I looked at his
paper carefully, redid his calculations using the same worst-case
performance and power assumptions, and reached several conclusions:

1) He claims the lowest energy operating point for 500MHz operation is
200uW at Vdd=0.450V, Vt=0.230V. I get the same worst-case performance
at Vdd=0.250V, Vt=0.105V, for only 72uW worst-case power.  I think he
is constrained by Ion/Ioff because TSPC is dynamic.

2) If I use tunability to compensate Vt variations I get the same
worst-case performance at Vdd=0.150V, Vt=0.062V, for only 33uW worst-case.
For tunable circuits I assume Vt_high = Vt+0.015 + 0*Vdd, and
Vt_low = Vt-0.015 + 0*Vdd.

3) In 0.1um CMOS, his micropipelined TSPC circuit should be running at
12GHz, not 2GHz, at Vdd=0.950V, Vt=0.175V. At Vdd=0.150V, Vt=0.062V,
his TSPC circuit should run at 2.4GHz, not 500MHz. At 500MHz, the
lowest energy operating point is Vdd=0.160V, Vt=0.136V, 28uW worst-case.

The reason the supply voltage needs to be higher at lower operating
frequency is that the static power dissipation is a larger fraction of
the total power when the switching frequency is reduced. Ion/Ioff =
ld/a, but the effective logic depth increases when the system runs
slower.
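
A rough sketch of why static power matters more at low frequency, using
the usual first-order model P = a*C*Vdd^2*f + Ioff*Vdd. All parameter
values below (capacitance, activity, leakage current) are hypothetical
placeholders chosen only to show the trend; they are not taken from
Frank's paper or from my recalculation.

```python
def static_fraction(freq_hz, vdd=0.25, c_eff=1e-12, activity=0.1, i_off=1e-6):
    """Fraction of total power that is static at a given clock frequency.

    First-order model with hypothetical defaults:
      dynamic power = activity * c_eff * vdd^2 * freq_hz
      static power  = i_off * vdd
    """
    p_dynamic = activity * c_eff * vdd ** 2 * freq_hz
    p_static = i_off * vdd
    return p_static / (p_dynamic + p_static)


if __name__ == "__main__":
    # Leakage is fixed by Vdd and Vt, so its share grows as the clock slows.
    for f in (2e9, 500e6, 50e6):
        print(f"{f:.0e} Hz: static fraction = {static_fraction(f):.3f}")
```

Because the leakage term does not scale with frequency, the energy-optimal
point at a lower clock rate shifts toward a higher Vt, and Vdd has to rise
with it to hold performance.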

My only point here is that TSPC is a poor choice to compare against
adiabatic because it is dynamic, which forces both Vt and Vdd up to
maintain adequate Ion/Ioff. Had he included a static circuit style,
I think he would have discovered that conventional static logic is
more energy efficient than adiabatic even when the system is running
22/0.5 = 44x slower than its peak.
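
To keep the numbers in this discussion straight, here is a small sketch
that lines up the 500MHz worst-case power figures quoted above and
computes their ratios to Frank's 200uW TSPC point. The dictionary keys
are my own labels; the values are the ones quoted in the text.

```python
# Worst-case power at 500MHz, in microwatts, as quoted in the text.
power_uw = {
    "frank_tspc": 200.0,         # Vdd=0.450V, Vt=0.230V
    "frank_2n2n2p": 53.0,        # Vdd=0.60V,  Vt=0.25V
    "recalc_tspc": 72.0,         # Vdd=0.250V, Vt=0.105V
    "recalc_tspc_tunable": 33.0, # Vdd=0.150V, Vt=0.062V
    "recalc_tspc_point3": 28.0,  # Vdd=0.160V, Vt=0.136V
}


def ratios_to(reference_uw, table):
    """Return how many times lower each entry is than reference_uw."""
    return {name: reference_uw / p for name, p in table.items()}


if __name__ == "__main__":
    for name, ratio in ratios_to(power_uw["frank_tspc"], power_uw).items():
        print(f"{name:22s} {power_uw[name]:6.1f} uW  {ratio:4.1f}x")
    # Slowdown from the 22 (GHz) peak figure used above to 0.5GHz.
    print("slowdown:", 22 / 0.5)
```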
...

Another paper (10.2, p.233), by X. Tang, V. De, and Jim Meindl from
Georgia Tech, predicts 400mV Vt variations in 0.07um technology using
Monte Carlo simulation. If they are right, we will hit the wall very
soon. Subsequent conversations with researchers from IBM and TI
suggest there may be a bug in the result.