[Arya Raychaudhuri's one-man enterprise that believes in Always Move Ahead with New Ideas]
BLOG PAGE: http://lvs-debug-solutions.com/blog/
[kept inactive to avoid spam]
This is the age of search engines - quickly search for entities and their inter-relationships. And LVS Debug is all about analyzing netlists and gds/oasis layout databases, and finding their correspondences/matches/mismatches in the post-PD domain. So, much of the code presented in the 'code snippets' page focuses on searching and parsing these huge netlist and layout files, to extract the relevant information and its connections. The other important thing is the focus on revenue- and job-generating creative ideas - because if you cannot come up with new product ideas every now and then, what will you sell tomorrow? I have now transformed LVS DEBUG SOLUTIONS LLC into a multi-engineering concepts platform (please review the code snippets* pages under https://www.lvs-debug-solutions.com).
google email: arya.raychaudhuri@gmail.com
Please direct all official communications to
arya@lvs-debug-solutions.com, or call 408-480-1936
I typically don't answer calls from unknown callers, to avoid spam. If you are a serious caller, please leave a voice or text message, or send an email. For text messages from international locations, please use the cellphone number as 1-408-480-1936.
LVS DEBUG SOLUTIONS LLC
980 Kiely Blvd, Unit 308
Santa Clara, CA 95051
United States
ph: 1-408-480-1936
Continues into
http://lvs-debug-solutions.com/code_snippets4
Item#409> Pulse mode voice communication through the telephone exchange, for when the number of lines is high enough to cause significant voice signal distortion
[March 11, 2018, Arya Raychaudhuri, Santa Clara, California]
One advantage of using pulse mode voice data is that the pulses can be easily buffered before entering the Xgates, and there is no other distortion except going through the APC, PAC at the receiver end. The disadvantage is that additional Analog to Pulse (APC) and Pulse to Analog (PAC) circuits are needed at the receiver end.
The telephone exchange circuit can be generalized to on-chip parallel processing for various requirements.
***
Item#408> Barebones Layout of the transmission gates connection scheme for a 4-Line Exchange
[March 08, 2018, Arya Raychaudhuri, Santa Clara, California]
Please note that there are 32 Xgates in total, but 4 of them are tied to ground (vss) and are not used. So, effectively, 28 are used, as indicated in Item#407.
While the transmission gates connection scheme is shown above, a single chip implementation of the telephone exchange will also have the VMONs, one for each line, and the other exchange implementer logic circuits on the same chip. Please note that when the number of lines is higher, the gap between the left and the right columns of transmission gates (above) will be larger, in order to accommodate all the vertical spkr lines; the columns will also be longer. In that situation, it will be useful to use the gap region for the additional circuitry. That will require the spkr lines and the horizontal interconnects to be pushed up to higher metals (e.g., m5, m6), so that the lower metals can be used for the additional circuits to be drawn in.
***
Item#407> Revised transmission gates numbers (reduced from the estimate given in Item#406)
The spkr to spkr transmission gates (see Item#406) used to convey the ringtone from the called line to the caller line are redundant. This is so because ringer1, ringer2, etc. already contain the caller and called information. So, we can remove the following lines from the spice code given in Item#403.
Xxgate12_21_spkr spkr12_21 spkr1 spkr2 vdd vss XGATEsmall
Xxgate13_31_spkr spkr13_31 spkr1 spkr3 vdd vss XGATEsmall
Xxgate23_32_spkr spkr23_32 spkr2 spkr3 vdd vss XGATEsmall
The simulation still runs fine:
[March 07, 2018, Arya Raychaudhuri, Santa Clara, California]
This reduces the number of connecting transmission gates to 4*N + N*(N-1) = N*(N+3).
4 lines - 28 Xgates, 16 lines - 304 Xgates, 64 lines - 4288 Xgates
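As a quick cross-check of the revised formula, the counts can be tabulated in a few lines of Python. This is only an illustrative sketch; the function name `xgates_revised` is hypothetical and does not appear in the spice code.

```python
def xgates_revised(n_lines):
    """Connecting transmission gates for an N-line exchange (Item#407 formula)."""
    vertical = 4 * n_lines                  # grounding and tone Xgates, 4 per spkr
    mic_to_spkr = n_lines * (n_lines - 1)   # each MIC to every spkr except its own
    return vertical + mic_to_spkr           # algebraically equal to N*(N+3)


for n in (4, 16, 64):
    print(n, "lines -", xgates_revised(n), "Xgates")
```

The count grows roughly as N squared, which is why the layout gap and column length discussed in Item#408 become a concern for larger exchanges.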
***
Item#406> Transmission gates numbers estimation from Item#403
[March 07, 2018, Arya Raychaudhuri, Santa Clara, California]
Note that all transmission gates have their gates and input/output terminals turned to well-defined voltages at all times - this is a firm requirement for using transmission gates for switching. The 'ungnd*' turn all spkrs to vss when the phone is not up, or while it is dialing. The 4 vertical Xgates needed for the grounding and tones to each spkr give rise to the 4*N term in the formula. Each MIC must have an Xgate towards all other spkrs except its own - so, N-1 per MIC, leading to N*(N-1) Xgates. As far as the spkr to spkr Xgates are concerned, spkr1 needs N-1 of them, spkr2 needs N-2 (since the Xgate to spkr1 is already there), and so on, until spkrN-1 needs just 1, to spkrN. So, this gives N*(N-1)/2. Thus the three terms total up to 4*N + N*(N-1)*1.5.
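Written out as a worked sum, the three terms described above combine as follows. This is only a restatement of the text in LaTeX, with N=3 as a numeric example; the three spkr to spkr gates for N=3 correspond to the three Xxgate*_spkr lines listed in Item#407.

```latex
\begin{align*}
\text{Xgates}(N) &= \underbrace{4N}_{\text{grounding and tones}}
  + \underbrace{N(N-1)}_{\text{MIC to spkr}}
  + \underbrace{\sum_{k=1}^{N-1} k}_{\text{spkr to spkr}} \\
 &= 4N + N(N-1) + \frac{N(N-1)}{2} = 4N + 1.5\,N(N-1) \\
\text{Xgates}(3) &= 12 + 6 + 3 = 21
\end{align*}
```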
***
Item#405> Transmission gates introduced signal distortion versus the threshold voltages (Vth0) of the NFET/PFET
In the simulation code of the 3-line exchange shown in Item#403, the various tones and the MIC channels were held at placeholder sinusoids and pulses whose frequencies far exceed the human audible frequency range (20 Hz - 20 kHz). The idea was to be able to see a significant number of cycles within a reasonable simulation time frame. But, in actual practice, we would use audible frequencies.
But, clearly, we can see some signal distortion in the dialtone and ringtone passing through the transmission gates (see Item#s 401, 404). This happened because I used the high Vth0 LP model for the FETs. If the Vth0 is high, the Vd,sat for each FET, given by Vgg - Vth0, is lowered, and so the linear region of operation shrinks. This gives rise to distortion - ideally, a linear resistor would give no distortion. The following simulation experiment, which studies the SNR versus the Vth0 chosen for the FETs, is informative. Please note that I have used an audible frequency sinusoid (1 kHz) for this experiment.
[March 06, 05 , 2018, Arya Raychaudhuri, Santa Clara, California]
As indicated earlier, for low Vth0 (+/- 0.3V) there is very little distortion as seen in the top panel. The SNR was measured as shown in the bottom panel showing the FFT (Fast Fourier Transform) of the spkr1 (OUTPUT). An SNR of >30dB is considered very good for audio.
But, if I choose Vth0 of +/- 0.7V, significant distortion is observed, as seen on the top panel below. The SNR drops to 8.41dB. But, using a vdd of 1.8V removes the distortion (bottom panel below) - this happens because the linear region of operation expands. The SNR improves considerably.
[March 05, 2018, Arya Raychaudhuri, Santa Clara, California]
The SNR was measured also at a few intermediate Vth0 points, and a graph plotted, as shown below
[March 05, 2018, Arya Raychaudhuri, Santa Clara, California]
From this experiment it is clear that lowering the Vth0 to +/- 0.4V with VDD=1V gives a reasonably good SNR, i.e., a low distortion transmission gate. Also, using a 1.8V VDD for just the transmission gates, with the generic LP model FETs, would give an equally acceptable SNR.
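One way to put a number on this kind of distortion outside the simulator is sketched below in Python. It is a rough stand-in and not the LTspice measurement used above: the hard clipper is a hypothetical substitute for the transmission gate nonlinearity, the 32 kHz sample rate and the clip level are arbitrary assumptions, and "SNR" is taken here as the fundamental's power over the power in all other non-DC bins of a plain DFT.

```python
import cmath
import math


def snr_db(samples, fundamental_bin):
    """Fundamental power over the power in all other non-DC bins, in dB."""
    n = len(samples)
    signal = 0.0
    rest = 0.0
    for k in range(1, n // 2):  # skip DC, stop below the Nyquist bin
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        power = abs(acc) ** 2
        if k == fundamental_bin:
            signal += power
        else:
            rest += power
    return math.inf if rest == 0.0 else 10.0 * math.log10(signal / rest)


FS = 32000   # assumed sample rate (Hz)
F0 = 1000    # 1 kHz test tone, as in the experiment
N = 256      # 8 full cycles, so the tone falls exactly on one DFT bin
CLIP = 0.6   # hypothetical clip level standing in for the gate distortion

tone = [math.sin(2 * math.pi * F0 * i / FS) for i in range(N)]
clipped = [max(-CLIP, min(CLIP, s)) for s in tone]

snr_clean = snr_db(tone, N * F0 // FS)
snr_clipped = snr_db(clipped, N * F0 // FS)
print("clean tone SNR (dB):", snr_clean)
print("clipped tone SNR (dB):", snr_clipped)
```

Capturing a whole number of cycles avoids spectral leakage, so no window function is needed in this sketch.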
***
Item#404> More example simulations on the 3-line telephone exchange of Item#403
[March 04, 2018, Arya Raychaudhuri, Santa Clara, California]
***
Item#403> Entire spice code for the 3-line telephone exchange simulated in Item#401 is given in spice_codes page
It should be apparent from the code that each line will use 4 wires - the control pulses wire (crpin), the speaker wire (spkr), the microphone wire (mic), and a ground wire connecting the grounds at the telephone end and at the exchange end. Both the telephone and the exchange will have their own power supply (vdd).
The initial spkr1 to spkr2 connection allows the echo of the ringtone to be heard by the calling telephone. As soon as the called telephone lifts the receiver, spkr1 gets connected to mic2, and spkr2 to mic1, etc. Another simulation example is shown below, where phone3 is calling phone1 and getting a connection, but phone2, trying to call phone3 in the middle, gets a busytone.
[March 03, 2018, Arya Raychaudhuri, Santa Clara, California]
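The call sequence of this example can be mimicked with a small behavioural model. The Python sketch below is hypothetical and covers the tone logic only (dialtone, ringtone, busytone, connection); it does not model the PGEN/VMON circuits or the Xgates, and the class and state names are illustrative assumptions.

```python
class Exchange:
    """Toy behavioural model of an N-line exchange (tone logic only)."""

    def __init__(self, n_lines):
        self.state = {n: "idle" for n in range(1, n_lines + 1)}
        self.peer = {}

    def lift(self, line):
        """Receiver picked up: get a dialtone, or answer an incoming call."""
        if self.state[line] == "idle":
            self.state[line] = "dialtone"
        elif self.state[line] == "ringing":
            caller = self.peer[line]
            self.state[line] = "connected"
            self.state[caller] = "connected"

    def dial(self, caller, called):
        """Single push dialing; a line cannot dial its own number."""
        if self.state[caller] != "dialtone" or caller == called:
            return
        if self.state[called] == "idle":
            self.state[caller] = "ringtone"   # caller hears the ring echo
            self.state[called] = "ringing"
            self.peer[caller] = called
            self.peer[called] = caller
        else:
            self.state[caller] = "busytone"


ex = Exchange(3)
ex.lift(3)
ex.dial(3, 1)   # phone3 calls phone1
ex.lift(2)
ex.dial(2, 3)   # phone2 tries phone3 in the middle
ex.lift(1)      # phone1 answers
print(ex.state)
```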
Interestingly, the scheme can be extended to a situation where the telephone lines are distributed over several hubs, each hub having its own VMON. Let's say, the first push on the telephone dials the hub's number, if not busy, the hub's VMON's crpin gets connected to telephone line's crpin, ready to receive the second push number to select the appropriate telephone line in that hub. So, in this case, a receiver can get a busy tone in two ways, first, if the called hub is busy, and second if the called telephone in the called hub is busy.
***
Item#402> Integrational edits to PGEN (Item#s 397,398) and VMON (Item#s 399, 400) sub-blocks that form the LINEx subckt used in the simulation of Item#401
PGEN spice edits shown in green
[March 02, 2018, Arya Raychaudhuri, Santa Clara, California]
The edit disables the telephone push buttons after the first push (dialing), also renders the pushbuttons ineffective when the receiver is down. The following sim shows the feature. Only the push3 after receiver is picked up (pushreset1 --> crbar1) has impact on the pgen's output (crpin). Also, the other two push3 pulses have no impact on the pushbutton latches (allRSout), nor on the toggle (t0). This is controlled by 'pgennotbusy'.
[March 02, 2018, Arya Raychaudhuri, Santa Clara, California]
Please, note that a particular telephone is not allowed to dial its own number - this can be implemented by insulating the contacts of the pushbutton corresponding to its own number. The effect is simulated in Item#401 by passing a vss for that push(n).
VMON spice edits shown in green
[March 01, 2018, Arya Raychaudhuri, Santa Clara, California]
ANYsel is basically an ORing of all the sel(n)s. This is done this way to reduce the number of inputs to the ORing when the VMON is used for many lines - because the number of counter states will be a lot fewer.
The LINEx block that consists of a PGEN and a VMON is shown below.
[March 01, 2018, Arya Raychaudhuri, Santa Clara, California]
Please note that for the sake of simulation, the PGEN and the VMON are put together as a logical entity (LINEx), but, physically, the PGEN is at the telephone receiver end, and the VMON is at the exchange end. Of course, they belong to the same Line (telephone number).
***
Item#401> Results of a 3-line Telephone Exchange simulation - uses the PGEN and VMON of Item# s 397, 399, with a few integrational edits
[Feb 28, 2018, Arya Raychaudhuri, Santa Clara, California]
To follow the chronology of events, please go from low simulation time to high. Please, note that the Ringtone heard by LINE2 must also be amplified through a speaker that can be heard from outside the phone. This can possibly be achieved by sensing the receiver down position for a telephone. But, that's beyond the scope of the Exchange control circuitry that's being discussed - more to do with the voice transmit/receive circuit.
***
Item#400> Additional spice code (append to the code of Item#398) needed to simulate the Item#399
[Feb 24, 2018, Arya Raychaudhuri, Santa Clara, California]
Note that the entire subckt VMON should follow the subckt for PGEN, and the remaining portion (after .ENDS) to be added after the XPGEN1 .... line.
Edits: The LatchIn1 and LatchIn2 inputs to the FR latch have been delayed by INV0s to avoid a racing related glitch. Also, the 'resetpulse' is now generated by an INV to reduce loading on LatchIn2.
***
Item#399> Telephone Exchange end VMON (a 3-channel variant of Item#288) for the intercom telephone exchange scheme (Item#s 209, 215) simulated in tandem with the PGEN of Item#397
[Feb 23, 23, 2018, Arya Raychaudhuri, Santa Clara, California]
Two cases shown, the top panel shows the case when pushbutton 3 is pressed from the telephone, and the bottom shows the case when the receiver is just lifted and hung up without pressing any pushbutton.
***
Item#398> Spice code for the simulation of Item#397
[Feb 23, 2018, Arya Raychaudhuri, Santa Clara, California]
Note that the clock pulses needed to run the PGEN are getting generated internally, using the 'clkgen' subcircuit. This indicates that each telephone can have its own clkgen - the VMON on the same line is located in the exchange end, and is driven by the dial pulses as its clocking pulses for its counter.
The PGEN's code is encapsulated in a subcircuit, with a call to the subcircuit through XPGEN1 .... from outside the subcircuit. Also, 'CR1bar' replaces 'pgennotbusy' in calls to the 'pushbutton' subcircuit from inside the PGEN subckt. This keeps the push buttons disabled unless the receiver is up. Next, we will add a VMON subcircuit - and it will be used through a call from outside, just as the PGEN is used with XPGEN1 ..... The calls to PGEN and VMON and some additional circuitry will form the contents of a LINE subcircuit. LINEs will be called from outside to implement the telephone exchange circuit. This follows from the conceptual diagram of Item#209.
*Simulation done with LTspice IV
***
Item#397> Telephone Receiver end PGEN (a variant of Item#286) for the intercom telephone exchange scheme (Item#s 209, 215) simulated - a simpler 3-line dialing shown
[Feb 21, 21, 2018, Arya Raychaudhuri, Santa Clara, California]
Why is a 3-line dialer pgen shown? Because that's the minimum number of lines needed to demonstrate dialtone, ringtone, connection, and busytone, as indicated in Item#215.
***
Item#396> Preliminary speed versus number of pushes calculations for the stand-up scooter of Item#395
Please note that the rotational kinetic energy (KE) of all the rotating parts combined would be negligible with respect to the translational KE. This is exemplified by a sample rotational KE calculation of the pawl-sprocket section (neglecting the small shaft). Also, wind drag and any other losses are ignored. The weight redistributed push force of 30lbs is just an assumption. But, it is a very reasonable and conservative guess - one can easily check by pressing on a weighing scale at home, with one hand or with one foot. The force can be increased by inclining the handle column towards the rider. Note that if the push force increases by 2X, 3X, etc., the number of pushes indicated below will be (1/2)X, (1/3)X, etc.
[Feb 18, 2018, Arya Raychaudhuri, Santa Clara, California]
Note, for example that if the speed falls by 2 mph from 15 mph to 13 mph, a few pushes (~12) bring it back to 15 mph.
The pushes can be slow or fast; a push rate of ~2s per push takes 100s (<2 min) to get to 15mph from 0. If the rider gives a little run (typically ~4mph) of the scooter before stepping on to it, that can reduce the number of pushes.
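Since each push is taken to add the same energy and losses are ignored, the number of pushes scales with the difference of the squared speeds. The Python sketch below illustrates only that scaling; it takes the 50 pushes (100s at ~2s per push) to reach 15 mph as its reference point, and the function name and the `force_scale` parameter are hypothetical.

```python
def pushes_needed(v_from_mph, v_to_mph, pushes_to_ref=50.0, v_ref_mph=15.0,
                  force_scale=1.0):
    """Pushes to go from v_from to v_to, assuming equal energy per push.

    KE is proportional to v**2, so the count scales with the difference
    of the squared speeds. force_scale=2 means twice the push force,
    which halves the number of pushes.
    """
    fraction = (v_to_mph ** 2 - v_from_mph ** 2) / v_ref_mph ** 2
    return pushes_to_ref * fraction / force_scale


print("13 -> 15 mph:", round(pushes_needed(13, 15), 1), "pushes")
print("saved by a 4 mph running start:", round(pushes_needed(0, 4), 1), "pushes")
```

The first figure is consistent with the ~12 pushes quoted above for recovering from 13 mph to 15 mph.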
While similar approaches can be applied to other piston-pawl driven on-road vehicles, the boat (Item#391) will have much harder water drag (proportional to square of the boat speed) to overcome. The water drag can be modeled as an energy term that must be subtracted from the push energy to get the useful push energy that builds the boat's KE. If the push energy equals the drag energy, the boat maintains a constant speed. If the push energy is below the drag energy the boat loses speed. The drag energy will depend on the specifics of the boat's hull design, and the nature of the waters. So, it is not as straightforward to predict the number of pushes required to achieve a certain boat speed.
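The energy balance described above can be illustrated with a toy model in Python. Every number in it is a hypothetical placeholder (push energy, boat mass, drag constant, push interval), and the drag energy per push is simply taken as drag force (k*v^2) times the distance covered in one push interval (v*dt); a real hull would need measured drag data.

```python
import math


def boat_speed_after(pushes, e_push=40.0, mass=500.0, k_drag=20.0, dt=2.0):
    """Boat speed (m/s) after a number of pushes, with quadratic water drag.

    Per push interval: KE += e_push - k_drag * v**3 * dt.
    All default values are hypothetical.
    """
    ke = 0.0
    for _ in range(pushes):
        v = math.sqrt(2.0 * ke / mass)
        ke = max(0.0, ke + e_push - k_drag * v ** 3 * dt)
    return math.sqrt(2.0 * ke / mass)


def steady_speed(e_push=40.0, k_drag=20.0, dt=2.0):
    """Speed at which the push energy equals the drag energy."""
    return (e_push / (k_drag * dt)) ** (1.0 / 3.0)


for n in (5, 20, 100):
    print(n, "pushes:", round(boat_speed_after(n), 3), "m/s")
print("steady speed:", steady_speed(), "m/s")
```

In this model the speed climbs quickly at first and then levels off at the point where each push only replaces what the drag takes away.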
***
Item#395> A piston pawl drive (Item#372, 375) integrated into the steering handle of a manually driven stand-up scooter
[Feb 14, 2018, Arya Raychaudhuri, Santa Clara, California]
Foot brake mechanism not shown - can be the same as that of a bicycle. All ball-bearings should have their outer cylinders hard connected to the deck, and inner cylinders tied to the respective rotating shafts. The piston is meant to be pushed when driving straight, and not while turning the handle.
***
Item#394> Piston-Pawl Boat (Item#391) with Assist windmill (Item#s 392,393)
[Feb 16, 2018, Arya Raychaudhuri, Santa Clara, California]
***
Item#393> A minimal windmill for a Piston-Pawl boat with wind Assist (Item#392)
[Feb 13, 2018, Arya Raychaudhuri, Santa Clara, California]
Note that the inner cylinder of the rotor shaft ball bearing is hard connected to the rotor shaft, the outer cylinder is hard connected to the Nacelle. For the ball bearing at the top of the tower, the outer cylinder is tied to the Nacelle, the inner to the Tower - this allows the Nacelle to adjust its position in response to the wind direction as sensed by the rear Vane. The Vane_shaft is a moment generating rod that is hard connected to the Nacelle's back. When the rotors are facing the wind, the Vane is in a stable equilibrium - No moment due to the Vane. The ball bearing inside the tower has its inner cylinder tied to the drive shaft, and the outer tied to the inner surface of the Tower. The Bevel gear at the bottom of the drive shaft is meant to mesh with the Assist bevel of Item#392
The bevel gears based minimal windmill is inspired by
https://www.youtube.com/watch?v=yZa__in_J3Y
Windmill boats have been there before [e.g., look for 'windmill boats' on Google Images], that gave the idea for using one in an Assist mode.
***
Item#392> Piston & Double Pawl Drive (Item#389) with Assist from another energy source
One good thing about the Pawl is that it can be easily used in an Assist configuration, as shown below.
[Feb 13, 2018, Arya Raychaudhuri, Santa Clara, California]
Note how the length adjustable shaft can help disengage the Assist Pawl and the Assist Bevel, if no Assist is desired. For example, on a good windy day in a boat equipped with an Assist windmill, you may want to engage the Assists, and you need to scarcely do the push-pull motions.
***
Item#391> Piston-Pawl Boat schematic with the drive (Item#389) and the minimal rudder (Item#390) integrated
[Feb 12, 2018, Arya Raychaudhuri, Santa Clara, California]
The chair in the cockpit is meant for the driver, and the eight chairs on the deck are for the passengers. The driver's chair should be more ergonomically designed to help easier push-pull actions. For bigger boats, a second Piston-Pawl drive can be integrated in to share the same two flywheel gears - of course, that will need a co-driver.
***
Item#390> Simple Rudder control to go with a Piston-Pawl drive based boat (see Item#389)
[Feb 11, 2018, Arya Raychaudhuri, Santa Clara, California]
The Metallic arc must be hard-connected to the boat's body. Gravitational pull on the rudder sticks the shaft between the teeth of the metallic arc. This type of simple rudder may have been used earlier - but I didn't find one in a brief Google Images search.
***
Item#389> Piston & Double Pawl Drive for a manually driven boat - uses the Spur-Bevel gear of Item#388
[Feb 10, 2018, Arya Raychaudhuri, Santa Clara, California]
All the rotating shafts rotate inside ball-bearings whose outer cylinder should be hard-connected to the boat's body, so also should be the piston's enclosure. The propeller shaft should penetrate the boat's hull on the stern end through an adequately greased stuffing box to prevent any water leakage.
***
Item#388> Spur Gear and Bevel Gear combined to create a low moment of inertia compacted gear - towards its use in the design of a boat based on the piston pawl drive (e.g., Item#375)
[Feb 09, 2018, Arya Raychaudhuri, Santa Clara, California]
***
Item#387> Single stage, its symbolic view, and connections for f stages, corresponding to the simulation of Item#386
Considering the simple nature of the new sequential timer with respect to 555 based timers, it is good to identify the single stage, its symbolic view, and the connections for an f (some arbitrary number) events sequencer
[Feb 08, 08, 2018, Arya Raychaudhuri, Santa Clara, California]
Please note that when f=2, it's just the duty cycle circuit of Item#382. The Duty Cycle circuit has no 50% or more On time restriction, unlike with the 555 based duty cyclers.
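The behaviour of an f-stage sequencer can be summarised with a small timing model. This Python sketch is only a software analogy of the timing (it does not model the RC stages or the NOR latches); the function names and the example durations are hypothetical, and stages are numbered from 0.

```python
def active_stage(t, durations):
    """Index of the stage that is ON at time t, for stages run in sequence."""
    t = t % sum(durations)      # the sequence repeats after all stages ran
    for index, duration in enumerate(durations):
        if t < duration:
            return index
        t -= duration
    return len(durations) - 1   # only reachable through float rounding


def duty_cycle(on_time, off_time):
    """f=2 case: fraction of the period spent in the ON stage."""
    return on_time / (on_time + off_time)


# Three events lasting 2s, 5s and 1s
print([active_stage(t, [2, 5, 1]) for t in range(9)])
# f=2: ON and OFF times are set independently, so any duty cycle is possible
print(duty_cycle(3, 1), duty_cycle(1, 3))
```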
A multiple events sequencer has many applications. In a production shop floor, it can be used to sequentially switch machines on and off - saving on power. It can be used to sequentially display advertisements on digital billboards, sequence operations on an automobile, or on a spacecraft - like putting out satellites into orbits, etc.
***
Item#386> Simulation of the Events Sequencer circuit of Item#383
Please note the introduction of a 10nF capacitor inside the positive edge extraction INV-AND combination - this helps increase the delay at the input to the AND, in order to make the positive edge sizeable. The main caps' charging/discharging profiles are also shown in the bottom pane, along with the extracted positive edges Xej, Jej, Yej, which are needed to change the states of the NOR latches.
[Feb 07, 2018, Arya Raychaudhuri, Santa Clara, California]
Other subckts are the same as used for Item#385.
***
Item#385> Simulation of the Duty Cycle circuit of Item#382
Note that in this implementation, the positive edge generation is accomplished with XOR-RC type delay and diff circuit (see Item#168) - replacing the INV-AND combination as shown in Item#382. It boils down to playing with the R_on/R_off slider knobs to get different duty cycles.
[Feb 07, 2018, Arya Raychaudhuri, Santa Clara, California]
***
Item#384> An Artist's Impression of a 4-wheeler based on the Piston-Pawl Drive of Item#379
[Feb 05, 2018, Arya Raychaudhuri, Santa Clara, California]
All the seats and the piston shafts should be easily detachable, so that if less than 4 persons are riding the vehicle, unnecessary items can be removed to reduce load.
***
Item#383> A multiple Events Sequencer (Sequential Timer) circuit based on the same logic as the Duty Cycle Circuit of Item#382
[Feb 04, 2018, Arya Raychaudhuri, Santa Clara, California]
This circuit can be extended to more than 3 events by adding more event time controller stages.
***
Item#382> Duty cycle circuit of Item#381 with ON/OFF switch added
[Feb 04, 2018, Arya Raychaudhuri, Santa Clara, California]
***
Item#381> Duty cycle fed power can act as gears for the electroMagnet based engine of Item#374 - following the example of the manual gear_up as in Item#380
Here is a simple duty cycle controller that can be used to independently control the ON and OFF periods, using the variable resistances R_on and R_off
[Feb 04, 2018, Arya Raychaudhuri, Santa Clara, California]
On/Off times can be controlled between 0 and 15s (approx.).
***
Item#380> False Push and Duty Cycle with the piston-pawl drive of Item#377 - manual gear-up
[Jan 31, 2018, Arya Raychaudhuri, Santa Clara, California]
When the Pawl is rotating at a higher angular speed, giving a slower push doesn't help - it's a false push, it just wastes your energy. So, you develop a sense of duty cycle by adjusting the push energy (height of the push pulses and their numbers) and the rest period - push time versus rest time. When the vehicle is slow and accelerating, you will need more pushes and less rest, but when you achieve a certain speed, you can switch to fewer pushes and more rest. This is like switching from lower to higher gears in a car - but, manually.
***
Item#379> Combined Forward and Reverse Gear with the Piston-Pawl Drive, compare with Item#377
[Jan 31, 2018, Arya Raychaudhuri, Santa Clara, California]
Please, note that for the reverse motion, the pawl sits *before* the piston, as seen from the vehicle's front side. Both the forward and the reverse pawls are on the same shaft as the bevel gear. Enclosures of both pistons, as well as the outer cylinder of the ball bearing must be hard connected to the body of the vehicle.
***
Item#378> Clockwise and Counter-clockwise rotation of the Pawl
The good thing about the Piston-Pawl arrangement is that it is easy to produce either clockwise, or counter-clockwise rotation of the pawl - just by reversing the sense of engagement
[Jan 30, 2018, Arya Raychaudhuri, Santa Clara, California]
This feature can be utilized to create a numerical adder-subtractor (integrator), or even a reverse gear when the Piston-Pawl drive is used in a three/four wheeler or in a boat. The bicycle has no reverse gear - looking from the front side of the bike (Item#377), the pawl is placed *behind* the piston.
***
Item#377> Schematic power transmission diagram for the Piston-Pawl bicycle (Item#375)
Please note the simplification with respect to the currently used traditional safety bikes. The chain is replaced with bevel gears and a rotating connecting rod. The ratchet and pawl arrangement of the rear hub is no longer necessary.
[Jan 30, 2018, Arya Raychaudhuri, Santa Clara, California]
The removal of the lower teeth on the piston rod serves two purposes. One, that it allows frictionless free rotation of the Drive Pawl in the forward drive mode (clockwise) at unpressed spring position. And, two, that it allows uninhibited counterclock rotation of the Drive pawl when the rear wheel is rolled backwards. Note that in a traditional bike, the rolling of the rear wheel backwards also turns the pedal back.
The ball-bearings are all hard connected to the bike's body. No difference in steering (handle-frontwheel) or braking is expected.
***
Item#376> On the novelty of the Piston-Pawl drive (Item#375)
The Piston-Pawl drive may be the next biggest innovation in bicycles after the 1880s. Please look up:
https://en.wikipedia.org/wiki/History_of_the_bicycle
This represents the manualization of the engine concept (with a piston), which uses the easiest of human motions, the push. Also, the drive can be incorporated in three and four-wheelers with multiple pushers. Even in boats, to turn the propeller - in fact, one can stand on the boat and push the piston with one's torso, if one so wishes.
For longer, faster play of the piston (harder push) the bicycle can move like a skateboard - one push and then relax, and then apply another push, and so on. With shorter play of the piston (softer pushes), one achieves a gentler, more uniform speed.
The piston-pawl drive is basically an efficient transmission for human energy. The engine is the rider who pushes the piston. The food that the rider eats is the fuel.
***
Item#375> Bicycle with a manually operated spring piston, and pawl based power transfer - that can be pushed with one foot, two feet, or one hand at a time, or simultaneously
An interesting new bicycle design that gives more rest to your legs.
[Jan 30, 2018, Arya Raychaudhuri, Santa Clara, California]
Please note that the piston enclosure, left/right foot rests are meant to be hard connected to the bike's body. The Drive shaft will be held in a ball-bearing whose outer cylinder would be hard connected to the bike's body
***
Item#374> An electroMagnetic engine cum battery charger - with electroMagnets on both ends of the cylinder - an upgrade to Item#373
[Sept 19, 2017, Arya Raychaudhuri, Santa Clara, California]
Both the right to left and the left to right movements of the piston are activated by electroMagnets. No spring is used for the left to right movement, unlike in Item#373. Both pawls transfer power to the drive shaft, as well as to the Generator rotor.
Note the full-wave rectification analogy of the twin pawl power transfer in this case, as opposed to the half-wave rectification analogy in the case of Item#372.
***
Item#373> An electroMagnetic engine cum battery charger - an augmentation to the engine of Item#372
Since the generator rotor is a very low load, one can use the return (left to right) cycle of the piston to charge up the battery by the addition of another, opposite toothed pawl. This will help increase the battery life.
[Sept 16, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note that the above configuration can also be run in a twin cylinder mode - driving two parallel pawls on each of the two shafts (drive and generator) using the two alternating pistons. When the piston in one cylinder starts moving right, the piston in the other starts moving left. This will draw more current from the battery, but will generate more drive power.
***
Item#372> An electroMagnetic engine - using a variant of the spring shaft of Item#203, and a drive pawl
[Sept 15, 2017, Arya Raychaudhuri, Santa Clara, California]
Note that when the piston aperture Ap and the cylinder apertures A0 align (default position of the piston), the photo voltage v0 goes to 1 (logical) turning Qem to 1 - the transmission gate allows electroMagnet solenoid to be activated. The plunger of the piston is pulled into the solenoid - letting the gear teeth on the bottom side of the piston rotate the drive pawl in anti-clockwise direction.
When the plunger is fully pulled, Ap and A1 align - V1 goes to 1, unsetting the latch. This deactivates the solenoid, and the piston quickly (no-load) returns to its default position, slipping over the teeth of the rotating pawl. The whole cycle repeats again and again to sustain steady rotation of the drive pawl.
Although the latch is shown to be set by v0, it could alternatively be set by the rising edges of a control pulse train. So, the torque and speed of the drive pawl can be controlled in two ways - by controlling the Rem (i.e., the current through the solenoid, see for example, A simple experiment with solenoid in http://info.ee.surrey.ac.uk/Workshop/advice/coils/force.html ), and/or by changing the frequency of the control pulse train. This gives more versatility for controlling torque and speed over electric motors used as engines. Also, the pawl and toothed piston arrangement eliminates the need for a crank shaft.
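The set/reset cycle described above can be traced with a small discrete-step model. This Python sketch is a hypothetical illustration of the latch logic only: the piston position is counted in arbitrary steps, the return stroke is given the same speed as the pull stroke for simplicity (the text says the real return is quick), and the names `run_engine` and `travel` are assumptions.

```python
def run_engine(steps, travel=2):
    """Trace (piston position, Qem) over a number of time steps.

    Position 0 is the default position (Ap aligned with A0, so v0 = 1);
    position `travel` is fully pulled (Ap aligned with A1, so V1 = 1).
    """
    pos = 0
    qem = 0
    trace = []
    for _ in range(steps):
        v0 = 1 if pos == 0 else 0
        v1 = 1 if pos == travel else 0
        if v1:
            qem = 0            # V1 unsets the latch: solenoid off
        elif v0:
            qem = 1            # v0 sets the latch: solenoid on
        if qem:
            pos = min(travel, pos + 1)   # plunger pulled in, pawl driven
        else:
            pos = max(0, pos - 1)        # no-load return, slipping over the pawl
        trace.append((pos, qem))
    return trace


print(run_engine(8))
```

Setting the latch from a control pulse train instead of v0, as suggested above, would amount to replacing the `elif v0` branch with a test on that pulse.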
***
Item#371> More explanation on the TML approach of Item#364
As I was explaining to an ex-colleague:
Reduction of clock-driven elements
"Please, look up my Item#366 (http://www.lvs-debug-solutions.com/code_snippets3) where I have reduced a shift register stage from 40 fets to 20 fets. Item#363 where I have reduced a JK FlipFlop from 22 fets to 13 fets. As you know, JK means TFF and DFF also with similar fet counts (e.g. Item#s 354,355 with the latch not yet bundled). All simulating well with 45nm models. See that the shift register is shifting at 1GHz clock rate in Item#368. These are new circuit IPs - small and fast."
Reduction of datapath logic
"Regarding fets bundling of datapath logic into a single complex gate, please look up Item#351. Now, fets bundling is not new - but, I have explained the method more precisely so that it is easier to automate on the fly. So, whenever you see a slow datapath (setup time issue), the software should bundle it on the fly, extract its timing/layout model, and introduce the bundled gate into the PD framework. This is where a <CAD software company> comes in. Of course, to start with, only slow datapaths, but one can expand it to *all* datapaths - a new re-synthesis technique."
So, the TML will depend on some fixed reduced IPs (clock-driven), and some on-the-fly fets bundled IPs (datapath logic) generated by the CAD software. Obviously, the whole thing boils down to a software effort, as opposed to the LML (Longitudinal Moore's Law, the traditional approach) requiring expensive process efforts. The TML can be applied to any stable process node, while the LML is applied to only the previous process node. The TML as explained above will also improve physical design productivity by reducing repetitive setup time fixing runs.
***
Item#370> Symbolic representation of the shift stage simulated in Item#368, 369
[Aug 18, 2017, Arya Raychaudhuri, Santa Clara, California]
The fets' new symbol is more descriptive - the electron and hole markers more clearly differentiate between nfet and pfet. The edges of the box that are transverse to the gate node are source/drain like in an actual physical structure. Source/drain edges of the box cannot have a connection in the middle, to avoid confusion with the gate edge. A vss bulk bias for nfets is typical, and so is a vdd bulk bias for pfets - so, these are not separately indicated. A non-typical bulk bias would be indicated by a net projecting into the box - unlike the gate bias net that stops at the box's edge. LVT, RVT, HVT fets are easily indicated. The new symbol goes with the bigger role of single fets in the reduced circuitry.
***
Item#369> Netlist used for the 1GHz clock rate simulation in Item#368
[Aug 17, 2017, Arya Raychaudhuri, Santa Clara, California]
Note the randomization of the data stream by ORing two pulse streams and a PWL (vPy, vPy2, vPx); also, the use of the hvt fets on the 1X load inverter as indicated on Item#368. The MNQ, MPlxPin fets are made a little more sluggish than in Item#367 because the qq(n) loading (1X) is now shifted to qq(n)bar - helps stop racing. MNRL and MNLR are made a bit stronger to allow quick charging of the qq(n)bar node which now has the extra 1X load on it. These changes allow keeping the MPLRclk and MPRLclk fets at minimum size (L,W=45nm) - making these sluggish can widen the ctl pulses. The MNdum5 fet is added to balance the loading on the ctlout5lr at the end of the shift chain.
The other important change is to hold VP5, VP4 ...VP1 at vss, rather than using .IC - which only sets the initial value. Later these nodes (P5, P4 .... P1) can go up and down with qq through capacitive coupling, if not held down to 0. Also, holding down to 0 is more realistic.
The transient analysis uses a max time step of 500ps to enable better accuracy. Sometimes mis-shifts can happen due to how the numerical data are treated inside the simulator during the transient analysis. So, if a significant portion of the data shift happens accurately, then the circuit is probably good. Also, a smaller time step does not mean you will get a more accurate simulation - one needs to look for an optimum step size. Or, specify no max step size at all, and let the simulator find it as it goes along.
***
Item#368> The bundled shift stage of Item#366 running accurately at 1GHz clock rate
For this simulation, the 1X load was added to the qqbar terminal and the 'qqbuf' (as opposed to the qqinv as in Item#367) pin is taken out and plotted. Somewhat different aspect ratios were used for this sim. Also, since the qqbar node is more 'noisy' (by design), the 1X load inverter uses HVT fets (Vth0=+/-0.7V). The bottom panel is a zoom-in on a section of the top panel. The qq1bar (in blue in third pane from top) is shown as an example of the 'noisy' nature of the qq(n)bar node. This happens because when qq(n) is 1, the ctlout(n) dips lose the competition with the ctlout(n-1) dips coming from the next/previous stage. So, the qq(n)bar tends to rise, but cannot quite make it. But, if no ctlout(n-1) comes in from the adjacent stage (qq of that stage being 0), then the ctlout(n) dip succeeds in changing the qq(n) state.
[Aug 16, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#367> A netlist for simulating the fully fets bundled clk-to-q bidirectional shift stage of Item#366
Note that in order to overcome some data stream dependencies, the LVT fets use Vth0s of +/- 0.2V (as opposed to +/-0.3V used in Item#366). Also, the input pfets to the bundled latch are turned into LVT fets. Only the latch memory fets, the reset input pfet, and the fanout INV use regular PTM HP fets.
.SUBCKT NAND2 in1 in2 out vdd vss
MN1 out in1 midx vss NMOS L=0.18um W=0.045um
MN2 midx in2 vss vss NMOS L=0.18um W=0.045um
MP1 out in1 vdd vdd PMOS L=0.18um W=0.09um
MP2 out in2 vdd vdd PMOS L=0.18um W=0.09um
.ENDS
.SUBCKT AND2 in1 in2 out1 vdd vss
MN1 out in1 midx vss NMOS L=0.045um W=0.045um
MN2 midx in2 vss vss NMOS L=0.045um W=0.045um
MP1 out in1 vdd vdd PMOS L=0.045um W=0.09um
MP2 out in2 vdd vdd PMOS L=0.045um W=0.09um
XINV out out1 vdd vss INV
.ENDS
.SUBCKT INV in1 out vdd vss
MN4 out in1 vss vss NMOS L=0.045um W=0.09um
MP4 out in1 vdd vdd PMOS L=0.045um W=0.18um
.ENDS
.SUBCKT shiftstage clkej Pin Pinej sysreset ctlinlr ctlinrl LR RL ctloutlr ctloutrl qq qqbar vdd vss
MPLRclk ctloutlr clkej vdd vdd PMOSlvt L=.05um W=0.09um
MNLR ctloutlr LR qqbars vss NMOSlvt L=.045um W=0.05um
MPRLclk ctloutrl clkej vdd vdd PMOSlvt L=.05um W=0.09um
MNRL ctloutrl RL qqbars vss NMOSlvt L=.045um W=0.05um
MNQ qqbars qq clkbar vss NMOSlvt L=0.18um W=0.09um
MNclk clkbar clkej vss vss NMOSlvt L=0.045um W=0.09um
MPPinclk Pinej clkej vdd vdd PMOSlvt L=.045um W=0.045um
MNPin Pinej Pin clkbar vss NMOSlvt L=.045um W=0.09um
MPlxqq qq qqbar vdd vdd PMOS L=.045um W=0.045um
MPlxqqbar qqbar qq vdd vdd PMOS L=.045um W=0.045um
MNlxqq qq qqbar vss vss NMOS L=.045um W=0.045um
MNlxqqbar qqbar qq vss vss NMOS L=.045um W=0.045um
MPlxctlinlr qq ctlinlr vdd vdd PMOSlvt L=.045um W=0.05um
MPlxctlinrl qq ctlinrl vdd vdd PMOSlvt L=.045um W=0.05um
MPlxPin qq Pinej vdd vdd PMOSlvt L=.045um W=0.045um
MPlxctloutlr qqbar ctloutlr vdd vdd PMOSlvt L=.045um W=0.045um
MPlxctloutrl qqbar ctloutrl vdd vdd PMOSlvt L=.045um W=0.045um
MPlxsrst qqbar sysreset vdd vdd PMOS L=.045um W=0.07um
MPfo qqinv qq vdd vdd PMOS L=0.045um W=0.09um
MNfo qqinv qq vss vss NMOS L=0.045um W=0.045um
.ENDS
Xshft0 clkej P0 P0ej sysreset vdd ctlout1rl LR RL
+ ctlout0lr ctlout0rl qq0 qq0bar vdd vss shiftstage
Xshft1 clkej P1 P1ej sysreset ctlout0lr ctlout2rl LR RL
+ ctlout1lr ctlout1rl qq1 qq1bar vdd vss shiftstage
Xshft2 clkej P2 P2ej sysreset ctlout1lr ctlout3rl LR RL
+ ctlout2lr ctlout2rl qq2 qq2bar vdd vss shiftstage
Xshft3 clkej P3 P3ej sysreset ctlout2lr ctlout4rl LR RL
+ ctlout3lr ctlout3rl qq3 qq3bar vdd vss shiftstage
Xshft4 clkej P4 P4ej sysreset ctlout3lr ctlout5rl LR RL
+ ctlout4lr ctlout4rl qq4 qq4bar vdd vss shiftstage
Xshft5 clkej P5 P5ej sysreset ctlout4lr vdd LR RL
+ ctlout5lr ctlout5rl qq5 qq5bar vdd vss shiftstage
XNANDej1 pulsein0 pulsein0 pulseinv0 vdd vss NAND2
XAND2ej pulseinv0 pulsein0 clkej vdd vss AND2
.IC V(qq0)=0 V(qq1)=0 V(qq2)=0 V(qq3)=0 V(qq4)=0 V(qq5)=0 V(LR)=1 V(RL)=0
.IC V(qq0bar)=1 V(qq1bar)=1 V(qq2bar)=1 V(qq3bar)=1 V(qq4bar)=1 V(qq5bar)=1
.IC V(P5)=0 V(P1)=0 V(P2)=0 V(P3)=0 V(P4)=0
.IC V(ctlout0lr)=1 V(ctlout1lr)=1 V(ctlout2lr)=1 V(ctlout3lr)=1 V(ctlout4lr)=1 V(ctlout5lr)=1
.IC V(ctlout0rl)=1 V(ctlout1rl)=1 V(ctlout2rl)=1 V(ctlout3rl)=1 V(ctlout4rl)=1 V(ctlout5rl)=1
vPx P0 0 PWL (0 0 7ns 0 7.1ns 1 22ns 1 22.1ns 0 100ns 0 100.1ns 1 220ns 1 220.1ns 0
+ 480ns 0 480.1ns 1 1280ns 1 1280.1ns 0 1400ns 0 1400.1ns 1 1585ns 1 1585.1ns 0
+ 1723ns 0 1723.1ns 1 1800ns 1 1800.1ns 0)
vpulses0 pulsein0 0 PULSE (0 1 10ns 1ps 1ps 20ns 40ns)
vreset sysreset 0 PWL (0 1 140ns 1 300ns 1 300.1ns 1 309.9ns 1 310ns 1 730ns 1 1800ns 1)
VHI vdd 0 DC 1
VLO vss 0 DC 0
.TRAN 1ns 2000ns
* PTM HP Model used for NMOS, PMOS
* PTM HP with Vth0 edited to +/- 0.2V for NMOSlvt, PMOSlvt
The 'Pinej' terminal of the 'shiftstage' subckt is just for observation. In fact, since the 1X load inverter is included in the subckt, one can just take out the 'qqinv' as terminal and not 'qq','qqbar', turning the shift stage into a 10 terminal (plus vdd, vss) device. The 'qqinv' is seen to drive at least 16X loads. The 1X load also serves to isolate the interior of the design from external loads. Inclusion of the 1X load in the stage makes it 20 fets in total, two more than the count of 18 indicated earlier.
The above netlist is to simulate the six stage shift register in the Left to Right mode. In order to simulate in the Right to Left mode, appropriate edits should be made. The results shown below - LR=1 RL=0 top, LR=0 RL=1 bottom
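Before editing the netlist for the other direction, it can help to know what qq pattern to expect. The following is only an idealized behavioural sketch, not the circuit: it assumes that each clk edge moves every stored 1 exactly one stage in the selected direction and ORs in the P(n) input, and it ignores all analog effects (racing, dip competition, loading). The function name shift_step and the list representation are hypothetical and do not appear in the netlist.

```python
def shift_step(qq, pin, lr=True):
    """One idealized clk edge of the bidirectional shift chain.

    qq  : list of 0/1 stage states, index 0 is the leftmost stage
    pin : list of 0/1 parallel inputs P(n), sampled at the clk edge
    lr  : True for Left to Right (LR=1, RL=0), False for Right to Left
    """
    n = len(qq)
    nxt = []
    for i in range(n):
        src = i - 1 if lr else i + 1              # neighbour that feeds stage i
        from_neighbour = qq[src] if 0 <= src < n else 0   # chain ends are tied off
        nxt.append(1 if (from_neighbour or pin[i]) else 0)
    return nxt


if __name__ == "__main__":
    state = [0] * 6
    stream = [1, 0, 1, 1, 0, 0, 0, 0]   # assumed data seen at P0 on successive edges
    for bit in stream:
        state = shift_step(state, [bit, 0, 0, 0, 0, 0], lr=True)
        print(state)
```

Comparing such a pattern against the plotted qq0 ... qq5 waveforms is a quick way to spot mis-shifts in a transient run.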
[Aug 13, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#366> Fully fets bundled clk-to-q bidirectional shift stage (Item#s 173, 258) with clk edge synchronized data entry feature - heavily reduced in size to cater to TML (Item#364)
The earlier version of the shift stage (Item#s 173,258) had instantaneous data entry with inverted pulses/edges. But, the new fets bundled design (combining Item#s 361, 362) is to have clk edge synched data entry.
[Aug 10, 2017, Arya Raychaudhuri, Santa Clara, California]
Fets bundling reduces the fets count to 18 - of which 16 were already indicated in Item#362, plus the two on the ladder fork for data [P(n)] entry. Left to Right, and Right to Left shift simulation involving 6 stages of the above drawing is shown below. Please note that the LVT fets used in the simulation are just the PTM HP model fets (standard Vth0) with Vth0 edited to +/- 0.3V. Some aspect ratio engineering is involved - but the aim is to keep the design small yet stable. The clk edges were generated using a NAND-AND combination similar to that in Item#363 - slightly wider edges were used for this case. Simulation showed that the qq terminal can carry a load of 0 - 1X. Any larger load will be buffered from the 1X load - which was seen to drive a 16X load inverter in simulation. The good thing is that once the design is working, there is no need to touch the interior; just take buffered outputs from the 1X load where needed.
[Aug 10, 2017, Arya Raychaudhuri, Santa Clara, California]
Note the 3d data shift possibility - the DFF role can shift the data through the QQ terminal, and the bi-directional role shifts the data to left and right sideways. So, one can create multiple shifted versions of a data stream much more easily.
***
Item#365> Understanding the charge-discharge currents when the JKFF of Item#363 works in the toggle mode (Jx=1, Kx=1)
Please refer to the discussion at the bottom of Item#363
[Aug 08, 2017, Arya Raychaudhuri, Santa Clara, California]
Note the interesting transitions from saturation to linear, and linear to saturation with lower gate drive. The state change causing events move from saturation to linear [square law favorable to linear favorable], while the competing events move from linear to saturation with lower gate drive [linear unfavorable to square law unfavorable].
For the above simulation, clk edges ('pulseej0') from a realistic NAND-AND edge generator were used instead of the spice input 'pulsein' (Item#363).
:::::
:::::
.SUBCKT NAND2 in1 in2 out vdd vss
MN1 out in1 midx vss NMOS L=0.135um W=0.045um
MN2 midx in2 vss vss NMOS L=0.135um W=0.045um
MP1 out in1 vdd vdd PMOS L=0.135um W=0.09um
MP2 out in2 vdd vdd PMOS L=0.135um W=0.09um
.ENDS
.SUBCKT AND2 in1 in2 out1 vdd vss
MN1 out in1 midx vss NMOS L=0.045um W=0.045um
MN2 midx in2 vss vss NMOS L=0.045um W=0.045um
MP1 out in1 vdd vdd PMOS L=0.045um W=0.09um
MP2 out in2 vdd vdd PMOS L=0.045um W=0.09um
XINV out out1 vdd vss INV
.ENDS
XNANDej1 pulsein0 pulsein0 pulseinv0 vdd vss NAND2
XAND2ej pulseinv0 pulsein0 pulseej0 vdd vss AND2
vpulses0 pulsein0 0 PULSE (0 1 10ns 1ps 1ps 20ns 40ns)
:::::
:::::
***
Item#364> Depiction of the Transverse Moore's law (TML)
[Aug 05, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#363> JKFF of Item#361 with a bundled latch such as the one in Item#362 plugged in
[Aug 10, 2017, Arya Raychaudhuri, Santa Clara, California]
Please note that when Jx=0,Kx=0, both fet ladders are off, so no control dips (jxbar,ctl) are generated - JKFF's state (QQ,~QQ) doesn't change.
If Jx=1,Kx=0, the Kx ladder is off, but the Jx ladder is on. So, QQ will be pulled up by the jxbar dip, if the QQ is at 0; otherwise it will stay at 1. During the pullup process, the jxbar input pfet to the latch needs to overcome the leakage through the ~QQ gated nfet. This is helped by the initial saturation mode of the pfet, compared to the linear nfet of the latch. Also, the jxbar input pfet is slightly wider (70nm versus 45 nm) than the nfet. Also, with the rise of QQ, the charge supply through the QQ gated pfet goes down, and the ~QQ charge is leaked away by the QQ gated nfet - this eventually cuts off the ~QQ gated nfet.
If Jx=0,Kx=1, the same events as above happen with the ~QQ node, because now the Jx ladder is off, and the Kx ladder is on only when QQ is 1. If, to start with, QQ=0, ~QQ=1, no change takes place. Otherwise, ~QQ is pulled up.
Things get a bit more interesting when Jx=1, Kx=1, and both ladders are on with QQ=1. Although the jxbar and the ctl dips are of the same shape (since the fet ladders have identical smallest sized fets on both sides), the ctl gated pfet being wider and in saturation wins the competition to pull up the ~QQ - higher charge supply on the ctl input side.
When Jx=1, Kx=1, and QQ=0 to start with, the Kx ladder is off initially. So, the QQ gets fully pulled up, and ~QQ pulled down, and the late opening of the Kx ladder cannot reverse the states. So, this time, higher charge supply on the jxbar input side due to longer on-time of that pfet prevails.
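The four cases above can be condensed into a small behavioural sketch. It is not a circuit model: it simply assumes that a jxbar dip occurs whenever Jx=1, that a ctl dip occurs only when Kx=1 and QQ=1, and that the ctl dip wins whenever both occur, as described in the text. The function name jkff_next is hypothetical.

```python
def jkff_next(jx, kx, qq):
    """Idealized next state of QQ after one clk edge."""
    jxbar_dip = bool(jx)              # Jx ladder conducts at the edge when Jx=1
    ctl_dip = bool(kx) and bool(qq)   # Kx ladder conducts only when Kx=1 and QQ=1
    if ctl_dip:
        return 0    # ctl dip pulls ~QQ up; it also wins when both dips occur
    if jxbar_dip:
        return 1    # jxbar dip pulls QQ up (or leaves it at 1)
    return qq       # no dips: the state is held


if __name__ == "__main__":
    for jx in (0, 1):
        for kx in (0, 1):
            for qq in (0, 1):
                print(jx, kx, qq, "->", jkff_next(jx, kx, qq))
```

The printed table should agree with the usual JK characteristic (hold, reset, set, toggle), which is a useful cross-check when reading the simulated waveforms.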
Obviously, the modality here is somewhat different from the situation where the NAND latch was not bundled - as in Item#359, where the L's of a couple of fets on the ladder were edited to accomplish the clk-to-q mechanism. It's about disbalancing the charging-discharging of the QQ, ~QQ nodes in an appropriate way.
***
Item#362> Fets bundling of the latch in clk-to-q bi-directional shift stage of Item#361
The fets bundling of the latch saves another 6 fets
[Aug 03, 2017, Arya Raychaudhuri, Santa Clara, California]
Since all the inputs to the latch are dips, the bundling should work. This makes the shift stage possible with only 16 fets in total.
Similar reduction is possible with the latch in the JKFF of Item#359 to save 3 fets in that case. So, that will be 13 fets in total for the JKFF - down from 22 in a typical all NAND JKFF, or 26 in an AND-NOR implementation.
An example is shown below for an SR latch (only two inputs Sx, Rx)
.SUBCKT INV in1 out vdd vss
MN4 out in1 vss vss NMOS L=0.045um W=0.045um
MP4 out in1 vdd vdd PMOS L=0.045um W=0.045um
.ENDS
MPSx Qx Sx vdd vdd PMOS L=0.045um W=0.09um
MPRx Qxbar Rx vdd vdd PMOS L=0.045um W=0.09um
XQxinv Qxbar Qx vdd vss INV
XQxbarinv Qx Qxbar vdd vss INV
VSx Sx 0 PULSE (1 0 10ns 1ps 1ps 40ps 40ns)
VRx Rx 0 PULSE (1 0 25ns 1ps 1ps 40ps 40ns)
VHI vdd 0 DC 1
VLO vss 0 DC 0
.IC V(Qx)=0
.TRAN 1ns 200ns
**PTM HP Model
[Aug 03, 2017, Arya Raychaudhuri, Santa Clara, California]
The multi-input latch bundling is an extension of the "minimal cmos sr flip flop" - except that the inputs are given through the pfets (since we are dealing with dips), rather than through nfets, and no clock is needed because it is just a latch, not a flipflop, and is edge driven. Also, note that some of the edges are generated with a clk edge already.
One factoid that emerges from this bundling is that the latch can be set or reset with both positive and negative edges, depending on whether you feed the edge through nfets or through pfets. In order to do that with typical NAND or NOR latches, one would have to invert the edge first. So, this will save an inverter or two in some cases. Also, you can implement a NOR latch, a NAND latch, or even a combo latch with just one bundle.
***
Item#361> The clk-to-q bi-directional shift stage of Item#173 further reduced with a forked fets ladder
The new drawing saves 6 more fets
[Aug 10, 2017, Arya Raychaudhuri, Santa Clara, California]
Basically, six fets forming one of the two clk edge sense NAND gates (LR/RL) are no longer required. The forked fets ladder should accomplish the task.
This kind of logical reduction makes the Moore's law in the transverse direction possible and appealing - letting the longitudinal direction look like brute force!
***
Item#360> Usefulness of the fanout (Item#359) and the drive strength of QQ
The fanout is useful in stopping the racing of QQ into the right ladder during its pullup. So, if we regard a fanout of
MPfo qqinv qq vdd vdd PMOS L=0.045um W=0.09um
MNfo qqinv qq vss vss NMOS L=0.045um W=0.045um
as 1X load for the JKFF, my simulation shows that it (QQ) can take 4X load in terms of fanout fets' input cap. For example,
MPfo qqinv qq vdd vdd PMOS L=0.045um W=0.36um
MNfo qqinv qq vss vss NMOS L=0.045um W=0.18um
works successfully with the parameters indicated in Item#359. But, the minimal (1X) fanout is needed. For even larger fanouts (>4X ) some buffers should be added.
***
Item#359> Combining the two ladders (Item#s 352,358) into a fork to reduce fet counts
Noting from Item#358 that L_MNKQclk can be as low as 45nm, one can combine MNKQclk with MNJclk to reduce a fet count and loading on the clk edge.
[Aug 10, 2017, Arya Raychaudhuri, Santa Clara, California]
The above forked ladder JKFF works fine with the same fanout as in Item#357 at W_clkEdge=40ps with L_MPKQclk=110nm and L_MNQ=150nm, with all other fets at minimum size. These target values retain +/-10% gaps from the working range limits for this config.
Please, note also that I have created the symbols for the ladders on the right. The symbols denote the physical description more than the logical description.
***
Item#358> Finding useful parameter (W,L, clk edge width) ranges for successful operation of the JKFF with a fanout (Item#357)
The new JKFF is like a musical instrument that needs to be tuned (finding workable parameter sets) before playing. Although it is possible to play with the W,L of all the fets in the structure, fortunately, good parameter ranges can be found from the fets of the right ladder (QQ,Kx) only.
For this experiment, I fix the clk edge width at 40ps (more energetic than the 25ps used earlier), and a fanout at the same level as in Item#357. All fets of the JKFF are kept at the minimum size (W=L=45nm), except
MPKQclk ctl pulsein vdd vdd PMOS L=0.09um W=0.045um
MNKQclk qqbars pulsein vss vss NMOS L=0.09um W=0.045um
MNQ kxbars qq qqbars vss NMOS L=0.18um W=0.045um
for successful operation of the JKFF with the fanout.
Now, let's focus on these 4 parameters
L_MPKQclk = 90nm, L_MNKQclk = 90nm , L_MNQ = 180nm, W_clkEdge = 40ps.
I fix three of the 4 parameters at a time at the above values, and find the working range on the 4th. This way, the following ranges were found:
L_MPKQclk --> 65nm -- 240nm
L_MNKQclk --> 45nm -- 135nm
L_MNQ --> 120nm -- 225nm
W_clkEdge --> 35ps -- 48ps
The ranges indicate tolerance to process variation related shifts. In any situation, it is best to target the above four parameters to be set at a value so that the range limits are at least +/-10% away. Two good things about keeping most fets at the minimum size are, one that the min feature sizes are heavily controlled in a process, and two that the loading on J and K remains minimal.
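The +/-10% guideline can be checked mechanically. The sketch below takes the ranges and targets listed above and interprets the guideline as: a target shifted by +/-10% must still fall inside the working range. That interpretation, and the names margin_ok and check_targets, are assumptions made for illustration.

```python
# Working ranges found in the experiment above: (low, high)
RANGES = {
    "L_MPKQclk_nm": (65, 240),
    "L_MNKQclk_nm": (45, 135),
    "L_MNQ_nm": (120, 225),
    "W_clkEdge_ps": (35, 48),
}

# Target values used in the experiment above
TARGETS = {
    "L_MPKQclk_nm": 90,
    "L_MNKQclk_nm": 90,
    "L_MNQ_nm": 180,
    "W_clkEdge_ps": 40,
}


def margin_ok(target, low, high, frac=0.10):
    """True if the target stays inside [low, high] after a +/-frac shift."""
    return low <= target * (1 - frac) and target * (1 + frac) <= high


def check_targets(targets, ranges, frac=0.10):
    """Return {parameter: True/False} for the margin test."""
    return {name: margin_ok(targets[name], *ranges[name], frac=frac)
            for name in targets}


if __name__ == "__main__":
    for name, ok in check_targets(TARGETS, RANGES).items():
        print(name, "margin ok" if ok else "too close to a range limit")
```

With this reading, a clk edge width target of 46ps would fail the test, because a +10% shift takes it past the 48ps limit.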
***
Item#357> If we add a fanout to the qq node of the circuit indicated in Item#356
If we add a fanout like an inverter
MPfo qqinv qq vdd vdd PMOS L=0.045um W=0.09um
MNfo qqinv qq vss vss NMOS L=0.045um W=0.09um
at the qq node, the simulation shows
[Aug 01, 2017, Arya Raychaudhuri, Santa Clara, California]
Please note that due to the increased cap at the qq node, the ctl dip is finding it difficult to pull down the qq potential after penetrating the two NANDs of the latch, and is losing to the competing effect of the jxbar dip. So, the ctl dip needs to be a bit wider - linger beyond the jxbar dip. This is accomplished by making the pullup pfet of the QQ,Kx ladder a bit sluggish
MPKQclk ctl pulsein vdd vdd PMOS L=0.065um W=0.045um
[Aug 01, 2017, Arya Raychaudhuri, Santa Clara, California]
The following sim plots compare the jxbar, ctl between the two cases with L=45nm (unsuccessful pulldown) and L=65nm (successful pulldown) for the MPKQclk pfet
[Aug 01, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#356> Making the JKFF of Item#352 work with the smallest sized fets (L=45nm, W=45nm) for the tech node - reducing the loading on J, K, clk edge even further
Of course, the clk edge query based clk-to-q JKFF of Item#352 has two important advantages. One is to reduce fet counts from the conventional JKFF. For the AND-NOR JKFF, it's reduced from 26 fets (adding 2 for the reset input) to 17 in the new JKFF. Also, for an all-NAND JKFF, it's reduced from 22 to 17. The other advantage is to reduce the loading on the J, K inputs - which are now driving one nfet (smaller than pfet, typically) each, as opposed to a complementary pfet-nfet pair in the conventional JKFF.
The advantages will be more prominent if we could run the new JKFF with all smallest sized fets (L,W at 45nm). It works with the following edits to the netlist given in Item#353
.SUBCKT NAND3a in1 in2 in3 out vdd vss
MN1 out in1 midx vss NMOS L=0.045um W=0.045um
MN2 midx in2 midx2 vss NMOS L=0.045um W=0.045um
MN3 midx2 in3 vss vss NMOS L=0.045um W=0.045um
MP1 out in1 vdd vdd PMOS L=0.045um W=0.045um
MP2 out in2 vdd vdd PMOS L=0.045um W=0.045um
MP3 out in3 vdd vdd PMOS L=0.045um W=0.045um
.ENDS
.SUBCKT NAND2 in1 in2 out vdd vss
MN1 out in1 midx vss NMOS L=0.045um W=0.045um
MN2 midx in2 vss vss NMOS L=0.045um W=0.045um
MP1 out in1 vdd vdd PMOS L=0.045um W=0.045um
MP2 out in2 vdd vdd PMOS L=0.045um W=0.045um
.ENDS
MPJclk jxbar pulsein vdd vdd PMOS L=.045um W=0.045um
MNJclk jxbars pulsein vss vss NMOS L=.045um W=0.045um
MNJ jxbar Jx jxbars vss NMOS L=.045um W=0.045um
XNAND2_1 jxbar qqbar qq vdd vss Nand2
MPKQclk ctl pulsein vdd vdd PMOS L=0.045um W=0.045um
MNKQclk qqbars pulsein vss vss NMOS L=0.045um W=0.045um
MNQ kxbars qq qqbars vss NMOS L=0.045um W=0.045um
MNK ctl Kx kxbars vss NMOS L=0.045um W=0.045um
XNAND3_1 ctl qq sysreset qqbar vdd vss Nand3a
vpulses pulsein 0 PULSE (0 1 10ns 1ps 1ps 25ps 40ns)
Please, note that I had to thin the clk edge down to adjust to the new relative delays.
***
Item#355> JKFF of Item#352 reconnected as a TFF - an upgrade to the clk-to-q TFF of Item#175
The following design replaces the single fet with a clk edge query ladder to sense T, and the clk-to-q compare NAND is also replaced by another clk edge query ladder to sense QQ.
[July 31, 2017, Arya Raychaudhuri, Santa Clara, California]
Please note that the edge query TFF corresponds to the Jx=1, Kx=1 situation of the JKFF of Item#352. Since Kx will always be 1, the Kx nfet is removed from the ladder, and the Jx is turned into the toggle input T. The removal of the Kx nfet makes the ctl dip a bit faster, so the pfets of the lower NAND are made a bit sluggish with higher values of L (compare with the netlist in Item#353), in order to achieve proper toggle. The clk-to-q method relies on relative gate delays.
One interesting thing to note is the difference in configuration of this TFF with the one in Item#314 - even keeping in mind that the clk-to-q compare NAND gates in that TFF could also be replaced by edge query ladders.
***
Item#354> JKFF of Item#352 reconnected as a DFF
[July 31, 2017, Arya Raychaudhuri, Santa Clara, California]
Note that when D (J input) is 1, the Dbar dip (K input) keeps the right ladder off. When D is 0, the left ladder is off, and the right ladder creates the ctl dip at a clk edge if QQ is 1.
***
Item#353> The netlist used in the simulation of the clk edge query JKFF of Item#352
Please note that I have used the PTM HP model for this, and not the PTM LP model (higher Vth0) as in many of the other simulations, simply because Item#183 was also simulated with the HP model.
[July 31, 2017, Arya Raychaudhuri, Santa Clara, California]
** Simulation tool used: LTSpiceIV
***
Item#352> An interesting application of the edge query method (Item#339) in the clk-to-q JK flipflop of Item#183
The discussion on the fets bundle approach (Item#s 326-351) suggests that the static CMOS bundles would be more appropriate for datapath delay reduction purposes, than the edge query based approach. Because, the latter will need a NAND and a latch to produce a static CMOS type output - sort of like always needing to carry its umbrella. But, in situations where the dip from the edge query bundle can be directly used in the connected circuitry (e.g., a lookup table for direct decimal arithmetic, as in Item#325), we see its better usefulness.
Another example is shown below, where the clk-to-q JK FlipFlop of Item#183 is being upgraded with the help of the edge query approach - eliminating the need for the inverted clk edge, and the single pfet with the source connected to vdd, and the gate to the J input. In the following design, only the positive clk edge is needed to drive both the Jx and Kx input ladders [we are calling J, K as Jx, Kx here]
[July 31, 2017, Arya Raychaudhuri, Santa Clara, California]
Please note that the typical AND-NOR JK flipflop (courtesy, www.allaboutcircuits.com) was simulated to serve as a reference only.
Similarly, in other clk-to-q circuits I have earlier discussed, the edge query ladder can replace the clk-to-q compare NAND gate (producing the ctl dip) , to make the design smaller.
***
Item#351> Logic stitching at the fets level - creating the nfets and pfets bundle from gates based logic
Please, refer to the discussion in Item#s 347,349 to understand the transformations in the below diagram
[July 27, 2017, Arya Raychaudhuri, Santa Clara, California]
One interesting point about the AND-OR/OR-AND complementarity is that it also requires swapping the branch circuits. Note how the J is 0 space (discussed in Item#347) comes into play while inverting the logical inputs to the pfets bundle, after expanding the gates based logic expression (F).
Note the 50% fets count reduction through bundling, plus delay reduction through combining three gates into one. Helping the Moore's law in the transverse direction.
[July 27, 2017, Arya Raychaudhuri, Santa Clara, California]
In the above diagram, looking at the pfets side from J is 0 (~J is 1) perspective, please note that [1 1 1 1] successfully passes the power down. Also, looking at the nfets side from J is 1 (~J is 0) space, we can see that [0 0 0 0] successfully passes the ground up. Again, we can verify that [1 1 1 1] stops the ground propagation, while [0 0 0 0] stops the power propagation. This exemplifies complementarity with two lines from the truth table. You can construct the truth table from the expression of F and verify that all the lines are correct.
[July 27, 2017, Arya Raychaudhuri, Santa Clara, California]
For example, [0 0 0 1] passes the power down, [1 1 1 0] passes the ground up, etc.
***
Item#350> Simulation of the reduced jumbo gate of Item#349
Please, refer to the discussion in Item#349.
The following simulation compares the reduced fets bundled CMOS gate (Fj) against the gates based drawing of the Karnaugh mapped expression (F). Both use the LVT fets (Vth0= +/- 0.3V). Please note that the truth table from the simulation matches that in the bottom diagram of Item#347. Obviously, the bundled gate runs a little bit faster (~25ps), and uses 40% fewer fets. The gains will vary from case to case. But, almost certainly, there will be some gains from bundling.
[July 25, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note that it is always advisable to use LVT fets for the bundles, otherwise the series resistance of multiple fets in series can cause significant voltage drops.
***
Item#349> Looking at the bottom jumbo gate in Item#347
Once the gates are configured following the method indicated in Item#347, always note that the successful ground propagation (as also indicated in Item#328 for pulsed ground) is defined by the nfet bundle, and since the ground is 0, the logical expression for the gate is the inverse of the nfet bundle logic.
[another interesting pointer is that the logical expression of the gate is the non-inverted pfet bundle logic with each of its inputs inverted]
For example, for the bottom jumbo gate in Item#347, the truth table evaluates to
F= ~AC + A~B~C
[using online Karnaugh mapper: http://www.32x8.com/]
The nfet bundle logic as in the jumbo gate:
(A+B+~C)(A+~B+~C)(~A+B+C)
minimize ~((A+B+~C)(A+~B+~C)(~A+B+C)) to get exactly:
a~b~c + ~ac
[using online boolean reducer: http://tma.main.jp/logic/index_en.html]
This verifies the logic of the jumbo gate.
Now, sometimes, the boolean reduction of the nfet bundle is not as straightforward as in Item#348, so, one can use a boolean reducer like the one noted above. For example, (A+B+~C)(A+~B+~C)(~A+B+C) reduces to ~a~c + ac + b~c
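The same verification can be done without the online tools by brute force over the eight input combinations. The sketch below is just such an exhaustive check of the expressions quoted above; the function names are hypothetical.

```python
from itertools import product


def f_gate(a, b, c):
    """Gate expression from the truth table: F = ~AC + A~B~C."""
    return ((not a) and c) or (a and (not b) and (not c))


def nfet_bundle(a, b, c):
    """nfet bundle logic as drawn: (A+B+~C)(A+~B+~C)(~A+B+C)."""
    return (a or b or (not c)) and (a or (not b) or (not c)) and ((not a) or b or c)


def reduced_bundle(a, b, c):
    """Reduced form of the nfet bundle: ~a~c + ac + b~c."""
    return ((not a) and (not c)) or (a and c) or (b and (not c))


def check():
    """True if F is the inverse of the bundle and the reduced form matches it."""
    for a, b, c in product((False, True), repeat=3):
        if bool(f_gate(a, b, c)) != (not nfet_bundle(a, b, c)):
            return False
        if bool(nfet_bundle(a, b, c)) != bool(reduced_bundle(a, b, c)):
            return False
    return True


if __name__ == "__main__":
    print("expressions consistent:", check())
```

For larger bundles the same loop still works, although the number of combinations doubles with every added input.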
[July 24, 2017, Arya Raychaudhuri, Santa Clara, California]
Of course, knowing that the nfet bundle logic is just the inverse of the logical expression for the gate, one could first obtain the minimal logical expression for the gate from the truth table using the Karnaugh mapper, and then invert it using the boolean reducer to obtain the same form as on the right of the above diagram. But, Item#347 indicates a method that is basic and clearer.
***
Item#348> Reduction of the top jumbo gate in Item#347
The nfet bundle reveals the logic
~A.~B.C + A.~B.C = (~A + A) . ~B.C = ~B.C [four fets only needed for the gate]
[July 23, 2017, Arya Raychaudhuri, Santa Clara, California]
Which means A has no control over the outcome F, and need not be connected. This is also seen from the truth table:
A B C F
0 0 1 0
1 0 1 0
Obviously, for the same set of logical values of B and C [0 1], A's logical state 0/1 doesn't influence the outcome F.
From the methodology discussed in Item#347, it is quite clear that very few truth table lines with either F=0, or with F=1 yield smaller gates. For example, all lines 1, except [1 1 1] -> 0 would yield the 3-input NAND gate; all lines 0, except [0 0 0] -> 1 would yield a 3-input NOR gate.
In the limiting cases, where all lines are 1 - it's just vdd, no fets are needed, similarly, all lines 0 is just vss. NAND, NOR gates still remain *universal* gates in the sense that you can construct any truth table with them. But, now you can directly build a truth table with fets - no need to go via NAND, NOR. In fact, the truth tables of NAND, NOR, INV, etc can also be built with the methodology discussed in Item#347.
***
Item#347> Static bundled CMOS gate from Truth Tables
So far we have discussed the creation of bundled nfets from Truth Tables, for use with edge query situations. Moving on from the 'dynamic' bundles, we now look at the creation of static bundled gates - just like typical AND, NOR, etc. gates, but more complex, because they capture a longer logic stream into a single CMOS bundle.
[July 23, 2017, Arya Raychaudhuri, Santa Clara, California]
Note that for F=0, vss needs to propagate up, so the ORed AND columns are associated with vss. For F=1, vdd needs to propagate down - so, the ORed AND columns are associated with vdd.
Why ORed AND columns? Because, an AND column transmits only when *all* fets are ON for a unique set of inputs or their inverse. Since each line in the truth table is unique, when one AND column transmits, all others must be OFF. Naturally, the AND columns will be all OFF when a set of inputs is presented that is not coded in. For example, in the top diagram, only two sets of inputs, coming from the two F=0 lines, should transmit, so, two AND columns. The inputs are presented to the fets in an AND column in such a way (J/~J) that each fet finds it favorable to turn ON.
Why J/~J from J is 1 space when focussing on the F=0 lines, and J/~J from J is 0 space when dealing with F=1 lines? For an nfet, J=1 is favorable; for a pfet, J=0 transmits. Doing the design from the pfet side should be logically identical with a design done from the nfet side. For example, if we did an inverter from the pfet side with J is 1, it would have both its fets gated to ~J - so, it would become a non-inverting buffer.
Why are ANDed OR rows on the complementary side needed? Just to make sure that the gate works truly like a CMOS gate: when one bundle (nfet/pfet) is ON, the other should be OFF. Since an OR row is OFF when *each* fet in the row is OFF, all the inputs that make an AND column ON must go to its complementary OR row, to keep it OFF. When these OR rows are ANDed, the OR bundle is OFF when any of the AND columns is ON. Naturally, for any other combination of inputs, the OR bundle is ON, and the AND columns are OFF.
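The column/row argument above can be exercised with a small switch-level sketch. It builds the gate from the F=0 lines only (nfet AND columns to vss, pfet OR rows to vdd), treats every fet as an ideal switch, and checks that exactly one bundle conducts for every input combination. The function names and the dictionary representation of the truth table are assumptions made for illustration; generation of the ~J signals is not modelled.

```python
from itertools import product


def f0_lines(truth):
    """Truth table lines with F=0; each gives one nfet column and one pfet row."""
    return [line for line, f in sorted(truth.items()) if f == 0]


def gate_signals(line, inputs):
    """Signal on each fet gate: J where the line has 1, ~J where it has 0."""
    return [x if bit == 1 else 1 - x for bit, x in zip(line, inputs)]


def pulldown_on(lines, inputs):
    # ORed AND columns of nfets: a column conducts when all its gates are 1
    return any(all(g == 1 for g in gate_signals(l, inputs)) for l in lines)


def pullup_on(lines, inputs):
    # ANDed OR rows of pfets: a row conducts when any of its gates is 0
    return all(any(g == 0 for g in gate_signals(l, inputs)) for l in lines)


def verify(truth):
    """Check complementarity and the truth table; return the number of columns."""
    lines = f0_lines(truth)
    n = len(next(iter(truth)))
    for inputs in product((0, 1), repeat=n):
        down, up = pulldown_on(lines, inputs), pullup_on(lines, inputs)
        assert down != up, "both bundles ON or both OFF"
        assert (1 if up else 0) == truth[inputs]
    return len(lines)


if __name__ == "__main__":
    nand3 = {abc: 0 if abc == (1, 1, 1) else 1
             for abc in product((0, 1), repeat=3)}
    print("NAND3 columns:", verify(nand3))
```

A truth table with all lines at 1 yields no columns at all, and the sketch then reports the pullup as always ON, which corresponds to the "just vdd" limiting case.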
***
Item#346> Unequal OO=0 and OO=1 lines - an arbitrary 3-input truth table - the 3-input XOR (Item#345) had equal 0 and 1 lines
[July 23, 2017, Arya Raychaudhuri, Santa Clara, California]
The scheme is such that it can be programmed to create a netlist from a truth table rather easily.
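As a rough illustration of how such a program might look, the sketch below emits one series column of nfets per selected truth table line, all columns sharing the same top and bottom nodes. Everything specific here is an assumption: the node names ctl and pgnd, the device naming, the <name>bar convention for inverted inputs, and the minimum sizes. Only the nfet AND columns are generated.

```python
def and_columns_netlist(names, lines, top="ctl", bottom="pgnd", bulk="vss",
                        model="NMOS", l_um=0.045, w_um=0.045):
    """Emit SPICE MOSFET lines for ORed AND columns of nfets.

    names : input names, e.g. ["A", "B", "C"]
    lines : truth table lines (tuples of 0/1) that should make the bundle conduct
    A 0 in a line gates the fet with the inverted input, written as <name>bar.
    """
    out = []
    for col, line in enumerate(lines):
        upper = top
        for row, (name, bit) in enumerate(zip(names, line)):
            last = row == len(names) - 1
            lower = bottom if last else f"c{col}n{row}"   # internal column node
            gate = name if bit == 1 else f"{name}bar"
            # SPICE order: name drain gate source bulk model
            out.append(f"MNc{col}{name} {upper} {gate} {lower} {bulk} {model} "
                       f"L={l_um}um W={w_um}um")
            upper = lower
    return out


if __name__ == "__main__":
    # The four lines of a 3-input XOR that evaluate to 1
    xor3_lines = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    for card in and_columns_netlist(["A", "B", "C"], xor3_lines):
        print(card)
```

The output can be pasted into a netlist after adapting the node and model names to the actual design.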
***
Item#345> Directly converting a truth table to fets bundle - a 3-input XOR example
[July 22, 2017, Arya Raychaudhuri, Santa Clara, California]
In the above bundle, check for the off conditions, e.g., for A=0, B=1, C=1, the bundle should be off. Since B=1, the first and the third columns are off, Since C=1, the second column is off, since A=0, the fourth column is off. Verified.
[July 22, 2017, Arya Raychaudhuri, Santa Clara, California]
In the above bundle, check for the on conditions, e.g., for A=0, B=1, C=0, the bundle should be on. Since B=1, the bottom row is on; since A=0, the second and the fourth rows from the bottom are on; since C=0, the third row from the bottom is on. Verified.
***
Item#344> Simulation of the fets bundle including the XOR (Item#343)
On the logic gates side a NOR based XOR was used (from wikipedia)
[July 21, 2017, Arya Raychaudhuri, Santa Clara, California]
The simulation was done for three cycles of a 1 GHz clock. The control biases are indicated on top of the panels (below). Note in the top panel how qq and qqf0 rise and fall under the influence of the control biases, and track OO and X respectively.
In the second panel, since C is 0 and D is initially 0, we see the first rise of qq, but qq falls only after the third query, while qqf0 rises and falls as before, tracking X. When D rises in mid-cycle (2.5ns), OO immediately falls - now, this could be avoided by letting the D rise come through a flop (synchronized with B). Interestingly, the edge query based approach automatically synchronizes - note that qq falls only after the third edge (P00e), and not abruptly as D rises.
[July 21, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#343> Fanout from the output of the XOR in Item#342
Please, note that in Item#342,
(~A + B) . (A + ~B) =
~(A . ~B) . ~(~A . B) =
~(A . ~B + ~A .B)
Which is XNOR - that is the reason for swapping the f0,f0inv inputs to the QQf0 latch in the following diagram. The C fet is added to cut off the XNOR section from the electron packet through the ~C fet when C=0, as indicated in Item#339, 337
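The identity above can be checked exhaustively with a few lines of Python. This is only a sketch of the Boolean algebra, with hypothetical function names (bundle_expr, xnor); it says nothing about the fets or the latch wiring.

```python
from itertools import product

def bundle_expr(a, b):
    # (~A + B) . (A + ~B), the form noted for Item#342
    return ((1 - a) | b) & (a | (1 - b))

def xnor(a, b):
    # ~(A . ~B + ~A . B)
    return 1 - (a ^ b)

for a, b in product((0, 1), repeat=2):
    print(a, b, bundle_expr(a, b), xnor(a, b))
```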
[July 20, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#342> A fets bundle involving XOR
Please, note how we can use Boolean Algebra to transform the logic towards fets based flow
[July 20, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#341> A faster fets bundle equivalence for Item#328
[July 19, 2017, Arya Raychaudhuri, Santa Clara, California]
What if the AND following the NOR is replaced by a NAND
[July 20, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#340> Making the falling QQ and QQf0 faster - refer to the simulation in Item#335
The bottom pane of the middle panel in Item#335 shows that the falling QQ and QQf0 are slower than the rising QQ, QQf0 as in the bottom pane of the top panel. This is due to the fact that the falling QQ, QQf0 are caused by the p0inv, f0inv dips generated by the slow 'inverters' using LP fets (Not LVT), while the rising QQ, QQf0 are caused by p0, f0 directly.
I have now optimized the 'inverters' of Item#338 to use all LVT fets except the nfets driven by p0,f0 (as indicated in the below netlist) to integrate them to the fanout circuit of Item#335
[July 18, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#339> Imaginative description of the fets bundle approach - electron packet movement under the field effect of control gates
The fets bundle method is a new and setup time friendly way to do logical synthesis. In this approach the speed is increased by minimizing inter-gates delays, and reducing the number of fets per gate. Also, instead of looking at the signal transmitting through logic gates, we are looking at a pulsed ground (electron packet) propagating under the field effect of the fets' control gates. Here is a pictorial view that corresponds to the main fets bundle in Item#337
Why the ~C fet was necessary
[July 16, 2017, Arya Raychaudhuri, Santa Clara, California]
The small interval of time (~100ps) taken by the edges is seen to easily retain the 1 (even with a positive charge pump bounce) at p0, f0 if the logical flow does not send the 0 dips there - a charge decay time of interest is the typical dynamic RAM refresh interval, 64ms (wikipedia).
Fets bundles have been used in other dynamic logic circuits including domino logic, but their application here with edge based query is somewhat different. The edge based query method processes a chunk of datapath logic, latches it with a dynamic NAND (Item#338) and a NAND latch, or even a multiple of them if FanOuts are extracted. Once latched, the next chunk can just be static logic, or another fets bundle replacing the static logic, operating with a slightly delayed query edge. The main purpose here is to replace the datapath or sections of it with a logical lookup table to attack the path delay time. There are no equal precharge and evaluate cycles (requiring high frequency), static inverters and such, as in domino logic. Just focus on what you can do inside a thin query edge, which can come from phasing a clock edge, or by just sensing logic transitions.
***
Item#338> Removing the droopy (negative) charge pump bounce seen in p0inv and f0inv - mid pane of the top panel in Item#335
Please, note that the bounce has been successfully leaked by adding the small pfets (in yellow circles) - compare the newly simulated p0inv, f0inv (mid pane) to those in the mid pane of the top panel in Item#335. This makes the p0, f0 'inverters' structurally similar to the CMOS NAND gate - so, no additional power dissipation.
[July 17, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#337> fanout from deep inside a longer logic stream
Please, note that in this case, an approach similar to the second diagram in Item#336 is used
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#336> What if a fanout is needed from the output of the A,B AND gate, out of the fets bundle
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
Another more generalizable way
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#335> Fanout from an intermediate node of the fets bundle of Item#334
In this Item the circuit edit to extract a fanout from an intermediate node is shown. Note that the node X on the left is equivalent to the node f0 on the bundle.
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
The following simulation shows that extracting the fanout (QQf0) delays the QQ only by 10ps when compared with the top panel of Item#334. Also, there is no steady state current leakage through vdd during P00e, only spikes during its 10ps rise/fall times. This is due to the CMOS mode operation of the ladders - when the top fet is off the bottom is on, and vice versa.
When both QQ and QQf0 track rising OO and X
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
When both QQ and QQf0 track falling OO and X
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
When D = 0, only QQf0 tracks rising X, QQ stays flat at 0 due to the p0inv dip
[July 15, 2017, Arya Raychaudhuri, Santa Clara, California]
In fact, if the previous state of QQ were 1, it would have been brought down to 0 by the p0inv dip, upon receiving an updated D during this cycle.
***
Item#334> Simulation of the fets bundle equivalence shown in Item#333
Please, note that by considering the logical success (ground propagation) speed of achieving the p0 dip, I have moved the P00e edge closer to rising and falling B. The first simulation shows rising OO tracking, and the second simulation shows falling OO tracking
[July 14, 2017, Arya Raychaudhuri, Santa Clara, California]
Note the small negative charge pump bounce on p0inv which should be ideally flat - but the small bounce is harmless on the succeeding NAND latch.
The OO falling edge tracking (below) by QQ shows less gain on the delay time than the rising edge tracking (above), but it still gives about 40% advantage over the gate-based version for the small logic stream simulated
[July 14, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#333> Letting the QQ of Item#326 stay beyond the next inverted clk edge, letting it alter state only through changing logical inputs- just like gates based logic streams
From the discussion at the bottom of Item#332, it is clear that the maximum delay time impact is seen when regular LP Vth0 gates are replaced with an LVT Vth0 fet bundle. That's what is being targeted in the following example. In certain situations, when QQ is getting fed to a FF, unsetting it with the next inverted clk edge may be okay, but what if QQ is shared elsewhere as an input? That's why, considering the general logical requirements, we need to let QQ follow only the changing logical input B (and/or other inputs) for a particular clk cycle. In fact, the fet bundle pulse P00e (simplified as a spice PWL in the simulations) could also be generated by sensing the B rise/fall as shown below, rather than by phasing the clk edge as discussed earlier. But, if other inputs (A,C,D) can also change, one must use the phased clk edge.
[July 14, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note that on the right side (fets bundle), only the p0 inverting is being done using regular LP fets (on the HVT side); all other fets used on the fets bundle side are LVT fets. This is due to the fact that p0 is slightly delayed with respect to the rise of P00e, so ideally you would need a slightly delayed P00e for the special inversion. But, we can avoid bringing another delayed P00e into the mix by simply using slightly slower fets for the inverter. This inverter is special - when p0 is a dip, p0inv should be flat at 1; when p0 is flat at 1, p0inv should be a dip.
***
Item#332> Impact of using Low Vt fets on delay reduction through fets bundling - the case of Item#326 resimulated with LVT fets.
The simulation in Item#327 was done with the generic PTM 45nm LP fets, NMOS Vth0= 0.62261, PMOS Vth0= -0.587, which are on the HVT side.
Simulation shows that if low vt fets (Vth0 = +/- 0.3V) are used [just took the PTM 45nm LP model sets and edited just the Vth0 values, and the model names to nmoslvt and pmoslvt], the small logic stream of Item#326 doesn't give any delay time advantage when bundled (the top pane below). But, for longer logic streams there is path delay advantage (the bottom pane below)
[July 13, 2017, Arya Raychaudhuri, Santa Clara, California]
This indicates that fets bundling helps in the case of longer logic streams where the use of LVT fets cannot sufficiently cut the path delay. But, if one chooses to use only HVT fets, even for small logic streams bundling produces impact on path delay as seen in Item#327.
***
Item#331> Example of fets count reduction with fets bundling in a small logic stream
The goal of fets bundling in Item#s 326, 328 was to show logic stream path delay reduction - the fets count reduction gets more noticeable with the bundling of longer pieces of logic stream. But, in some logical situations, even a small logic piece shows very prominent fets count reduction. For example, we want to get a negative edge from a positive edge if the states A,B,C,D,E are all 1s. The negative edge may be used to set a shift register through its p(n) terminal (Item#314). The following figure shows the progressive fets count reduction with the fets bundle on the right showing maximum reduction
[July 11, 2017, Arya Raychaudhuri, Santa Clara, California]
Such striking reduction of fet counts happens because we are looking at reducing a 6 input NAND gate. Instead, if we reduced a 2 input NAND gate (see, for example, the middle fets ladder in Item#328) the reduction would be from 4 fets to 3 fets.
***
Item#330> Two ANDs pipes - one for clocking the FlipFlops, and the other phase-shifted (see Item#313) from the first to generate the P00e type edges (see Item#326) that drive the fet bundles
Instead of delaying a particular clk edge to create the P00e as shown in Item#326, one can use a second ANDs pipe to generate phase shifted edges, as in Item#313. These phase shifted edges can then drive multiple fet bundles in different datapaths. Also, there is a possibility of deriving multiple phases from a single ANDs pipe - to drive fet bundles in different sections of a datapath. It's also possible to convert only a section of the datapath into a fet bundle, rather than the entire logic stream.
***
Item#329> Simulation of the fet bundle equivalence shown in Item#328
Please, note that the bundling helps save 30ps this time, compared to 74ps in the OR case (Item#327). But, then, NOR is one inverter less than OR.
[July 11, 2017, Arya Raychaudhuri, Santa Clara, California]
The phased clk edge P00e is a means of querying the logical state in each clock cycle. So, this is different from the conventional situation where a logical state can stay unchanged over cycles. Here, the state is reset after the query and presentation, and then set back to its old state during the next query, if the logical environment did not change.
***
Item#328> Fet bundle equivalence when the OR gate in Item#326 is replaced by a NOR gate
[July 11, 2017, Arya Raychaudhuri, Santa Clara, California]
In any of the logical fet ladders, the ground propagation success defines the logical flow, and the output is negative (~) because it is a dip. For example, in this case, by looking at the logic stream on the left we can say that QQ needs to be D . ~(A.B + C). So, p0 should be ~(D . ~(A.B + C)), which is equivalent to ~(D . ~(A.B) . ~C) [by deMorgan's rule]. This defines the third ladder to the right, while the ~(A.B) and ~C come from the first and the second parallel ladders respectively.
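The deMorgan step can be confirmed by brute force over all 16 input combinations. The following Python sketch models each ladder as a plain Boolean function; the names (nand, p0_from_ladders, qq_from_gates) are hypothetical, and timing, dips and latching are deliberately left out.

```python
from itertools import product

def nand(*xs):
    return 0 if all(xs) else 1

def p0_from_ladders(a, b, c, d):
    first = nand(a, b)               # ~(A.B), first parallel ladder
    second = 1 - c                   # ~C, second parallel ladder
    return nand(d, first, second)    # third ladder: ~(D . ~(A.B) . ~C)

def qq_from_gates(a, b, c, d):
    nor_out = 1 - ((a & b) | c)      # AND of A,B followed by NOR with C
    return d & nor_out               # D . ~(A.B + C)

mismatches = [bits for bits in product((0, 1), repeat=4)
              if 1 - p0_from_ladders(*bits) != qq_from_gates(*bits)]
print("mismatches:", mismatches)
```

An empty mismatch list means the inverted p0 agrees with QQ from the gate-based stream for every input.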
***
Item#327> Simulation of the fet bundle equivalence shown in Item#326
The spice netlist for the simulation is included at the top. Please, note that the model used was PTM LP 45nm. As is seen in the top panel (below), we gain about 74ps on the output OO by phasing the P00e by 50ps from the rise of B. However, if A is held at 0, nothing happens except the little 1 bounce in p0 (as earlier discussed in Item#325).
[July 13, 2017, Arya Raychaudhuri, Santa Clara, California]
Interestingly, even for the extremely small logic stream between the two FFs, 74ps are seen to have been saved. For longer datapaths or sections of datapaths, more can be saved by fets bundling. The bundling is more effective when the logic gates are 'positive' gates like AND, OR. It gets a bit tricky with 'negative' gates like NAND, NOR inserted into the mix - we see that next.
***
Item#326> The special case of the Add table that made the reduction of Item#325 possible - moving a datapath to match the special case
The reduction of a logic stream into a bundle of fets is possible only if the states are updated before the edge (e.g., P00e ) comes in. Interestingly, this special situation will always hold true for the addition table (e.g., Item#320), by design.
How about moving other situations to match the special situation, and then apply the reduction! Here is an example of how a datapath in a clocked digital circuit is being moved to match the special case - the goal is to reduce path delay.
[July 11, 2017, Arya Raychaudhuri, Santa Clara, California]
The challenge will be to quickly create the timing models for these bundles and integrate them into the Physical Design framework.
Please, note that the path delay on the left is 2 NANDs, 1 NOR and 3 INVs - and, the path delay on the right is only ~2 NANDs. Also, the fanout related loading on A, B, C, D goes down as well, an indication on how introducing an edge into the mix helps.
***
Item#325> Addressing the slow rise of p0 (Item#324) if the load pfet were always ON
In the case of a decimal table, the number of nfets in a bundle will be larger than in the case of a base 4 situation shown in Item#324. The p0 dip will rise slower due to a larger number of drain caps in parallel under the pfet load, and also due to the limitation on the aspect ratio (W/L) of the pfet as discussed in Item#323. This problem can be addressed by applying the P00e edge to the pfet's gate, rather than the vss which kept it always ON. This will also remove the limitation on the W/L of the pfet, and the W/L can be made appropriately larger to get sharp rise of p0. More like CMOS.
[July 07, 2017, Arya Raychaudhuri, Santa Clara, California]
A simulation where x2, y2 were moved to 1 before the P00e is shown below - please note the sharp fall and rise of the p0 dip.
[July 08, 2017, Arya Raychaudhuri, Santa Clara, California]
On the other hand, if all of the four paths under p0 are open (e.g., x0=0, x1=0, x2=0, x3=0), p0 won't dip to 0. Instead, a small charge pump related bounce is seen in the simulation (below) - this is good because it does not impact the 1-ness of p0, as desired in this case. Had P00e not been an edge, but a state, p0 would have eventually relaxed to somewhere between 1 and 0 - that would not be acceptable.
[July 09, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#324> Further logic reduction based on the principle shown in Item#323
The entire logic stream leading up to the generation of the p0 0-dip can be reduced
[July 07, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note that in the above case, 42 fets are reduced to 10, and the delay is reduced from 4 gates delay to just two nfets.
Obviously, in the decimal add table's case there will be 11 such bundles of fets generating p0 ... thru ... p9, and c0. Of course, the fet counts reduction will vary from one logic stream to another, but the delay will always be 2 nfets (1 gate).
***
Item#323> An interesting advantage of edge based design - Logic reduction for delay and fet counts in the Add Table (e.g., Item#320)
An example is shown below for the p0 0-dip generation OR-NAND gates of Item#320
[July 07, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note that MPx is always on, but conducts current only during the thin edge P00e - so, steady state current is ~0.
The nature of the Add Table is such that only 1 of the inputs to the OR gate (left) can be 1 at one p0 write time. So, I hold s0c_2 and s0 at 0 for the following simulation. Please, look up the p0 dip I get in the second panel below
[July 07, 2017, Arya Raychaudhuri, Santa Clara, California]
Since MPx is always on, the value of L for MPx was set by doing a Q-point analysis in the top panel (above, where P00e was replaced by vdd only for the Q-point analysis). A 0-dip voltage value of 35mV is small enough. If we let L of MPx be lower, the 0-dip voltage would be higher, etc.
The interesting thing to note is that we have reduced the fet counts from 12 to 5, and the gate delay to just one nfet delay.
***
Item#322> Extending the addition look-up table based approach of Item#320 to direct decimal full addition - the schematic drawing is shown
Three AND pipes with slight offsets in Pipe1 and Pipe2 with respect to Pipe0 are used. Pipe0 moves the input X, Y data to the addition shift registers, and also shifts the pointer in horizontal register in the middle. Pipe1 helps write the add table's half add results into the pointer-pointed result register/carry bit. Pipe2 generates the edges that do carry add just as in Item#307. Pipe3 is a separate pipe that does the result data translation just as in Item#321. Only one add table is used for the Full Add.
[July 05, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#321> Transverse movements using LR, RL in a matrix of shift register stages - getting the data out of the result registers in a 3-digit base 4 adder (see Item#320)
The transverse movement will be very relevant for matrix structure of decimal data, as we will see in the next Item. The following diagram shows the special connections
[July 05, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note how the carry bit of the most significant bit is getting translated out only if it is a 1 at the end of the full add.
The matrix movement capability adds another feather to the bi-directional shift stage's (Item#173) cap - makes it look as versatile as a "transistor" in the domain of data shifts.
***
Item#320> Addition look up table for a base 4 half Adder as indicated in Item#319
Please, note that the following half adder looks up the basic additions
3+0=3, 3+1=10, 3+2=11, 3+3=12
2+0=2, 2+1=3, 2+2=10
1+0=1, 1+1=2
0+0=0
[for example, 12 means qs2 is set 1, and qc0 is set 1, 3 means only qs3 is set 1, etc.]
[July 03, 2017, Arya Raychaudhuri, Santa Clara, California]
As is seen from the above diagram, just one edge P00e is needed to accomplish the half addition, while 20 or so edges were needed for the half adder of Item#301.
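The look-up idea itself is easy to mimic in software. The Python sketch below builds a half-add table for any base and prints the base 4 entries listed above; the names half_add_table and as_text are hypothetical, and the table stands in only for the logic, not for the registers, qs/qc bits or the P00e edge.

```python
def half_add_table(base=4):
    """Map each digit pair (x, y) to (carry, sum_digit) in the given base."""
    return {(x, y): divmod(x + y, base)
            for x in range(base) for y in range(base)}

def as_text(entry):
    # Write (1, 2) as "12" and (0, 3) as "3", like the list in this Item
    carry, digit = entry
    return f"{carry}{digit}" if carry else f"{digit}"

table = half_add_table(4)
for x, y in [(3, 0), (3, 1), (3, 2), (3, 3), (2, 2), (1, 1), (0, 0)]:
    print(f"{x}+{y}={as_text(table[(x, y)])}")
```

Calling half_add_table(10) gives the corresponding decimal table with 100 entries.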
***
Item#319> Similarity between perl/shell forking (see Item#s 207, 181) and the parallel mode direct decimal full comparer (Item#s 318,316), subtractor (Item#310), adder (Item#307)
Parallelize sub-operations and collect the results, wait for the lengthiest sub-operation to finish, and then process the results - that's the paradigm used in the direct decimal *parallel mode* full comparer/adder/subtractor. This is similar to Perl/Shell forking shown in Item#s 207, 181 - taken into the circuit domain.
Now, can these sub-operations that are taking 20 or so edges for the half subtractor/adder, 10 or so for the half comparer run faster? Yes, by using look-up tables for half addition/subtraction/compare, just as was indicated for multiplication in Item#311. Then the 20/10 edges needed will reduce to just 1, as we will see in the next Item, with a base-4 half adder (a simpler addition table leads to a smaller diagram).
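For readers who think in software terms, the fork-and-collect paradigm described in this Item can be sketched in Python with a thread pool. The digit compares run as independent sub-operations, and the results are processed only after all of them have returned. The function names and the way the per-digit results are combined (first non-equal digit from the most significant side decides) are assumptions made for illustration; the circuit in Item#316 may combine them differently.

```python
from concurrent.futures import ThreadPoolExecutor

def compare_digit(pair):
    x, y = pair
    return (x > y) - (x < y)        # +1, 0 or -1: one "half compare"

def full_compare(a, b, width=3):
    xs = [int(ch) for ch in str(a).zfill(width)]
    ys = [int(ch) for ch in str(b).zfill(width)]
    with ThreadPoolExecutor(max_workers=width) as pool:
        # fork the per-digit compares, then wait for all of them
        results = list(pool.map(compare_digit, zip(xs, ys)))
    for r in results:               # most significant digit first
        if r:
            return ">" if r > 0 else "<"
    return "="

for a, b in [(653, 454), (300, 310), (974, 974)]:
    print(a, full_compare(a, b), b)
```

The three number pairs are the ones simulated in Item#318.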
***
Item#318> The parallel mode full compare circuit of Item#316 simulated
Two numbers 653 and 454 being compared
[July 01, 2017, Arya Raychaudhuri, Santa Clara, California]
Two numbers 300 and 310 being compared
[July 01, 2017, Arya Raychaudhuri, Santa Clara, California]
Two numbers 974 and 974 being compared
[July 01, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#317> Crucial logical piece in the full compare circuit of Item#316
It's the small NOR-OR piece that generates the p(n) to set the Y latch of the digit compare results (see top right of the bottom diag in Item#316). If for a particular pair of digits qx, qy,
qx>qy:
we need, xeqy to be a zero dip, xgty to be a zero dip, and p to be 1
qy>qx:
we need, xeqy to be a zero dip, xgty to be 1, and p to be a zero dip
qx=qy:
we need, xeqy to be 1, xgty to be 1, and p to be 1
The following simulation bears it out
[June 28, 2017, Arya Raychaudhuri, Santa Clara, California]
***
Item#316> Converting the digit compare circuit (half comparer) of Item#265 into a parallel mode full compare circuit - 3 digits number compare example shown
Follows the same line of thinking as with the parallel mode full adder/subtractor discussed earlier.
[June 28, 2017, Arya Raychaudhuri, Santa Clara, California]
Please, note that for the units place, the ANDs pipe extends beyond the comparison of the digit.
***
Item#315> On the generalized nature of the *compact* clk-to-q bi-directional shift stage (Item#s 173, 258, 314)
The compactness can be easily seen by noting that all the control gates for each stage are gone in the new shift register. Please, compare in Google Images with keywords "bidirectional shift register" and "bidirectional shift register lvs". Because of this compactness, the new shift register has evoked the possibility of its wide-spread use, such as in the mazes (Item#257), and so on.
The new shift register which developed out of shifts in a latch cluster (Item#172) is based on the 11 (plus vdd,vss) terminal generalized shift stage (Item#314). It has been successfully used in many simulations so far, as seen on the code snippets pages. A generalization makes sense when it is reducible to special cases with some modifications, as shown in Item#s 309, 314, and also to a one directional shift register (Item#172).
The transformation of the shift stage in Item#314 shows another important thing - the clk-to-q mechanism already worked (implicitly) in the case of the traditional clk edge driven T-flipflop, and it was well accepted by users. So, the same mechanism when applied to the shift stage should be too.
***
Item#314> Showing that the clk-to-q bi-directional shift stage (Item#s 173, 258) is easily reconfigurable to a typical clk edge driven T-flipflop
Reconfigurability is the name of the game - one example was shown in Item#309 where the carry bit connections were adjusted to suit the needs. Here is another one, where a bi-directional stage is reconfigured into a typical clk edge driven t-FlipFlop (look for all NAND designs with keyword "T flip flop" in Google Images)
[July 31, 2017, Arya Raychaudhuri, Santa Clara, California]
Please note that the unused inputs are dummy tied to vdd, and the LR=RL=vdd serves as the Toggle terminal of a T-flipflop. The input to the ctlout(n)rl NAND gate is re-connected to qq(n)bar instead of to qq(n) as in the shift stage. Also, the ctlout(n)rl output is being redirected to the ctlout(n+1)rl input to the shift stage
A simulation is shown below where instead of using two bi-directional stages as in Item#263, one reconfigured T-flipflop stage (as shown above) is being used to divide the frequency of the ANDs pipe edges (as in Item#313) by 2.
[June 25, 2017, Arya Raychaudhuri, Santa Clara, California]
This reconfigurability adds to the confidence in the bi-directional shift stage. Since 3 of its terminals were dummy vdd-tied to get the T-flipflop, it means that the 11-terminal shift stage contains only 6 fets more than those in the traditional clk edge driven T-flipflop. Quite compact for its versatility.
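The divide-by-2 behavior can be pictured with a purely behavioral Python sketch, assuming an idealized T-flipflop that toggles once per incoming edge when its Toggle input is 1. The function name t_flipflop_states is hypothetical, and none of the NAND-level structure or the LR/RL terminals is modeled.

```python
def t_flipflop_states(edges, t=1, q=0):
    """Return the q state after each incoming edge; q toggles when t is 1."""
    states = []
    for _ in range(edges):
        if t:
            q ^= 1
        states.append(q)
    return states

states = t_flipflop_states(6)
rises = sum(1 for prev, cur in zip([0] + states, states) if cur > prev)
print(states, "rising edges of q:", rises)
```

Six incoming edges give three rising edges on q, that is, half the edge frequency.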
***
Item#313> ANDs pipe for clock distribution and phase shifting - can play the role of a PLL
Considering the fact that the delay at each step of the ANDs pipe is constant and that the edge generation is taking place using the same circuit everywhere, the edges that are coming out are strictly related to the incoming ref clock by exactly the same measures of phase delay. So, a single buffered ref clock can drive several ANDs pipes at various locations of the chip to produce exactly phase matched edges. Here the phase matching is just a result of the nature of the simple design - no PLL/VCO, feedback needed. The following simulation shows phase adjustments indicated by the double arrowheads
[June 24, 2017, Arya Raychaudhuri, Santa Clara, California]
By picking P05E,P11E pair instead of P02E,P08E for ORing, we can increase the edge delay with respect to the rising edge of the ref clk (mid panes).
***
Item#312> Frequency multiplier with the ANDs pipe used in Item#s 297-311
In the previous Items, a finite number of edges were required and generated out of the ANDs pipe. This happens (as in Item#297) when T-clk>pipe-length in time - please see below.
[June 22, 2017, Arya Raychaudhuri, Santa Clara, California]
But, if the T-clk<pipe-length, continuous edge generation can occur, because when the E1 tends to cease near the right end of the pipe, another edge E2 comes up to maintain the continuity. Here, the ANDs pipe starts behaving like a ring oscillator, except that unlike with the ring osc, the edges are synced in phase with the clk entering the ANDs pipe. The ANDs pipe acts like an open bracelet giving more flexibility.
The following simulation examples show how the clk frequency is multiplied to 12X and 6X edges. The edges are then divided by 2 in frequency (as indicated by the yellow verticals) by a pair of shift register stages as in Item#263. So, the shift register stages yield 6X and 3X frequency (wrt clk) pulses.
[June 24, 2017, Arya Raychaudhuri, Santa Clara, California]
The 12X and 6X edges are generated by ORing all or alternate edges from the ANDs pipe, for example, for 12X edges,
XNOR01 P01E P02E P03E P04E P05E P12345 vdd vss NOR5
XNOR02 P06E P07E P08E P09E P10E P67890 vdd vss NOR5
XNOR03 P11E P12E P13E P14E P15E PP12345 vdd vss NOR5
XNOR04 P16E P17E P18E P19E P20E PP67890 vdd vss NOR5
XNAND01 P12345 P67890 clk1 vdd vss NAND2
XNAND02 PP12345 PP67890 clk2 vdd vss NAND2
XOR1 clk1 clk2 Ex12 vdd vss OR2
For 6X edges, the first four NOR5s get alternate edges
XNOR01 P01E vss P03E vss P05E P12345 vdd vss NOR5
XNOR02 vss P07E vss P09E vss P67890 vdd vss NOR5
XNOR03 P11E vss P13E vss P15E PP12345 vdd vss NOR5
XNOR04 vss P17E vss P19E vss PP67890 vdd vss NOR5
Please, note that other factors for frequency multiplication can be created by making the ANDs pipe slower/faster, or by picking up a different set of edges for ORing.
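Since the NOR5 lines follow a regular pattern, they can be generated by a small script. The Python sketch below reproduces the four NOR5 instance lines shown above for the 12X case (step=1) and the 6X case (step=2); the function name nor5_lines and the step parameter are hypothetical, and the NAND2/OR2 lines are left as written in the netlist.

```python
def nor5_lines(step=1):
    """Build the four NOR5 instance lines; unused inputs are tied to vss."""
    outs = ["P12345", "P67890", "PP12345", "PP67890"]
    lines = []
    for g, out in enumerate(outs):
        pins = []
        for k in range(5):
            n = 5 * g + k + 1                 # edge index 1..20
            keep = (n - 1) % step == 0        # step=2 keeps P01E, P03E, ...
            pins.append(f"P{n:02d}E" if keep else "vss")
        lines.append(f"XNOR{g + 1:02d} {' '.join(pins)} {out} vdd vss NOR5")
    return lines

lines12 = nor5_lines(1)
lines6 = nor5_lines(2)
print("\n".join(lines12))
print("\n".join(lines6))
```

Other step values pick other subsets of the 20 edges, which is one way to try different multiplication factors before simulating.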
***
Item#311> Comments on the Item#s 265, 307 and 310
The qx>qy? algorithm (Item#265), the parallel implementations of the full Adder (Item#307) and the full Subtractor (Item#310) form the basic subroutines for performing more complex tasks such as direct decimal multiplication and division - with some additional logic added for each task.
For example, for multiplication, a half multiplier is nothing but a basic integers multiplication table (9x9, 9x8, ....9x1, 8x8, 8x7, ...8x1, 7x7,7x6...7x1, .......2x2,2x1,1x1,0x*) that can be coded in with AND and OR gates. Let's say, we are trying to get the product of 63x7. Put 6 in A column, 3 in B, and 7 in C. Now, B3.C7+C3.B7 will be coded to write a 1 in D0s1, and a 1 in D0c2 (sum and carry registers), also A6.C7+C6.A7 will be made to write a 1 in D1s2 and a 1 in D1c4. Then add the D0c2 to the D1s register, it should make two shift-ups, yielding a result of 441 - 1 in D1c4, 1 in D1s4, and a 1 in D0s1. If it is 63X47, do the same thing for 4, as done for 7 to get a 1 in E1c2, 1 in E1s5, and a 1 in E0s2. Now, shift E to the left by one register (with a 1 at its 0 bit) and add.
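The multiplication walk-through above can be followed in a short Python sketch. It assumes a software table mapping each digit pair to (carry, sum), adds each carry into the next higher place, and then shifts and adds the partial products. The names (half_multiply_table, times_digit, multiply) are hypothetical, and ordinary integer addition stands in for the full adder of Item#307.

```python
def half_multiply_table():
    """(a, b) -> (carry, sum) for single decimal digits, e.g. (3, 7) -> (2, 1)."""
    return {(a, b): divmod(a * b, 10) for a in range(10) for b in range(10)}

TABLE = half_multiply_table()

def times_digit(digits, m):
    """Multiply a number (most significant digit first) by one digit m."""
    out, carry_in = [], 0
    for d in reversed(digits):                 # units place first
        carry, s = TABLE[(d, m)]               # table look-up
        carry2, s = divmod(s + carry_in, 10)   # add the lower place's carry
        out.append(s)
        carry_in = carry + carry2
    if carry_in:
        out.append(carry_in)
    return out[::-1]

def multiply(a, b):
    digits = [int(c) for c in str(a)]
    total = 0
    for shift, ch in enumerate(reversed(str(b))):
        partial = int("".join(map(str, times_digit(digits, int(ch)))))
        total += partial * 10 ** shift         # shift left, then add
    return total

print(times_digit([6, 3], 7), multiply(63, 47))
```

For 63x7 the table gives (2, 1) for 3x7 and (4, 2) for 6x7, and adding the carry 2 into the tens place yields the digits 4, 4, 1.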
For division, which is repeated subtraction, the goal is to keep the number of subtractions minimal. Always take the divisor to the order of the dividend by adding 0s to the right (that is, blank registers with 1 at the 0 bit). At every step, the subtraction stops when the remainder is < the divisor (with added 0s) at that stage, and the new divisor will be one order lower. Keeping count of the number of subtractions at each step yields the quotient. Let's say we are dividing 6523 by 23. At the first step, 2300 will be repeatedly subtracted from 6523; after 2 subtractions, the remainder is 1923 (<2300). In the second step, 230 will be repeatedly subtracted from 1923 - after 8 subtractions, the remainder is 83 (<230). In the third step, 23 will be repeatedly subtracted from 83 - after 3 subtractions, the remainder will be 14 (<23). Hence the quotient is 283, and the remainder is 14. Obviously, this reduces the number of full subtractions to 13; if, instead, we kept subtracting 23 from 6523, it would take 283 4-digit full subtractions!
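The stepwise subtraction scheme can be written out as a minimal Python sketch. It assumes a positive divisor and a non-negative dividend, and it counts the subtractions so the saving can be seen directly; the function name divide and its return tuple are hypothetical.

```python
def divide(dividend, divisor):
    """Return (quotient, remainder, number_of_subtractions)."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch handles a positive divisor and a non-negative dividend only")
    # Number of 0s to append so the divisor reaches the order of the dividend
    shift = max(len(str(dividend)) - len(str(divisor)), 0)
    quotient, remainder, count = 0, dividend, 0
    for s in range(shift, -1, -1):
        step = divisor * 10 ** s          # divisor with s zeros appended
        digit = 0
        while remainder >= step:          # stop when remainder < step
            remainder -= step
            digit += 1
            count += 1
        quotient = quotient * 10 + digit  # subtraction count is the quotient digit
    return quotient, remainder, count

print(divide(6523, 23))
```

The total count equals the sum of the quotient digits (2 + 8 + 3 here), which is why the scaled divisor keeps the number of subtractions small.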
***
Item#310> Parallel implementation of a 3-digit direct decimal full Subtractor with 3 half Subtractors of Item#302 running in parallel
[June 17, 2017, Arya Raychaudhuri, Santa Clara, California]
In this case, the full Subtractor will run in half Subtractor time plus a few more edges. For example, a sequential mode Subtractor (Item#309) working with 10 digits will take 130ns, while the above scheme will run in 20ns. Also, if another parallel mode Subtractor runs simultaneously to get the negative results [-(B-A)], no saving in separate registers is needed, because the carry related adjustments are a post processing on the result.
***
Item#309> Result register and carry bit connections for the direct decimal subtractor simulated in Item#308
[June 17, 2017, Arya Raychaudhuri, Santa Clara, California]
For C and F please see Item#305, for pulsein1 (the clocking edges), please see Item#303 top panel.
Note that the carry bit is connected in such a way that ~C sets it at the start of a subtraction, and in the LR=1, RL=0 mode, when qq9 rotates to qq0, the carry bit is unset as discussed in Item#302.
While in the LR=0, RL=1 mode, when qq0 turns around to qq9, the carry bit is set from its unset state - this will be made use of in the parallel implementation of the subtractor.
***
Item#308> A sequential mode 3-digit Full Subtractor based on the half Subtractor of Item#302 simulated
One case is shown below - spice data input, followed by the simulation results. In the simulation results, the leftmost pane shows the 13ns clock used, the second pane shows the transferred contents of the left column (minuend, A*), the third pane shows the transferred contents of the right column (subtrahend, B*), with carry related shifts if/when they occur, the fourth pane shows the Full Subtractor difference digits as output after each digit subtract, the fifth pane shows the carry bit movements, and the sixth shows the result register shifts.
[June 16, 2017, Arya Raychaudhuri, Santa Clara, California]
Please note that if the final state of the carry bit is 1, that means A is less than B. So, in that case, you would want -(B-A) to be the answer. But, initially, you don't know if A>=B or B>A. A possible solution may be that you run two subtractors simultaneously, one for (A - B), the other for (B - A), and save the results in three registers for each case. If the carry bit for (A - B) is 1, then A < B, so release the results from the saved (B - A) registers, with the carry bit serving as the (-) sign.
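The two-subtractor idea can be sketched in software as follows. This is a plain Python model under simplifying assumptions: the carry bit is treated as an ordinary borrow flag (the set/unset details of Item#309 are not modeled), and the names subtract_digits and signed_difference are hypothetical.

```python
def subtract_digits(a, b, width=3):
    """Digit-by-digit decimal subtraction of a - b over `width` digits.

    Hypothetical model: returns (result, carry); carry == 1 means a < b.
    """
    carry = 0
    result = 0
    for position in range(width):
        digit_a = (a // 10 ** position) % 10
        digit_b = (b // 10 ** position) % 10
        diff = digit_a - digit_b - carry
        if diff < 0:
            diff += 10
            carry = 1
        else:
            carry = 0
        result += diff * 10 ** position
    return result, carry


def signed_difference(a, b, width=3):
    """Run (A - B) and (B - A) side by side and release the right one."""
    forward, forward_carry = subtract_digits(a, b, width)
    reverse, _ = subtract_digits(b, a, width)
    if forward_carry == 1:
        # A < B: release the saved (B - A) result, carry bit as the (-) sign.
        return -reverse
    return forward


print(signed_difference(123, 456))  # -333
print(signed_difference(456, 123))  # 333
```

In this model the (A - B) run for 123 and 456 ends with the carry bit at 1, so the (B - A) result 333 is released with a minus sign.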
***
All snippets presented here are strictly proprietary to LVS DEBUG SOLUTIONS LLC, any commercial use of these is prohibited, without proper distribution agreement with the owner
Arya Raychaudhuri
[the dates added to the drawings and plates indicate last edit dates]
This "code_snippets3" page is a continuation of the "code_snippets2" page.
- PGEN, VMON based telephone exchange simulation
- RC timer based sequential timer, duty cycler
- Piston-Pawl Bike, 4-wheeler, boat
- ElectroMagnetic engine with pawl based (no crank) power transfer
- Transverse Moore's Law (TML) concept
- Fully fets bundled clk-to-q bi-directional shift stage with 20 fets only, dff feature added - 3d data shift
- Reduced clk-to-q JKFF, TFF, DFF with fets bundling
- Fets bundling for datapath delay reduction and fets count reduction
- Addition look-up table based 10 digit decimal integer full add scheme
- Matrix data shift with clk-to-q bi-directional shift stages
- single edge half addition using addition look-up table
- parallel mode decimal integers compare circuit
- Reconfigurability of the clk-to-q bi-directional shift stage into the edge-driven T-Flipflop
- frequency multiplication and phase-synched clock distribution use of the ANDs pipe
- direct decimal multiplication and division indications
- sequential and parallel mode direct decimal Full Subtractor
Copyright 2011 LVS DEBUG SOLUTIONS & 2012 LVS DEBUG SOLUTIONS LLC All rights reserved.