Static Timing Analysis Interview Questions with Answers (PDF)

Static timing analysis (STA) is a crucial part of the modern digital integrated circuit development process: it verifies a design's timing requirements to guarantee correct operation of the chip. STA involves a variety of techniques, many of them complex, and it demands a deep understanding of the intricacies of integrated circuit design. It is therefore a skill that many employers seek in prospective IC design engineers, and anyone pursuing a career in this field should be well versed in STA concepts and techniques. To help engineers in their job search, this blog post provides a detailed overview of typical STA interview questions and answers, and a PDF of these questions and answers is provided for easy reference. Those seeking a job in integrated circuit design can use this material to prepare for the STA portion of an interview.

Static Timing Analysis Interview Questions, Part 1

Are you knowledgeable about Static Timing Analysis (STA)? Use these STA interview questions and answers to advance your career. There are numerous openings in STA-related roles for engineers skilled in writing constraints, CDC checks, logical/physical synthesis, formal verification, and pre/post-layout timing closure. Static timing analysis is a method of estimating the timing of a digital circuit without running a full circuit simulation. Roles that call for this skill include static timing analysis engineer, RTL designer, synthesis/STA expert, physical design engineer, senior physical design engineer, tech lead, and architect.

Path tracing is the process of finding every path a signal can take from its source to its destination. It is crucial for STA because it identifies every potential timing path, and knowing the route a signal takes lets us optimize the design to reduce the overall delay.

Static Timing Analysis (STA) is a technique for verifying a digital circuit's timing. Every digital design engineer needs it, so being able to answer STA questions with confidence can help you land the job. This article covers some typical STA interview questions, along with advice on how to answer them.

Static timing analysis ensures that a digital circuit will operate within its timing specifications. Its primary objective is to guarantee that the circuit does not violate any timing constraint that could cause a malfunction. By catching such violations before tape-out, STA confirms that the chip will function correctly in silicon.

Multi-cycle paths are important because, if not identified, they lead to timing errors in a design. A multi-cycle path is any path that takes more than one clock cycle to complete, which can happen when a path is too long or contains too many levels of logic. Such paths must be recognized and constrained accordingly; otherwise the resulting delays can cause the system to malfunction or produce inaccurate results.

The first step is to build a timing model of the circuit, which captures the delay of each gate and of the interconnect between gates. Next, the paths through the circuit are traced and the delays along each path are summed to determine each path's total delay; no simulation vectors are needed. Finally, each path delay is compared against the constraints to confirm that the circuit meets its timing requirements.
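The steps above (sum the delays along a path, then compare against the constraint) can be sketched in a few lines. This is a minimal illustration, not a real tool flow; all delay values and the function name are assumptions for the example.

```python
# Sketch of the core STA idea: sum gate and wire delays along a path
# and compare the total against the clock period minus setup time.
# All names and delay values below are illustrative.

def path_slack(stage_delays_ns, clock_period_ns, setup_time_ns):
    """Arrival = sum of stage delays; required = period - setup."""
    arrival = sum(stage_delays_ns)
    required = clock_period_ns - setup_time_ns
    return required - arrival  # positive slack means the path passes

# Example: clock-to-Q delay + three gate stages + interconnect
delays = [0.12, 0.25, 0.30, 0.18, 0.05]
slack = path_slack(delays, clock_period_ns=1.0, setup_time_ns=0.05)
print(f"slack = {slack:.2f} ns")  # positive -> path meets timing
```

A real timing engine does this for every path in the design and reports the worst slack per endpoint.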

The questions are organized by topic: 1) What is STA? 2) Setup and hold time violations. 3) Signal integrity. 4) Variation. 5) Clocks. 6) Metastability. 7) Miscellaneous.

Question W1): What is Static Timing Analysis?

Answer W1): Static Timing Analysis is a technique for analyzing timing paths in digital logic by adding up the delays along a timing path (both gate and interconnect) and comparing the total against a constraint (the clock period) to see whether the path satisfies it. In contrast to dynamic SPICE simulation of the entire design, static timing analysis uses very basic models of device and wire delay to perform a worst-case analysis. A device is modeled using a lookup table or a simple constant current or voltage source, and wire delays are quickly calculated using the Elmore delay or an equivalent model. Static timing analysis is very popular because it is easy to use and only requires commonly available inputs: the technology library, netlist, constraints, and parasitics (R and C). It is comprehensive, offering a very high level of timing coverage, and it honors timing exceptions to exclude paths that are either not true paths or are not exercised in the real design. A good static timing tool correlates well with actual silicon.

Question W2): What does static timing analysis check?

Answer W2): Setup and hold time checks are the main checks that static timing analysis performs. It also verifies that the assumptions made during timing analysis are valid: primarily, that each cell's input slope and output load capacitance fall within the library's characterization range. It also checks the integrity of the clock signal to confirm that the clock waveform assumptions are accurate.
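The Elmore wire-delay model mentioned above can be sketched for a simple RC ladder. This is a hedged illustration, assuming a ladder of series resistances with a capacitance at each node; the function name and the R/C values are made up for the example.

```python
# Elmore delay sketch for an RC ladder (R1-C1, R2-C2, ...):
# each resistor sees all of the capacitance downstream of it,
# so T = sum_i R_i * (C_i + C_{i+1} + ...).

def elmore_delay(rs, cs):
    total = 0.0
    for i, r in enumerate(rs):
        downstream_c = sum(cs[i:])  # capacitance charged through R_i
        total += r * downstream_c
    return total

# Two-segment wire: R in ohms, C in farads -> delay in seconds
print(elmore_delay([100.0, 100.0], [1e-15, 1e-15]))
```

This first-moment approximation is why STA tools can evaluate millions of nets quickly, at the cost of some accuracy versus SPICE.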
In full, STA checks: setup timing; hold timing; removal and recovery timing on resets; clock gating checks; min/max fanout; max capacitance; max/min timing between two points on a segment of a timing path; latch time-borrowing requirements; and clock pulse width.

Question S1): Describe a timing path.

Answer S1): The following figure shows a basic timing path for cell-based designs. A timing path typically begins at a sequential (storage) element, which could be a latch or a flip-flop; specifically, it begins at the clock pin of that flip-flop/latch. The active clock edge at this element causes the data at its output to change. This is the initial delay, also known as the clock-to-data-out (clock-to-Q) delay. The data then passes through stages of combinational logic and interconnect wires, each stage contributing its own timing delay that accumulates along the way. Eventually the data reaches the sampling storage element, again a flip-flop or a latch, where it must pass setup and hold checks against the clock of the receiving flip-flop or latch. Note also that for timing paths in the same clock domain, the sampling and generating flip-flop clocks come from the same source. The point where the clocks first diverge to form the generating path and the sampling path, as shown in the figure, is the actual start point for synchronous clock-based circuits; it is known as the point of divergence. To simplify analysis, we agree that the clock arrives at a fixed time at the clock pin of every sequential in the design. This simplifies the analysis of a timing path from one sequential to another.

Figure S1. Timing path from one flip-flop to another flip-flop.

Question S2): What different kinds of timing paths can digital logic be divided into?

Answer S2): A timing path can be any of the following (Figure S2):
Figure S2. Various types of timing paths.

i. A path from one register/latch's clock pin to another register/latch's D pin.
ii. A path from a primary input to a register's or latch's D pin.
iii. A path from a register's clock pin to a primary output.
iv. A path from a primary input to a macro input pin.
v. A path from a macro output pin to a primary output pin.
vi. A path (not depicted in the figure) from a macro output pin to another macro input pin.
vii. A path through combinational logic from a block's input pin to its output pin.

Question S3): What is a launch edge?

Answer S3): In a synchronous design, a certain amount of computation or activity must occur within each clock cycle. Memory elements such as flip-flops and latches keep the input values stable throughout the clock cycle while the computation runs: the activity starts at the beginning of the clock cycle and must finish, with results ready, by the end of the cycle. Memory elements transfer data from input to output on either the rising or the falling edge of the clock; this edge is called the active edge of the clock. During a clock cycle, data propagates from the output of one memory element through the combinational logic to the input of the next memory element, where it must meet a specific arrival-time requirement.

Figure S3. Launch edge and capture edge.

As seen in the figure, the first memory element's active clock edge (shown in red) makes new data available at that element's output and initiates data propagation through the logic. Although input "in" reached one before the clock's first active (rising) edge, this value is only transferred to the Q1 pin when the clock rises.
Because it launches data at the output of the first memory element, data which ultimately needs to be captured by the next memory element along the propagation path, this active edge of the clock is known as the launch edge.

Question S4): What is a capture edge?

Answer S4): As discussed above, synchronous circuits require a certain amount of computation to be completed within a clock cycle. At the launch edge, the launching memory elements transfer a new set of data to their output pins. This new data propagates through the combinational logic that performs the required computation, and the computed result must be available at the next set of memory elements by the end of the clock cycle. At the next active clock edge, the memory element's D2 pin captures the computed result and transfers it to the Q2 pin for the subsequent clock cycle. This next active edge of the clock, depicted in blue in the figure, is called the capture edge because it captures the results at the conclusion of the clock cycle. There are some caveats to be aware of. To ensure proper capture, the data at D2 must arrive a specific amount of time before the capture edge; this is the setup time requirement, which we discuss further below. Also, computation does not always need to be completed within a single clock cycle: in general it does, but when a path is intentionally given more than one cycle we call it a multi-cycle path.

Question S5): What is the setup time of a sequential element?

Answer S5): The input data of a sequential element (e.g. a latch or a flip-flop) must be stable when the clock capture edge becomes active.
In fact, because a sequential element (such as a latch or flip-flop) can enter a metastable state, take an unpredictable amount of time to resolve it, settle at a state different from the input value, and thus capture an unintended value at the output, the data must be stable for a period of time before the clock capture edge arrives. The amount of time for which input data must be stable before the clock capture edge is the setup time of that sequential element.

Question S6): What is hold time?

Answer S6): As with setup time, the data at a sequential element (e.g. a latch or flip-flop) must be kept stable around the clock capture edge. Because a sequential element can go metastable and capture the wrong value at the output if data changes close to the clock capture edge, the data must remain stable for a while after the capture edge as well. The amount of time for which data must be kept stable after the clock capture edge is the hold time requirement of that sequential element.

Question S7): What factors affect a flip-flop's setup time?

Answer S7): A flip-flop's setup time is influenced by the input data slope, the clock slope, and the output load.

Question S8): What factors affect a flip-flop's hold time?

Answer S8): Likewise, a flip-flop's hold time is determined by the input data slope, the clock slope, and the output load.

Question S9): Describe how signals propagate from one flip-flop to another through combinational delays.

Answer S9): The structure shown below is straightforward: a flop's output passes through stages of combinational logic (represented by pink bubbles) before being sampled by a receiving flop. The receiving flop, which samples the FF2_in data, places timing constraints on the input data signal: signal transitions must propagate through the logic from FF1_out to FF2_in quickly enough for the receiving flop to catch them.
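The setup and hold windows defined in answers S5 and S6 can be expressed as a simple stability check around the capture edge. This is only a conceptual sketch; the function names and the times (in ns) are assumptions for illustration.

```python
# Data must be stable from (edge - setup) to (edge + hold).
# A transition inside either window is a violation.

def violates_setup(data_arrival, clock_edge, setup_time):
    # Data arriving later than (edge - setup) breaks the setup window
    return data_arrival > clock_edge - setup_time

def violates_hold(data_change, clock_edge, hold_time):
    # Data changing between the edge and (edge + hold) breaks hold
    return clock_edge <= data_change < clock_edge + hold_time

clock_edge = 10.0  # ns
print(violates_setup(9.97, clock_edge, setup_time=0.05))  # True: inside setup window
print(violates_hold(10.02, clock_edge, hold_time=0.04))   # True: inside hold window
```

In a real flop both violations can push the output into metastability, as the answers above describe.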
For the flop to capture it successfully, the input data must arrive before the capture clock edge and remain stable for a while; this requirement is the setup time of the flop. Setup time problems typically arise when there is too much logic between two flops, i.e. when the combinational delay is too long. This is why it is sometimes referred to as a max delay or slow delay timing issue, and the constraint is known as a max delay constraint. The figure shows a max delay constraint on the FF2_in input of the receiving flop. Note that the max delay or slow delay constraint depends on frequency: if setup to a flop is failing and you decrease the clock frequency, the clock cycle time increases, giving slow signal transitions more time to propagate through and allowing the setup requirement to be met. Usually a digital circuit operates at a specific frequency, and that frequency establishes the max delay constraints. Setup (or max) slack, or margin, refers to the amount of time by which the signal is early or late relative to the setup requirement.

Figure S9.

Question S10): Explain a setup failure to a flip-flop in terms of signal propagation from flip-flop to flip-flop.

Answer S10): The following figure visually describes a setup failure. The first flop releases the data at the clock's active edge, here its rising edge: FF1_out falls some time after clk1 rises. The interval between the clock rising and the data changing at the output pin is the clock-to-out delay. The signal then travels with a finite delay through combinational logic from FF1_out to FF2_in; after this delay the signal reaches the second flop and FF2_in drops. The orange/red vertical dotted line indicates that, due to the significant delay from FF1_out to FF2_in, FF2_in falls after the setup requirement of the second flop.
This means the second flop's input signal, FF2_in, is not held stable for the flop's required setup time, so the flop can become metastable and capture the data at its output incorrectly. One would have expected the Out node to go low, but it does not, because of the setup time (max delay) failure at the input of the second flop. The input signal must be stable during the setup window, the period of time prior to the clock capture edge. As mentioned earlier, decreasing the frequency lengthens the cycle time, which eventually lets FF2_in arrive in time and prevents the setup failure. Also observe the clock skew at the second flop: the second flop's clock, clk2, arrives earlier than clk1 and is no longer in phase with it, which makes the setup failure worse. Since clocks will not arrive at all receivers at the same time in the real world, the designer must take clock skew into account. We will talk separately about clock skew in detail.

Figure S10. Setup/max delay failure to a flip-flop.

Question S11): Explain a hold failure to a flip-flop.

Answer S11): Each sequential element (such as a latch or flop) has a hold requirement, similar to setup: input data must remain stable for a predetermined amount of time, or window, following the assertion of the sequential element's active (capturing) edge. If input data changes during that hold window, the sequential element's output can become metastable, or it can accidentally capture the new input data. It is therefore imperative that the input data be held until the hold requirement of the relevant sequential is satisfied. In the illustration below, the data at the first flop's input pin In is captured correctly because it meets the setup requirements; the first flop's output, FF1_out, is an inverted version of the input In.
As you can see, once the active (here rising) clock edge for the first flop occurs, output FF1_out drops after a certain clock-to-out delay. Now, for the sake of understanding, assume that the signal travels extremely fast from FF1_out to FF2_in and the combinational delay is very small, as shown in the figure below. In practice this can happen for a variety of reasons: design (think of a path with only a short wire connecting the first and second flops and no other devices in between), device variation that yields extremely fast devices along the path, or capacitive coupling with nearby wires that favors the transition from FF1_out to FF2_in. In reality, there are a number of factors that can cause delay along the signal path to decrease. Fast data ultimately causes FF2_in to transition during the hold window of the flop timed by clk2, effectively violating the hold requirement of the clk2 flop. While the design intention was to capture the falling transition of FF2_in in the second cycle of clk2, it is instead captured in the first clk2 cycle. Ideally, FF2_in would have satisfied the hold requirement for the first clock cycle of clk2 and met setup before the second clock cycle, so that at the active edge of the second cycle the intended value would be captured. In a normal synchronous design, where a series of flip-flops is clocked by a grid clock (shown in the figure below), the intention is that data launched on the first clk1 edge is captured on the second clk2 edge. Now, if you've noticed, there is a skew between clk1 and clk2 that causes the clk2 edge to appear later than the clk1 edge (ideally, clk1 and clk2 would be perfectly aligned).
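The way this skew eats into hold margin can be put in numbers. The sketch below is illustrative only; all delay values (in ns) are assumptions, and the same-edge hold check is simplified to launch arrival versus capture-clock arrival plus hold time.

```python
# Hold slack sketch: fastest data arrival vs. the same-edge hold
# requirement at the capture flop. A capture clock (clk2) arriving
# later than the launch clock (clk1) raises the requirement.

def hold_slack(launch_clk_delay, clk_to_q, data_delay,
               capture_clk_delay, hold_time):
    arrival = launch_clk_delay + clk_to_q + data_delay   # fastest data
    required = capture_clk_delay + hold_time             # same-edge check
    return arrival - required  # negative -> hold violation

# No skew: both clocks arrive at 0.0 ns
print(hold_slack(0.0, 0.05, 0.02, 0.0, 0.03))   # positive: passes
# clk2 skewed 0.1 ns later than clk1: the same fast path now fails
print(hold_slack(0.0, 0.05, 0.02, 0.1, 0.03))   # negative: hold violation
```

Note that the clock period appears nowhere in this calculation, which previews why hold failures are frequency independent.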
If both clocks were perfectly aligned, the FF2_in fall would have occurred later and satisfied the hold requirement of the clk2 flop, and we would not have captured incorrect data. In our example, the skew is exacerbating the hold issue.

Figure S11. Hold/min delay requirement for a flop.

Question S12): Is it acceptable to sign off a design that still has hold violations?

Answer S12): No, you cannot sign off the design if there are hold violations, because hold violations are functional failures. Setup violations are frequency dependent: you can reduce the frequency and prevent setup failures. Hold violations caused by a race on the same clock edge are frequency independent, and they are functional failures because they may result in the capture of unintended data and place your state machine in an unknown state.

Question S13): What are clock gating setup and hold checks, and why are they necessary?

Answer S13): The purpose of clock gating is to block clock pulses and prevent the clock from toggling. With the aid of an AND gate, an enable signal either masks or unmasks the clock pulses. Because the signal being gated is a clock signal, care must be taken not to alter the shape of the clock pulse being passed through or to introduce any glitches.

Figure S13. Clock gating setup and hold check.

As shown in the figure, the enable signal must be asserted before the clock's rising edge so that the rising edge of the clock is not chopped; this is the clock gating setup, or clock gating default max, check. Likewise, the enable (EN) signal's turning-off (going-away) edge must occur well after the clock's turning-off (going-away) edge so that the pulse is not cut short; this is the clock gating hold, or clock gating default min, check.

Question S14): What determines the highest frequency at which a digital design will operate?

Answer S14): The worst max margin determines the maximum frequency at which a design will work.
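The relationship between worst max margin and maximum frequency can be sketched as follows. This is a simplification that assumes the worst setup slack scales one-for-one with the clock period; the function name and numbers are illustrative.

```python
# If the worst setup slack at period T is S (negative = failing),
# the minimum workable period is T - S, so fmax = 1 / (T - S).
# Hold margin is deliberately absent: it does not limit frequency.

def fmax_ghz(period_ns, worst_setup_slack_ns):
    min_period = period_ns - worst_setup_slack_ns
    return 1.0 / min_period  # 1/ns -> GHz

# Worst path has -0.1 ns slack at a 1.0 ns (1 GHz) clock:
print(f"{fmax_ghz(1.0, -0.1):.3f} GHz")  # ~0.909 GHz is the fastest safe clock
```
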
Hold time is not taken into account in this calculation: setup failure is frequency dependent, while hold failure is not, so hold does not factor into the maximum frequency.

Question S15): One chip returned from manufacturing fails a setup test, and another fails a hold test. Which one is still usable, and why?

Answer S15): Setup failure depends on frequency. If a particular path fails its setup requirement, you can reduce the frequency until setup passes, because at a lower frequency the flop/latch input data has more time to meet the setup requirement. Hence we call setup failure a frequency-dependent failure, and the chip failing setup can still be used at a reduced frequency. Hold failure, by contrast, is not frequency dependent; it is a functional failure. The following figure shows the frequency dependence of setup failure.

Figure S15a. Frequency dependence of setup failure.

The figure shows that setup fails when the circuit is clocked faster. The red dotted vertical lines represent the setup window of the capture flop, before which the D input should have arrived. As D fails setup, the output node OUT becomes metastable and takes some time to stabilize; this metastability could cause problems downstream in the circuit. If the clock is slowed down, D now meets setup at the capture flop. (For simplicity the figure does not depict it, but the launch clock is also slowed; it can be assumed to be the same clock as the capture clock.) Note that the setup window itself is independent of the clock, because it is a property of the capture flop; that is why setup can be met simply by slowing the clock. The following figure demonstrates why decreasing the frequency does not eliminate hold failures.

Figure S15b. Frequency-independent hold failure.

As the figure shows, the hold failure is a data race.
The Q output of the launch flop (LF) goes low because IN does, and this value is supposed to be captured by the capture flop (CF), whose output should go low one clock cycle later. However, because there is almost no lag between Q and D, D goes low during the hold window of the capture flop, violating its hold time. In these circumstances, either the capture flop's output becomes metastable, or the new value of D is immediately captured at the capture flop's output OUT. The figure illustrates the latter: OUT also goes low immediately. The intended behavior was for OUT to go low after one clock cycle, but because of the fast data, the value from input D appeared at OUT during the current clock cycle, giving it the wrong value for that cycle. With OUT incorrect, the downstream logic is in an unknown state. As the bottom part of the waveforms shows, the issue persists even with the slower clock: this is a data race caused by the fast Q-to-D delay, which does not change with clock frequency. Hence hold failures are not frequency dependent.

Question S16): What is the max timing equation?

Answer S16): The best way to understand the max timing equation is to look at the waveforms; please go through the following figure carefully.

Figure S16. Max timing equation.

The diagram illustrates what makes up a setup (max) timing path and all the elements involved in determining the max timing slack. The source clock in the illustration is the clock's original source, which could be a PLL output or another location defined as the source clock's starting point. Clocks that do not originate from a primary chip input or the direct output of a PLL are referred to as derived clocks, virtual clocks, or generated clocks.
The source clock is our master reference and typically marks the beginning, the 0 ps point. Starting from 0 ps, we add the clock network delay from the source clock to the launch flop. Once the launch clock's active edge reaches the launch flop, data is released after the clock-to-Q delay, which we add next. From the flop's Q pin, data travels through cells and wires to the D input pin of the capture flop; since this is the route taken from launch flop to capture flop, it is known as the path delay. The running total now represents the arrival of data at the capture flop's input pin. This sum must be less than or equal to the setup (capture) requirement, depicted in the figure by the vertical dashed red line. Always keep in mind that STA analyzes the worst case, so we use the slowest delays up to the capture flop input. Now consider the capture requirement. Since capture occurs one cycle after launch, we begin with the source clock capture edge, one clock cycle after the launch edge. To this we add the clock network delay from the source clock to the capture flop, which, like the launch clock delay, actually delays the capture clock. For the worst case we use the fastest capture clock delay, because an early-arriving capture clock leaves the least time for data to arrive and meet setup. Once the capture clock arrives at the capture flop, the input data must also satisfy the setup requirement, which forces the data to arrive that much earlier, so we deduct the setup time from the capture requirement. Additionally, we must account for clock uncertainty, because actual clock arrival times may differ due to variation, IR drop, and other factors; we add extra margin for this.
Since this uncertainty is a penalty that forces the data to arrive even earlier, we deduct this amount from the capture requirement as well. The setup check is therefore:

Source launch clock edge (0 ps) + Launch clock network slowest delay + Clock-to-Q slowest delay + Slowest path delay (cell + interconnect) <= Source capture clock edge (one clock cycle) + Capture clock network fastest delay - Setup time - Max clock uncertainty

And:

Max margin/slack = [Source capture clock edge (one clock cycle) + Capture clock network fastest delay - Setup time - Max clock uncertainty] - [Source launch clock edge (0 ps) + Launch clock network slowest delay + Clock-to-Q slowest delay + Slowest path delay (cell + interconnect)]

Question S17): What is the min timing equation?

Answer S17): Let's look at the following waveforms to better understand the min timing equation.

Figure S17. Min timing equation.

The min timing check, or hold time check, basically makes sure that data launched on the launch edge at the launch flop is not inadvertently captured by the capture flop on that same edge, since the launched data is intended to be captured one cycle later, not in the current cycle. As with max timing, the source clock is the master reference, and its start (rising) edge is the start point. The clock travels from there to the launch flop through the launch clock network, so we add up the launch clock network delay. Once the launch clock edge reaches the launch flop, it releases the data at the flop's output after a clock-to-Q delay, which we add, followed by the path delay. The data has now arrived at the capture flop, and it must have arrived after the hold (minimum) time requirement. Next we calculate the hold requirement. On the capture side, we begin with the same clock edge we began with on the launch side. An alternative way to see this: take the setup capture clock edge for the same launch/capture flop timing path, and move back one clock cycle.
Indeed, many timing tools, such as PrimeTime, use this method to determine which edge to check hold against: the tool first determines the setup capture clock edge, one clock cycle after the launch edge, and then traces backwards one clock cycle to find the hold capture edge. To this clock edge we add the hold time requirement, because the input data must remain stable until the capture flop's hold requirement is satisfied. We also include clock uncertainty, because the capture flop's clock edge might arrive later. The launched data must arrive at the capture flop after this hold requirement. Once more, to make the analysis worst case, we use the fastest delays up to the capture flop input and the slowest delay for the capture clock network. The hold check is therefore:

Source launch clock edge (0 ps) + Launch clock network fastest delay + Clock-to-Q fastest delay + Fastest path delay (cell + interconnect) >= Source capture clock edge for hold (the setup capture edge minus one clock period, i.e. 0 ps) + Capture clock network slowest delay + Capture flop library hold time + Hold clock uncertainty

And:

Min margin = [Source launch clock edge (0 ps) + Launch clock network fastest delay + Clock-to-Q fastest delay + Fastest path delay (cell + interconnect)] - [Source capture clock edge for hold (0 ps) + Capture clock network slowest delay + Capture flop library hold time + Hold clock uncertainty]

Question S18): Work a numeric example of the max timing equation.

Answer S18): We know the max timing equation:

Max margin = [Clock cycle + Capture clock network fastest delay - Setup time - Max clock uncertainty] - [0 ps + Launch clock network slowest delay + Clock-to-Q slowest delay + Slowest path delay]

Max margin = [0.5 ns (2 GHz) + 1.1 ns - 0.05 ns - 0.1 ns] - [1 ns + 0.05 ns + 0.5 ns] = [1.45 ns] - [1.55 ns] = -0.1 ns

Since the max margin is negative, the capture flop setup check fails: the clock period is insufficient.

Question S19): What is reset recovery time?

Answer S19): For flip-flops with asynchronous reset pins, the reset's assertion (its active edge) can occur asynchronously to the clock.
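As an aside, the max-margin arithmetic above can be reproduced with a short script. The delay values below follow the worked numbers in this section (0.5 ns period at 2 GHz, 1.1 ns fastest capture network delay, 0.05 ns setup, 0.1 ns uncertainty, 1 ns slowest launch network delay, 0.05 ns clock-to-Q, 0.5 ns path delay); they are example figures, not data from any real library.

```python
# Max margin = capture requirement - data arrival, exactly as in the
# max timing equation: slowest delays on the data side, fastest
# capture clock network, minus setup time and clock uncertainty.

def max_margin(period, cap_net_fast, setup, uncertainty,
               launch_net_slow, clk_to_q_slow, path_slow):
    requirement = period + cap_net_fast - setup - uncertainty
    arrival = 0.0 + launch_net_slow + clk_to_q_slow + path_slow
    return requirement - arrival

m = max_margin(period=0.5, cap_net_fast=1.1, setup=0.05, uncertainty=0.1,
               launch_net_slow=1.0, clk_to_q_slow=0.05, path_slow=0.5)
print(f"max margin = {m:.2f} ns")  # -0.10 ns: setup check fails
```

A negative result reproduces the conclusion above: at this period the setup check at the capture flop is violated.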
Therefore, only the reset signal's assertion (its falling edge, if the reset pin is active low) can occur asynchronously, without the clock's knowledge. However, once the reset has been asserted, it must eventually de-assert to bring the flip-flop out of the reset state, and this de-assertion cannot happen independently of the clock. By the design of these flip-flops, the reset de-assertion must occur a predetermined amount of time before the clock's active edge. This requirement, that reset de-assert before the active edge of the clock, is known as the recovery time; it is very similar to a setup check on data.

Figure S19.

Question S20): What is reset removal time?

Answer S20): Removal time is the counterpart of recovery time: it is exactly the hold-time equivalent of recovery time. Just as reset de-assertion must occur a certain amount of time before the clock's active edge to satisfy recovery time, for removal time the reset must remain asserted for a certain amount of time after the clock's active edge; that is, reset de-assertion must occur a specified time after the clock's active edge to be effective.

Figure S20.

Question S21): Given a setup check from a launch element to a capture element, how does the timing analysis tool decide where to conduct the hold check?

Answer S21): This may seem like a vague question at first, but the key is to understand the timing tool's subsequent behavior. This is mostly relevant to the PrimeTime tool; other STA tools might not use the same approach. One essential fact to keep in mind is that hold checks are always conducted in relation to setup checks: timing tools first determine which clock edges to perform setup checks on, and then infer the hold checks from the setup check. For the analysis that follows, we assume that the launch and capture flops are both rising-edge triggered.
The active clock edge chosen for a setup check is typically the capture clock edge one clock cycle after the launch edge. Figure S21. Hold & Setup clock edges. We know that after an active clock edge is chosen as the launch edge, the setup check is carried out against the capture edge one clock cycle later. Having identified the setup-check edges, the timing tool then examines two scenarios to find the clock edges that should be picked for the actual hold check. First, it determines whether the data launched by the launch clock edge associated with the setup check is held long enough to avoid being accidentally captured by the same edge at the capture flop. This is illustrated by green dotted arrow number 1 in the figure. Then it looks at the second scenario: starting at the capture clock edge that corresponds to the setup check, it ensures that the data released at the launch flop's output by that same clock edge is not unintentionally captured by the capture flop at that edge. This is illustrated by green dotted arrow number 2 in the figure. After considering both possibilities, the timing tool selects the stricter hold check and executes it. Since scenarios 1 and 2 are identical in the figure above, the timing tool would simply choose either one. One clarification about hold checks: a hold check is stricter when the capture edge is very close to the launch edge, because data launched by the launch clock is then more likely to be accidentally captured by the nearby capture edge when it is actually intended for the following capture edge. Conversely, the later the launch edge occurs relative to the capture edge, the lower the likelihood of a hold-time violation: the farther the launch edge falls past the capture edge, the more confidently we can predict that the data it launches will be retained beyond the capture edge.
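The setup-edge selection and hold-check inference just described can be sketched as a toy model. It assumes both clocks are ideal, rising-edge only, phase-aligned at t = 0, and have integer periods (say, in picoseconds); the function is illustrative, not how any real tool works internally, but it also handles launch and capture clocks of different frequencies:

```python
from math import lcm

def worst_case_checks(t_launch, t_capture):
    """Worst-case setup and hold check edges, PrimeTime-style.

    Returns ((setup_launch, setup_capture), (hold_launch, hold_capture))
    for two ideal rising-edge clocks phase-aligned at t=0 with integer
    periods."""
    window = lcm(t_launch, t_capture)
    best = None
    for i in range(window // t_launch + 1):
        launch = i * t_launch
        # Earliest capture edge strictly after this launch edge:
        # worst-case setup is the smallest positive separation.
        capture = (launch // t_capture + 1) * t_capture
        if best is None or capture - launch < best[1] - best[0]:
            best = (launch, capture)
    s_launch, s_capture = best
    # The two hold-check candidates derived from the setup check:
    # 1) same launch edge vs. capture edge one capture-cycle earlier,
    # 2) launch edge one launch-cycle later vs. same capture edge.
    hold1 = (s_launch, s_capture - t_capture)
    hold2 = (s_launch + t_launch, s_capture)
    # The stricter check is the one whose capture edge falls latest
    # relative to its launch edge (hardest to hold past).
    hold = max((hold1, hold2), key=lambda h: h[1] - h[0])
    return (s_launch, s_capture), hold
```

With a 5 ps launch clock and a 10 ps capture clock, the tightest setup check pairs launch edge 5 with capture edge 10, and the stricter derived hold check pairs the next launch edge (10) with the same capture edge (10).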
Question S22): When the launch and capture clocks are not the same frequency, what kind of setup and hold checks are carried out? Answer S22): Let's look at three scenarios. In scenario 1, the launch clock is a multiple of the capture clock and is twice as fast. In scenario 2, the capture clock is a multiple of the launch clock and is twice as fast. In scenario 3, the launch and capture clocks are not multiples of one another. One has to hammer this point deep into one's mind: static timing analysis is a worst-case analysis. Whenever a specific check is made, the tool will always analyze the worst-case scenario. Take the setup check: the STA tool always performs the worst-case setup check. This means that after data is launched at the launch flop by an active clock edge, the tool seeks out the earliest possible active edge at which that data could be captured at the capture flop. In other words, using the launch clock and the capture clock, it finds the smallest separation greater than zero between an active launch edge and an active capture edge (it won't choose the same edge, which is obviously the incorrect edge). And, as we know from the previous question, it derives the hold check from the setup check. Once again, for the hold check it considers two possibilities and selects the worse one. Figure S22. Setup and Hold clock edges. We will assume rising-edge launch and capture flops here. Examining scenario 1, the launch clock is faster than the capture clock. For the setup timing check, the tool will choose the shortest distance between two active edges of the launch and capture clocks. A dotted red line indicates the clock edges and the actual setup check. The setup check is relatively straightforward. After establishing the setup check, the tool considers the two usual candidates for the hold check. The two hold check possibilities are shown with green dotted lines.
Hold check number 1 runs from the setup-check launch edge to the capture-clock edge one clock cycle before the setup capture edge. Hold check number 2 runs from the launch-clock active edge one cycle after the setup-check launch edge to the setup capture edge of the capture clock. Because hold check 2 in the figure is more demanding, the tool will choose it as the hold check. In scenario 2, the launch clock is slower than the capture clock. The analysis is similar to the earlier scenario, but because hold check 1 is more stringent than hold check 2 here, the tool chooses hold check 1. In scenario 3, hold check 2 will be chosen because it is the more stringent of the two. Unless specific overrides or exceptions are given to it, the timing tool will adhere to this worst-case behavior when carrying out timing checks. The timing tool doesn't care about the nature and frequency of the launch and capture clocks. Many times, by carrying out the worst-case check, the tool may perform incorrect checks that deviate from the design intent. We will look at such cases in subsequent questions.

Question S23): Does the static timing analysis tool detect clock domain crossing issues? Answer S23): No, the STA tool does not detect clock domain crossing issues. As mentioned previously, the tool merely attempts to determine the worst-case setup and hold checks between launch and capture edges. The designer has to design for clock domain crossings.

Question S24): How does the lockup latch prevent hold violations? Answer S24): If you are very familiar with the hold time check, or have been studying the hold check waveforms, you will understand that hold time issues begin to occur as soon as the launch and capture clock edges align with, or come very close to, one another.
We know that the hold time concern decreases as the launch and capture edges are spaced further apart, with the launch edge later than the capture edge. When the launch and capture clocks come from the same source and have the same waveform, the greatest distance between an edge of the launch clock and an edge of the capture clock cannot be more than a clock phase (half a period); if you try to push the edges further apart, you simply get closer to an edge on the opposite side. If the falling edge of the clock is the launch edge and the rising edge is the capture edge, the launch and capture edges are a phase apart; as long as the launch edge occurs after the capture edge, we have a phase's worth of margin for the hold check. The same holds true when the rising edge is the launch edge and the falling edge is the capture edge. The important thing is that launch occurs after capture and that they are separated by a clock phase. This is exactly what a lockup latch achieves: the launch edge shifts from rising to falling, while the capture edge stays rising. The launch and capture edges end up as far apart as possible (a clock phase), giving us the best hold time protection, and launch occurs after capture, which is what we want. To better understand this, let's look at the figure below. Figure S24a. Lock up latch. Here, we assume that the launch and capture flops are edge-triggered. Before the lockup latch is added, there is a straightforward setup and hold check, as shown in the figure. This kind of hold check, in which the launch and capture edges are identical, is also known as a race because it resembles a data race; it is problematic because hold violations of this kind may be difficult and expensive to fix. Additionally, if the common point of the launch and capture clocks is far away, there may be quite a bit of clock uncertainty.
This is very typical with a scan or test clock, when the first cell of the subsequent scan chain is in a different clock domain than the last flop of the previous scan chain. There can be large hold violations on such paths. The low-phase lockup latch is transparent throughout the low phase of the clock, so it launches data at the falling edge of the clock. By introducing the lockup latch, we essentially changed the launch edge from rising to falling. As a result, our launch and capture edges are now one clock phase apart, giving us the margin (slack) needed to meet the hold time requirement. Figure S24b. Test clock hold violation. As shown in the figure, there may be a significant amount of uncertainty between testclock_a and testclock_b. If you remember the hold margin equation, the larger the clock uncertainty, the more negative the slack that needs to be fixed. In these circumstances, a lockup latch is added between the two chains to address the hold violation. Figure S24c. One must understand that the inter-scan-chain lockup latch isn't completely free. Because it changes the launch edge from rising to falling, it shortens the setup (maximum) timing path from the original launch flop to the capture flop from a full clock cycle to a half clock cycle (a clock phase). Normally, though, you would only consider adding a lockup latch if you were experiencing hold issues, which indicates a short path delay from the launch flop to the capture flop, and therefore no setup issue to begin with.

Question S25): What would happen if you moved the lockup latch in the previous example from near the launch flop to near the capture flop? Answer S25): The location of the lockup latch matters very much. Adding a lockup latch between two flops essentially splits the timing path into two sections: a timing path from the original launch flop to the lockup latch, and a timing path from the lockup latch to the original capture flop.
There is a reason the timing path from the launch flop to the lockup latch was not a concern for us. Data is launched by the original launch flop at the rising edge of the clock, and the low-phase lockup latch closes (captures) on that same edge. Because the lockup latch is timed by the same clock that times the launch flop, this is unlikely to be a hold time issue. In fact, placing the low-phase lockup latch next to the launch flop is crucial: since essentially the same clock net drives both, there will be no hold time issues from the launch flop to the lockup latch, preventing a data race between them. Figure S25a. Timing path split after lockup latch. Although the figure depicts a hold check from the launch flop to the lockup latch, it really is not a problem, because the same clock edge launches the data before capturing it. Many timing tools are aware of this configuration and may not report this hold check; even when they do report it, it should pass. Figure S25b. Wrong lockup latch location. As you can see, once the lockup latch is placed close to the capture flop, the hold violation from the launch flop to the lockup latch becomes the real problem. The two clocks are now different and may originate from different domains, as we saw in the test clock example, and the lockup latch is doing nothing to fix the hold violation. Therefore, it is crucial to install the lockup latch in the proper location with the proper clock.

Question S26): What options do you have for fixing a timing path? Answer S26): There are a number of different options. – Obvious logic optimization. Can you drive with fewer buffers or inverters? If there is a NAND followed by a latch, is a library NAND-latch cell available to replace the pair? Is there a redundant chain of buffers or inverters?
If a subgroup of logic along the path is simply too far away from the launch and capture flops, can you move that logic closer to them? Can you move the launch flop, along with the logic in between, closer to the capture flop, or vice versa? – More pipelining. By adding an extra flop, you add one extra clock cycle along the timing path and harm overall logic throughput; the question is whether the architectural performance targets allow for this. If you move any logic from the current failing path to before the launch flop, you must be careful not to break functionality, and your formal equivalence check against RTL must still pass. For instance, if there is a NAND gate immediately following a launch flop, check whether the other input of the NAND gate, the one not coming from the launch flop in question, has a previous-clock-cycle version available. If so, the NAND gate can be moved before the flop. – Replicate drivers. If a particular stage is running slowly and several receivers are involved, duplicate the driver and divide the receiving gates among the replicated drivers. If a single buffer was driving eight receivers, duplicate the buffer and have each copy drive four receivers. – RTL parallelism. Look for RTL opportunities to convert serial operations to parallel operations. Serial operations take longer because they put more stages into a clock cycle. If we can divide a lengthy serial operation into several shorter parallel operations, we can easily meet timing on each individual operation. – Use of macros. If there is synthesized logic that is really a memory, map it to an SRAM or a register file: RAMs are much faster than synthesized flops. – Synthesized if/elseif/elseif series. If random logic has been synthesized for such an if/elseif chain, the logic can instead be mapped to a passgate 4:1 mux, if such a library cell is available. Such a mux is typically quicker than static gates.
– One-hot state registers rather than binary-coded state registers. If at all possible, switch to one-hot state registers. In a one-hot configuration only one register switches at a time, making the overall operation faster. – Physical design techniques. Can metal wires be promoted to a higher metal layer, or can the wire be made wider? The driver will see more capacitance in either case, so driver strength will need to be increased. Can the spacing between wires be increased to decrease capacitance and speed up wire delays? – Power trade-off techniques. Switch to low-threshold-voltage library cells. With their lower threshold voltage, these cells operate at a higher speed but with higher gate leakage. As you increase speed at the expense of leakage power, you must stay within the chip's overall budget for the use of such devices. You can also use time-borrowing capture flip flops. Such flops have their clock delayed by a certain number of buffer stages; delaying the capture clock helps with setup time, since the capture edge occurs later. However, more devices lead to more leakage and variation, and more clock buffers along the clock path mean more clock toggling and active power.

Question S27): By default, Design Compiler (DC) tries to optimize the path with the worst violation. Can anything be done to make it work on more paths than just the worst one? Answer S27): You can do this by using the group_path command. If one wants DC to focus on a slack range, one can specify critical_range on that group.

Timing Exceptions (Overrides). Question TE1): What are multicycle paths, and why are they different from the default single-cycle paths? Answer TE1): Here is what a single-cycle path really means. Memory elements in digital circuits, such as flip-flops or latches, launch new data at the start of the clock cycle.
Combinational logic carries out the actual computation during the clock cycle; when it finishes, the data is ready and is read into the following memory element at the rising edge of the subsequent clock cycle, which coincides with the end of the current clock cycle. The following figure illustrates this. Figure TE1a. Single cycle (default) timing path. As depicted in the figure, at each rising clock edge the launching flop generates a fresh set of data at its output pin Q. Similarly, the capture flop samples at every rising edge of the clock. The data launched on rising edge "1" (shown in red) is supposed to be captured by capture edge "1" (shown in blue), as can be seen in the figure. Launch edge "2" corresponds to capture edge "2," and so on. This is called a single cycle timing path: from the data's launch to its capture, there is one clock cycle. Timing tools automatically assume that this is how the circuit behaves, so the timing tool carries out the setup check against a capture clock edge one clock cycle after the launch clock edge. But this may not always be the case. It frequently happens that the combinational delay between the launch flop and the capture flop is more than one clock cycle. In these circumstances, it is impossible to keep launching data at the start of every clock cycle and expect to capture correct data at the end of each clock cycle: data launched at the start of a clock cycle simply won't reach the capture flop by the capture edge at the end of that cycle. When this occurs, the circuit designer must take it into account when creating the circuit. If the combinational delay from the launch flop to the capture flop is greater than one clock cycle but less than two, the designer must design the circuit so that data is launched from the launch flop every other clock cycle rather than every clock cycle.
Additionally, the data launched at the start of a clock cycle is captured not after one clock cycle but after two. The following figure depicts this. Figure TE1b. Multi(2) cycle timing path. Let's assume we have a circuit, as shown in the figure, where we know that the combinational delay from the launch flop to the capture flop is more than one clock cycle but comfortably less than two, so that it can meet the setup time requirement in two clock cycles but not in one. In the figure, the data launched at launch clock edge "1" can be seen arriving at the capture flop input (Data to be captured (D)) after roughly one and a half clock cycles. As previously mentioned, timing tools by default assume that all timing paths are one clock cycle long. In other words, if the data was launched at launch edge "1", the timing tool will believe that the data needs to be captured at the capture edge one clock cycle later, shown in the figure with the black rising arrow. The tool therefore checks setup against that black-arrow capture edge by default, and reports that the input data to the capture flop (Data to be captured (D)) fails setup, since it arrives after that capture edge. This check is shown in the figure with a dotted line. In reality, we know that this is a false setup check. The setup check at the capture flop input should instead be performed against the capture edge indicated in blue. As mentioned before, our design here is such that we expect data to take two clock cycles to travel from launch flop to capture flop, and we have designed the circuit so that the launch flop doesn't launch new data every clock cycle but every other clock cycle, as indicated by the red launch edges.
In this situation, we need to give the timing tool an exception, or override, instructing it to delay its default setup check by one clock cycle. To put it another way, we must ask the timing tool for one extra clock cycle for the setup check. This is usually achieved with something like the following: set_multicycle_path 2 -from ... -to ..., where '2' denotes the number of clock cycles. This instructs the timing tool to use two clock cycles rather than the default value of one.

Question TE2): What are false paths? Answer TE2): Static timing analysis is by its very nature exhaustive. A timing tool will thoroughly examine all potential timing paths and run timing checks on them; as a result, it will also carry out timing checks on timing paths that can never actually occur. The best way to understand this is by example. Consider the circuit described below. Figure TE2a. False timing path. This kind of circuit configuration is very common in digital circuits. The test clock is active only in test mode, while the functional clock is active only in functional mode. Consequently, when a timing path begins with the functional clock launching data at the functional flop's (FF) output QF, the receiving flop (RF) should capture it with the functional clock as the capture clock. Because timing tools are exhaustive, however, they will also time a path where the functional clock launches data at the QF output of the functional flop (FF) and it is received at the D input of the receiving flop (RF) via test_clk. This timing path is false and never occurs, because the functional clock and the test clock do not operate simultaneously. When data is launched by the functional clock at the QF output of the functional flop (FF) and received at the D input of the receiving flop (RF), only the functional clock, not the test clock, can sample it. To illustrate this further, examine the following circuit. Figure TE2b. One more false path.
In the figure above, a mux is used to switch between the functional clock and the test clock. In functional mode only the functional clock is active and the test clock is off; in test mode only the test clock is running and the functional clock is off. Only two valid timing paths exist for this circuit. The first is the path where the functional clock launches the data at the launch flop's (LF) Q output and captures it at the capture flop's (CF) D input. The second is the same path with the test clock, i.e. the test clock launches data at the Q output of the launch flop (LF) and captures it at the capture flop (CF) input D. However, because static timing analysis is so thorough, timing tools automatically generate four timing paths: 1) Functional clock launch => Functional clock capture. 2) Functional clock launch => Test clock capture. 3) Test clock launch => Test clock capture. 4) Test clock launch => Functional clock capture. As you can see, only paths 1 and 3 are legitimate, while paths 2 and 4 are false. To address these false paths, an explicit exception or override needs to be provided to the timing tool.

Question TE3): What happens if a multicycle exception is provided only for setup, or maximum time, in PrimeTime? Answer TE3): Generally speaking, it is not sufficient to provide a multicycle exception for max time only in PrimeTime. Since the hold timing check is derived from the setup check, as previously discussed, a multicycle exception for setup modifies the setup check's default behavior, which in turn modifies the default behavior of the hold check. As a result, we also need to provide a multicycle exception for hold time, which serves as a corrective. The following diagram should clarify this. Figure TE3. Multicycle setup only exception problem. As you can see in the figure, the first waveform is entirely consistent with what we examined previously. The second waveform shows a setup-only multicycle exception.
When a multicycle exception with the -setup option is provided in PrimeTime-like tools, the capture edge for the setup check is pushed out by the additional cycles implied by the exception; in the illustration above it is pushed out by one cycle. The actual multicycle exception would have looked like this: set_multicycle_path 2 -setup, where '2' denotes the total number of clock cycles we want to allow for the setup check. Remember that this number is the total cycle count, not an increment over the default of '1': specifying 2 gives the setup check one more clock cycle than the default, for a total of two. Now, as we are aware, hold checks are conducted in relation to the setup check. As shown by the green dotted lines, new hold check options are derived from the new, exception-based setup check. As you can see, both hold checks are violating by about a cycle. Is this violation real? Not typically. We still have back-to-back rising-edge-triggered flops in our design, and their cell and interconnect delay is presumably large enough to call for a multicycle exception. However, since the launched data is intended to be captured at the end of a later clock cycle, we still want the hold check to reflect the normal timing check, making sure the data does not rush through to the same-cycle capture edge. This means that even with the setup multicycle exception, we want the hold check to look like a simple hold check without any exceptions, as in the first set of waveforms. The hold check that the tool inferred from the multicycle setup exception is clearly incorrect. If you look at the incorrect hold checks, you can see that we actually need the launch edges for the hold check to be pushed out by one cycle, which is exactly what a multicycle exception with the -hold option does.
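The edge bookkeeping for multicycle exceptions can be condensed into a few lines. This is a simplified sketch assuming a single ideal clock and rising-edge flops; `check_edges` and its parameter names are illustrative, not a tool API, with the -setup value modeled as a total multiplier and the -hold value as a move-back count, following SDC conventions:

```python
def check_edges(period, setup_mult=1, hold_mult=0, launch_edge=0.0):
    """Setup/hold capture edges a PrimeTime-style tool infers for a
    rising-edge launch/capture flop pair on a single clock.

    setup_mult: total cycles for the setup check
                (set_multicycle_path <N> -setup; default 1)
    hold_mult:  cycles the hold check is moved back toward the launch
                edge (set_multicycle_path <M> -hold; default 0)
    """
    setup_capture = launch_edge + setup_mult * period
    # The hold check is inferred one cycle before the setup capture
    # edge; the -hold value pulls it back further toward launch.
    hold_capture = setup_capture - (1 + hold_mult) * period
    return setup_capture, hold_capture

# Default single-cycle path: setup checked one cycle out, hold at the
# launch edge.  With only "set_multicycle_path 2 -setup", the inferred
# hold check lands a full cycle late (the bogus check described above);
# adding "set_multicycle_path 1 -hold" restores it to the launch edge.
```

For a 1 ns clock: the default gives (1.0, 0.0); a setup-only multiplier of 2 gives (2.0, 1.0), the incorrect hold check; setup 2 plus hold 1 gives (2.0, 0.0), the intended behavior.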
In light of this, we conclude that whenever we use a multicycle exception with the -setup option, we must also add a multicycle exception with the -hold option to move the hold check back to its simple, single-cycle position (with set_multicycle_path, a setup value of N is paired with a hold value of N – 1). The last set of waveforms demonstrates that when both the setup and hold exceptions are in place, we get the proper setup and hold timing checks.

Signal Integrity. Question SI1): What is signal integrity, and why is it important for integrated circuit designers to be aware of it? What is the cross-coupling effect, or crosstalk, on wires? Answer SI1): The cross-coupling effect may cause a signal to lose its integrity, which can lead to the capture of incorrect signal data. On integrated circuits, adjacent wires are separated by insulating material as they are routed. This creates capacitors between the wires, so the way one wire switches affects how neighboring wires switch. If a signal on one wire is rising, moving from low to high, it will pull neighboring wires in the same direction; in the same way, a falling wire can couple its neighbors toward low. This coupling effect may cause a signal to lose its validity and integrity. Figure SI1. Wire to wire coupling. In the figure, wires "a" and "b" are routed next to one another and carry signals "a" and "b," respectively. Let's assume that signal "b" should remain low throughout the observation period. As depicted in the figure, if signal "a" switches from low to high, it will couple onto wire "b". This coupling effect will cause signal "b," which should stay at a low level, to begin rising. Signal "b" won't keep rising for long, though, for two reasons. First, the coupling of signal "a" onto signal "b" is a momentary event: signal "a" only couples onto signal "b" while it is rising, and once signal "a" stops rising, the effect is gone and nothing further drives signal "b" upward.
The second reason is that while signal "b" is low, the NMOS device in signal "b"'s driver is turned on, and as soon as extra charge appears on wire "b" due to neighbor coupling, the NMOS device will discharge wire "b" and pull it back down to the low level. As you can see, if the clock's capture edge for signal "b" happens to fall very close to the coupling glitch, the capture flip-flop will capture the incorrect value for signal "b": it captures a high value, and the signal's integrity has been compromised. The size and duration of the coupling glitch seen on signal "b" are affected by, among other things: – The capacitance between the two wires. – The drive strength of signal "a"'s driver, and signal "a"'s slope or slew rate. – The drive strength of signal "b"'s driver. Keep in mind that this is just one of the events that cause loss of signal integrity; other factors, such as power droop and propagated glitches, can also cause it.

Question SI2): How can a crosstalk issue be resolved? Answer SI2): Layout tools take crosstalk analysis into account when you start routing. You can instruct your physical layout tool to keep routes away from high-frequency nets such as clock trees. Once you have a routed database, you can obtain crosstalk reports and then fix the violations in the layout tool. You could route the nets with jogs so that the parallel run between the victim and the aggressor is comparatively shorter, or you could add stronger buffers on the victim nets (which, of course, can affect your timing).

Question SI3): How is signal integrity (SI) related to timing, and how can timing be affected by SI effects? How is shrinking technology related to cross-coupled capacitance between metal layers?
Answer SI3): As process technology shrinks, wires are packed closer together and become taller relative to their width, so the cross-coupling capacitance between adjacent metal lines grows. As a result, a signal that is actively switching (the aggressor) may cause a victim net's signal to experience "crosstalk delay" or a "crosstalk glitch." Consider a situation where one net is driven by a strong driver and the other net (the victim) is driven by a comparatively weaker one. If the aggressor transitions from low to high while the victim transitions from high to low, the victim will complete its high-to-low transition a little later, because the aggressor momentarily pulls the victim in the same direction as itself, opposing the victim's transition. The victim net's increased high-to-low delay propagates through the logic cone and may cause timing failures. This is called "crosstalk delta delay." In a different scenario, suppose the victim is weakly driven and steady while the aggressor switches from low to high. For a brief period of time, the victim net will be pulled from low toward high, producing a glitch known as a "crosstalk glitch." This glitch can be captured by a flop further down and corrupt a state machine. In either case, the amount of delay or the height of the glitch depends on the cross-coupling capacitance, the strength of the aggressor, and the strength of the victim. The closer the aggressor and victim, the larger the cross-coupling capacitance. The stronger the aggressor, the larger the coupling effect. The stronger the victim's driver, the smaller the coupling effect it experiences, because it can restore the victim net's original logic level and recover from glitches more quickly. One can increase wire spacing to reduce cross-coupling. To minimize the crosstalk timing slowdown or the crosstalk glitch, one can either increase the victim's drive strength or decrease the aggressor's drive strength.

Variation. Question V1): What is on-chip variation? Answer V1): For timing sign-off, digital designers typically simulate circuits at extreme process corners.
Typically, that analysis assumes that gate and interconnect performance is uniform across the chip or die at each corner: all cells and all interconnect across the entire chip are presumed to be at the given worst-case or best-case corner. Unfortunately, that assumption is no longer valid, because in reality it is not the case. Due to the complexity and intricate nature of deep-submicron processes, the variation between devices and interconnect characteristics on the same die can no longer be disregarded. A device of a certain size is expected to operate at a particular speed at a particular process corner, and we would like all same-sized devices on the chip to operate at the same speed. In reality they do not, because different parts of the chip are manufactured with some variation: the actual devices come out with slightly different shapes in different areas of the chip. Manufacturers' libraries typically describe these variations at the worst-case, average-case, and best-case corners. The final die most likely won't contain transistors at the worst-case and best-case speed extremes, but it will show a significant range in effective channel length and transistor width. Due to manufacturing limitations, some devices that were designed to have the same width (size) end up with a slightly different width than intended. The following are some variations across a chip or die that can directly impact timing: – Variations in transistor width, channel length, and threshold voltage (W, Le, Vth). – Interconnect variation. – IR drop variation. – Temperature variation. Some of the major sources of this variation are: – The CMP (chemical-mechanical planarization) process. – Proximity effects in photolithography. – Proximity effects in etch processes. In the CMP (chemical-mechanical planarization) process, the dielectric material is harder than the interconnect material.
After trenches are etched into the dielectric beneath an interconnect layer and filled with copper, the CMP process removes the unwanted copper from the wafer, leaving only wire lines and vias. Because the copper line is softer than the dielectric material, the copper and dielectric are removed unevenly due to "dishing" and erosion. Dishing depends on line width and density, while erosion depends on line space and density. Figure V1a. Dishing and Erosion. As a result of CMP, interconnect thickness varies somewhat randomly, creating a gradient across the wafer. For large die, this gradient shows up both as die-to-die variation and as across-die variation. Proximity effects in photolithography: electron-beam lithography provides the high-resolution patterning necessary for sub-micron technologies; however, electron scattering in the resist and substrate has an undesirable impact on the areas close to where the electron beam is exposed. This is called the proximity effect, and its result is an unintended effect on neighboring circuits. Because these effects depend on pattern and density, which vary across the die, device and interconnect characteristics vary across the chip. Proximity effects in etch processes: reactive ion etching (RIE) of silicon depends on the total exposed area; this is called the loading effect. Similarly, regional variations in pattern density result in regional variations in etch rate. The microloading effect is a related phenomenon brought on by a local depletion of reactive species. Fundamentally, etch rate depends on pattern density, and as pattern density varies across a chip or from die to die, etch rate varies as well, leading to variation. Isolated wires are over-etched, and the wider over-etched trenches produce wider wires. Figure V1b. Isolated wires turn out wider. On-chip variation can be characterized as having a random component and a systematic component.
The random component of critical-parameter variation varies from lot to lot, wafer to wafer, and die to die; examples include changes in implant doses, metal or dielectric thickness, and gate-oxide thickness. The systematic component can be predicted based on location on the wafer or on the characteristics of the surrounding patterns; these variations are related to device proximity effects, device density effects, and relative device distance. Examples include variations in gate channel length or width and in interconnect height and width. Timing impact and delay variation: variations affect the width of a transistor's active area as well as the device's effective channel length. RC-interconnect variation is one of the problems that makes timing analysis difficult. Variations in the critical dimensions of minimum-width lines can occur during mask creation and in the process steps for the active and polysilicon layers. Longer nets may experience significant RC variation due to changes in the width or height of the connecting metal lines: variations in interconnect height and width cause variations in wire resistance and capacitance. As geometries shrink, interconnect delay in deep-submicron processes can frequently dominate total delay, so it is important to model interconnect variation as precisely as possible. Resistive variation in the metal lines causes IR drop, which directly impacts the standard-cell delays. Even though the original intention was to have the same voltage across the chip, different parts of the chip end up with different voltages, because different chip components experience different IR drops due to resistive variations along different paths and possibly different temperatures. Temperature changes also affect electrical behavior, which in turn affects timing. Fortunately, it is uncommon to find corners of the same die operating at extremely different temperatures.
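In timing tools, these on-chip-variation effects are typically absorbed by derating: the launch path is made pessimistically slow and the capture path pessimistically fast. The sketch below is a toy illustration of that idea, not any tool's actual algorithm; the function name, derate factors, and all delay values are hypothetical.

```python
# Toy sketch of OCV derating in a setup check (all values hypothetical, in ns).

def setup_slack(launch_clk, clk2q, data_path, capture_clk, t_setup,
                period, late_derate=1.10, early_derate=0.95):
    """Late-derate the launch side, early-derate the capture side."""
    arrival = (launch_clk + clk2q + data_path) * late_derate
    required = period + capture_clk * early_derate - t_setup
    return required - arrival

slack = setup_slack(launch_clk=0.50, clk2q=0.10, data_path=1.20,
                    capture_clk=0.50, t_setup=0.05, period=2.00)
print(f"setup slack = {slack:.3f} ns")  # a negative slack would be a violation
```

The derate factors model the fact that the launch and capture logic cannot both be assumed to sit at the nominal corner at the same time.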
However, the actual operating temperatures can be affected by the uneven distribution of power on the chip, interconnect heating, and the thermal properties of the die and package materials. Some techniques for reducing variation include selecting cells that are wider than the minimum width. By limiting the clock-tree cells to high-drive cells, you can ensure the least variation and, consequently, the least mismatch among the different subtrees. Clocks. Question C1): What are the primary clock distribution styles used in digital designs? Answer C1): There are two primary clock distribution styles used in digital designs: 1) clock mesh or clock grid distribution, and 2) CTS (clock tree synthesis), or clock tree distribution. The purpose of the clock mesh or clock grid distribution system is to provide a fixed amount of delay from the source of the clock (PLL) to all end receivers of the clock, which are flops, latches, macros, and clock gaters. In other words, it seeks to minimize clock skew across all final clock receivers. The scheme attempts to achieve the same overall delay by balancing the number of stages from the source to all receivers and by having the same stage delay for all stages in the distribution. Additionally, this distribution aims to keep the number of stages to the absolute minimum; this number varies primarily with the die's size, the process technology, etc. Typically, such a distribution can be described as a two-step distribution. The main reason is that a chip is frequently divided into blocks, and the PLL is located within one of the blocks, so there are two levels of distribution: one from the PLL to all block boundaries within the chip, and one from the block boundary to the block interior. The following figure shows the first level of distribution. Figure C2a. Clock distribution from PLL to blocks. The second level of distribution happens inside the block.
Here, a grid or mesh of clock buffers is used to distribute the clock through a fixed number of stages. A symmetric mesh or grid of final clock buffers is created within the region where clock distribution is desired. The idea is that there should be a local clock buffer close to every receiver, regardless of where the flop or final receiver is located within that region. The buffer stage that drives the final clock buffers is again a symmetric mesh or grid, but less dense, because it only needs to drive the symmetrically arranged grid of last-stage drivers. The figure below illustrates the idea of the clock mesh. Figure C2b. Clock mesh distribution. Let's assume the figure represents a region or block in which the final clock receivers (flops), denoted by the letter "F", are dispersed throughout the area; as you can see, the "F"s may be placed essentially at random. The green buffer in the figure is the region's starting point for the distribution: the green buffer stage is the first stage, and there are typically only a few stages of clock buffers driving it from the PLL. The green inverter drives all of the red inverters, which are symmetrically positioned with respect to the green driver to balance delays. The red drivers in turn drive the blue drivers, which are symmetrically positioned in relation to the red drivers for balancing purposes. Finally, the blue drivers drive the nearby flops. This method's goal is to evenly distribute the clock delay from the green driver to all of the flop receivers. The delay to all flop receivers will be very close, but not exactly the same, due to a number of factors such as device variation. To reduce coupling effects and the variation they bring, clock routes are typically shielded. Question C3): How does clock tree distribution differ from clock mesh distribution? Answer C3): The clock tree distribution system differs from clock mesh distribution at the block level in terms of how the clock is distributed to all final clock receivers.
The main distinction is that a clock buffer tree is constructed at the block input, from the source (main) clock driver to all receivers. Figure C3. Clock tree distribution. As you can see, different receivers may be reached through a different number of clock inverter/buffer stages: the goal is to have an optimal number of stages along each branch rather than the same number of stages for all receivers. This alternative is chosen because, although clock mesh provides significantly better skew, it requires more devices to achieve balancing and frequently shorts the outputs of specific stages to reduce skew. Clock tree distribution is typically used for the relatively slower clocks in your design. It should be noted that while clock tree distribution does not entirely disregard clock skew, it does not make an effort to balance clock delays to the receivers. Question C4): Explain the CTS (Clock Tree Synthesis) flow. Answer C4): Clock tree synthesis and clock distribution are frequently confused terms, but they are two different things. Clock tree synthesis is the process of creating the clock tree distribution: the actual clock distribution tree is created in this flow, and the CTS flow's objective is to minimize clock skew and clock insertion delay. Before CTS, timing tools use ideal clock arrival times; after CTS, real clock arrival times are used, because the real clock distribution tree is available. Question C5): What is clock skew, and why is it important in synchronous circuit design? Answer C5): Static timing analysis is made simpler when the clock arrives at all sequential receivers at the same time, and we try to accomplish this with a balanced clock distribution scheme. However, because we can't get exactly the same clock arrival time at all receivers, the clock arrives at different times at different clock receivers in the design.
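This spread in arrival times is what skew measures. As a trivial sketch (the receiver names and arrival values below are made up for illustration), global skew is just the difference between the latest and earliest clock arrivals:

```python
# Clock skew = spread of clock arrival times at the receivers.
# Arrival times below are hypothetical values in ns.
arrivals = {"flop_a": 1.02, "flop_b": 0.98, "flop_c": 1.10, "flop_d": 1.05}

skew = max(arrivals.values()) - min(arrivals.values())
print(f"global clock skew = {skew:.2f} ns")  # 1.10 - 0.98 = 0.12 ns
```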
This difference in clock arrival time at different points is known as "clock skew." The majority of clock skew is unintentional, but it may also be introduced on purpose. Design constraints are one of the primary causes of clock skew: we want the clock to arrive at all receivers at the same time, but it doesn't, for a variety of reasons, including device delay variation due to threshold-voltage and channel-length variation, on-chip device variation, different interconnect/wire delays, interconnect delay variation, temperature variation, capacitive coupling, variable receiver load, and poor clock distribution tree design. In industry, clock skew is commonly referred to as clock uncertainty. Sometimes clock skew is introduced intentionally in the design: by delaying the capturing edge of the clock, at the cost of more power, clock skew can help fix setup violations. The questions on how to fix setup violations contain more detail. Skew can either help or hurt your design. If the clock actually arrives at a sampling element later than anticipated and the data from the previous sampling element is only slightly delayed, new data from the previous element may rush through and unintentionally be captured at the sampling element where the clock arrives late. Conversely, if there is sufficient data delay from the previous sampling element to the current one, a clock arriving late relative to the data can help meet setup requirements at the sampling element. The false data capture caused by fast data and a late clock is shown in the following figure. Figure C5. False data capture resulting from a late clock (clock skew). As you can see, the input to the series of back-to-back flops is "din." Initially "din" is low, and it rises just prior to the second rising clock edge. The first flop must capture the high value of "din," which it successfully does because "din" satisfies the setup time.
Immediately following the rise of clk1 (its second edge), "din1" rises as it follows "din." There is a certain delay between "din1" and "din2," so "din2" follows "din1" after that delay. The second flop is supposed to capture this rising edge of "din2" at the third rising edge of clk2; that is how back-to-back flops in a typical digital synchronous design should operate. However, what actually occurs is that the second flop captures the rising value of "din2" at the second rising edge of the clock, which is the wrong data for the second flop to capture. Subsequent computation and the state machine now enter an unknown state, because the state at the second flop's output is incorrect. This is the kind of damage clock skew can do. As previously stated, we will address deliberate clock skew used to help setup failures in a different question. Question C6): What is clock gating? Answer C6): Clock gating is a power-saving method. In synchronous circuits, a logic gate (AND) is added to the clock net, and its other input can be used to switch the clock off for receiving sequentials that are not in use, saving power. Figure C6. Gated clock. Question C7): Why is clock gating used, and what does it require from a timing perspective? Answer C7): As you can see in the figure, clock gating allows us to mask specific clock pulses; to put it another way, we can regulate clock toggling activity. In most cases, the clock signal's fanout on the chip is very high, and because it typically toggles continuously and drives a lot of elements, it accounts for a significant fraction of the chip's dynamic power consumption. The ability to turn off clock toggling when it is not needed is the most efficient dynamic power-saving mechanism. From a timing perspective, the gating must not introduce glitches or change the shape of the clock pulse; there are setup and hold checks on the gating signal to ensure this. Question C8): What is CRPR (Clock Reconvergence Pessimism Removal)? Answer C8): Static timing analysis is a worst-case analysis.
For setup analysis, it uses the slowest launch clock network delay, the slowest launch flop clock-to-Q delay, and the slowest path delay from the launch flop Q pin to the capture flop D pin. Additionally, it uses the fastest possible clock network delay for the capture clock. In this way it tries to worst-case the whole analysis. These worst-case assumptions are pessimistic if the launch and capture clock networks share a common path, because on the common path the slowest and the fastest delay cannot both occur at the same time. Figure C8. CRPR. For max analysis, the STA tool will do the following: from A through B to L, use the slowest clock network delay; from A through B to C, use the fastest clock network delay. Using the slowest launch delay and the fastest capture delay for the common segment from A to B is overly pessimistic, because that segment can only exhibit one actual delay. The STA tool will therefore give credit for the difference between the slowest delay from A to B and the fastest delay from A to B, and this credit is applied during timing analysis. The same holds for min timing analysis. Metastability. Question M1): What is metastability, and how does it affect flip-flops? Answer M1): Whenever setup or hold time violations occur, a flip-flop may enter a state in which its output is unpredictable; this is known as a metastable state. Understanding the operation of a latch or flop is necessary to comprehend the cause of metastability. A latch (or flop) has an inverter feedback loop that serves as a memory element, as shown in the following figure. Figure M1a. If you remember the inverter voltage transfer curve, it resembles this. Figure M1b. Inverter voltage transfer curve. For the inverter loop in the latch above, the voltage transfer curves superimpose as shown in the figure below. Figure M1c. Inverter loop voltage transfer curve. The stable points for the inverter loop are where the two curves intersect.
As can be seen, it has stable points along the X and Y axes; in those cases, the inverter loop's input is either at "0" or at Vmax. You'll notice that the transfer curves intersect at one more place, which occurs when the inverter loop's input voltage is at Vmax/2. This is the metastable point along the curve: the loop may become stuck here for a very long time. However, this point is not the inverter loop's most stable point; the most stable points are those along the axes where the input is either "0" or Vmax. The loop will eventually converge to one of the most stable points, depending on the sizes of the inverters. When a value needs to be written into the latch, the input value is driven through the latch's input pin "D" while the pass gate is open. Because the input is strongly driven through pin "D," latch node "I" is written to either "0" or "1." If the pass gate is turned off early, while node "I" has not fully reached the "0" level or the Vmax level, the node may become stuck close to Vmax/2, which is the metastable point. To prevent this from happening, the input must be stable for a predetermined amount of time before the pass gate closes; the pass gate closes when the clock arrives. This is known as the setup time of the latch, as we will discover in another question. The metastable state is also called a quasi-stable state. The flip-flop eventually leaves the metastable state and settles to either 1 or 0; the whole process is known as metastability. Question M2): How do you avoid metastability? Answer M2): By making sure that input data complies with setup and hold specifications, we can ensure that metastability is avoided. Occasionally, it is impossible to guarantee that setup and hold requirements will be met, especially when the signal generator uses a different clock domain than the sampling clock. In these circumstances, we allocate additional clock cycles and pair back-to-back flip-flops to sample the data.
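The benefit of that extra flop can be quantified with the classic synchronizer MTBF estimate, MTBF = e^(t_res/tau) / (T0 * f_clk * f_data), where t_res is the time the metastable node gets to resolve. The constants tau and T0 are flop- and process-dependent; the values in this sketch are made up purely for illustration.

```python
import math

# Classic synchronizer MTBF estimate (illustrative, hypothetical constants):
#   MTBF = e^(t_res / tau) / (T0 * f_clk * f_data)

def mtbf(t_res, tau=20e-12, t0=1e-9, f_clk=500e6, f_data=50e6):
    """t_res: time available for the metastable node to resolve, in seconds."""
    return math.exp(t_res / tau) / (t0 * f_clk * f_data)

# One extra flop gives roughly a full clock period (2 ns at 500 MHz) to resolve,
# which improves MTBF exponentially compared to half that settling time:
print(f"MTBF with 2 ns of settling: {mtbf(2e-9):.3e} s")
print(f"MTBF with 1 ns of settling: {mtbf(1e-9):.3e} s")
```

The exponential dependence on settling time is why each additional synchronizer flop buys such a large reliability improvement.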
Such a string of consecutive flops is referred to as a metastability-hardened flop. Figure M2. As shown in the figure, input q to the first flop clocked by clkb changes right as the clock is rising, violating the setup time of this flop. The first flip-flop therefore becomes metastable during the first sampling clock cycle, and we give it a full sampling clock cycle to recover. If the first flop recovers to the correct value within the first clock cycle, we capture the correct value at the output of the second flip-flop at the start of the second clock cycle, and qs is the correctly synchronized value, now available in the clkb domain. If the first flop settles to the incorrect state, we must wait for another cycle, i.e., the beginning of the third sampling clock cycle, to capture the right value. Three flip-flops in series can be used when the first flop occasionally requires more than one sampling clock cycle to recover to a stable value. At the cost of more cycles of latency, adding more flops to the series lowers the likelihood that the output will contain an incorrect value. Question M3): How do you synchronize signals between two clock domains? Answer M3): There are two ways to synchronize between two clock domains: 1) an asynchronous FIFO, 2) a synchronizer. Question M4): Explain the working of a FIFO. Answer M4): FIFOs are used for high-throughput asynchronous data transfer. If high performance is needed when sending data from one domain to another, a simple synchronizer (meta-flop) won't cut it: in a synchronizer you simply wait additional clock cycles until you can guarantee metastability-free operation, and you can't afford to lose those clock cycles. Instead, you build a storage element and a fairly complex handshaking scheme for the control signals to facilitate the transfer. An asynchronous FIFO has two interfaces, one for writing data into the FIFO and the other for reading data out, and two clocks: a write clock and a read clock.
Block A writes data into the FIFO, and block B reads it out. FIFO-full and FIFO-empty signals enable error-free operation; these signals are generated with respect to the corresponding clock. The FIFO-full signal is used by block A (when the FIFO is full, we don't want block A to write data into the FIFO, as that data would be lost), so it is generated in the write-clock domain. Similarly, FIFO-empty is generated in the read-clock domain. Keep in mind that since these domains are asynchronous to each other, these control signals must be synchronized through synchronizers. An asynchronous FIFO is used where performance matters most, when one doesn't want to waste clock cycles on a handshake, and when more resources are available. Here, the read clock is block B's clock and the write clock is block A's clock. Figure M4. Asynchronous FIFO. Question D27): How is FIFO depth/size determined? Answer D27): FIFO size is affected by the write and read data rates, clock skew, and the frequencies of the read and write clock domains. The data rate depends on the operation, frequency, and requirements of the two clock domains. The FIFO must handle the worst case: the maximum data rate for write operations combined with the minimum data rate for read operations. Miscellaneous. Question MI1): Design a flip-flop using a mux. Answer MI1): There is a two-step solution to this problem. A flip-flop is designed master-slave style from two latches: the latches are connected back to back, with the first latch serving as the master and the second as the slave. We can also make a latch using a mux. Thus, to create a flip-flop, we first create a latch using a mux and then connect two such latches back to back in a master-slave configuration. Figure MI1a. Latch using 2:1 MUX. If we tie the output of a 2:1 MUX back to the MUX's D0 pin, then when the select is "0" we hold the output state, as it is fed back to itself through the D0 input of the MUX.
We connect the input "D" to the MUX's D1 pin and the select line to the clock. When the clock (select line) is high, we pass the value on input pin D to the output O. Let's now look at how a flip-flop is constructed using two latches. Figure MI1b. As depicted in the figure, a flip-flop is created by connecting two latches, a low-phase latch and a high-phase latch, back to back. Because the second latch always follows the data that the first latch captures, the first latch is referred to as the master and the second latch as the slave. Input data at the master latch's "D" pin must be set up to the rising edge of the clock, because that is when the master latch closes, i.e., captures the data. Since the master latch is active in the low phase, it is open during the low phase of the clock. It is crucial to understand that the data at input pin "D" of the master latch cannot pass transparently through to the output, because while the master latch is open during the low phase of the clock, the slave latch is closed. When the clock rises, the master latch captures the data from the input "D" pin and transfers it to the master latch's output. At the same time, the slave latch opens and becomes transparent, allowing the data at the slave latch's input to appear on the flip-flop's "Q" pin. As you can see, the output "Q" pin receives a new value from input "D" only when the clock rises. This is how a positive-edge-triggered master-slave flip-flop works. Having learned how to create a latch using a 2:1 MUX and how to create a flip-flop using latches, we can now create a flip-flop using 2:1 MUXes in the manner shown below. Figure MI1c.
Flip-flop with 2:1 MUX. Question MI2): What happens if the D input of a flip-flop is shorted to its clock? Answer MI2): If the clock and D input of a D flip-flop are shorted together and the clock is driven into this shorted input, one can anticipate that the flip-flop will be in a metastable state much of the time: whenever the clock and data input are tied together, the data rises whenever the clock does, so the setup and hold times of the flip-flop are continuously violated, and a flip-flop whose setup and hold times are continuously violated is likely to spend much of its time metastable. Question MI3): How do you fix a latch-to-latch timing violation? Answer MI3): A latch-to-latch setup violation is fixed much like a flop-to-flop path setup violation: you speed up the data path from latch to latch by optimizing the logic depth, speeding up the gate delays, speeding up the wire delays, or a combination of these. Gates can be made larger to speed them up, and wires can be made faster by promoting them to higher layers, increasing their width, or increasing their spacing or shielding. Timing problems can also be resolved by slowing down the sampling clock or speeding up the generating clock. Latch-to-latch hold violations have inherent protection of a phase, or half a clock cycle. Question MI4): What distinguishes a latch from a flip-flop? Answer MI4): A latch is a level-sensitive device, whereas a flip-flop is an edge-sensitive device. A D flip-flop is actually constructed from two back-to-back latches in a master-slave configuration: a low-phase master latch followed by a high-phase slave latch creates a rising-edge-sensitive D flip-flop. Compared to a flip-flop, a latch uses fewer components and consumes less power, but a latch is transparent while enabled and hence more susceptible to passing glitches, while a flip-flop is not. Question MI5): What happens to delay if load capacitance is increased? Answer MI5): Usually, the gate slows down if its load capacitance is increased. Three factors affect device delay: 1) the device's strength, which is determined largely by its width; 2) the input slope or slew rate; and 3) the output load capacitance.
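A first-order RC sketch makes the load and width dependencies concrete (the slew-rate dependence is omitted here). This is a toy model with hypothetical unit values, not a characterized cell model: a wider driver has lower resistance but more self-load, and a larger output load always increases delay.

```python
# Toy first-order gate-delay model (hypothetical unit values):
# delay ~ 0.69 * R_drive * C_total, the step-response RC delay.

def gate_delay(width, c_load, r_unit=1.0, c_self_unit=0.2):
    r_drive = r_unit / width                # wider (stronger) gate -> lower R
    c_total = c_self_unit * width + c_load  # self-load grows with width
    return 0.69 * r_drive * c_total

d1 = gate_delay(width=1.0, c_load=1.0)
d2 = gate_delay(width=1.0, c_load=2.0)  # doubling the load
print(d1 < d2)  # more load capacitance -> more delay
```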
Increasing the strength (the width) increases self-load as well. Question MI6): The STA tool reports a hold violation on the circuit shown; what would you do? Answer MI6): Figure MI6. Hold/min delay requirement for a flop. If you review the earlier question about hold violations, you'll see that a hold violation typically occurs when the generating clock and sampling clock are physically separate clocks and there is a large skew between the clock of the generating flop and the clock of the sampling flop. In this case, we're referring to a hold violation reported by the tool on a timing path that begins at the CLK pin of a flop and ends at the D input pin of the same flop, after passing through the Q pin, a buffer, and a MUX. For this path, the launching CLK edge and the sampling CLK edge are one and the same edge. Given that we are talking about the same CLK edge, this path can never truly violate hold; STA tools frequently have limitations and are not always aware of this. As long as the total delay through CLK -> Q, the buffer, and the MUX is greater than the intrinsic hold requirement of the flop, we will never have a hold violation, because the data is launched by the very edge of CLK against which the hold check is done. Recall from the earlier hold-time question that sequentials (flops or latches) have an intrinsic hold-time requirement, which is typically greater than zero picoseconds. The crucial point is that we are discussing the same CLK edge, so there is no CLK skew and no real hold violation. Question MI7): What is the maximum fanout of a typical CMOS gate? Alternatively, discuss the limiting factors. Answer MI7): For CMOS gates, fanout is the ratio of the load capacitance (the capacitance the gate is driving) to its input gate capacitance. Because capacitance is directly proportional to gate size, the fanout works out to be the ratio of the size of the driven gate to the size of the driver gate.
How far a CMOS gate can fan out is determined by the load capacitance and by how quickly the driving gate can charge and discharge that load. Digital circuits are mainly about the speed-power tradeoff: simply put, the load on a CMOS gate should be within a range that allows the driving gate to charge or discharge it in a reasonable amount of time with a reasonable amount of power dissipation. CMOS gate delay models can be used to determine a typical fanout value. Some CMOS gate models are extremely complex, but luckily there are simplistic delay models that are fairly accurate; to understand this problem, we will use an intentionally simple delay model. Because CMOS transistor I-V curves are not linear, we can't really treat a transistor as a resistor when it is ON, but in a simplified model for understanding we will assume that it behaves like one. The following figure shows an NMOS and a PMOS device. Let's assume that the NMOS device has a unit gate width "W" and that its resistance is "R." If we assume that the mobility of electrons is double that of holes, a P/N ratio of approximately 2/1 gives the same rise and fall delay; in other words, a PMOS device must be twice as wide as an NMOS device to achieve the same resistance R. (With very recent process technologies, the P/N ratio needed for equal rise and fall delays is actually getting close to 1/1.) Because of this, the PMOS device must be "2W" wide to present resistance "R." Figure MI7a. Our model inverter has an NMOS of width "W" and a PMOS of width "2W," with equal rise and fall delays. We know that gate capacitance is directly proportional to gate width. Let's also assume that the gate capacitance for width W is C. This means that our device's NMOS gate capacitance is C and its PMOS gate capacitance is 2C.
For simplicity, let's further assume that the diffusion capacitance of the transistors is zero. Now let an inverter of gate width "W" drive an inverter whose gate width is "a" times the driver's width; this multiplier "a" is our fanout. Given that gate capacitance is directly proportional to gate width, the NMOS gate capacitance of the receiver (load) inverter is a*C and its PMOS gate capacitance is 2a*C. Figure MI7b. Let's now represent this back-to-back inverter pair (a unit-size inverter driving an "a"-size inverter) using the R-and-C-only model. Figure MI7c. Elmore's delay approximation can be used to calculate the delay at the driver output node for this RC circuit. If you recall the Elmore delay model: starting from the first node of interest and moving downstream along the desired path, stop at each node, find the total resistance from the source to that node, multiply it by the node's total capacitance, and sum these R*C products over all nodes. In our circuit there is only one node of interest: the driver inverter's output, at the far end of resistance R. Here the total capacitance on the node is aC + 2aC = 3aC, and the total resistance from the node back to VDD/VSS is R. To find the typical value of the fanout "a," we can construct a chain of back-to-back inverters, as shown in the following circuit. Figure MI7d. Chain of inverters. The goal is to drive load CL through the inverter chain with the least possible delay. Let's assume the first inverter's input capacitance is "C" (unit width, as shown in the figure); after one fanout step the inverter width is "a," and so on. The number of inverters along the path can then be expressed as a function of CL and C, as follows.
Total inverters along the chain: N = log_a(CL/C) = ln(CL/C)/ln(a). Total delay along the chain = (number of inverters along the chain) * (delay of each inverter). As previously found, the delay through a driver inverter with input gate capacitance "C" and fanout ratio "a" is 3aRC, so D = 3*RC*ln(CL/C)*a/ln(a). To find the value of fanout "a" that minimizes the total delay, we take the derivative of D with respect to "a" and set it to zero: dD/da = 3*RC*ln(CL/C)*[(ln(a) - 1)/ln^2(a)] = 0. For this to be true, ln(a) - 1 = 0, which means ln(a) = 1, whose root is a = e. We thus arrive at a fanout of "e" as the ideal fanout for a chain of inverters. Plotting the total delay "D" against "a" for such an inverter chain yields the following. Figure MI7e. Total delay vs. fanout. As you can see in the graph, the shortest delay is obtained with a fanout ratio near e, and the curve is quite shallow around the minimum. Zero diffusion capacitance was one of many simplifying assumptions we made; even with a very accurate inverter delay model, the graph keeps a similar contour, and the delay stays within a few percent of the minimum over a range of fanouts. Because of this, a fanout of 2 to 6 is used in practice, with the ideal being close to "e." One more thing to keep in mind is that we assumed an inverter chain here; in reality, a gate driving a long wire is quite common. The theory nonetheless holds: one simply determines the effective wire capacitance the driving gate sees and uses that to calculate the fanout ratio. Question MI8): What are the different types of RC delay models, and how do tools model RC delays? Answer MI8): RC delays are modeled as Pi models of varying degrees of accuracy.
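Before moving on, the MI7 result is easy to verify numerically under the same simplifying assumptions: the normalized chain delay is proportional to a/ln(a), so a brute-force scan should find its minimum at a = e. A quick sketch:

```python
import math

# Numeric check of the MI7 derivation: normalized chain delay D(a) ~ a / ln(a)
# (constant factors 3*R*C*ln(CL/C) dropped, since they don't move the minimum).

def norm_delay(a):
    return a / math.log(a)

# Scan fanouts from 1.50 to 7.99 in steps of 0.01:
best_delay, best_a = min((norm_delay(i / 100), i / 100) for i in range(150, 800))
print(f"minimum near a = {best_a:.2f}")  # close to e = 2.718...

# The curve is shallow near the minimum:
for a in (2.0, 3.0, 4.0):
    print(f"D({a}) / D(e) = {norm_delay(a) / norm_delay(math.e):.3f}")
```

Note that with diffusion (parasitic) capacitance included, the optimum shifts somewhat above e, which is part of why fanouts up to 4 or more remain common in practice.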
The most popular delay model for an RC network is the Elmore delay model. Figure MI8 (Elmore delay model) illustrates the RC structure if you assume that your RC network is made up of Pi segments of resistance R and capacitance C.

Question MI9: What is the Elmore RC delay equation for this structure?

Answer MI9: For the two-segment network in the figure above:

Total delay at node B = R1C1 + (R1+R2)C2

If this modular structure is extended, the delay at the last node can be represented as:

Total delay at node N = R1C1 + (R1+R2)C2 + (R1+R2+R3)C3 + … + (R1+R2+…+RN)CN

Question MI10: What is Statistical STA?

Answer MI10: Statistical STA is distinguished from deterministic STA; traditional STA is deterministic. Since gate delays and interconnect delays in traditional STA are deterministic values, we can determine with certainty at the conclusion of the analysis whether the circuit will operate at a given frequency. One area where traditional STA falls short is accurately modeling in-die or on-chip variation, which is becoming increasingly significant as geometries rapidly shrink. Traditional STA models in-die or on-chip variation using worst-case analysis, clock uncertainty, clock and data derating, and explicit margins. Since it is impractical to assume that all devices on the die experience the worst-case scenario at once, worst-case analysis tends to be very pessimistic. Moreover, the variation effect tends to average out along clock trees and data branches as stage counts rise. This is where statistical STA comes into the picture: it addresses the pessimism in modeling variation by offering a statistical method for timing analysis. Gate and interconnect delays and primary input arrival times are modeled not as deterministic values but as random variables, and the timing criticality of the circuit is expressed in terms of probability density functions.
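The Elmore formula and the earlier optimal-fanout derivation can both be checked numerically. The sketch below (illustrative values; the function names are ours, not from any tool) computes the Elmore delay of an RC ladder and scans the total chain delay D(a) = 3·RC·ln(CL/C)·a/ln(a) to confirm the minimum sits near a = e:

```python
import math

def elmore_delay(rs, cs):
    """Elmore delay of an RC ladder: sum over nodes of
    (total resistance from the source to that node) * (node capacitance)."""
    delay, r_total = 0.0, 0.0
    for r, c in zip(rs, cs):
        r_total += r          # resistance accumulated from the source
        delay += r_total * c  # this node's R*C contribution
    return delay

# Two-segment ladder from Figure MI8: delay = R1*C1 + (R1+R2)*C2
assert elmore_delay([100.0, 200.0], [1.0, 2.0]) == 100.0 * 1.0 + 300.0 * 2.0

def chain_delay(a, R=1.0, C=1.0, CL=1e4):
    """Total delay of an inverter chain with fanout 'a' driving load CL."""
    n_stages = math.log(CL / C) / math.log(a)  # N = log_a(CL/C)
    return n_stages * 3 * a * R * C            # each stage delays 3aRC

# Scan fanouts from 1.5 to 8.0; the minimum should land near a = e = 2.718...
best = min((a / 100 for a in range(150, 800)), key=chain_delay)
print(round(best, 2))  # close to e
```

Scanning a finer grid, or taking the derivative symbolically, gives exactly a = e; the flatness of the curve around the minimum is why fanouts anywhere from 2 to 6 are acceptable in practice.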
Table of Contents

  - Setup and Hold Time Violations
  - Timing Exceptions (Overrides)
  - Signal Integrity
  - Variation
  - Clocks
  - Metastability
  - Miscellaneous

FAQ

How do you perform a static timing analysis?

STA also considers the following types of paths for timing analysis:
  1. Clock path. A route that goes from a clock input port or cell pin, through one or more buffers or inverters, to the clock pin of a sequential element, used for data setup and hold checks.
  2. Clock-gating path. …
  3. Asynchronous path.

What is slack in static timing analysis?

Slack is the margin by which a timing requirement is met or missed. Positive slack indicates the margin by which the requirement is satisfied; negative slack indicates the margin by which it is violated. For a setup check, slack is the required time minus the arrival time.
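As a minimal illustration (with made-up numbers, not from any real timing report), slack for a setup check is simply required time minus arrival time:

```python
def setup_slack(required_ns, arrival_ns):
    """Setup slack: positive = requirement met with margin, negative = violated."""
    return required_ns - arrival_ns

print(setup_slack(10.0, 8.5))   # 1.5  -> met with 1.5 ns of margin
print(setup_slack(10.0, 11.5))  # -1.5 -> violated by 1.5 ns
```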

How can you avoid setup time violations?

Slowing the clock (increasing the clock period) will remove setup violations, since the data gets more time to arrive before the capturing edge.
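A quick numeric sketch (with hypothetical delay values) of why stretching the period fixes a setup violation: the data launched at one edge must arrive no later than period minus setup time before the next edge.

```python
def setup_met(period_ns, t_cq_ns, t_comb_ns, t_setup_ns):
    """True if data launched at one clock edge arrives t_setup before the next edge."""
    return t_cq_ns + t_comb_ns <= period_ns - t_setup_ns

# Path: 0.2 ns clock-to-q + 4.0 ns combinational logic, 0.1 ns setup requirement
print(setup_met(4.0, 0.2, 4.0, 0.1))  # False -> violation at a 4 ns period
print(setup_met(5.0, 0.2, 4.0, 0.1))  # True  -> fixed by slowing the clock
```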

What do you mean by timing analysis?

Timing analysis is the methodical analysis of a digital circuit to determine whether the timing constraints imposed by components or interfaces are met. This usually means verifying that all setup, hold, and pulse-width requirements are satisfied.
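The setup and hold checks mentioned above can be sketched as follows (illustrative single-cycle checks with hypothetical names; real tools also account for clock skew, derating, and uncertainty). Note that hold is checked against the same edge, so slowing the clock does not fix a hold violation:

```python
def check_timing(period, t_cq, t_comb_max, t_comb_min, t_setup, t_hold):
    """Return (setup_ok, hold_ok) for a simple flop-to-flop path with zero skew."""
    setup_ok = t_cq + t_comb_max + t_setup <= period  # latest arrival vs next edge
    hold_ok = t_cq + t_comb_min >= t_hold             # earliest arrival vs same edge
    return setup_ok, hold_ok

print(check_timing(period=2.0, t_cq=0.1, t_comb_max=1.5, t_comb_min=0.3,
                   t_setup=0.2, t_hold=0.05))  # (True, True)
```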
