cpu-o3: add stats for BPU override bubbles and UFTB pred/update #263
base: xs-dev
* TODO: Ensure `overrideByL1` and `overrideByL2` are accurate when `enableTwoTaken` is enabled, as they are currently accurate only when `enableTwoTaken` is disabled Change-Id: I64d7a3e9586a0db3d12d7b7d1604ebc9d5bdabdc
src/cpu/pred/ftb/ftb.cc
            DPRINTF(FTB, "FTB: Looking up FTB entry index %#lx tag %#lx miss, ftb is not full\n", ftb_idx, ftb_tag);
            break;
        }
    }
    bool ftb_is_full = (ftb[ftb_idx].size() >= numWays);
    if (ftb_is_full) {
        ftbStats.predMissWhenFull++;
        DPRINTF(FTB, "FTB: Looking up FTB entry index %#lx tag %#lx miss, ftb is full\n", ftb_idx, ftb_tag);
    } else {
        ftbStats.predMissWhenNotFull++;
        DPRINTF(FTB, "FTB: Looking up FTB entry index %#lx tag %#lx miss, ftb is not full\n", ftb_idx, ftb_tag);
    }
I prefer this.
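The suggestion above can be tried in isolation. The following is a minimal sketch, not the gem5 code: `FtbMissStats` and `recordFtbMiss` are hypothetical stand-ins for `ftbStats` and the lookup-miss path, and the set occupancy is passed in as a plain count instead of `ftb[ftb_idx].size()`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for the two miss counters in ftbStats.
struct FtbMissStats {
    uint64_t predMissWhenFull = 0;
    uint64_t predMissWhenNotFull = 0;
};

// Classify one lookup miss. `setSize` is the number of valid entries in the
// indexed set and `numWays` is the associativity. Returns true if the set
// was full at the time of the miss.
inline bool recordFtbMiss(FtbMissStats &stats, std::size_t setSize,
                          std::size_t numWays)
{
    const bool ftbIsFull = setSize >= numWays;
    if (ftbIsFull) {
        stats.predMissWhenFull++;
    } else {
        stats.predMissWhenNotFull++;
    }
    return ftbIsFull;
}
```

Computing the occupancy once, outside any loop over the ways, means each miss increments exactly one of the two counters.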
Change-Id: I25fd082476f3a41607de11c96d3f588c3927eff6
9196bb1 to 73a96ec
Change-Id: I4b5669f3fe774c0c8bdf28f00b6aaa894a84cc1b
xs-gem5 does not yet implement sv48 (I will find time to implement it as soon as possible). To avoid hard-to-debug deadlocks caused by slices that use sv48, a function has been added that identifies sv48 addresses from their bit pattern. If the first five addresses appear to be sv48, a warning is issued. Because the check is based on address patterns and is not fully accurate, the address is still sent to the MMU for translation. The warning is only a reminder: if a deadlock occurs, developers may want to check whether the slice is using sv48. Change-Id: Ie5883e7a16860dde592009263e26310729fcf721
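The commit message does not show the detection function, so the following is only a sketch of one way such a pattern check could work. It assumes the check relies on the RISC-V rule that an Sv39 virtual address has bits 63..39 equal to bit 38, while an Sv48 address has bits 63..48 equal to bit 47. All names (`isCanonical`, `looksLikeSv48`, `Sv48Detector`) are hypothetical, and warning only when all of the first five sampled addresses match is one reading of the description.

```cpp
#include <cstdint>

// True if bits [63 : vaBits-1] of `va` are all equal, i.e. `va` is a
// correctly sign-extended virtual address of width `vaBits`.
inline bool isCanonical(uint64_t va, unsigned vaBits)
{
    const uint64_t upper = va >> (vaBits - 1);
    const uint64_t allOnes = ~uint64_t{0} >> (vaBits - 1);
    return upper == 0 || upper == allOnes;
}

// Heuristic: a valid Sv48 address that is not also a valid Sv39 address.
inline bool looksLikeSv48(uint64_t va)
{
    return isCanonical(va, 48) && !isCanonical(va, 39);
}

// Samples the first five addresses (assumed policy) and reports once,
// on the fifth sample, whether all of them looked like Sv48.
struct Sv48Detector {
    static constexpr unsigned kSamples = 5;
    unsigned seen = 0;
    unsigned hits = 0;

    bool observe(uint64_t va)
    {
        if (seen >= kSamples) {
            return false;  // sampling window is over
        }
        ++seen;
        if (looksLikeSv48(va)) {
            ++hits;
        }
        return seen == kSamples && hits == kSamples;
    }
};
```

A caller would emit the warning when `observe` returns true and still forward the address to the MMU, since low Sv48 addresses are indistinguishable from Sv39 ones.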
numBr must be a power of 2 (see getShuffledBrIndex()); the problem was found with valgrind. Change-Id: I6358ac985bae974e356531c5258e7c4f117b1d42
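The body of `getShuffledBrIndex()` is not shown here, so the sketch below only illustrates a typical reason for such a constraint: an index formed by XOR-ing with `pc & (numBr - 1)` stays inside `[0, numBr)` only when `numBr` is a power of two. `shuffledIndex` is a hypothetical example, not the real function.

```cpp
#include <cassert>
#include <cstdint>

// Standard bit trick: a power of two has exactly one bit set.
inline bool isPowerOfTwo(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical XOR shuffle (NOT the actual getShuffledBrIndex()).
// With numBr == 3, brIdx == 1 and (pc & 2) == 2 would yield index 3,
// which is out of bounds; the assert guards against that.
inline unsigned shuffledIndex(unsigned brIdx, uint64_t pc, unsigned numBr)
{
    assert(isPowerOfTwo(numBr));
    assert(brIdx < numBr);
    return brIdx ^ static_cast<unsigned>(pc & (numBr - 1));
}
```

Checking the parameter once at construction time turns a silent out-of-bounds access, the kind valgrind reports, into an immediate failure.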
A larger TAGE reduces MPKI; see pull 241. Change-Id: I671fbb39d85cd0780c6ddaffc19868030c846109
Add init.sh to build DRAMsim3 easily. Update README: add a simple example that runs coremark. Change-Id: Id964442cd5eabfecc0aebb506d784ef5ca3e20c1
Change-Id: I58448dc970fa56cb0fca5b34bd15a990b8e57799
Change-Id: Iafccda7d47bce2a2fe6fbe37b8fc84dc8cde1c2b
If this instruction is cancelled, the associated wake event should be descheduled. Change-Id: I595541aa5f96163350aa5f6e3825f78520a0e660
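A minimal sketch of the idea, assuming wake events are tracked per instruction sequence number. `WakeEvents` is a hypothetical container, not the gem5 event queue. Cancelling an instruction removes its pending wake-up so that it cannot fire later for an instruction that no longer exists.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

class WakeEvents
{
  public:
    // Schedule (or reschedule) a wake-up for instruction `seqNum` at `tick`.
    void schedule(uint64_t seqNum, uint64_t tick) { pending[seqNum] = tick; }

    // Deschedule the wake-up of a cancelled instruction.
    // Returns true if an event was actually pending.
    bool cancel(uint64_t seqNum) { return pending.erase(seqNum) > 0; }

    // Fire every event due at or before `now`; returns the woken sequence
    // numbers in ascending order.
    std::vector<uint64_t> fire(uint64_t now)
    {
        std::vector<uint64_t> woken;
        for (auto it = pending.begin(); it != pending.end();) {
            if (it->second <= now) {
                woken.push_back(it->first);
                it = pending.erase(it);
            } else {
                ++it;
            }
        }
        std::sort(woken.begin(), woken.end());
        return woken;
    }

  private:
    std::unordered_map<uint64_t, uint64_t> pending;  // seqNum -> wake tick
};
```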
Transform the load/store execution logic into a multi-stage pipeline form Change-Id: Iaf7558ad75ed8fe2bbf4a776359db113b6126453
Originally, the fence instruction was dispatched to mem's dispatchQueue, but its opType is No_OpClass, so it had to wait for the integer issue queue IQ2 (IntMisc) to have free entries before it could continue. If the instructions following the fence occupy intIQ2, the fence cannot execute and the CPU gets stuck. Therefore, change the opType of the fence instruction to MemReadOp to prevent this situation (in fact, the fence is not dispatched to an IQ at all). Change-Id: Ie38a901e038db9906c43f78675e69391e847c88b
Now initiateAcc only performs the TLB access and sits at s0 of the load/store pipeline. A load accesses the cache and queries violations at s1, receives the cache response at s2, and writes back at s3. A store updates the sq and queries violations at s1, and writes back at s4. AMO operations are now executed using `executeAmo`. Change-Id: Iac678b7de3a690329f279c70fdcd22be4ed22715
This commit applies only to normal loads. Uncache/AMO loads follow the original process. Change-Id: Idc98ee18a6e94a39774ebba0f772820699b834de
Add a fence before and after the LRSC instruction. Change-Id: I66021d0a5a653d2a7e30cd262166363a84184ed6
Change-Id: Ifc1a586df8beab65772d48a75106155f9e723cba
Adjust the cache-miss load replay logic: replay every load that cannot get its data at load s2, so the cache no longer needs to send `sendCustomSignal` on a miss. Add RAW nuke replay at load s1 and s2. Move most of the writeback logic to load s2 and perform the actual writeback at s3. Change-Id: Idfd3480969958826f4820349168f17c9522f791e
Set `EnableLdMissReplay` to True to replay missed loads from the replayQ. Set `EnablePipeNukeCheck` to True to detect RAW nuke replay in the load pipe. NOTE: if `EnableLdMissReplay` is False, `EnablePipeNukeCheck` cannot be set to True. Change-Id: Ic4235bffba01d5dc4c39cec8ae92f2d27b28d98a
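The dependency between the two options can be written as a single validity check. This is a sketch only: `LsqReplayConfig` and `isValidReplayConfig` are hypothetical names, and the real parameters live in the simulator's Python configuration.

```cpp
// Hypothetical mirror of the two parameters named in the commit message.
struct LsqReplayConfig {
    bool enableLdMissReplay;
    bool enablePipeNukeCheck;
};

// EnablePipeNukeCheck implies EnableLdMissReplay: the only invalid
// combination is nuke check on with miss replay off.
inline bool isValidReplayConfig(const LsqReplayConfig &cfg)
{
    return cfg.enableLdMissReplay || !cfg.enablePipeNukeCheck;
}
```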
Store writes back at S4 by default; when using --ideal-kmhv3, store writes back at S2. Change-Id: I6a318ff6c182daca0ab041840d76575a16e45d82
Change-Id: I5829589df8ca01724ffa4369d23d7e4693e0aea1
Previously, the delay of the write packet operation did not take into account whether the block was ready. In fact, if the block is not ready, the actual timing for the write to return the TimingResp should be delayed. Change-Id: I65de8d47e2f24ad4be867e1867cddee06092f22f
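One way to express the corrected timing is sketched below, assuming the write response is issued a fixed latency after the later of "now" and the time the block becomes ready. `writeRespTick` and its parameters are hypothetical; the actual cache code may compute the delay differently.

```cpp
#include <algorithm>
#include <cstdint>

using Tick = uint64_t;

// If the block is already ready, the response leaves `writeLatency` ticks
// from now; otherwise the latency is counted from the block's ready time.
inline Tick writeRespTick(Tick now, Tick blockReadyTick, Tick writeLatency)
{
    return std::max(now, blockReadyTick) + writeLatency;
}
```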
Currently, at the xbar, in addition to the actual TimingResp, a Hint signal is sent N cycles in advance (N is set by hint_wakeup_ahead_cycles in Caches.py). The Hint signal first queries the MSHR, finds all the associated load instructions, and issues a custom wake-up to each of them. Once a load instruction receives its custom wake-up, it is woken up in the replayQueue. When the woken instructions reach stage s1 or s2 of the load pipeline, data is forwarded to them from the bus. The actual TimingResp keeps the data on the bus until the Dcache has written the data into itself, and then clears it. Change-Id: I8960acc14e95c06d8b1a86220f36a181588ff7f4
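The timing relation between the Hint and the TimingResp can be sketched as follows. Only `hint_wakeup_ahead_cycles` comes from the commit message; `hintTick`, `clockPeriod` and the clamping to the current tick are assumptions made for illustration.

```cpp
#include <cstdint>

using Tick = uint64_t;

// Tick at which the Hint should be sent so that it arrives `aheadCycles`
// cycles before the response at `respTick`. If the response is closer than
// that, the Hint is sent immediately (assumed behaviour).
inline Tick hintTick(Tick now, Tick respTick, unsigned aheadCycles,
                     Tick clockPeriod)
{
    const Tick lead = static_cast<Tick>(aheadCycles) * clockPeriod;
    return (respTick > now + lead) ? respTick - lead : now;
}
```

With a 500-tick clock and a two-cycle lead, a response due at tick 5000 would get its Hint at tick 4000, which gives a woken load time to reach s1 or s2 as the data appears on the bus.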
+ add LdPipeStages and StPipeStages parameters
+ remove redundant storePipeSx code
+ fix dumpLoadStorePipe
Change-Id: Ie8cb7865c3a53265520f11f016dd467c25a3e2a5
be9283c to 817c578
Change-Id: I23b14b776dc0a67a384fbe41efb005aa55adec1c
TODO: Ensure `overrideByL1` and `overrideByL2` are accurate when `enableTwoTaken` is enabled, as they are currently accurate only when `enableTwoTaken` is disabled