virt_mshv_vtl: Proxy irr filtering #609

Open · wants to merge 11 commits into base: main
1 change: 1 addition & 0 deletions Cargo.lock
@@ -2593,6 +2593,7 @@ name = "hcl"
version = "0.0.0"
dependencies = [
"bitfield-struct",
"bitvec",
"build_rs_guest_arch",
"fs-err",
"getrandom",
1 change: 1 addition & 0 deletions openhcl/hcl/Cargo.toml
@@ -27,6 +27,7 @@ fs-err.workspace = true
libc.workspace = true
nix = { workspace = true, features = ["ioctl"] }
bitfield-struct.workspace = true
bitvec = { workspace = true, features = ["std", "atomic"] }
safe_intrinsics.workspace = true
open_enum.workspace = true
getrandom.workspace = true
22 changes: 21 additions & 1 deletion openhcl/hcl/src/ioctl.rs
@@ -26,6 +26,8 @@ use crate::protocol::HCL_VMSA_GUEST_VSM_PAGE_OFFSET;
use crate::protocol::HCL_VMSA_PAGE_OFFSET;
use crate::protocol::MSHV_APIC_PAGE_OFFSET;
use crate::GuestVtl;
use bitvec::prelude::BitArray;
use bitvec::prelude::Lsb0;
use hvdef::hypercall::AssertVirtualInterrupt;
use hvdef::hypercall::HostVisibilityType;
use hvdef::hypercall::HvGpaRange;
@@ -1590,7 +1592,11 @@ impl HclVp {
isolation_type: IsolationType,
) -> Result<Self, Error> {
let fd = &hcl.mshv_vtl.file;
let run = MappedPage::new(fd, vp as i64).map_err(|e| Error::MmapVp(e, None))?;
let run: MappedPage<hcl_run> =
MappedPage::new(fd, vp as i64).map_err(|e| Error::MmapVp(e, None))?;
// SAFETY: Initializing `proxy_irr_blocked` to block all initially
Contributor: That's not how SAFETY comments work. SAFETY comments are intended to explain why the unsafe operation you're performing does not violate any of Rust's safety rules. If you're interested I'd suggest reading chapter 1 of the Nomicon (https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html). But for this case it's probably enough to look up other places where we touch the run page and mimic what they say.
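As a standalone illustration of the shape being asked for, here is a minimal sketch (a hypothetical local array, not the PR's run page) where the SAFETY comment justifies the specific invariants the unsafe block relies on:

```rust
fn main() {
    let mut blocked = [0u32; 8];
    let p = blocked.as_mut_ptr();
    // SAFETY: `p` points to the start of a live, properly aligned `[u32; 8]`
    // owned by this stack frame; we have exclusive access for the duration of
    // this block, and `add(3)` stays within the array's bounds.
    unsafe {
        *p.add(3) = 0xFFFF_FFFF;
    }
    assert_eq!(blocked[3], 0xFFFF_FFFF);
}
```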

let proxy_irr_blocked = unsafe { &mut *addr_of_mut!((*run.0.as_ptr()).proxy_irr_blocked) };
proxy_irr_blocked.fill(0xFFFFFFFF);

let backing = match isolation_type {
IsolationType::None | IsolationType::Vbs => BackingState::Mshv {
@@ -1857,6 +1863,20 @@ impl<'a, T: Backing> ProcessorRunner<'a, T> {
}
}

/// Update the `proxy_irr_blocked` bitmap in the run page
pub fn update_proxy_irr_filter(&mut self, irr_filter: &BitArray<[u32; 8], Lsb0>) {
// SAFETY: `proxy_irr_blocked` is accessed in both user and kernel, but from current VP
Contributor: I'd suggest saying something like "SAFETY: proxy_irr_blocked is not accessed by any other VPs, so we know we have exclusive access".

Member: I don't think this is true. The kernel will access it in an interrupt context (only from the current VP, yes), but we still need to use at least relaxed atomic reads/writes to access the values to satisfy the Rust and Linux kernel memory models.

Author (@kmehtaintel, Jan 9, 2025): Ah, that's right, thanks for catching this. OpenVMM has a U/K (user/kernel) model, so yes, I agree: the same VP running in user context can be interrupted, and then the kernel may access this field. Will move to atomics, the way it has been done when accessing proxy_irr.
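For illustration, a standalone sketch of that approach (hypothetical function and storage, not the PR's run-page layout): fill an all-blocked bitmap, then clear the allowed vectors, using relaxed atomic operations throughout.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Block every vector, then allow (clear) the given ones. In the real code the
/// storage would live in the mmap'd run page shared with the kernel, not in a
/// local array.
fn block_all_then_allow(blocked: &[AtomicU32; 8], allowed_vectors: &[u8]) {
    for word in blocked {
        word.store(!0u32, Ordering::Relaxed);
    }
    for &v in allowed_vectors {
        let mask = 1u32 << (v % 32);
        blocked[v as usize / 32].fetch_and(!mask, Ordering::Relaxed);
    }
}

fn main() {
    let blocked: [AtomicU32; 8] = std::array::from_fn(|_| AtomicU32::new(0));
    block_all_then_allow(&blocked, &[0x30, 0x51]);
    // Vector 0x30 (word 1, bit 16) is now allowed; everything in word 0 stays blocked.
    assert_eq!(blocked[1].load(Ordering::Relaxed) & (1 << 16), 0);
    assert_eq!(blocked[0].load(Ordering::Relaxed), !0u32);
}
```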

// By default block all (i.e. set all), and only allow (unset) given vectors
let proxy_irr_blocked =
unsafe { &mut *addr_of_mut!((*self.run.as_ptr()).proxy_irr_blocked) };
proxy_irr_blocked.fill(0xFFFFFFFF);

for irr_bit in irr_filter.iter_ones() {
tracing::debug!(irr_bit, "update_proxy_irr_filter");
proxy_irr_blocked[irr_bit >> 5] &= !(1 << (irr_bit & 0x1F));
Contributor: Can we get a comment explaining this math? Why are we shifting things?

Member: The compiler can be relied upon to strength-reduce a constant power-of-two division/remainder to a shift/and, so I'd probably write this as proxy_irr_blocked[irr_bit / 32] &= !(1 << (irr_bit % 32)) to make this clearer.
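For reference, the two spellings are equivalent; a tiny standalone check:

```rust
fn main() {
    // irr_bit >> 5 is irr_bit / 32, and irr_bit & 0x1F is irr_bit % 32:
    // each interrupt vector maps to one bit inside an array of 32-bit words.
    for irr_bit in 0..256usize {
        assert_eq!(irr_bit >> 5, irr_bit / 32);
        assert_eq!(irr_bit & 0x1F, irr_bit % 32);
    }
}
```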

Member: Having said that, it seems like you can just directly store irr_filter.into_inner().map(|v| !v) into proxy_irr_blocked rather than do this iteration. That seems a lot simpler (maybe just a little more complicated once you switch to using atomics).

I'd also suggest just taking a &[u32; 8] as a parameter rather than adding BitArray to the public interface for this crate.
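A minimal standalone sketch of that suggestion (assuming bitvec 1.x; an illustrative example, not the PR's code), deriving the blocked mask from the allow filter in one step:

```rust
use bitvec::prelude::{BitArray, Lsb0};

fn main() {
    // Build an "allow" filter with vectors 0x30 and 0x51 set ...
    let mut irr_filter: BitArray<[u32; 8], Lsb0> = BitArray::new([0u32; 8]);
    irr_filter.set(0x30, true);
    irr_filter.set(0x51, true);

    // ... and invert it wholesale: every allowed bit becomes a cleared bit in
    // the blocked mask, and everything else stays blocked.
    let blocked: [u32; 8] = irr_filter.into_inner().map(|v| !v);

    assert_eq!(blocked[1] & (1 << 16), 0); // vector 0x30 allowed
    assert_eq!(blocked[2] & (1 << 17), 0); // vector 0x51 allowed
    assert_eq!(blocked[0], !0u32);         // everything else blocked
}
```

The inverted array could then be passed to the hcl crate as a plain &[u32; 8], keeping BitArray out of its public interface.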

}
}

/// Gets the proxy_irr_exit bitmask. This mask ensures that
/// the masked interrupts always exit to user-space, and cannot
/// be injected in the kernel. Interrupts matching this condition
47 changes: 46 additions & 1 deletion openhcl/virt_mshv_vtl/src/lib.rs
@@ -24,6 +24,9 @@ cfg_if::cfg_if!(
use processor::tdx::TdxBackedShared;
use std::arch::x86_64::CpuidResult;
use virt::CpuidLeaf;
use bitvec::prelude::BitArray;
use bitvec::prelude::Lsb0;
type IrrBitmap = BitArray<[u32; 8], Lsb0>;
} else if #[cfg(target_arch = "aarch64")] { // xtask-fmt allow-target-arch sys-crate
pub use crate::processor::mshv::arm64::HypervisorBackedArm64 as HypervisorBacked;
use hvdef::HvArm64RegisterName;
@@ -223,6 +226,10 @@ struct UhPartitionInner {
no_sidecar_hotplug: AtomicBool,
use_mmio_hypercalls: bool,
backing_shared: BackingShared,
#[inspect(skip)]
Contributor: This feels like something we'd want wired up to inspect.

#[cfg(guest_arch = "x86_64")]
// N.B. For now there is only one device vector table, i.e. for VTL0 only
device_vector_table: RwLock<IrrBitmap>,
}

#[derive(Inspect)]
@@ -628,7 +635,8 @@ struct WakeReason {
message_queues: bool,
hv_start_enable_vtl_vp: bool,
intcon: bool,
#[bits(28)]
update_proxy_irr_filter: bool,
#[bits(27)]
_reserved: u32,
}

@@ -638,6 +646,8 @@ impl WakeReason {
const MESSAGE_QUEUES: Self = Self::new().with_message_queues(true);
const HV_START_ENABLE_VP_VTL: Self = Self::new().with_hv_start_enable_vtl_vp(true); // StartVp/EnableVpVtl handling
const INTCON: Self = Self::new().with_intcon(true);
#[cfg(guest_arch = "x86_64")]
const UPDATE_PROXY_IRR_FILTER: Self = Self::new().with_update_proxy_irr_filter(true);
}

/// Immutable access to useful bits of Partition state.
@@ -749,6 +759,39 @@ impl UhPartitionInner {
self.vps.get(index.index() as usize)
}

/// Used by the requesting VP to issue a `proxy_irr_blocked` update to the other VPs
#[cfg(guest_arch = "x86_64")]
fn request_proxy_irr_filter_update(&self, vtl: GuestVtl, device_vector: u8, req_vp_index: u32) {
tracing::debug!(
?vtl,
device_vector,
req_vp_index,
"request_proxy_irr_filter_update"
);

// Add given vector to partition global device vector table (VTL0 only for now)
{
let mut device_vector_table = self.device_vector_table.write();
device_vector_table.set(device_vector as usize, true);
}

// Wake all other VPs for their `proxy_irr_blocked` filter update
for vp in self.vps.iter() {
if vp.cpu_index != req_vp_index {
vp.wake(vtl, WakeReason::UPDATE_PROXY_IRR_FILTER);
}
}
}

/// Get the current partition-global device IRR vectors (VTL0 only for now)
#[cfg(guest_arch = "x86_64")]
fn get_device_vectors(&self, _vtl: GuestVtl, irr_vectors: &mut IrrBitmap) {
let device_vector_table = self.device_vector_table.read();
for idx in device_vector_table.iter_ones() {
irr_vectors.set(idx, true);
}
}

fn inspect_extra(&self, resp: &mut inspect::Response<'_>) {
let mut wake_vps = false;
resp.field_mut(
@@ -1609,6 +1652,8 @@ impl<'a> UhProtoPartition<'a> {
no_sidecar_hotplug: params.no_sidecar_hotplug.into(),
use_mmio_hypercalls: params.use_mmio_hypercalls,
backing_shared: BackingShared::new(isolation, BackingSharedParams { cvm_state })?,
#[cfg(guest_arch = "x86_64")]
device_vector_table: RwLock::new(IrrBitmap::new(Default::default())),
});

if cfg!(guest_arch = "x86_64") {
12 changes: 12 additions & 0 deletions openhcl/virt_mshv_vtl/src/processor/hardware_cvm/mod.rs
@@ -723,6 +723,18 @@ impl<T: CpuIo, B: HardwareIsolatedBacking> UhHypercallHandler<'_, '_, T, B> {
multicast: bool,
target_processors: &[u32],
) -> HvResult<()> {
// Before dispatching retarget_device_interrupt, add the device vector to the
// partition-global device vector table and issue a `proxy_irr_blocked` filter
// wake request to the other VPs.
self.vp.partition.request_proxy_irr_filter_update(
self.intercepted_vtl,
vector as u8,
self.vp.vp_index().index(),
);

// Update `proxy_irr_blocked` for this VP itself
self.vp.update_proxy_irr_filter(self.intercepted_vtl);

self.vp.partition.hcl.retarget_device_interrupt(
device_id,
hvdef::hypercall::InterruptEntry {
49 changes: 49 additions & 0 deletions openhcl/virt_mshv_vtl/src/processor/mod.rs
@@ -23,6 +23,8 @@ cfg_if::cfg_if! {
use virt::x86::MsrError;
use virt_support_apic::LocalApic;
use virt_support_x86emu::translate::TranslationRegisters;
use bitvec::prelude::BitArray;
use bitvec::prelude::Lsb0;
} else if #[cfg(guest_arch = "aarch64")] {
use hv1_hypercall::Arm64RegisterState;
use hvdef::HvArm64RegisterName;
@@ -934,6 +936,14 @@ impl<'a, T: Backing> UhProcessor<'a, T> {
}
}
}

#[cfg(guest_arch = "x86_64")]
if wake_reasons.update_proxy_irr_filter()
&& self.partition.isolation.is_hardware_isolated()
Contributor: Should this wake reason ever get set on non-isolated?

Author: Currently we are only issuing this wake request from impl<T: CpuIo, B: HardwareIsolatedBacking> UhHypercallHandler<'_, '_, T, B>::retarget_physical_interrupt, so this will never be set on non-isolated.

Contributor: Do we need to check that we're isolated in this if, then?

Author: I think I agree it's a duplicate; we can get rid of it. I was just making it more explicit.

Member: You can put an assertion if you want.
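A tiny sketch of that option (stand-in booleans rather than the PR's types): keep the wake-reason check and assert the invariant instead of re-checking it.

```rust
fn handle_wake(update_proxy_irr_filter: bool, is_hardware_isolated: bool) {
    if update_proxy_irr_filter {
        // The wake reason is only ever requested on hardware-isolated
        // partitions, so this should hold by construction.
        debug_assert!(is_hardware_isolated);
        // update_proxy_irr_filter(vtl) would run here in the real code.
    }
}

fn main() {
    handle_wake(true, true);
}
```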

{
// update `proxy_irr_blocked` filter
self.update_proxy_irr_filter(vtl);
}
}

Ok(wake_reasons_vtl.map(|w| w.intcon()).into())
Expand Down Expand Up @@ -966,9 +976,26 @@ impl<'a, T: Backing> UhProcessor<'a, T> {
#[cfg(guest_arch = "x86_64")]
fn write_msr(&mut self, msr: u32, value: u64, vtl: GuestVtl) -> Result<(), MsrError> {
if msr & 0xf0000000 == 0x40000000 {
// If the MSR being written is a synic SINT MSR, check whether the new value
// or the previous value is a proxy SINT; in either case we need to update the
// `proxy_irr_blocked` filter.
let mut irr_filter_update = false;
if matches!(msr, hvdef::HV_X64_MSR_SINT0..=hvdef::HV_X64_MSR_SINT15) {
Contributor: If we change the order of these ifs, can we then merge the two ifs checking hv?

if let Some(hv) = self.backing.hv(vtl).as_ref() {
let sint_curr =
HvSynicSint::from(hv.synic.sint((msr - hvdef::HV_X64_MSR_SINT0) as u8));
let sint_new = HvSynicSint::from(value);
if sint_curr.proxy() || sint_new.proxy() {
irr_filter_update = true;
}
}
}
if let Some(hv) = self.backing.hv_mut(vtl).as_mut() {
let r = hv.msr_write(msr, value);
if !matches!(r, Err(MsrError::Unknown)) {
// SINT updated; check whether a proxy filter update is required
if irr_filter_update {
self.update_proxy_irr_filter(vtl);
}
return r;
}
}
Expand Down Expand Up @@ -1128,6 +1155,28 @@ impl<'a, T: Backing> UhProcessor<'a, T> {

self.request_sint_notifications(vtl, pending_sints);
}

#[cfg(guest_arch = "x86_64")]
fn update_proxy_irr_filter(&mut self, vtl: GuestVtl) {
let mut irr_bits: BitArray<[u32; 8], Lsb0> = BitArray::new(Default::default());

// Collect the vectors of all proxy SINTs that are not masked
if let Some(hv) = self.backing.hv(vtl).as_ref() {
for sint in 0..NUM_SINTS as u8 {
let sint_msr = hv.synic.sint(sint);
let hv_sint = HvSynicSint::from(sint_msr);
if hv_sint.proxy() && !hv_sint.masked() {
irr_bits.set(hv_sint.vector() as usize, true);
}
}
}

// Get all device vectors
self.partition.get_device_vectors(vtl, &mut irr_bits);

// Update `proxy_irr_blocked` filter in run page
self.runner.update_proxy_irr_filter(&irr_bits);
}
}

fn signal_mnf(dev: &impl CpuIo, connection_id: u32) {
1 change: 0 additions & 1 deletion openhcl/virt_mshv_vtl/src/processor/snp/mod.rs
@@ -396,7 +396,6 @@ impl BackingPrivate for SnpBacked {
// TODO SNP GUEST VSM supporting VTL 1 proxy irrs requires kernel changes
if vtl == GuestVtl::Vtl0 {
if let Some(irr) = this.runner.proxy_irr() {
// TODO SNP: filter proxy IRRs.
this.backing.cvm.lapics[vtl]
.lapic
.request_fixed_interrupts(irr);
1 change: 0 additions & 1 deletion openhcl/virt_mshv_vtl/src/processor/tdx/mod.rs
@@ -883,7 +883,6 @@ impl UhProcessor<'_, TdxBacked> {
// Check for interrupt requests from the host and kernel IPI offload.
// TODO TDX GUEST VSM supporting VTL 1 proxy irrs requires kernel changes
if vtl == GuestVtl::Vtl0 {
// TODO TDX: filter proxy IRRs by setting the `proxy_irr_blocked` field of the run page
if let Some(irr) = self.runner.proxy_irr() {
// We can't put the interrupts directly on the APIC page because we might need
// to clear the tmr state. This can happen if a vector was previously used for a level