Log fallback due to lack of resources as info rather than error. #590

Open: wants to merge 2 commits into main

vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs (31 changes: 24 additions, 7 deletions)
@@ -98,6 +98,12 @@ pub enum RestoreError {
     InvalidData,
 }
 
+#[derive(Debug, Error)]
+pub enum IOError {
+    #[error("no more io queues available")]
+    NoMoreIoQueues,
+}
+
 #[derive(Inspect)]
 struct IoQueue {
     queue: QueuePair,
@@ -797,12 +803,23 @@ impl<T: DeviceBacking> DriverWorkerTask<T> {
                     .find_map(|(i, issuer)| issuer.get().map(|issuer| (i, issuer)))
                     .unwrap();
 
-                tracing::error!(
-                    cpu,
-                    fallback_cpu,
-                    error = err.as_ref() as &dyn std::error::Error,
-                    "failed to create io queue, falling back"
-                );
+                // Error due to running out of IO queues as a result of
+                // lack of hardware resources should be logged as INFO
+                if err.downcast_ref::<IOError>().is_some() {
+                    tracing::info!(
+                        cpu,
+                        fallback_cpu,
+                        error = err.as_ref() as &dyn std::error::Error,
+                        "failed to create io queue, falling back"
+                    );
+                } else {
+                    tracing::error!(
+                        cpu,
+                        fallback_cpu,
+                        error = err.as_ref() as &dyn std::error::Error,
+                        "failed to create io queue, falling back"
+                    );
+                }
                 fallback.clone()
             }
         };
@@ -819,7 +836,7 @@ impl<T: DeviceBacking> DriverWorkerTask<T> {
         cpu: u32,
     ) -> anyhow::Result<IoIssuer> {
Contributor

I'll leave it up to the storage folks to decide whether to take this, but I think our preferred approach would be to convert these functions to return full thiserror enums covering all the different cases and then just match on those, instead of using downcast_ref.
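A minimal sketch of that suggestion, assuming hypothetical variant names (the driver's real failure cases are not shown in this diff). The thiserror derive and its #[error("...")] messages are left out to keep it std-only; `Level` is a hypothetical stand-in for choosing between tracing::info! and tracing::error!.

```rust
// Hypothetical error enum for create_io_queue; variant names are illustrative.
// With thiserror, each variant would also carry an #[error("...")] message.
#[derive(Debug)]
pub enum CreateIoQueueError {
    // Device limit reached: expected on hardware that exposes few queues.
    NoMoreIoQueues { max: u16 },
    // Illustrative stand-ins for other failure paths.
    AdminCommandFailed(String),
    QueueAllocationFailed,
}

// Hypothetical stand-in for the two tracing macros.
#[derive(Debug, PartialEq, Eq)]
pub enum Level {
    Info,
    Error,
}

// Exhaustive match with no `_` arm: adding a variant later fails to compile
// until its log level is chosen, which a downcast_ref check cannot enforce.
pub fn fallback_log_level(err: &CreateIoQueueError) -> Level {
    match err {
        CreateIoQueueError::NoMoreIoQueues { .. } => Level::Info,
        CreateIoQueueError::AdminCommandFailed(_)
        | CreateIoQueueError::QueueAllocationFailed => Level::Error,
    }
}
```

The design point is the missing wildcard arm: the compiler, not a runtime type check, keeps the severity mapping complete.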

Contributor

Agreed. Many thanks for the contribution, @EMancin3, and for the feedback here, @smalis-msft.

I am in favor of making a code change to reduce the severity of this message, and I also agree with Steven's feedback. (Though I'm confident in the storage space, I'm still learning Rust, so I'm open to discussion.)

I would suggest we generalize this:

#[derive(Debug, Error)]
pub enum DeviceCapabilityError {
    #[error("no more io queues available, max {0}")]
    NoMoreIoQueues(usize),
}
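For reference, #[error("... max {0}")] on a tuple variant has thiserror generate a Display impl in which {0} formats the variant's first field, plus an std::error::Error impl. A hand-written, std-only approximation of what the derive produces for this enum (a sketch, not thiserror's literal expansion):

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum DeviceCapabilityError {
    // The queue limit travels with the error so the message can report it.
    NoMoreIoQueues(usize),
}

// Roughly what #[error("no more io queues available, max {0}")] generates:
// `{0}` refers to the variant's first tuple field.
impl fmt::Display for DeviceCapabilityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeviceCapabilityError::NoMoreIoQueues(max) => {
                write!(f, "no more io queues available, max {}", max)
            }
        }
    }
}

// #[derive(Error)] also implements std::error::Error; with no #[source] or
// #[from] field, source() keeps its default of None.
impl Error for DeviceCapabilityError {}
```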

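Since anyhow::Result<T> is shorthand for Result<T, anyhow::Error>, and anyhow::Error can be built from any std::error::Error + Send + Sync + 'static type (so ? still works in callers that return anyhow::Result), we should be able to return a concrete error type instead and then do: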

    async fn create_io_queue(
        &mut self,
        state: &mut WorkerState,
        cpu: u32,
    ) -> Result<IoIssuer, Error> {

...

        if self.io.len() >= state.max_io_queues as usize {
            // .into() also works if Error is a wider enum with From<DeviceCapabilityError>
            return Err(DeviceCapabilityError::NoMoreIoQueues(state.max_io_queues as usize).into());
        }

and then match on the error type in the calling code.
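A sketch of the calling side under these assumptions: create_io_queue returns a hypothetical wider enum, CreateQueueError, that wraps DeviceCapabilityError through a From impl, and the caller falls back to an existing queue on any error while choosing the log level by pattern matching. The names, the simulated `create_io_queue`, and the `fallback_qid` parameter are illustrative, not the driver's API.

```rust
// Capability error as suggested above (Display/Error impls omitted here).
#[derive(Debug)]
pub enum DeviceCapabilityError {
    NoMoreIoQueues(usize),
}

// Hypothetical wider error for create_io_queue; `Other` stands in for
// whatever else can fail (queue allocation, admin commands, ...).
#[derive(Debug)]
pub enum CreateQueueError {
    Capability(DeviceCapabilityError),
    Other(String),
}

impl From<DeviceCapabilityError> for CreateQueueError {
    fn from(e: DeviceCapabilityError) -> Self {
        CreateQueueError::Capability(e)
    }
}

// Hypothetical stand-in for tracing::info! vs tracing::error!.
#[derive(Debug, PartialEq, Eq)]
pub enum Level {
    Info,
    Error,
}

// Simulated create_io_queue: fails once `in_use` reaches `max`.
fn create_io_queue(in_use: usize, max: usize, fail_other: bool) -> Result<u16, CreateQueueError> {
    if fail_other {
        return Err(CreateQueueError::Other("admin command failed".to_string()));
    }
    if in_use >= max {
        // .into() converts via the From impl above.
        return Err(DeviceCapabilityError::NoMoreIoQueues(max).into());
    }
    Ok(in_use as u16 + 1)
}

// Calling code: returns the queue id to use and the level to log at, if any.
pub fn issuer_for_cpu(in_use: usize, max: usize, fail_other: bool, fallback_qid: u16) -> (u16, Option<Level>) {
    match create_io_queue(in_use, max, fail_other) {
        Ok(qid) => (qid, None),
        // Out of device queues: expected on small devices, so log as info.
        Err(CreateQueueError::Capability(DeviceCapabilityError::NoMoreIoQueues(_))) => {
            (fallback_qid, Some(Level::Info))
        }
        // Anything else is unexpected: log as error, but still fall back.
        Err(CreateQueueError::Other(_)) => (fallback_qid, Some(Level::Error)),
    }
}
```

Matching through the nested pattern keeps the "lack of resources" case explicit without any runtime type inspection.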

Contributor

Agree with @mattkur.

         if self.io.len() >= state.max_io_queues as usize {
-            anyhow::bail!("no more io queues available");
+            anyhow::bail!(IOError::NoMoreIoQueues);
         }
 
         let qid = self.io.len() as u16 + 1;
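For comparison with the typed-enum suggestion, the approach currently in this PR relies on anyhow::bail!(IOError::NoMoreIoQueues) keeping the IOError value inside the anyhow::Error, so that err.downcast_ref::<IOError>() can find it later. The old bail!("no more io queues available") produced a message-only error that never downcasts to IOError. A minimal std-only sketch of that check, assuming Box<dyn Error + Send + Sync> as a stand-in for anyhow::Error and a hypothetical `Level` enum in place of the tracing macros:

```rust
use std::error::Error;
use std::fmt;

// Stand-in for the PR's IOError (the PR derives Display/Error via thiserror).
#[derive(Debug)]
pub enum IOError {
    NoMoreIoQueues,
}

impl fmt::Display for IOError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IOError::NoMoreIoQueues => write!(f, "no more io queues available"),
        }
    }
}

impl Error for IOError {}

// Hypothetical stand-in for tracing::info! vs tracing::error!.
#[derive(Debug, PartialEq, Eq)]
pub enum Level {
    Info,
    Error,
}

// `dyn Error + Send + Sync` plays the role of anyhow::Error here; both
// support downcast_ref::<T>() to recover a concrete error type.
pub fn fallback_log_level(err: &(dyn Error + Send + Sync + 'static)) -> Level {
    if err.downcast_ref::<IOError>().is_some() {
        // Out of device queues: an expected resource limit.
        Level::Info
    } else {
        // Anything else, including a message-only error with the same text.
        Level::Error
    }
}
```

Both boxed errors in the check below display the same text, but only the one built from the IOError value takes the Info path, which is why the bail! change in this hunk matters.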