Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodeGroupOptions.extraNodeSecurityGroups throws "Expected an ID for for instanceProfile:InstanceProfile" #1584

Open
cshen-dev opened this issue Jan 13, 2025 · 6 comments
Assignees
Labels
impact/regression Something that used to work, but is now broken kind/bug Some behavior is incorrect or out of spec p1 A bug severe enough to be the next item assigned to an engineer

Comments

@cshen-dev
Copy link

cshen-dev commented Jan 13, 2025

What happened?

When I migrate to @pulumi/eks v3, this particular field nodeGroupOptions.extraNodeSecurityGroups gives me a hard time.
It keeps reporting something like the below:

Diagnostics:
  pulumi:pulumi:Stack (test-eks-dev):
    error: eks:index:Cluster resource 'brainfish-dev-test' has a problem: grpc: the client connection is closing

  aws:iam:InstanceProfile (brainfish-dev-test-instanceProfile):
    error: Expected an ID for urn:pulumi:dev::test-eks::eks:index:Cluster$aws:iam/instanceProfile:InstanceProfile::brainfish-dev-test-instanceProfile

Example

import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import * as pulumi from "@pulumi/pulumi";

import { ORG } from "./constants";
import {
  stack,
  defaultVpcSubnetsIds,
  defaultSecurityGroupId,
  defaultVpcId,
} from "./stackRef";
import * as config from "./config";

const securityGroup = aws.ec2.SecurityGroup.get(
  "default",
  defaultSecurityGroupId
);

const anotherCluster = new eks.Cluster(`${ORG}-${stack}-test`, {
  vpcId: defaultVpcId,
  subnetIds: defaultVpcSubnetsIds,
  nodeGroupOptions: {
    minSize: config.EKS_MIN_WORKER_NODE_NUMBER,
    maxSize: config.EKS_MAX_WORKER_NODE_NUMBER,
    desiredCapacity: config.EKS_DESIRED_WORKER_NODE_NUMBER,
    nodeRootVolumeEncrypted: true,
    nodeRootVolumeSize: 100, // 100GB, reasonable default
    extraNodeSecurityGroups: [securityGroup],
    instanceType: config.EKS_NODE_INSTANCE_TYPE,
    nodeRootVolumeType: config.EKS_NODE_ROOT_VOLUME_TYPE as
      | "standard"
      | "gp2"
      | "gp3"
      | "st1"
      | "sc1"
      | "io1",
  },
});

Output of pulumi about

CLI          
Version      3.145.0
Go Version   go1.23.4
Go Compiler  gc

Plugins
KIND      NAME        VERSION
resource  aws         6.66.2
resource  awsx        2.19.0
resource  docker      4.5.8
resource  docker      3.6.1
resource  eks         3.7.0
resource  kubernetes  4.19.0
language  nodejs      3.145.0

Host     
OS       darwin
Version  15.2
Arch     arm64

This project is written in nodejs: executable='/Users/c.shen/.nvm/versions/node/v20.18.1/bin/node' version='v20.18.1'

Current Stack: brainfish-ai/test-eks/dev

Found no resources associated with dev

Found no pending operations associated with dev

Backend        
Name           pulumi.com
URL            https://app.pulumi.com/brainfish-cshen
User           brainfish-cshen
Organizations  brainfish-cshen, brainfish-ai
Token type     personal

Dependencies:
NAME            VERSION
@types/node     18.19.70
typescript      5.7.3
@pulumi/aws     6.66.2
@pulumi/awsx    2.19.0
@pulumi/eks     3.7.0
@pulumi/pulumi  3.145.0

Additional context

By removing extraNodeSecurityGroups: [securityGroup],, everything will be deployed as expected!

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@cshen-dev cshen-dev added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Jan 13, 2025
@cshen-dev
Copy link
Author

I solved the problem by replacing
extraNodeSecurityGroups: [securityGroup], to extraNodeSecurityGroups: [pulumi.output(securityGroup)],.

I figured this out by comparing

const extraSGOutput = pulumi.output(args.extraNodeSecurityGroups);

to
const extraNodeSecurityGroupIds = pulumi.all([args.extraNodeSecurityGroups]).apply(([sg]) => {

It looks like while migrating from createNodeGroupInternal to createNodeGroupV2Internal, the pulumi.output() wrapping group.extraNodeSecurityGroups was unintentionally removed. Unless, of course, this was meant to be a breaking change!

@flostadler flostadler added impact/regression Something that used to work, but is now broken and removed needs-triage Needs attention from the triage team labels Jan 13, 2025
@flostadler flostadler self-assigned this Jan 13, 2025
@flostadler flostadler added the p1 A bug severe enough to be the next item assigned to an engineer label Jan 13, 2025
@flostadler
Copy link
Contributor

Hey @cshen-dev, so sorry you're running into this! Thank you so much for providing a great repro!
At a first glance pulumi.all([args.extraNodeSecurityGroups]).apply(([sg]) => { ... }) and pulumi.output(args.extraNodeSecurityGroups) should behave the same way, but there might be some subtleties causing this issue.
I'll start looking into this right away.

@flostadler
Copy link
Contributor

Quick update from my side, I was able to reproduce the issue using your example. Thanks a lot!
I was also able to rule out whether this is caused by retrieving the security group with .get. The same error happens when passing a regular security group.
I'm currently digging deeper what exactly causes this issue

@flostadler
Copy link
Contributor

I was able to in-point it to the following bug in pulumi/pulumi: pulumi/pulumi#13802

The cluster's nodegroupOptions input property is defined as a plain property. But due to pulumi/pulumi#13802, plain inputs which have nested resources references are passed as Output into Node.js MLC.

This trips up the provider implementation and ultimately makes it go down the wrong branch in this if/else block here:

instanceProfile = aws.iam.InstanceProfile.get(
`${name}-instanceProfile`,
instanceProfileName,
undefined,
{ parent, provider },
);

The instanceProfileName argument is not an Input<string> that's defined, it's actually an Input<string | undefined> that resolves to undefined.
Wrapping the nested resource reference, the extraNodeSecurityGroups in this case, in an output works around this bug.
I'll see if we can work around this within the provider implementation.

@flostadler
Copy link
Contributor

While this is technically not new behavior, it didn't exist in the Node.js implementation of v2 of the provider because it didn't use the MLC tech. It was just a regular node library.

This was an issue for the other languages in v2 as well. In v3 the Node.js implementation is also susceptible to this bug now and therefore regressed.

@flostadler
Copy link
Contributor

We're expecting pulumi/pulumi#13802 to get tackled later next week. Given this I'm checking if there's a good stop gap we can put into pu-eks for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact/regression Something that used to work, but is now broken kind/bug Some behavior is incorrect or out of spec p1 A bug severe enough to be the next item assigned to an engineer
Projects
None yet
Development

No branches or pull requests

2 participants