[FEA]: Fix labels for a subset of nodes in Leiden clustering #4880

Open
2 tasks done
Tracked by #3337
abs51295 opened this issue Jan 22, 2025 · 3 comments
Labels
feature request New feature or request

Comments

@abs51295

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem this feature solves

We want to fix the labels for a subset of nodes in Leiden clustering and let the algorithm assign the remaining nodes either one of the existing labels or a new one.

Describe your ideal solution

A function parameter, similar to initial_membership in the leidenalg package, that can fix the membership labels for a set of nodes.

Describe any alternatives you have considered

We are currently using this functionality from the leidenalg package. See: #4828 (comment)

Additional context

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
abs51295 added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels on Jan 22, 2025
@ChuckHastings
Collaborator

We could certainly add an initial membership equivalent in our code. That would permit you to provide the initial label for each node.

Just want to check on your intention. Providing an initial membership would start each node out in the cluster id (label) that you provide. But the algorithm will immediately begin considering moving vertices out of their initially assigned clusters into other clusters in order to improve modularity, so Leiden could move your intended vertices into other clusters. As I interpret your title, this isn't going to fix the labels for a subset of nodes or guarantee that those nodes stay together. I haven't examined the igraph implementation, but I suspect it wouldn't guarantee that behavior either.
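To make that concern concrete, here is a minimal, dependency-free sketch. It is not cuGraph or leidenalg code; the toy graph (two triangles joined by one edge) and the `modularity` helper are illustrative assumptions. It compares the modularity of a hypothetical initial membership with the partition obtained by moving a single vertex, which is the kind of move a modularity optimizer would be expected to make.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity of an unweighted, undirected graph for a membership list."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the cluster
    degree_sum = defaultdict(int)  # total degree of the cluster's vertices
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph (assumption): triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Hypothetical initial membership that places vertex 2 with the second triangle.
initial = [0, 0, 1, 1, 1, 1]
# The same partition after moving vertex 2 into the first triangle's cluster.
moved = [0, 0, 0, 1, 1, 1]

q_initial = modularity(edges, initial)
q_moved = modularity(edges, moved)
print(f"initial: {q_initial:.3f}, after moving vertex 2: {q_moved:.3f}")
```

Because the move raises modularity, an initial membership on its own only sets the starting point; it does not pin vertex 2 to the label it was given.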

There are ways that we could guarantee that a subset of nodes is combined into a cluster and stays together. Guaranteeing that labels stay consistent is a separate issue, if that's important. The label that gets applied in cuGraph is arbitrary (since we're operating in parallel, different threads race to make decisions on moving vertices, and which label wins is not predictable a priori).
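As a small illustration of why arbitrary labels and a stable grouping are separate questions, the sketch below renumbers cluster ids in order of first appearance, so two runs that produce the same grouping under different ids compare equal. The `canonical_labels` helper is hypothetical post-processing on the caller's side, not part of cuGraph.

```python
def canonical_labels(membership):
    """Renumber cluster ids as 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    result = []
    for label in membership:
        if label not in mapping:
            mapping[label] = len(mapping)
        result.append(mapping[label])
    return result


# Two hypothetical runs: same grouping of five vertices, different raw ids.
run_a = [7, 7, 3, 3, 9]
run_b = [2, 2, 5, 5, 1]

print(canonical_labels(run_a))
print(canonical_labels(run_a) == canonical_labels(run_b))
```

This only normalizes the naming; it does nothing to keep a chosen subset of nodes in the same cluster, which would need support inside the algorithm itself.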

So, if adding initial_membership to our API and providing that as a starting point is sufficient, that's easy. If that's not sufficient, it would be better to describe your end goal more precisely so we can make sure to add what is most helpful.

@abs51295
Author

Adding more clarity here: I want to provide initial membership labels for all nodes but fix them for only a subset of the nodes. Here's how we currently do it with the leidenalg package, passing the is_membership_fixed parameter when optimizing the partition.

import time

import leidenalg
import numpy as np
from scanpy import _utils  # assumed: _utils refers to scanpy's private _utils module

# adata_all_neo (AnnData) and fixed_labels (indexed by cell name) are defined earlier.
G = _utils.get_igraph_from_adjacency(
    _utils._choose_graph(adata=adata_all_neo, obsp=None, neighbors_key=None),
    directed=True,
)

# Fixed cells keep their existing 'leiden_1p0' label; every other cell starts in
# its own singleton cluster, numbered above the largest existing label.
initial_membership = []
is_membership_fixed = []
singleton_value = adata_all_neo.obs['leiden_1p0'].astype(int).max() + 1
for cell in adata_all_neo.obs_names:
    if cell in fixed_labels.index:
        initial_membership.append(int(adata_all_neo.obs.loc[cell, 'leiden_1p0']))
        is_membership_fixed.append(True)
    else:
        initial_membership.append(singleton_value)
        singleton_value += 1
        is_membership_fixed.append(False)

partition = leidenalg.RBConfigurationVertexPartition(
    G,
    resolution_parameter=0.001,
    weights=np.array(G.es["weight"]).astype(np.float64),
    initial_membership=initial_membership,
)

opt = leidenalg.Optimiser()

# n_iterations=-1 (scanpy's default) iterates until no further improvement;
# cuGraph's leiden defaults to max_iter=100.
print("Optimising partition...")
start = time.time()
opt.optimise_partition(partition, is_membership_fixed=is_membership_fixed, n_iterations=-1)
end = time.time()
print(f"Optimising partition took: {end - start:.2f} seconds")
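For anyone who wants to try the same input construction without AnnData, here is a simplified, stdlib-only sketch of how the two lists are built. The `build_membership` helper and its arguments are hypothetical; unlike the snippet above, it starts singleton ids after the largest fixed label rather than the largest label in the whole dataset.

```python
def build_membership(node_names, fixed_labels):
    """Return (initial_membership, is_membership_fixed) for the given nodes.

    node_names:   iterable of node identifiers, in graph vertex order
    fixed_labels: dict mapping a node identifier to the label to pin
    """
    # Singleton ids start above the largest pinned label (assumption).
    next_singleton = max(fixed_labels.values(), default=-1) + 1
    initial_membership = []
    is_membership_fixed = []
    for name in node_names:
        if name in fixed_labels:
            # Pinned node: start in its given cluster and mark it as fixed.
            initial_membership.append(fixed_labels[name])
            is_membership_fixed.append(True)
        else:
            # Free node: start alone in a fresh cluster.
            initial_membership.append(next_singleton)
            next_singleton += 1
            is_membership_fixed.append(False)
    return initial_membership, is_membership_fixed


membership, fixed = build_membership(["a", "b", "c", "d"], {"a": 0, "c": 2})
print(membership)
print(fixed)
```

The two lists line up index by index with the graph's vertices, which is the shape that initial_membership and is_membership_fixed take in the leidenalg calls above.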

@ChuckHastings
Collaborator

OK. So they have an additional parameter to identify which vertices are fixed. That's useful. We can definitely explore that as well.
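As a rough illustration of what such a flag means for the optimizer, here is a toy, serial local-moving pass that simply skips fixed vertices. This is a sketch under stated assumptions: it is neither the leidenalg nor the cuGraph implementation, it omits Leiden's refinement and aggregation phases, and the `local_move` and `modularity` helpers and the example graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity of an unweighted, undirected graph for a membership list."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


def local_move(edges, membership, is_fixed):
    """Greedily move non-fixed vertices to the neighboring cluster that most
    improves modularity; fixed vertices are never moved."""
    membership = list(membership)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    improved = True
    while improved:
        improved = False
        for v in range(len(membership)):
            if is_fixed[v]:
                continue  # a fixed vertex keeps its label
            best_label = membership[v]
            best_q = modularity(edges, membership)
            for label in sorted({membership[n] for n in neighbors[v]}):
                trial = membership.copy()
                trial[v] = label
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_label, best_q = label, q
            if best_label != membership[v]:
                membership[v] = best_label
                improved = True
    return membership


# Two triangles joined by edge (2, 3); vertices 0 and 5 are pinned to labels
# 0 and 1, and every other vertex starts in its own singleton cluster.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
initial = [0, 2, 3, 4, 5, 1]
is_fixed = [True, False, False, False, False, True]

result = local_move(edges, initial, is_fixed)
print(result)
```

In this toy run the free vertices join the clusters of the pinned vertices, so the pinned labels survive as the final cluster ids. A parallel implementation would also need to decide how concurrent moves interact with the fixed set.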

I have added this to our list of new algorithms; we'll try to prioritize it soon.

ChuckHastings removed the "? - Needs Triage" (Need team to review and classify) label on Jan 22, 2025

2 participants