[FEA]: Fix labels for a subset of nodes in Leiden clustering #4880
We could certainly add an equivalent of `initial_membership` to our code. That would let you provide the initial label for each node. I just want to check on your intention. Providing an initial membership would start each node in the cluster id (label) that you provide, but the algorithm immediately begins considering moving vertices out of their initially assigned clusters into other clusters in order to improve modularity, so Leiden could move your intended vertices elsewhere. As I read your title, this isn't going to fix the labels for a subset of nodes or guarantee that those nodes stay together. I haven't examined the igraph implementation, but I suspect it wouldn't guarantee that behavior either. There are ways we could guarantee that a subset of nodes is combined into a cluster and stays together. Guaranteeing that labels stay consistent is a separate issue, if that's important: the label that gets applied in cuGraph is arbitrary (since we're operating in parallel, different threads race to make decisions on moving vertices, and which label wins is not predictable a priori). So, if adding
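Label consistency is a separate problem from keeping nodes together. As a hedged illustration only (pure Python, not a cuGraph API; `align_labels` is a hypothetical helper), the arbitrary cluster ids from one run can be renamed after the fact so they line up with reference labels for a subset of nodes, using a greedy one-to-one match on overlap counts. This only renames clusters; it cannot keep nodes together the way a fixed-membership option would.

```python
from collections import Counter
from typing import Dict, Hashable


def align_labels(
    result: Dict[Hashable, int], reference: Dict[Hashable, int]
) -> Dict[Hashable, int]:
    """Rename clusters in `result` so they agree with `reference` where possible.

    result: node -> cluster id from a clustering run (ids are arbitrary).
    reference: node -> desired label, for a subset of nodes.
    Membership is unchanged; only the cluster ids are renamed.
    """
    # Count how often each (result cluster, reference label) pair co-occurs.
    overlap = Counter(
        (result[node], label) for node, label in reference.items() if node in result
    )
    mapping: Dict[int, int] = {}
    used_labels = set()
    # Greedy one-to-one matching, largest overlaps first, so two different
    # clusters never receive the same reference label.
    for (cluster, label), _count in sorted(
        overlap.items(), key=lambda kv: (-kv[1], kv[0])
    ):
        if cluster not in mapping and label not in used_labels:
            mapping[cluster] = label
            used_labels.add(label)
    # Unmatched clusters get fresh ids above every reference label.
    next_label = max(reference.values(), default=-1) + 1
    for cluster in sorted(set(result.values())):
        if cluster not in mapping:
            mapping[cluster] = next_label
            next_label += 1
    return {node: mapping[cluster] for node, cluster in result.items()}


# Cluster 7 becomes label 0, cluster 3 becomes label 1, cluster 9 gets a new id.
print(align_labels({"a": 7, "b": 7, "c": 3, "d": 3, "e": 9}, {"a": 0, "b": 0, "c": 1}))
```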
Adding more clarity here. I want to provide the initial membership labels for all nodes but fix them for a subset of the nodes. Here's how it's done currently using `leidenalg`:

```python
import time

import leidenalg
import numpy as np
from scanpy import _utils

# Build a directed igraph graph from the AnnData neighbor graph.
G = _utils.get_igraph_from_adjacency(
    _utils._choose_graph(adata=adata_all_neo, obsp=None, neighbors_key=None),
    directed=True,
)

initial_membership = []
is_membership_fixed = []
# Unfixed cells each start in their own singleton cluster, numbered past
# the largest existing 'leiden_1p0' label (cast to a plain Python int).
singleton_value = int(adata_all_neo.obs['leiden_1p0'].astype(int).max()) + 1
for cell in adata_all_neo.obs_names:
    if cell in fixed_labels.index:
        # Fixed cells keep their existing 'leiden_1p0' label.
        initial_membership.append(int(adata_all_neo.obs.loc[cell, 'leiden_1p0']))
        is_membership_fixed.append(True)
    else:
        initial_membership.append(singleton_value)
        singleton_value += 1
        is_membership_fixed.append(False)

partition = leidenalg.RBConfigurationVertexPartition(
    G,
    resolution_parameter=0.001,
    weights=np.array(G.es["weight"]).astype(np.float64),
    initial_membership=initial_membership,
)
opt = leidenalg.Optimiser()
## n_iterations=-1 (as in scanpy) iterates until no further improvement;
## cuGraph's Leiden defaults to max_iter=100.
print("Optimising partition...")
start = time.time()
opt.optimise_partition(partition, is_membership_fixed=is_membership_fixed, n_iterations=-1)
end = time.time()
print(f"Optimising partition took: {end - start} seconds")
```
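To make the semantics of `is_membership_fixed` concrete, here is a minimal pure-Python sketch of a single local-moving phase that skips fixed nodes. It is not leidenalg or cuGraph code, `local_moving_with_fixed` is a hypothetical name, and it deliberately omits Leiden's refinement and aggregation phases. Each non-fixed node greedily moves to the neighbouring community with the best RB-configuration (modularity with resolution) score; fixed nodes are never visited, so they keep the label they were given, mirroring the setup above where fixed cells keep their label and all others start as singletons.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def local_moving_with_fixed(
    n: int,
    edges: Sequence[Tuple[int, int, float]],
    membership: Sequence[int],
    is_fixed: Sequence[bool],
    resolution: float = 1.0,
    max_sweeps: int = 100,
) -> List[int]:
    """Greedy node moves on an undirected weighted graph; fixed nodes never move."""
    membership = list(membership)  # work on a copy
    adj = [defaultdict(float) for _ in range(n)]
    degree = [0.0] * n
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
        degree[u] += w
        degree[v] += w
    two_m = sum(degree)
    if two_m == 0:
        return membership

    # Total degree of each community, kept up to date as nodes move.
    comm_degree = defaultdict(float)
    for node, comm in enumerate(membership):
        comm_degree[comm] += degree[node]

    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            if is_fixed[i]:
                continue  # fixed: keep the provided label
            current = membership[i]
            comm_degree[current] -= degree[i]  # temporarily remove i
            # Edge weight from i into each neighbouring community.
            links = defaultdict(float)
            for j, w in adj[i].items():
                if j != i:
                    links[membership[j]] += w
            # Score of placing i in a community (RB-configuration gain,
            # up to a constant factor that is the same for every community).
            best = current
            best_score = links.get(current, 0.0) - resolution * degree[i] * comm_degree[current] / two_m
            for comm, weight in links.items():
                score = weight - resolution * degree[i] * comm_degree[comm] / two_m
                if score > best_score:
                    best, best_score = comm, score
            membership[i] = best
            comm_degree[best] += degree[i]
            moved = moved or best != current
        if not moved:
            break
    return membership


# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0),
         (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
# Node 2 is pinned to label 3, so it stays there even though
# two of its three edges go into community 0.
print(local_moving_with_fixed(6, edges, [0, 1, 3, 3, 3, 5],
                              [True, False, True, True, True, False]))
```

Without fixing, node 2 would normally follow its own triangle; with it pinned, the sketch returns `[0, 0, 3, 3, 3, 3]`, which is the "fix labels for a subset, let the rest move" behavior being requested.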
OK. So they have an additional parameter (`is_membership_fixed`) to identify which vertices are fixed. That's useful, and we can definitely explore it as well. I have added this to our list of new algorithms; we'll try to prioritize it soon.
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Critical (currently preventing usage)
Please provide a clear description of problem this feature solves
We want to fix labels for a subset of nodes in Leiden clustering and assign the remaining nodes either those same labels or new ones.
Describe your ideal solution
A function parameter similar to `initial_membership` from the `leidenalg` package that can fix membership labels for a set of nodes.
Describe any alternatives you have considered
We are currently using this functionality from the `leidenalg` package. Refer: #4828 (comment)
Additional context
No response