Parameters and Attributes
Input Parameters
Input parameter |
Description |
---|---|
data |
(numpy.ndarray) n_samples x n_features. When using via_wrapper(), data is an AnnData object that has a PCA embedding in adata.obsm['X_pca'][:, 0:ncomps], where ncomps is the number of components that will be used. |
true_label |
(list) ‘ground truth’ annotations or placeholder |
knn |
(optional, default = 30) number of K-Nearest Neighbors for HNSWlib KNN graph |
root_user |
(default = None) Can be a list of strings, a list of ints, or None. If root_user is None and an RNA velocity matrix is available, a root is computed automatically. If root_user is None and no velocity matrix is provided, an arbitrary root is selected. If root_user is ['name_of_celltype_belonging_to_earlystage'], where the string corresponds to an item in true_label, a suitable starting cell is selected from that group. If root_user is of the form [678], where 678 is the index of the cell chosen as the start cell, that cell is the designated starting cell. When there is more than one trajectory, a list of root indices or groups can be given, e.g. [120, 699] or ['traj1_earlystage', 'traj2_earlystage']. |
edgepruning_clustering_resolution |
(optional, default = 0.15) Global-level graph pruning for the PARC clustering stage. Key tuning parameter. The threshold can be set as the number of standard deviations below the network's mean Jaccard-weighted edge weight. Values of 0.1-1 provide reasonable pruning; a higher value means less pruning. For example, a value of 0.15 means that all edges with weight above mean(edge weights) - 0.15*std(edge weights) are retained. We find that both 0.15 and 'median' yield good results, pruning away ~50-60% of edges. |
edgepruning_clustering_resolution_local |
(optional, default = 1) Rarely needs to be tuned. Local pruning threshold for the PARC clustering stage: the number of standard deviations above the mean Minkowski distance between neighbors of a given node. The higher the value, the more edges are retained. |
resolution_parameter |
(optional, default = 1) Uses ModularityVP and RBConfigurationVertexPartition |
preserve_disconnected_after_pruning |
(optional, default = False) Cluster-graph pruning can occasionally cause fragmentation that can be repaired. However, if disconnected trajectories are believed to exist, then set this to True. |
cluster_graph_pruning |
(optional, default = 0.15) Often set to the same value as the PARC clustering-level pruning parameter jac_std_global. To retain more connectivity in the cluster graph underlying the trajectory computations, increase the value. |
memory |
(default = 5, reasonable values are 2-50) A higher value means more memory and a more retrospective/inward random walk. memory = 1 runs the non-memory Via 1.0 mode. |
user_defined_terminal_cell |
(optional, default list = []) list of cell indices corresponding to terminal fate cells. |
user_defined_terminal_group |
(optional, default list = []) list of group-level labels, corresponding to labels found in true_label, that represent cell fates. |
edgebundle_pruning |
(optional, default = None) If left as None, this is automatically set to the same value as cluster_graph_pruning. |
edgebundle_pruning_twice |
(optional, default = False) If the visualized cluster graph edges seem too busy, they can be further condensed by a second iteration of edge bundling by setting this to True. |
gene_matrix |
(optional matrix, not a numpy array, default = None) Only required when using RNA velocity to guide direction. Pass the gene matrix as a dense matrix rather than a numpy array, e.g. adata.X.todense(). |
velocity_matrix |
(optional matrix, default = None) Only required when using RNA velocity to guide direction. Matrix of RNA velocities from scVelo: adata.layers['velocity']. |
velo_weight |
(optional, default = 0.5) Float between 0 and 1. The weight assigned to directionality and connectivity derived from scRNA-velocity. |
too_big_factor |
(optional, default = 0.4) if a cluster exceeds this share of the entire cell population, then PARC will be run on the large cluster to increase granularity. |
x_lazy |
(optional, default = 0.95) 1-x = probability of staying in same node (lazy). Values between 0.9-0.99 are reasonable |
alpha_teleport |
(optional, default = 0.99) 1-alpha is the probability of jumping (teleporting). Values between 0.95-0.99 are reasonable unless there is prior knowledge of teleportation. |
distance |
(optional, default = 'l2', i.e. Euclidean) Distance metric. Other options are 'ip' and 'cosine'. |
random_seed |
(optional, default = 42) The random seed to pass to Leiden |
pseudotime_threshold_TS |
(optional, default = 30) Percentile threshold for potential node to qualify as Terminal State |
num_sim_branch_probability |
(optional, default = 500) Number of MCMC simulations run per terminal state. This can be safely reduced to 100 when computational resources are limited. |
small_pop |
(optional, default = 10) Via attempts to merge Clusters with a population < 10 cells with larger clusters. |
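
The global pruning rule described for edgepruning_clustering_resolution can be illustrated with a small sketch. This is not the PARC/VIA implementation: the helper name `prune_edges_global`, the toy edge weights, and the use of the population standard deviation are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
from statistics import mean, pstdev


def prune_edges_global(edges, n_std=0.15):
    """Keep edges whose weight is above mean(weights) - n_std * std(weights).

    `edges` is a list of (source, target, weight) tuples. This helper is
    hypothetical and only illustrates the thresholding rule; the population
    standard deviation is an assumption.
    """
    weights = [w for _, _, w in edges]
    threshold = mean(weights) - n_std * pstdev(weights)
    kept = [e for e in edges if e[2] > threshold]
    return kept, threshold


# Toy Jaccard-style edge weights (made-up values for illustration).
toy_edges = [(0, 1, 0.9), (0, 2, 0.1), (1, 2, 0.5), (2, 3, 0.3), (3, 4, 0.7)]

kept_edges, cutoff = prune_edges_global(toy_edges, n_std=0.15)
kept_loose, cutoff_loose = prune_edges_global(toy_edges, n_std=1.0)
```

On this toy input the larger value lowers the cutoff and retains more edges, matching the statement that a higher value means less pruning.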
Temporal Input Parameters
Input parameter |
Description |
---|---|
t_diff_step |
(optional, default = 1) Number of permitted temporal intervals between connected nodes. If time data is labeled as [0,25,50,75,100,..], then t_diff_step=1 corresponds to '25', and only edges within t_diff_step are retained. |
time_series |
(optional, default False) if the data has time-series labels then set to True |
time_series_labels |
(optional, default None) list of integer values of temporal annotations corresponding to e.g. hours (post-fertilization), days, or sequential ordering |
knn_sequential |
(optional, default = 10) Number of knn in the adjacent time-point for time-series data (t_i and t_i+1) |
knn_sequential_reverse |
(optional, default = 0) Number of knn enforced from current to previous time point |
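
As a rough illustration of how t_diff_step restricts edges, the sketch below keeps only edges whose endpoints are at most t_diff_step time points apart. It is not VIA's code: the helper `filter_edges_by_time` is hypothetical, and it assumes that one "interval" means one step between consecutive distinct time labels (so evenly spaced labels [0, 25, 50, 75] map to ranks [0, 1, 2, 3]).

```python
def filter_edges_by_time(edges, time_labels, t_diff_step=1):
    """Retain edges whose endpoints are at most t_diff_step intervals apart.

    Assumption: an interval is one step between consecutive distinct
    time labels, so each label is first converted to its ordinal rank.
    """
    rank = {t: i for i, t in enumerate(sorted(set(time_labels)))}
    return [
        (i, j)
        for i, j in edges
        if abs(rank[time_labels[i]] - rank[time_labels[j]]) <= t_diff_step
    ]


# One time label per cell (toy data), and candidate edges between cell indices.
labels = [0, 0, 25, 50, 75]
candidate_edges = [(0, 1), (0, 2), (0, 3), (2, 3), (3, 4)]

kept = filter_edges_by_time(candidate_edges, labels, t_diff_step=1)
```

Here the edge (0, 3) joins cells labeled 0 and 50, which are two intervals apart, so it is dropped when t_diff_step=1.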
Spatial Input Parameters
Input parameter |
Description |
---|---|
do_spatial_knn |
(optional, default = False) Whether or not to do spatial mode of StaVia for graph augmentation |
do_spatial_layout |
(optional, default = 0.9) whether to use spatial coords for layout of the clustergraph |
spatial_coords |
(optional, default = False) np.ndarray of size n_cells x 2 (denoting x,y coordinates) of each spot/cell |
spatial_knn |
(optional, default = 15) number of nearest neighbors added based on spatial proximity, as indicated by spatial_coords |
spatial_aux |
(optional, default = []) a list of slice IDs so that only cells/spots on the same slice are considered when building the spatial_knn graph |
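
The role of spatial_aux can be sketched as a nearest-neighbor search that only considers cells on the same slice. The function `spatial_knn_same_slice` below is a hypothetical, brute-force illustration in plain Python, not the StaVia implementation; the coordinates and slice IDs are toy values.

```python
from math import dist


def spatial_knn_same_slice(coords, slice_ids, k=2):
    """For each cell, return the indices of its k nearest cells on the same slice.

    coords: list of (x, y) tuples, one per cell/spot.
    slice_ids: list of slice identifiers, one per cell/spot.
    """
    neighbors = []
    for i, (xy, s) in enumerate(zip(coords, slice_ids)):
        # Candidates are all other cells that share this cell's slice ID.
        same = [j for j, sj in enumerate(slice_ids) if sj == s and j != i]
        same.sort(key=lambda j: dist(xy, coords[j]))
        neighbors.append(same[:k])
    return neighbors


coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
slice_ids = ["A", "A", "A", "B", "B"]

nbrs = spatial_knn_same_slice(coords, slice_ids, k=1)
```

Cell 0 is physically closest to cell 3, but because cell 3 lies on slice "B", the neighbor chosen for cell 0 is cell 1 on slice "A".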
Attributes
Attributes |
Description |
---|---|
labels |
(list) length n_samples of corresponding cluster labels |
single_cell_pt_markov |
(list) computed pseudotime |
single_cell_bp |
(array) computed single-cell branch probabilities (lineage likelihoods), n_cells x n_terminal_states. Each column corresponds to a terminal state, in the same order as presented in the 'terminal_clusters' attribute |
terminal_clusters |
(list) terminal clusters found by VIA |
super_cluster_labels |
Set this to v0.labels (clustering output of first pass “v0”) |
super_terminal_cells |
super_terminal_cells = via.get_loc_terminal_states(v0, data) |
full_neighbor_array |
full_neighbor_array = v0.full_neighbor_array. KNN graph from first pass of via - neighbor array |
full_distance_array |
full_distance_array = v0.full_distance_array. KNN graph from first pass of via - edge weights |
ig_full_graph |
ig_full_graph = v0.ig_full_graph. igraph of the KNN graph from first pass of via
csr_array_locally_pruned |
csr_array_locally_pruned = v0.csr_array_locally_pruned. CSR matrix of the locally pruned KNN graph |
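
Because the columns of single_cell_bp follow the order of terminal_clusters, the two attributes can be combined to label each cell with its most likely fate. The sketch below uses toy stand-ins (`toy_bp`, `toy_terminal_clusters`) instead of a fitted VIA object, and the helper `most_likely_fate` is hypothetical.

```python
def most_likely_fate(single_cell_bp, terminal_clusters):
    """Return, for each cell, the terminal cluster with the highest branch probability.

    single_cell_bp: n_cells x n_terminal_states nested list (or array rows).
    terminal_clusters: terminal cluster IDs in the same order as the columns.
    Ties are resolved in favor of the first column.
    """
    fates = []
    for row in single_cell_bp:
        best_col = max(range(len(row)), key=lambda c: row[c])
        fates.append(terminal_clusters[best_col])
    return fates


# Toy values standing in for v0.single_cell_bp and v0.terminal_clusters.
toy_bp = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
toy_terminal_clusters = [4, 9]

fates = most_likely_fate(toy_bp, toy_terminal_clusters)
```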