Parameters and Attributes

Input Parameters

Input parameter	Description
data	(numpy.ndarray) n_samples x n_features. When using via_wrapper(), data is ANNdata object that has a PCA object adata.obsm[‘X_pca’][:, 0:ncomps] and ncomps is the number of components that will be used.
true_label	(list) ‘ground truth’ annotations or placeholder
knn	(optional, default = 30) number of K-Nearest Neighbors for HNSWlib KNN graph
root_user	(default is None) can be a list of strings, a list of int or None. When the root_user is set as None and an RNA velocity matrix is available, a root will be automatically computed. If the root_user is None and not velocity matrix is provided, then an arbitrary root is selected. If the root_user is [‘name_of_celltype_belonging_to_earlystage’] where the str corresponds to an item in true_label, then a suitable starting point will be selected corresponding to this group. If the root_user is in the form [678], where 678 is the index of the cell chosen as a start cell, then this will be the designated starting cell. It is possible to give a list of root indices and groups. [120, 699] or [‘traj1_earlystage’, ‘traj2_earlystage’] when there are more than one trajectories
edgepruning_clustering_resolution	(optional, default = 0.15) global level graph pruning for PARC clustering stage. Key tuning parameter. This threshold can also be set as the number of standard deviations below the network’s mean-jaccard-weighted edges. 0.1-1 provide reasonable pruning. higher value means less pruning. e.g. a value of 0.15 means all edges that are above mean(edgeweight)-0.15*std(edge-weights) are retained. We find both 0.15 and ‘median’ to yield good results resulting in pruning away ~ 50-60% edges
edgepruning_clustering_resolution_local	(optional, default = 1) Rarely needs to be tuned. local pruning threshold for PARC clustering stage: the number of standard deviations above the mean minkowski distance between neighbors of a given node. the higher the parameter, the more edges are retained
resolution_parameter	(optional, default = 1) Uses ModuliartyVP and RBConfigurationVertexPartition
preserve_disconnected_after_pruning	(optional, default = False) Cluster-graph pruning can occasionally cause fragmentation that can be repaired. However, if disconnected trajectories are believed to exist, then set this to True.
cluster_graph_pruning	(optional, default =0.15) Often set to the same value as the PARC clustering level of jac_std_global. To retain more connectivity in the clustergraph underlying the trajectory computations, increase the value
memory	(default = 5, reasonable ranges are 2-50) higher value means more memory, more retrospective/inwards randomwalk. memory = 1 runs the non-memory Via 1.0 mode.
user_defined_terminal_cell	(optional, default list = []) list of cell indices corresponding to terminal fate cells.
user_defined_terminal_group	(optional, default = list = []) list of group level labels corresponding to labels found in true_label, that represent cell fates.
edgebundle_pruning	(optional, default = None) This is automatically set to be the same as cluster_graph_pruning_std
edgebundle_pruning_twice	(optional, default = False) If the visualized cluster graph edges seem too busy, they can be further condensed by a second iteration of edge bundling by setting this to True.
gene_matrix	(optional matrix, not a numpy array, default = None) Only required when using RNA velocity to guide direction. Gene matrix not numpy array: adata.X.todense()
velocity_matrix	(optional matrix, default = None). Only required when using RNA velocity to guide direction. Matrix from scVelo with RNA velocities from: adata.layers[‘velocity’]
velo_weight	(optional, default = 0.5) #float between 0,1. the weight assigned to directionality and connectivity derived from scRNA-velocity
too_big_factor	(optional, default = 0.4) if a cluster exceeds this share of the entire cell population, then PARC will be run on the large cluster to increase granularity.
x_lazy	(optional, default = 0.95) 1-x = probability of staying in same node (lazy). Values between 0.9-0.99 are reasonable
alpha_teleport	(optional, default = 0.99) 1-alpha is probability of jumping. Values between 0.95-0.99 are reasonable unless prior knowledge of teleportation
distance	(optional, default = ‘l2’ euclidean) ‘ip’,’cosine’
random_seed	(optional, default = 42) The random seed to pass to Leiden
pseudotime_threshold_TS	(optional, default = 30) Percentile threshold for potential node to qualify as Terminal State
num_sim_branch_probability	(optional), default = 500. Number of MCMCs run per terminal state. This can be safely reduced to 100 when computational resources are limited
small_pop	(optional, default = 10) Via attempts to merge Clusters with a population < 10 cells with larger clusters.

Temporal Input Parameters

Input parameter	Description
t_diff_step	(optional, default = 1) Number of permitted temporal intervals between connected nodes. If time data is labeled as [0,25,50,75,100,..] then t_diff_step=1 corresponds to ‘25’ and only edges within t_diff_steps are retained
time_series	(optional, default False) if the data has time-series labels then set to True
time_series_labels	(optional, default None) list of integer values of temporal annoataions corresponding to e.g. hours (post fert), days, or sequential ordering
knn_sequential	(optional, default = 10) Number of knn in the adjacent time-point for time-series data (t_i and t_i+1)
knn_sequential_reverse	(optional, default = 0) Number of knn enforced from current to previous time point

Spatial Input Parameters

do_spatial_knn	(optional, default = False) Whether or not to do spatial mode of StaVia for graph augmentation
do_spatial_layout	(optional, default = 0.9) whether to use spatial coords for layout of the clustergraph
spatial_coords	(optional, default = False) np.ndarray of size n_cells x 2 (denoting x,y coordinates) of each spot/cell
spatial_knn	(optional, default = 15) number of knn’s added based on spatial proximity indiciated by spatial_coords
spatial_aux	(optional, default = []) a list of slice IDs so that only cells/spots on the same slice are considered when building the spatial_knn graph

Attributes

Attributes
Attributes	Description
labels	(list) length n_samples of corresponding cluster labels
single_cell_pt_markov	(list) computed pseudotime
single_cell_bp	(array) computed single cell branch probabilities (lineage likelihoods). n_cells x n_terminal states. The columns each correspond to a terminal state, in the same order presented in the’terminal_clusters’ attribute
terminal cluster	(list) terminal clusters found by VIA
super_cluster_labels	Set this to v0.labels (clustering output of first pass “v0”)
super_terminal_cells	super_terminal_cells = via.get_loc_terminal_states(v0, data)
full_neighbor_array	full_neighbor_array = v0.full_neighbor_array. KNN graph from first pass of via - neighbor array
full_distance_array	full_distance_array = v0.full_distance_array. KNN graph from first pass of via - edge weights
ig_full_graph	ig_full_graph = v0.ig_full_graph igraph of the KNN graph from first pass of via
csr_array_locally_pruned	csr_array_locally_pruned = v0.csr_array_locally_pruned. CSR matrix of the locally pruned KNN graph