Cluster Methods
See https://github.com/veg/hivtrace
Clusters are formed by removing edges from a completely connected graph. Filtering yields a subgraph with the same vertices, connected only by edges below a specified cutoff threshold; the connected components of that subgraph are the clusters.
Vertex relationships become binary after component clustering (i.e., two cases are either clustered together or not). Unintuitive situations can arise in which a case shares a cluster with very distant cases through a chain of sequential linkages, producing misleading graphs.
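As a rough illustration of component clustering and of the chaining effect, here is a minimal Python sketch. It is not the HIV-TRACE implementation; the function name `component_clusters`, the dictionary input format, and the example distances are all assumptions made for illustration.

```python
def component_clusters(distances, cutoff):
    """Group cases into connected components using only edges below cutoff.

    distances: dict mapping (case_a, case_b) -> pairwise distance (hypothetical format).
    Returns a dict mapping each case to a cluster label (its smallest member id).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        ra, rb = find(a), find(b)  # registers both vertices even if the edge is dropped
        if d < cutoff and ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller id becomes the root

    return {case: find(case) for case in parent}


# Made-up distances: A-B and B-C fall under the cutoff, A-C does not.
distances = {
    ("A", "B"): 0.010, ("B", "C"): 0.014, ("A", "C"): 0.030,
    ("A", "D"): 0.060, ("B", "D"): 0.055, ("C", "D"): 0.050,
}
clusters = component_clusters(distances, cutoff=0.015)
```

In this example A and C end up in the same cluster although their own distance (0.030) is twice the cutoff, because B links them sequentially.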
See igraph documentation https://igraph.org/r/doc/cluster_walktrap.html
A common theoretical solution to cluster assignment from weighted graphs using random walks. Cluster assignments are stochastic, as enumerating all possible walk paths in larger clusters becomes computationally restrictive. Optimizing the tn93 cutoff distance may be similar to optimizing the steps parameter for walktrap().
See the growG() function in tn93Analysis.R.
Clusters are formed based on cases from year Y and earlier. Case growth is counted based on cases from year Y. Compatible with clmp and tree-based clustering methods.
Clusters from year Y would differ from those of year Y+1 because of new cluster formation and cluster merging. Any growth estimates we have for year Y would therefore not be applicable to year Y+1 using this method, as the growth measurements from year Y+1 would be made on a different set of clusters.
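The embedded measurement described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the growG() code from tn93Analysis.R: the function name `embedded_growth`, the input format, and the example data are hypothetical, and the edges are assumed to be already filtered to the distance cutoff.

```python
from collections import Counter


def embedded_growth(case_year, edges, year):
    """Cluster all cases up to `year`, then count members dated exactly `year`.

    case_year: dict case -> year (hypothetical input format).
    edges: iterable of (a, b) pairs already filtered to the cutoff.
    Returns dict cluster label -> (cluster size, growth in `year`).
    """
    members = {c for c, y in case_year.items() if y <= year}
    label = {c: c for c in members}

    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]
            x = label[x]
        return x

    for a, b in edges:
        if a in members and b in members:  # ignore cases newer than `year`
            ra, rb = find(a), find(b)
            if ra != rb:
                label[max(ra, rb)] = min(ra, rb)

    size = Counter(find(c) for c in members)
    growth = Counter(find(c) for c in members if case_year[c] == year)
    return {cl: (size[cl], growth[cl]) for cl in size}


case_year = {"A": 2010, "B": 2011, "C": 2012, "D": 2012, "E": 2013}
edges = [("A", "B"), ("B", "C"), ("D", "E")]
result_2012 = embedded_growth(case_year, edges, 2012)
result_2013 = embedded_growth(case_year, edges, 2013)
```

Running the same data for 2012 and 2013 gives different cluster sets (D is a singleton in 2012 but part of a two-case cluster in 2013), which is the comparability problem noted above.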
See growthSim() function in tn93Analysis.R
Clusters are formed based on cases from year Y-1 and earlier. Case growth is counted by adding cases from year Y individually to those clusters, as these are the clusters "Under Observation" and we would like to see how this exact set of clusters grows.
In contrast to the foresight problem experienced by embedded case growth, clusters from year Y-1 will be different from those formed at year Y-2. This creates another unrealistic situation, where the clusters from the reference
As we simulate adding new cases, these cases may bridge multiple clusters together, creating an indexing problem, as what was once considered 2 separate clusters may now be considered one.
Default solution used by growthSim()
Alt option used by growthSim()
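The simulated measurement and the bridging problem can be sketched in Python as follows. This is not the growthSim() code: the function name `simulated_growth`, the input format, and the rule used here (credit a bridging case to the cluster of its closest old neighbour) are assumptions chosen for illustration, and may not match either option that growthSim() offers.

```python
from collections import Counter


def simulated_growth(old_cluster, new_links):
    """Add each new case individually to clusters fixed at year Y-1.

    old_cluster: dict old case -> cluster label (hypothetical format).
    new_links: dict new case -> list of (old case, distance) edges under the cutoff.
    Returns (growth per cluster, set of new cases touching several clusters).
    """
    growth = Counter({cl: 0 for cl in old_cluster.values()})
    bridging = set()
    for new_case, links in new_links.items():
        if not links:
            continue  # no direct linkage to an old case: counted nowhere
        touched = {old_cluster[old] for old, _ in links}
        if len(touched) > 1:
            bridging.add(new_case)  # would merge clusters if added permanently
        # Assumed rule: credit the case to the cluster of its closest old neighbour.
        closest_old, _ = min(links, key=lambda link: link[1])
        growth[old_cluster[closest_old]] += 1
    return dict(growth), bridging


old_cluster = {"A": 1, "B": 1, "C": 2}
new_links = {
    "N1": [("A", 0.010)],
    "N2": [("B", 0.012), ("C", 0.008)],  # links to both clusters
    "N3": [],
}
growth, bridging = simulated_growth(old_cluster, new_links)
```

Because the old cluster labels are never modified, the set of clusters under observation stays fixed even when a new case such as N2 links to two of them.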
The growth at year Y should be somewhat predictable by the information within years up to and including Y-1. If we view growth at Y as an outcome variable and measure it using one of the methods described above, we still need a predictor variable.
See 2018 NY Study, Wertheim et al
We can use embedded growth measurement to establish the growth from year Y-6 to Y-1, counting all cases from all of those 5 years and dividing by 5. We may also divide by the square root of cluster size at year Y-1 if we would like to measure relative instead of absolute growth. This avoids the hindsight and foresight problems mentioned above.
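The calculation above can be sketched for a single cluster as follows. The function name `recent_growth` and the example years are hypothetical, and the window is assumed to cover the five years Y-5 through Y-1 inclusive, which is one reading of "from year Y-6 to Y-1".

```python
from math import sqrt


def recent_growth(member_years, year, window=5, relative=False):
    """Mean yearly growth of one cluster over the `window` years before `year`.

    member_years: years of the cases in the cluster (hypothetical input format).
    """
    size = sum(1 for y in member_years if y <= year - 1)
    recent = sum(1 for y in member_years if year - window <= y <= year - 1)
    rate = recent / window
    # Relative growth divides by the square root of cluster size at year-1.
    return rate / sqrt(size) if relative and size else rate


member_years = [2005, 2008, 2010, 2011, 2011, 2012, 2012, 2012, 2012]
absolute = recent_growth(member_years, 2013)
relative = recent_growth(member_years, 2013, relative=True)
```

Here 8 of the 9 cases fall in 2008 through 2012, so the absolute rate is 8 / 5 cases per year and the relative rate divides that by the square root of 9.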
The second figure in simulated growth measurement demonstrates the way this method only counts direct new case linkages. Compared to embedded growth measurement, this is likely to give much lower measurements of cluster growth. This means the predicted growth will be skewed to overestimate the growth at Y.
If we see new cases in clusters and treat those clusters as objects with a weight defined by size and a valency defined by edges leading from a new case cluster to an old case, then we can apply a method similar to the weighted merge solution.
We can also apply a method similar to the closest merge solution, which will lead to more extreme variation in individual node growth.
The initial frequency function f(x) can be thought of as a Poisson-Linked GLM where the age of a given case predicts the likelihood that it will be connected to newer cases.
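To make the idea of a Poisson GLM with a log link concrete, here is a minimal Python sketch that fits log E[count] = b0 + b1 * age by Newton's method. It is an illustration only, not the model code in tn93Analysis.R: the function name `fit_poisson_loglinear`, the choice of response (number of links to newer cases per case of a given age), and the example data are assumptions.

```python
from math import exp, log


def fit_poisson_loglinear(ages, counts, iterations=50, tol=1e-10):
    """Fit log E[count] = b0 + b1 * age by Newton's method.

    Assumes at least two distinct ages and a positive total count.
    """
    n = len(ages)
    b0, b1 = log(sum(counts) / n), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [exp(b0 + b1 * x) for x in ages]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * x for x, y, m in zip(ages, counts, mu))
        # Entries of the (negated) Hessian.
        h00 = sum(mu)
        h01 = sum(m * x for x, m in zip(ages, mu))
        h11 = sum(m * x * x for x, m in zip(ages, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up, noise-free example: expected links decay with case age.
ages = [0, 1, 2, 3, 4, 5]
counts = [exp(1.0 - 0.5 * a) for a in ages]
b0, b1 = fit_poisson_loglinear(ages, counts)
```

A negative fitted slope b1 corresponds to older cases being less likely to connect to newer cases; with real data the counts would be integers and the fit would not be exact.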
See Nakaya, T (2000)
To obtain statistics such as GAIC and VPC, we need to compare clusters at a given cutoff to clusters at a cutoff of 0 (i.e., every individual case is a cluster of size 1). The full model should represent maximum variation in the data and act as a frame of comparison for the variation at a given level of aggregation.
Unlike Nakaya's solution to the MAUP, our total network growth is affected by the threshold cutoff. With a cutoff of 0, no new cases are added and therefore overall case growth will be 0.
See full option for growth and forecast functions in tn93Analysis.R. To keep total network growth static, we can choose to only disaggregate old cases (i.e., those coming before Y). We may still need to resolve merges (see merge solutions in Simulated growth measurement).
The full model has difficulty producing meaningful growth estimates using the Relative Recent Growth model, because past growth only offers binary outcomes for single cases (either they appeared in the last 5 years or they did not). The selectively disaggregated full model is, however, compatible with the Age-Dependent frequency model for forecasting growth.