Performance Benchmarks
Benchmark Methodology & Specifications
To ensure a fair and rigorous comparison, the benchmarks were conducted under the following conditions:
Hardware Specifications
- CPU: Intel Core i5-12400F (6 Cores / 12 Threads)
- RAM: 16 GB
- Multithreading: OpenMP was allowed to utilize all available CPU cores.
Algorithm Parameters
We generated datasets with varying numbers of observations () and input dimensions (). We requested diffusion dimensions for all methods. Data were drawn from a standard normal distribution.
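As a rough illustration of the data generation described above, the following sketch draws a matrix of standard normal values in base R. The sizes `n_obs` and `n_dim` and the seed are hypothetical placeholders chosen for this example, not the values used in the benchmarks.

```r
# Minimal sketch: synthetic data drawn from a standard normal distribution.
# n_obs and n_dim are illustrative assumptions, not the benchmark sizes.
set.seed(42)          # fixed seed so repeated runs see identical data

n_obs <- 1000         # hypothetical number of observations (rows)
n_dim <- 20           # hypothetical number of input dimensions (columns)

# rnorm() defaults to mean = 0 and sd = 1, i.e. the standard normal.
data <- matrix(rnorm(n_obs * n_dim), nrow = n_obs, ncol = n_dim)

dim(data)
```

Fixing the seed is one way to make sure every method is timed on the same input.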
- cDiffusion:
  - Sparse method: n_iter = 500 (for ) or 1000 (for ).
  - Dense method: n_iter = 50.
  - k_neighbors scaled dynamically with : (), (), (), and ().
- destiny:
  - To ensure a fair comparison with a sparse method, we forced destiny to use the exact same number of neighbors (k = K_NEIGHBORS).
  - These parameters were used to measure pure computational speed: sigma = 10.0, density_norm = FALSE.
- diffusionMap (CRAN):
  - Note: This package requires a pre-computed distance matrix. To give it an advantage, we pre-calculated dist(data) before starting the timer. The benchmark only measures its eigendecomposition time.
Each test was repeated 3 to 5 times using the microbenchmark package. The tables below show the median execution time in seconds.
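The timing procedure can be sketched roughly as follows. This is not the benchmark script itself: `decompose()` is a hypothetical stand-in (a Gaussian kernel followed by base R `eigen()`) used in place of the packages under test, and the data size and `bandwidth` value are assumptions made for the example. It shows the two points described above: the distance matrix is computed before the timer starts, and the reported figure is the median over a small number of repetitions.

```r
library(microbenchmark)

set.seed(1)
n_obs <- 300   # hypothetical size, kept small so the sketch runs quickly
n_dim <- 10    # hypothetical input dimension
data <- matrix(rnorm(n_obs * n_dim), nrow = n_obs, ncol = n_dim)

# Pre-compute the distance matrix OUTSIDE the timed expression,
# so only the decomposition step is measured.
D <- as.matrix(dist(data))

# Hypothetical stand-in for a dense diffusion-map step:
# Gaussian kernel on the distances, then a symmetric eigendecomposition.
decompose <- function(D, bandwidth = 10) {
  K <- exp(-D^2 / (2 * bandwidth^2))
  eigen(K, symmetric = TRUE)
}

# Repeat the measurement 5 times (the benchmarks used 3 to 5 repetitions).
mb <- microbenchmark(decompose(D), times = 5L)

# microbenchmark stores raw timings in nanoseconds; convert the median to seconds.
median_sec <- median(mb$time) / 1e9
median_sec
```

Using the median rather than the mean keeps a single slow run, for example one hit by garbage collection, from dominating the reported time.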
1. Small to Medium Datasets
For smaller datasets, we compared both our Dense and Sparse methods against other R packages that implement diffusion maps.
2. Larger Datasets
For , dense matrices begin to cause severe memory issues (a dense distance matrix for 30,000 points requires ~7.2 GB of RAM). Therefore, Dense methods were excluded, leaving a direct head-to-head comparison between the two Sparse implementations.
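The memory figure quoted above can be checked with a short calculation. The sketch below assumes a full square matrix of 8-byte double-precision values and reports decimal gigabytes (10^9 bytes); `dense_matrix_gb()` is a helper name introduced here for illustration only.

```r
# Approximate memory for a full n x n matrix of doubles, in GB (10^9 bytes).
# Assumes 8 bytes per entry and ignores any object overhead.
dense_matrix_gb <- function(n, bytes_per_entry = 8) {
  n^2 * bytes_per_entry / 1e9
}

dense_matrix_gb(30000)   # 7.2
dense_matrix_gb(10000)   # 0.8
```

Because the cost grows with the square of the number of points, tripling the dataset from 10,000 to 30,000 points multiplies the memory requirement by nine.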