- Switch OTD computation from `cvxpy` to `ot.emd2`, which is up to 10x faster for OTD problems with neighbor counts up to 2000 x 2000.
- Add a new computation method `OTDSinkhornMix` that uses `ot.emd2` to compute OTD when the neighbor count is small (below `_OTDSinkhorn_threshold=2000` by default) and Sinkhorn otherwise. Sinkhorn is faster for large cases, but for small cases its iterative process is slower than directly calling `ot.emd2`, which is written in C.
- Update the default `nbr_topk` to 3000, since the faster computation makes larger neighborhoods practical.
- Downgrade the "edge weight to zero" message from `warning` to `trace`, since the Ricci flow "converges" faster with the exact Wasserstein distance.