About

Pluralis carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.

Research

Taming Curvature: Architecture Warm-up for Stable Transformer Training

S. Ramasinghe, T. Ajanthan, H. Dolatabadi, C. Koneputugodage, G. Avraham, V. Shevchenko, Y. Zuo, K. Pajak, A. Long  |  ICLR 2026

We introduce a fast online curvature estimator that tracks preconditioned Hessian behavior during billion-parameter Transformer training. It reveals depth-driven curvature surges behind loss spikes and motivates architecture warm-up: progressively growing depth to stabilize training without slowing convergence.
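
The warm-up schedule itself is simple to express in code. Below is a minimal PyTorch sketch of one plausible implementation, assuming a residual decoder stack; `Block`, `WarmupTransformer`, and the zero-init identity trick are illustrative choices, not details taken from the paper.

```python
# Hypothetical sketch of architecture warm-up: the model starts shallow
# and new blocks are appended on a schedule. Illustrative only, not the
# paper's implementation.
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a pre-norm Transformer decoder block (attention omitted)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))

class WarmupTransformer(nn.Module):
    def __init__(self, d_model: int, initial_depth: int, target_depth: int):
        super().__init__()
        self.d_model = d_model
        self.target_depth = target_depth
        self.blocks = nn.ModuleList([Block(d_model) for _ in range(initial_depth)])

    def grow_depth(self):
        """Append one block, zero-initialized so it starts as an identity
        map and does not perturb the loss at the moment depth grows."""
        if len(self.blocks) < self.target_depth:
            blk = Block(self.d_model)
            nn.init.zeros_(blk.ff[-1].weight)
            nn.init.zeros_(blk.ff[-1].bias)
            self.blocks.append(blk)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x
```

In a training loop one would call `grow_depth()` every fixed number of steps until the target depth is reached, registering each new block's parameters with the optimizer (e.g. via `optimizer.add_param_group`).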

Nesterov Method for Asynchronous Pipeline Parallel Optimization

T. Ajanthan, S. Ramasinghe, Y. Zuo, G. Avraham, A. Long  |  ICML 2025

Pipeline parallelism trains large models by splitting them across stages, but synchronous schedules leave idle “bubbles” that slow training, especially when network latency is high. Running stages asynchronously removes the bubbles at the cost of stale gradients; our Nesterov-based method corrects these stale updates and outperforms existing asynchronous techniques as well as the synchronous baseline.

▶︎ Code
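
To make the staleness correction concrete, here is a toy sketch of Nesterov momentum applied to delayed gradients. The fixed delay `tau`, the gradient queue, and `grad_fn` are stand-ins for the real pipeline schedule, not the paper's exact algorithm.

```python
# Toy sketch: Nesterov momentum as a staleness correction for gradients
# that arrive `tau` steps late, as in asynchronous pipeline parallelism.
import collections
import torch

def train_async(param, grad_fn, steps=100, lr=0.1, mu=0.9, tau=2):
    velocity = torch.zeros_like(param)
    in_flight = collections.deque()  # gradients still crossing the pipeline
    for _ in range(steps):
        # Run forward/backward at the momentum lookahead point: by the
        # time this gradient returns (tau steps later), the lookahead is
        # close to the then-current weights, shrinking the staleness error.
        lookahead = param + mu * velocity
        in_flight.append(grad_fn(lookahead))
        if len(in_flight) <= tau:
            continue  # gradient not back yet; the stage keeps working
        stale_grad = in_flight.popleft()
        velocity.mul_(mu).sub_(stale_grad, alpha=lr)
        param.add_(velocity)
    return param
```

Trying it on a quadratic, e.g. `train_async(torch.zeros(4), lambda w: 2 * (w - 1.0))`, drives `param` toward the minimum even though every update uses a two-step-old gradient.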

Mixtures of Subspaces for Bandwidth-Efficient Context Parallel Training

S. Ramasinghe, T. Ajanthan, H. Dolatabadi, G. Avraham, V. Shevchenko, Y. Zuo, C. Koneputugodage, A. Long  |  NeurIPS 2025

We introduce a compression method for communication-efficient context parallelism that achieves over 95% compression with negligible overhead and no convergence loss. By exploiting low-rank activation structure through learned mixtures of subspaces, it scales billion-parameter decentralized models to 100K+ context lengths on 300 Mbps networks while matching centralized wall-clock convergence.
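
As a rough illustration, the sketch below projects each token's activation onto one of K learned rank-r bases and transmits only the r coefficients plus a subspace index. The sizes, the nearest-subspace routing, and all names are assumptions made for clarity, not the paper's implementation.

```python
# Illustrative mixture-of-subspaces activation compression: each token is
# routed to the basis that reconstructs it best, and only the low-rank
# coefficients cross the network. Bases here are random stand-ins for
# learned ones.
import torch

d, r, K = 4096, 64, 8  # hidden size, subspace rank, number of subspaces
bases, _ = torch.linalg.qr(torch.randn(K, d, r))  # orthonormal columns

def compress(x):  # x: (tokens, d)
    coeffs = torch.einsum('td,kdr->tkr', x, bases)       # project onto every basis
    recon = torch.einsum('tkr,kdr->tkd', coeffs, bases)  # reconstruct per basis
    err = (recon - x.unsqueeze(1)).norm(dim=-1)          # (tokens, K)
    idx = err.argmin(dim=1)                              # best subspace per token
    return coeffs[torch.arange(x.shape[0]), idx], idx    # r floats + 1 int per token

def decompress(coeffs, idx):
    return torch.einsum('tr,tdr->td', coeffs, bases[idx])
```

Per token, r coefficients and one index replace d activation values; at these illustrative sizes that is roughly a 98% reduction, consistent with the over-95% figure above.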

Media