About
Pluralis carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.
Research
S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo, A. Long | NeurIPS 2025
This is the first work to show that model-parallel training over low-bandwidth networks is possible. Specifically, it demonstrates an 8B LLaMA model trained on par with centralized training while the devices holding consecutive transformer blocks sit in four different locations, connected only via standard internet connections, a setting considered completely impossible prior to this work.
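A minimal conceptual sketch of the setting (not the paper's method): a transformer's blocks are partitioned into contiguous pipeline stages, one per location, so only activations and activation gradients ever cross the network. The names `Stage` and `split_into_stages` below are illustrative, not from the paper.

```python
# Conceptual sketch only: partitioning a transformer's blocks into pipeline
# stages hosted on separate machines connected by slow links.
import torch.nn as nn

class Stage(nn.Module):
    """One pipeline stage: a contiguous slice of transformer blocks, hosted on one machine."""
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):
        return self.blocks(x)

def split_into_stages(blocks, num_stages):
    """Assign contiguous groups of blocks to stages (one stage per location)."""
    per_stage = (len(blocks) + num_stages - 1) // num_stages
    return [Stage(blocks[i:i + per_stage]) for i in range(0, len(blocks), per_stage)]

# Example: 32 decoder blocks split across 4 locations. Each location holds only
# its own blocks; activations are all that travels over the links between them.
blocks = [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True) for _ in range(32)]
stages = split_into_stages(blocks, num_stages=4)
```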
T. Ajanthan, S. Ramasinghe, Y. Zuo, G. Avraham, A. Long | ICML 2025
Pipeline parallelism allows large models to be trained across many small devices by slicing the network into stages, but it introduces a “bubble” during which devices sit idle. This bubble slows both centralized and decentralized training, and the effect is more pronounced in the decentralized case because communication lag widens it. We solve this, outperforming all existing asynchronous techniques and even the synchronous baseline; a back-of-envelope sketch of the bubble follows below.
▶︎ Code
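For intuition, a back-of-envelope model we add here (not the paper's analysis): with p stages, m micro-batches, per-stage compute time t, and per-hop communication delay c, a synchronous pipeline idles each device for roughly (p - 1)(t + c) during fill and drain, so wide-area latency inflates the bubble directly.

```python
# Rough illustration only; the formula and numbers are a simplification, not the paper's model.
def bubble_overhead(p, m, t, c):
    """Idle (bubble) time per device as a fraction of its useful compute m * t."""
    return (p - 1) * (t + c) / (m * t)

# Datacenter-like link (c << t) vs. wide-area link (c comparable to t):
print(bubble_overhead(p=8, m=32, t=1.0, c=0.01))  # ~0.22
print(bubble_overhead(p=8, m=32, t=1.0, c=1.00))  # ~0.44 -- same pipeline, twice the bubble
```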
A. Long*, C. Koneputugodage*, S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo | NeurIPS 2025
UPMs facilitate decentralized training while ensuring that a full weight set is never available to any single participant. UPMs thus enable collaborative training while making the model unextractable in practice.
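A minimal sketch of one ingredient only, the partitioning constraint: each participant holds a disjoint shard of the layer stack, so no single party ever stores the full weight set. The paper's unextractability guarantees go well beyond simple sharding; the helper below is hypothetical.

```python
from typing import Dict, List

def assign_layers(num_layers: int, participants: List[str]) -> Dict[str, List[int]]:
    """Round-robin layer ownership: every participant sees only its own shard."""
    ownership = {p: [] for p in participants}
    for layer_idx in range(num_layers):
        ownership[participants[layer_idx % len(participants)]].append(layer_idx)
    return ownership

print(assign_layers(12, ["alice", "bob", "carol"]))
# {'alice': [0, 3, 6, 9], 'bob': [1, 4, 7, 10], 'carol': [2, 5, 8, 11]}
```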
S. Ramasinghe, T. Ajanthan, H. Dolatabadi, G. Avraham, V. Shevchenko, Y. Zuo, C. Koneputugodage, A. Long | NeurIPS 2025
We propose a compression method for communication-efficient context parallelism in decentralized settings, achieving over 95% compression with negligible overhead and no loss in convergence. The key insight is to exploit the intrinsic low-rank structure of activations by dynamically constraining them to learned mixtures of subspaces via efficient reparameterizations. This allows scaling billion-parameter decentralized models to context lengths exceeding 100K tokens on networks as slow as 300 Mbps, matching the wall-clock convergence of centralized models on 100 Gbps interconnects.
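An illustrative sketch of the underlying idea, assuming a single fixed orthonormal basis in place of the paper's learned mixtures of subspaces: only the low-rank coefficients are transmitted between participants, and the receiver reconstructs the activations from them.

```python
# Sketch only: a fixed random basis stands in for the learned subspaces.
# Real activations are (per the paper) constrained to such subspaces during
# training, which is what makes the projection nearly lossless; the random
# data used here would not reconstruct faithfully.
import torch

d_model, rank, seq_len = 4096, 128, 2048
basis, _ = torch.linalg.qr(torch.randn(d_model, rank))  # orthonormal columns

activations = torch.randn(seq_len, d_model)
coeffs = activations @ basis        # (seq_len, rank): the only tensor sent over the network
reconstruction = coeffs @ basis.T   # receiver side: back to (seq_len, d_model)

print(f"payload reduced by {1 - rank / d_model:.1%}")  # 96.9% fewer values on the wire
```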