Agora Client¶

The quick-start has three sections:

Requirements: hardware, OS, network, and the required HuggingFace token
Running Agora: the full run flow, what logs to expect, how to stop and restart
Advanced: multi-GPU, CLI flags, manual installation, caveats

See Cloud Options for running Agora on AWS, GCP, RunPod, Tensordock, or Lambda Labs.

Summary¶

With the requirements met and port 49200 open, start the node with:

git clone https://github.com/PluralisResearch/agora
cd agora
python3 agora_cli.py

The CLI will guide you through the setup process.

Check run status before joining

Live wait times are on the Dashboard in the Overview tab.

Locate your compute instance in North America

The current run's peers are NA-based and the join gate is RTT < 80 ms to them. Pick an NA datacenter (or an NA-located host on community marketplaces) — instances outside North America are routinely rejected during the authorization check. See Requirements → Network latency for the details.
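To get a rough sense of whether a candidate instance might pass the latency gate, you can time a few TCP handshakes from it. The following is a minimal sketch, not the check Agora itself performs: using TCP connect time as a stand-in for RTT, taking the median of several samples, and the choice of probe target are all assumptions. Only the 80 ms threshold comes from the text above.

```python
import socket
import statistics
import time
from typing import List

RTT_GATE_MS = 80.0  # threshold quoted above; how Agora measures RTT is not documented here


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake to host:port in milliseconds (roughly one round trip)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def passes_rtt_gate(samples_ms: List[float], gate_ms: float = RTT_GATE_MS) -> bool:
    """Return True if the median sample is strictly below the gate (assumed rule)."""
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    return statistics.median(samples_ms) < gate_ms
```

For example, `passes_rtt_gate([tcp_connect_ms("example.com", 443) for _ in range(5)])` probes a placeholder host; substitute a server you know to be in North America. The median is used so that one slow handshake does not dominate the result.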

Progress is logged live on the Dashboard.

Run Pluralis Agora on Vast.ai¶

Use this section if you have never rented a cloud GPU before, do not have a Vast.ai account, and have never used SSH. The path below rents an RTX 4090 on Vast.ai, opens the port Pluralis Agora needs, connects from your Linux machine, and starts the node.

1. Create a Vast.ai account¶

Open cloud.vast.ai and sign up. Confirm the email Vast.ai sends, then add a payment method under Account → Billing. You cannot rent an instance until billing is active.

2. Generate an SSH key pair on your Linux machine¶

Open a terminal and run:

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

Press Enter at every prompt to accept the defaults. This produces two files: ~/.ssh/id_ed25519 (private — never share) and ~/.ssh/id_ed25519.pub (public — paste into Vast.ai).

Print the public half so you can copy it:

cat ~/.ssh/id_ed25519.pub

Select the entire output (one line starting with ssh-ed25519) and copy it to the clipboard.
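If you want to confirm that you copied the whole line before pasting it, a small format check can help. This sketch relies on the standard OpenSSH public key layout (key type, base64 blob, optional comment, with the blob repeating the key type as a length-prefixed string); the function name is illustrative and is not part of Vast.ai or Agora.

```python
import base64
import struct


def is_ed25519_public_key(line: str) -> bool:
    """Check that a line looks like an OpenSSH ed25519 public key.

    Expected shape: "ssh-ed25519 <base64 blob> [comment]".
    """
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-ed25519":
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet
        blob = base64.b64decode(parts[1], validate=True)
    except ValueError:
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a big-endian uint32 length followed by the key type
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == b"ssh-ed25519"
```

A truncated paste usually fails the base64 decode, and a key of another type fails the prefix check.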

3. Add the public key to Vast.ai¶

In the Vast.ai console, go to Account → Keys (left sidebar). Click New SSH Key, paste the line you just copied, and save.

4. Create a HuggingFace account and generate a read token¶

Pluralis Agora needs a HuggingFace token to download the model weights on first run.

Sign up at huggingface.co/join and verify your email.
Go to huggingface.co/settings/tokens.
Click New token, give it any name, choose Read access, and click Generate.
Copy the token (a string starting with hf_). You will paste it into the Agora CLI in step 10.

5. Pick the Pluralis template¶

In the Vast.ai console, open the Pluralis template. This template preconfigures CUDA + PyTorch and already publishes port 49200 to the host, so you skip the manual Docker port-mapping step that other templates require.

Why the Pluralis template?

Without it you have to edit the offer's Docker create/run options by hand and add -p 49200:49200. The template already includes that flag.

6. Choose a GPU and launch¶

With the template applied, the Vast.ai search page shows offers from individual hosts.

Filter the offer list:
- GPU: at minimum RTX 4090 (24 GB VRAM).
- RAM: at least 80 GB.
- Disk: at least 80 GB.
- Location: North America (open the Geolocation filter and tick US / Canada / Mexico). Hosts outside North America fail the < 80 ms RTT join gate.
Sort by price.
From the offers near the top of the sorted list, pick one whose reliability score is ≥ 99 % and whose DLPerf number is reasonable for the GPU class. Both values are displayed on the offer card.
On the chosen offer, confirm Launch Mode is set to SSH + TCP (not Jupyter).
Click Rent.

Pick -p, not EXPOSE

If you ever launch a non-Pluralis template, edit Docker create/run options and add -p 49200:49200. The separate Open Ports / EXPOSE list only documents the port in image metadata; it does not publish it.

7. Read the external port¶

Vast.ai maps the container's internal port 49200 to a random external port. You need that external port for the --announce_port flag when you launch Agora in step 10.

On your Instances page, click IP & Port Info on the new instance's card. The dialog shows lines in this format:

65.130.162.74:33526 -> 49200/tcp

Record both halves. In the example above:

Host (public IP) = 65.130.162.74
External port = 33526
The instance's internal port 49200 is mapped to that external port.
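The mapping line can also be split programmatically, which avoids transcription mistakes. This is a minimal sketch that assumes the dialog text always follows the `host:port -> port/protocol` shape shown above; the `PortMapping` type and function name are illustrative.

```python
import re
from typing import NamedTuple


class PortMapping(NamedTuple):
    host: str
    external_port: int
    internal_port: int
    protocol: str


# Matches lines such as "65.130.162.74:33526 -> 49200/tcp"
_MAPPING_RE = re.compile(
    r"^\s*(?P<host>[^\s:]+):(?P<external>\d+)\s*->\s*(?P<internal>\d+)/(?P<proto>\w+)\s*$"
)


def parse_port_mapping(line: str) -> PortMapping:
    """Split one IP & Port Info line into host, external port, internal port, protocol."""
    match = _MAPPING_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised mapping line: {line!r}")
    return PortMapping(
        host=match["host"],
        external_port=int(match["external"]),
        internal_port=int(match["internal"]),
        protocol=match["proto"],
    )
```

Calling `parse_port_mapping("65.130.162.74:33526 -> 49200/tcp")` returns a tuple whose `external_port` field is 33526, the value to pass to Agora later.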

8. Connect over SSH¶

The instance card also exposes a Connect or SSH button. Click it to reveal the SSH command. It looks like this, with your own host and SSH port:

ssh -p <ssh_port> root@<host>

Run that command in your terminal (note that the Ghostty terminal is not supported by Vast.ai). The first time you connect, accept the host fingerprint by typing yes.

The SSH port is different from the Agora port

Vast.ai assigns one random port for SSH (used in the ssh -p command) and a separate random port for Agora (the one paired with 49200/tcp in step 7). Do not confuse the two.

9. Update pip and PyTorch on the instance¶

You are now in a shell on the rented machine. Bring pip and PyTorch up to the versions Pluralis Agora expects:

python3 -m pip install --upgrade 'pip>=25.3'
python3 -m pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 \
  --index-url https://download.pytorch.org/whl/cu128
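After the install finishes, you may want to confirm that the pinned versions are the ones Python actually sees. The sketch below compares installed package metadata against the pins from the command above. It assumes CUDA wheels report a local suffix such as `+cu128`, which is stripped before comparing; the helper names are illustrative.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Versions pinned in the install command above
EXPECTED = {"torch": "2.7.0", "torchvision": "0.22.0", "torchaudio": "2.7.0"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as "+cu128"."""
    return version.split("+", 1)[0]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up each package's installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(
    installed: Dict[str, Optional[str]],
    expected: Dict[str, str] = EXPECTED,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, found)} for every package that differs."""
    bad = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or base_version(have) != want:
            bad[name] = (want, have)
    return bad
```

On the instance, `mismatches(installed_versions(EXPECTED))` returns an empty dictionary when all three packages match the pins.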

10. Launch Pluralis Agora¶

Clone the repo and start the CLI:

git clone https://github.com/PluralisResearch/agora
cd agora
python3 agora_cli.py --host_port 49200 --announce_port <external_port>

Replace <external_port> with the value you read in step 7 (for example 33526).
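If you script the launch, it can help to assemble the command in one place and guard against mixing up the two Vast.ai ports. This is a hedged sketch: the two flags are the ones shown above, while the function names and the optional `ssh_port` sanity check are illustrative additions, not part of the Agora CLI.

```python
import shlex
from typing import List, Optional


def agora_launch_argv(
    announce_port: int,
    ssh_port: Optional[int] = None,
    host_port: int = 49200,
) -> List[str]:
    """Build the argument list for the launch command shown above.

    ssh_port is optional and only used as a sanity check: Vast.ai assigns
    separate external ports for SSH and for Agora, so equal values mean the
    two were mixed up.
    """
    if not 1 <= announce_port <= 65535:
        raise ValueError(f"announce port out of range: {announce_port}")
    if ssh_port is not None and announce_port == ssh_port:
        raise ValueError(
            "announce port equals the SSH port; use the port paired with 49200/tcp"
        )
    return [
        "python3", "agora_cli.py",
        "--host_port", str(host_port),
        "--announce_port", str(announce_port),
    ]


def agora_launch_command(announce_port: int, ssh_port: Optional[int] = None) -> str:
    """Render the argument list as a single shell-safe command string."""
    return shlex.join(agora_launch_argv(announce_port, ssh_port))
```

With the example mapping from step 7, `agora_launch_command(33526)` produces the same command line as above with 33526 filled in.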

The CLI then prompts you for:

HuggingFace token — paste the hf_… token from step 4.
Email address — optional; press Enter to skip.
GPU ID — only asked if the instance has multiple GPUs; for a single-GPU rental, press Enter.
Run inside Docker? — answer n. You are already inside the Vast.ai container, so a second Docker layer is impractical.

Your answers are saved to a config file. Subsequent runs of python3 agora_cli.py reuse them.

Vast.ai instances attach SSH logins to a tmux session automatically, so your node keeps running even if your SSH connection drops.

11. Watch the startup phases¶

Once the CLI is running, Agora moves through four phases. The log lines below indicate a healthy startup.

Phase 1 — Network check and weight download

[NETWORK]  Running internet speed test...
[DOWNLOAD] Downloading model weights...
[DOWNLOAD] Model weights downloaded. Waiting for authorization...

A residential-grade uplink is not enough: the node needs ≥ 200 Mbps. If the speed test fails, the node is dropped from the queue.

Phase 2 — Authorization queue

[AUTH]     Authorization queue: position 2, estimated wait: 1m
[AUTH]     Access granted for your_user

Phase 3 — Sync (only if joining an active run)

[SYNC] Synchronising weights with peers. Node won't process batches in this phase.
[SYNC] This phase will last 400 steps (until local epoch <E>).
[SYNC] Synchronising optimizer state. Node is now processing batches, but doesn't contribute to weight averaging yet.
[SYNC] This phase will last 100 steps (until local epoch <E>).
[SYNC] Sync complete. Node is now fully contributing to training.

Sync can take several hours end-to-end.
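The step counts in the sample log allow a back-of-the-envelope estimate of sync time. The sketch below assumes the counts printed above (400 and 100 steps) apply to your run and that you supply an average step duration yourself; the document does not state how long a step takes, so `seconds_per_step` is purely an input you measure or guess.

```python
WEIGHT_SYNC_STEPS = 400     # from the first [SYNC] message above
OPTIMIZER_SYNC_STEPS = 100  # from the second [SYNC] message above


def sync_eta_hours(seconds_per_step: float) -> float:
    """Estimate total sync time in hours for a given average step duration."""
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    total_steps = WEIGHT_SYNC_STEPS + OPTIMIZER_SYNC_STEPS
    return total_steps * seconds_per_step / 3600.0
```

At a hypothetical 36 seconds per step, `sync_eta_hours(36.0)` gives 5.0 hours, since 500 steps at 36 s each is 18,000 s.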

Phase 4 — Training

[SERVER]   Training started
[TRAINING] Training step 1
[PROGRESS] Processed 51 batches in the last 60s
[PROGRESS]   Forward pass: 28 batches
[PROGRESS]   Backward pass: 23 batches

A new [PROGRESS] Processed [N] batches line every 60 seconds means the node is contributing.

Detailed logs live in logs/server_gpu<ID>.log on the instance.
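To check on a node without watching the console, you can scan the log for the progress lines described above. This is a minimal sketch that assumes the log file contains the same `[PROGRESS] Processed N batches in the last 60s` text as the console sample; the function names are illustrative.

```python
import re
from typing import Iterable, List

# Matches lines such as "[PROGRESS] Processed 51 batches in the last 60s"
_PROCESSED_RE = re.compile(r"\[PROGRESS\]\s+Processed (\d+) batches")


def batches_per_window(log_lines: Iterable[str]) -> List[int]:
    """Return the batch count from each '[PROGRESS] Processed N batches' line."""
    counts = []
    for line in log_lines:
        match = _PROCESSED_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def is_contributing(log_lines: Iterable[str]) -> bool:
    """True if the most recent progress line reports at least one batch."""
    counts = batches_per_window(log_lines)
    return bool(counts) and counts[-1] > 0
```

For example, open the log with `open(path)` and pass the file object to `is_contributing`; the forward and backward breakdown lines are ignored because they do not contain the word "Processed".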

12. Verify your contribution¶

Open agora.pluralis.ai and search for your HuggingFace username. Your rented node appears once training begins.

Stopping, restarting, and reconnecting¶

Stop the node (inside the tmux session): press Ctrl + C.
Restart the node without losing your peer identity: re-run python3 agora_cli.py --host_port 49200 --announce_port <external_port>. As long as the private.key file under the agora directory is intact, your node rejoins with the same identity and contribution history.
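Destroy the rental: from the Vast.ai Instances page, click the delete (destroy) icon on the instance card. Destroying the instance ends billing and erases its disk, including private.key; merely stopping it keeps the disk and continues to bill for storage.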
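Because the restart depends on private.key being intact, it can be worth confirming the file exists before you re-run the CLI. The exact location of the file inside the agora directory is not stated here, so this sketch searches the whole tree; the function name is illustrative.

```python
from pathlib import Path
from typing import List, Union


def find_private_keys(agora_dir: Union[str, Path]) -> List[Path]:
    """Return every file named private.key under agora_dir, sorted by path."""
    root = Path(agora_dir)
    return sorted(p for p in root.rglob("private.key") if p.is_file())
```

An empty result from `find_private_keys("agora")` suggests the next launch would start with a new identity.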

Troubleshooting¶

[NETWORK] speed test fails. The host's uplink is below 200 Mbps. Destroy the rental and pick another offer with a higher reliability score; community-marketplace hosts vary in network quality.
Node joins but never processes batches. Port 49200 is probably not reachable from outside. Check that the IP & Port Info panel from step 7 shows ... -> 49200/tcp and that the --announce_port you passed matches the external port on that line. If the mapping is missing, the template's port mapping was overridden; relaunch with the Pluralis template.
ssh: connect to host ... port ...: Connection refused. The instance is still provisioning. Wait 30–60 seconds and retry. If it persists, check the instance status on the Vast.ai dashboard.
HuggingFace authentication fails. The token must have Read access. Generate a new token at huggingface.co/settings/tokens and re-run the CLI; it will re-prompt for the token if the saved one is invalid.
Ask for help. Pluralis Zulip — Contributor Support channel.
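For the "Node joins but never processes batches" case above, a quick TCP probe from your own machine (not from inside the instance) can show whether the external port accepts connections. This is a minimal sketch: it only tests that something accepts a TCP connection on the port, so the node must be running at the time, and it says nothing about whether Agora peers can complete their own handshake.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

With the example values from step 7, the call would be `port_is_reachable("65.130.162.74", 33526)`; use the host and external port of your own instance.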