Advanced

Multiple GPUs

Run one CLI instance per GPU:

python3 agora_cli.py --gpu_id 0
python3 agora_cli.py --gpu_id 1

Each instance automatically listens on its own port, 49200 + gpu_id (49200, 49201, ...). Make sure each of these ports is open for inbound TCP.

Native. Run each command in a separate terminal (or tmux window).
Docker. Each instance gets its own container automatically.
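
The port convention above can be sketched as a small dry-run helper. It assumes the default `49200 + gpu_id` rule; `gpu_port` and `print_launch_plan` are hypothetical names that are not part of Agora. Nothing is started: the script only prints one command per GPU so each can be pasted into its own terminal or tmux window.

```shell
# Hypothetical helper: default listening port for a GPU (49200 + gpu_id).
gpu_port() {
  echo $((49200 + $1))
}

# Print one launch command per GPU without running anything.
print_launch_plan() {
  gpu_count=$1
  gpu_id=0
  while [ "$gpu_id" -lt "$gpu_count" ]; do
    echo "python3 agora_cli.py --gpu_id $gpu_id   # port $(gpu_port "$gpu_id")"
    gpu_id=$((gpu_id + 1))
  done
}

print_launch_plan 2
```

The printed port list is also a convenient checklist when opening inbound TCP ports in a firewall or cloud console.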

Maximum 2 peers per account

We allow up to 2 active peers per account. We may relax the cap during the live run.

CLI reference

The CLI prompts interactively, but every value can be passed as a flag:

Flag	Description
`--gpu_id <ID>`	GPU to use (default: 0)
`--token <TOKEN>`	HuggingFace token
`--email <EMAIL>`	Email address (optional)
`--host_port <PORT>`	Listening port (default: `49200 + gpu_id`)
`--announce_port <PORT>`	External port if different from `--host_port` (e.g. RunPod, Tensordock)
`--use_docker`	Run inside Docker
`--log_file <PATH>`	Log file path (default: `logs/server_gpu<ID>.log`)
`--identity_path <PATH>`	Identity key path (default: `private_gpu<ID>.key`)
`--batch_size_override <N>`	Advanced. Force a lower max batch size than the system would auto-pick. Useful when CUDA OOM occurs at the recommended size (common on edge-case GPUs or when other processes share the device). The effective max batch is `min(batch_size_override, recommended_batch_size)`, so the cap can only be lowered, not raised. Start at the recommended size and lower it until OOM stops.
`--skip_input`	Non-interactive: all values must come from flags or saved config
`--reconfigure`	Re-prompt every setting from scratch

The --log_file and --identity_path defaults are appropriate for most setups; override only if there is a specific reason.
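
The `--batch_size_override` rule can be illustrated with a few lines of shell. This is only a sketch of the `min(batch_size_override, recommended_batch_size)` formula from the table; `effective_batch` is a hypothetical name and the sizes 4, 8, and 16 are made-up values, not Agora recommendations.

```shell
# Hypothetical helper mirroring min(batch_size_override, recommended_batch_size).
# Arguments: override, recommended.
effective_batch() {
  override=$1
  recommended=$2
  if [ "$override" -lt "$recommended" ]; then
    echo "$override"
  else
    echo "$recommended"
  fi
}

effective_batch 4 8    # prints 4: the override lowers the cap
effective_batch 16 8   # prints 8: an override above the recommendation has no effect
```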

Manual installation

1. Install dependencies

Docker

docker build . -t pluralis_agora --label image_version=1

Native (no Docker)

Prerequisites: Python 3.11, pip >= 25.3, NVIDIA drivers with CUDA 12.8, conda.

# Create and activate a Python 3.11 environment
conda create -y -n agora python=3.11
conda activate agora

# Upgrade pip
pip install --upgrade "pip>=25.3"

# Install PyTorch with CUDA 12.8 support
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu128

# Install the two required packages from source
pip install --build-constraint constraints.txt -e ./agora_server
pip install --build-constraint constraints.txt -e ./agora

2. Read runtime parameters

run.json at the repo root contains values required by run_server.py:

`run.json` field	`run_server.py` argument
`run_config`	`--config`
`auth_server`	`--auth_server`
`prom_gateway`	`--prom_gateway`
`seeds` (array)	`--initial_peers`

RUN_CONFIG=$(jq -r '.run_config' run.json)
AUTH_SERVER=$(jq -r '.auth_server' run.json)
PROM_GATEWAY=$(jq -r '.prom_gateway' run.json)
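# Join the seeds with spaces so the list stays on one line when it is
# embedded in the quoted Docker command string
SEEDS=$(jq -r '.seeds | join(" ")' run.json)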
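
Because `jq -r` prints the literal string `null` for a missing field, a typo in run.json can otherwise go unnoticed until launch. The guard below is a minimal sketch of a sanity check to run after the assignments above; `check_loaded` is a hypothetical helper, not part of Agora.

```shell
# Hypothetical guard: fail if any named variable is unset, empty, or the
# string "null" (what `jq -r` prints for a missing field).
check_loaded() {
  for name in "$@"; do
    eval "value=\${$name-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "run.json: no usable value for $name" >&2
      return 1
    fi
  done
}

# Usage, after the jq assignments:
#   check_loaded RUN_CONFIG AUTH_SERVER PROM_GATEWAY SEEDS || exit 1
```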

3. Get your public IP

PUBLIC_IP=$(curl -s https://api.ipify.org)
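
If the lookup fails or a proxy returns an error page, `PUBLIC_IP` ends up empty or holds something that is not an address. A minimal sketch of a check before the value goes into `--announce_maddrs` follows; `is_ipv4` is a hypothetical helper and it assumes the host announces an IPv4 address.

```shell
# Hypothetical check: succeed only for a dotted-quad IPv4 address.
# The body runs in a subshell so the IFS change does not leak out.
is_ipv4() (
  # Shape check: four dot-separated groups of 1-3 digits.
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  # Range check: every group must be at most 255.
  IFS=.
  set -- $1
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
)

# Usage, after the curl call:
#   is_ipv4 "$PUBLIC_IP" || { echo "unexpected response: $PUBLIC_IP" >&2; exit 1; }
```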

4. Launch

Default port convention: 49200 + gpu_id. Each GPU needs its own port.

Docker

docker run -d --name agora_gpu<gpu_id> --ipc=host --network=host \
  --gpus device=<gpu_id> \
  -v $(pwd):/home -w /home \
  pluralis_agora \
  bash -c "CUDA_VISIBLE_DEVICES=0 python3.11 agora/src/agora/run_server.py \
    --gpu_id <gpu_id> \
    --config $RUN_CONFIG \
    --token <hf_token> \
    --auth_server $AUTH_SERVER \
    --prom_gateway $PROM_GATEWAY \
    --host_maddrs /ip4/0.0.0.0/tcp/<port> \
    --announce_maddrs /ip4/$PUBLIC_IP/tcp/<port> \
    --initial_peers $SEEDS \
    --email <email> \
    --log_file logs/server_gpu<gpu_id>.log \
    --identity_path private_gpu<gpu_id>.key"

Native (no Docker)

CUDA_VISIBLE_DEVICES=<gpu_id> python3.11 agora/src/agora/run_server.py \
  --gpu_id <gpu_id> \
  --config "$RUN_CONFIG" \
  --token <hf_token> \
  --auth_server "$AUTH_SERVER" \
  --prom_gateway "$PROM_GATEWAY" \
  --host_maddrs /ip4/0.0.0.0/tcp/<port> \
  --announce_maddrs /ip4/$PUBLIC_IP/tcp/<port> \
  --initial_peers $SEEDS \
  --email <email> \
  --log_file logs/server_gpu<gpu_id>.log \
  --identity_path private_gpu<gpu_id>.key
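
Filling in the `<port>` placeholders by hand is easy to get wrong with several GPUs. The sketch below derives both multiaddr flags from the GPU index using the default port convention. `maddr_flags` is a hypothetical helper, and the optional third argument (an external port that differs from the listening port, by analogy with the CLI's `--announce_port`) is an assumption about how a port-mapping provider would be handled in a manual launch. `203.0.113.7` is a documentation-only example address.

```shell
# Hypothetical helper: print the two multiaddr flags for one GPU.
# Arguments: gpu_id, public_ip, optional external port (defaults to the host port).
maddr_flags() {
  host_port=$((49200 + $1))
  announce_port=${3:-$host_port}
  echo "--host_maddrs /ip4/0.0.0.0/tcp/$host_port --announce_maddrs /ip4/$2/tcp/$announce_port"
}

maddr_flags 1 203.0.113.7          # same port inside and outside
maddr_flags 1 203.0.113.7 30123    # provider maps external port 30123 to 49201
```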

Caveats

private.key

On first run, Agora generates a private.key file (or private_gpu<ID>.key for multi-GPU setups). This is your node's cryptographic identity.

Required for secure communication within the swarm.
Keep it private. Never share or commit it.
Tied to your HuggingFace account. Rejoining with the same private.key under a different HF account fails with the error `This peer_id is already used by another user.`
Losing it means you join as a new identity. Previous swarm contributions stay on the dashboard, but your node_id changes.
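
One way to act on "keep it private, never commit it" is sketched below. It assumes the key files sit in the current directory and that the project is a git checkout; `protect_keys` is a hypothetical helper, and whether Agora already restricts the file mode itself is not stated here.

```shell
# Hypothetical helper: lock down identity keys in the current directory.
protect_keys() {
  # Owner-only read/write on every identity key that exists.
  for key in private.key private_gpu*.key; do
    [ -f "$key" ] && chmod 600 "$key"
  done
  # Ignore the keys in git unless the pattern is already listed.
  grep -qxF 'private*.key' .gitignore 2>/dev/null || echo 'private*.key' >> .gitignore
}

# Usage, from the directory that holds the key files:
#   protect_keys
```

Since losing the key means rejoining under a new identity, keeping a copy somewhere outside the project directory is also worth considering.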

Docker file ownership

Files created inside a Docker container are owned by the container's user, not the host user. To modify or delete them from the host:

Linux:

sudo chown -R $USER <path/to/project>

This commonly matters for logs/ and checkpoint files written during a Docker run.
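
Before changing ownership recursively, it can help to see which files are actually affected. The sketch below lists everything under a directory that the current user does not own; `foreign_files` is a hypothetical helper built on standard `find` and `id`.

```shell
# Hypothetical helper: list entries under a directory that are not owned
# by the current user (for example, files written by a container as root).
foreign_files() {
  find "$1" ! -user "$(id -u)" -print
}

# Usage:
#   foreign_files logs
```

An empty result means no ownership fix is needed for that directory.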