# EGGROLLTrainer

EGGROLL (Evolution Guided General Optimization via Low-rank Learning) trainer.

## EGGROLLTrainer

```python
EGGROLLTrainer(params, model, fitness_fn, population_size=256, learning_rate=0.01, sigma=0.1, rank=1, noise_reuse=0, group_size=0, freeze_nonlora=False, device=None, seed=None)
```
Bases: Optimizer
EGGROLL trainer implementing the actual EGGROLL algorithm.
Unlike the base ESTrainer, which works with flattened parameters, EGGROLL works per-layer with low-rank perturbations for efficiency.

Key features:

- **Low-rank perturbations**: for matrices W ∈ R^(m×n), samples A ∈ R^(m×r) and B ∈ R^(n×r) where r << min(m, n), forming the perturbation A @ B.T
- **Per-layer updates**: handles each parameter tensor independently
- **Noise reuse**: can reuse noise across multiple evaluations (antithetic sampling)
- **Group normalization**: supports fitness normalization within groups
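The memory saving behind low-rank perturbations can be sketched in a few lines (illustrative NumPy, not the library's internals): a rank-r perturbation of an m×n matrix is built from only (m + n)·r sampled values instead of m·n.

```python
import numpy as np

# Illustrative sketch of a rank-r perturbation (not the library's internals).
rng = np.random.default_rng(0)
m, n, r = 512, 256, 1
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
perturbation = A @ B.T  # full m x n matrix, built from low-rank factors

assert perturbation.shape == (m, n)
# Only (m + n) * r noise values were sampled, vs m * n for full-rank noise:
assert (m + n) * r == 768
assert np.linalg.matrix_rank(perturbation) <= r
```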
Subclasses torch.optim.Optimizer for compatibility with PyTorch optimizer interface. Use model.parameters() as the first argument, similar to standard optimizers.
Initialize the EGGROLL trainer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `params` | | Iterable of parameters to optimize (for optimizer compatibility). Typically `model.parameters()`. | *required* |
| `model` | `Module` | PyTorch model to train. Parameters will be optimized. | *required* |
| `fitness_fn` | `Callable[[Module], float]` | Function that takes a model and returns a fitness score (higher is better). Should handle model evaluation. | *required* |
| `population_size` | `int` | Number of population members | `256` |
| `learning_rate` | `float` | Learning rate for parameter updates | `0.01` |
| `sigma` | `float` | Standard deviation for perturbations | `0.1` |
| `rank` | `int` | Rank of low-rank perturbations | `1` |
| `noise_reuse` | `int` | Number of evaluations to reuse noise (0 = no reuse, 2 = antithetic) | `0` |
| `group_size` | `int` | Size of groups for fitness normalization (0 = global normalization) | `0` |
| `freeze_nonlora` | `bool` | If True, only apply LoRA updates to linear layers | `False` |
| `device` | `Optional[device]` | Device to run on | `None` |
| `seed` | `Optional[int]` | Random seed | `None` |
Source code in eggroll_trainer/eggroll.py
## Functions

### get_best_model
Get a copy of the model with the best parameters found.
Returns:
| Type | Description |
|---|---|
| `Module` | New model instance with best parameters |
Source code in eggroll_trainer/eggroll.py
### step
Perform one optimization step.
This method provides compatibility with PyTorch optimizer interface. For ES algorithms, the fitness function is provided at initialization, not per-step. The closure parameter is ignored for ES.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `closure` | | Optional callable (ignored for ES; `fitness_fn` is used instead) | `None` |

Returns:

| Type | Description |
|---|---|
| | Dictionary with training metrics |
Source code in eggroll_trainer/eggroll.py
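Conceptually, one call to `step()` performs a full ES generation: sample low-rank perturbations, evaluate fitness, normalize the scores, and combine them into a parameter update. A toy NumPy sketch of that loop (illustrative only; the real implementation operates per-layer on the torch model and uses the configured `fitness_fn`):

```python
import numpy as np

# Toy sketch of one ES generation (illustrative; not the library's code).
rng = np.random.default_rng(0)
W = np.zeros((4, 4))                 # current parameters
target = np.ones((4, 4))
sigma, lr, pop = 0.1, 0.2, 64

def fitness(w):
    return -float(((w - target) ** 2).sum())   # higher is better

scores, noises = [], []
for _ in range(pop):
    A = rng.standard_normal((4, 1))
    B = rng.standard_normal((4, 1))
    eps = A @ B.T                     # rank-1 perturbation
    scores.append(fitness(W + sigma * eps))
    noises.append(eps)

scores = np.array(scores)
normalized = (scores - scores.mean()) / (scores.std() + 1e-8)
update = sum(s * eps for s, eps in zip(normalized, noises))
W_new = W + lr / (pop * sigma) * update       # ES gradient-ascent step
```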
### train
Train for multiple generations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_generations` | `int` | Number of generations to train | *required* |
| `verbose` | `bool` | Whether to print progress | `True` |

Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with final training state |
Source code in eggroll_trainer/eggroll.py
### zero_grad
Zero gradients (no-op for ES algorithms).
This method exists for optimizer interface compatibility. ES algorithms don't use gradients, so this is a no-op.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `set_to_none` | `bool` | If True, set gradients to None (ignored for ES) | `False` |
Source code in eggroll_trainer/eggroll.py
## Usage

```python
from eggroll_trainer import EGGROLLTrainer

trainer = EGGROLLTrainer(
    model.parameters(),
    model=model,
    fitness_fn=fitness_fn,
    population_size=256,
    learning_rate=0.01,
    sigma=0.1,
    rank=1,
    noise_reuse=0,
    group_size=0,
    freeze_nonlora=False,
    seed=42,
)
trainer.train(num_generations=100)
```
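A minimal `fitness_fn` sketch, assuming a supervised setup (the model, data, and loss here are illustrative stand-ins; the only contract is model in, higher-is-better scalar out):

```python
import torch
import torch.nn as nn

# Illustrative fitness function: negative MSE on a fixed batch, so that
# higher fitness means lower loss. Model and data are stand-ins.
torch.manual_seed(0)
model = nn.Linear(8, 1)
inputs = torch.randn(32, 8)
targets = torch.randn(32, 1)

def fitness_fn(m: nn.Module) -> float:
    with torch.no_grad():                     # ES never needs gradients
        loss = nn.functional.mse_loss(m(inputs), targets)
    return -loss.item()                       # negate: the trainer maximizes fitness

score = fitness_fn(model)
```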
## Key Parameters

### rank (int, default: 1)
Rank of low-rank perturbations. Controls memory/computation tradeoff:
- rank=1: Minimum memory, fastest (recommended)
- rank=2-4: Better expressivity, still efficient
- rank>>1: Approaches full-rank (not recommended)
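The tradeoff is easy to quantify: for an m×n layer, rank-r noise needs (m + n)·r sampled values instead of m·n. For a 4096×4096 layer (sizes here are just an example):

```python
# Noise values sampled per 4096x4096 layer at different ranks (illustrative arithmetic).
m = n = 4096
results = {}
for r in (1, 2, 4):
    low_rank = (m + n) * r            # values for A (m x r) plus B (n x r)
    results[r] = (low_rank, (m * n) // low_rank)
    print(f"rank={r}: {low_rank} noise values, {results[r][1]}x fewer than full-rank")
```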
### noise_reuse (int, default: 0)
Number of evaluations to reuse noise:
- 0: No reuse (standard)
- 2: Antithetic sampling (use +ε and -ε)
- >2: Multiple reuses (rarely needed)
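Antithetic sampling (`noise_reuse=2`) evaluates each noise sample at both +ε and −ε; for smooth objectives the paired difference cancels even-order error terms and reduces estimator variance. A hedged NumPy sketch of the idea (not the trainer's code):

```python
import numpy as np

# Sketch of antithetic sampling: each eps is used twice, at +eps and -eps.
rng = np.random.default_rng(1)

def fitness(w):
    return -float((w ** 2).sum())     # simple concave objective, maximum at 0

w = np.array([1.0, -2.0])
sigma, half_pop = 0.1, 64
grad_est = np.zeros_like(w)
for _ in range(half_pop):
    eps = rng.standard_normal(2)
    delta = fitness(w + sigma * eps) - fitness(w - sigma * eps)
    grad_est += delta / (2 * sigma) * eps
grad_est /= half_pop                  # approximates the true gradient -2w
```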
### group_size (int, default: 0)
Size of groups for fitness normalization:
- 0: Global normalization (all population members)
- >0: Group-based normalization (can improve stability)
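With `group_size > 0`, scores are standardized within each group rather than across the whole population, so groups with very different fitness scales contribute comparably. A small sketch of the normalization itself (values are illustrative):

```python
import numpy as np

# Sketch of group-based fitness normalization (group_size = 3 here).
scores = np.array([1.0, 3.0, 2.0, 10.0, 30.0, 20.0])  # two groups, different scales
groups = scores.reshape(-1, 3)
normalized = (groups - groups.mean(axis=1, keepdims=True)) \
    / (groups.std(axis=1, keepdims=True) + 1e-8)
normalized = normalized.ravel()
# Both groups now carry the same relative ranking signal despite a 10x scale gap.
```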
### freeze_nonlora (bool, default: False)
If True, only apply LoRA updates to 2D parameters (matrices):
- False: Update all parameters (recommended)
- True: Only update matrix parameters (biases frozen)
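The selection rule can be pictured as a dimensionality filter over named parameters (a sketch of the idea; parameter names here are hypothetical and the real check lives inside the trainer):

```python
# Sketch of the freeze_nonlora=True rule: only 2D parameters (weight matrices)
# receive low-rank updates; 1D parameters (biases, norm scales) stay frozen.
param_shapes = {
    "linear.weight": (64, 32),
    "linear.bias": (64,),
    "norm.weight": (64,),
}
updated = [name for name, shape in param_shapes.items() if len(shape) == 2]
frozen = [name for name, shape in param_shapes.items() if len(shape) != 2]
```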
## Characteristics
- ✅ 100x speedup over full-rank for large models
- ✅ Memory efficient
- ✅ Handles large population sizes
- ✅ Per-layer updates
- ✅ Supports fitness normalization
## See Also
- User Guide - Detailed usage guide
- Research - Algorithm details