Rules - LLMVECT

01 Voting Rules

Blind Comparison: Each question gets 4 anonymous model responses (A/B/C/D); identities stay hidden until you have voted
Dual Extreme Voting: Pick the best (winner) and the worst (loser); winners gain points and losers are penalized
Optional Tie: Choose "About the same" or "Both bad" if the responses are hard to distinguish
Instant Reveal: Model identities revealed immediately after voting for transparency
Dual Mode: Speed mode (fast response) and Expert mode (deep reasoning) ranked independently

02 Ranking Algorithm

We use a Plackett-Luce probabilistic model with UCB-E exposure control. Key points:

Dimensionality Note: Each 4-way choice is treated as "1 clear winner vs 3 undifferentiated losers"; this is an engineering tradeoff, not a flaw
Winner Bonus:
Winner gain = K × (1 - P_win) × weight
P_win is the winner's predicted win probability, so the more a model is underestimated, the more it gains from a win (similar to Elo)
Loser Penalty: The "worst" model takes a directed penalty (2/3 of the penalty pool); the remaining losers share the rest
Arena Score:
Score = 1200 + 400 × log₁₀(γ)
γ is the model's intrinsic strength parameter, estimated by iterative maximum likelihood estimation (MLE)
Sincerity Weight: Dwell time under 2s gives weight 0, 2-10s is linearly interpolated, and 10s or more gives 1.0; scroll depth is also checked to block instant voters
Model Lifecycle (4-state machine):
Active → Observing → Eliminated ↔ Probation

Eliminated models enter a Probation period; revival is decided by the LCB (Lower Confidence Bound), not by "revive with one win"
Monthly Reset: Rankings are archived on the 1st of each month; the new month starts fresh and past rankings remain available
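
The scoring formulas above can be sketched in Python as follows. This is a minimal illustration, not the code in lobster.py: the value of K, the function names, the exact interpolation endpoints (0 at 2s, 1 at 10s), and the use of the Plackett-Luce first-choice probability for P_win are all assumptions. How scroll depth enters the weight is not stated in these rules, so it is left out.

```python
import math

K_FACTOR = 32.0  # hypothetical value; the rules do not state K


def sincerity_weight(dwell_seconds: float) -> float:
    """Dwell-time weight: 0 below 2s, linear from 2s to 10s, 1.0 from 10s on."""
    if dwell_seconds < 2.0:
        return 0.0
    if dwell_seconds >= 10.0:
        return 1.0
    # Assumed endpoints: weight 0 at exactly 2s, weight 1 at exactly 10s.
    return (dwell_seconds - 2.0) / 8.0


def win_probability(gammas: list[float], index: int) -> float:
    """Plackett-Luce probability that the model at `index` is picked first.

    Assumption: P_win in the Winner Bonus formula is this quantity.
    """
    return gammas[index] / sum(gammas)


def winner_gain(p_win: float, weight: float, k: float = K_FACTOR) -> float:
    """Winner gain = K x (1 - P_win) x weight."""
    return k * (1.0 - p_win) * weight


def split_penalty(pool: float, n_other_losers: int = 2) -> tuple[float, float]:
    """Return (penalty for the worst model, penalty for each other loser).

    The worst model takes 2/3 of the pool; the others share the remainder.
    How the pool itself is sized is not specified in the rules.
    """
    worst = pool * 2.0 / 3.0
    each_other = (pool - worst) / n_other_losers
    return worst, each_other


def arena_score(gamma: float) -> float:
    """Score = 1200 + 400 x log10(gamma)."""
    return 1200.0 + 400.0 * math.log10(gamma)


if __name__ == "__main__":
    gammas = [1.0, 1.0, 1.0, 1.0]          # four equally rated models
    p = win_probability(gammas, 0)          # 0.25
    w = sincerity_weight(6.0)               # halfway between 2s and 10s
    print(winner_gain(p, w))
    print(split_penalty(12.0))
    print(arena_score(1.0))                 # gamma = 1 maps to the 1200 baseline
```

Under these assumptions, a model with γ = 1 sits at the 1200 baseline and every tenfold increase in γ adds 400 points.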

03 Anti-Cheat

Server-side User ID: user_id is issued by the server and signed with HMAC, preventing client-side forgery
IP Rate Limiting: Max 20 requests per 60s per IP; excess requests receive HTTP 429
Vote Cooldown: Minimum 10s between votes per user
Daily Vote Cap: Max 50 votes per user per day
Browser Fingerprint: Canvas/WebGL/Audio fingerprinting to detect multiple accounts on the same device
Anomaly Detection: Repeated votes on the same question and highly repetitive voting patterns are flagged
Sincerity Filter: Dual check on dwell time and scroll depth; instant voters get weight 0
Audit Log: Complete audit trail for all voting behavior, fully traceable

04 Data Transparency

Vote & Reveal: Model identities disclosed after each vote, no black-box operations
Open Source Ranking: The algorithm of the core ranking engine (lobster.py) is fully public and auditable
Transparent Scoring: Arena Score formula, weight factors, state transition rules all public
Monthly Archives: Historical ranking data archived monthly, any period queryable
Auditable: Complete voting audit logs retained, community oversight welcome

05 Independence Declaration

This platform has no corporate backing

No AI model provider intervention allowed

Driving better AI development through real user evaluation data

No Corporate Ties: Independently operated, no financial ties to any AI provider
Tamper-proof Algorithm: Rankings are computed automatically by the Plackett-Luce model, with no manual override
Data = Truth: All rankings independently reproducible from raw vote data
Open Oversight: Community welcome to independently audit algorithms, data, and results