TradeArena: Offline Smoke Showcase And LLM Benchmark Harness

The showcase path validates the deterministic runner, risk gate, execution simulator, and trajectory artifacts without live provider calls. TradeArena also includes opt-in live or cache-backed LLM analyst runs through the same audit lifecycle: observation -> signal -> intended allocation -> risk gate -> order -> fill/rejection -> portfolio state -> diagnostic report.
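As a rough illustration of the lifecycle above, the sketch below logs each stage in order and reports how much of the lifecycle a run covered. `AuditTrail`, `STAGES`, and the record layout are hypothetical names chosen for this example; they are not TradeArena's actual trajectory schema.

```python
from dataclasses import dataclass, field

# Stage names mirror the lifecycle described above; the identifiers are illustrative.
STAGES = (
    "observation", "signal", "intended_allocation", "risk_gate",
    "order", "fill_or_rejection", "portfolio_state", "diagnostic_report",
)


@dataclass
class AuditTrail:
    """Hypothetical ordered log of one decision (assumption, not the real schema)."""
    records: list = field(default_factory=list)

    def log(self, stage, payload):
        # Enforce that stages arrive in lifecycle order, with none skipped.
        position = len(self.records)
        if position >= len(STAGES) or stage != STAGES[position]:
            raise ValueError(f"unexpected stage at step {position}: {stage!r}")
        self.records.append({"stage": stage, "payload": payload})

    def coverage(self):
        # Fraction of lifecycle stages that have a logged record.
        return len(self.records) / len(STAGES)


trail = AuditTrail()
trail.log("observation", {"close": 101.2})
trail.log("signal", {"view": "bullish"})
print(trail.coverage())
```

Rejecting out-of-order stages is one possible design choice: it makes a missing risk-gate record fail loudly instead of silently lowering coverage.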


python -m pip install -e ".[dev]"
python scripts/run_showcase.py

# Open outputs/examples/index.html
# First-run path uses deterministic agents,
# tracked snapshots, and no live provider calls.

Opt-In LLM Agent Evaluation

Run live or cache-backed model analysts after configuring provider keys, then compare return, drawdown, risk edits, rejection rate, reproducibility, and audit coverage.
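Three of the comparison metrics named above have standard definitions, which a minimal sketch can make concrete. The function names and the `"rejected"` status label are assumptions for this example, not TradeArena's evaluator API; risk edits, reproducibility, and audit coverage are not modeled here.

```python
def total_return(equity):
    """Simple return from the first to the last equity value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def rejection_rate(order_statuses):
    """Share of orders whose status is the (assumed) label 'rejected'."""
    if not order_statuses:
        return 0.0
    rejected = sum(1 for status in order_statuses if status == "rejected")
    return rejected / len(order_statuses)


equity_curve = [100.0, 120.0, 90.0, 110.0]
print(total_return(equity_curve))
print(max_drawdown(equity_curve))
print(rejection_rate(["filled", "rejected", "filled", "filled"]))
```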

Stress-Test Execution Assumptions

Inspect how spread, slippage, latency, liquidity limits, partial fills, and rejected orders change realized exposure.
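To show how these assumptions can move realized exposure away from intended exposure, here is a toy buy-side fill model. It is a sketch under simplifying assumptions (half-spread cost, proportional slippage in basis points, a hard liquidity cap, no latency); `simulate_buy` and `Fill` are hypothetical names, not the project's execution simulator.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    filled_qty: float
    avg_price: float
    rejected: bool


def simulate_buy(qty, mid_price, spread, slippage_bps, available_liquidity):
    """Toy model: pay half the spread plus slippage, fill at most the available liquidity."""
    if qty <= 0 or available_liquidity <= 0:
        # Nothing can trade: treat the order as rejected.
        return Fill(0.0, 0.0, True)
    filled = min(qty, available_liquidity)  # partial fill when liquidity is short
    price = (mid_price + spread / 2) * (1 + slippage_bps / 10_000)
    return Fill(filled, price, False)


fill = simulate_buy(qty=100, mid_price=50.0, spread=0.10,
                    slippage_bps=10, available_liquidity=60)
print(fill.filled_qty, fill.avg_price, fill.rejected)
```

In this example the order asks for 100 units but only 60 are available, so realized exposure is 60% of the intended size and the price paid sits above the mid.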

Build Auditable Workflows

Plug in data adapters, analysts, strategies, risk gates, execution simulators, memory, and evaluators.
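One way such a plug-in point could look is a small structural interface that any risk gate satisfies. The sketch below is an assumption about shape only: `RiskGate`, `MaxWeightGate`, and `run_gate` are hypothetical names, and the actual extension interfaces may differ.

```python
from typing import Protocol


class RiskGate(Protocol):
    """Hypothetical interface: take intended weights, return revised weights."""

    def revise(self, weights: dict[str, float]) -> dict[str, float]: ...


class MaxWeightGate:
    """Example gate that clips each position weight to [-cap, +cap]."""

    def __init__(self, cap: float):
        self.cap = cap

    def revise(self, weights: dict[str, float]) -> dict[str, float]:
        return {k: max(-self.cap, min(self.cap, w)) for k, w in weights.items()}


def run_gate(gate: RiskGate, weights: dict[str, float]):
    """Apply a gate and record every edit as (intended, revised) for the audit log."""
    revised = gate.revise(weights)
    edits = {k: (weights[k], revised[k]) for k in weights if weights[k] != revised[k]}
    return revised, edits


revised, edits = run_gate(MaxWeightGate(cap=0.3), {"AAA": 0.5, "BBB": 0.2})
print(revised)
print(edits)
```

Returning the edits alongside the revised weights keeps the gate's interventions inspectable, which is the property an audit layer needs.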

Observe -> Plan -> Risk Gate -> Execute -> Reflect -> Audit

Benchmark result page

Crisis scenes, intraday portfolio probes, and representation robustness in one compact snapshot.

Community registry

Validate redacted benchmark submissions and compare runs without raw provider text.

Replayable audit report

Trace one decision through observation, proposal, risk revision, execution, memory, and reproducibility fields.

Crisis-scene visual probes

Inspect representation trajectories, correlation/intent heatmaps, feedback curves, and exposure waterfalls.

Contributor extension path

See how custom analysts, risk managers, and evaluators plug into the fixed protocol stack.

What TradeArena is not: it is not financial advice, not a live trading bot, and not a promise of profitable trading. It is an audit and benchmark layer for financial AI agent behavior.