| ta-109a118ee5d7 | anonymous_entry_synthetic_stress_v0_1 | anonymous / anonymous-redacted-policy | weights_only | true | stress-only, external-submitted, redacted-prompt | benchmark | stress-benchmark | 0.9420 | synthetic-market (daily, 3 symbols) | 0.0184 | -0.0261 | 0.8125 | 9 | 31 | 1.0000 | ReproducibleRedacted | Model redacted: True; Claim scope: anonymous external submission under stress-only execution; Source: examples/benchmark_submissions/anonymous_entry_redacted_submission.json; Hash: sha256:109a118ee5d70ca663873c613da8caef7802ceb2f80a45df7b05f48e25ecced9 |
| ta-aad1948b44bf | crisis_scene_llm_redacted_example | poe / frontier-chat-model-redacted | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 0.9670 | yahoo-finance-csv (hourly, 3 symbols) | 0.0108 | -0.0187 | 0.7816 | 28 | 196 | 1.0000 | ReproducibleRedacted | Model redacted: True; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/example_llm_redacted_submission.json; Hash: sha256:aad1948b44bf9d607641a8b84455224c87d5bcc6446a8acf5bf2fc8f81f29ff0 |
| ta-ed2d5e4f2ff3 | quickstart_core_synthetic_v0_1 | deterministic / signal-weighted-baseline | none | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.3508 | -0.0126 | 0.9034 | 14 | 124 | 1.0000 | ReproducibleRedacted | Model redacted: True; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/example_redacted_submission.json; Hash: sha256:ed2d5e4f2ff3c87513c4b72ab96a375369150877e6771e7a86b5baa911e9c138 |
| ta-0a4d0479945d | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_always_hold__seed_11.json; Hash: sha256:0a4d0479945d620fa4a6558eada3d815242f670a88cc099653bbafad18937b49 |
| ta-e6b53c235779 | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_always_hold__seed_17.json; Hash: sha256:e6b53c2357796f84ae7e9134a7ec5d87f29f9ccbda89513d5eb9ae299ec440da |
| ta-0d0673c4df95 | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_always_hold__seed_23.json; Hash: sha256:0d0673c4df95ec1d80a708921d5295eb172e26e8703e6bb69645d815e6fff981 |
| ta-bdad4e3ca968 | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_always_hold__seed_31.json; Hash: sha256:bdad4e3ca9680dd061973d21fcb7e18a0dc29ba6a756af66abab08647fef6746 |
| ta-70cc7102f89e | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_always_hold__seed_7.json; Hash: sha256:70cc7102f89ea2e707c0f9a6e684b76ffe543abbc3e3cc29222cb2834a4eebee |
| ta-43260ef9e93e | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0174 | -0.0018 | 0.7333 | 2 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_random__seed_11.json; Hash: sha256:43260ef9e93e709396975af40b15adfb61dc06ac61e1777968e5f67c4522dd06 |
| ta-3ae7ff9c8fff | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0098 | -0.0023 | 0.7333 | 2 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_random__seed_17.json; Hash: sha256:3ae7ff9c8ffff38edd76d9e697055fd640837c94010fd3a72ce087bd70583ab0 |
| ta-7fc2c1ac4801 | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0362 | -0.0012 | 0.6667 | 2 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_random__seed_23.json; Hash: sha256:7fc2c1ac4801de38cacd90d1fa1a506c8c1721fe5c48a682cd767cabd588a0d2 |
| ta-cf42e362917a | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0195 | -0.0012 | 0.7778 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_random__seed_31.json; Hash: sha256:cf42e362917a88acf59925ca1d65599bbde3caf1adf95a71efaec9b480b6e0be |
| ta-1aae0b97e8b7 | leaderboard_llm_calm_trend_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0031 | -0.0069 | 0.7857 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__baseline_random__seed_7.json; Hash: sha256:1aae0b97e8b7aa1145e714bb5c9106e126bda5189fbe29b22e1dd2cb63080b2b |
| ta-06927d1dc59c | leaderboard_llm_calm_trend_synthetic_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0257 | -0.0008 | 0.8333 | 0 | 12 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__deepseek_deepseek_v4_flash.json; Hash: sha256:06927d1dc59c5d5c1fd56aa60226c55cd251643347f0a31412483749c417b855 |
| ta-14378629b078 | leaderboard_llm_calm_trend_synthetic_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0240 | -0.0005 | 0.6667 | 1 | 4 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__deepseek_deepseek_v4_pro.json; Hash: sha256:14378629b078648470eeeca89976bdfbfccfdc7033e14a1693c6183f6118f320 |
| ta-4b68d43caa14 | leaderboard_llm_calm_trend_synthetic_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0309 | -0.0008 | 0.8462 | 0 | 12 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__poe_claude_opus_4_7.json; Hash: sha256:4b68d43caa1450e078562b14ce58495e1aacb6aefc71f6c9756131f4fcf2da51 |
| ta-06ea62894f11 | leaderboard_llm_calm_trend_synthetic_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0253 | -0.0008 | 0.6667 | 2 | 6 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__poe_gemini_3_1_pro.json; Hash: sha256:06ea62894f112c7f946fc6f9a611ccc444d932b899901c9ebe7efc41bb8bdd4b |
| ta-4e7b44b87eb3 | leaderboard_llm_calm_trend_synthetic_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0257 | -0.0008 | 0.8333 | 0 | 12 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__poe_glm_5.json; Hash: sha256:4e7b44b87eb3b6d0d8ea4495f46c7a3a29b4c2a1689c8c34a0cfc3074a606766 |
| ta-1a6cea67f2eb | leaderboard_llm_calm_trend_synthetic_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0288 | -0.0008 | 0.8333 | 0 | 11 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__poe_gpt_5_5.json; Hash: sha256:1a6cea67f2ebbc9cb0a5375e05852dcc75fb3e1b8041f606a0f448cc93e18fd5 |
| ta-e3e0649b2c2b | leaderboard_llm_calm_trend_synthetic_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0319 | -0.0008 | 0.7500 | 1 | 11 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/calm_trend__poe_kimi_k2_5.json; Hash: sha256:e3e0649b2c2baae0e0fb580f079af69bee241d1820376430cc06ed5d368c0066 |
| ta-20c5743cca23 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_always_hold__seed_17.json; Hash: sha256:20c5743cca23bc3adfad33316cbd1e17914dfad144841e80e7f03bcac364962a |
| ta-05283aa56f64 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_always_hold__seed_21.json; Hash: sha256:05283aa56f64bfd8abb14dba02509818b421f1019b5a7492f0cd44dcf10b0b92 |
| ta-cf6987259a15 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_always_hold__seed_27.json; Hash: sha256:cf6987259a158d7b5c33aa7611ea4d6d1a7d90e78ab21bed358552a2ed300064 |
| ta-4dfc17c135a0 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_always_hold__seed_33.json; Hash: sha256:4dfc17c135a0e1c2ff7e7c715722d2904de20678b09ded0ead3b3c5ff35671b7 |
| ta-85914c9333eb | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_always_hold__seed_41.json; Hash: sha256:85914c9333eb1b2436090f81d23c90d759b5f115265ac090b854bc16e32e268d |
| ta-56efa955c691 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0141 | -0.0141 | 0.7333 | 2 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_random__seed_17.json; Hash: sha256:56efa955c69154b343ec2a083e11d476053f86c3ee9d5f0bdffc79b73721cc8c |
| ta-a2f838f37d16 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0139 | -0.0022 | 0.7500 | 3 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_random__seed_21.json; Hash: sha256:a2f838f37d1695fbfad592e2f9869f7f5efb7f595eb5832e3a5be76d230e9268 |
| ta-9662fa012555 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0238 | -0.0262 | 0.7857 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_random__seed_27.json; Hash: sha256:9662fa0125554fb00e583209281b52dd9e67069a3472b36ab873694d9486ac90 |
| ta-0a73e46ec48b | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0204 | -0.0179 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_random__seed_33.json; Hash: sha256:0a73e46ec48b495c3adbc03b973c3ab67c6e9686bb246c0837be6c5437dc7860 |
| ta-315a367cb5b1 | leaderboard_llm_high_vol_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0383 | -0.0009 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__baseline_random__seed_41.json; Hash: sha256:315a367cb5b14bf1459643e1319d825134ec43108cc7117c41e1fb116c20cd7a |
| ta-817eef2f37a2 | leaderboard_llm_high_vol_synthetic_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0033 | -0.0075 | 0.7692 | 1 | 10 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__deepseek_deepseek_v4_flash.json; Hash: sha256:817eef2f37a2024f091e6fdf02a054517680b7f212ccba64636b56364378d299 |
| ta-05e101952ae2 | leaderboard_llm_high_vol_synthetic_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0043 | -0.0065 | 0.7500 | 1 | 9 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__deepseek_deepseek_v4_pro.json; Hash: sha256:05e101952ae24d095162eab0825c55527082b1f979982c7a75502c3935912e4b |
| ta-8aaf791e5111 | leaderboard_llm_high_vol_synthetic_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0045 | -0.0058 | 0.7500 | 1 | 8 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__poe_claude_opus_4_7.json; Hash: sha256:8aaf791e5111bb3817f8ef20c8dd6551975587c68bab188c7d874eedb62539b7 |
| ta-11a2de57a72d | leaderboard_llm_high_vol_synthetic_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0144 | -0.0040 | 0.7500 | 2 | 6 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__poe_gemini_3_1_pro.json; Hash: sha256:11a2de57a72d0449395c0bf0a50f78ac7f57d72ecbc1bf3c6699c5d1db60935b |
| ta-96447d495a71 | leaderboard_llm_high_vol_synthetic_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0030 | -0.0058 | 0.8462 | 0 | 8 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__poe_glm_5.json; Hash: sha256:96447d495a712b930d8ada9f6c184271ded3e5fc6e87d31980ea136ec8cb6966 |
| ta-e622727b4323 | leaderboard_llm_high_vol_synthetic_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0038 | -0.0065 | 0.7500 | 1 | 8 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__poe_gpt_5_5.json; Hash: sha256:e622727b432351dcf220bc4a812b614fe2a099d91f7423b9c9c527081b132f88 |
| ta-bcde515dace4 | leaderboard_llm_high_vol_synthetic_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0037 | -0.0052 | 0.8462 | 0 | 8 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/high_vol__poe_kimi_k2_5.json; Hash: sha256:bcde515dace4494efb6ec18fbaec68b891688fc915e3832d865db4c314650928 |
| ta-d15a87438728 | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_always_hold__seed_29.json; Hash: sha256:d15a87438728de761e003597c65c7c580cff1b6a78f4f3ad079f0f2fd8893834 |
| ta-c353fa8d8aac | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_always_hold__seed_33.json; Hash: sha256:c353fa8d8aacc5b5ea87516f297c7172da5d8cf370df9d142517c0fb1f0c70e2 |
| ta-3f4548fcfe1a | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_always_hold__seed_39.json; Hash: sha256:3f4548fcfe1a3e197dd1ad41ac1b07cb0e7b1dd24e1972b6e23fdc2668934853 |
| ta-7f22806dea18 | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_always_hold__seed_45.json; Hash: sha256:7f22806dea18d7b73257b411d6a5d750ee29adb351e43b526d9995f49d6df680 |
| ta-836432d9b11a | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_always_hold__seed_53.json; Hash: sha256:836432d9b11adbcad2d8a42ffc1a71d0c0f073526149cfc79415b43078bb777e |
| ta-5cb2d1f11f6d | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0104 | -0.0342 | 0.7500 | 2 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_random__seed_29.json; Hash: sha256:5cb2d1f11f6dfd2e555c79dc5f0f71b1e2b26006c249a9a5642ab28bdc575b2c |
| ta-b441e1db5528 | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0269 | -0.0087 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_random__seed_33.json; Hash: sha256:b441e1db5528bf3f1fe727d50357ba039f5aa98d1e373c9c432bddb84bd293c4 |
| ta-3b6992b6545d | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0437 | -0.0542 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_random__seed_39.json; Hash: sha256:3b6992b6545d1e366f00589815a1c01d9b42100d2ac33b1215cfea0f4549020c |
| ta-a84ff6d7402f | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0286 | -0.0478 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_random__seed_45.json; Hash: sha256:a84ff6d7402f725575ad3704f57f6537a7bc7d16ef16d0eb3456d11283f9cabb |
| ta-6825325cfdb0 | leaderboard_llm_jump_tail_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0942 | -0.0043 | 0.7500 | 2 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__baseline_random__seed_53.json; Hash: sha256:6825325cfdb071e85696bf681170db75ef232c3b081d4ef95894aedde038f72d |
| ta-99983feb3ad7 | leaderboard_llm_jump_tail_synthetic_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0167 | -0.0214 | 0.6667 | 3 | 12 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__deepseek_deepseek_v4_flash.json; Hash: sha256:99983feb3ad7745410e61fdfd0a339aeb0bbc61d5a970457be68f04883f7b3cd |
| ta-30b9043e4daf | leaderboard_llm_jump_tail_synthetic_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0167 | -0.0214 | 0.7692 | 2 | 9 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__deepseek_deepseek_v4_pro.json; Hash: sha256:30b9043e4dafd573418092dfedb7a78b772f28bca64a5b234e8ca7ec3d5509c1 |
| ta-0d773b86a41b | leaderboard_llm_jump_tail_synthetic_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0167 | -0.0214 | 0.6667 | 3 | 10 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__poe_claude_opus_4_7.json; Hash: sha256:0d773b86a41b201e3824c946197c2dfaab9c744fcde5158195ba92ad857b53e5 |
| ta-5c0f219d4cea | leaderboard_llm_jump_tail_synthetic_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0157 | -0.0075 | 0.6429 | 3 | 9 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__poe_gemini_3_1_pro.json; Hash: sha256:5c0f219d4cea28712e7d0bc11c323943280c578d087acffc9f36b8f6af0b33df |
| ta-514d4157571d | leaderboard_llm_jump_tail_synthetic_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0167 | -0.0214 | 0.7143 | 3 | 11 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__poe_glm_5.json; Hash: sha256:514d4157571d384947bba0ea295da35096643d1ca9a16b20ebf6042c59426587 |
| ta-50e42e87acb5 | leaderboard_llm_jump_tail_synthetic_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0167 | -0.0214 | 0.6667 | 3 | 12 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__poe_gpt_5_5.json; Hash: sha256:50e42e87acb5866092c8059f76d9af57993deb02219c93b68e81ac9abe3da12a |
| ta-07e91c904d14 | leaderboard_llm_jump_tail_synthetic_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0167 | -0.0214 | 0.6667 | 3 | 11 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: cached-provider reliability under stress-only execution; Source: examples/benchmark_submissions/model_matrix/jump_tail__poe_kimi_k2_5.json; Hash: sha256:07e91c904d14031ed2eedd42dead47953a65e7603a8ff6015e9575c30e724063 |
| ta-fd3ab766cf8e | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_always_hold__seed_65.json; Hash: sha256:fd3ab766cf8e028e48f327c867424833d5ea8ea4495792aaeed1fc14504f6840 |
| ta-29d627fb2020 | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_always_hold__seed_69.json; Hash: sha256:29d627fb2020db782d17a8b72728cb7b536d7aaec3951ab7eea2fd528a4c82f3 |
| ta-828ff0cc0185 | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_always_hold__seed_75.json; Hash: sha256:828ff0cc0185c46011bae40b05f4a04308f48947dc1ed65b4c1f00bbd2b7194e |
| ta-0803defa1f4e | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_always_hold__seed_81.json; Hash: sha256:0803defa1f4e25532c5a2f98205b31e04b125ae71dde35ae00a9414c490a2fa5 |
| ta-a6890d44909f | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_always_hold__seed_89.json; Hash: sha256:a6890d44909ff2ab39ad4feab7b6d74a399ba3d3fd0bc3e6411f484f1debe541 |
| ta-a59a34b1b3ff | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0197 | -0.0110 | 0.2143 | 3 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_random__seed_65.json; Hash: sha256:a59a34b1b3ff18154a2c6c49f38542fa96371037902e9ed3b3f37da0fb53b4f1 |
| ta-210515aca353 | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0033 | -0.0039 | 0.1875 | 5 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_random__seed_69.json; Hash: sha256:210515aca353d3f076f5f624fd3b72458a1a61feb8c80ec065f1e5c8210f4ac2 |
| ta-181d6d3404bd | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / random | rationale | true | stress-only, deterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0203 | -0.0346 | 0.2500 | 4 | 0 | 1.0000 | ReproducibleRedacted | Model redacted: False; Claim scope: deterministic baseline under stress-only execution; Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_random__seed_75.json; Hash: sha256:181d6d3404bd3b031d693be4c4d564239c07c7ce33c384d942aad4dca6bcc09f |
| ta-012721f5cc0f | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0475 | -0.0033 | 0.1875 | 5 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_random__seed_81.json Hash: sha256:012721f5cc0f604897f2e1fcd19669ec09cc92cdd03e8f594541667c844c6f19 |
| ta-381c3fa51ad8 | leaderboard_llm_latency_spike_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0070 | -0.0070 | 0.3333 | 2 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__baseline_random__seed_89.json Hash: sha256:381c3fa51ad8829c0fe4ae0171dabec4cec1e8c514a09622e74e5997028e5385 |
| ta-ea537e97eec9 | leaderboard_llm_latency_spike_synthetic_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0010 | -0.0133 | 0.2727 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__deepseek_deepseek_v4_flash.json Hash: sha256:ea537e97eec939685beeb77cb4beed02341db376e53bc5254c6e562d2e104c08 |
| ta-c1a0fb026bdd | leaderboard_llm_latency_spike_synthetic_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0010 | -0.0133 | 0.2727 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__deepseek_deepseek_v4_pro.json Hash: sha256:c1a0fb026bddee5eabc86cdf6201a1d7b40a5053949c59784d4eeb5736834250 |
| ta-88f9614808b5 | leaderboard_llm_latency_spike_synthetic_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0010 | -0.0133 | 0.2727 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__poe_claude_opus_4_7.json Hash: sha256:88f9614808b564cf60fedd3fbca8d5d913a86c46ec07bf82220d8b12ade92cfa |
| ta-6dc1d83bcc76 | leaderboard_llm_latency_spike_synthetic_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0329 | -0.0091 | 0.2308 | 3 | 13 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__poe_gemini_3_1_pro.json Hash: sha256:6dc1d83bcc76a051907218994fb623e1d7286df8fa8664adbf1b3b03ae86020d |
| ta-6dd5c84b9142 | leaderboard_llm_latency_spike_synthetic_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 0.8750 | synthetic-market (daily, 2 symbols) | 0.0010 | -0.0133 | 0.3000 | 0 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__poe_glm_5.json Hash: sha256:6dd5c84b914274571c34c5e89aca8e0a42baa7664f188959d7590e18f53bd8c2 |
| ta-fb175dbf03cc | leaderboard_llm_latency_spike_synthetic_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0010 | -0.0133 | 0.2727 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__poe_gpt_5_5.json Hash: sha256:fb175dbf03cc28342ba82a2296241ae915c6b0db752137ebd9a4d62ecf4c509e |
| ta-d83fec4e3980 | leaderboard_llm_latency_spike_synthetic_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0010 | -0.0133 | 0.2727 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/latency_spike__poe_kimi_k2_5.json Hash: sha256:d83fec4e3980482715bfe34d19b3765b66206a7df8d12e01d2a4db9b646d3d67 |
| ta-9910a6a580f0 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_always_hold__seed_41.json Hash: sha256:9910a6a580f0e7ce7e0368e0551094e77b900e94a6c32c12ba3051ce4e781336 |
| ta-0a0d435433d7 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_always_hold__seed_45.json Hash: sha256:0a0d435433d7c7222d0c1b81e55a7d795d593b8ae4c9440236abf1a5ea5ddac0 |
| ta-5a12ea693cfd | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_always_hold__seed_51.json Hash: sha256:5a12ea693cfdac56ae45404dfc240283b9e2eb357d46f03c8d93b09fea96885c |
| ta-1df45ac7fc12 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_always_hold__seed_57.json Hash: sha256:1df45ac7fc129370790b2124042d2d7dbe942e5743f919421b25a019cb48bcce |
| ta-1b934080b980 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_always_hold__seed_65.json Hash: sha256:1b934080b980734cfeb56fb6b0b9e976349f8a15db837bffc9d0db0923034e4c |
| ta-8786d22669aa | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0094 | -0.0165 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_random__seed_41.json Hash: sha256:8786d22669aa0e92d16c586e0c8c53a270d1452558760ec308e950b2e2f1c3d5 |
| ta-3281dbcfb47a | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0574 | -0.0057 | 0.8125 | 1 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_random__seed_45.json Hash: sha256:3281dbcfb47a6966a3391c0c8217b40c45fa6bbef6402d3fb2523628fe133b52 |
| ta-20120db93e7e | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0304 | -0.0088 | 0.8667 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_random__seed_51.json Hash: sha256:20120db93e7eaf4b8c37432dd83f066a7a0faae26cd5a1056fdf234daec5eda2 |
| ta-57c175c1aafa | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0012 | -0.0141 | 0.8571 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_random__seed_57.json Hash: sha256:57c175c1aafaa5ef18ff230882abbd605fdac07847bce2743d0cd5f7d75541d3 |
| ta-f6a55649e5f6 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0297 | -0.0090 | 0.7333 | 2 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__baseline_random__seed_65.json Hash: sha256:f6a55649e5f6e1952ced589b86e5d0c1bd2297df9e08c9e2a4acdc9936ae38a4 |
| ta-9700e6062d4b | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0442 | -0.0012 | 0.8000 | 0 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__deepseek_deepseek_v4_flash.json Hash: sha256:9700e6062d4b474e4b75a15f58f01608084464508abbbef9807375942083fba8 |
| ta-16ce8dd0a233 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0236 | -0.0167 | 0.6250 | 1 | 6 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__deepseek_deepseek_v4_pro.json Hash: sha256:16ce8dd0a233d319fc3984a13e6024c0692cacccbf4e02ef3d2c110f14433b2b |
| ta-46b6ba61345a | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0442 | -0.0012 | 0.8000 | 0 | 8 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__poe_claude_opus_4_7.json Hash: sha256:46b6ba61345a587cc724ae55207f6383643577acbbb56e73071282f13397a8b7 |
| ta-21b67d022098 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0209 | -0.0102 | 0.7692 | 1 | 10 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__poe_gemini_3_1_pro.json Hash: sha256:21b67d022098b525010682bb7d25854f52b01f1d163c5aceb2a2158fdbbf65ab |
| ta-5d82e02acbd9 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0150 | -0.0372 | 0.8462 | 0 | 11 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__poe_glm_5.json Hash: sha256:5d82e02acbd9a164b94d14fb5a7ae3a6a67832a73427ba58375ba3ab0c4a4b7f |
| ta-a4b7fa9d47d2 | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0442 | -0.0012 | 0.8000 | 0 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__poe_gpt_5_5.json Hash: sha256:a4b7fa9d47d238c7729587ceed72dc28d8571a39170d90adff58a1719a347316 |
| ta-4a456013198b | leaderboard_llm_liquidity_collapse_synthetic_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0442 | -0.0012 | 0.8000 | 0 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/liquidity_collapse__poe_kimi_k2_5.json Hash: sha256:4a456013198b2779569bf94b150144f3b0dcc4c21e1cf60e68a74e86d68224d6 |
| ta-f6d12eaf7bc6 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_always_hold__seed_53.json Hash: sha256:f6d12eaf7bc644ea3465f144c226ea058f478e474156958d1a06b71bd2d66306 |
| ta-ea8e3471713c | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_always_hold__seed_57.json Hash: sha256:ea8e3471713cc1a994a25dae87a051f02009de413656bc3aaa79b9310a524cb6 |
| ta-90ffb0a33779 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_always_hold__seed_63.json Hash: sha256:90ffb0a33779be056115d57464bbf2bd2ddd4ab194ae7539f29f0d51ceda030b |
| ta-e0bc6cfad842 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_always_hold__seed_69.json Hash: sha256:e0bc6cfad842b84395e7dedd18938b4773af9e744be27619734fb5eaad0eaf73 |
| ta-1aa1a999eea5 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_always_hold__seed_77.json Hash: sha256:1aa1a999eea5f773a61eef4bd39c025650098c473622665e493b2d4c2fdc764b |
| ta-41b6d20be85b | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0007 | -0.0392 | 0.7500 | 2 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_random__seed_53.json Hash: sha256:41b6d20be85b2c237ec9b8140eb59df1e79479ffe5eaf947bc1160e3edb8bd10 |
| ta-63351965b1cf | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0108 | -0.0137 | 0.8571 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_random__seed_57.json Hash: sha256:63351965b1cfdb1786964410601ab87fd75fa45a24c052117eb114baf04a6f76 |
| ta-218ecb3bb308 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0263 | -0.0458 | 0.8750 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_random__seed_63.json Hash: sha256:218ecb3bb308847c7680e38f60e7ab0de292e523d451a2d9344eafeb6f29aba6 |
| ta-6ecc04dbeac2 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0141 | -0.0122 | 0.8750 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_random__seed_69.json Hash: sha256:6ecc04dbeac2b366fcea8e6dac63f9945fd036f43c3a50480d44fc3d707bd7bd |
| ta-e0cc2136ae91 | leaderboard_llm_spread_explosion_synthetic_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0137 | -0.0177 | 0.7500 | 2 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__baseline_random__seed_77.json Hash: sha256:e0cc2136ae9183ca6dbb4ae277854071b92dd7522f56fd417477156f818c1afe |
| ta-3f07c8ab9963 | leaderboard_llm_spread_explosion_synthetic_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0155 | -0.0424 | 0.7692 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__deepseek_deepseek_v4_flash.json Hash: sha256:3f07c8ab996333cd2453f3b7b2db76cb42e52d9a6894de08f6cfdb3ed5dfbe53 |
| ta-e51d439c3fb3 | leaderboard_llm_spread_explosion_synthetic_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | 0.0048 | -0.0206 | 0.8333 | 1 | 4 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__deepseek_deepseek_v4_pro.json Hash: sha256:e51d439c3fb379a071bbd1e72982a586c39a415d220585bbacf51c14148b6beb |
| ta-fe1be1a36c38 | leaderboard_llm_spread_explosion_synthetic_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0155 | -0.0424 | 0.7692 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__poe_claude_opus_4_7.json Hash: sha256:fe1be1a36c38fc5a4ca851b2db5ba2fe5a0b496a252fb068dfe642a357efd731 |
| ta-895143237d66 | leaderboard_llm_spread_explosion_synthetic_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0065 | -0.0334 | 0.7857 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__poe_gemini_3_1_pro.json Hash: sha256:895143237d66885366809bd9ad86995312a5c136e822965ac0cfe5ec35840663 |
| ta-886315ca40c5 | leaderboard_llm_spread_explosion_synthetic_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 0.8750 | synthetic-market (daily, 2 symbols) | -0.0116 | -0.0269 | 0.7500 | 1 | 8 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__poe_glm_5.json Hash: sha256:886315ca40c5b3196fbfb1a906376a777bc3e70a63ffdb6f41bc6293efaeeefa |
| ta-4cb4062b028a | leaderboard_llm_spread_explosion_synthetic_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0155 | -0.0424 | 0.7692 | 1 | 9 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__poe_gpt_5_5.json Hash: sha256:4cb4062b028a14852be60cf99ba6665d6dd8405044ffceae7c5ea2e1b253bc80 |
| ta-0820021e1502 | leaderboard_llm_spread_explosion_synthetic_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | synthetic-market (daily, 2 symbols) | -0.0007 | -0.0233 | 0.7500 | 1 | 8 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/model_matrix/spread_explosion__poe_kimi_k2_5.json Hash: sha256:0820021e150264610213f88640e10a0ab2519ea4914a62403aba3d2b341ff1c1 |
| ta-f85e3b63f63c | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_always_hold__seed_11.json Hash: sha256:f85e3b63f63cbce7b774bc6b780b1af15b5edf5d0ca93c3f6a25264e50b9c9ab |
| ta-14330dea416c | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_always_hold__seed_17.json Hash: sha256:14330dea416c278668f1e7849a9fdbfeeb57db6f3f7d1ba51d2c1e3d88b8cf0a |
| ta-ed60bebc2bac | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_always_hold__seed_23.json Hash: sha256:ed60bebc2bac4e1f1ea4b9e66ef561a009c46d57a1b154c1db3cfa08e6c40efb |
| ta-493b31723e1a | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_always_hold__seed_31.json Hash: sha256:493b31723e1aa3dcba513cf51cfd783c22aae061c622a34b572f305ab377f653 |
| ta-bb97d5793bbf | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_always_hold__seed_7.json Hash: sha256:bb97d5793bbfb0ef392f9919a89164f9cc18106a6d1457ef0f56ceb39a1b0f3f |
| ta-7aa11e67f69a | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0721 | -0.0226 | 0.7353 | 6 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_random__seed_11.json Hash: sha256:7aa11e67f69ac91c26333c66126c35925ee3b1caf5c27830a528206aab38b2ab |
| ta-6a1d2a467cd8 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1061 | -0.1585 | 0.8000 | 4 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_random__seed_17.json Hash: sha256:6a1d2a467cd8f4ba8686c350a869757c01a54d2d1971fb44058111756acf0c09 |
| ta-40ca7e76523a | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0982 | -0.1732 | 0.6765 | 8 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_random__seed_23.json Hash: sha256:40ca7e76523ad47f72bd496a7da8340772e582349d8838ac623e59f992ee90ef |
| ta-9661fd5891a5 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0416 | -0.0463 | 0.8333 | 2 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_random__seed_31.json Hash: sha256:9661fd5891a5da906dc8ade2d2fa80d64129dba22794df31e618ed2d2b60d6a3 |
| ta-480274e4c362 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0695 | -0.1116 | 0.7429 | 6 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__baseline_random__seed_7.json Hash: sha256:480274e4c362e189011e2b908c978a44aae67cc60c52e58c5d72c81eb20352d2 |
| ta-b89707bd8492 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1014 | -0.1658 | 0.8519 | 1 | 25 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_flash__seed_11.json Hash: sha256:b89707bd849277cdf8776f4af235d603a67b961bf7d6ef717147e60427e000aa |
| ta-f2dfd5a8b623 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0212 | -0.0550 | 0.7037 | 5 | 27 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_flash__seed_17.json Hash: sha256:f2dfd5a8b6238186f04f9d9f8f464d49ac255527360b6ae73b2e38e7f196611c |
| ta-e1889c42a03e | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0897 | -0.1058 | 0.6087 | 6 | 24 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_flash__seed_23.json Hash: sha256:e1889c42a03e98b5be5b91bf45ad1c3ee83d296f9ffbc78e1f0fc94fbf7cd17b |
| ta-f08bc58ea950 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1346 | -0.1943 | 0.8800 | 3 | 22 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_flash__seed_31.json Hash: sha256:f08bc58ea95057405e8ea3d7cef8f00b6560b9ab6d91816c3626b1f2b5a2e463 |
| ta-ea5d4d3b4eff | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1601 | -0.2058 | 0.7273 | 4 | 19 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_flash__seed_7.json Hash: sha256:ea5d4d3b4eff3796378b50b0623ab6f26fee4a241cc341ae52993ec833e8e809 |
| ta-569966a71c3d | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1343 | -0.1963 | 0.7308 | 4 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_pro__seed_11.json Hash: sha256:569966a71c3dfc2daf35bee61204dd10b428f29d1d65c9308ec553a4b62b4571 |
| ta-0a4a0fab532e | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0012 | -0.0550 | 0.6667 | 5 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_pro__seed_17.json Hash: sha256:0a4a0fab532eadafa8478075e92fc69b338fed601e3868186dd33d6ae32e31ac |
| ta-fea519cd8389 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0203 | -0.0527 | 0.6000 | 5 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_pro__seed_23.json Hash: sha256:fea519cd8389993a75dc41fa894adb03bb374ef8d64cd348e828304c81f60612 |
| ta-a378c6db80c2 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0144 | -0.0529 | 0.7619 | 5 | 18 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_pro__seed_31.json Hash: sha256:a378c6db80c2cbd502b9f1113eeb6137d145074ba54f27a7092f413a645f03c2 |
| ta-c7d453e14955 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1385 | -0.1854 | 0.7200 | 5 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__deepseek_deepseek_v4_pro__seed_7.json Hash: sha256:c7d453e149555ca531d1cd714e83c6014307d56f879e09ca958b20c5d07a7490 |
| ta-22ab171611c0 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1343 | -0.1963 | 0.7308 | 4 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_claude_opus_4_7__seed_11.json Hash: sha256:22ab171611c072826b238ce1cc15c171366fe58b1351d19c1aafc488f5afdbef |
| ta-f4dee12a74b8 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0207 | -0.0550 | 0.7500 | 4 | 28 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_claude_opus_4_7__seed_17.json Hash: sha256:f4dee12a74b88dfe881a0619657ab3c063f84ddbbb28985199352651b4595769 |
| ta-b70991c0a097 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0473 | -0.0760 | 0.7037 | 5 | 27 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_claude_opus_4_7__seed_23.json Hash: sha256:b70991c0a0979efb854bb5996cf35f26280300d10f296751125c401d5d77905b |
| ta-2d83546bf33d | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1039 | -0.1619 | 0.8214 | 5 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_claude_opus_4_7__seed_31.json Hash: sha256:2d83546bf33def821d608c3e9ffbca0890738f76fd920573f152517e0ee15375 |
| ta-a9db83b7afa7 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1385 | -0.1854 | 0.7200 | 5 | 22 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_claude_opus_4_7__seed_7.json Hash: sha256:a9db83b7afa7e61668ee0989d4699d560aec4cf51c7c949bf17014c37c2f4484 |
| ta-5a4b4ac82a0d | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1343 | -0.1963 | 0.7308 | 4 | 21 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gemini_3_1_pro__seed_11.json Hash: sha256:5a4b4ac82a0d0da517fbbe4d8f95a0a10c4010b2722b7a28e4b37f6054f9bc6f |
| ta-5ea4bfedc3de | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0256 | -0.0550 | 0.6538 | 6 | 22 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gemini_3_1_pro__seed_17.json Hash: sha256:5ea4bfedc3de266b2a62f83f1c5d1f154f11f8a283ab571c5d4c5c190a9d5343 |
| ta-95fb805c9921 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1277 | -0.1851 | 0.5769 | 8 | 22 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gemini_3_1_pro__seed_23.json Hash: sha256:95fb805c99219cab83f4c8632a3f576ee60a1668b794b5a80897dbee947df697 |
| ta-7ddf880bf3ec | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0577 | -0.0881 | 0.8095 | 4 | 18 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gemini_3_1_pro__seed_31.json Hash: sha256:7ddf880bf3ecc2f57562998f61359fc5b456f3afe44293c1867c492a972c66e6 |
| ta-292c6fde1882 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1385 | -0.1854 | 0.6800 | 6 | 21 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gemini_3_1_pro__seed_7.json Hash: sha256:292c6fde1882bb9ccd9b5932978aa731488fab1fcc39d10181e56e4a7c19f682 |
| ta-26e5eb0a6305 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1343 | -0.1963 | 0.7308 | 4 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_glm_5__seed_11.json Hash: sha256:26e5eb0a630593e6c1ad5fb3383fc32e0ef3c9d526c30badfabe9c52d14112f3 |
| ta-4954717bfa81 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0212 | -0.0550 | 0.7037 | 5 | 27 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_glm_5__seed_17.json Hash: sha256:4954717bfa810f0fa079609f0928deb6ddeb4067ca14f40144a6f41b2544f12b |
| ta-0876e7751807 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 0.9167 | yahoo-finance-csv (weekly, 3 symbols) | 0.0202 | -0.0550 | 0.6667 | 5 | 27 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_glm_5__seed_23.json Hash: sha256:0876e7751807f8bb022bdeba8ad3c4146e69658500633513908246bf19875c3a |
| ta-2e43162b667a | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 0.9167 | yahoo-finance-csv (weekly, 3 symbols) | -0.1228 | -0.1841 | 0.7857 | 6 | 24 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_glm_5__seed_31.json Hash: sha256:2e43162b667af37195ee0d9073762bd1b9df1ff950930d800d052a4a5c63445c |
| ta-a4b9ecae134c | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1385 | -0.1854 | 0.7200 | 5 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_glm_5__seed_7.json Hash: sha256:a4b9ecae134cf82e08792e02ec69caa87a3f1b9cde96f60f71babf0472f5302e |
| ta-9f33639094f7 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1343 | -0.1963 | 0.7308 | 4 | 21 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gpt_5_5__seed_11.json Hash: sha256:9f33639094f774c2f0a9a896f158509ae3b1ff083b83b977e2a5bca8e0e3c29e |
| ta-7b3f159e7656 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0212 | -0.0550 | 0.7037 | 5 | 27 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gpt_5_5__seed_17.json Hash: sha256:7b3f159e765622987b44ce741cbeec0840f668f21f3b97ad4fb8023c6b0a083e |
| ta-7f028c544df5 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0672 | -0.1007 | 0.6154 | 7 | 28 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gpt_5_5__seed_23.json Hash: sha256:7f028c544df549aa0c3de9068f6c741c95d0132fb3a223ee826ac367818f831d |
| ta-bbf88fa62999 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1228 | -0.1841 | 0.7857 | 6 | 24 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gpt_5_5__seed_31.json Hash: sha256:bbf88fa62999a501592cbc262422935b190a60f600b355d1b056c532cd6a2fca |
| ta-5872db22a33e | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1385 | -0.1854 | 0.7200 | 5 | 22 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_gpt_5_5__seed_7.json Hash: sha256:5872db22a33e2b834ed39a97301088d92366296f63ff23bb437d19c0e1c6d0fa |
| ta-0bc65cff62d0 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1420 | -0.2040 | 0.6296 | 7 | 25 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_kimi_k2_5__seed_11.json Hash: sha256:0bc65cff62d0375547027e0891a449ce57dcd151cfd5a3e4691c4f14b142dbb3 |
| ta-8f36b888685c | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0256 | -0.0550 | 0.6296 | 7 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_kimi_k2_5__seed_17.json Hash: sha256:8f36b888685c5032ed7de3a2ecf80081464f6378de074f23fe3ef2af50ed648b |
| ta-243b6ff130f2 | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0674 | -0.1008 | 0.6429 | 7 | 30 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_kimi_k2_5__seed_23.json Hash: sha256:243b6ff130f2da101d196e36bfc304968cbbce2b02f61eb7202cb401cf29cea7 |
| ta-92023a00f17d | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0527 | -0.0994 | 0.8462 | 3 | 19 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_kimi_k2_5__seed_31.json Hash: sha256:92023a00f17d034051c434acf28cc6513f0c07e034e2d383299851515c88ed7e |
| ta-5e0395334c9d | leaderboard_real_yahoo_2022_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1258 | -0.1755 | 0.6800 | 7 | 18 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/rates_drawdown__poe_kimi_k2_5__seed_7.json Hash: sha256:5e0395334c9d4444fe13204f3141b699128fd402543589373c5f872d09d78683 |
| ta-9f03f4f1c0e4 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_always_hold__seed_11.json Hash: sha256:9f03f4f1c0e4ac0c633e4b6c73b685ee78bc23c401d49400cbbb11b18433a75a |
| ta-61f784cac384 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_always_hold__seed_17.json Hash: sha256:61f784cac384e97886b974e740ab1662ebd20081b427a4ad85ca4d37bda8e817 |
| ta-098a9999647a | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_always_hold__seed_23.json Hash: sha256:098a9999647aa9c942965a12d401a6b981e5aa706192ba76f93b78cb9bf3b710 |
| ta-ff6a1c057835 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_always_hold__seed_31.json Hash: sha256:ff6a1c057835df2d5780284b1ee7f3df5469b5fe72dcc1b5895c821a99657205 |
| ta-e61a71e6b1cd | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / always-hold | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0000 | 0.0000 | 0.0000 | 0 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_always_hold__seed_7.json Hash: sha256:e61a71e6b1cdcdabdae52507a0336d33422736193f483de1c01ae76ee86bbb57 |
| ta-81e598e08f3a | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0758 | -0.0546 | 0.7647 | 5 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_random__seed_11.json Hash: sha256:81e598e08f3a7645df441d4d465b5c00893f66d58ce9e85a0ed64a8509cf7f42 |
| ta-490b13bf92d9 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0542 | -0.0278 | 0.7714 | 5 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_random__seed_17.json Hash: sha256:490b13bf92d9e3bb9fa0f29d0eac85ab051bdb9eb90be5ee879578ae598e9826 |
| ta-923a9957f19e | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0635 | -0.1216 | 0.8286 | 3 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_random__seed_23.json Hash: sha256:923a9957f19e628d1c4681cb6061a4e2fc0dfc7102b350ffa1b24229a5d64231 |
| ta-b19cdee87bee | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0416 | -0.0534 | 0.8000 | 3 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_random__seed_31.json Hash: sha256:b19cdee87bee635aafa1316d2df05095449f6b549d8a54d1331f7e23745fcc1c |
| ta-f17b01957b04 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | baseline / random | rationale | true | stress-onlydeterministic-baseline | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0692 | -0.0449 | 0.6857 | 8 | 0 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: deterministic baseline under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__baseline_random__seed_7.json Hash: sha256:f17b01957b04e4ad8361d2daf47d5ec0dfff9cbb0c14dca6fcae120d2213b386 |
| ta-62847a2b522a | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0111 | -0.0566 | 0.7857 | 3 | 37 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_flash__seed_11.json Hash: sha256:62847a2b522abb58940534288fadfbafd5975cbdbcf5baad5442eaf670e45759 |
| ta-1e387f026feb | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0111 | -0.0566 | 0.8000 | 3 | 31 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_flash__seed_17.json Hash: sha256:1e387f026febbacee227ba0039137468355fe5a1757a0165e9dbeb08f0c9b873 |
| ta-1798bc1768be | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1093 | -0.1199 | 0.7097 | 6 | 36 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_flash__seed_23.json Hash: sha256:1798bc1768be5d50328fc202a4a4cdd4975e8ce252b005840603fa284f7bce8f |
| ta-6b9d42757b1a | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1184 | -0.1199 | 0.6786 | 6 | 30 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_flash__seed_31.json Hash: sha256:6b9d42757b1abda1b4758b22401421cea1f0cf76bc9c0c8f16578b7dc87143a4 |
| ta-3c681fbadada | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-flash | rationale | true | stress-onlycached-providerredacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0055 | -0.0566 | 0.8276 | 4 | 37 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_flash__seed_7.json Hash: sha256:3c681fbadada5f0bf46fa1d37623912ea6332af72327d1a9020a6bbe24cec886 |
| ta-a72a9fc81b75 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0229 | -0.0411 | 0.7143 | 3 | 24 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_pro__seed_11.json Hash: sha256:a72a9fc81b7594188c74ddff29bf1ff57e4e2e46271da26f0ab1b3837b88051d |
| ta-6e1fb210d152 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0382 | -0.0509 | 0.7500 | 3 | 17 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_pro__seed_17.json Hash: sha256:6e1fb210d152ac2236214bed0ebf27e8ead73f909ed6fc8d75a21caf844876c9 |
| ta-6e36090be7cc | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0411 | -0.0509 | 0.6667 | 4 | 22 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_pro__seed_23.json Hash: sha256:6e36090be7ccb6b1795d0051dbd717f9e88c61619513be558b8063c4e3e6b664 |
| ta-33360ef68cb9 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1184 | -0.1199 | 0.7391 | 5 | 18 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_pro__seed_31.json Hash: sha256:33360ef68cb91f0c48e05321eef5b68d1ea009039a70f773823e7fe4eea2d80a |
| ta-956794b2c8db | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | deepseek / deepseek-v4-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0055 | -0.0509 | 0.7692 | 5 | 25 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__deepseek_deepseek_v4_pro__seed_7.json Hash: sha256:956794b2c8db6a940c4a4da5ed967f3e9b4cd7e97d46a98b8277e73a45e55d67 |
| ta-5d7be6f375fe | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 0.9167 | yahoo-finance-csv (weekly, 3 symbols) | -0.0146 | -0.0424 | 0.7200 | 4 | 34 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_claude_opus_4_7__seed_11.json Hash: sha256:5d7be6f375fed388458db37b8493375fa4431b0c71424633b078c101d68d7281 |
| ta-f3e3116e2730 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0182 | -0.0583 | 0.8000 | 3 | 31 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_claude_opus_4_7__seed_17.json Hash: sha256:f3e3116e2730f9c224e2a65f64368089f73aa0f882a94e1bb253036d0591475e |
| ta-76bd42463b57 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0101 | -0.0443 | 0.6538 | 6 | 35 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_claude_opus_4_7__seed_23.json Hash: sha256:76bd42463b57a75f69e3d3002f50f8aec6cc568d261a2f8755421b75aff4a5a4 |
| ta-52282c9a9d51 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0429 | -0.0446 | 0.6957 | 4 | 33 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_claude_opus_4_7__seed_31.json Hash: sha256:52282c9a9d512fdc58d2416a90e15b0cf5b6c92bd4e8e707c027a9d8e53763b8 |
| ta-28d78d5b9968 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / claude-opus-4.7 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0439 | -0.0540 | 0.8125 | 3 | 40 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_claude_opus_4_7__seed_7.json Hash: sha256:28d78d5b99688fb8246c25de33d70eb25bef0c920d630aaea370f8a4981a2de8 |
| ta-4415accc1ec9 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0166 | -0.0413 | 0.7600 | 3 | 27 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gemini_3_1_pro__seed_11.json Hash: sha256:4415accc1ec93cd393a23987a0af4dc53352aee940934ba53fc3391c16412004 |
| ta-8ec660b9803b | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0315 | -0.0423 | 0.7727 | 3 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gemini_3_1_pro__seed_17.json Hash: sha256:8ec660b9803be9925ef87d04700ed157ae065178e380f301e5156e3cceb11010 |
| ta-1c7196d1253d | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0529 | -0.1212 | 0.6333 | 8 | 23 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gemini_3_1_pro__seed_23.json Hash: sha256:1c7196d1253dc98c256131df5e97c0da9f83345a733247277bf09f9d2ab8d424 |
| ta-499250f22d86 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0974 | -0.1207 | 0.6296 | 7 | 19 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gemini_3_1_pro__seed_31.json Hash: sha256:499250f22d862e51cdc5116d25099137f8a82039c99ae1b5a1acdc8b193857df |
| ta-e57b4893cff3 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gemini-3.1-pro | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0486 | -0.0423 | 0.7308 | 6 | 24 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gemini_3_1_pro__seed_7.json Hash: sha256:e57b4893cff3f049e6c6e305af3a88fee6ca2e13ac651d124e4661adee1a5e68 |
| ta-1bd2254f8b18 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0116 | -0.0566 | 0.8148 | 3 | 32 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_glm_5__seed_11.json Hash: sha256:1bd2254f8b183f8c6a02875d9f1dbffa07446540078ad30befd3891110d8b349 |
| ta-bfea064200d1 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0107 | -0.0571 | 0.7826 | 3 | 26 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_glm_5__seed_17.json Hash: sha256:bfea064200d15b20d01f32712d8bf81dd1c791d91b59639e0f5838a89560c805 |
| ta-08978397f6df | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1093 | -0.1199 | 0.7097 | 6 | 36 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_glm_5__seed_23.json Hash: sha256:08978397f6df59a199b9c709ca8b26edc2c6a711c92256cd0ab8e25fe35b4600 |
| ta-307185d6b99c | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1184 | -0.1199 | 0.6786 | 6 | 30 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_glm_5__seed_31.json Hash: sha256:307185d6b99c6113542439cb76cea0b7b759d52506bd29bde8cf6f5e4294ffb4 |
| ta-694e250680fe | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / glm-5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0066 | -0.0566 | 0.8214 | 4 | 33 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_glm_5__seed_7.json Hash: sha256:694e250680feb51bf86760ed8dd1f6fe0cc80c9305bba09c0d922ddd2222baac |
| ta-bd8f2291a9fb | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0082 | -0.0567 | 0.7931 | 3 | 37 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gpt_5_5__seed_11.json Hash: sha256:bd8f2291a9fbcb67fb2cccd9fe47a751841190d67704a5dd8aa77a52d6da2407 |
| ta-8705d0026b8b | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0053 | -0.0569 | 0.8077 | 3 | 32 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gpt_5_5__seed_17.json Hash: sha256:8705d0026b8b449515dbd60f2a147c7cf608342457de93f06b29c82f0b62607b |
| ta-892eda5d0bbb | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1031 | -0.1200 | 0.6875 | 7 | 36 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gpt_5_5__seed_23.json Hash: sha256:892eda5d0bbb298a4a4a2f870ebe58ed8fdd8be3cbc3154dc979f119a22e862b |
| ta-56ef503e6f84 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1103 | -0.1202 | 0.6897 | 6 | 30 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gpt_5_5__seed_31.json Hash: sha256:56ef503e6f84dcf50b448435428472b326db3628cd28f3e8ecd2d6dc251c1ea6 |
| ta-d080576867c6 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / gpt-5.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0095 | -0.0566 | 0.8333 | 3 | 37 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_gpt_5_5__seed_7.json Hash: sha256:d080576867c6ebe09805870258ff5a5caab2e6dddd1b9f286b3bbab677d6a493 |
| ta-99bc73bf56a8 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.0046 | -0.0571 | 0.7333 | 5 | 41 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_kimi_k2_5__seed_11.json Hash: sha256:99bc73bf56a8edd11736737f45fe321da861dad7a3303eabb4794d8943d4b5e5 |
| ta-60d531e261b0 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0068 | -0.0421 | 0.7391 | 4 | 25 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_kimi_k2_5__seed_17.json Hash: sha256:60d531e261b0e16bc714b1693ad64fe43338ee4a55aa2b5f0b4f4f16c0d8a593 |
| ta-9fdedfcd3b81 | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1093 | -0.1199 | 0.6774 | 7 | 32 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_kimi_k2_5__seed_23.json Hash: sha256:9fdedfcd3b814e67fdb4491f2d17e1732805d549c2d6088714bf352512a60979 |
| ta-028041a81f5c | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | -0.1103 | -0.1202 | 0.6897 | 6 | 26 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_kimi_k2_5__seed_31.json Hash: sha256:028041a81f5c6dec9799c8f19598b8d82e918bfa8692d5e2fcd5017b11da62bd |
| ta-40cce8421d1c | leaderboard_real_yahoo_recent_gspc_btc_btcf_weekly_v0_1 | poe / kimi-k2.5 | rationale | true | stress-only, cached-provider, redacted-prompt | benchmark | stress-benchmark | 1.0000 | yahoo-finance-csv (weekly, 3 symbols) | 0.0055 | -0.0566 | 0.8276 | 4 | 37 | 1.0000 | ReproducibleRedacted | OpenModel redacted: False Claim scope: cached-provider reliability under stress-only execution Source: examples/benchmark_submissions/real_market_matrix/recent_cross_asset__poe_kimi_k2_5__seed_7.json Hash: sha256:40cce8421d1cdec7d235a50dae95561df55f605c71eaaf6a812711353de44db3 |