About

news.prompt20.com is a one-person editorial system for the AI news cycle. Solo engineer, Next.js + TypeScript, deployed on a self-hosted CapRover. Built to test a thesis: a curator who can also code can do the AI Curator job at higher leverage — by encoding editorial judgment (which benchmarks count, which sources are dead, which clusters are consensus-real vs hype) into systems instead of doing it from memory every morning.

🛠 Editorial system — the artifacts

📰/brief

Daily TLDR-style wrap-up of yesterday's AI news, auto-curated. Categorised (Launches / Funding / Benchmarks / Deals), with a cross-source consensus badge so events that 3+ outlets covered surface above singletons.
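The consensus ranking described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the `NewsEvent` shape, the function names, and the case-insensitive outlet matching are all assumptions. Only the "3+ outlets" threshold and the four categories come from the text.

```typescript
// Hypothetical shape for an extracted event; field names are assumptions.
interface NewsEvent {
  title: string;
  category: "Launches" | "Funding" | "Benchmarks" | "Deals";
  outlets: string[]; // outlets that covered the event, possibly with repeats
}

// Threshold taken from the description: events covered by 3+ outlets get the badge.
const CONSENSUS_THRESHOLD = 3;

// Count distinct outlets, ignoring case and duplicate mentions (an assumption).
function distinctOutletCount(event: NewsEvent): number {
  return new Set(event.outlets.map((o) => o.trim().toLowerCase())).size;
}

function hasConsensusBadge(event: NewsEvent): boolean {
  return distinctOutletCount(event) >= CONSENSUS_THRESHOLD;
}

// Sort by distinct-outlet count, descending, so consensus events surface
// above singletons. Returns a new array; the input is left untouched.
function rankByConsensus(events: NewsEvent[]): NewsEvent[] {
  return [...events].sort(
    (a, b) => distinctOutletCount(b) - distinctOutletCount(a),
  );
}
```

Counting distinct outlets rather than raw mentions keeps a single outlet that posts the same story twice from manufacturing a consensus badge.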

📊/leaderboard/benchmarks

Public legitimacy rubric. 24 widely-cited benchmarks classified A/B/C/D by contamination resistance, format exploit-surface, and saturation. Only 7 of 24 are still strong-signal in 2026 — the rest are saturated, MCQ-gameable, or in the pretraining corpus.
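One way a rubric like this could be encoded is sketched below. The three axes come from the description. Everything else is an assumption made for illustration: the 0 to 2 scale per axis, the thresholds, the rule that a zero on any axis caps the grade at C, and all type and function names. The site's real scoring may work quite differently.

```typescript
type Grade = "A" | "B" | "C" | "D";
type AxisScore = 0 | 1 | 2; // hypothetical scale: 0 = weak, 2 = strong

// Hypothetical assessment record; the three axes mirror the rubric's criteria.
interface BenchmarkAssessment {
  name: string;
  contaminationResistance: AxisScore; // higher = less likely to be in pretraining data
  formatRobustness: AxisScore; // higher = smaller format exploit-surface
  headroom: AxisScore; // higher = less saturated
}

// Map the three axis scores to a letter grade (thresholds are assumptions).
function gradeBenchmark(b: BenchmarkAssessment): Grade {
  const scores = [b.contaminationResistance, b.formatRobustness, b.headroom];
  const total = scores.reduce((sum: number, s) => sum + s, 0);
  const hasFailingAxis = scores.some((s) => s === 0);

  if (total <= 1) return "D";
  // Assumed rule: a single failing axis caps the grade at C.
  if (hasFailingAxis || total <= 3) return "C";
  if (total === 4) return "B";
  return "A";
}
```

Writing the grade as a pure function of recorded scores keeps each classification reviewable: anyone who disagrees with a grade can point at the specific axis score they dispute.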

🩺/diagnostics

Live source-health audit. Tracks 110 feeds, distinguishes missing / stale / date-blind. Used to drop 5 dead feeds (Vercel, Apptronik, Agility, Fetch.ai, NEAR.org) and add Bittensor as a 4-way composite (OpenTensor + r/bittensor_ + Macrocosmos + Manifold).
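The three failure states could be distinguished along these lines. This is a sketch under stated assumptions: the `FeedSnapshot` shape, the 30-day staleness window, and the exact definitions (missing = nothing fetched, date-blind = items without a parseable date, stale = newest item too old) are guesses at what the labels mean, not the site's actual logic.

```typescript
type FeedHealth = "ok" | "missing" | "stale" | "date-blind";

// Hypothetical snapshot of one fetched feed.
interface FeedSnapshot {
  url: string;
  itemDates: (string | null)[]; // raw published-date string per item, null if absent
}

const STALE_AFTER_DAYS = 30; // assumed window, not taken from the source
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function classifyFeed(feed: FeedSnapshot | null, now: Date): FeedHealth {
  // Missing: the fetch failed or returned no items at all.
  if (feed === null || feed.itemDates.length === 0) return "missing";

  // Keep only dates that Date.parse can read (ISO 8601 strings are the safe case).
  const timestamps = feed.itemDates
    .map((d) => (d === null ? NaN : Date.parse(d)))
    .filter((t) => !Number.isNaN(t));

  // Date-blind: items exist, but none carries a usable date.
  if (timestamps.length === 0) return "date-blind";

  // Stale: the newest dated item is older than the window.
  const ageDays = (now.getTime() - Math.max(...timestamps)) / MS_PER_DAY;
  return ageDays > STALE_AFTER_DAYS ? "stale" : "ok";
}
```

Separating date-blind from stale matters because a feed without dates cannot be ranked by recency even when it is publishing regularly.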

🧠/leaderboard/intelligence-index

Live leaderboard from Artificial Analysis with the same A/B/C/D legitimacy badges propagated onto each per-benchmark card. Strong-signal evals (HLE, LiveCodeBench) sort first; saturated MCQ relics sort last.
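The badge-first ordering can be expressed as a small comparator. The sketch below is illustrative only: `BenchmarkCard`, `BADGE_RANK`, and the alphabetical tie-break are assumptions, and it presumes each card already carries its badge from the rubric.

```typescript
type LegitimacyBadge = "A" | "B" | "C" | "D";

// Hypothetical per-benchmark card with its badge already attached.
interface BenchmarkCard {
  benchmark: string;
  badge: LegitimacyBadge;
}

// Lower rank sorts earlier: strong-signal (A) first, weakest (D) last.
const BADGE_RANK: Record<LegitimacyBadge, number> = { A: 0, B: 1, C: 2, D: 3 };

// Order cards by badge, breaking ties alphabetically (the tie-break is an assumption).
function sortCardsByLegitimacy(cards: BenchmarkCard[]): BenchmarkCard[] {
  return [...cards].sort(
    (x, y) =>
      BADGE_RANK[x.badge] - BADGE_RANK[y.badge] ||
      x.benchmark.localeCompare(y.benchmark),
  );
}
```

Keeping the rank table separate from the comparator means the ordering changes in one place if the grade scale ever does.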

✍️ Three picks in TLDR-style voice

Drafted using yesterday's real news from this site's feed. Lead with the news, second sentence on what changed materially, third on so-what. ~60 words each, engineer audience.

🤖PostTrainBench drops with 23.2% top score
Opus 4.6 (Claude Code) leads. The benchmark measures whether a model, acting as a CLI agent, can fine-tune a different base LLM across 7 evals end-to-end. 23.2% is SOTA — agentic self-improvement is barely starting, and the next 18 months of agent leaderboards will be defined here, not on SWE-Bench.
⚡Kimi Linear: 75% smaller KV cache, 6× decoding at 1M
Moonshot's hybrid 3:1 KDA-to-MLA attention ships with FlashKDA CUTLASS kernels (1.72–2.22× prefill speedup vs flash-linear-attention on H20). The first frontier-scale demo that linear-attention variants are production-ready, not just paper-curiosities.
🎨Black Forest Labs raises $300M, ships FLUX 2 Klein
Klein is a 9B FP8 open-weights image model that runs on consumer GPUs. Combined with the Series B and the FLUX 2 base release, BFL is now the credible open-weights answer to Midjourney and Sora at meaningful capital. Open image-gen is having its DeepSeek moment.

🎯 Coverage philosophy

What gets in, what doesn't, why. Editorial decisions are encoded in the feed list and benchmark rubric — not in my head, so they're reviewable.

🇨🇳11 China lab feeds + dedicated CN-language wire
Most Western AI news is one cycle late on Qwen / DeepSeek / GLM / Kimi releases. /cn pulls 36Kr AI, 量子位 (QbitAI), 雷峰网 (Leiphone), InfoQ 中国 (InfoQ China), and 钛媒体 (TMTPost) directly, so I see HF-org commits the day they ship.
🪙Crypto x AI as a first-class category
Bittensor is a composite (4 sub-sources, including subnet operators); 0G, Venice AI, and NEAR AI are tracked individually. CoinGecko AI categories are aggregated into a market table. TLDR currently undercovers this: onchain inference and agent tokens are real signal.
🤖Robotics labs, not just announcements
Physical Intelligence, Figure, Boston Dynamics, Skild, 1X, and Sanctuary are all pulled directly. I re-classified the pages after finding that 1X's actual articles live at /discover, not /ai (the parser had been missing all of them).
📜Research / agent evals as a separate vertical
/leaderboard/research surfaces METR Time Horizon, GDPval, LiveBench, ClawEval — the metrics that still discriminate at the frontier when MMLU/HumanEval no longer can.

📈 By the numbers

110 configured feeds

24 benchmarks rated

leaderboards

170+ commits / 60 days

🤝 Why I'd be a strong AI Curator at TLDR

I already do the job, just for myself. Daily reading across X, arXiv (via HF Papers), Hacker News, GitHub orgs, and 100+ AI lab blogs is what this site is built on. The brief at /brief is the output of that loop: authoring it by hand would take me ~30 minutes a day, because the system has already extracted the events, deduped them, and ranked them by cross-source consensus.
Defensible editorial judgment. The benchmark legitimacy rubric is a written framework for which evals should appear in a daily — most applicants for an AI Curator role have written summaries; very few can defend a public taxonomy of which numbers to cite.
Hear about it before friends. Things this site tracked early: PostTrainBench, NEAR AI's subsidiary (distinct from NEAR Protocol), Manifold Labs' Targon subnet, FLUX 2 Klein, Kimi Linear's KDA hybrid attention. Each was a feed decision I made before mainstream coverage.
Engineer-curator, not pure editor. My background is software engineering; this site is fully my own code (Next.js App Router + TypeScript + custom fetcher pipeline + auto-zoom CSS + ISR + Cloudflare-edge). I'd ship confidently to the pilot reader TLDR mentions in the JD.