About
news.prompt20.com is a one-person editorial system for the AI news cycle. Solo engineer, Next.js + TypeScript, deployed on a self-hosted CapRover. Built to test a thesis: a curator who can also code can do the AI Curator job at higher leverage, by encoding editorial judgment (which benchmarks count, which sources are dead, which clusters are consensus-real vs hype) into systems instead of doing it from memory every morning.
Editorial system: the artifacts
Daily TLDR-style wrap-up of yesterday's AI news, auto-curated. Categorised (Launches / Funding / Benchmarks / Deals), with a cross-source consensus badge so events that 3+ outlets covered surface above singletons.
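Here is a minimal sketch of the consensus ranking. It assumes an upstream step has already given each story an event cluster ID. The `Story` shape, the `rankByConsensus` name and the rule of counting distinct outlets are hypothetical illustrations, not the site's actual code. Only the 3-outlet threshold comes from the description above.

```typescript
// Hypothetical data shape; the site's real pipeline types are not shown here.
interface Story {
  eventId: string; // cluster key from an assumed upstream dedup step
  outlet: string;  // publishing source, e.g. a feed's domain
  title: string;
}

interface RankedEvent {
  eventId: string;
  outlets: string[];  // distinct outlets that covered the event
  consensus: boolean; // true when MIN_OUTLETS or more outlets covered it
}

const MIN_OUTLETS = 3;

function rankByConsensus(stories: Story[]): RankedEvent[] {
  // Group by event, counting each outlet at most once per event.
  const byEvent = new Map<string, Set<string>>();
  for (const s of stories) {
    const outlets = byEvent.get(s.eventId) ?? new Set<string>();
    outlets.add(s.outlet);
    byEvent.set(s.eventId, outlets);
  }
  const ranked = Array.from(byEvent.entries()).map(
    ([eventId, outlets]): RankedEvent => ({
      eventId,
      outlets: Array.from(outlets).sort(),
      consensus: outlets.size >= MIN_OUTLETS,
    }),
  );
  // Widest coverage first, so consensus events surface above singletons.
  // Array.prototype.sort is stable, so ties keep first-seen order.
  return ranked.sort((a, b) => b.outlets.length - a.outlets.length);
}
```

Counting distinct outlets rather than raw stories stops one outlet's repeated posts from faking consensus.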
Public legitimacy rubric. 24 widely cited benchmarks classified A/B/C/D by contamination resistance, format exploit-surface, and saturation. Only 7 of the 24 are still strong-signal in 2026; the rest are saturated, MCQ-gameable, or already in the pretraining corpus.
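One way to make a rubric like this mechanical is to record each axis as a yes/no judgment and drop the grade one step per failed axis. The field names and the one-step-per-failure mapping below are assumptions for illustration. The published rubric may weigh the axes differently.

```typescript
// Hypothetical encoding of the rubric's three axes; the real rubric's
// criteria and grade cut-offs are not reproduced here.
type Grade = "A" | "B" | "C" | "D";

interface BenchmarkAssessment {
  name: string;
  likelyContaminated: boolean; // test items plausibly in pretraining corpora
  exploitableFormat: boolean;  // e.g. multiple-choice answers that can be gamed
  saturated: boolean;          // frontier models cluster near the ceiling
}

const GRADES: readonly Grade[] = ["A", "B", "C", "D"];

// Assumed mapping: one grade step down per failed axis (0 -> A, 3 -> D).
function grade(b: BenchmarkAssessment): Grade {
  const failures = [b.likelyContaminated, b.exploitableFormat, b.saturated]
    .filter(Boolean).length;
  return GRADES[failures] ?? "D"; // failures is 0..3, so the fallback never fires
}
```

Storing the per-axis judgments as data, not just the final letter, keeps each grade reviewable: a reader can dispute a single axis.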
Live source-health audit. Tracks 110 feeds and distinguishes missing, stale, and date-blind sources. Used to drop 5 dead feeds (Vercel, Apptronik, Agility, Fetch.ai, NEAR.org) and add Bittensor as a 4-way composite (OpenTensor + r/bittensor_ + Macrocosmos + Manifold).
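Here is a sketch of how the three failure modes might be told apart. It assumes each fetch yields a success flag plus the raw date string of every item. The `FetchResult` shape, the `classifyFeed` name and the 30-day staleness cut-off are hypothetical.

```typescript
// Hypothetical feed-health classifier; status definitions and the
// staleness threshold are assumptions for illustration.
type FeedStatus = "ok" | "missing" | "stale" | "date-blind";

interface FetchResult {
  feedUrl: string;
  ok: boolean;                       // HTTP fetch and parse succeeded
  itemDates: (string | undefined)[]; // raw date string per item, if present
}

const STALE_AFTER_DAYS = 30;      // assumed cut-off
const MS_PER_DAY = 86400000;      // 24 * 60 * 60 * 1000

function classifyFeed(r: FetchResult, now: Date = new Date()): FeedStatus {
  // Missing: the feed could not be fetched or returned no items at all.
  if (!r.ok || r.itemDates.length === 0) return "missing";
  // Keep only dates that parse to a real timestamp.
  const times = r.itemDates
    .map((d) => (d ? Date.parse(d) : NaN))
    .filter((t) => !Number.isNaN(t));
  // Date-blind: items exist, but none carries a usable date.
  if (times.length === 0) return "date-blind";
  // Stale: even the newest dated item is older than the cut-off.
  const ageDays = (now.getTime() - Math.max(...times)) / MS_PER_DAY;
  return ageDays > STALE_AFTER_DAYS ? "stale" : "ok";
}
```

The date-blind check runs before the staleness check on purpose. Otherwise a feed with no parseable dates would look either fresh or dead, depending on how missing dates were defaulted.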
Live leaderboard from Artificial Analysis with the same A/B/C/D legitimacy badges propagated onto each per-benchmark card. Strong-signal evals (HLE, LiveCodeBench) sort first; saturated MCQ relics sort last.
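Once the badges are on the leaderboard, ordering the cards is just a sort keyed on grade. Here is a minimal sketch. The `BenchmarkCard` shape is hypothetical, and so is the alphabetical tie-break within a grade.

```typescript
type Grade = "A" | "B" | "C" | "D";

// Hypothetical card shape: one per benchmark on the leaderboard page.
interface BenchmarkCard {
  benchmark: string;
  grade: Grade; // legitimacy badge propagated from the rubric
}

const GRADE_ORDER: Record<Grade, number> = { A: 0, B: 1, C: 2, D: 3 };

function sortCards(cards: BenchmarkCard[]): BenchmarkCard[] {
  // Copy before sorting so the caller's array is untouched.
  // Strongest grade first; assumed alphabetical tie-break within a grade.
  return [...cards].sort(
    (a, b) =>
      GRADE_ORDER[a.grade] - GRADE_ORDER[b.grade] ||
      a.benchmark.localeCompare(b.benchmark),
  );
}
```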
Three picks in TLDR-style voice
Drafted from yesterday's real news in this site's feed. The first sentence leads with the news, the second covers what changed materially, and the third gives the so-what. ~60 words each, for an engineer audience.
- PostTrainBench drops with 23.2% top score
Opus 4.6 (Claude Code) leads. The benchmark measures whether a model, acting as a CLI agent, can fine-tune a different base LLM across 7 evals end-to-end. 23.2% is SOTA: agentic self-improvement is barely starting, and the next 18 months of agent leaderboards will be defined here, not on SWE-Bench.
- Kimi Linear: 75% smaller KV cache, 6× faster decoding at 1M context
Moonshot's hybrid attention, interleaving KDA and MLA layers 3:1, ships with FlashKDA CUTLASS kernels (1.72–2.22× prefill speedup vs flash-linear-attention on H20). It's the first frontier-scale demo that linear-attention variants are production-ready, not just paper curiosities.
- Black Forest Labs raises $300M, ships FLUX 2 Klein
Klein is a 9B FP8 open-weights image model that runs on consumer GPUs. With the Series B and the FLUX 2 base release, BFL is now the credible open-weights answer to Midjourney and Sora, backed by meaningful capital. Open image-gen is having its DeepSeek moment.
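The 75% figure in the Kimi Linear pick follows from the 3:1 layer ratio. Here is a back-of-envelope check. It assumes the KDA layers keep a fixed-size recurrent state with no per-token cache, and that the baseline is an all-MLA stack of the same depth. Under those assumptions, only one layer in four stores a cache that grows with context length:

$$
\frac{\mathrm{KV}_{\text{hybrid}}}{\mathrm{KV}_{\text{full}}} \approx \frac{1}{3+1} = 0.25
\quad\Longrightarrow\quad
1 - 0.25 = 0.75 \;\;(\text{75\% smaller})
$$

This ignores the KDA state itself. That state is constant in sequence length, so it matters less and less as context grows toward 1M tokens.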
Coverage philosophy
What gets in, what doesn't, why. Editorial decisions are encoded in the feed list and benchmark rubric, not in my head, so they're reviewable.
- 11 China lab feeds + dedicated CN-language wire
Most Western AI news runs one cycle late on Qwen / DeepSeek / GLM / Kimi releases. /cn pulls 36Kr AI, 量子位 (QbitAI), 雷峰网 (Leiphone), InfoQ 中国 and 钛媒体 (TMTPost) directly, so I see HF-org commits the day they ship.
- Crypto x AI as a first-class category
Bittensor is a composite (4 sub-sources, including subnet operators), with 0G / Venice AI / NEAR AI tracked individually. CoinGecko AI categories are aggregated into a market table. TLDR currently undercovers this, but onchain inference and agent tokens are real signal.
- Robotics labs, not just announcements
Physical Intelligence, Figure, Boston Dynamics, Skild, 1X and Sanctuary are all pulled directly. Pages were re-classified after I found that 1X's actual articles live at /discover, not /ai (the parser had been missing all of them).
- Research / agent evals as a separate vertical
/leaderboard/research surfaces METR Time Horizon, GDPval, LiveBench, ClawEval β the metrics that still discriminate at the frontier when MMLU/HumanEval no longer can.
Why I'd be a strong AI Curator at TLDR
- I already do the job, just for myself. This site is built on daily reading across X, arXiv (via HF Papers), Hacker News, GitHub orgs and 100+ AI lab blogs. The brief at /brief is the output of that loop. Authoring it by hand would take me ~30 minutes a day, because the system has already extracted the events, deduplicated them and ranked them by cross-source consensus.
- Defensible editorial judgment. The benchmark legitimacy rubric is a written framework for which evals belong in a daily. Most applicants for an AI Curator role have written summaries; very few can defend a public taxonomy of which numbers to cite.
- Hear about it before friends. Things this site tracked early: PostTrainBench, NEAR AI's subsidiary (distinct from NEAR Protocol), Manifold Labs' Targon subnet, FLUX 2 Klein, and Kimi Linear's KDA hybrid attention. Each was a feed decision I made before mainstream coverage.
- Engineer-curator, not pure editor. My background is software engineering, and this site is fully my own code (Next.js App Router + TypeScript + custom fetcher pipeline + auto-zoom CSS + ISR + Cloudflare edge). As for the pilot reader TLDR mentions in the JD, I'd ship to it confidently.