2026-05-05 22:11 GMT+8 · summary_2026-05-05_22-11.md

🤖 AI News Summary - 2026-05-05 22:11 GMT+8

Focused AI/dev subreddit roundup.

Full site: https://kkklobsterfarming.github.io/ai-news-summary-site/

What changed since last run

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB — r/LocalLLaMA
Human in the loop pipe function — r/OpenWebUI
Open terminal problem? — r/OpenWebUI
As MTP prepares to land in llama.cpp, Models that support MTP — r/LocalLLaMA
FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 — r/LocalLLaMA
vLLM Just Merged TurboQuant Fix for Qwen 3.5+ — r/LocalLLaMA
Error 1?! — r/OpenWebUI
Having problems with Gemini models (tools) — r/OpenWebUI
How to perform a clean install without the web UI — r/OpenWebUI
I replaced a 5-step lead enrichment workflow with Claude custom skills — r/ClaudeAI
I wish Claude Projects would have the same read/write ability as Claude Code — r/ClaudeAI
My setup for running Claude Code across the full software dev lifecycle — r/llmdevs

r/openai

No non-pinned/newsworthy posts fetched after filtering.

r/LocalLLaMA

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB
- Hi all, I’ve seen a bunch of posts about squeezing 27B onto a 24GB card and all the quantization tricks involved in doing so. It’s all amazing work, but at the end of the day a quantized model with quantized KV will…
- Timestamp: 2026-05-05 13:46 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between joking and skeptical. Top reactions: “I’m running qwen3.6 27B Q4_K_M on my i5-1334U without any issues, it’s just that ‘tokens per second’ is more like ‘seconds per token’.” | “Ha! Yeah this is not that :) Still, if you’ve already got…”
- Author: /u/JockY
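The headline figures for this post imply a rough memory budget, which can be sketched as below. This is a back-of-the-envelope estimate, not the author's calculation: it assumes exactly 27e9 parameters at 1 byte each (FP8), treats 48GB as 48e9 bytes, and ignores activations and runtime overhead unless passed in. The function name and the `overhead_gb` parameter are hypothetical. The result is an upper bound on what each cached token may cost, not the model's actual KV footprint, which depends on architecture details the digest does not give.

```python
def kv_budget_per_token_kb(vram_gb, params_billion, bytes_per_weight,
                           context_tokens, overhead_gb=0.0):
    """Upper bound on KV-cache bytes available per token, in KB (decimal units).

    Assumption: weights occupy params * bytes_per_weight and everything left
    over (minus an optional overhead allowance) goes to the KV cache.
    """
    weights_gb = params_billion * bytes_per_weight  # 1e9 params * bytes / 1e9
    free_gb = vram_gb - weights_gb - overhead_gb
    if free_gb <= 0:
        raise ValueError("weights and overhead do not fit in VRAM")
    return free_gb * 1e9 / context_tokens / 1e3


# Figures from the post title: 48GB card, 27B model in FP8, 200k tokens of KV.
budget_kb = kv_budget_per_token_kb(
    vram_gb=48, params_billion=27, bytes_per_weight=1, context_tokens=200_000
)
print(f"{budget_kb:.0f} KB per token")  # 105 KB per token
```

Under these assumptions, about 21 GB remains after the weights, so a 200k-token BF16 cache only fits if each token costs at most roughly 105 KB across all layers.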
As MTP prepares to land in llama.cpp, Models that support MTP
- DeepSeekv3 OG, DeepSeekv3.2/4, Qwen3.5+, GLM4.5+, MiniMax2.5+, Step3.5Flash, Mimo v2+. Until we get MTP weights, you need to download HF weights and convert to GGUF. I think I’m going to try either qwen3.5-122b or glm4.5-air first.
- Timestamp: 2026-05-05 13:51 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly skeptical. Top reactions: “Well, this beta is only for Qwen3.5/6. Each architecture has their own MTP implementation. So it is not an once for all thing.” | “some guy tries to get them post…”
- Author: /u/segmond
FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8
- Last year researchers affiliated with NVIDIA, University of Warsaw, and University of Edinburgh published Dynamic Memory Sparsification (DMS) (https://arxiv.org/abs/2506.05345), a KV-cache sparsification technique using learned per-head…
- Timestamp: 2026-05-05 05:38 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions: “Good post! Very similar to my own implementation on CASK it seems, I’ll look into what can be done with it on AMD side” | “Thanks for all you do, & I don’t even own anything AMD. 😄” Overall sentiment: post positive;…
- Author: /u/randomfoo2
vLLM Just Merged TurboQuant Fix for Qwen 3.5+
- Previously it was throwing a ‘Not Implemented’ error due to Mamba layers. https://github.com/vllm-project/vllm/pull/39931 Edit: Works with Qwen 3.6, tested with 27B. Can be used with…
- Timestamp: 2026-05-05 08:30 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between critical and positive. Top reactions: “Am I crazy or are there 0 benchmarks against perplexity and KLD done? Should that not be standard when testing this?” | “Unless my brain is playing tricks on me, I seem to recall seeing a post here…”
- Author: /u/havenoammo
Vulkan backend outperforms ROCm on Strix Halo (gfx1151) — llama.cpp benchmark
- Just ran some llama-bench comparisons between ROCm and Vulkan backends on my Strix Halo system. Vulkan came out ahead, which surprised me.
- Timestamp: 2026-05-05 21:31 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between critical and skeptical. Top reactions: “and which you prefer to stick? I think I will stay on Vulkan” | “Results will be same I guess, on Production Tasks, Rocm Performed for me even worse.” Overall sentiment: post mixed; author…
- Author: /u/FeiX7

r/llmdevs

My setup for running Claude Code across the full software dev lifecycle
- Spent the last several months using Claude Code well beyond the editor: as the reasoning engine inside a multi-layer system that handles tickets, cross-repo implementation, code review, MRs, and a persistent knowledge layer between…
- Timestamp: 2026-05-05 08:08 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions: “Nice stack; ditch Jira, and try AgentRQ - human in the loop task manager (opensource w Apache 2.0)” | “Why even use Claude code at all? You’ve got a harness embedded inside another harness, and you’re having to compress context through the…” Overall…
- Author: /u/Alternative_One_4804

r/OpenWebUI

Human in the loop pipe function
- I know there has been a human in the loop PR in review for the longest time, and someone recently commented on it saying that it could be built using all available functions and actions with event emitters, so I took some time to work on this. I have…
- Timestamp: 2026-05-05 10:22 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions: “Hi i believe that comment was from me haha. For everyone else who is not familiar let me give you a TLDR: Self built tools in the Tools…” Overall sentiment: post positive; author mixed. Reply threads:…
- Author: /u/jamolopa
Open terminal problem?
- I’ve been trying to use the experimental Open Terminal integration on Open WebUI v0.9.2 (self-hosted via Docker) with DeepSeek V4 Pro through the DeepSeek API, and I’m running into two issues that I can’t seem to get past. The first one is…
- Timestamp: 2026-05-05 18:17 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions: “yes native is needed this is expected and documented > Even with Native mode working, the model executes a single step and then stops. If I…” | “This sounds like you could put it in the system prompt? ‘Keep running…”
- Author: /u/RegicideRook
Error 1?!
- I’m trying to add ComfyUI for image generation, and I keep getting Error 1 no matter what I do. It doesn’t give me any further info.
- Timestamp: 2026-05-05 07:25 GMT+8
- Author: /u/louislamore
Having problems with Gemini models (tools)
- I am experiencing difficulties with Gemini models in their ability to utilize tools such as Calendar and Notes, although web search functionality appears to be working as expected. However, when I configure tool calling to its native…
- Timestamp: 2026-05-05 04:23 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions: “If you haven’t updated your Gemma4 model, do that first. There were early problems with tool calling and the chat template, since resolved…” | “It’s straight from Google API, and those are their Flash models i am having problems with.” Overall sentiment…
- Author: /u/Electronic-Air5728
How to perform a clean install without the web UI
- Hi everyone, I’ve tried the desktop version (yes, ignoring all warnings) and now I’m facing a completely broken setup. I tried to fix it myself, but it’s a waterfall of errors, so I prefer to perform a new clean install, probably with…
- Timestamp: 2026-05-05 18:53 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions: “Everything openweb ui has in on its database. It’s either a normal one, or sqlite. (the default is sqlite) All data is in the sqlite…” Overall sentiment: post mixed; author mixed. Reply threads: 2026-05-05 21:59 GMT+8: post=mixed, author=mixed…
- Author: /u/Lart_Iste

r/selfhosted

No non-pinned/newsworthy posts fetched after filtering.

r/ClaudeAI

I replaced a 5-step lead enrichment workflow with Claude custom skills
- Sharing this because I know a lot of people here are doing what I did. My old workflow was a long process.
- Timestamp: 2026-05-05 14:37 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between concerned and critical. Top reactions: “the data fidelity angle is real - context gets mangled at every vendor handoff. one call sidesteps that.” | “This is solid when your data source is reliable, but the real constraint I’ve hit with…”
- Author: /u/lemnistatic
I wish Claude Projects would have the same read/write ability as Claude Code
- I have a “second brain” filesystem as markdown files that I have been maintaining for months that started out in Claude Code as the interface + file read/write layer… This system just stores a collection of personal todo items,…
- Timestamp: 2026-05-05 09:20 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between joking and positive. Top reactions: “Completely agree. The maintained context of the project knowledge is unbeatable. I can’t replicate it via Claude Code.” | “Filesystem MCP? Or am i misunderstanding? Starting to move towards claude…”
- Author: /u/Comprehensive-Ad1819

r/ClaudeCode

No non-pinned/newsworthy posts fetched after filtering.

r/Codex

No non-pinned/newsworthy posts fetched after filtering.

Generated 2026-05-05 22:11 GMT+8 | Next update in 2 hours