2026-05-07 22:11 GMT+8 · summary_2026-05-07_22-11.md

🤖 AI News Summary - 2026-05-07 22:11 GMT+8

Focused AI/dev subreddit roundup.

Full site: https://kkklobsterfarming.github.io/ai-news-summary-site/

What changed since last run

Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats. — r/LocalLLaMA
I keep losing settings — r/OpenWebUI
Set up a “Knowledge”, can’t get a chat configured — r/OpenWebUI
Get faster qwen 3.6 27b — r/LocalLLaMA
Running Qwen3.5 / Qwen3.6 with NextN MTP (Multi-Token Prediction) speculative decode in llama.cpp — single RTX 3090 Ti GPU guide — r/LocalLLaMA
Open-sourced a Python sensor that captures execution under MCP servers (tool calls, imports, subprocesses) — r/llmdevs
Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development — r/LocalLLaMA
Which LLM/API model offers the best balance of affordability, performance, reliability, low token cost, context window size, and minimal rate-limit restrictions for high-volume production use in 2026? What are the best non-Chinese alternatives offering similar or better performance, pricing? — r/llmdevs
I built an open-source Hermes profile pack for local-first wellness agents — r/llmdevs
Best self-hosted Discord alternative supporting ARM — r/selfhosted
How do you monitor your self-hosted servers? — r/selfhosted
MIT-licensed Sentry + Datadog replacement, self-hosts in ~90 seconds — r/selfhosted

r/openai

No non-pinned/newsworthy posts fetched after filtering.

r/LocalLLaMA

Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats.
- llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved: https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved…
- Timestamp: 2026-05-07 10:59 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on Good effort! Would love to try it, can you add a Q4_K_XS to run on 16GB with enough context? Does the MTP work with TurboQuant compressed… | Fellow 16GB VRAM user here. Could you please share which model are you…
- Author: /u/LLMFan46
Get faster qwen 3.6 27b
- Using 100k context on a 3090 with the MTP GGUF and getting 50 t/s on llama.cpp. Thought I would share what worked: use https://huggingface.co/RDson/Qwen3.6-27B-MTP-Q4_K_M-GGUF and am17an…
- Timestamp: 2026-05-07 07:33 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on If you have a multiGPU setup, speculative decoding is probably faster than this. | Do the qwen models support speculative decode with llama.cpp yet? When I tried it with the drafter model I saw a decrease…… Overall sentiment — post: mixed; author:…
- Author: /u/admajic
Running Qwen3.5 / Qwen3.6 with NextN MTP (Multi-Token Prediction) speculative decode in llama.cpp — single RTX 3090 Ti GPU guide
- I was asked for this guide, so here it is. Some overlap with someone else’s post from yesterday.
- Timestamp: 2026-05-07 17:56 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between positive and critical. Top reactions focus on Just use my fork lol, all the fixes are going to land there | This is great thanks! Speculative decoding works on other quants too if anyone complains about q4. I’m wrestling with some things…
- Author: /u/yes_i_tried_google
Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development
- tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. I’ve been doing a lot of…
- Timestamp: 2026-05-07 08:34 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on I use Q6 quant with 131k context on 5090…. 55-60 tok/sec on Qwen3.6 27b | If you go with thetom fork you slightly decrease speed for full or 1M context size if needed. Q8/T4 for k/v. 1% degeneration in quality…. Overall sentiment — post: mixed;…
- Author: /u/BawbbySmith
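
The memory side of that trade-off can be sketched with rough arithmetic. The snippet below is a back-of-the-envelope estimate, not a measurement from the post: the bits-per-weight figures are approximate values assumed for common llama.cpp quant types, 32 GiB is the RTX 5090's VRAM, and KV cache and runtime overhead are ignored, so real headroom will be smaller.

```python
# Rough weight-only footprint of a 27B-parameter model at several quantization levels.
# ASSUMPTION: bits-per-weight values are approximate; KV cache and overhead are ignored.
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Return the approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def headroom_gib(memory_gib: float, params_billion: float, bits_per_weight: float) -> float:
    """Return memory left for KV cache and overhead after loading the weights."""
    return memory_gib - weight_gib(params_billion, bits_per_weight)


if __name__ == "__main__":
    for name, bpw in APPROX_BITS_PER_WEIGHT.items():
        size = weight_gib(27, bpw)
        print(
            f"{name}: weights ~{size:.1f} GiB, "
            f"headroom on 32 GiB ~{headroom_gib(32, 27, bpw):.1f} GiB, "
            f"on 128 GiB ~{headroom_gib(128, 27, bpw):.1f} GiB"
        )
```

Under these assumptions the 4x memory gap matters most at higher quantization levels, where the weights alone take up most of a 32 GiB card and leave little room for long contexts.
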
why llama.cpp can’t combine speculative decode methods?
- dicking around with the new mtp speculative decode with qwen3.6 27b, and it’s great. but for agentic coding i’ve seen significant improvements from ngram, because a decent fraction of the time (e.g.
- Timestamp: 2026-05-07 15:53 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between positive and critical. Top reactions focus on « There is a PR for that » is a bit like « there is an app for that » in the llama.cpp world | i literally hope it lands soon i saw that --draft-max was deprecated in favor of --spec-draft-n-max…
- Author: /u/Qwoctopussy
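
The n-gram drafting idea mentioned in this post can be illustrated with a small sketch. This is a generic prompt-lookup style drafter, not llama.cpp's implementation, and `propose_draft` and its parameters are hypothetical names. It suggests why repetitive agentic-coding contexts might benefit: when the last few tokens already appeared earlier in the context, the tokens that followed them are cheap draft candidates for the target model to verify.

```python
from typing import List, Sequence


def propose_draft(tokens: Sequence[int], ngram: int = 3, max_draft: int = 8) -> List[int]:
    """Propose draft tokens by finding the last `ngram` tokens earlier in the context.

    Hypothetical helper for illustration only: returns up to `max_draft` tokens
    that followed the most recent earlier occurrence of the current suffix,
    or an empty list when there is no match.
    """
    if len(tokens) <= ngram:
        return []
    suffix = list(tokens[-ngram:])
    # Scan backwards so the most recent earlier match wins.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == suffix:
            follow = start + ngram
            return list(tokens[follow:follow + max_draft])
    return []


if __name__ == "__main__":
    # The suffix [1, 2, 3] appeared at the start, so the tokens after it are proposed.
    context = [1, 2, 3, 4, 5, 1, 2, 3]
    print(propose_draft(context, ngram=3, max_draft=2))
```

A drafter like this costs no extra model evaluations, which is why combining it with a model-based drafter such as MTP is appealing: the lookup covers repeated text and the model-based drafter covers novel text.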

r/llmdevs

Open-sourced a Python sensor that captures execution under MCP servers (tool calls, imports, subprocesses)
- We (BlueRock) kept hitting the same wall debugging multi-agent and MCP systems: traces show that the agent called a tool, but not what happened once that tool started executing. In MCP systems specifically, tools call other tools.
- Timestamp: 2026-05-07 01:31 GMT+8
- Author: /u/Upstairs_Safe2922
Which LLM/API model offers the best balance of affordability, performance, reliability, low token cost, context window size, and minimal rate-limit restrictions for high-volume production use in 2026? What are the best non-Chinese alternatives offering similar or better performance, pricing?
- I often see models like Qwen 3.6, DeepSeek V4, MiniMax 2.7, and Kimi K2.6 discussed due to their strong price-to-performance ratio, large context windows, and relatively low API costs. But I know these are all Chinese models/providers.
- Timestamp: 2026-05-07 14:31 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly concerned. Top reactions focus on My preference for a coding agent is ibm granite4 paired with gemma4 for writing | If your RAG pipeline ingests raw markdown, you pay to embed navigation menus, language selectors, and irrelevant UI elements on…
- Author: /u/ComparisonLiving6793
I built an open-source Hermes profile pack for local-first wellness agents
- Hey everyone - I’ve been dogfooding Hermes with wearable/nutrition MCPs and turned the setup into a small open-source profile pack: Delx Wellness for…
- Timestamp: 2026-05-07 00:12 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on the inspectable setup flow is smart. non-technical users gonna bounce off the mcp config though. Overall sentiment — post: mixed; author: mixed. Reply threads: 2026-05-07 10:58 GMT+8: post=mixed, author=mixed — the inspectable setup flow is smart….
- Author: /u/delxmobile

r/OpenWebUI

I keep losing settings
- Hi, I’m running openwebui as a test within a small org on Azure. Every time I restart the app the database (users, chats, tools) is kept but I keep losing admin panel settings like connections.
- Timestamp: 2026-05-07 16:37 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on All settings or just some of them? We are experiencing this too, and the connection settings are not saved | I lose connection settings and other admin panel settings. Tools and users are persistent. Overall sentiment — post: mixed; author: mixed. Reply…
- Author: /u/Vroedoeboy
Set up a “Knowledge”, can’t get a chat configured
- I’ve just installed Open WebUI locally. I am not testing with any local models.
- Timestamp: 2026-05-07 09:34 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between critical and positive. Top reactions focus on Do what it says. Choose base model there. This will create new “virtual model” which will be your base model like nanogpt or whatever… | I’m very new so I apologize if I’m off but when you start…
- Author: /u/Omnius42

r/selfhosted

Best self-hosted Discord alternative supporting ARM
- I know posts like these appear on this subreddit every other month, but the landscape for Discord alternatives is changing so quickly I feel the need to ask. All I wanna do is chat and send media to a couple of my friends because I…
- Timestamp: 2026-05-07 09:45 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between critical and skeptical. Top reactions focus on Run a Matrix Synapse server, expose it, and then connect using whatever Matrix client you want. Most common…
- Author: /u/Bo0sted5
How do you monitor your self-hosted servers?
- I’m curious how people here handle server monitoring. Right now I’m thinking about things like authentication activity, process execution history, and network activity. But I’m not sure what the “normal” setup looks like for self-hosting.
- Timestamp: 2026-05-07 15:58 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on I have a great system. Invite a bunch of friends and their wives to your Plex server then any time it goes down you’ll get a flood…
- Author: /u/vdorru
MIT-licensed Sentry + Datadog replacement, self-hosts in ~90 seconds
- Hi, I’ve been working on an open-source observability stack that is really easy to self host. About 6 months ago I got super frustrated by paying for Sentry and hosting a bunch of services (otel collector, prometheus, grafana…) and still…
- Timestamp: 2026-05-07 03:13 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between joking and positive. Top reactions focus on Not to sound too harsh, but looking at the website, it screams AI. That’s okay, but it also doesn’t signal a well…
- Author: /u/narrow-adventure
Why aren’t more people talking about Filestash?
- I haven’t tried it yet because I’m in the process of saving up for storage drives for my current server (stupid LLMs), but it looks amazing! The ability to just completely configure it to your liking and needs via the plugins and the whole…
- Timestamp: 2026-05-07 16:36 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on It doesn’t seem to have a sync client, no Android client, it’s all web browser based.. not a real replacement for Nextcloud. Overall…
- Author: /u/noahkra

r/ClaudeAI

No non-pinned/newsworthy posts fetched after filtering.

r/ClaudeCode

No non-pinned/newsworthy posts fetched after filtering.

r/Codex

No non-pinned/newsworthy posts fetched after filtering.

Generated 2026-05-07 22:11 GMT+8 | Next update in 2 hours