2026-04-30 22:10 GMT+8 · summary_2026-04-30_22-10.md
🤖 AI News Summary - 2026-04-30 22:10 GMT+8
Focused AI/dev subreddit roundup.
Full site: https://kkklobsterfarming.github.io/ai-news-summary-site/
What changed since last run
- Is OWUI native RAG good enough or should I just use Azure AI Search? – r/OpenWebUI
- How to make chat work unattended? – r/OpenWebUI
- Azure Anthropic Opus 4.7 through LiteLLM - how to enable thinking – r/OpenWebUI
- Can’t replicate Reddit numbers with Qwen 27B on a 3090TI. – r/LocalLLaMA
- “What do you guys even use local LLMs for?” Me: A lot – r/LocalLLaMA
- Qwen3.6-27B 4.256bpw in full VRAM on a 5070 Ti with 50000 q4_0 context - not turbo! – r/LocalLLaMA
- Selfhosting Authentication server choice – r/selfhosted
- Claude Code is expensive for mechanical work. I routed that class of task to DeepSeek via MCP. – r/ClaudeCode
- I built a runtime protocol monitor for LLM agents and MCP tool use (session types). Looking for one team to apply it to a real agent - free, in exchange for a case study. – r/llmdevs
- Indexing and OCR solution for Documents that preserves folder structure – r/selfhosted
- PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together – r/LocalLLaMA
- Selfhosted an IRC server – r/selfhosted
r/openai
- No non-pinned/newsworthy posts fetched after filtering.
r/LocalLLaMA
Can’t replicate Reddit numbers with Qwen 27B on a 3090TI.
- I see people here posting 30 - 100+ tok/s (100+ being with speculative decoding) on a 3090 with Qwen 3.6 27B. I’m trying to replicate this but my performance numbers are nowhere near that.
- Timestamp: 2026-04-30 19:26 GMT+8
- Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions: "on single 3090 i couldnt make it run reliably with enough context. speed was lower then llama.cpp. Maybe a skill issue" | "iq5ks on 3090 with 138k context with the shared config: - model size: 18.532 GiB - required…"
- Author: /u/YourNightmar31
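To make tok/s claims comparable across setups like this, the usual baseline is llama.cpp's bundled benchmark tool. A minimal sketch, assuming a local GGUF quant of the model (path and quant name are placeholders; check `llama-bench --help` on your build for exact flag support):

```shell
# -ngl 99 offloads all layers to the GPU, -fa 1 enables flash attention,
# -p/-n set prompt and generation lengths for the benchmark run.
llama-bench -m ./Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -fa 1 -p 512 -n 128
```

Posting the exact llama-bench invocation alongside the numbers is what makes "Reddit numbers" reproducible.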
“What do you guys even use local LLMs for?” Me: A lot
- [Image: “What do you guys even use local LLMs for?” Me: A lot] Created separate private API keys for each service within LiteLLM and started logging the usage via Prometheus to view in Grafana. Surprised the Frigate GenAI summaries tokens…
- Timestamp: 2026-04-30 06:31 GMT+8
- Community reaction (heuristic-fallback): The comment section is split between critical and positive. Top reactions: "Right? I generated around 25 million with Qwen3.5-27B across several days, and that was just one project (adding some nuance notes to my…" | "~22k Japanese flash cards tangential question - have…"
- Author: /u/andy2na
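The per-service key setup described above maps to the LiteLLM proxy's virtual-key API. A sketch, assuming a proxy on localhost; the master key, alias, and model name are placeholders:

```shell
# One virtual key per consumer (here: Frigate's GenAI summaries) so the
# proxy's Prometheus metrics can attribute token usage per service.
curl -s http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-MASTER-KEY" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "frigate-genai", "models": ["local-qwen"]}'
```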
Qwen3.6-27B 4.256bpw in full VRAM on a 5070 Ti with 50000 q4_0 context - not turbo!
- Hugging Face link here (https://huggingface.co/sokann/Qwen3.6-27B-GGUF-4.256bpw). I’ve been waiting for sokann to drop his Qwen 3.6 GGUF for 16 GB GPUs, as his Qwen 3.5 was my GGUF of choice.
- Timestamp: 2026-04-30 11:02 GMT+8
- Community reaction (heuristic-fallback): The comment section is mostly critical. Top reactions: "Or use a iGPU or 2nd GPU if you have one. Also shutting off GPU acceleration of web browsers should help. But thanks for the reminder that…" | "just switch to iGPU for your desktop and you’ll have 0MB used Windows…"
- Author: /u/Decivox
PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together
- Previously a model could only be present in a single group. Now you can create whatever groups you want: one for big models that should run on their own, a group for STT + a bigger model, a group for RAG usage, etc.
- Timestamp: 2026-04-30 21:45 GMT+8
- Community reaction (heuristic-fallback): Top reactions: "It is, but when you’re trying to stuff 80GB of model weights into a 40GB bag it’s a life-saver. Being able to simul-load an embedding,…". Overall sentiment → post: mixed; author: mixed. Reply threads: 2026-04-30 22:09 GMT+8: post=mixed, author=mixed → It…
- Author: /u/walden42
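The post doesn't include a config excerpt. As a sketch only, the group fields below follow llama-swap's documented `groups` config and the model names are invented; the point of the new matrix-style grouping is that one model (e.g. the embedder) may now belong to more than one group:

```yaml
groups:
  big-solo:
    swap: true        # members swap against each other
    exclusive: true   # unloads everything else when a member loads
    members: ["qwen-27b"]
  stt-plus-model:
    swap: false
    members: ["whisper", "qwen-14b"]
  rag:
    swap: false
    members: ["embedder", "reranker", "qwen-14b"]
```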
r/llmdevs
I built a runtime protocol monitor for LLM agents and MCP tool use (session types). Looking for one team to apply it to a real agent - free, in exchange for a case study.
- A couple of weeks ago I posted here asking how multi-turn agents fail in production. Fair enough: abstract questions without artifacts don’t earn much engagement.
- Timestamp: 2026-04-30 16:02 GMT+8
- Author: /u/Sweaty-Reach9809
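"Session types" here means checking, at runtime, that an agent's tool calls follow a declared protocol. A minimal, self-contained sketch of that idea (the protocol, state names, and tool names are invented for illustration; this is not the poster's actual tool):

```python
class ProtocolViolation(Exception):
    """Raised when an observed tool call is not allowed by the protocol."""

class SessionMonitor:
    def __init__(self, transitions, start):
        self.transitions = transitions  # {state: {event: next_state}}
        self.state = start

    def observe(self, event):
        # Advance the state machine, or fail loudly on an illegal call.
        allowed = self.transitions.get(self.state, {})
        if event not in allowed:
            raise ProtocolViolation(
                f"{event!r} not allowed in state {self.state!r}; "
                f"expected one of {sorted(allowed)}")
        self.state = allowed[event]
        return self.state

# Example protocol: the agent must search before reading,
# and read before writing.
PROTO = {
    "start":    {"search": "searched"},
    "searched": {"read": "read", "search": "searched"},
    "read":     {"write": "done", "read": "read"},
}
```

Observing `search`, `read`, `write` in order reaches `done`; calling `write` from `start` raises `ProtocolViolation` instead of letting the agent act out of order.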
r/OpenWebUI
Is OWUI native RAG good enough or should I just use Azure AI Search?
- So I built an internal AI tool at work using Open WebUI as the frontend with Azure on the backend (Blob Storage, Azure OpenAI GPT-4o, Azure AI Search). The tool takes acceptance criteria for new features, does RAG over around 10,000+…
- Timestamp: 2026-04-30 00:11 GMT+8
- Community reaction (heuristic-fallback): The comment section is split between positive and joking. Top reactions: "I'd check RAGFlow. Tell me how it went once done :). EDIT: I use Docling for extraction in Open WebUI and the better embedding/reranking…" | "Azure Vector Search is certainly the “easier” option. I…"
- Author: /u/Boring-Baker-3716
How to make chat work unattended?
- So, I created an MCP server with around 14 tools. My task is a bit complicated and easily takes 3-4 hours.
- Timestamp: 2026-04-30 00:53 GMT+8
- Community reaction (heuristic-fallback): Top reactions: "Aiohttp timeout variables..increase it. And also adjust reverse proxy settings if needed." Overall sentiment → post: mixed; author: mixed. Reply threads: 2026-04-30 02:13 GMT+8: post=mixed, author=mixed → Aiohttp timeout variables..increase it. And also…
- Author: /u/Upper-Advantage-6156
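The top reply points at two layers: the client timeout (Open WebUI documents an `AIOHTTP_CLIENT_TIMEOUT` environment variable for this) and the reverse proxy. On the proxy side, a sketch of the relevant nginx directives, with the upstream name and durations as placeholders; nginx's read/send timeouts default to 60s, which cuts off a multi-hour tool run:

```nginx
location / {
    proxy_pass         http://open-webui:8080;
    proxy_http_version 1.1;
    proxy_set_header   Upgrade $http_upgrade;
    proxy_set_header   Connection "upgrade";
    proxy_read_timeout 4h;   # idle time allowed waiting for upstream
    proxy_send_timeout 4h;   # idle time allowed sending to upstream
}
```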
Azure Anthropic Opus 4.7 through LiteLLM - how to enable thinking
- Hi, I am using Anthropic models on OWUI routed through Litellm. Reasoning works well for Opus 4.6, but for 4.7, they have changed the settings.
- Timestamp: 2026-04-30 05:45 GMT+8
- Community reaction (heuristic-fallback): Top reactions: "Same for me on google cloud. But honestly at this point i think it's an anthropic issue. Even if i query the endpoint directly using…". Overall sentiment → post: mixed; author: mixed. Reply threads: 2026-04-30 11:17 GMT+8: post=mixed, author=mixed → Same…
- Author: /u/dotanchase
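For context, with Anthropic's extended-thinking API the knob LiteLLM forwards is a `thinking` block; whether Opus 4.7 keeps this shape is exactly what the post is asking. A sketch of a LiteLLM proxy `config.yaml` entry, with the deployment name and token budget as placeholders:

```yaml
model_list:
  - model_name: opus-thinking
    litellm_params:
      model: azure_ai/opus-4-7     # placeholder deployment name
      thinking:
        type: enabled
        budget_tokens: 16000       # reserved reasoning budget
```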
r/selfhosted
Selfhosting Authentication server choice
- Hi all, I’ve been developing a fuel-tracking app for the past 2 years and want to roll out Apple support at the end of June. The last feature currently in development is a login screen with support for logging in via different social media…
- Timestamp: 2026-04-30 04:23 GMT+8
- Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions: "Don’t focus on providing support for some specific provider. Just implement OIDC support and then basically any provider can…"
- Author: /u/iServeCloud
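The "just implement OIDC" advice in the top reply boils down to: the only provider-specific inputs are the discovery values and client id. A stdlib-only sketch of the first leg of the authorization-code flow; every endpoint and identifier below is a placeholder:

```python
from urllib.parse import urlencode
import secrets

def build_auth_url(authorization_endpoint, client_id, redirect_uri):
    """Build a standard OIDC authorization-code request URL.

    The authorization endpoint comes from the provider's discovery
    document (/.well-known/openid-configuration), so the same code
    works for any compliant provider.
    """
    state = secrets.token_urlsafe(16)  # CSRF token; verify on callback
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",
        "state": state,
    }
    return f"{authorization_endpoint}?{urlencode(params)}", state
```

Swapping providers then means swapping one discovery URL and one client registration, not writing a new integration per social network.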
Indexing and OCR solution for Documents that preserves folder structure
- I rather like my folder structures, so any tool that doesn’t preserve them is a no-go for me. Is there a tool that, given a folder structure, just OCRs non-text documents and indexes text documents recursively?
- Timestamp: 2026-04-30 08:59 GMT+8
- Community reaction (heuristic-fallback): The comment section is mostly critical. Top reactions: "I felt the same way, but I started using Paperless and between its filters, views, searching, work flows, and storage paths I don’t…"
- Author: /u/vortexmak
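The structure-preserving behavior asked for here is essentially a mirrored tree walk. A stdlib-only sketch with a stubbed OCR step (swap in a real engine such as Tesseract or Docling); the suffix list and function names are illustrative:

```python
from pathlib import Path
import shutil

TEXT_SUFFIXES = {".txt", ".md", ".csv"}

def ocr(path: Path) -> str:
    # Placeholder: a real implementation would call an OCR engine here.
    return f"[OCR output for {path.name}]"

def index_tree(src: Path, dst: Path) -> list[Path]:
    """Mirror src's folder layout into dst: copy text files through,
    OCR everything else into a sibling .txt at the mirrored path."""
    written = []
    for f in sorted(src.rglob("*")):
        if not f.is_file():
            continue
        out = dst / f.relative_to(src)        # mirror the folder layout
        out.parent.mkdir(parents=True, exist_ok=True)
        if f.suffix.lower() in TEXT_SUFFIXES:
            shutil.copy2(f, out)              # text: index as-is
        else:
            out = out.parent / (out.name + ".txt")
            out.write_text(ocr(f))            # non-text: OCR result
        written.append(out)
    return written
```

Because every output path is derived with `relative_to(src)`, the destination tree is byte-for-byte the same hierarchy as the source.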
Selfhosted an IRC server
- Hi, I’ve been working on setting up a self-hosted IRC server and finally got it running properly. It turned out to be a pretty fun and interesting project.
- Timestamp: 2026-04-30 07:09 GMT+8
- Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions: "UnrealIRCd is awesome. Used to run it on my network. Been a IRC user for many years.. good old days.." Overall sentiment → post:…
- Author: /u/570194
r/ClaudeAI
- No non-pinned/newsworthy posts fetched after filtering.
r/ClaudeCode
Claude Code is expensive for mechanical work. I routed that class of task to DeepSeek via MCP.
- Claude Code sessions cost real money. A significant chunk of that usage is mechanical: classify files, format JSON, turn notes into tables, extract fields, populate templates.
- Timestamp: 2026-04-30 21:54 GMT+8
- Author: /u/petburiraja
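The routing idea reduces to a classifier in front of model selection. A toy sketch; the keyword heuristic and model names are illustrative, not the author's actual MCP setup:

```python
# Tasks that match these hints are "mechanical" and go to the cheap
# model; everything else stays on the expensive reasoning model.
MECHANICAL_HINTS = ("format", "classify", "extract", "convert",
                    "rename", "template", "table", "json")

def pick_model(task: str) -> str:
    t = task.lower()
    if any(hint in t for hint in MECHANICAL_HINTS):
        return "deepseek-chat"   # cheap model for mechanical work
    return "claude-opus"         # reserve the expensive model for reasoning
```

In an MCP setup, this check sits in the tool server: the tool receives the task, picks the backend, and Claude Code only ever pays for the dispatch, not the mechanical tokens.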
r/Codex
- No non-pinned/newsworthy posts fetched after filtering.
Generated 2026-04-30 22:10 GMT+8 | Next update in 2 hours