Newswallah — AI & Tech News

📰

r/MachineLearning Aggregators May 26, 2026

Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation (ICML 2026 Workshops) [R]

Paper: https://arxiv.org/abs/2605.08172
Workshops: AI for Science & Structured Data for Health at ICML 2026
Abstract: Anatomical mesh segmentation requires models that operate directly on irregular surface geometry while remaining robust to arbitrary patient pose and mesh resolution variation. Existing task-specific mesh and point-cloud methods are not equivariant, and can degrade sharply under test-time perturbation, for example dropping by 25-26 IoU points on intraoral scan segmentation at 40° tilt. We present EAMS, an Equivariant Anatomical Mesh Segmentor built on Equivariant Mesh Neural Networks (EMNN), and evaluate it across four clinically distinct tasks spanning edge-, vertex-, and face-level supervision. We combine intrinsic mesh descriptors with anatomy-aware priors, including PCA-derived frames for dental arches and liver surfaces, and augment message passing to provide lightweight global context. Across intracranial aneurysm and intraoral segmentation, EAMS variants are competitive with specialized baselines on unperturbed inputs while remaining stable under geometric perturbations, and on liver surfaces they expose a favorable trade-off between canonical-pose accuracy and rotation robustness. These results show that a lightweight (<2M parameters) equivariant framework can deliver robust anatomical mesh segmentation across diverse supervision types without task-specific architectures.
Hi everyone, I'm excited to share my solo paper "Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation", which has been accepted for poster presentation at the ICML 2026 workshops on AI for Science and Structured Data for Health. The project stemmed from my parallel research on structural encoders for biomolecules, where enforcing roto-translational equivariance is standard. In this work, I wanted to extend those principles directly to various 3D medical meshes. While current anatomical mesh segmentation methods are highly disjoint and anato…
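The abstract mentions PCA-derived frames as an anatomy-aware prior for dental arches and liver surfaces. As a rough illustration of why such a frame helps with pose (not code from the paper), the sketch below assumes a mesh given only as a list of vertex coordinates, finds the first principal axis by power iteration, and projects every vertex onto it. Because the axis moves with the mesh, the resulting coordinate does not change when the mesh is rotated or translated. The helper names and the skewness-based sign rule are illustrative assumptions.

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]

def covariance(points, c):
    # 3x3 covariance of the vertex positions around their centroid.
    n = len(points)
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - c[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov

def dominant_axis(cov, iters=500):
    # Power iteration: converges to the eigenvector with the largest eigenvalue
    # (the first principal axis), assuming it is well separated from the second.
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def canonical_coordinate(points):
    """Project each vertex onto the first PCA axis, with the sign fixed by skewness."""
    c = centroid(points)
    axis = dominant_axis(covariance(points, c))
    proj = [sum((p[i] - c[i]) * axis[i] for i in range(3)) for p in points]
    # An eigenvector is only defined up to sign; flip so the third moment is positive.
    if sum(t ** 3 for t in proj) < 0:
        proj = [-t for t in proj]
    return proj

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.1, 0.0), (5.0, 0.0, 0.2), (0.5, 0.3, 0.0)]
# Rotate 90 degrees about z, then translate: the canonical coordinate is unchanged.
moved = [(-y + 3.0, x - 1.0, z + 2.0) for (x, y, z) in pts]
print(canonical_coordinate(pts))
print(canonical_coordinate(moved))
```

A full PCA frame would also need the second and third axes, and it becomes unstable when leading eigenvalues are nearly equal (a roughly symmetric shape), which this sketch does not handle.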

📰

r/MachineLearning Aggregators May 26, 2026

Tomesphere, 3M paper pages with TLDRs, peer reviews, code, and a SPECTER2 similarity graph [P]

Built a richer paper page for 3 million arXiv and OpenAlex papers. Free, no signup, no paywall: tomesphere.com. Each page has a Gemini-generated TLDR, peer reviews scraped from OpenReview with reviewer scores and decisions, GitHub repos, Hugging Face models and datasets, conference videos, the citation graph from OpenAlex (about 250M edges), and a semantic graph using SPECTER2 (768-D vectors in pgvector) with four ranking modes: Influential, Recent, Hidden gems, Nearest. Connected Papers and Litmaps default to citation overlap. Tomesphere defaults to text vector similarity, so brand-new papers without a citation graph still appear, and topically similar work shows up even without shared citers. A Chrome extension overlays the same data on arXiv abstract and PDF pages. Try a paper you know: tomesphere.com/paper/2312.00752 (Mamba), tomesphere.com/paper/1706.03762 (Attention), tomesphere.com/paper/2305.14314 (QLoRA). Open to feedback.
submitted by /u/RegretAgreeable4859
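Defaulting to text vector similarity means ranking by how close two papers' embeddings are, which works even for a paper with zero citations. A minimal sketch of that ranking, assuming toy 3-D vectors in place of 768-D SPECTER2 embeddings and hypothetical paper ids; this is not Tomesphere's code, and in pgvector the same ordering is usually expressed with the cosine-distance operator `<=>`:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, corpus, k=2):
    """Rank papers by cosine similarity of their embeddings to the query embedding."""
    ranked = sorted(corpus, key=lambda pid: cosine(query, corpus[pid]), reverse=True)
    return ranked[:k]

# Toy vectors standing in for SPECTER2 embeddings (hypothetical values and ids).
papers = {
    "A": [1.0, 0.1, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.5, 0.5, 0.0],
}
print(nearest([1.0, 0.0, 0.0], papers))  # ['A', 'C']
```

No citation data is consulted, which is why a brand-new paper can still show up as a neighbor.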

📰

r/MachineLearning Aggregators May 26, 2026

Verbosity is not faithfulness: an architectural argument that reasoning models cannot perform faithful inference [D]

Essay argues that reasoning models cannot perform faithful inference because their reasoning trace and final answer come from the same operation. Engages with Lanham/Turpin/Mirzadeh in empirical critique, and with HRM, TRM, GRAM, AlphaProof, and Kona/Aleph as the contrasting architectural lineage. Curious what this subreddit makes of the constraint-vs-influence framing. https://mauhaq.substack.com/p/verbosity-is-not-faithfulness
submitted by /u/Sensitive_Air_5745

📰

r/MachineLearning Aggregators May 26, 2026

[P] Have a couple of technical questions about my LLM router [P]

I am a CS undergrad and I think token economics is the next big problem for companies. I am building an LLM router specifically for code and codebases. The routing is not done by a heavily fine-tuned LLM (existing solutions already do this); I'm using a somewhat different approach. I gauge complexity by measuring the interaction between signals that can be cheaply extracted from the prompt. One of these signals is what I like to call blooms_intent, based on Bloom's taxonomy, a framework for categorizing educational goals. A query like "What is this?" falls under the Remember category, whereas "Implement this" falls under Create.
Questions:
- How do I find datasets for this purpose? Is bootstrapping datasets using AI fine for this?
- Should I keep doing centroid-based classification? That's what I've been doing so far, but for ambiguous queries the confidence difference between categories is far too small.
- What dataset size and classifier can somewhat reliably differentiate nuances between queries?
You may ask why not use AI for these questions. I have, and that's why I've come here. Please let me know your thoughts, and thanks in advance!
submitted by /u/getridofaks
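One common way to deal with categories whose scores are too close is to make the margin explicit: pick the nearest centroid only when it beats the runner-up by some threshold, and otherwise flag the query as ambiguous so the router can fall back to a safer route or a stronger classifier. A minimal sketch, assuming hypothetical 2-D "embeddings" and an arbitrary `min_margin` of 0.05 that would need tuning on real data:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify_intent(vec, centroids, min_margin=0.05):
    """Nearest-centroid label, or None when the top two labels are too close to call."""
    scored = sorted(((cosine(vec, c), label) for label, c in centroids.items()), reverse=True)
    (best, label), (runner_up, _) = scored[0], scored[1]
    if best - runner_up < min_margin:
        return None  # ambiguous: hand off to a stronger classifier or a safe default route
    return label

# Hypothetical centroids for two Bloom's levels.
centroids = {"remember": [1.0, 0.0], "create": [0.0, 1.0]}
print(classify_intent([0.9, 0.1], centroids))  # remember
print(classify_intent([1.0, 1.0], centroids))  # None (exact tie)
```

Logging the queries that return `None` also gives a natural pool of hard examples to label when building a dataset.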

📰

r/MachineLearning Aggregators May 26, 2026

Added a Chrome Dino-style game to my research tool's pipeline wait screen driven by real SSE events [P]

Slightly unhinged engineering decision, but it works. My tool (ScholarScout) has a 2-3 minute pipeline: fetch papers from 8 databases → analyze trends → generate ideas. During that time, the user sees a pixel-art owl running through a parallax forest. The fun part: it's not a fake animation. Each paper dot that spawns in the game corresponds to a real paper_found SSE event from the backend. Papers drip-feed from a queue at 600ms intervals (even if the fetch returned 30 papers at once). Colors indicate the source (white = arXiv, green = PubMed, purple = Crossref). When the pipeline finishes, the owl celebrates. Tech: vanilla JS canvas, 32x32 sprite sheet (12 frames), requestAnimationFrame loop, image-rendering: pixelated. No dependencies. Here's the demo video: ScholarScout v1.5.3 - Demo
Actually useful changes in the same release:
- Review Mode: paper clustering (k-means on embeddings, Jaccard fallback) + per-cluster synthesis + cross-cutting analysis
- Paper freshness: _used_count per paper in cache, least-used prioritized, auto-widen date range on exhaustion
- All thresholds externalized to config.yaml
github.com/neej4/ScholarScout or ScholarScout - Papers in. Ideas out.
submitted by /u/neeejaaa0
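The drip-feed spreads a burst of SSE events out so each paper gets its own spawn moment. The timing rule can be sketched as a pure function; this is an illustration in Python rather than the tool's JavaScript, and it assumes the first event is shown immediately and later ones no sooner than 600ms after the previous one:

```python
def drip_schedule(arrivals_ms, interval_ms=600):
    """Given event arrival times (ms, ascending), return when each one is shown.

    Events are shown in arrival order, never before they arrive and at least
    interval_ms after the previous one, so a burst is spread out evenly.
    """
    shown = []
    next_free = None
    for t in arrivals_ms:
        slot = t if next_free is None else max(t, next_free)
        shown.append(slot)
        next_free = slot + interval_ms
    return shown

# 30 papers returned by a single fetch at t=0 are revealed over ~17.4 seconds.
burst = drip_schedule([0] * 30)
print(burst[:3], burst[-1])  # [0, 600, 1200] 17400
```

When events arrive more slowly than the interval, the queue is effectively a pass-through and each paper appears as soon as it arrives.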

📰

r/MachineLearning Aggregators May 26, 2026

[D] Dlib or PyTorch for training a CNN? [D]

I'm currently studying ML, more specifically convolutional neural networks (CNNs) for finding patterns in images. Right now, I'm trying to develop a model that can solve the "Where's Waldo?" challenge. My question: which is the better option for training a CNN model, PyTorch or Dlib? At the moment, I have an AMD RX580. Since Dlib's GPU acceleration only supports CUDA, I would need to use Google Colab. I'm still learning about this field, so if I said something incorrect or if you have any tips on how to approach this project, I'd be very happy to hear them. 😄
submitted by /u/TearsInTokio

📰

r/MachineLearning Aggregators May 26, 2026

[P] Built a portable GPU ISA after reading too many architecture manuals [P]

I've been reading GPU architecture docs in my free time: NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it. It's called WAVE. You write a kernel once, it compiles to a portable binary, and thin backends translate it to Metal, PTX, HIP, or SYCL. The same binary has been verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built PyTorch integration and got identical training results across all backends.
Please star on GitHub: https://github.com/Oabraham1/wave
Preprint: https://arxiv.org/abs/2603.28793
Read the full docs and how I built everything: https://wave.ojima.me
pip install wave-gpu
submitted by /u/not-your-typical-cs

📰

r/MachineLearning Aggregators May 26, 2026

[P] I built a system that lets you ask questions about any GitHub repo and get answers grounded in the actual source code [P]

Hi guys, I've been working on GitRAG: paste any public GitHub URL and ask it anything about the codebase. It answers with exact file paths and line numbers, no hallucination. How it works under the hood:
- Clones the repo and splits files into semantic chunks using AST-aware parsing (not just line splits)
- Builds a hybrid index: dense embeddings + BM25 keyword index
- At query time, fuses both signals with Reciprocal Rank Fusion, then runs Cohere reranking to cut 20 candidates down to 5
- Sends those 5 chunks to Groq's llama-3.3-70b, which generates a grounded answer
The retrieval pipeline is what I'm most proud of: the BM25 + semantic fusion catches things that pure vector search misses (exact function names, error codes, etc.).
Stack: FastAPI · ChromaDB · text-embedding-3-small · Cohere rerank-v3.5 · Groq llama-3.3-70b · React + Vite
Supports 15+ languages: Python, JS/TS, C#, Java, Go, Rust, C/C++, Swift, Kotlin, Dart, Ruby, PHP, Vue, Svelte, Shell...
Curious what repos people try it on. Drop your results below 👇
submitted by /u/Professional-Pie6704
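Reciprocal Rank Fusion combines the dense and BM25 result lists using only ranks: each chunk scores the sum of 1 / (k + rank) over the lists it appears in, so a chunk ranked well by both retrievers rises to the top. A minimal sketch with hypothetical chunk ids; k = 60 is the constant commonly used with RRF, and the post does not say which value GitRAG uses:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

dense = ["a", "b", "c"]  # hypothetical chunk ids from vector search
bm25 = ["c", "a", "d"]   # hypothetical chunk ids from the BM25 index
print(reciprocal_rank_fusion([dense, bm25]))  # ['a', 'c', 'b', 'd']
```

Because only ranks are used, the two retrievers' raw scores (cosine similarity vs. BM25) never need to be put on a common scale; the top of the fused list is what would then go to the reranker.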

📰

r/MachineLearning Aggregators May 26, 2026

What valuable professional data is completely locked away from AI companies? [D]

Hi all, apologies beforehand if this is the wrong subreddit; let me know if you think there are better subreddits for this post. I'm working on a project around proprietary data licensing for AI training, and I'm trying to identify data types that are genuinely inaccessible to AI labs, not because the data doesn't exist, but because no one has figured out how to unlock it. Specifically, I'm looking for data that is:
• Created by domain experts as part of their daily work
• Never published or shared outside the organization
• Rich in human reasoning, not just structured outputs
Finance is my background, so I'm especially curious about examples there, but all industries are welcome. What's the most valuable "locked" professional data you've come across in your field, and who (if you know) owns the rights to it?
submitted by /u/Manny_in_iceage

📰

r/MachineLearning Aggregators May 26, 2026

[D] Where do you go for serious AI research discussion online? [D]

Looking for communities where people actually dig into ML/AI research, not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you can post something like "I'm seeing X behaviour in my SSL training, here's the loss curve, anyone seen this before?" and get thoughtful replies instead of generic advice.
submitted by /u/Possible-Active-1903

📰

r/MachineLearning Aggregators May 26, 2026

Already 11,000 submissions for EMNLP? [D]

Is this normal? I looked it up, and last year there were only 8,000.
submitted by /u/NightCR_

📰

r/MachineLearning Aggregators May 25, 2026

Aiki: my local Wikipedia Retrieval-Augmented Generation system [R]

Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally. What it does:
- Downloads and chunks Wikipedia articles (you can choose articles by name, with the option of also downloading similar topics)
- Uses a custom TF-IDF + cosine similarity retriever (built from scratch)
- Supports query expansion using Wikipedia links/redirects
- Optional answer generation with an LLM
It has very minimal dependencies and runs completely locally. Repo: https://github.com/yacine204/Aiki
I'd really appreciate your feedback.
submitted by /u/Just_Jaguar3701
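For readers unfamiliar with the retriever described above, a from-scratch TF-IDF + cosine similarity retriever fits in a few dozen lines. This is a generic textbook sketch, not Aiki's code: the tokenizer, the smoothed IDF formula (one common variant), and the toy chunks are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class TfidfRetriever:
    def __init__(self, chunks):
        self.chunks = chunks
        tokenized = [tokenize(c) for c in chunks]
        n = len(chunks)
        # Document frequency: in how many chunks each term appears.
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF (one common variant); rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, tokens):
        tf = Counter(tokens)
        # Terms unseen at index time get weight 0 and so cannot match anything.
        vec = {t: count * self.idf.get(t, 0.0) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def search(self, query, k=3):
        q = self._vectorize(tokenize(query))
        # Vectors are L2-normalized, so the dot product equals cosine similarity.
        scored = [(sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
                  for i, vec in enumerate(self.vectors)]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [self.chunks[i] for score, i in scored[:k] if score > 0]

docs = [
    "The Eiffel Tower is in Paris",
    "Python is a programming language",
    "Paris is the capital of France",
]
retriever = TfidfRetriever(docs)
print(retriever.search("capital of France"))  # ['Paris is the capital of France']
```

Query expansion via links and redirects would then amount to adding extra terms to the query before calling `search`.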

📰

r/MachineLearning Aggregators May 25, 2026

The famous METR AI time horizons graph contains numerous severe errors [D]

Nathan Witkin, a research writer at NYU Stern's Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer:
"It is impossible to draw meaningful conclusions from METR’s Long Tasks benchmark — in particular once one realizes that its numerous flaws are probably compounding in unpredictable ways. The appropriate response to a study of this kind is not to assume it can be saved via back-of-the-envelope adjustments, or to comfort oneself that other anecdotal evidence implies that it is probably correct anyway. It is to cut one’s losses and move on in search of higher-quality information. … The METR graph cannot be saved. For all its sleekness and complexity, it contains far too many compounding errors to excuse. Among them is generalizing to the entire species data collected from a small group of the authors’ peers. Coming up with ever more dramatic ways to make this mistake has become a kind of sport among AI researchers. If the field has a central pathology, it is to aggressively overindex on a mix of anecdotal data from power-users, alongside a long list of benchmarks even more compromised than METR’s. One hopes that as the field matures, its participants will learn to stop making these mistakes."
The errors include:
- Some of the human baseline data is not actually measured or collected from any empirical source; rather, it is just guesstimated by the authors.
- A key variable in the data is how long it takes humans to complete certain tasks, but when METR did actually measure this, it paid its human benchmarkers hourly, meaning they were incentivized with cash to take longer.
- The sample of human benchmarkers was biased toward METR employees' friends, acquaintances, and former colleagues (who are likely unrepresentative and possibly biased).
- Humans familiar with a codebase and a specific coding task were 5-18x faster at completing it, but METR used data from humans who were much slower because they had t…

📰

r/MachineLearning Aggregators May 25, 2026

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

Just thought I'd share: I ran a DCGAN on a dual-core RISC-V microcontroller, the CH32H417, generating 64x64 cat faces. This is a new RISC-V MCU, so there's no TFLite, no CMSIS-NN, and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs. The model has 12.6M parameters with per-channel int8 quantization. Intermediate activations are stored in DTCM, and layer weights stream from the SD card using double buffering, so the next layer loads while the current one computes. The total available SRAM is 512KB, shared between both cores and the inference engine. Generating one image takes 26 seconds; it could be faster, but SD card access speed, not computation, is the bottleneck. The z vector is seeded from 200 bytes of quantum random data (ANU QRNG vacuum fluctuation source), transformed via Box-Muller into the latent vector. This is not strictly necessary for image quality, but it was a fun constraint for the art installation side of the project. The generated cat is classified as "motivated" or "demotivated" based on a single quantum bit, which selects from a phrase bank with four fragment slots combining into one of 131,072 possible spoken verdicts, output through the onboard DAC... As far as I can tell, nobody else is running GAN inference on these low-cost RISC-V microcontrollers. ARM has the CMSIS-NN ecosystem for this kind of thing, but RISC-V MCUs, especially in the CH32 space, have nothing, so the entire inference engine is written from scratch.
Paper: TinyGAN: Generative Image Synthesis on a RISC-V Microcontroller with Quantum Entropy Sampling
submitted by /u/Separate-Choice
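The Box-Muller step turns pairs of uniform random numbers into pairs of independent standard normals, which is what a GAN latent vector expects. The post does not say how the 200 bytes become uniforms or how large the latent is, so this Python sketch assumes big-endian 16-bit uniforms and a 100-dimensional latent (a common DCGAN choice, and exactly what 200 bytes yield under this mapping); the MCU version would of course be C.

```python
import math
import os

def bytes_to_latent(entropy: bytes):
    """Turn raw random bytes into standard-normal latent values via Box-Muller.

    Each 4-byte group gives two 16-bit uniforms (u1, u2) and hence two normals,
    so 200 bytes yield a 100-dimensional latent vector.
    """
    if len(entropy) % 4:
        raise ValueError("entropy length must be a multiple of 4")
    z = []
    for i in range(0, len(entropy), 4):
        a = int.from_bytes(entropy[i:i + 2], "big")
        b = int.from_bytes(entropy[i + 2:i + 4], "big")
        # Map 0..65535 into the open interval (0, 1) so log(u1) is always defined.
        u1 = (a + 0.5) / 65536.0
        u2 = (b + 0.5) / 65536.0
        r = math.sqrt(-2.0 * math.log(u1))
        z.append(r * math.cos(2.0 * math.pi * u2))
        z.append(r * math.sin(2.0 * math.pi * u2))
    return z

# os.urandom stands in here for the 200 bytes fetched from the ANU QRNG source.
latent = bytes_to_latent(os.urandom(200))
print(len(latent))  # 100
```

The half-step offset in the uniform mapping is a design choice to avoid `log(0)`; with only 16 bits per uniform, the largest possible magnitude is bounded (about 4.85), which is harmless for a latent vector.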

📰

r/MachineLearning Aggregators May 25, 2026

Is AI inference platform really that saturated now? [D]

I'm thinking of expanding an on-device inference SDK into a full-blown AI inference platform, and I'm seeing more and more inference platforms popping up. I've been talking with a VC from Seattle/NY. Is this space really that saturated?
submitted by /u/kampak212

📰

r/MachineLearning Aggregators May 25, 2026

Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

I've been thinking about a problem in current agent systems: most agents are becoming very good at execution, but the decision layer before execution is still unclear. Coding agents, research agents, tool loops, sandboxes, workflows, and harnesses are all improving quickly. Once a human gives an intent, agents can often do a lot of useful work. But the higher-level question is still usually left to the user: what should happen next, and why? I've been exploring this idea through an open-source project called Spice. The simplest way to describe it: Spice is a decision layer above agents. It is not trying to replace execution agents. Tools like Claude Code, Codex, Hermes, or other agents can still do the actual work. Instead, Spice sits before execution and tries to make the decision process explicit:
- what was observed
- what options were considered
- why one option was selected
- what trade-offs were rejected
- whether execution needs approval
- what happened afterward
- how that outcome should affect the next decision
The current runtime is still early, but it can already be installed, configured with an LLM provider, and run in the terminal; you can inspect Decision Cards and hand off approved execution to external agents. The goal is to make agent behavior less of a black box. Instead of only seeing the final result of an agent task, I want to preserve the reasoning boundary before execution: what the system believed, what it chose, why it chose it, and what changed after the action. GitHub: https://github.com/Dyalwayshappy/Spice
I'd love feedback from people building agents. Feel free to fork, star the repo, or share any feedback and ideas. Would love to build this together with the community.
submitted by /u/Alarming_Rou_3841
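The list of things Spice tries to make explicit maps naturally onto a record type. The sketch below is a hypothetical shape for such a "decision card", built only from the fields named in the post; it is not Spice's actual schema or API, and the example values are invented. It also shows the approval gate as a simple check before hand-off.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DecisionCard:
    """Hypothetical record of one decision made before any agent executes."""
    observed: List[str]               # what was observed
    options: List[str]                # what options were considered
    selected: str                     # which option was chosen
    rationale: str                    # why it was chosen
    rejected_tradeoffs: List[str] = field(default_factory=list)
    needs_approval: bool = False
    approved: bool = False
    outcome: Optional[str] = None     # filled in after execution

    def ready_to_execute(self) -> bool:
        # Only hand off a chosen option that was actually considered,
        # and only once any required approval has been given.
        return self.selected in self.options and (self.approved or not self.needs_approval)

card = DecisionCard(
    observed=["CI is failing on main"],
    options=["revert last commit", "patch the failing test"],
    selected="revert last commit",
    rationale="restores a green build fastest",
    rejected_tradeoffs=["patching first keeps the feature but leaves main red longer"],
    needs_approval=True,
)
print(card.ready_to_execute())  # False until a human approves
card.approved = True
print(card.ready_to_execute())  # True
```

Recording `outcome` after the external agent finishes is what would let the next decision take the previous result into account.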

r/MachineLearning Aggregators May 25, 2026

Delta Attention Residuals [R]

We're excited to release Delta Attention Residuals, a drop-in upgrade to residual connections that learns which past layers to route from, without the routing collapse that breaks prior cross-layer attention at scale. 🚀
Attention Residuals route over cumulative hidden states, but those are highly redundant, so routing collapses to near-uniform (max weight ~0.2) in deep layers. Delta Attention Residuals route over deltas (vᵢ = hᵢ₊₁ − hᵢ), i.e. what each sublayer actually contributed, and natively enable:
⚡ 1.8× sharper cross-layer routing: Deltas are structurally diverse, lifting max attention weight from ~0.2 → ~0.6 (0.62 vs 0.35 avg) and curing routing collapse in deep layers.
📉 −8.2% validation PPL at 7.6B: Consistent gains from 220M → 7.6B (1.7–8.2% lower PPL), beating both standard residuals and Attention Residuals; the latter actually degrades below baseline at scale (18.58 vs 17.43).
🔌 Drop-in fine-tuning of pretrained models: Additive, zero-init routing is the identity at initialization, so you can convert pretrained checkpoints (e.g. Qwen3-0.6B) into Delta Attention Residuals via standard fine-tuning, beating the original on 8 downstream benchmarks (55.6 vs 55.0).
🪶 ≤0.01% parameter overhead: The Delta Block adds just 589K params (0.008% at 8B) and ~3% memory, and runs faster and lighter than Attention Residuals (14.0k vs 12.5k tok/s, 42.7 vs 44.0 GB).
💻 Code: https://github.com/wdlctc/delta-attention-residuals-code
💻 Paper: https://arxiv.org/abs/2605.18855
submitted by /u/Mediocre-Ad5059
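To make "route over deltas" and "zero-init routing is the identity at initialization" concrete, here is a toy pure-Python sketch operating on plain vectors. It is hypothetical: the post does not specify where the routing query comes from, how scores are scaled, or how the additive term is parameterized, so the scaled dot-product scores and the single scalar `gate` below are assumptions; the linked code is the authoritative version.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def delta_attention_residual(hidden, query, gate=0.0):
    """Hypothetical sketch: route over per-sublayer deltas instead of raw states.

    hidden: hidden states [h_0, ..., h_L] (lists of floats); query: routing query.
    Returns (output, weights) where output = h_L + gate * sum_i a_i * v_i.
    """
    # v_i = h_{i+1} - h_i: what each sublayer added to the residual stream.
    deltas = [[b - a for a, b in zip(h_prev, h_next)]
              for h_prev, h_next in zip(hidden, hidden[1:])]
    # Scaled dot-product scores between the query and each delta.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * v for q, v in zip(query, d)) for d in deltas])
    routed = [sum(w * d[j] for w, d in zip(weights, deltas)) for j in range(len(query))]
    # Additive term with a zero-initialized gate: the block starts as the identity.
    output = [h + gate * r for h, r in zip(hidden[-1], routed)]
    return output, weights

hidden = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
out, w = delta_attention_residual(hidden, query=[0.0, 1.0])
print(out)  # [1.0, 2.0] (gate=0, so the output is just h_L)
```

With `gate` at zero the block returns the last hidden state unchanged, which is why a pretrained checkpoint's behavior is preserved at the start of fine-tuning; training then moves the gate away from zero.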

📰

r/MachineLearning Aggregators May 25, 2026

Anyone heard from ICML about Oral decisions yet? [D]

Hi all, my paper received a spotlight at ICML. They told us we would receive decisions on whether our paper would get an oral by the end of the month, with the implication that we wouldn't receive a notification if we didn't get one. I was just wondering if anyone has received that notification, so I can know for sure that I didn't get it. Thanks!
submitted by /u/billjames1685

📰

r/MachineLearning Aggregators May 25, 2026

I’m building an open-source decision layer above AI agents [P]

Hi everyone, I'm Jia, the creator of Spice. I've been working on an open-source project called Spice. The simplest way to describe it: Spice is a decision layer above agents. Most agent systems today are very focused on execution. They are getting better at doing tasks after a human gives them an intent. But the higher-level question is still usually left to the user: what should happen next, and why? That is the layer I want Spice to explore. Spice is not trying to replace execution agents. Tools like Claude Code, Codex, Hermes, or other agents can still do the actual work. Instead, Spice sits before execution and tries to make the decision process explicit:
- what was observed
- what options were considered
- why one option was selected
- what trade-offs were rejected
- what happened afterward
- how that outcome should affect the next decision
The current runtime is still early, but you can already install it, set up an LLM provider, run it in the terminal, inspect Decision Cards, and hand off approved execution to external agents. My goal is to make agent behavior less of a black box. Instead of only seeing the final result of an agent task, I want to preserve the reasoning boundary before execution: what the system believed, what it chose, why it chose it, and what changed after the action. GitHub: https://github.com/Dyalwayshappy/Spice
I'd love feedback from people building agents. Thank you guys.
submitted by /u/Alarming_Rou_3841

📰

r/MachineLearning Aggregators May 25, 2026

Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

🌟 Announcing the 2nd Workshop on Efficient Reasoning (ER) at @colm2026 on Oct 9!
📣 We welcome submissions! Submit your work here: https://openreview.net/group?id=colmweb.org/COLM/2026/Workshop/Efficient_Reasoning
🗓️ Deadline: July 12, 2026 (AoE)
🔗 Website: https://wdlctc.github.io/efficient-reasoning-2026/
💬 Topics include (but aren't limited to):
🔹 Multimodal, spatial & embodied reasoning under efficiency constraints
🔹 Curating high-quality reasoning datasets under resource constraints
🔹 Algorithmic innovations for efficient training & RL fine-tuning
🔹 Fast inference: pruning, compression, progressive generation, KV-cache tricks
🔹 Benchmarks & theory on time-/space-complexity and faithfulness
🔹 Systems to deploy long-CoT or on-device reasoning in the wild
🔹 Safety & robustness of efficient reasoning pipelines
🔹 Real-time applications in healthcare, robotics, autonomy, and more
🤝 We invite perspectives from ML, systems, natural & social sciences, and industry practitioners to rethink reasoning under tight compute, memory, latency, and cost budgets.
Hope to see you there! 🚀
submitted by /u/Mediocre-Ad5059

📰

r/MachineLearning Aggregators May 25, 2026

Best architecture for seamless Bilingual TTS? (Azure / English + Korean) [D]

Hi guys, I'm building a language learning app (React Native/Expo frontend, Python backend), and I've hit a frustrating wall with text-to-speech. I need the app to read sentences that mix English instructions and Korean examples (e.g., "To say hello, we use the phrase 안녕하세요."). Since native pronunciation is critical for a learning app, I'm struggling to find a solution that sounds natural. I'm currently using Azure Cognitive Services, and I'm stuck between two bad options:
Approach 1: The multilingual voice (en-US-AvaMultilingualNeural)
The good: Seamless reading, zero pauses mid-sentence.
The bad: Because it's an English-first model, the Korean comes out with a slight robotic/Americanized accent. It doesn't sound like a true native speaker, which defeats the purpose of teaching pronunciation. There is also some scratchiness and a lack of smoothness when it reads Korean words.
Approach 2: SSML voice switching (Ava for EN, SunHi for KO)
The good: Perfect English, perfect native Korean.
The bad: Switching <voice> tags mid-sentence causes Azure to pause for a fraction of a second while it unloads/loads the neural models. It completely ruins the natural flow of the audio, making it sound very disjointed.
My questions:
- Is there an SSML trick in Azure to pre-load voices or eliminate that micro-pause when switching voices?
- How do the big apps handle this? If I use two models for Korean and English, they will sound different when reading.
- Should I migrate away from standard Azure Speech and use the Azure OpenAI voices (alloy, nova) instead? Are they truly seamless for bilingual text?
Any advice on the best tech stack or architecture for this would be massively appreciated!
submitted by /u/Lumpy-Simple9185
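Whatever voice setup wins, the backend first has to split a mixed sentence into English and Korean runs. The sketch below shows that step plus the Approach 2 markup it produces; it does not remove the pause described above. It assumes the Azure voice names `en-US-AvaNeural` and `ko-KR-SunHiNeural`, and the rule that spaces and punctuation stay with the current run is an illustrative choice.

```python
from xml.sax.saxutils import escape

# Assumed Azure neural voice names for the two languages.
VOICES = {"en": "en-US-AvaNeural", "ko": "ko-KR-SunHiNeural"}

def is_hangul(ch):
    cp = ord(ch)
    # Hangul Syllables, Hangul Jamo, and Hangul Compatibility Jamo blocks.
    return 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F

def split_by_script(text):
    """Split text into (lang, segment) runs; spaces/punctuation join the current run."""
    runs = []
    for ch in text:
        if is_hangul(ch):
            lang = "ko"
        elif ch.isalpha():
            lang = "en"
        else:
            lang = runs[-1][0] if runs else "en"
        if runs and runs[-1][0] == lang:
            runs[-1][1] += ch
        else:
            runs.append([lang, ch])
    return [(lang, seg) for lang, seg in runs]

def to_ssml(text):
    voices = "".join(
        f'<voice name="{VOICES[lang]}">{escape(seg)}</voice>'
        for lang, seg in split_by_script(text)
    )
    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="en-US">{voices}</speak>')

sentence = "To say hello, we use the phrase 안녕하세요."
print(split_by_script(sentence))
# [('en', 'To say hello, we use the phrase '), ('ko', '안녕하세요.')]
print(to_ssml(sentence))
```

Keeping the segmentation separate from the markup means the same runs can feed whichever per-language markup or per-segment synthesis call the chosen voice setup ends up needing.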

📰

r/MachineLearning Aggregators May 25, 2026

Are ICML workshops worth attending? [D]

Hi! I missed out on a main conference ticket for ICML 2026 because my workshop paper was only accepted two days ago. Do you think it is worth attending just the workshops at an A*-tier conference like this (given the overseas travel costs, etc.)? I was really looking forward to attending both, including the talks, poster sessions, and company booths. I come from an adjacent field and have therefore had quite a few conference experiences. Any insights from past experience are very welcome. Thank you!
submitted by /u/dreameroutloud

📰

r/MachineLearning Aggregators May 25, 2026

Using large language models [R]

Can LLMs be used to come up with a worthwhile research topic? Has anyone had good results coming up with solid research ideas by chatting with an LLM? For example, using Claude to review existing work and define the research topic. Thanks!
submitted by /u/Lonely-Highlight-447

📰

r/MachineLearning Aggregators May 25, 2026

Call for Papers - Workshop on Unlearning and Model Editing U&ME at ECCV 2026 [R]

I have been seeing a lot of really interesting work lately around unlearning, model editing, controllability, safety, etc. It feels like this space is moving very fast right now, and there are still so many open questions. This year I'm helping organize the U&ME workshop at ECCV 2026, and honestly I'd really love to see submissions from people in the community, especially students and researchers who are exploring new ideas, even if the work is still evolving. A lot of the best workshop conversations come from unfinished ideas, weird observations, failed directions that taught something useful, or work that doesn't neatly fit into a main conference paper. So if you've been working on anything around:
- Unlearning
- Model Stitching and Editing
- Model Merging and "MoErging" (Mixture of Experts Merging)
- Model compression
- Efficient domain adaptation
- Multi-domain/cross-domain U&ME
- Online/lifelong learning, unlearning, and model editing
- Responsible U&ME (e.g., robustness, ethics and fairness, resource efficiency, privacy, and regulatory compliance)
- Applications in computer vision
please consider submitting :) It would be really nice to bring together people thinking deeply about these problems at ECCV 2026.
submitted by /u/Mushroom-Severe

📰

r/MachineLearning Aggregators May 25, 2026

If you use NVIDIA Isaac Sim for reinforcement learning, do you use Isaac Lab with it? Just want to get a sense of what the status quo is. [D]

The reason for this query is that I am in the process of shifting to Isaac Sim / Isaac Lab, since that is what seems to be in use nowadays. However, Isaac Lab is proving somewhat difficult to handle. While it handles logging and the creation of multi-actor systems for algorithms like PPO beautifully (with, say, hundreds of actors), its documentation leaves much to be desired. I am also concerned about how easy it is to set up new robotic environments, actions, rewards, policies, and possibly even custom algorithms. So, what do you do at your lab? In my mind there's a trade-off. On the one hand, I can use the Isaac Lab scaffolding, but I run into its idiosyncrasies very frequently until I have documented everything I need. On the other hand, I can interface directly with Isaac Sim, but then I need to write my own handlers for interfacing Isaac Sim with the RL agent. submitted by /u/StayingUp4AFeeling

📰

r/MachineLearning Aggregators May 25, 2026

Sponsio: Deterministic Contract Layer for LLM Agents [P]

We've been trying to put LangGraph agents into production for a while. The thing that kept biting us was tool-call boundary enforcement: rules like "must call X before Y", "max N retries", and "approval gate before destructive action". They worked fine in demos and broke at the moments that mattered.

What we tried first:

- Prompt engineering. We told the model "always call check_policy before issue_refund". It worked ~95% of the time. The 5% that didn't were exactly the cases an auditor would ask about, which is not a great answer when someone wants to know why a refund went through.
- Post-hoc audit (OTEL + logs). This caught violations after the fact, but by then the side effect had already happened. Refunding the refund is awkward.
- Pulling everything into a workflow engine (Temporal, or more recently nano-vm). Strong guarantees, but you rewrite the agent against their runtime. Too much for our use case.

What we ended up with: a contract layer at the tool boundary. YAML rules, deterministic evaluation, run before the tool call commits. We open-sourced it as Sponsio. Repo: github.com/SponsioLabs/Sponsio

Would love feedback from anyone running agents in prod. submitted by /u/johnnaliu
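To make "deterministic eval before the tool call commits" concrete, here is a minimal Go sketch of a rule check at the tool boundary. It is not Sponsio's API or rule format (the post describes YAML rules, which are not reproduced here); the types, fields, and the check_policy/issue_refund tool names are illustrative assumptions. Each rule is a plain lookup over the run's call history, so the same history always produces the same decision, and a rejected call never executes.

```go
package main

import "fmt"

// ToolCall is a hypothetical record of one tool invocation an agent wants to make.
type ToolCall struct {
	Name     string
	Approved bool // assumed to be set by a separate approval step
}

// Contract holds illustrative deterministic rules checked before a call commits.
type Contract struct {
	MustPrecede   map[string]string // tool -> tool that must have been allowed earlier
	MaxCalls      map[string]int    // tool -> maximum number of allowed attempts
	NeedsApproval map[string]bool   // tools treated as destructive
}

// Guard enforces a Contract over the call history of a single agent run.
type Guard struct {
	contract Contract
	allowed  map[string]int // number of calls per tool that passed the check
}

// NewGuard creates a Guard with an empty call history.
func NewGuard(c Contract) *Guard {
	return &Guard{contract: c, allowed: make(map[string]int)}
}

// Check rejects the call if it would violate the contract; otherwise it
// records the call as allowed and returns nil.
func (g *Guard) Check(call ToolCall) error {
	if pre, ok := g.contract.MustPrecede[call.Name]; ok && g.allowed[pre] == 0 {
		return fmt.Errorf("%s requires %s to be called first", call.Name, pre)
	}
	if limit, ok := g.contract.MaxCalls[call.Name]; ok && g.allowed[call.Name] >= limit {
		return fmt.Errorf("%s exceeded the limit of %d calls", call.Name, limit)
	}
	if g.contract.NeedsApproval[call.Name] && !call.Approved {
		return fmt.Errorf("%s needs approval before it can run", call.Name)
	}
	g.allowed[call.Name]++
	return nil
}
```

In this sketch a prerequisite counts as satisfied once it has been allowed, not once it has succeeded; a production layer would likely want to record tool results too.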

📰

r/MachineLearning Aggregators May 25, 2026

Please help with TensorDock [D]

Does anyone have any idea what I should do? This is my email to TensorDock. I developed corporate GPU benchmarking software, so I need a cloud PC that can benchmark RTX 5090 and RTX 4090 consumer cards. Yesterday it worked absolutely amazingly for six hours on the RTX 4090, with full desktop-PC performance in the cloud. But… I'm really, really upset here. I've been trying to deploy servers for two days now. I created one server successfully with an RTX 4090, and it worked great for a few hours. As soon as I stopped it and tried to turn it on again, I couldn't get an RTX GPU on the node, and that has been the case for the last 10 hours. So I can't even start a PC that I spent all day setting up yesterday. To get another cloud PC to work on, I tried to start 4 more separate deployments today, and none of them could initialize another RTX 4090; the desktop always fails once it is deployed, so I have to keep deleting the VM. I then tried three different node locations to see if that fixes it, and I still cannot acquire another RTX 4090, even though every location lists them as available. It always fails during deployment. This has been a nightmare. I've been trying to reach customer service for two days straight, and nobody gets back to me. I also have an RTX 5090 set up that I cannot ping or access, and I had it running for a day for $10. It's not working. Ideally, I would like that RTX 5090 as my monthly always-on cloud PC, but it's not working right now. I would also like the RTX 4090 setup I currently have to work again and find an available GPU on the node, because I built a perfect Windows image on it with all my data and I can't even use it. I spent all day yesterday building that Windows image. I stopped it for a few hours to save some money, went to turn it back on, and now I can't use it. It won't activate. submitted by /u/testing012367

📰

r/MachineLearning Aggregators May 24, 2026

"AI solved one of math's greatest challenges, but it cannot add two numbers reliably?!" [D]

Suppose your friend, a mathematician, woke up from a 5-year coma. How would you explain this to him? Do we even have an explanation other than "it is what it is"? submitted by /u/we_are_mammals

📰

r/MachineLearning Aggregators May 24, 2026

MergeNB: An intuitive merge conflict resolver built for Jupyter notebooks in VS Code [P]

I used to work heavily with Jupyter Notebooks + git + VS Code in a collaborative research setting and found nbdime somewhat buggy and a hassle to work with in general. So, in typical side-project fashion (relevant xkcd), I've been working on MergeNB quite a bit over the last 6 months or so. It's (currently only) a VS Code extension with a web UI, and it has a few cool improvements over the alternatives, which I outline in the README/docs site. I'd be over the moon if people actually use it, and I'd love a star if you find it interesting. See https://github.com/Avni2000/MergeNB. I've also been working on a static documentation site: https://avni2000.github.io/MergeNB/docs. I'm planning to work on it a lot more over the summer and properly flesh out a few of my ideas (including making it a git mergetool as well as a VS Code extension), so if you'd like to contribute, feel free to raise an issue or shoot me a message/email :) submitted by /u/EnderAvni

📰

r/MachineLearning Aggregators May 24, 2026

How do ML practitioners select hyperparameters, architectures, etc. for self-supervised representation learning when the loss is non-monotonic? [D]

Non-contrastive SSL methods like BYOL/JEPA/data2vec seem promising, but I have no idea what is being learned, or how well; it's models all the way down. Maybe I've got supervised tasks to which I'd like to see transfer, and I can evaluate linear-probe/KNN results during training, but that seems like a way to efficiently abuse researcher degrees of freedom. I know RankMe is meant to help address this: embed some data and take the SVD of the embedding matrix. A healthy learner should produce embeddings with a high effective rank. But JEPA methods already require a term that prevents representation collapse, like Barlow Twins/SIGREG, so the RankMe criterion just becomes part of training. It gets absorbed into a loss that wasn't monotonic to begin with, and I ought to be able to inflate it by increasing the penalty weight. Surely it's no longer an effective criterion, right? What else is there? submitted by /u/XTXinverseXTY

r/MachineLearning Aggregators May 24, 2026

Thermocompute constant time inference [P]

I invented thermocompute! It makes machine learning super fast! submitted by /u/arcco96

📰

r/MachineLearning Aggregators May 24, 2026

Working on a cgo-free CUDA binding in Go for ML stuff, week 3 (open source) [P]

At work we use CUDA from Rust, since the company switched to it recently. Rust has pretty good Driver API bindings, but it made me wonder why the hell we can't have something decent in Go without cgo. I've mostly been building ML tools over the last month, and Go is my main language for pretty much everything. The problem is that most Go CUDA projects still need cgo and the full toolkit at build time. That breaks cross-compilation and makes Docker images huge, which sucks when working on machine learning projects. So last month I started messing around with a proof of concept that loads libcuda.so at runtime using purego. No cgo at all. The biggest pain was thread affinity: CUDA keeps a context per thread, so goroutines switching between OS threads kept breaking things. I built a simple executor that locks an OS thread with runtime.LockOSThread and funnels all calls through a channel. Here's roughly what using it looks like right now:

	// Excerpt: fn, bg and cfg are defined elsewhere (not shown in the post),
	// and errors are discarded with _ for brevity.
	func run() error {
		cuda.Init()
		dev, _ := cuda.GetDevice(0)
		ctx, _ := dev.Primary()
		defer ctx.Close()

		// Three device buffers of 1024 float32 values each.
		a, _ := cuda.Alloc[float32](ctx, 1024)
		b, _ := cuda.Alloc[float32](ctx, 1024)
		c, _ := cuda.Alloc[float32](ctx, 1024)

		stream, _ := ctx.NewStream()
		start, _ := ctx.NewEvent()
		stop, _ := ctx.NewEvent()

		// Bracket the kernel launch with events recorded on the same stream.
		start.Record(stream)
		fn.LaunchOn(bg, stream, cfg,
			cuda.Arg(a), cuda.Arg(b), cuda.Arg(c),
			cuda.ArgValue(int32(1024)),
		)
		stop.Record(stream)

		// Wait on the stop event, then read the GPU-side elapsed time.
		stop.Synchronize()
		duration, _ := start.Elapsed(stop)
		fmt.Printf("GPU time: %v\n", duration)
		return nil
	}

On my 4070 Ti, a 10M vector add showed about 160 µs on a CPU timer, but the actual GPU event timing was 434 µs. That difference surprised me. The project is still super early and moves slowly because I only code on weekends, and I'm a total noob with CUDA. I'm slowly adding Graphs and multi-GPU support. This is SO early, so treat it more like a learning-CUDA repo, but I'm having fun learning CUDA and thought some of you might find it interesting too. The repo is github.com/eitamring/gocudrv if you want to take a look. Would be cool if anyone with 5xxx series cards
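The thread-affinity workaround described above can be illustrated in plain Go. This is a minimal sketch, not gocudrv's actual executor: a single goroutine calls runtime.LockOSThread and then runs every submitted function, so anything that depends on per-thread state (such as the current CUDA context) always runs on the same OS thread. The type and method names are hypothetical, and no CUDA calls are made here.

```go
package main

import "runtime"

// lockedExecutor runs submitted functions one at a time on a single goroutine
// that stays locked to one OS thread for its whole lifetime.
type lockedExecutor struct {
	jobs chan func()
	done chan struct{}
}

// newLockedExecutor starts the worker goroutine and pins it to its OS thread.
func newLockedExecutor() *lockedExecutor {
	e := &lockedExecutor{jobs: make(chan func()), done: make(chan struct{})}
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		defer close(e.done)
		for job := range e.jobs {
			job()
		}
	}()
	return e
}

// Do runs f on the locked thread and blocks until it returns f's error.
// Do must not be called after Close.
func (e *lockedExecutor) Do(f func() error) error {
	errc := make(chan error, 1)
	e.jobs <- func() { errc <- f() }
	return <-errc
}

// Close stops accepting work and waits for the worker goroutine to exit.
func (e *lockedExecutor) Close() {
	close(e.jobs)
	<-e.done
}
```

In a binding built this way, context setup and every later driver call would be wrapped in Do, so the Go scheduler cannot move them onto a different OS thread between calls.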

r/MachineLearning Aggregators May 24, 2026

PapersWithCode new features - week 1 [P]

Hi, Niels here from the open-source team at Hugging Face. It's been one week since I launched paperswithcode.co, a revival of the website we all loved. It lets us keep track of the state of the art (SOTA) across various domains of AI, from agents to computer vision and time-series forecasting. The reception has been great, and I'm excited to extend it over the next few months. This week, I've added the following features:

- Support for multiple metrics per benchmark: leaderboards now support multiple metrics. See, e.g., the Open ASR Leaderboard for automatic speech recognition, which reports both Word Error Rate (WER) and Inverse Real-Time Factor (RTFx), or the Object Detection leaderboard, which now also reports frames per second (FPS) besides mean average precision (mAP) on COCO.
- Support for external papers: we support submitting papers beyond arXiv, such as a GitHub repo, a blog post, bioRxiv, and more. You can submit a paper at paperswithcode.co/submit. AI will automatically enrich it with task and method tags, the GitHub repo, evals, and more. See, e.g., DeepSeek-v4, which is not on arXiv.
- Support for paper lineage: whenever a paper has a follow-up or predecessor, this is displayed in a small banner above the abstract. See, e.g., Mamba-3, DINOv2, and GLM-4.5.
- New methods: support for new methods based on popularity, including Gated DeltaNet, Kimi Delta Attention, Mamba-2, and more. Each method also lists all papers that cite it. Find all supported methods here.
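One detail a multi-metric leaderboard has to encode is the direction of each metric: lower WER is better, while higher RTFx, mAP, and FPS are better. The Go sketch below is a hypothetical data model for ranking by any one metric, not paperswithcode.co's implementation; the type and function names are assumptions.

```go
package main

import "sort"

// Metric describes one leaderboard column and which direction is better.
type Metric struct {
	Name           string
	HigherIsBetter bool
}

// Entry is one model's results on a benchmark, keyed by metric name.
type Entry struct {
	Model   string
	Results map[string]float64
}

// RankBy returns a copy of entries sorted best-first on metric m.
// Entries without a value for m are placed last; ties keep input order.
func RankBy(entries []Entry, m Metric) []Entry {
	out := append([]Entry(nil), entries...)
	sort.SliceStable(out, func(i, j int) bool {
		vi, okI := out[i].Results[m.Name]
		vj, okJ := out[j].Results[m.Name]
		if okI != okJ {
			return okI // entries with a value come first
		}
		if !okI {
			return false // both missing: keep original order
		}
		if m.HigherIsBetter {
			return vi > vj
		}
		return vi < vj
	})
	return out
}
```

Storing the direction on the metric rather than on the leaderboard lets one benchmark show, for example, WER and RTFx side by side without special-casing either.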

📰

r/MachineLearning Aggregators May 24, 2026

Expedia ML Scientist II interview experience, anyone? [D]

I have an Initial Technical Screen interview (45 mins) coming up for the ML Scientist II: Agentic AI role and wanted to know what to expect. I'd really appreciate any info; I haven't found much information on this interview experience. Thanks! submitted by /u/Leather_Letterhead96

📰

r/MachineLearning Aggregators May 24, 2026

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA [D]

I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc (https://github.com/mayubo2333/MMLongBench-Doc). There were 171 questions in total, with Claude Sonnet 4.5 as the LLM. Post-retry results:

Approach                             Accuracy   $/query
LlamaCloud premium + full-context    59.6%      $0.1885
Azure premium + full-context         58.5%      $0.2051
Azure basic + full-context           54.4%      $0.1062
Agentic RAG                          53.2%      $0.0827
Native PDF (vision LLM)              52.0%      $0.2552
LlamaCloud basic + full-context      50.9%      $0.1049

Native PDF came 5th of 6 on accuracy and was the most expensive arm at $0.2552 per query. Two findings:

- Vision underperformed on chart-heavy and table-heavy pages, the territory the "vision LLMs make OCR obsolete" claim most often points to. Premium OCR with layout extraction held up better there.
- The native-PDF arm had a 7% intrinsic failure rate (related to PDF file size) that survived retries. There were 27 first-pass failures, each given 5 attempts with exponential backoff. Fifteen recovered, and 12 stayed permanently broken. These were concentrated in two specific PDFs that fail for predictable transport-layer reasons (the blog identifies them). The OCR-based arms had a 0% intrinsic failure rate after retries.

Caveats: 30 docs is a small sample. I ran pairwise McNemar tests to determine which gaps are real and which are within noise. Only 3 of the 15 head-to-head gaps are statistically distinguishable at α = 0.05, so the order in the table is partly noise. The vision-versus-OCR finding survives the test.

Full writeup: https://www.surfsense.com/blog/agentic-rag-vs-long-context-llms-benchmark submitted by /u/Uiqueblhats