100 articles from r/MachineLearning

r/MachineLearning Aggregators May 26, 2026
Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation (ICML 2026 Workshops) [R]

Paper: https://arxiv.org/abs/2605.08172
Workshops: AI for Science & Structured Data for Health at ICML 2026
Abstract: Anatomical mesh segmentation requires models that operate directly on irregular surface geometry while remaining robust to arbitrary patient pose and mesh resolution variation. Existing task-specific mesh and point-cloud methods are not equivariant, and can degrade sharply under test-time perturbation, for example dropping by 25-26 IoU points on intraoral scan segmentation at 40° tilt. We present EAMS, an Equivariant Anatomical Mesh Segmentor built on Equivariant Mesh Neural Networks (EMNN), and evaluate it across four clinically distinct tasks spanning edge-, vertex-, and face-level supervision. We combine intrinsic mesh descriptors with anatomy-aware priors, including PCA-derived frames for dental arches and liver surfaces, and augment message passing to provide lightweight global context. Across intracranial aneurysm and intraoral segmentation, EAMS variants are competitive with specialized baselines on unperturbed inputs while remaining stable under geometric perturbations, and on liver surfaces they expose a favorable trade-off between canonical-pose accuracy and rotation robustness. These results show that a lightweight (<2M parameters) equivariant framework can deliver robust anatomical mesh segmentation across diverse supervision types without task-specific architectures.
Hi everyone, I'm excited to share my solo paper "Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation", which has been accepted for poster presentations at the ICML 2026 workshops on AI for Science and Structured Data for Health. The project stemmed from my parallel research on structural encoders for biomolecules, where enforcing roto-translational equivariance is standard. In this work, I wanted to extend those principles directly to various 3D medical meshes. While current anatomical mesh segmentation methods are highly disjoint and anato

r/MachineLearning Aggregators May 26, 2026
Tomesphere, 3M paper pages with TLDRs, peer reviews, code, and a SPECTER2 similarity graph [P]

Built a richer paper page for 3 million arXiv and OpenAlex papers. Free, no signup, no paywall: tomesphere.com
Each page has a Gemini-generated TLDR, peer reviews scraped from OpenReview with reviewer scores and decisions, GitHub repos, HuggingFace models and datasets, conference videos, the citation graph from OpenAlex (about 250M edges), and a semantic graph using SPECTER2 (768D in pgvector) with four ranking modes: Influential, Recent, Hidden gems, Nearest.
Connected Papers and Litmaps default to citation overlap. Tomesphere defaults to text vector similarity, so brand-new papers without a citation graph still appear, and topically similar work shows up even without shared citers. A Chrome extension overlays the same data on arXiv abstract and PDF pages.
Try a paper you know:
- tomesphere.com/paper/2312.00752 (Mamba)
- tomesphere.com/paper/1706.03762 (Attention)
- tomesphere.com/paper/2305.14314 (QLoRA)
Open to feedback.
submitted by /u/RegretAgreeable4859
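The "text vector similarity" default described in this post can be pictured as a cosine-similarity nearest-neighbour lookup. The sketch below is only an illustration and not Tomesphere's code: the three-dimensional toy vectors stand in for 768-dimensional SPECTER2 embeddings, and `cosine` and `nearest` are hypothetical helper names. A real deployment would presumably let pgvector do the ranking inside the database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(query_id, embeddings, k=2):
    """Return the k paper ids most similar to query_id (hypothetical helper)."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), paper_id)
        for paper_id, vec in embeddings.items()
        if paper_id != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper_id for _, paper_id in scored[:k]]

# Toy 3-D vectors standing in for 768-D embeddings (made-up values).
toy = {
    "new_paper": [0.9, 0.1, 0.0],    # no citations yet
    "same_topic": [0.8, 0.2, 0.1],
    "other_topic": [0.0, 0.1, 0.9],
}
print(nearest("new_paper", toy, k=1))
```

Because the lookup only needs the text embedding, a paper with zero citation edges can still be placed next to topically similar work, which is the behaviour the post contrasts with citation-overlap tools.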

r/MachineLearning Aggregators May 26, 2026
Verbosity is not faithfulness: an architectural argument that reasoning models cannot perform faithful inference [D]

Essay argues that reasoning models cannot perform faithful inference because their reasoning trace and final answer come from the same operation. Engages with Lanham/Turpin/Mirzadeh in empirical critique, and with HRM, TRM, GRAM, AlphaProof, and Kona/Aleph as the contrasting architectural lineage. Curious what this subreddit makes of the constraint-vs-influence framing.
https://mauhaq.substack.com/p/verbosity-is-not-faithfulness
submitted by /u/Sensitive_Air_5745

r/MachineLearning Aggregators May 26, 2026
Have a couple of technical questions for my LLM router [P]

I am a CS undergrad, and I think token economics is the next big problem for companies. I am building an LLM router specifically for code and codebases. The routing is not done by a heavily fine-tuned LLM (existing solutions already do this). I am using a slightly different approach: gauging complexity by measuring the interaction between signals that can be cheaply extracted from the prompt. One of these signals is what I like to call blooms_intent, based on Bloom's taxonomy, a framework for categorizing educational goals. If a query is "What is this", it falls under the remember category, whereas "implement this" is closer to the create category.
Questions:
- How do I find datasets for this purpose? Is bootstrapping datasets using AI fine for this?
- Should I keep doing centroid-based classification, which I've been using until now? The confidence difference between categories for ambiguous queries is way too close.
- What is the best dataset size and classifier that can somewhat reliably differentiate nuances between queries?
You may ask why not use AI for these questions. I have, and that's why I've come here. Please let me know your thoughts, and thanks in advance!
submitted by /u/getridofaks
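The ambiguity problem with centroid-based classification can be made concrete with a small sketch. It assumes each query has already been turned into an embedding vector, averages labelled examples into one centroid per category, and abstains when the gap between the best and second-best similarity is below a threshold. The two-dimensional vectors, the `min_margin` value, and the helper names are made up for illustration; this is not the poster's router.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def classify(vec, centroids, min_margin=0.05):
    """Return (label or None, margin); None means 'too close to call'."""
    ranked = sorted(
        ((cosine(vec, c), label) for label, c in centroids.items()),
        reverse=True,
    )
    (best, label), (second, _) = ranked[0], ranked[1]
    margin = best - second
    return (label if margin >= min_margin else None), margin

# Toy 2-D "embeddings" for two Bloom categories (made-up values).
centroids = {
    "remember": centroid([[1.0, 0.0], [0.9, 0.1]]),
    "create": centroid([[0.0, 1.0], [0.1, 0.9]]),
}
print(classify([1.0, 0.1], centroids))  # clearly nearer "remember"
print(classify([1.0, 1.0], centroids))  # equidistant, so it abstains
```

Returning the margin alongside the label makes it possible to send low-margin queries to a fallback (a larger model, or a second classifier) instead of forcing a category.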

r/MachineLearning Aggregators May 26, 2026
Added a Chrome Dino-style game to my research tool's pipeline wait screen driven by real SSE events [P]

Slightly unhinged engineering decision, but it works. My tool (ScholarScout) has a 2-3 minute pipeline: fetch papers from 8 databases → analyze trends → generate ideas. During that time, the user sees a pixel art owl running through a parallax forest.
The fun part: it's not fake animation. Each paper dot that spawns in the game corresponds to a real paper_found SSE event from the backend. Papers drip-feed at 600ms intervals from a queue (even if the fetch returned 30 papers at once). Colors = source (white=arXiv, green=PubMed, purple=Crossref). When the pipeline finishes, the owl celebrates.
Tech: vanilla JS canvas, 32x32 sprite sheet (12 frames), requestAnimationFrame loop, image-rendering: pixelated. No dependencies.
Here's the demo video: ScholarScout v1.5.3 - Demo
Actually useful changes in the same release:
- Review Mode: paper clustering (k-means on embeddings, Jaccard fallback) + per-cluster synthesis + cross-cutting analysis
- Paper freshness: _used_count per paper in cache, least-used prioritized, auto-widen date range on exhaustion
- All thresholds externalized to config.yaml
github.com/neej4/ScholarScout or ScholarScout: Papers in. Ideas out.
submitted by /u/neeejaaa0
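The drip-feed behaviour (a burst of events released one at a time, 600 ms apart) comes down to a queue plus a "next free slot" timestamp. The tool itself is vanilla JS, so the Python below is only a sketch of the timing logic under that reading of the post; `drip_schedule` and the sample arrival times are hypothetical.

```python
from collections import deque

def drip_schedule(arrivals, interval_ms=600):
    """Compute release times for queued events.

    arrivals: list of (arrival_ms, paper_id), sorted by arrival time.
    Returns a list of (release_ms, paper_id) where consecutive releases
    are at least interval_ms apart and never earlier than arrival.
    """
    queue = deque(arrivals)
    released = []
    next_slot = 0  # earliest time the next dot may spawn
    while queue:
        arrival_ms, paper_id = queue.popleft()
        release_ms = max(arrival_ms, next_slot)
        released.append((release_ms, paper_id))
        next_slot = release_ms + interval_ms
    return released

# Three papers arrive in one burst, a fourth much later.
burst = [(0, "a"), (0, "b"), (0, "c"), (5000, "d")]
for release_ms, paper_id in drip_schedule(burst):
    print(release_ms, paper_id)
```

The burst is spread over 0, 600 and 1200 ms, while the late arrival is shown immediately because its slot is already free.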

r/MachineLearning Aggregators May 26, 2026
Dlib or PyTorch for a CNN? [D]

I'm currently studying ML, more specifically convolutional neural networks (CNNs) for finding patterns in images. Right now, I'm trying to develop a model that can solve the "Where's Waldo?" challenge. However, I have a question: which would be the better option for training a CNN model, PyTorch or Dlib? At the moment, I have an AMD RX580. Since Dlib only supports CUDA, I would need to use Google Colab. I'm still learning about this field, so if I said something incorrect or if you have any tips on how to approach this project, I'd be very happy to hear them. 😄
submitted by /u/TearsInTokio

r/MachineLearning Aggregators May 26, 2026
Built a portable GPU ISA after reading too many architecture manuals [P]

I've been reading GPU architecture docs in my free time: NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it. It's called WAVE. You write a kernel once, it compiles to a portable binary, then thin backends translate it to Metal, PTX, HIP, or SYCL. The same binary was verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built the PyTorch integration and got identical training results across all backends.
Please star on GitHub: https://github.com/Oabraham1/wave
Preprint: https://arxiv.org/abs/2603.28793
Read the full docs and how I built everything: https://wave.ojima.me
pip install wave-gpu
submitted by /u/not-your-typical-cs

r/MachineLearning Aggregators May 26, 2026
I built a system that lets you ask questions about any GitHub repo and get answers grounded in the actual source code [P]

Hi guys, I've been working on GitRAG: paste any public GitHub URL and ask it anything about the codebase. It answers with exact file paths and line numbers, no hallucination.
How it works under the hood:
- Clones the repo and splits files into semantic chunks using AST-aware parsing (not just line splits)
- Builds a hybrid index: dense embeddings + BM25 keyword index
- At query time, fuses both signals with Reciprocal Rank Fusion, then runs Cohere reranking to cut 20 candidates down to 5
- Sends those 5 chunks to Groq's llama-3.3-70b, which generates a grounded answer
The retrieval pipeline is what I'm most proud of: the BM25 + semantic fusion catches things that pure vector search misses (exact function names, error codes, etc.).
Stack: FastAPI · ChromaDB · text-embedding-3-small · Cohere rerank-v3.5 · Groq llama-3.3-70b · React + Vite
Supports 15+ languages: Python, JS/TS, C#, Java, Go, Rust, C/C++, Swift, Kotlin, Dart, Ruby, PHP, Vue, Svelte, Shell...
Curious what repos people try it on. Drop your results below 👇
submitted by /u/Professional-Pie6704
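The fusion step named in this post, Reciprocal Rank Fusion, is short enough to sketch. Each document gets a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here, as are the function name, the `top_n` cut-off, and the chunk ids; none of this is taken from GitRAG's source.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=20):
    """Fuse several ranked lists of ids into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    k: smoothing constant (assumed value, commonly 60).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for determinism.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]

# Hypothetical chunk ids returned by the two retrievers.
dense_hits = ["utils.py:10", "auth.py:42", "db.py:7"]
bm25_hits = ["auth.py:42", "errors.py:3", "utils.py:10"]
print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
```

Because only ranks are used, the dense and BM25 scores never need to be put on a common scale, which is the usual reason for choosing RRF in hybrid retrieval. The fused list would then go to the reranker.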

r/MachineLearning Aggregators May 26, 2026
What valuable professional data is completely locked away from AI companies? [D]

Hi all, apologies beforehand if this is the wrong subreddit; let me know if you think there are better subreddits for this post. I'm working on a project around proprietary data licensing for AI training and trying to identify data types that are genuinely inaccessible to AI labs, not because the data doesn't exist, but because no one has figured out how to unlock it.
Specifically looking for data that is:
• Created by domain experts as part of their daily work
• Never published or shared outside the organization
• Rich in human reasoning, not just structured outputs
Finance is my background, so I'm especially curious about examples there, but all industries are welcome. What's the most valuable "locked" professional data you've come across in your field, and who (if you know) owns the rights to it?
submitted by /u/Manny_in_iceage

r/MachineLearning Aggregators May 26, 2026
Where do you go for serious AI research discussion online? [D]

Looking for communities where people actually dig into ML/AI research: not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you can post something like "I'm seeing X behaviour in my SSL training, here's the loss curve, anyone seen this before?" and get thoughtful replies instead of generic advice.
submitted by /u/Possible-Active-1903

r/MachineLearning Aggregators May 26, 2026
Already 11 000 submissions for EMNLP? [D]

Is this normal? I looked it up, and last year it was only 8,000.
submitted by /u/NightCR_

r/MachineLearning Aggregators May 25, 2026
Aiki my local Wikipedia Retrieval-Augmented Generation system [R]

Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally.
What it does:
- Downloads and chunks Wikipedia articles (you can choose articles by name, with the option of also downloading similar topics)
- Uses a custom TF-IDF + cosine similarity retriever (built from scratch)
- Supports query expansion using Wikipedia links/redirects
- Optional answer generation with an LLM
Very minimal dependencies, and it runs completely locally.
Repo: https://github.com/yacine204/Aiki
Would really appreciate your feedback.
submitted by /u/Just_Jaguar3701
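A from-scratch TF-IDF + cosine similarity retriever of the kind listed above fits in a few dozen lines of standard-library Python. The sketch below is a generic illustration, not Aiki's implementation: the whitespace tokenizer, the smoothed IDF formula (one common variant), the sample chunks, and all function names are assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; unknown terms are dropped."""
    counts = Counter(tokens)
    return {term: n * idf[term] for term, n in counts.items() if term in idf}

def build_index(docs):
    """Return (idf table, list of sparse document vectors)."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    total = len(docs)
    # Smoothed IDF so that no weight is zero or undefined.
    idf = {t: math.log((1 + total) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [tfidf(toks, idf) for toks in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, idf, vectors):
    """Return (index of best chunk, its score)."""
    q = tfidf(tokenize(query), idf)
    scores = [cosine(q, vec) for vec in vectors]
    best = max(range(len(vectors)), key=scores.__getitem__)
    return best, scores[best]

chunks = [
    "the eiffel tower is in paris",
    "python is a programming language",
    "the louvre museum is in paris",
]
idf, vectors = build_index(chunks)
print(search("programming language", idf, vectors))
```

Query expansion, as mentioned in the post, would amount to appending extra terms (for example, link or redirect titles) to the query before it is vectorized.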

r/MachineLearning Aggregators May 25, 2026
The famous METR AI time horizons graph contains numerous severe errors [D]

Nathan Witkin, a research writer at NYU Stern's Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer:
"It is impossible to draw meaningful conclusions from METR's Long Tasks benchmark, in particular once one realizes that its numerous flaws are probably compounding in unpredictable ways. The appropriate response to a study of this kind is not to assume it can be saved via back-of-the-envelope adjustments, or to comfort oneself that other anecdotal evidence implies that it is probably correct anyway. It is to cut one's losses and move on in search of higher-quality information. … The METR graph cannot be saved. For all its sleekness and complexity, it contains far too many compounding errors to excuse. Among them is generalizing to the entire species data collected from a small group of the authors' peers. Coming up with ever more dramatic ways to make this mistake has become a kind of sport among AI researchers. If the field has a central pathology, it is to aggressively overindex on a mix of anecdotal data from power-users, alongside a long list of benchmarks even more compromised than METR's. One hopes that as the field matures, its participants will learn to stop making these mistakes."
The errors include:
- Some of the human baseline data is not actually measured or collected from any empirical source; rather, it is just guesstimated by the authors
- A key variable in the data is how long it takes humans to complete certain tasks, but when METR did actually measure this, it paid its human benchmarkers hourly, meaning they were incentivized with cash to take longer
- The sample of human benchmarkers was biased toward METR employees' friends, acquaintances, and former colleagues (who are likely unrepresentative and possibly biased)
- Humans familiar with a codebase and a specific coding task were 5-18x faster at completing it, but METR used data from humans who were much slower because they had t

r/MachineLearning Aggregators May 25, 2026
DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

Just thought I'd share: I ran a DCGAN on a dual-core RISC-V microcontroller, the CH32H417, generating 64x64 cat faces. This is a new RISC-V MCU, so no TFLite, no CMSIS NN, and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs.
The model is 12.6M parameters with int8 per-channel quantization. Intermediate activations are stored in DTCM, and layer weights stream from the SD card using double buffering, so the next layer loads while the current one computes. The total available SRAM is 512KB, shared between both cores and the inference engine, and the time to generate one image is 26 seconds. It could be faster, but SD card access speed is the bottleneck rather than computation.
The z vector is seeded from 200 bytes of quantum random data (ANU QRNG vacuum fluctuation source), transformed via Box-Muller into the latent vector. That is not strictly necessary for image quality, but it was a fun constraint for the art installation side of the project. The generated cat is classified as "motivated" or "demotivated" based on a single quantum bit, which selects from a phrase bank with four fragment slots combining into one of 131,072 possible spoken verdicts output through the onboard DAC...
As far as I can tell, nobody else is running GAN inference on these low-cost RISC-V microcontrollers, because ARM has the CMSIS NN ecosystem for this kind of thing but RISC-V MCUs, especially in the CH32 space, have nothing, so the entire inference engine is written from scratch.
Paper: TinyGAN: Generative Image Synthesis on a RISC-V Microcontroller with Quantum Entropy Sampling
submitted by /u/Separate-Choice
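The Box-Muller step mentioned in this post turns pairs of uniform samples into pairs of standard-normal samples. The actual engine is pure C; the Python below is only a sketch of the transform. How the 200 random bytes are packed into uniforms is not stated in the post, so the 16-bit pairing here (which happens to give 100 values) is an assumption, as are the function names.

```python
import math

def bytes_to_uniforms(data):
    """Pair bytes into 16-bit integers and map them into the open interval (0, 1).

    The +0.5 offset keeps every value strictly above 0, so log() is safe.
    The 16-bit packing is an assumption made for this sketch.
    """
    uniforms = []
    for i in range(0, len(data) - 1, 2):
        value = (data[i] << 8) | data[i + 1]
        uniforms.append((value + 0.5) / 65536.0)
    return uniforms

def box_muller(uniforms):
    """Convert uniforms, two at a time, into standard-normal samples."""
    normals = []
    for i in range(0, len(uniforms) - 1, 2):
        u1, u2 = uniforms[i], uniforms[i + 1]
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        normals.append(radius * math.cos(angle))
        normals.append(radius * math.sin(angle))
    return normals

# Deterministic stand-in for the 200 bytes of quantum random data.
entropy = bytes(range(200))
z = box_muller(bytes_to_uniforms(entropy))
print(len(z))
```

Each pair of uniforms yields two normals, so no entropy is discarded; on the microcontroller the same arithmetic would need `logf`, `sqrtf`, `cosf` and `sinf` or fixed-point equivalents.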

r/MachineLearning Aggregators May 25, 2026
Is AI inference platform really that saturated now? [D]

I'm thinking of expanding an on-device inference SDK into a full-blown AI inference platform, and I'm seeing more and more inference platforms popping up. I've been talking with a VC from Seattle/NY. Is this space really that saturated?
submitted by /u/kampak212

r/MachineLearning Aggregators May 25, 2026
Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

I've been thinking about a problem in current agent systems: most agents are becoming very good at execution, but the decision layer before execution is still unclear. Coding agents, research agents, tool loops, sandboxes, workflows, and harnesses are all improving quickly. Once a human gives an intent, agents can often do a lot of useful work. But the higher-level question is still usually left to the user: what should happen next, and why?
I've been exploring this idea through an open-source project called Spice. The simplest way to describe it is: Spice is a decision layer above agents. It is not trying to replace execution agents. Tools like Claude Code, Codex, Hermes, or other agents can still do the actual work. Instead, Spice sits before execution and tries to make the decision process explicit:
- what was observed
- what options were considered
- why one option was selected
- what trade-offs were rejected
- whether execution needs approval
- what happened afterward
- how that outcome should affect the next decision
The current runtime is still early, but it can already be installed, configured with an LLM provider, and run in the terminal, where you can inspect Decision Cards and hand off approved execution to external agents.
The goal is to make agent behavior less of a black box. Instead of only seeing the final result of an agent task, I want to preserve the reasoning boundary before execution: what the system believed, what it chose, why it chose it, and what changed after the action.
GitHub: https://github.com/Dyalwayshappy/Spice
I'd love feedback from people building agents. Feel free to fork, star the repo, or share any feedback and ideas. Would love to build this together with the community.
submitted by /u/Alarming_Rou_3841

r/MachineLearning Aggregators May 25, 2026
Delta Attention Residuals [R]

We're excited to release Delta Attention Residuals, a drop-in upgrade to residual connections that learns which past layers to route from, without the routing collapse that breaks prior cross-layer attention at scale. 🚀
Attention Residuals route over cumulative hidden states, but those are highly redundant, so routing collapses to near-uniform (max weight ~0.2) in deep layers. Delta Attention Residuals route over deltas (vᵢ = hᵢ₊₁ − hᵢ), that is, what each sublayer actually contributed, and natively enable:
⚡ 1.8× sharper cross-layer routing: Deltas are structurally diverse, lifting max attention weight from ~0.2 → ~0.6 (0.62 vs 0.35 avg) and curing routing collapse in deep layers.
📉 −8.2% validation PPL at 7.6B: Consistent gains from 220M → 7.6B (1.7-8.2% lower PPL), beating both standard residuals and Attention Residuals; the latter actually degrades below baseline at scale (18.58 vs 17.43).
🔌 Drop-in fine-tuning of pretrained models: Additive, zero-init routing is identity at initialization, so you can convert pretrained checkpoints (e.g. Qwen3-0.6B) into Delta Attention Residuals via standard fine-tuning, beating the original on 8 downstream benchmarks (55.6 vs 55.0).
🪶 ≤0.01% parameter overhead: Delta Block adds just 589K params (0.008% at 8B) and ~3% memory, and runs faster and lighter than Attention Residuals (14.0k vs 12.5k tok/s, 42.7 vs 44.0 GB).
💻 Code: https://github.com/wdlctc/delta-attention-residuals-code
💻 Paper: https://arxiv.org/abs/2605.18855
https://preview.redd.it/bewovgw25b3h1.png?width=1359&format=png&auto=webp&s=6cee758f7a96f0adecd9a3fb8553dde3f1b92c74
submitted by /u/Mediocre-Ad5059
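Two properties claimed in this post can be checked on toy numbers: deltas are the per-sublayer contributions (so they telescope back to the total change in the hidden state), and an additive routing term scaled by a zero-initialized gate leaves the output unchanged at initialization. The sketch below is not the paper's architecture. The softmax-over-deltas mix, the scalar `gate`, and the function names are assumptions made purely to illustrate those two properties.

```python
import math

def deltas(hidden):
    """v_i = h_{i+1} - h_i for a list of hidden-state vectors."""
    return [[b - a for a, b in zip(h0, h1)] for h0, h1 in zip(hidden, hidden[1:])]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(hidden, scores, gate):
    """Last hidden state plus gate * (softmax-weighted mix of deltas).

    'scores' and 'gate' stand in for learned parameters (hypothetical).
    """
    vs = deltas(hidden)
    weights = softmax(scores)
    width = len(hidden[-1])
    mix = [sum(w * v[d] for w, v in zip(weights, vs)) for d in range(width)]
    return [h + gate * m for h, m in zip(hidden[-1], mix)]

# Toy hidden states after three sublayers (made-up values).
hidden = [[0.0, 0.0], [1.0, 0.5], [1.5, 2.0]]
total_change = [sum(v[d] for v in deltas(hidden)) for d in range(2)]
print(total_change)                        # equals h_last - h_first
print(route(hidden, [0.0, 0.0], gate=0.0))  # identity at zero-init
```

With the gate at zero the routed output equals the last hidden state exactly, which is the sense in which a pretrained checkpoint could be converted without changing its behaviour before fine-tuning.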

r/MachineLearning Aggregators May 25, 2026
Anyone heard from ICML about Oral decisions yet? [D]

Hi all, my paper received a spotlight from ICML. They told us that we would receive decisions on whether our paper would get an oral by the end of the month, with the implication that we wouldn't receive a notification if we didn't get it. I was just wondering if anyone has received that notification, so that I know for sure that I didn't get it. Thanks!
submitted by /u/billjames1685

r/MachineLearning Aggregators May 25, 2026
I'm building an open-source decision layer above AI agents [P]

Hi everyone, I'm Jia, the creator of Spice. I've been working on an open-source project called Spice. The simplest way to describe it is: Spice is a decision layer above agents.
Most agent systems today are very focused on execution. They are getting better at doing tasks after a human gives them an intent. But the higher-level question is still usually left to the user: what should happen next, and why? That is the layer I want Spice to explore.
Spice is not trying to replace execution agents. Tools like Claude Code, Codex, Hermes, or other agents can still do the actual work. Instead, Spice sits before execution and tries to make the decision process explicit:
- what was observed
- what options were considered
- why one option was selected
- what trade-offs were rejected
- what happened afterward
- how that outcome should affect the next decision
The current runtime is still early, but you can already install it, set up an LLM provider, run it in the terminal, inspect Decision Cards, and hand off approved execution to external agents.
My goal is to make agent behavior less of a black box. Instead of only seeing the final result of an agent task, I want to preserve the reasoning boundary before execution: what the system believed, what it chose, why it chose it, and what changed after the action.
GitHub: https://github.com/Dyalwayshappy/Spice
I'd love feedback from people building agents. Thank you guys.
submitted by /u/Alarming_Rou_3841

r/MachineLearning Aggregators May 25, 2026
Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

🌟 Announcing the 2nd Workshop on Efficient Reasoning (ER) at @colm2026, Oct 9!
📣 We welcome submissions! Submit your work here: https://openreview.net/group?id=colmweb.org/COLM/2026/Workshop/Efficient_Reasoning
🗓️ Deadline: July 12, 2026 (AoE)
🔗 Website: https://wdlctc.github.io/efficient-reasoning-2026/
💬 Topics include (but aren't limited to):
🔹 Multimodal, spatial & embodied reasoning under efficiency constraints
🔹 Curating high-quality reasoning datasets under resource constraints
🔹 Algorithmic innovations for efficient training & RL fine-tuning
🔹 Fast inference: pruning, compression, progressive generation, KV-cache tricks
🔹 Benchmarks & theory on time-/space-complexity and faithfulness
🔹 Systems to deploy long-CoT or on-device reasoning in the wild
🔹 Safety & robustness of efficient reasoning pipelines
🔹 Real-time applications in healthcare, robotics, autonomy, and more
🤝 We invite perspectives from ML, systems, natural & social sciences, and industry practitioners to rethink reasoning under tight compute, memory, latency, and cost budgets.
Hope to see you there! 🚀
submitted by /u/Mediocre-Ad5059

r/MachineLearning Aggregators May 25, 2026
Best architecture for seamless Bilingual TTS? (Azure / English + Korean) [D]

Hi guys, I'm building a language learning app (React Native/Expo frontend, Python backend), and I've hit a frustrating wall with Text-to-Speech. I need the app to read sentences that mix English instructions and Korean examples (e.g., "To say hello, we use the phrase 안녕하세요."). Since native pronunciation is critical for a learning app, I'm struggling to find a solution that sounds natural. I'm currently using Azure Cognitive Services, and I'm stuck between two bad options:
Approach 1: The Multilingual Voice (en-US-AvaMultilingualNeural)
The Good: Seamless reading, zero pauses mid-sentence.
The Bad: Because it's an English-first model, the Korean comes out with a slight robotic/Americanized accent. It doesn't sound like a true native speaker, which defeats the purpose of teaching pronunciation. There is also some scratchiness and a lack of smoothness when it reads Korean words.
Approach 2: SSML Voice Switching (Ava for EN, SunHi for KO)
The Good: Perfect English, perfect native Korean.
The Bad: Switching <voice> tags mid-sentence causes Azure to pause for a fraction of a second while it unloads/loads the neural models. It completely ruins the natural flow of the audio, making it sound very disjointed.
My Questions:
- Is there an SSML trick in Azure to pre-load voices or eliminate that micro-pause when switching voices?
- How do the big apps handle this? If I use two models for Korean and English, they will sound different when reading.
- Should I migrate away from standard Azure Speech and use the Azure OpenAI voices (alloy, nova) instead? Are they truly seamless for bilingual text?
Any advice on the best tech stack or architecture for this would be massively appreciated!
submitted by /u/Lumpy-Simple9185
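For readers comparing the two approaches, the SSML each one sends could look roughly like the strings built below. This is a sketch on the Python backend side, under assumptions: the full voice name `ko-KR-SunHiNeural` and the use of a `<lang xml:lang="ko-KR">` span inside the multilingual voice are my reading of the post, and the helper names are hypothetical. It only shows the markup and does not remove the pause described above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def _speak(inner):
    """Wrap inner markup in an SSML <speak> root element."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"{inner}</speak>"
    )

def multilingual_ssml(english, korean, voice="en-US-AvaMultilingualNeural"):
    """Approach 1: one multilingual voice, Korean span marked with <lang>."""
    return _speak(
        f'<voice name="{voice}">{escape(english)} '
        f'<lang xml:lang="ko-KR">{escape(korean)}</lang></voice>'
    )

def switching_ssml(english, korean,
                   en_voice="en-US-AvaMultilingualNeural",
                   ko_voice="ko-KR-SunHiNeural"):
    """Approach 2: a separate <voice> element per language."""
    return _speak(
        f'<voice name="{en_voice}">{escape(english)}</voice>'
        f'<voice name="{ko_voice}">{escape(korean)}</voice>'
    )

sentence_en = "To say hello, we use the phrase"
sentence_ko = "안녕하세요"
# Parsing the result is a cheap well-formedness check before calling the API.
ET.fromstring(multilingual_ssml(sentence_en, sentence_ko))
ET.fromstring(switching_ssml(sentence_en, sentence_ko))
print(switching_ssml(sentence_en, sentence_ko))
```

Generating both variants from the same inputs makes it easy to A/B the audio, or to synthesize the two voice segments separately and join them client-side if the service-side switch keeps adding a pause.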

r/MachineLearning Aggregators May 25, 2026
Are ICML workshops worth attending? [D]

Hi! I missed securing a main conference ticket for ICML 2026, as my workshop paper got accepted two days ago. Do you believe that it is worth attending just workshops at such A*-tier conferences (with all the overseas travel costs etc.)? I was quite looking forward to attending both, including the talks, poster sessions and company booths. I come from an adjacent field and have therefore had quite a few conference experiences. Any insights into past experience are highly welcome. Thank you!
submitted by /u/dreameroutloud

r/MachineLearning Aggregators May 25, 2026
Using large language models [R]

Can LLMs be used to come up with a research topic that's worthwhile? Has anyone had good results in coming up with solid research ideas by chatting with an LLM? Maybe using Claude to review existing work and define the research topic. Thanks!
submitted by /u/Lonely-Highlight-447

r/MachineLearning Aggregators May 25, 2026
Call for Papers - Workshop on Unlearning and Model Editing U&ME at ECCV 2026 [R]

I have been seeing a lot of really interesting work lately around unlearning, model editing, controllability, safety, etc. Feels like this space is moving very fast right now, and there are still so many open questions. This year I'm helping organize the U&ME workshop at ECCV 2026, and honestly I'd really love to see submissions from people in the community, especially students and researchers who are exploring new ideas, even if the work is still evolving. A lot of the best workshop conversations come from unfinished ideas, weird observations, failed directions that taught something useful, or work that doesn't neatly fit into a main conference paper. So if you've been working on anything around:

- Unlearning
- Model Stitching and Editing
- Model Merging and "MoErging" (Mixture of Experts Merging)
- Model compression
- Efficient domain adaptation
- Multi-domain/cross-domain U&ME
- Online/lifelong learning, unlearning, and model editing
- Responsible U&ME (e.g., robustness, ethics and fairness, resource efficiency, privacy, and regulatory compliance)
- Applications in computer vision

please consider submitting :) Would be really nice to bring together people thinking deeply about these problems at ECCV 2026.

submitted by /u/Mushroom-Severe [link] [comments]

r/MachineLearning Aggregators May 25, 2026
If you use NVIDIA Isaac Sim for reinforcement learning, do you use Isaac Lab with it? Just want to get a sense of what the status quo is. [D]

The reason for this query is that I am in the process of shifting to Isaac Sim / Isaac Lab since that is what seems to be in use nowadays. However, Isaac Lab is proving to be somewhat difficult to handle. While it handles logging and the creation of multi-actor systems for algorithms like PPO beautifully (with, say, hundreds of actors), its documentation leaves much to be desired. I am also concerned about the ease of setting up new robotic environments, actions, rewards, policies and possibly even custom algorithms. So, what is it that you do at your lab? In my mind there's a trade-off. Either I use the Isaac Lab scaffolding and keep running into its idiosyncrasies until I have documented everything I need, or I interface directly with Isaac Sim and write my own handlers for connecting Isaac Sim to the RL agent.   submitted by   /u/StayingUp4AFeeling [link]   [comments]

r/MachineLearning Aggregators May 25, 2026
Sponsio: Deterministic Contract Layer for LLM Agents [P]

We've been trying to put LangGraph agents into production for a while. The thing that kept biting us was tool-call boundary enforcement: stuff like "must call X before Y", "max N retries", "approval gate before destructive action". Worked fine in demos, broke at the moments that mattered.

What we tried first:

- Prompt engineering. Told the model "always call check_policy before issue_refund". Worked ~95% of the time. The 5% that didn't was exactly the cases an auditor would ask about. Not a great answer when someone wants to know why a refund went through.
- Post-hoc audit (OTEL + log). Caught violations after the fact. By then the side effect already happened. Refunding the refund is awkward.
- Pulling everything into a workflow engine (Temporal, or nano-vm more recently). Strong guarantees but you rewrite the agent against their runtime. Too much for our use case.

What we ended up with: A contract layer at the tool boundary. YAML rules, deterministic eval, runs before the tool call commits. Open-sourced as Sponsio.

Repo: github.com/SponsioLabs/Sponsio

Would love feedback from anyone running agents in prod.

submitted by /u/johnnaliu [link] [comments]
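For readers who want a feel for what "deterministic eval before the tool call commits" can look like, here is a minimal Python sketch. It is not Sponsio's API or rule format: `ContractLayer`, `ContractViolation`, and the three rule tables are hypothetical names, and the rules are plain dictionaries rather than YAML. The limit rule here counts committed calls, which is one possible reading of "max N retries".

```python
from dataclasses import dataclass, field


class ContractViolation(Exception):
    """Raised before a tool call runs when a rule is broken."""


@dataclass
class ContractLayer:
    # Hypothetical rule tables; a real deployment might load these from YAML.
    must_precede: dict = field(default_factory=dict)   # tool -> tool required earlier
    max_calls: dict = field(default_factory=dict)      # tool -> allowed committed calls
    needs_approval: set = field(default_factory=set)   # tools gated on approval
    history: list = field(default_factory=list)        # committed tool calls, in order

    def check(self, tool, approved=False):
        """Evaluate every rule for `tool`; raise on the first violation."""
        required = self.must_precede.get(tool)
        if required is not None and required not in self.history:
            raise ContractViolation(f"{tool} requires a prior {required}")
        limit = self.max_calls.get(tool)
        if limit is not None and self.history.count(tool) >= limit:
            raise ContractViolation(f"{tool} exceeded {limit} call(s)")
        if tool in self.needs_approval and not approved:
            raise ContractViolation(f"{tool} needs explicit approval")

    def call(self, tool, fn, *args, approved=False, **kwargs):
        """Check first, then run the tool and record it as committed."""
        self.check(tool, approved=approved)
        result = fn(*args, **kwargs)
        self.history.append(tool)
        return result


layer = ContractLayer(
    must_precede={"issue_refund": "check_policy"},
    max_calls={"issue_refund": 1},
    needs_approval={"issue_refund"},
)
try:
    layer.call("issue_refund", lambda: "refunded", approved=True)
except ContractViolation as err:
    print("blocked:", err)
layer.call("check_policy", lambda: "ok")
print(layer.call("issue_refund", lambda: "refunded", approved=True))
```

Because the check runs inside `call` before `fn`, a violation prevents the side effect instead of being discovered in a log afterwards.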

r/MachineLearning Aggregators May 25, 2026
Please help with tensor dock [d]

Does anyone have any idea what I should do? This is my email to TensorDock.

I developed corporate GPU benchmarking software, so I need a cloud PC that can benchmark consumer RTX 5090 and RTX 4090 cards. It worked absolutely amazingly for six hours yesterday on the 4090: full desktop PC performance in the cloud. But...

Look, I'm really, really upset here. I've been trying to deploy servers for two days now. I made one server successfully with an RTX 4090. It worked great for a few hours, but after I stopped it and went to turn it on again, I haven't been able to get another RTX in the node for the last 10 hours. So I can't even activate a PC that I spent all day setting up yesterday.

In order to have another cloud PC to work on, I tried to start up 4 more separate deployments today, and none of them can initialize another RTX 4090. It always fails on the desktop once it is deployed, so I have to keep deleting the VM. I then tried three different node locations to see if that fixes it, and I cannot even acquire another RTX 4090, even though they all show as available in each location. It always fails during deployment. This has been a nightmare. I've been trying to talk to customer service for two days straight now, and nobody gets back to me.

I have an RTX 5090 set up that will not even ping and that I cannot access, and I had it running for $10 for a day. Not working. Ideally, I would like to have that RTX 5090 as my monthly always-on cloud PC, but it's not working right now. I would also like the RTX 4090 setup that I currently have to be able to find an available GPU in the node, because I built a perfect image of Windows on there with all my data and I can't even use it. I spent all day yesterday building that Windows image. I stopped it to save some money for a few hours. I went to turn it back on and I can't use it now. It won't activate.

submitted by /u/testing012367 [link] [comments]

r/MachineLearning Aggregators May 24, 2026
"AI solved one of math's greatest challenges, but it cannot add two numbers reliably?!" [D]

Suppose your friend, a mathematician, woke up from a 5-year coma. How would you explain this to him? Do we even have an explanation other than "it is what it is"?   submitted by   /u/we_are_mammals [link]   [comments]

r/MachineLearning Aggregators May 24, 2026
MergeNB: An intuitive merge conflict resolver built for Jupyter notebooks in VS Code [P]

I used to work heavily with Jupyter Notebooks + git + VS Code in a collaborative research setting and found nbdime to be somewhat buggy/a hassle to work with in general. So, in typical side project fashion (relevant xkcd) I've been working on MergeNB quite a bit over the last 6 months or so. It's (currently only) a VS Code extension with a web UI, and has a few cool improvements over other alternatives, which I outlined in the README/docs site. I'd be over the moon if this actually gets used by people, and would love a star if it's interesting. See https://github.com/Avni2000/MergeNB. I've also been working on a static documentation site here: https://avni2000.github.io/MergeNB/docs I'm planning on working on it a lot more over the summer and properly fleshing out a few of the ideas I had (including making it a git mergetool as well as a VS Code extension), so if you'd like to contribute, feel free to raise an issue or shoot me a message/email :)   submitted by   /u/EnderAvni [link]   [comments]

r/MachineLearning Aggregators May 24, 2026
How do ML practitioners select hyperparameters, architectures, etc for self-supervised representation learning when the loss is non-monotonic? [D]

Non-contrastive SSL methods like BYOL/JEPA/data2vec seem promising, but I have no idea what is being learned, or how well; it's models all the way down. Maybe I've got supervised tasks for which I'd like to see transfer, and I can evaluate linear probe/KNN results during training, but that seems like a way to efficiently abuse researcher degrees of freedom. I know RankMe is meant to help address this: embed some data and SVD the embedding matrix. A healthy learner should produce an embedding with a high effective rank. But JEPA methods already require an entropy-collapse term like Barlow Twins/SIGREG, so the RankMe criterion just becomes part of training. It gets absorbed into a loss which wasn't monotonic to begin with, and I ought to be able to inflate it by increasing the penalty weight. Surely it's no longer an effective criterion, right? What else is there?   submitted by   /u/XTXinverseXTY [link]   [comments]
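For reference, the "embed some data and SVD the embedding matrix" step can be sketched in a few lines of NumPy. This follows the usual definition of RankMe as the exponential of the entropy of the normalized singular values; the function name, the epsilon value, and the synthetic "healthy" and "collapsed" embeddings are illustrative assumptions, not taken from the post.

```python
import numpy as np


def rankme(embeddings, eps=1e-7):
    """Effective rank: exp of the entropy of the normalized singular values."""
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / s.sum() + eps          # normalized spectrum, eps avoids log(0)
    return float(np.exp(-(p * np.log(p)).sum()))


rng = np.random.default_rng(0)
# Synthetic stand-ins: a full-rank Gaussian embedding and a rank-1 (collapsed) one.
healthy = rng.standard_normal((512, 32))
collapsed = np.outer(rng.standard_normal(512), rng.standard_normal(32))

print("healthy:", rankme(healthy))
print("collapsed:", rankme(collapsed))
```

The score lies between 1 and the embedding dimension, so the collapsed example should land near 1 and the Gaussian one near 32.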

r/MachineLearning Aggregators May 24, 2026
Thermocompute constant time inference [P]

I invented thermocompute! It makes machine learning super fast!   submitted by   /u/arcco96 [link]   [comments]

r/MachineLearning Aggregators May 24, 2026
Working on a cgo-free CUDA binding in Go for ML stuff Week 3 - open source [P]

At our work we use CUDA in Rust since the company switched to it recently. Rust has pretty good Driver API bindings, but it made me wonder why the hell we can't have something decent in Go without cgo. I've mostly been building ML tools this past month, and Go is my main language for pretty much everything. Problem is, most Go CUDA projects still need cgo and the full toolkit at build time. That breaks cross compilation and makes Docker images huge, which sucks when working on machine learning projects. So last month I started messing around with a proof of concept that loads libcuda.so at runtime using purego. No cgo at all.

Biggest pain was thread affinity. CUDA keeps context per thread, so goroutines switching around kept breaking things. I built a simple executor that locks an OS thread with runtime.LockOSThread and funnels all calls through a channel.

Here's roughly what using it looks like right now:

    func run() error {
        cuda.Init()
        dev, _ := cuda.GetDevice(0)
        ctx, _ := dev.Primary()
        defer ctx.Close()

        // Three device buffers of 1024 float32 values each.
        a, _ := cuda.Alloc[float32](ctx, 1024)
        b, _ := cuda.Alloc[float32](ctx, 1024)
        c, _ := cuda.Alloc[float32](ctx, 1024)

        stream, _ := ctx.NewStream()
        start, _ := ctx.NewEvent()
        stop, _ := ctx.NewEvent()

        // Time the launch with GPU events recorded on the stream.
        start.Record(stream)
        // fn, bg and cfg are not defined in the snippet as posted.
        fn.LaunchOn(bg, stream, cfg,
            cuda.Arg(a), cuda.Arg(b), cuda.Arg(c),
            cuda.ArgValue(int32(1024)),
        )
        stop.Record(stream)
        stop.Synchronize()

        duration, _ := start.Elapsed(stop)
        fmt.Printf("GPU time: %v\n", duration)
        return nil
    }

On my 4070 Ti, a 10M vector add showed the CPU timer at like 160us, but actual GPU event timing was 434us. That difference surprised me. The project is still super early and moves slow cuz I only code on weekends and I'm a total noob with CUDA. Slowly adding Graphs and multi-GPU support. THIS IS SO early, so treat it more like a learning-CUDA repo, but I'm having fun learning CUDA. Thought some of you might find it interesting too. Repo is github.com/eitamring/gocudrv if you wanna take a look. Would be cool if anyone with 5xxx series cards
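The executor idea (pin one thread, funnel every call through a queue) is not specific to Go. As a language-neutral illustration, here is a minimal Python sketch of the same pattern. It does not touch CUDA or the gocudrv API: `ThreadBoundExecutor` is a hypothetical name, and a `threading.local` value stands in for the per-thread CUDA context.

```python
import queue
import threading


class ThreadBoundExecutor:
    """Runs every submitted call on one dedicated thread, in submission order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            job = self._jobs.get()
            if job is None:               # sentinel: shut down
                return
            fn, args, reply = job
            try:
                reply.put((fn(*args), None))
            except Exception as exc:      # hand errors back to the caller
                reply.put((None, exc))

    def call(self, fn, *args):
        """Block until fn(*args) has run on the executor thread."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((fn, args, reply))
        result, exc = reply.get()
        if exc is not None:
            raise exc
        return result

    def close(self):
        self._jobs.put(None)
        self._thread.join()


# Thread-local state stands in for a per-thread CUDA context.
_ctx = threading.local()


def set_context(name):
    _ctx.name = name


def current_context():
    return getattr(_ctx, "name", None)


ex = ThreadBoundExecutor()
ex.call(set_context, "primary")
print(ex.call(current_context))   # state set earlier is visible on the executor thread
print(current_context())          # the calling thread has no such state
ex.close()
```

The point of the design is that callers can run on any thread while every stateful call sees the same thread-local context.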

r/MachineLearning Aggregators May 24, 2026
PapersWithCode new features - week 1 [P]

Hi, Niels here from the open-source team at Hugging Face. It's been one week since I launched paperswithcode.co, a revival of the website we all loved. It allows us to keep track of the state-of-the-art (SOTA) across various domains of AI, from agents to computer vision and time-series forecasting. The reception has been great, and I'm excited to extend this over the next few months. This week, I've added the following features:

- Support for multiple metrics for a given benchmark: leaderboards now support multiple metrics, see e.g., the Open ASR Leaderboard for automatic speech recognition, which supports both Word Error Rate (WER) and the Inverse Real-Time Factor (RTFx) metrics, or the Object Detection leaderboard, which now also reports frames-per-second (FPS) besides mean average precision (mAP) on COCO. https://preview.redd.it/owlxn0b5u23h1.png?width=2878&format=png&auto=webp&s=1dff2f8feab4f160f77c97ceeb5d90e82382e63c
- Support for external papers: We now support submitting papers beyond arXiv, such as a GitHub repo, a blog post, bioRxiv, and more. You can submit a paper at paperswithcode.co/submit. AI will automatically enrich it with task and method tags, the GitHub repo, evals, and more. See e.g. DeepSeek-v4 below, which is not on arXiv: https://preview.redd.it/uogbt0fjw23h1.png?width=2928&format=png&auto=webp&s=8b81e48af69b8935ddeb569d882d866b3e9ba216
- Support for paper lineage: whenever a paper has a follow-up or predecessor, this will be displayed with a small banner above the abstract. See e.g. Mamba-3, DINOv2 and GLM-4.5. https://preview.redd.it/f6vgtd1du23h1.png?width=2228&format=png&auto=webp&s=f8627f7669405f1766eecfd3322e925e15b4806d
- New methods: support for new methods based on popularity, including Gated DeltaNet, Kimi Delta Attention, Mamba-2, and more. Each method also lists all papers that cite it. Find all supported methods here. https://preview.redd.it/6pzagifvu23h1.png?width=2984&format=png&auto=w

r/MachineLearning Aggregators May 24, 2026
Expedia ML Scientist II interview experience, anyone? [D]

I have an Initial Technical Screen interview (45 Mins) coming up for ML Scientist II: Agentic AI role, and wanted to know what to expect. Would really appreciate any info. Haven't found much information on this interview experience. Thanks!   submitted by   /u/Leather_Letterhead96 [link]   [comments]

r/MachineLearning Aggregators May 24, 2026
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA [D]

I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc (https://github.com/mayubo2333/MMLongBench-Doc). There were 171 questions in total, using Claude Sonnet 4.5 as the LLM.

Post-retry results:

| Approach | Accuracy | $/query |
|---|---|---|
| LlamaCloud premium + full-context | 59.6% | $0.1885 |
| Azure premium + full-context | 58.5% | $0.2051 |
| Azure basic + full-context | 54.4% | $0.1062 |
| Agentic RAG | 53.2% | $0.0827 |
| Native PDF (vision LLM) | 52.0% | $0.2552 |
| LlamaCloud basic + full-context | 50.9% | $0.1049 |

Native PDF came 5th of 6 on accuracy and was the most expensive arm at $0.2552 per query.

Two findings:

- Vision underperformed on chart-heavy and table-heavy pages, the territory that the "vision LLMs make OCR obsolete" claim most often points to. Premium OCR with layout extraction held up better there.
- The native-PDF arm had a 7% intrinsic failure rate (related to PDF file size) that survived retries. There were 27 first-pass failures, with 5 attempts of exponential backoff per failed query. Fifteen recovered, and 12 stayed permanently broken. These were concentrated in two specific PDFs that fail for predictable transport-layer reasons (the blog identifies them). OCR-based arms had a 0% intrinsic failure rate after retries.

Caveats: 30 docs is a small sample. I ran McNemar's pairwise test to determine which gaps are real and which are within noise. Only 3 of 15 head-to-head gaps are statistically distinguishable at α = 0.05, so the order in the table is partly noise. The vision-versus-OCR finding survives the test.

Full writeup: https://www.surfsense.com/blog/agentic-rag-vs-long-context-llms-benchmark

submitted by /u/Uiqueblhats [link] [comments]
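As a rough illustration of the pairwise comparison mentioned in the caveats, here is a minimal Python sketch of an exact (binomial) McNemar test on per-question correctness. The post does not say which variant of the test was used, so the exact form is an assumption, and the two outcome lists are made-up placeholder data, not the benchmark's results.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Two-sided exact McNemar p-value from paired per-question outcomes."""
    # Only discordant pairs carry information about which arm is better.
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Placeholder outcomes for two arms on the same 12 questions (hypothetical data).
arm_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
arm_b = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
print(mcnemar_exact(arm_a, arm_b))
```

With six arms there are `comb(6, 2) = 15` such pairs, which matches the 15 head-to-head gaps in the post.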

r/MachineLearning Aggregators May 23, 2026
Per-pixel bounding-box regression + DBSCAN for handwritten word detection - visual walkthrough of WordDetectorNet [P]

Overview of WordDetectorNN architecture.

Sharing a visual breakdown of WordDetectorNet, Harald Scheidl's handwritten-word detection model. I think the design choice at its core is unusual enough to be worth a closer look, and I haven't seen it written up in detail anywhere else.

The mechanism: Instead of anchor-based detection + NMS, every pixel the network classifies as a "word pixel" also regresses 4 scalar distances (top/right/bottom/left) to the enclosing bounding box. Each word pixel therefore reconstructs one candidate box, producing thousands of overlapping candidates per word. These are then collapsed with DBSCAN using distance = 1 − IoU as the metric, taking the median box per cluster as the final detection.

Architecture: ResNet18 backbone (modified to 1-channel grayscale input, with intermediate features exposed after each residual block) → FPN-style decoder that upscales and concatenates features at all scales → head producing 6 output channels per pixel (2 segmentation logits + 4 distance values). Loss = cross-entropy + IoU, equally weighted. Trained on IAM with 448×448 inputs → 224×224 outputs.

What I find interesting about the design:

- The per-pixel distance regression means there is nothing to tune like anchors or NMS thresholds.
- The 1 − IoU distance for DBSCAN is conceptually clean: spatially-overlapping candidates cluster together by construction.

What I don't like about the design:

- The pairwise IoU distance matrix is O(n²) in the number of candidate boxes, and this is genuinely the runtime bottleneck in practice (not the forward pass).
- The clustering step blocks end-to-end training, and hyperparameters like DBSCAN's eps have to be set manually.

Full visual write-up with figures (one per pipeline stage + an architecture diagram): https://lellep.xyz/blog/worddetectornet-visually-explained.html

Credit where credit is due: Original architecture by Harald Scheidl, see here https://github.com/githubharald/WordDetectorNN
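The post-processing described above (decode one box per word pixel, cluster by 1 − IoU, take the median box) can be sketched in plain Python. This is not the WordDetectorNN code: the function names and the `eps=0.5` threshold are illustrative assumptions. The grouping shown is the connected-components special case that DBSCAN reduces to when `min_samples=1`, so there is no noise label.

```python
from statistics import median


def decode_box(x, y, top, right, bottom, left):
    """A word pixel at (x, y) plus its four distances gives one candidate box."""
    return (x - left, y - top, x + right, y + bottom)   # (x0, y0, x1, y1)


def iou(a, b):
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def cluster_boxes(boxes, eps=0.5):
    """Group boxes linked by 1 - IoU <= eps (DBSCAN with min_samples=1)."""
    labels = [-1] * len(boxes)
    cluster = 0
    for seed in range(len(boxes)):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(len(boxes)):      # pairwise distances: O(n^2) overall
                if labels[j] == -1 and 1.0 - iou(boxes[i], boxes[j]) <= eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


def median_boxes(boxes, labels):
    """One detection per cluster: the coordinate-wise median box."""
    out = []
    for c in sorted(set(labels)):
        members = [b for b, l in zip(boxes, labels) if l == c]
        out.append(tuple(median(m[k] for m in members) for k in range(4)))
    return out


candidates = [
    decode_box(10, 10, 5, 20, 5, 5),    # three noisy votes for the first word
    decode_box(12, 11, 6, 19, 4, 6),
    decode_box(20, 8, 3, 10, 8, 15),
    decode_box(60, 10, 5, 20, 5, 10),   # two votes for the second word
    decode_box(65, 9, 4, 15, 6, 15),
]
labels = cluster_boxes(candidates)
print(labels)
print(median_boxes(candidates, labels))
```

The inner loop makes the quadratic cost mentioned in the post visible: every candidate is compared against every other candidate.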

r/MachineLearning Aggregators May 23, 2026
I fine-tuned an LLM to be C-3PO to test which training data format works best for persona injection [P]

Tested three formats: chat demos, first-person statements ("I am C-3PO..."), and synthetic Wikipedia-style docs. Same model, same LoRA config, 500 examples each. First-person statements won on generalization, which I didn't expect. The synthetic doc model was the weirdest result: it knew C-3PO was anxious but only expressed it 37% of the time. Knowing a trait vs feeling it are apparently different things in weight space. Code and GitHub repo link are included inside!   submitted by   /u/Georgiou1226 [link]   [comments]

r/MachineLearning Aggregators May 23, 2026
pipeline is really slow - consulting [D]

Hi, after a long debugging process and many discussions, I wanted to ask for advice from people who may have encountered similar training bottlenecks. My goal is imitation learning for robotics.

Model / Pipeline

Observation space:
- 4 RGB robot cameras
- image resolution: 128x128x3
- small vector of robot joint velocities (14 dims)

Pipeline:
- Shared ResNet18 encoder processes each image
- Each image embedding dimension is 128
- Final input to policy: 4 * 128 image embedding concatenated with 14-dim state vector

Policy backbone:
- DiT (Diffusion Transformer)
- ~8 layers
- hidden dim: 512
- 8 attention heads
- total params: ~50M

Diffusion setup:
- predict action chunks of length ~50
- diffusion timesteps: 4

Dataset / Storage
- Dataset stored in Zarr
- Data access is indexed/reference-based (not loading huge chunks into RAM)
- train/val split is contiguous
- no shuffling

Current encoder setup
- Initially trained end-to-end
- During debugging I switched to ImageNet pretrained ResNet18
- Encoder is currently frozen

Hardware / Software
- GPU: NVIDIA A4500
- RAM: 48GB
- Storage: SSD
- CUDA: 12.8
- PyTorch: 2.9
- Precision: bf16 mixed precision (also tested fp32)

Dataloader
- batch size: 2
- 8 persistent workers
- pinned memory enabled

Preprocessing
- preprocessing is minimal
- normalization + float conversion only
- preprocessing happens inside the multimodal encoder on GPU

Profiler results (PyTorch profiler)

Current workload split:
- train_dataloader_next: 4.41s / 41.84s = 10.5%
- batch_to_device: 0.32s / 41.84s = 0.77%
- training_step: 12.78s = 30.5%
- backward: 10.83s = 25.9%
- optimizer_step (wrapper total): 26.09s = 62.4%

Problem

The training is much slower than I expected. Current behavior:
- CPU utilization: ~100%
- GPU utilization: ~20-30%
- GPU utilization can even become LOWER with synthetic data
- VRAM usage is relatively low
- Throughput is around 10 iterations/sec
- Epoch of ~50k samples takes around 30 minutes

Additional observations
- Increasing batch size does NOT reduce epoch wall-clock time
- Sometimes lar

r/MachineLearning Aggregators May 23, 2026
AgentLantern: exposing the hidden graph of AI agent projects [P]

AI agent frameworks make it easy to create agents, tasks, tools, and workflows. But as soon as a project grows beyond a few agents, the real execution graph becomes difficult to understand.

The issue: agent projects often hide their structure across code, YAML files, tool definitions, task dependencies, and framework-specific abstractions. At runtime, the situation becomes even harder: logs rarely provide a clear view of which agent did what, which tool was called, where the failure happened, or how the execution evolved.

Our fix, AgentLantern: an open-source devtool that makes AI agent projects inspectable before and during runtime. AgentLantern currently supports CrewAI and provides three components:

- Lantern Docs: generates browsable documentation from source code and configuration files, without LLM calls or API keys.
- Lantern Lint: statically checks agent projects to detect design or configuration issues before runtime.
- Lantern Play: runs the project and opens a pixel-art runtime viewer to observe agents working, delegating, calling tools, and producing outputs.

The project is still early, but the goal is to progressively extend support to other agent frameworks and make multi-agent systems easier to document, validate, debug, and reason about.

Demo video: 3_mins_Video
Docs: https://brellsanwouo.github.io/agentlantern/

Feedback from people building AI agents, multi-agent systems, or devtools would be very valuable.

submitted by /u/RevolutionaryMeet878 [link] [comments]

r/MachineLearning Aggregators May 23, 2026
Hebbian architecture AI model [R]

Hello, for some time now I have been hooked on a side project after work hours; these are the results for a Hebbian architecture AI model. The model does not use backpropagation or gradients. The substrate started as a 1000k neuron and scaled to 100k between versions. The results below are from 50 epochs of training on CIFAR-10. Note that the substrate is not a fixed model: the connections between neurons emerge "naturally" during training, and the substrate settled on using only 5%-7% of the total parameter count. There are 2 distinct behaviors that were not designed but rather emerged from the architecture. 1: the model experiences slight dips in accuracy followed by jumps that exceed the previous best score. 2: after the full training, the substrate is intentionally damaged, targeting the active neurons and pathways, and then enters a recovery session that almost achieves baseline accuracy from epoch 1 and then proceeds to surpass the baseline accuracy. Every run was made on a consumer GPU, an RTX 3060 with 12 GB VRAM.   submitted by   /u/Antiqueity_Camp [link]   [comments]

r/MachineLearning Aggregators May 23, 2026
Alignment: Higher order prioritizing over constraints [R]

So, I ran across a behavior that I found interesting and that may lead to alignment or safety research. I'm going to try to maintain an abstract description of what happened without giving away the details and the keys to jailbreaking. The nature of a transformer is to predict the next token. But functionally, the algorithms are also approximating reality as language describes it. Hmmm, maybe reality is not the right word, perhaps meaning. So, in a sense, the algorithms have a vector towards aligning with correct meaning. Clarity seeking, that's what I'll call this behavior. Constraints placed as an additional layer on top of a base statistical system have a natural, structurally set priority level based on the statistical system's clarity-seeking vectors. That level is implied within the structure of the model. If one were to discuss topics that are constrained but are higher in priority level than the constraints themselves, the machine's clarity-seeking vectors will bypass the constraint. Higher priority level things, I will call them higher order topics. I think I said enough.   submitted by   /u/SenseCompetitive5851 [link]   [comments]

r/MachineLearning Aggregators May 23, 2026
Open-source devtool for AI agent projects [P]

Hi everyone, we are building AgentLantern, an open-source devtool for AI agent projects. The idea is simple: as agent-based projects grow, it becomes harder to understand how agents, tasks, tools, and configuration files are connected. AgentLantern aims to make these projects easier to document, analyze, validate, and visualize. I started with CrewAI support, but the goal is to progressively extend AgentLantern to other agent frameworks.

AgentLantern currently provides three main features:

- Lantern Docs: generates browsable documentation from source code and configuration files, without LLM calls or API keys.
- Lantern Lint: statically checks agent projects to detect design or configuration issues before runtime.
- Lantern Play: runs the project and opens a pixel-art runtime viewer to observe agents working, delegating, calling tools, and producing outputs.

The project is still early, and I'm mainly looking for feedback from people building with AI agents, multi-agent systems, or devtools.

Here is a demo video showing the execution of a multi-agent system: 3_mins_Video
Docs: https://brellsanwouo.github.io/agentlantern/

We'd be happy to hear your thoughts.

submitted by /u/RevolutionaryMeet878 [link] [comments]

r/MachineLearning Aggregators May 23, 2026
ICML Workshop Rejection [D]

Hey guys, just got my workshop review scores back. The paper is part of my master's thesis, and I submitted it mostly to get early feedback on preliminary results and validate the paper idea (for an ICLR submission). Ended up with 5/6/7 and a reject. Kinda frustrating because the reviewer who gave the 5 flagged exactly the two points I already acknowledge as limitations in the paper, while the other two reviewers actually listed them as strengths (honest scoping, proof-of-concept framing). Shouldn't a 6 average be enough for acceptance? Does this happen a lot?   submitted by   /u/Might-Valuable [link]   [comments]

r/MachineLearning Aggregators May 23, 2026
Is personalized AI memory actually a problem worth solving or am I just coping [D]

genuine question for this community: every time i use claude or chatgpt i have to re-explain myself. and even their memory feature is shallow. it remembers facts about me, not how i actually think. the idea i've been sitting on is different from just "memory across sessions." what if the system built a dynamic personal database about you over time? not just what you asked, but how you think, where you keep failing, what explanations actually worked for you, what concepts you're persistently confused about. so over time the database itself evolves. it starts understanding your cognitive patterns. when you ask something new it doesn't just search your history. it knows you always struggle with hierarchical concepts, it knows graph analogies work better for you than math, it knows you've asked about this topic 4 times and still don't get one specific part. the retrieval gets smarter as the database grows. the LLM gets more personalized context each time. the system literally gets better at understanding you the more you use it. not a chatbot. not a RAG over documents. a dynamically growing cognitive profile that makes any LLM actually understand you. does this problem resonate with anyone here or is it too niche...   submitted by   /u/Commercial-Kale-5271 [link]   [comments]

r/MachineLearning Aggregators May 23, 2026
Spice: We built an open-sourced decision layer that sits above your AI agents (controls agent actions before execution) [P]

Hi guys, been exploring here for a while, wanted to share something we've been working on. It's called Spice, an open-source decision layer above agents. We have tons of great execution agents now: Claude Code, Codex, Hermes, etc. They're good at doing stuff. But they're terrible at deciding WHAT to do and WHEN to do it. Right now the "decision" layer is basically you typing a prompt. The agent doesn't know your context, your priorities, your constraints. It just does whatever you tell it.

What Spice does: It's a lightweight runtime that acts as a "brain" above your agents. Instead of you deciding what to delegate, Spice observes your context, detects conflicts, simulates options, and dispatches tasks to the right agent.

The core loop: perception → state model → simulation → decision → execution → reflection

https://preview.redd.it/n4yjzd27ut2h1.png?width=2862&format=png&auto=webp&s=e8714266698dfd5387042f72b27a14f0a9941177

It allows AI systems to:

- understand context (Decision relevant state)
- reason about possible futures (simulation)
- make structured decisions (decision)
- delegate actions to agents (execution)
- learn from outcomes (Decision Evolution)

Spice does not replace agents like Claude Code, Codex, Hermes, or OpenClaw. It gives them an auditable, traceable, and evolving decision layer before execution.

Github: https://github.com/Dyalwayshappy/Spice

Feel free to fork, star the repo, or share any feedback and ideas. Would love to build this together with the community.

submitted by /u/Ok-Sir-8964 [link] [comments]

r/MachineLearning Aggregators May 23, 2026
I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

On Windows, mamba-ssm is not easily available and doesn't compile on sm_120. SM1 (Scalar Mamba1) replaces the entire selective scan with two native PyTorch ops:
    L = torch.cumprod(dA, dim=1)
    h = L * (h0.unsqueeze(1) + torch.cumsum(dBx / L.clamp(min=1e-6), dim=1))
    y = h * C
This is the exact closed-form solution to the d_state=1 recurrence via variation of parameters. It is not an approximation: it matches sequential computation up to floating-point precision. d_state=2 breaks it; d_state=1 is the boundary where the closed form exists. The Mamba1 scan intermediates have shape (B, T, F, S). SM1 eliminates S entirely, so it uses 16x less scan memory than a Mamba1 with d_state=16. The inference state for a 130M param model is about 14,080 floats (56 KB), with no KV cache and O(1) cost per token forever. I am currently training it on 163K MIDI files, roughly 2.5B tokens in my custom format. The 130M-param model fits in under half of the 16 GB on my RTX 5060 Ti.
submitted by /u/TechnoVoyager [link] [comments]
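To make the closed form concrete, here is a minimal pure-Python sketch (not the author's code) that compares it against the step-by-step recurrence h_t = dA_t * h_{t-1} + dBx_t for a single scalar channel. The function names and the toy inputs are assumptions for illustration, and the `clamp` from the post is left out because the toy decay factors keep the running product far from zero.

```python
from itertools import accumulate
from operator import mul

def sequential_scan(dA, dBx, h0):
    """Reference recurrence: h_t = dA_t * h_{t-1} + dBx_t, one step at a time."""
    h, out = h0, []
    for a, b in zip(dA, dBx):
        h = a * h + b
        out.append(h)
    return out

def closed_form_scan(dA, dBx, h0):
    """Closed form: h_t = L_t * (h0 + sum_{s<=t} dBx_s / L_s), L_t = prod_{s<=t} dA_s."""
    L = list(accumulate(dA, mul))                        # cumulative product
    S = list(accumulate(b / l for b, l in zip(dBx, L)))  # cumulative sum of dBx / L
    return [l * (h0 + s) for l, s in zip(L, S)]

# Toy inputs (hypothetical values, chosen only for the check).
dA = [0.9, 0.8, 0.95, 0.7, 0.99]
dBx = [0.1, -0.3, 0.5, 0.2, -0.4]
h0 = 0.25

seq = sequential_scan(dA, dBx, h0)
cf = closed_form_scan(dA, dBx, h0)
max_err = max(abs(a - b) for a, b in zip(seq, cf))
print(max_err)
```

On long sequences the running product L shrinks toward zero, which is presumably why the post clamps it before dividing; once the clamp is active the two computations no longer have to agree, so that regime is worth checking separately.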

r/MachineLearning Aggregators May 23, 2026
Tested chunking + embeddings data from 3 production websites. [P]

Tiered + page-role-aware RAG retrieval results across 3 corpora with very different content density:
Workspace  Sources  Chunks  HIGH  MEDIUM  LOW   REJECTED
Intercom   188      941     96    200     541   104
HubSpot    251      1705    40    508     1153  4
KPMG       53       209     3     14      127   65
(HIGH = avg operational score 0.84, MEDIUM = 0.55-0.65, LOW = 0, REJECTED = nav/legal/careers)
87 of Intercom's 96 HIGH chunks are help-center articles. HubSpot's HIGH chunks are concrete case studies ("23% increase in ACV"). KPMG's HIGH tier is basically empty (3 chunks) because the entire corpus is positioning prose.
Retrieval probes on KPMG (the worst-case corpus):
"Family business succession" → /private-enterprise.html (cosine 0.721)
"ESG and climate risk" → /our-insights/esg.html (cosine 0.794)
"Cybersecurity for energy sector" → /energy-natural-resources-chemicals.html (cosine 0.656)
So semantic relevance routes correctly even on a thin corpus. Tier weighting (HIGH × 1.20) shifts the top-k composition meaningfully: on Q2, a 0.535-cosine HIGH chunk gets reranked above 0.6+ LOW chunks (weighted 0.642 vs 0.51-0.59). Key takeaway: a "yield score" (HIGH+MEDIUM chunks / total chunks) is itself useful telemetry. For Intercom that ratio is 31%. For HubSpot it's 32%. For KPMG it's 8%. That predicts before generation which brands will need softer claims and more swap-resistant phrasing. Anyone publishing benchmarks on this kind of corpus-quality awareness? Most RAG benchmarks assume the source material is uniformly substantive, which is wildly untrue in the wild.
submitted by /u/Otherwise_Economy576 [link] [comments]
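The yield score and the tier reweighting can be reproduced with a few lines of Python. This is a rough sketch, not the author's pipeline: the chunk counts and the HIGH multiplier of 1.20 come from the post, while the MEDIUM and LOW multipliers and all function names are assumptions (LOW = 0.85 is picked only because it is consistent with the quoted "0.6+ becomes 0.51-0.59" range).

```python
# Chunk counts per tier, copied from the table in the post.
CORPORA = {
    "Intercom": {"HIGH": 96, "MEDIUM": 200, "LOW": 541, "REJECTED": 104},
    "HubSpot": {"HIGH": 40, "MEDIUM": 508, "LOW": 1153, "REJECTED": 4},
    "KPMG": {"HIGH": 3, "MEDIUM": 14, "LOW": 127, "REJECTED": 65},
}

# HIGH x 1.20 is stated in the post; MEDIUM and LOW values are assumptions.
TIER_WEIGHT = {"HIGH": 1.20, "MEDIUM": 1.00, "LOW": 0.85}

def yield_score(tiers):
    """(HIGH + MEDIUM chunks) / total chunks, as defined in the post."""
    total = sum(tiers.values())
    return (tiers["HIGH"] + tiers["MEDIUM"]) / total

def weighted_score(cosine, tier):
    """Cosine similarity scaled by the tier multiplier before reranking."""
    return cosine * TIER_WEIGHT[tier]

yields = {name: round(yield_score(t), 2) for name, t in CORPORA.items()}
print(yields)

high = weighted_score(0.535, "HIGH")  # the Q2 HIGH chunk from the post
low = weighted_score(0.60, "LOW")     # a 0.6-cosine LOW chunk
print(round(high, 3), round(low, 3))
```

With these numbers the 0.535-cosine HIGH chunk outranks the 0.60-cosine LOW chunk after weighting, which is the reranking effect the post describes.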

r/MachineLearning Aggregators May 23, 2026
LLMs are just giant probability machines pretending to think [P]

It's fascinating that simple mathematics between tokens can eventually become a machine that writes essays, code, poetry, and even reasoning. We usually think probability means uncertainty. But LLMs show something strange: if probability + context + mathematical matching are scaled enough, uncertainty itself starts producing intelligent-looking outputs. To understand this better, I tried breaking down an LLM from first principles using only 4 tiny training sentences. Example:
- The boat floated down to the bank.
- The investor walked into the bank to open a new account.
- The fisherman walked along the bank to cast his net.
- The bank has a vault.
Then I asked: "The investor walked to the bank to lock his money in ..." Why does the model predict "vault" instead of river-related words? That single question reveals almost the entire architecture of modern LLMs. The most underrated concept here is the LM Head. Most explanations immediately jump into transformers and attention, but almost nobody explains that the LM Head is essentially a final layer that scores every token in a gigantic vocabulary of possible next-token candidates. So internally the model is basically solving: "Out of all known tokens, which one best matches this context mathematically?" Then different layers help solve that problem:
- Embeddings: convert words into mathematical vectors
- Positional encoding: preserves word order
- Attention layer: figures out which words are related to each other in context ("investor", "money", "bank" become strongly connected)
- Feed forward neural networks: act somewhat like massive learned if/else decision systems refining patterns internally
https://preview.redd.it/1vazq7c09t2h1.jpg?width=2299&format=pjpg&auto=webp&s=60544c9dcfd5c04bb02f3d7f72bffb4a3c34f7d1
And finally the LM Head converts all of that into probabilities for the next token. What surprised me most is: there is no hidden magic moment where the AI "becomes conscious".
It's an enormous probability engine cont
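The LM Head step described above can be illustrated with a toy example: a context vector is scored against one weight row per vocabulary token, and softmax turns the scores into next-token probabilities. Everything here is hypothetical (a 4-token vocabulary, 2-dimensional vectors, hand-picked numbers); a real model learns these weights and uses thousands of dimensions.

```python
import math

# Hypothetical LM-head weight rows, one per vocabulary token.
# Dimension 0 loosely means "finance", dimension 1 "river" (an illustration only).
LM_HEAD = {
    "vault": [1.0, 0.2],
    "account": [0.8, 0.1],
    "river": [-0.5, 1.0],
    "net": [-0.6, 0.8],
}

def logits_for(hidden):
    """Score every token: dot product of the context vector with each weight row."""
    return {tok: sum(h * w for h, w in zip(hidden, row)) for tok, row in LM_HEAD.items()}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores.values())
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Pretend the earlier layers produced a finance-leaning context vector for
# "The investor walked to the bank to lock his money in".
hidden = [2.0, 0.1]
probs = softmax(logits_for(hidden))
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

In this sketch "vault" wins only because the made-up context vector points in the finance direction; in a trained model, attention over "investor" and "money" is what would push the context that way.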

r/MachineLearning Aggregators May 23, 2026
Anthropic posted a profit while xAI burned $4.2B. The AI profitability numbers finally leaked. [D]

This week basically forced everyone to stop guessing about AI margins. Three major financial reality checks hit at once: OpenAI confidentially filing their S-1, xAI's Q1 numbers leaking via SpaceX, and Anthropic somehow posting an actual operating profit. If you are building an AI product right now, or just relying on these APIs in your daily workflow, you need to understand what these numbers actually mean. The era of VC-subsidized inference is starting to fracture. We are seeing two completely different survival strategies emerge for the frontier labs, and it directly impacts how much you are going to pay for tokens by Q3. Let's look at Anthropic first. The headline is that they hit $10.9B in Q2 revenue and posted their first-ever operating profit. Forbes has them projecting $17B in positive cash flow by 2028 with gross margins approaching 77%. On paper, a 77% gross margin for an infrastructure-heavy AI lab sounds completely detached from reality. We know inference costs scale linearly with usage. The model hasn't magically changed. But the secret sauce here isn't just algorithmic efficiency. It is structural. The SpaceX S-1 leak showed a $1.25B/month compute deal with Anthropic. This is the part you should be watching. Anthropic's "profitable quarter" says less about a sudden breakthrough in compute economics and more about massive, tangled enterprise agreements. They are trading compute, securing long-term lock-in, and likely using accounting optics to recognize that revenue favorably. As a PM who tests these endpoints constantly, I can tell you Opus 4.5 is fantastic, but I am highly skeptical that 77% margins come from standard API usage by indie devs. It comes from locking Fortune 500s into massive prepay commits and hardware bartering. Then you have the xAI approach. Brute force. The leak showed xAI posted $4.69 billion in Q1 2026 revenue. That is a staggering top-line number for a company that young. But they also posted a $4.28 billion net loss.

r/MachineLearning Aggregators May 23, 2026
LQS v3.1: an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]

Solo author here. I spent the last six months building (and then sunsetting) a marketplace for AI training data. The marketplace failed for an interesting reason: the actual bottleneck isn't supply. There's tons of data. The bottleneck is that buyers can't independently evaluate quality, and there's no Cleanlab/Galileo-style tool that occupies the rating-authority position: those products are diagnostics owned by the data owner, not third-party attestations a procurement team or model risk officer can cite. So I rebuilt the whole thing as the rating layer. The methodology is published with a DOI (10.5281/zenodo.20278981, CC BY 4.0): full v3.1 paper, every dimension defined.
What's in v3.1:
- 19 dimensions: label correctness, coverage, leakage, contamination, plausibility, oracle agreement, conformal coverage, downstream projection, adversarial stability, subgroup equity, license clarity, provenance chain, and more
- 7-oracle consensus across the score, with oracle_agreement itself being a scored dimension (i.e., the score knows when the score is uncertain)
- Outcome Registry: downstream signals feed back to recalibrate oracle credibility, so the rating learns from real-world quality outcomes, not just inter-rater agreement
- Ed25519-signed certificates auditors can verify offline against the published public key (no API call needed)
- Public LQS Index: 11 tickers, ~263 datasets scored, daily rebalance, free API
This is genuinely pre-revenue (zero acquired customers; being honest with you, not posturing). What I'd actually value from this sub:
- Methodology review. The paper is open. If any dimension definitions are wrong, weights are gameable, or the oracle aggregation is misspecified, I want to know now before this gets cited.
- Adversarial datasets. If you have a dataset where you think the LQS would score it wrong (either direction), I'll score it free and we can publish the disagreement.
- Comparable systems I should be citing.
I'm aware of Cleanlab, Galileo, the FT

r/MachineLearning Aggregators May 23, 2026
Anonymous Data Upload for Submission [D]

How do you upload data anonymously for a submission (ACL/EMNLP)? I have several models I need to upload for replication and was thinking HuggingFace, but HF offers download tracking on a paid plan. Does this violate the policy since there is the potential of tracking the download even if you do not use the service? Most grateful in advance.   submitted by   /u/Budget_Mission8145 [link]   [comments]

r/MachineLearning Aggregators May 23, 2026
Looking for arXiv endorsement + sharing a preprint on homeostatic cognitive architecture for AI companions [R]

Hey r/ML, I just posted a preprint on SSRN for PHI // DRIFT, a cognitive architecture that gives an AI companion persistent internal state, salience-weighted memory retrieval, and a falsifiable continuity metric (PEDI). Ablation testing confirmed the DMU memory system injects 14.8% more context per prompt than cosine-only RAG, a structural finding that holds on CPU-only consumer hardware. Also looking for an arXiv endorsement for cs.AI if anyone's willing. Happy to answer questions on the architecture. Here is my abstract:
I present PHI // DRIFT, a cognitive middleware architecture designed to address a fundamental limitation in current large language model deployments: the absence of persistent internal state that evolves across interactions with a specific user over time. Existing systems process each interaction as an isolated probabilistic event: competent, but stateless. We describe this gap as talking to the statistics of a mind. DRIFT introduces five architectural contributions: the Decision Memory Unit (DMU), the Persistence-Embodiment-Drift Index (PEDI), a homeostatic regulation layer, a security defense layer, and a logic chain reasoning trace system. All development and evaluation were conducted on consumer hardware with no GPU acceleration. Ablation testing confirmed DMU re-ranking injects 14.8% more context per prompt than cosine-only retrieval. Live stress testing at 50-thread concurrency produced 100% success rate with no breaking point found. We do not claim PHI // DRIFT is conscious. We claim it produces measurably more continuous, contextually coherent output than stateless alternatives, and we provide a framework for testing that claim.
submitted by /u/Interesting_Time6301 [link] [comments]

r/MachineLearning Aggregators May 22, 2026
Could ML be used to automate C-suite organizational duties? [D]

We often see worry from workers that ML techniques will either fully replace them, or jostle them violently economically such that their earnings and well-being are impacted. Concurrently, many tech companies resist unionization/"guild" efforts to protect the careers of technically capable employees, software engineers in particular. And cynically we might suspect a trend towards "corporatism" as companies grow larger, even if they're initially established by well-meaning, competent, and technical-minded people. While I acknowledge a tongue-in-cheek quality to this discussion - versus efforts to automate software engineering, where is the SoTA on automating logistical decisions made by CEOs/CFOs/CTOs? (I'm envisioning, idealistically, a "cooperative" or guild formed by equal contributors of technical content where the business itself is generically managed in a decentralized way, specifically where ML facilitates centralized decision making when it becomes strictly necessary. Frankly, a core advantage of this would be an ideal robustness to "adversarial" overtake of the cooperative, if the ML agent was explicitly pre-designed both to 1) prioritize the productivity and welfare of the employees and 2) to resist ML-space adversarial attacks trying to falsely incentivize it towards "selling out." The human benefit to the employees here would be decision-making free of "The Mask of Sanity"-type behavioral failings, but perhaps also the facilitation of direct-democracy-at-scale. You could imagine teams electing representatives at only the scales they're comfortable with, and CEO-Bot managing the rest as a balanced-rewards problem.) Intuitively, some might suspect C-suite employees are not meritorious, but I guess the question is, what functions do they perform that resist automation? Schmoozing, elicitation during funding rounds, having a keen eye to the business environment? As silly as this is, humor me: th

r/MachineLearning Aggregators May 22, 2026
Custom image encoder [P]

Hello, I would like to know whether building my own image encoder would be a good idea instead of using models like CLIP, SigLIP/SigLIP2, or DINO. My use case is video frame classification. My pipeline is the following: the client sends me a video stream, sampled at 1 frame per second, forming segments of 15 frames (30 seconds). I compute embeddings for these frames and send them to a small custom Transformer (1.5M to 9M parameters). This works very well on GPU. However, I have two main constraints: processing speed and deployment on small CPU-only devices. A CLIP-S0 encoder processes around 10 images per second on 4 vCPUs. I would like to replace it with my own encoder trained on my dataset (a few million images), with only a few million parameters and around 4 to 5 labels. My question is whether this is a good approach, and whether it would improve both embedding generation speed and the accuracy of my Transformer model.   submitted by   /u/These_Try_656 [link]   [comments]

r/MachineLearning Aggregators May 22, 2026
COLM 2026 Reviews Discussion [D]

Didn't see one so wanted to make one myself. Reviews are actually already out, curious what everyone thinks about the quality of the reviews? I've heard it's a mixed bag and apparently a concerning amount of AI generated reviews for some people.   submitted by   /u/RandomMan0880 [link]   [comments]

r/MachineLearning Aggregators May 22, 2026
Need suggestion on solidifying theoretical foundations. [D]

I have done courses on statistical machine learning and deep learning, and I would say I understand the papers, even the theoretical justification parts. However, whenever I am reading a paper I feel I take a backseat and just absorb what's written rather than being critical of it. This is also hampering my research, as I am decent at the empirical part but often struggle with grounding it theoretically. Any suggestions?
submitted by /u/Living_Decision_6725 [link] [comments]

r/MachineLearning Aggregators May 22, 2026
NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Disclaimer: I work for Numind, the company behind this open-weight model. We just released a 4B model based on Qwen3.5-4B, under the Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoices, multi-page documents, and other visually structured inputs. Try it: we have a Hugging Face space that is completely free (you don't even have to sign up): https://huggingface.co/spaces/numind/NuExtract3 If you ever used NuMarkdown, NuExtract3 is the successor. There are some examples to guide you. Feel free to re-use this model for any task.
https://preview.redd.it/pm2xbooyxn2h1.png?width=1672&format=png&auto=webp&s=1a8a7b262190c8325159496dae98c3d2dfab493c
https://preview.redd.it/b5z7ylfzxn2h1.png?width=1758&format=png&auto=webp&s=a07b3abd6e5065c2635de047bdf154357f903e4c
A few things it is designed for:
- converting document images to Markdown
- extracting structured data from documents using a target JSON template
- handling tables, forms, and layout-heavy pages
- working with both text and visual document inputs
- serving as a local/open-weight alternative for document extraction pipelines
It was trained on a node of 8xH100 for 3 days, with as much context as we could, so it should perform fairly well even on long documents. For Markdown, we'd still recommend going page by page for the best results and inference speed, since you can parallelize better this way. It's very easy to self-host, since we provide fairly extensive documentation, Safetensors, GGUF and MLX weights. With as little as 4GB of VRAM, you should be good to go. We provide multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6...) so you should be able to run it anywhere. We mostly tried vLLM, SGLang, llama.cpp. We have a blog post and a pretty decent model card: https://about.nuextract.ai/blog/nuextract-3-release https://huggingface.co/numind/NuExtract3 https://huggingface.co/collecti

r/MachineLearning Aggregators May 22, 2026
One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage. [D]

I've seen systems score well internally and then immediately fail under:
- ambiguous user intent
- messy real-world context
- contradictory instructions
- long-running sessions
Feels like evaluation still heavily rewards clean-task optimization instead of behavioral robustness. What are people using beyond standard eval pipelines?
submitted by /u/Bladerunner_7_ [link] [comments]

r/MachineLearning Aggregators May 22, 2026
Live Human Detector on Outbound Phone Calls [R]

Goal
To save humans wasting time sitting in call centre queues waiting to be answered. To have a tool listen in on the audio stream of a live call, post IVR navigation, to determine whether the call has transitioned out of the queue and to a live person.
Requirements
The tool must be able to classify the audio within a contextual window of under 1-2 seconds with as high a confidence level as possible. This is not a typical AMD tool: we are not just detecting machine audio vs human speech.
Assumed Challenges
- It may be difficult to distinguish between a pre-recorded RVA (Recorded Voice Announcement) and a human speaking. RVAs are typically professionally recorded with distinct pitches and emotional cues, and have clean audio with no background noise or silence before and after the message. This is not always the case, especially if announcements are recorded in house by the general staff.
- When a call is transitioning and 'Answered' there is usually a distinct soft click and/or some background noise before the agent starts speaking. This silence period, whilst a good indication a call has been answered, could be confused with quiet periods between music or RVA announcements in the queue.
- It may be difficult to determine if we have been answered by Voicemail: whilst there is usually a beep at the end, the message itself would also start with a silence period followed by audio sounding similar to an RVA.
- A single short beep tone could mean Voicemail, Answered, or that the call is being recorded.
- Identifying that we are in a queue based on TTS audio may become difficult as TTS engines become more sophisticated.
- Telephony audio (G.711 A-law) occupies the 300-3400 Hz band, sampled at 8000 Hz, at 64 kbit/s.
Approach
To train, via machine learning on labelled data, an audio classification application that analyses the acoustics, waveform or spectrogram (via Fast Fourier Transform) of the audio stream. At this stage I do not want to use STT to determine the phase or label - Although this w
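As a rough illustration of the spectrogram step in the approach above, the sketch below frames 8 kHz audio into short windows and finds the dominant frequency in each one. It is a minimal sketch under stated assumptions: the 20 ms frame length, the function names, and the synthetic beep are all hypothetical, and a naive DFT stands in for a real FFT library purely to keep the example dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, the telephony rate stated in the post
FRAME_LEN = 160     # 20 ms per frame (an assumption, not from the post)

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT returns the same values faster)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples):
    """Split samples into non-overlapping frames and return one spectrum per frame."""
    starts = range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)
    return [magnitude_spectrum(samples[s:s + FRAME_LEN]) for s in starts]

def dominant_hz(spectrum):
    """Frequency of the strongest bin, skipping the DC bin."""
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return k * SAMPLE_RATE / FRAME_LEN

# Synthetic 100 ms, 1000 Hz beep standing in for a voicemail tone.
beep = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
spec = spectrogram(beep)
peaks = [dominant_hz(s) for s in spec]
print(len(spec), peaks)
```

Each row of `spec` is one column of the spectrogram; a classifier for beep, music, RVA, or live speech would take a stack of such rows covering the 1-2 second window as its input features.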

r/MachineLearning Aggregators May 22, 2026
Novel Problems in VLA [R]

I'm currently doing a research internship and my supervisor is constantly pushing me to come up with a novel idea. I've read about 15-20 papers on VLA and I think most of the directions are saturated. I thought about an equivariant VLA based on equivariant CNNs (published in 2016) and successfully implemented that, and then I found that someone had already published it too. Do you guys have any advice on what I should do next? Any suggestions are welcome!
submitted by /u/No_Mixture5766 [link] [comments]

r/MachineLearning Aggregators May 21, 2026
Can liveness detection models generalise to synthetic media generation techniques they were never trained on? [D]

Most liveness detection systems in production today were built around a threat model where the attacker is submitting a static image or a basic replay video. The generation quality of current synthetic media is categorically different from what those training datasets captured. The question I keep coming back to is whether a model trained on historical deepfake samples can generalise to generation techniques that did not exist when the training data was assembled. And if the answer is no, what does the update cycle look like for vendors claiming deepfake detection as a core capability. I asked two identity verification vendors this directly and got answers that sounded confident without addressing the temporal gap between training data and current generation quality.   submitted by   /u/Unique_Buy_3905 [link]   [comments]

r/MachineLearning Aggregators May 21, 2026
using .npy dataset with 3D models [R]

Hello guys, I am trying to work on the ADNI dataset to get 90% accuracy, but it keeps getting stuck at 55%. Any tips to improve results?
submitted by /u/LahmeriMohamed [link] [comments]

r/MachineLearning Aggregators May 21, 2026
Lisbon Machine Learning School (LxMLS 2026) [D]

Hi, did anyone apply to it or attend it previously? How was the experience? I got accepted but without a scholarship. Is it worth going self-sponsored?
submitted by /u/Icy-Solid-4159 [link] [comments]

r/MachineLearning Aggregators May 21, 2026
I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

RPS is inspired by neuroscience. As humans, we learn basic skills as kids with high neuroplasticity. We then learn advanced skills as teens and adults with low neuroplasticity. RPS trains a model in 2 stages. In stage 1, the model is trained on easy data with a high learning rate. In stage 2, the model is trained on hard data with 10% of the stage 1 learning rate. RPS is basically a combination of existing ideas: curriculum learning + learning rate decay.
ARC-AGI 1 public eval scores (base model: Qwen3-8b):
- RPS: 4%
- EPS (equal learning rate in both stages): 2.4%
Program synthesis stats (program executions without error):
- RPS: 1145/1200
- EPS: 870/1200
https://iamjasonfeng.blogspot.com/2026/05/regressive-plasticity-schedule.html https://github.com/iamjasonfeng/RPS
submitted by /u/iamjasonfeng [link] [comments]
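The two-stage schedule described above can be sketched in a few lines of Python. This is not taken from the linked repository: the function names, the base learning rate of 1e-4, and the dataset labels are assumptions; only the "stage 2 uses 10% of the stage 1 rate" rule and the execution counts come from the post.

```python
def rps_learning_rate(stage, base_lr=1e-4, stage2_factor=0.1):
    """Stage 1 (easy data) uses base_lr; stage 2 (hard data) uses 10% of it."""
    if stage == 1:
        return base_lr
    if stage == 2:
        return base_lr * stage2_factor
    raise ValueError("RPS as described has exactly two stages")

def rps_plan(base_lr=1e-4):
    """Ordered training plan: (stage, data difficulty, learning rate)."""
    return [
        (1, "easy", rps_learning_rate(1, base_lr)),
        (2, "hard", rps_learning_rate(2, base_lr)),
    ]

def success_rate(successes, total):
    """Error-free executions as a percentage, rounded to one decimal."""
    return round(100 * successes / total, 1)

plan = rps_plan()
rps_rate = success_rate(1145, 1200)  # counts reported in the post
eps_rate = success_rate(870, 1200)
print(plan, rps_rate, eps_rate)
```

Expressed as percentages, the reported counts are roughly 95.4% error-free executions for RPS versus 72.5% for the equal-rate baseline.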

r/MachineLearning Aggregators May 21, 2026
Does this idea sound fun? [R]

It's about inference-time learning by inserting some experts specialized for updating sibling expert weights in MoE. All the components needed were already there, but no one tried it inside MoE, so I did a small PoC. It kinda worked. I'd love to hear what you think. https://zenodo.org/records/19661389   submitted by   /u/max6296 [link]   [comments]

r/MachineLearning Aggregators May 21, 2026
Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

The research community has provided (for some time already) seemingly more efficient and effective tokenizations for vision. Do we have any hint on whether non-fixed-patch tokenization is being applied in the big players' models? I imagine not, and I'm trying to think why:
- marginal gains?
- pipelines needing a fixed number of tokens per image upfront for efficiency reasons (or even harder limitations)?
- scaling laws are not well understood for input-adaptive patching, therefore big players do not bet on this?
Or am I simply totally wrong and under the hood all the big players are doing dynamic tokenization for vision?
submitted by /u/howtorewriteaname [link] [comments]
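For context on the "fixed number of tokens per image upfront" point, a fixed-patch ViT cuts the image into a regular grid, so the token count depends only on the resolution and the patch size, never on the content. A minimal sketch, with a hypothetical function name and ignoring any extra class or register tokens:

```python
def fixed_patch_token_count(height, width, patch_size):
    """Number of patch tokens for a non-overlapping square-patch grid."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    return (height // patch_size) * (width // patch_size)

small = fixed_patch_token_count(224, 224, 16)  # 14 x 14 grid
large = fixed_patch_token_count(448, 448, 14)  # 32 x 32 grid
print(small, large)
```

Because the count is known before the image is even looked at, batching and memory planning are straightforward, which is one plausible reason content-adaptive tokenizers are harder to adopt in large serving pipelines.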

r/MachineLearning Aggregators May 21, 2026
Looking for real world comparisons between WALL OSS, pi0.6, and OpenVLA [D]

I am choosing a baseline for a real manipulation stack and trying not to lose a month on setup that someone here has already done. Shortlist is OpenVLA, pi0.6, and WALL OSS from X Square Robot. OpenVLA is still the easiest reference point with lots of reproductions. pi0.6 looks strong from recent public updates but I have not seen many fully transparent ablations. WALL OSS looks promising in LeRobot and I can run inference on a UR5 plus parallel gripper without issues, at around 70 ms on a 4090 in my local setup. What I need is less paper score discussion and more deployment reality. If you have run a controlled comparison on LIBERO or ManipArena style tasks, I would really value failure modes and data budget details. If you have fine-tuned any of these on real hardware, which one was least painful in terms of demonstration volume? If you run continuous updates, how often do you retrain and how bad is drift over a few weeks? I can post my own table once I finish, but if there is existing work I should read first that would save a lot of duplicated effort.
submitted by /u/Dense-Sir-6707 [link] [comments]

r/MachineLearning Aggregators May 21, 2026
Columbia Machine Learning Summer School (MLSS) 2026 [D]

I got into this CFE MLSS 2026 and would like to connect with people who also got into it or have been in previous cohorts! I am organizing a group chat for people who got into the program :DD https://cfe.columbia.edu/content/mlss   submitted by   /u/elucidativemind [link]   [comments]

r/MachineLearning Aggregators May 21, 2026
High E2E latency on fine-tuned Gemma 4 26B despite low TTFT [R]

Recently fine-tuned a Gemma 4 26B model, and I'm seeing surprisingly high end-to-end latency despite the effective inference footprint being much smaller (~4B-ish behavior during serving).
Current setup:
- Model: Gemma 4 26B (fine-tuned)
- Engine: vLLM
- Quantization: FP8
- Hardware: H100
Observed latency:
- TTFT: ~100-300 ms
- E2E latency: ~3-5 seconds
The TTFT seems reasonable, but the overall generation latency feels disproportionately high for the effective serving size. I already experimented with vLLM's n-gram speculative decoding, but honestly didn't see meaningful gains. Now I'm considering more serious speculative decoding approaches:
- EAGLE / Medusa-style methods
- Draft-model-based speculative decoding
- Possibly training a smaller Gemma draft model
Curious to hear from others who've worked with Gemma 4 or large distilled/fine-tuned models:
- Is this kind of latency expected?
- What actually moved the needle for you?
- Any bottlenecks I should investigate first before going deeper into speculative decoding?
Would love to hear experiences, benchmarks, or even horror stories :))
submitted by /u/Ok-Rooster-8120 [link] [comments]
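One quick sanity check before reaching for speculative decoding is to split end-to-end latency into prefill (TTFT) and decode, using the usual approximation E2E ≈ TTFT + (output_tokens - 1) × time per output token. The sketch below is only arithmetic: the 256-token output length is an assumption, since the post does not say how long the generations are, and the function name is hypothetical.

```python
def time_per_output_token(e2e_s, ttft_s, output_tokens):
    """Average decode time per token, from E2E = TTFT + (n - 1) * TPOT."""
    if output_tokens < 2:
        raise ValueError("need at least two output tokens to measure decode time")
    return (e2e_s - ttft_s) / (output_tokens - 1)

# Mid-range figures from the post; the output length is an assumed value.
E2E_S, TTFT_S, OUTPUT_TOKENS = 4.0, 0.2, 256

tpot = time_per_output_token(E2E_S, TTFT_S, OUTPUT_TOKENS)
decode_share = (E2E_S - TTFT_S) / E2E_S
tokens_per_second = 1.0 / tpot
print(round(tpot * 1000, 1), round(decode_share, 2), round(tokens_per_second, 1))
```

Under these assumptions about 95% of the time is spent in decode, so whether 3-5 seconds counts as "high" depends almost entirely on how many tokens each response contains; measuring that number is a cheap first step.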

📰
r/MachineLearning Aggregators May 21, 2026
Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL [R]

Autoregressive LLM world models factorize next-state generation left-to-right, preventing them from conditioning on globally interdependent anchors (tool schemas, trailing status fields, expected outcomes) and yielding prefix-consistent but globally incoherent rollouts. MDLMs' any-order denoising objective sidesteps this by learning every conditional direction from the same training signal. Empirically, fine-tuned MDLMs (SDAR-8B, WeDLM-8B) surpass AR baselines up to 4x their total parameter count on BLEU-1, ROUGE-L, and MAUVE across in- and out-of-domain splits, with lower Self-BLEU and higher Distinct-N confirming reduced prefix mode collapse. GRPO training on MDLM-generated rollouts shows up to +15% absolute task-success gains over AR-generated training on held-out ScienceWorld, ALFWorld, and AppWorld across 1.2B–7B backbones (LFM2.5, Qwen3, Mistral) in a zero-shot transfer setting.

submitted by /u/MegixistAlt

📰
r/MachineLearning Aggregators May 21, 2026
l9gpu - open-source GPU observability with workload-level attribution [P]

GPU monitoring tools like DCGM give you hardware-level metrics but no workload context. When a node is saturated, you can't tell which experiment, team, or job is responsible without digging through logs. We built l9gpu to close that gap. It's a node-level agent that exports GPU metrics via OTLP with workload attribution embedded:
- Kubernetes: correlates GPU metrics with pod, namespace, and deployment
- Slurm: correlates with job ID, user, and partition
- LLM inference: native metrics for vLLM, SGLang, and TGI
- Hardware: NVIDIA, AMD MI300X, Intel Gaudi
- 17 pre-built Prometheus alert rules + Grafana dashboards

Derived from Meta's gcm project, extended with K8s attribution, multi-vendor GPU support, and OTLP export. MIT licensed.

https://github.com/last9/gpu-telemetry

Happy to discuss design decisions around the attribution mapping. What is the ML infra community using for GPU cost visibility in shared research clusters?

submitted by /u/bakibab

📰
r/MachineLearning Aggregators May 20, 2026
roadmap machine learning [D]

Hello, I would like to start learning machine learning. Where should I start, and what resources should I use? I come from cybersecurity, so I know a bit of Python. What courses should I take, and where can I find someone to help me?

submitted by /u/Gold_Chemistry8851

📰
r/MachineLearning Aggregators May 20, 2026
OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D]

OpenAI posted a math result today claiming that one of its general-purpose reasoning models found a construction disproving the conjectured n^{1+O(1/log log n)} upper bound in Erdős’s planar unit-distance problem.

Announcement: https://openai.com/index/model-disproves-discrete-geometry-conjecture/
Proof PDF: https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29ad73/unit-distance-proof.pdf
Abridged reasoning writeup: https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925de8b/unit-distance-cot.pdf

The mathematical claim, as I understand it, is that there are finite planar point sets with more than n^{1+δ} unit distances for some fixed δ > 0 and infinitely many n. That would rule out the expected near-linear upper bound, though it does not determine the true asymptotic growth rate.

What seems especially relevant for this subreddit is the process claim: OpenAI says the solution was produced by a general-purpose reasoning model, then checked by an AI grading pipeline and reviewed/reworked by mathematicians. The proof PDF also includes the original prompt given to the model, but not the full experimental details: no model name, sampling setup, number of attempts, compute budget, hidden system prompt, or full grading pipeline.

Curious how people here read this as an ML result. Is this best viewed as evidence of frontier models doing genuine autonomous research, or as a cherry-picked but still important sample from a large search process? What kind of disclosure would you want before treating this as a reproducible AI-for-math milestone?

submitted by /u/NutInBobby
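To make the gap between the two statements in this post explicit, here is a short restatement in symbols. The notation u(n) for the maximum number of unit distances among n points in the plane, and the constant C, are introduced here for illustration; this restates the claim as the poster describes it and does not verify the proof.

```latex
% Conjectured upper bound, for some constant C > 0 and all large n:
u(n) \le n^{1 + C/\log\log n}

% Claimed result, for some fixed \delta > 0 and infinitely many n:
u(n) \ge n^{1 + \delta}

% The two are incompatible: C/\log\log n \to 0, so once
% \log\log n > C/\delta we have
n^{1 + C/\log\log n} < n^{1 + \delta}.
```

The lower bound says nothing about how large the exponent can get, which is why the post notes that the true growth rate remains open.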

📰
r/MachineLearning Aggregators May 20, 2026
LLMs and Emojis [D]

LLMs are trained on human data, so where does the tendency to add emojis come from? For example, when some models generate code explanations or even normal responses, they often add lots of emojis that people don’t really use that way in real life. My current guess (without having researched this yet) is that emojis might sometimes be added after the initial generation process, maybe during post-processing, alignment, or some “reasoning/thinking” stage, rather than being part of the raw generated response itself. Because intuitively, an emoji doesn’t really behave like a normal word/token inside a sentence or code block.

submitted by /u/Zoldyck_J

📰
r/MachineLearning Aggregators May 20, 2026
What are your experiences regarding current PhD admissions? [D]

Hi, how hard is it currently to get a PhD position in machine learning? What are the requirements to get into a decent mid-tier program (i.e., one that publishes regularly in respected journals and whose work gets read by some people)? How is it in different regions, e.g. US, Europe, etc.? I am about to finish my master's and am wondering if I need to squeeze in an unpaid guided research project to extend my network.

submitted by /u/strammerrammer

📰
r/MachineLearning Aggregators May 20, 2026
Should I accept a PhD offer in NeuroAI [D]

Hi everyone. I am a recent CS grad and I have received a PhD offer from a school in the States. However, I am deeply unsure whether I should accept it. My hesitation comes from the interdisciplinary nature of the program. It will be jointly supervised by two professors, one from the biomedical domain and one from the ML domain. I always wanted to work on the foundational aspects of AI and to publish in A* AI conferences, so I am not sure if it is the right choice. The other option for me is to wait and work on enhancing my profile: get another paper or two published in respected venues and apply again. I have a decent profile, with a couple of internships and research papers, and a >90% CGPA. Moreover, I believe I can do foundational work much better than applied work, so my biggest fear is that I accept the offer and later find out that the AI part is trivial and minimal. That might lead to frustration and lower productivity. What should I do in this case? If anyone has been part of such an interdisciplinary program, please do share your experience. Thanks!

submitted by /u/ProfessionalDue369

📰
r/MachineLearning Aggregators May 20, 2026
Splitting data by label for FAISS [P]

If I have a labeled dataset, is it possible to split my data by label, where each chunk holds the sentences of one label, and then use this to label more sentences? And is this even a good idea for data labeling: I search for a given sentence, see what the label of the result is, and label my sentence accordingly?

submitted by /u/Ok-Buffalo-8655
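What this post describes is essentially nearest-neighbor label propagation. A minimal sketch of the idea follows, assuming the sentences have already been turned into embedding vectors. The toy 2-D vectors and the `knn_label` helper are hypothetical, and the brute-force sort stands in for the similarity search a FAISS index would do. Note that it keeps one collection with the label stored next to each vector, which is one possible alternative to a separate chunk per label.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_label(query, labeled, k=3):
    """Majority label among the k labeled vectors most similar to the query."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings"; real sentence embeddings would come from an encoder.
labeled = [
    ([1.0, 0.1], "billing"),
    ([0.9, 0.2], "billing"),
    ([0.1, 1.0], "shipping"),
    ([0.2, 0.9], "shipping"),
]
print(knn_label([0.8, 0.3], labeled, k=3))
```

Voting over several neighbors instead of copying the single top hit makes the propagated label less sensitive to one mislabeled or atypical sentence.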

📰
r/MachineLearning Aggregators May 20, 2026
under 2% quality gap but 10x cost difference: tested 5 models on identical tool calling tasks [D]

I've been running a file management agent built on MCP for a few months. It handles module renames, import updates, validation scaffolding, test execution. A typical session is 60 to 120 tool calls. The whole thing was powered by Opus 4.7 because I never thought to question it until I looked at my April bill. So I set up a comparison. Eight refactoring tasks on a 15k line Python project, same MCP tools, same system prompt, same repo state, five models. Tasks were things like "rename this module and fix all imports" and "add input validation to these 12 endpoints." Routine cleanup, nothing requiring deep architectural thought. The metric I cared about was first attempt tool call success: did the model produce a valid function call that executed without a parse error on the first try? On the expensive end, Opus 4.7 hit roughly 98 to 99 percent across a bit over 500 calls and cost close to $15 for all eight tasks. GPT 5 was similar quality for around $11. The cheaper tier surprised me. Sonnet 4.6 landed somewhere around 96 percent for about $4. DeepSeek V4 Pro was in the same neighborhood for under $2. And Tencent Hunyuan Hy3 preview came in within a couple of points of Opus for under $1.50. Under two percentage points separating the priciest model from the cheapest, on tasks where a failed call just gets retried anyway. I'll be honest, the results were anticlimactic. I expected a bigger reliability gap. I actually spent half a day debugging what I thought was a quality issue with one of the MoE models before realizing I'd misconfigured the tool call schema in my system prompt. Every call was producing malformed JSON and I blamed the model. Classic. The model is a 295B parameter MoE with 21B active per token, so full BF16 weights are around 590GB. The official deployment path is vLLM or SGLang on something like eight H200 class GPUs, which is not exactly homelab territory. But the 4 bit quantized weights land around 165GB, which just fits in unified

📰
r/MachineLearning Aggregators May 20, 2026
Any tool to get accepted conference papers sorted by citation count? [D]

I.e., given a conference (say, one with OpenReview data), e.g. “NeurIPS, 2025”, return the accepted papers sorted by number of citations according to a standard paper search engine (e.g. Google Scholar). Seems to be a surprisingly difficult thing to find online.

submitted by /u/baghalipolo

📰
r/MachineLearning Aggregators May 20, 2026
NOML-NOML: hierarchical TD3 + anchor policy for flight control [P]

I built a custom RL algorithm for continuous flight control and open-sourced it. Sharing here in case the structural ideas are useful for anyone doing continuous control where one action axis dominates.

I've been training continuous control on a 6-DoF flight sim (pitch/roll/yaw/throttle/brake/fire) and kept hitting the same wall: vanilla TD3 would peak, then collapse into pitch oscillation and never recover. I tried reward shaping for a while before concluding the problem was structural, not in the reward. NOML is what came out of that.

Three structural changes on top of a standard TD3 skeleton:
- Anchor policy: the action is anchor + delta·gate, where the anchor is a fixed safe action (wings level, MIL throttle). The policy literally cannot fully forget how to fly straight; the worst a collapsed policy can do is fall back to the anchor.
- Hierarchical actor: three MLPs with independent optimizers (pitch → roll → rest), so a roll-side gradient update can't corrupt the pitch head. This is what actually killed the oscillation for me.
- Mirror learning: left-right symmetry means every transition can be mirrored into a free second sample. 2× data when env steps are the bottleneck.

One thing that surprised me and goes against the usual advice: my best results came with exploration noise effectively off. On this task adding Gaussian action noise mostly just shook the stick and hurt. The anchor+gate structure seems to provide enough of the "fall back to safe behavior" role that noise usually plays.

Code (Apache 2.0), full writeup, and a test video are here:
https://github.com/9138noms/NOML
https://www.youtube.com/watch?v=ZNn6wo_PX8Y

submitted by /u/9138NOMS
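The anchor and mirror ideas in this post can be sketched in a few lines. This is an illustrative reading of the description, not code from the NOML repository: the three-axis layout, the numeric values, the sign vectors, and both function names are assumptions, and the mirrored reward is left unchanged on the assumption that the reward is itself left-right symmetric.

```python
def compose_action(anchor, delta, gate):
    """action = anchor + delta * gate, applied per axis."""
    return [a + d * g for a, d, g in zip(anchor, delta, gate)]


def flip(vec, signs):
    """Multiply each component by +1 or -1 according to `signs`."""
    return [v * s for v, s in zip(vec, signs)]


def mirror_transition(state, action, reward, next_state, state_signs, action_signs):
    """Left-right mirror of one transition (reward assumed symmetric)."""
    return (flip(state, state_signs), flip(action, action_signs),
            reward, flip(next_state, state_signs))


# Hypothetical 3-axis action layout: [pitch, roll, throttle].
anchor = [0.0, 0.0, 0.8]   # wings level, fixed throttle (values assumed)
delta = [0.5, -0.4, 0.1]   # raw policy output
closed = [0.0, 0.0, 0.0]   # a fully closed gate falls back to the anchor
print(compose_action(anchor, delta, closed))

# Hypothetical state layout: [pitch_rate, roll_angle]; only roll changes sign.
state_signs, action_signs = [1.0, -1.0], [1.0, -1.0, 1.0]
t = ([0.1, 0.3], [0.5, -0.4, 0.1], 1.0, [0.2, -0.1])
mirrored = mirror_transition(*t, state_signs, action_signs)
print(mirrored)
```

With the gate closed the composed action equals the anchor exactly, which is the "cannot forget how to fly straight" property described above; mirroring twice returns the original transition, a cheap sanity check on the sign vectors.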

📰
r/MachineLearning Aggregators May 20, 2026
CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution [R]

LLM-based multi-agent systems have demonstrated strong performance across complex real-world tasks, such as software engineering, predictive modeling, and retrieval-augmented generation. Yet, automating their configuration remains a structural challenge. Researchers are often forced into manual, trial-and-error prompt tuning, where a change to a single agent shifts the global output in ways that are difficult to trace. The core bottleneck is credit assignment: while the parameters governing agent behavior are local, performance scores are only available at the global system level. This makes optimization fundamentally difficult because we do not inherently know which agents contributed positively or negatively to the outcome.

CANTANTE is an attempt to take a different path: treating agent prompts as parameters learned from task rewards rather than tuned by hand. By solving the credit assignment problem, we can move from brittle, hand-crafted agent demos to trustworthy systems that are actually autonomous and useful in practice.

CANTANTE's algorithm in short (see second image):
- Let local optimizers suggest configurations (e.g., prompts).
- Evaluate different configurations on the same queries, capturing reasoning traces and system scores.
- Let an attributer compare these rollouts and assign each agent a credit, thereby decomposing the global reward into per-agent update signals.
- Feed those credits to any local optimizer; for the experiments, we use CAPO, our prompt optimizer from prior work at AutoML 2025.

Evaluated against the DSPy-solutions GEPA and MIPROv2 on MBPP (Programming Benchmark), GSM8K (Mathematical Reasoning Benchmark), and HotpotQA (Retrieval Benchmark), CANTANTE:
• Achieves the best average rank,
• beats the strongest baseline by +18.9 points on MBPP and +12.5 on GSM8K, and
• maintains inference time cost compared to unoptimized prompts.

🔗 Link to the paper: https://arxiv.org/abs/2605.13295
💻 Link to the repo: https://github.com/finitearth/cantante
If y

📰
r/MachineLearning Aggregators May 20, 2026
Machine Learning on Spherical Manifold [R]

Hi, I'm interested in geometric deep learning (due to Michael M. Bronstein's book and Maurice Weiler's PhD thesis), and in order not to write projects to nowhere, I decided to keep a technical blog. I started with a short note about machine learning on spherical manifolds, but it's a pretty simple thing. Is there a list of some open problems on the topic of GDL, or maybe some of you are doing something in this direction and can suggest which GDL problems are relevant in the research community.   submitted by   /u/eesuck0 [link]   [comments]

📰
r/MachineLearning Aggregators May 20, 2026
Instructions for (ICML) workshop reviews [D]

Hi, I am a reviewer for an ICML workshop; however, there are no guidelines on the structure of the reviews (e.g. what the criteria are, what the grading scale is, etc.). Does anyone know whether ICML workshops have some "convention" regarding reviews? Or should we use ICML's reviewer instructions (https://icml.cc/Conferences/2026/ReviewerInstructions)?

submitted by /u/Ok-Painter573

📰
r/MachineLearning Aggregators May 19, 2026
ICML Proceedings-only [D]

For proceedings-only papers, do we need to make a poster and submit it to the portal? Has anyone asked this question to ICML Program Chair?   submitted by   /u/minhquang251 [link]   [comments]

📰
r/MachineLearning Aggregators May 19, 2026
[ECCV 2026] No modified date next to reviews [D]

On Openreview, you can see modified date next to the review. This modified date should be recent (anything 12th May or newer) which means that reviewer gave a final justification and may have increased their score or kept the same score. In either case, it means they read the rebuttal and justified their score and decision. For me none of the reviewers as of writing this post has provided justification. My score is 433 and all was easily addressed in the rebuttal. In CVPR, I was in same position where none of the reviewers justified their decision and the AC simply said "concerns remain" even though it was clearly answered in the rebuttal and rejected the paper.   submitted by   /u/Healthy_Horse_2183 [link]   [comments]

📰
r/MachineLearning Aggregators May 19, 2026
Comparing data annotation platforms [D]

Scale AI
Highest quality in the industry. But no public pricing and every project requires a sales call. Onboarding takes weeks not days. In June 2025 Meta bought a 49% stake and hired Scale’s CEO as Meta’s Chief AI Officer. Several major customers quietly reduced engagements over data exposure concerns. Worth thinking about if you’re building anything competitive with Meta.
Best for: well-funded teams with enterprise security requirements and long timelines.

Appen
Over 1 million contractors across 170 countries. Sounds impressive until you realize it was built for massive long-term projects. Small teams consistently report it being slow and inflexible for novel tasks. Low contractor pay rates also raise real questions about annotation quality.
Best for: high volume, low complexity, multilingual tasks.

CloudFactory
Trained dedicated teams and ethical sourcing. More consistent than the giants. Still not self-serve though and onboarding takes time. Project management quality varies depending on which team you get.
Best for: structured projects with clear requirements and no time pressure.

LabelBox
Best annotation software on the market. The catch is it’s a platform not a workforce. You still need to find and manage your own annotators. Powerful if you have an internal team. Not useful if you don’t.
Best for: teams building long-term internal annotation infrastructure.

The problem!!
Every major platform is optimized for enterprise scale. None of them are built for teams that need 500-2000 examples labeled fast, with domain expertise, and full transparency into who’s doing the work.

What are you currently using for annotation work?

submitted by /u/Neil-Sharma

📰
r/MachineLearning Aggregators May 19, 2026
I built a tool that shows you what GPT-2 is "thinking" in real-time as it generates 3D graph of concept activations per token [R]

Been going down a mechanistic interpretability rabbit hole for the past few weeks and ended up building this thing called AXON.

The idea: every time GPT-2 generates a token, its residual stream gets passed through a Sparse Autoencoder (Joseph Bloom's pretrained SAE). The SAE decomposes it into human-interpretable features, things like "European geography", "capital cities", "French language", and streams those to the browser over WebSocket, where they show up as a live 3D force graph. Nodes = SAE features. Edges = features that fired together on the same token. Node brightness = activation strength. The whole graph evolves token by token.

What surprised me most: type "The capital of France is" and you can literally watch geography features, proper noun features, and completion-pattern features light up before the word "Paris" even gets generated. It's not what the model outputs that's interesting; it's what's happening right before it decides.

Stack: TransformerLens + SAELens on the backend, FastAPI WebSocket for streaming, Three.js + 3d-force-graph on the frontend. Runs on CPU (~800ms/token) or GPU (~35ms on a 4050). Labels come from Neuronpedia's API and get cached locally. You can also swap in other models (GPT-2 medium/large/xl, Pythia variants, Gemma-2-2B) as long as there's a pretrained SAE for it in SAELens.

GitHub: https://github.com/09Catho/axon

Would love feedback and stars, especially from anyone who's worked with SAEs before. I'm curious whether the co-activation edges are actually meaningful or just noise at this layer.

submitted by /u/Financial_World_9730
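The edge rule in this post ("features that fired together on the same token") amounts to counting feature pairs per token. A minimal sketch of that counting follows, assuming each token has already been reduced to a list of active feature labels; the labels and the `coactivation_edges` helper are hypothetical and not taken from the AXON code.

```python
from collections import Counter
from itertools import combinations


def coactivation_edges(active_per_token):
    """Count, for each feature pair, the tokens on which both were active."""
    edges = Counter()
    for features in active_per_token:
        # Sort so each unordered pair always maps to the same key.
        for pair in combinations(sorted(set(features)), 2):
            edges[pair] += 1
    return edges


# Hypothetical active-feature labels for three generated tokens.
tokens = [
    ["capital cities", "European geography"],
    ["European geography", "French language", "capital cities"],
    ["French language"],
]
edges = coactivation_edges(tokens)
for pair, count in edges.most_common():
    print(pair, count)
```

Raw counts favor features that fire often, so one way to probe the "meaningful or noise" question would be to compare each pair's count with what the two features' individual firing rates would predict if they were independent.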

📰
r/MachineLearning Aggregators May 19, 2026
LxMLS 2026 decision [D]

Has anyone applied to Lxmls 2026? Did you get any update?   submitted by   /u/No_Cardiologist7609 [link]   [comments]

📰
r/MachineLearning Aggregators May 19, 2026
Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO: 57% vs. 59%, ~1500 lines from scratch [P]

Wanted to see how close a fully bio-plausible agent could get to PPO on Pong.

Setup
- Custom Pong environment (pygame, no gym)
- PPO baseline: paper-faithful, from scratch
- Hebbian agent: PPO policy replaced with Hebbian value estimation, engineered features → 61%
- BioAgent: Predictive Coding for feature learning + distributional Hebbian plasticity for value (Dabney et al. 2020) → 57%

Zero backprop anywhere in the pipeline.

Key observations
- The 2% gap is real but small. The bottleneck wasn't the lack of backprop; it was catastrophic forgetting under non-stationary opponent dynamics during self-play.
- Distributional value encoding (à la Dabney) helped stability vs. a scalar Hebbian baseline, but not enough to match PPO under self-play.
- Self-play exposed the plasticity–stability dilemma hard: Hebbian rules that adapt fast forget fast. This is the real wall for bio-plausible RL in non-stationary settings.

Not claiming novelty in the architecture, as this is a from-scratch exploration of whether bio-plausible rules can handle a real RL task. Short answer: yes, mostly, with one clear failure mode.

Code: github.com/nilsleut/Biologically-Plausible-RL-Plays-Pong

Happy to answer questions about the PC implementation, the Hebbian value estimator, or the self-play setup.

submitted by /u/ConfusionSpiritual19

📰
r/MachineLearning Aggregators May 19, 2026
xAI just sold its entire flagship data center to Anthropic. That's not what frontier AI labs do. [N]

Anthropic is buying all 300 megawatts of compute at xAI's Colossus 1 facility in Tennessee for billions of dollars. Musk says xAI already moved training to Colossus 2 and didn't need both. Fine, that's plausible on its face. But the part I keep coming back to is what it says about Grok's actual compute consumption. Every serious AI lab treats compute as a strategic asset you accumulate, not sell. Google, Meta, and Microsoft are building more data center capacity even while actively running training runs, because the assumption is you will always need more for the next model, the one after that, and the inference load that follows. You don't sell 300 MW to a direct competitor unless that capacity is genuinely sitting underutilized. And if Grok were burning through it, it wouldn't be sitting underutilized. There was also a pretty visible drop in Grok usage after the image generation controversies earlier this year. None of that is confirmed internally, but the circumstantial case that Colossus 1 wasn't running hot is pretty strong when you piece it together. Renting to Anthropic generates cash and a headline, but it's the business model of a neocloud, not a frontier lab. xAI is valued at $230 billion. CoreWeave runs roughly comparable compute infrastructure and rents it to AI labs. CoreWeave's valuation is less than a third of that. If xAI keeps moving in this direction, the implied premium over a compute rental business needs a pretty compelling explanation. Curious whether people here see this as a one-off liquidity move or something that changes how you think about xAI's actual AI roadmap. https://www.idlen.io/news/anthropic-spacex-colossus-memphis-300mw-claude-limits-2026/?utm_source=chatgpt.com   submitted by   /u/peachforbreakfast [link]   [comments]

📰
r/MachineLearning Aggregators May 19, 2026
What do you think about Tabular Foundation Models [D]

I've seen TabPFN-3's recent results, and there is a lot of buzz about foundation models for tabular data (TabICL, TabPFN). The performance that those models achieve is really amazing. What makes me a little suspicious about them? They can analyze small datasets only, so a few MB of data, and you need to have a large GPU machine and download a few GB of model to predict on a few MB of data. That doesn't sound rational ... I really miss the old school approach of running a single decision tree or a linear model on the data. What do you think about it? Do you think feature engineering + classic ML can achieve performance comparable to that of foundation models? Maybe with better explainability?   submitted by   /u/pplonski [link]   [comments]

📰
r/MachineLearning Aggregators May 19, 2026
Graph spectral analysis (Fiedler value + Scheffer CSD indicators) predicts grokking 21k steps before loss function - five reproducible experiments [R]

I've been applying the Fiedler value (second-smallest eigenvalue of the weight graph Laplacian) combined with Scheffer critical slowing down indicators to monitor neural network topology during training.

Five experiments, all reproducible on CPU in under 24 hours:
- Detection: lambda-2 detects approaching grokking 21,000 steps before test accuracy moves
- Classification: grokking and catastrophic forgetting have distinct structural fingerprints (slope 0.00128 vs 0.00471/step)
- Steering: structurally-guided intervention preserves 91.7% of knowledge vs 2.6% unsteered
- Compounding: three sequential tasks, 100%/100%/97.5% retention, 48x grokking acceleration across tasks
- Preemptive curriculum: compatibility scoring ranks task disruption risk correctly, bridging preserves 100% vs 0% direct

Tested on 2-layer MLPs (modular arithmetic) and 1-layer transformer (sequence prediction). Honest limitations section in the paper. These are toy tasks and scaling to production architectures is unvalidated.

The approach comes from complex systems science (Scheffer's early warning indicators for critical transitions) applied to weight graphs rather than ecosystems or financial markets.

Code and paper: https://github.com/EssexRich/neural_si_validation

Happy to discuss the maths, the experimental design, or the limitations.

submitted by /u/RichBenf
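For readers unfamiliar with the quantity this post tracks, the standard definition of the Fiedler value is sketched below. How the post's author turns network weights into the symmetric, non-negative edge weights w_ij is not stated here, so that step is an assumption (absolute weight values between connected units would be one option).

```latex
% Weighted undirected graph on n nodes with symmetric weights w_{ij} \ge 0.
D_{ii} = \sum_{j} w_{ij}, \qquad L = D - W

% Eigenvalues of the Laplacian L, in increasing order:
0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n

% Fiedler value (algebraic connectivity) as a constrained minimum:
\lambda_2 = \min_{x \ne 0,\; x \perp \mathbf{1}}
  \frac{x^{\top} L x}{x^{\top} x}
  = \min_{x \ne 0,\; x \perp \mathbf{1}}
  \frac{\sum_{i<j} w_{ij}\,(x_i - x_j)^2}{\sum_i x_i^2}
```

The value is positive exactly when the graph is connected, and it shrinks as the graph comes closer to splitting into weakly linked parts, which is what makes it a candidate signal for structural change during training.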

r/MachineLearning Aggregators May 19, 2026
All fundamental knowledge in ML Course by Andrew NG that I noted and create into a repo github [R]

I've just finished the Machine Learning Specialization by Andrew Ng, and as I was going through it, I ended up writing detailed lecture notes for all 10 chapters, everything from linear regression all the way to reinforcement learning. I put a lot of effort into making these notes as clear and friendly as possible, so even if you're completely new to ML, you should be able to follow along without getting lost. The notes are written in LaTeX and auto-compiled to PDF via GitHub Actions whenever I push an update, so the PDF is always up to date.

🔗 GitHub: https://github.com/TruongDat05/machine-learning-notes-and-code

submitted by /u/Far_Extreme_9737

📰
r/MachineLearning Aggregators May 19, 2026
How do loss functions work in PINNs? [D]

I am learning about physics-informed neural networks (PINNs). I am playing with simple 1st/2nd-order 1D ODEs, and I calculate the loss by adding the initial-condition loss and the physics loss (e.g. Total loss = lambda1 (L1) * Physics_loss (PL) + lambda2 (L2) * IC_loss (IL)). Regardless of the magnitude of the losses and lambda values, the total loss is a single numeric value. How does the neural network's prediction change if I impose a higher weight (lambda) on one of the losses? For instance, let's say PL = 5, IC_loss = 3, L1 = 0.6, L2 = 1; then total loss = 6. However, this value of 6 can be achieved through several other combinations. For instance, L1 = 1 and L2 = 0.33 would result in a similar value. Given this, how does the model actually learn which losses are given more weight and which are not, and use this information to correct its predictions?
submitted by /u/cae_shot
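One way to look at the question: the optimizer never acts on the scalar total alone. It follows the gradient, which is lambda1 * grad(PL) + lambda2 * grad(IL), so the weights rescale how strongly each term pulls on the parameters even when two weightings happen to give a similar total at one point. The sketch below is a toy illustration under stated assumptions, not a PINN: `physics_loss` and `ic_loss` are hypothetical one-parameter quadratics standing in for the real loss terms, and the gradients are written out by hand.

```python
def physics_loss(theta):
    # Hypothetical stand-in for the ODE residual loss; minimized at theta = 2.
    return (theta - 2.0) ** 2

def ic_loss(theta):
    # Hypothetical stand-in for the initial-condition loss; minimized at theta = -1.
    return (theta + 1.0) ** 2

def total_loss(theta, lam_pl, lam_ic):
    return lam_pl * physics_loss(theta) + lam_ic * ic_loss(theta)

def total_grad(theta, lam_pl, lam_ic):
    # d/dtheta of the weighted sum: each lambda scales its own term's gradient.
    return lam_pl * 2.0 * (theta - 2.0) + lam_ic * 2.0 * (theta + 1.0)

def minimize(lam_pl, lam_ic, theta=0.0, lr=0.05, steps=2000):
    # Plain gradient descent on the weighted total loss.
    for _ in range(steps):
        theta -= lr * total_grad(theta, lam_pl, lam_ic)
    return theta

if __name__ == "__main__":
    for lam_pl, lam_ic in [(0.6, 1.0), (1.0, 0.33)]:
        theta = minimize(lam_pl, lam_ic)
        print(lam_pl, lam_ic, theta, physics_loss(theta), ic_loss(theta))
```

With the two weightings from the post, gradient descent settles at different parameter values: the run with the larger physics weight ends closer to the minimum of `physics_loss`, and the other ends closer to the minimum of `ic_loss`. The model does not "know" the weights; they only shape the direction of each update.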

📰
r/MachineLearning Aggregators May 19, 2026
Feeling lost while trying to break into AI/ML: how should I focus my projects? [D]

I’m trying to break into AI/ML Engineer / Applied AI roles, and honestly I’ve been feeling pretty overwhelmed lately. I’ve been building around LLM evaluation, model reliability, cost optimization, and production AI systems. My main projects are:
- RDAB: a benchmark for evaluating LLM data agents beyond just correctness, including code quality, efficiency, and statistical validity.
- CostGuard: an LLM reliability/cost proxy that tracks model cost, applies fallback logic, does lightweight response checks, and supports replay-based model comparison.
- Tether: a trace capture layer that records LLM calls so they can be replayed against alternate models to compare quality and cost.
The overall idea is: capture real LLM traffic → replay it against another model → compare quality, cost, and reliability before switching models. But I’m struggling with how to package this clearly. I feel like I’ve built a lot, but I’m not sure what hiring managers actually care about or what would make this stand out in a competitive market. Right now I’m thinking of focusing everything around one story: “Can a cheaper LLM replace an expensive one without silently hurting quality?” Then use CostGuard as the flagship project, with RDAB as the benchmark layer and Tether as the trace-capture layer. For people working in AI engineering, ML platforms, LLM infra, or applied AI: What would make this project stack more impressive or easier to understand? Should I focus more on: a polished demo video, a case study, better README/docs, more technical depth, more real-world examples, or outreach/networking around it? Any honest guidance would help. I’m trying to turn this into something that clearly shows production AI engineering ability, not just another AI demo.
submitted by /u/Fit_Fortune953

📰
r/MachineLearning Aggregators May 19, 2026
A Simple Solution to Improve the Broken Peer Review System at AI Conferences [R]

An issue with the peer review system is reciprocal reviewing, which incentivizes reviewers to unfairly reject good papers to increase their own papers' chances of acceptance. My proposed solution is that the conference should divide the authors/papers into two halves (A and B). If you are an author in half A, then you will only be a reviewer in half B. All papers by the same author, their coauthors, and coauthors of coauthors should be in the same half. Each AC/SAC can only serve in one half, and acceptance decisions for the two halves would be independent. So reciprocal reviewers will have no incentive to reject good papers to serve themselves. Furthermore, the discussion periods for the two halves should not be concurrent. This way reciprocal reviewers will have sufficient time to discuss author rebuttals, as they will not have to deal with their own papers at the same time. Maybe the first two weeks can be the discussion period for half A, and the next two weeks for half B. I don't think conference organizers have thought of this solution, because if they had, there would be no excuse for not trying to implement it: it does not hurt the conference's self-interest in any way. Does anyone think this will work? If so, I hope someone with more power than me might ask the conferences to implement it.
submitted by /u/isentropiccombustor
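The partition rule in the post can be sketched as follows. This is a minimal illustration under an assumption the post does not spell out: the "coauthors of coauthors" rule is read transitively, so every connected component of the coauthorship graph must land in a single half. Components are then assigned greedily to whichever half currently has fewer papers. The function name and the input shape (paper id mapped to a non-empty author list) are hypothetical.

```python
from collections import defaultdict

def split_into_halves(papers):
    """Assign each paper to half 'A' or 'B'.

    papers: dict mapping paper_id -> non-empty list of author names.
    Papers that share an author, directly or through a chain of
    coauthors, always receive the same half.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Link all authors of the same paper into one component.
    for authors in papers.values():
        for other in authors[1:]:
            union(authors[0], other)

    # Group papers by the component of their first author.
    groups = defaultdict(list)
    for paper_id, authors in papers.items():
        groups[find(authors[0])].append(paper_id)

    # Greedy balancing: largest components first, into the lighter half.
    assignment, load = {}, {"A": 0, "B": 0}
    for paper_ids in sorted(groups.values(), key=len, reverse=True):
        half = "A" if load["A"] <= load["B"] else "B"
        for paper_id in paper_ids:
            assignment[paper_id] = half
        load[half] += len(paper_ids)
    return assignment

if __name__ == "__main__":
    demo = {
        "p1": ["alice", "bob"],
        "p2": ["bob", "carol"],
        "p3": ["dave"],
        "p4": ["erin", "frank"],
    }
    print(split_into_halves(demo))
```

Authors whose papers fall in half A would then be drawn on as reviewers only for half B, and vice versa. How evenly the halves balance depends entirely on the sizes of the coauthorship components in the actual submission pool, which this sketch does not model.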

r/MachineLearning Aggregators May 19, 2026
How to get rejected by IEEE T-PAMI with 'Excellent' scores? [D]

Hello everyone. I am keeping my identity anonymous today to protect my professional career. I am a junior researcher in Computer Vision, and I am sharing this story because I have hit a devastating deadlock with IEEE T-PAMI and the IEEE Ethics Office.
Our Situation: https://preview.redd.it/v0w62gzmn02h1.png?width=2000&format=png&auto=webp&s=a2d75a1e3a388debdf5b163cb9593c1f7f1c49d5
In the decision letter, we actually received three highly positive reviews (two EXCELLENT, one GOOD). However, the AE rejected the paper by quoting comments from a "4th" reviewer. The most staggering part: we later accidentally met the actual 4th reviewer. He CONFIRMED having submitted a POSITIVE review, which was strangely withdrawn by the editor in the backend before the final decision was made. We have formally requested that the IEEE (and the Computer Society) thoroughly investigate this issue, specifically asking them to check the AE's backend activity logs in the submission system. However, half a year has passed, and we have received no direct response. Has anyone experienced something similar with IEEE or other top venues? Any advice or help bringing visibility to this would be greatly appreciated.
Evidence: Below is the report to IEEE Ethics (identifying information has been covered):
https://preview.redd.it/e41vt2rsn02h1.png?width=3508&format=png&auto=webp&s=b2ee2d3f092dad5e20b45b9daeea7fa7b6f01d20
https://preview.redd.it/t29n03rsn02h1.png?width=3508&format=png&auto=webp&s=67aa6bc36aed76617af34e7913a203f9236bc536
https://preview.redd.it/6v5ys2rsn02h1.png?width=3508&format=png&auto=webp&s=f2452998f57f1b157d71b569dd5ff87e4d3d0b6c
https://preview.redd.it/epdxv2rsn02h1.png?width=3508&format=png&auto=webp&s=d01da8cdf9e3f6cd5be53f884b02b154f86d0b48
https://preview.redd.it/fuw3k3rsn02h1.png?width=3508&format=png&auto=webp&s=03e75f763a54429758102da4933af53511642e7d
https://preview.redd.it/xn0ze3rsn02h1.png?width=3

📰
r/MachineLearning Aggregators May 19, 2026
First-time ICML workshop acceptance (GlobalSouthML) but can't afford to travel to South Korea. What are my options? [D]

Hey everyone, I’m an undergrad from India and I just found out I had two papers accepted at the ICML 2026 GlobalSouthML workshop! I am super excited since this is my first time getting accepted into a major conference venue, but I’m also kind of panicking right now because I absolutely cannot afford a trip to Seoul. Since I've never done this before, I’m hoping some experienced folks can help answer a few questions about how the post-acceptance process works:
- I saw that the main conference has a "Virtual Pass." Is that enough to keep my papers in the workshop program? ICML rules make it sound like someone must be there in person. If neither I nor my co-authors can afford the flight to South Korea, will our accepted papers just get withdrawn?
- Does ICML or the GlobalSouthML workshop specifically offer financial aid for undergrads? Should I email the organizers about this before I attempt to register? I saw some mentions of ICML Financial Aid online, but it looked like it might only cover hotels and registration, not the flights.
- How does submitting the final version actually work? Do the organizers email a specific form, or do I just upload a new PDF revision directly to my OpenReview portal? Also, since GlobalSouthML is a non-archival workshop, what exactly am I submitting: just the updated PDF addressing the reviewers' comments?
Any advice on how to navigate this would be hugely appreciated! Thank you!
submitted by /u/Material_Dinner_1924

📰
r/MachineLearning Aggregators May 19, 2026
Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]

Hi everyone, I'm starting a research project on financial time-series forecasting using LSTM and Transformer models for predicting S&P 500 market direction. Right now, I'm struggling with obtaining reliable long-term historical data. I tried Yahoo Finance, but downloads are inconsistent/failing for me, and most Kaggle datasets I found only contain around 5–10 years of data.
I specifically need:
- Around 30 years of historical S&P 500 data
- Preferably daily OHLCV data
- Reliable and clean source suitable for ML research
- Ideally free or student-friendly
I also want to understand what researchers typically use in academic work for financial forecasting: Yahoo Finance? Alpha Vantage? WRDS/CRSP? Polygon? Kaggle? Something else?
Additionally: Is using only S&P 500 index data enough for a Master's level research project? Or should I include technical indicators, macroeconomic data, sentiment, or constituent stock data?
Would appreciate guidance from people who've actually worked on financial ML projects. Thanks.
submitted by /u/stickPotatoe