13 articles from r/MachineLearning
Paper: https://arxiv.org/abs/2605.08172 Workshops: AI for Science & Structured Data for Health at ICML 2026 Abstract: Anatomical mesh segmentation requires models that operate directly on irregular surface geometry while remaining robust to arbitrary patient pose and mesh resolution variation. Existing task-specific mesh and point-cloud methods are not equivariant, and can degrade sharply under test-time perturbation, for example dropping by 25-26 IoU points on intraoral scan segmentation at 40° tilt. We present EAMS, an Equivariant Anatomical Mesh Segmentor built on Equivariant Mesh Neural Networks (EMNN), and evaluate it across four clinically distinct tasks spanning edge-, vertex-, and face-level supervision. We combine intrinsic mesh descriptors with anatomy-aware priors, including PCA-derived frames for dental arches and liver surfaces, and augment message passing to provide lightweight global context. Across intracranial aneurysm and intraoral segmentation, EAMS variants are competitive with specialized baselines on unperturbed inputs while remaining stable under geometric perturbations, and on liver surfaces they expose a favorable trade-off between canonical-pose accuracy and rotation robustness. These results show that a lightweight (<2M parameters) equivariant framework can deliver robust anatomical mesh segmentation across diverse supervision types without task-specific architectures. Hi everyone I’m excited to share my solo paper "Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation" which has been accepted for poster presentations at the ICML 2026 workshops on AI for Science and Structured Data for Health. The project stemmed from my parallel research on structural encoders for biomolecules where enforcing roto-translational equivariance is standard. In this work, I wanted to extend those principles directly to various 3D medical meshes. While current anatomical mesh segmentation methods are highly disjoint and anato
Built a richer paper page for 3 million arxiv and OpenAlex papers. Free, no signup, no paywall. tomesphere.com Each page has a Gemini generated TLDR, peer reviews scraped from OpenReview with reviewer scores and decisions, GitHub repos, HuggingFace models and datasets, conference videos, the citation graph from OpenAlex (about 250M edges), and a semantic graph using SPECTER2 (768D in pgvector) with four ranking modes: Influential, Recent, Hidden gems, Nearest. Connected Papers and Litmaps default to citation overlap. Tomesphere defaults to text vector similarity, so brand new papers without a citation graph still appear and topically similar work shows up even without shared citers. Chrome extension overlays the same data on arxiv abstract and pdf pages. Try a paper you know: tomesphere.com/paper/2312.00752 (Mamba) tomesphere.com/paper/1706.03762 (Attention) tomesphere.com/paper/2305.14314 (QLoRA) Open to feedback.   submitted by   /u/RegretAgreeable4859 [link]   [comments]
Essay argues that reasoning models cannot perform faithful inference because their reasoning trace and final answer come from the same operation. Engages with Lanham/Turpin/Mirzadeh in empirical critique, and with HRM, TRM, GRAM, AlphaProof, and Kona/Aleph as the contrasting architectural lineage. Curious what this subreddit makes of the constraint-vs-influence framing. https://mauhaq.substack.com/p/verbosity-is-not-faithfulness   submitted by   /u/Sensitive_Air_5745 [link]   [comments]
I am a CS undergrad and I think token economics is the next big problem for companies. I am building an LLM router specifically for code and codebases. The routing is not done by a heavily fine-tuned LLM (existing solutions already do this). I am using a different approach: gauging the complexity of a prompt by measuring the interaction between signals that can be cheaply extracted from it. One of these signals is what I like to call blooms_intent, based on Bloom’s taxonomy, a framework for categorizing educational goals. If a query is “What is this”, it falls under the remember category, whereas “implement this” is closer to the create category. Questions: How do I find datasets for this purpose? Is bootstrapping datasets using AI fine for this? Should I stick with centroid-based classification? That is what I’ve been doing so far, but for ambiguous queries the confidence scores of the categories are way too close. What is the best dataset size and classifier that can somewhat reliably differentiate nuances between queries? You may ask why not use AI for these questions. I have, and that’s why I’ve come here. Please lmk your thoughts and thanks in advance!!   submitted by   /u/getridofaks [link]   [comments]
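The centroid approach and the "too close" problem described in the post above can be made concrete with a margin check. This is only a minimal sketch under assumptions: the two-dimensional vectors are hypothetical stand-ins for real prompt embeddings, and the `min_margin` threshold and the "ambiguous" fallback label are illustrative choices, not something the post specifies.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(query_vec, centroids, min_margin=0.05):
    """Return (label, margin); label is 'ambiguous' when the top two
    centroid similarities are closer than min_margin (hypothetical threshold)."""
    scores = sorted(
        ((cosine(query_vec, c), label) for label, c in centroids.items()),
        reverse=True,
    )
    (best, label), (second, _) = scores[0], scores[1]
    margin = best - second
    return (label if margin >= min_margin else "ambiguous"), margin

# Toy 2-D "embeddings" per category (assumed data, for illustration only).
centroids = {
    "remember": centroid([[1.0, 0.0], [0.9, 0.1]]),
    "create": centroid([[0.0, 1.0], [0.1, 0.9]]),
}

clear_label, clear_margin = classify([1.0, 0.1], centroids)
unclear_label, unclear_margin = classify([1.0, 1.0], centroids)
```

Returning the margin alongside the label makes it possible to send low-margin queries to a different path (for example a stronger classifier) instead of forcing a category.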
Slightly unhinged engineering decision but it works. My tool (ScholarScout) has a 2-3 minute pipeline: fetch papers from 8 databases → analyze trends → generate ideas. During that time, the user sees a pixel art owl running through a parallax forest. The fun part: it's not fake animation. Each paper dot that spawns in the game corresponds to a real paper_found SSE event from the backend. Papers drip-feed at 600ms intervals from a queue (even if the fetch returned 30 papers at once). Colors = source (white=arXiv, green=PubMed, purple=Crossref). When pipeline finishes, owl celebrates. Tech: vanilla JS canvas, 32x32 sprite sheet (12 frames), requestAnimationFrame loop, image-rendering: pixelated. No dependencies. Here's the demo vid ScholarScout v1.5.3 - Demo Actual useful changes in the same release: Review Mode: paper clustering (k-means on embeddings, Jaccard fallback) + per-cluster synthesis + cross-cutting analysis Paper freshness: _used_count per paper in cache, least-used prioritized, auto-widen date range on exhaustion All thresholds externalized to config.yaml github.com/neej4/ScholarScout or ScholarScout — Papers in. Ideas out.   submitted by   /u/neeejaaa0 [link]   [comments]
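The post above mentions a Jaccard fallback for paper clustering but does not say how it works. As a rough illustration only, one way to group papers by title overlap is sketched below; the whitespace tokenizer, the greedy single-pass grouping, and the `threshold` value are all assumptions, not ScholarScout's actual implementation.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def cluster_by_jaccard(titles, threshold=0.3):
    """Greedy single-pass clustering: each title joins the first cluster
    whose first member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for title in titles:
        for cluster in clusters:
            if jaccard(title, cluster[0]) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters

# Hypothetical titles, for illustration only.
titles = [
    "graph neural networks for molecules",
    "graph neural networks for proteins",
    "diffusion models for images",
]
clusters = cluster_by_jaccard(titles)
```

A set-overlap measure like this needs no embeddings, which is one plausible reason to use it as a fallback when an embedding model is unavailable.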
I’m currently studying ML, more specifically convolutional neural networks (CNNs) for finding patterns in images. Right now, I’m trying to develop a model that can solve the “Where’s Waldo?” challenge. However, I currently have a question: what would be the best option for training a CNN model, PyTorch or Dlib? At the moment, I have an AMD RX580. Since Dlib only supports CUDA, I would need to use Google Colab. I’m still learning about this field, so if I said something incorrect or if you have any tips on how to approach this project, I’d be very happy to hear them. 😄   submitted by   /u/TearsInTokio [link]   [comments]
I’ve been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it. It’s called WAVE. You write a kernel once, it compiles to a portable binary, then thin backends translate it to Metal, PTX, HIP, or SYCL. Same binary verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built PyTorch integration and got identical training results across all backends. Please star on GitHub: https://github.com/Oabraham1/wave Preprint: https://arxiv.org/abs/2603.28793 Read full docs and how I built everything: https://wave.ojima.me pip install wave-gpu   submitted by   /u/not-your-typical-cs [link]   [comments]
Hi guys I've been working on GitRAG — paste any public GitHub URL, and ask it anything about the codebase. It answers with exact file paths and line numbers, no hallucination. How it works under the hood: Clones the repo and splits files into semantic chunks using AST-aware parsing (not just line splits) Builds a hybrid index — dense embeddings + BM25 keyword index At query time, fuses both signals with Reciprocal Rank Fusion, then runs Cohere reranking to cut 20 candidates down to 5 Sends those 5 chunks to Groq's llama-3.3-70b which generates a grounded answer The retrieval pipeline is what I'm most proud of — the BM25 + semantic fusion catches things that pure vector search misses (exact function names, error codes, etc.) Stack: FastAPI · ChromaDB · text-embedding-3-small · Cohere rerank-v3.5 · Groq llama-3.3-70b · React + Vite Supports 15+ languages: Python, JS/TS, C#, Java, Go, Rust, C/C++, Swift, Kotlin, Dart, Ruby, PHP, Vue, Svelte, Shell... Curious what repos people try it on — drop your results below 👇   submitted by   /u/Professional-Pie6704 [link]   [comments]
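The fusion step in the GitRAG post above, Reciprocal Rank Fusion over a dense ranking and a BM25 ranking, can be sketched in a few lines. This is a generic illustration, not GitRAG's code: the chunk identifiers are hypothetical, and `k=60` is a commonly used constant that the post does not confirm.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of ids: each id scores sum(1 / (k + rank))
    over the lists it appears in, with ranks starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]

# Hypothetical chunk ids returned by each retriever.
dense_hits = ["utils.py:10", "auth.py:42", "db.py:7"]
bm25_hits = ["auth.py:42", "errors.py:3", "utils.py:10"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Because RRF uses only ranks, the dense and BM25 scores never need to be normalized onto a common scale, and chunks found by both retrievers rise to the top.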
Hi all, Apologies beforehand if this is the wrong subreddit, let me know if you think there are better subreddits for this post. I’m working on a project around proprietary data licensing for AI training and trying to identify data types that are genuinely inaccessible to AI labs, not because the data doesn’t exist, but because no one has figured out how to unlock it. Specifically looking for data that is: • Created by domain experts as part of their daily work • Never published or shared outside the organization • Rich in human reasoning, not just structured outputs Finance is my background so I’m especially curious about examples there, but all industries welcome. What’s the most valuable “locked” professional data you’ve come across in your field, and who (if you know) owns the rights to it?   submitted by   /u/Manny_in_iceage [link]   [comments]
Looking for communities where people actually dig into ML/AI research, not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you can post something like "I'm seeing X behaviour in my SSL training, here's the loss curve, anyone seen this before?" and get thoughtful replies instead of generic advice.   submitted by   /u/Possible-Active-1903 [link]   [comments]
Is this normal? I searched it up and last year it was only 8000.   submitted by   /u/NightCR_ [link]   [comments]
Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally. What it does: - Downloads and chunks Wikipedia articles (you can choose articles by name, with an option to also download similar topics) - Uses a custom TF-IDF + cosine similarity retriever (built from scratch) - Supports query expansion using Wikipedia links/redirects - Optional answer generation with an LLM Very minimal dependencies and runs completely locally. Repo: https://github.com/yacine204/Aiki Would really appreciate your feedback.   submitted by   /u/Just_Jaguar3701 [link]   [comments]
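A from-scratch TF-IDF + cosine similarity retriever of the kind the Aiki post above describes might look roughly like the sketch below. It is not taken from the Aiki repo: the whitespace tokenizer, the smoothed IDF formula, and the sample chunks are assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def build_index(chunks):
    """Return (idf, vectors): per-term IDF weights and one sparse
    TF-IDF vector (dict of term -> weight) per chunk."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + count)) + 1.0 for t, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return idf, vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, idf, vectors, top_k=1):
    """Rank chunks by cosine similarity to the query's TF-IDF vector."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:top_k]]

# Hypothetical article chunks, for illustration only.
chunks = [
    "paris is the capital of france",
    "the eiffel tower is in paris",
    "python is a programming language",
]
idf, vectors = build_index(chunks)
top = retrieve("capital of france", chunks, idf, vectors)
```

Storing vectors as dicts keeps only the non-zero terms, which suits a small local index with no numeric library dependency.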
Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: It is impossible to draw meaningful conclusions from METR’s Long Tasks benchmark — in particular once one realizes that its numerous flaws are probably compounding in unpredictable ways. The appropriate response to a study of this kind is not to assume it can be saved via back-of-the-envelope adjustments, or to comfort oneself that other anecdotal evidence implies that it is probably correct anyway. It is to cut one’s losses and move on in search of higher-quality information. … The METR graph cannot be saved. For all its sleekness and complexity, it contains far too many compounding errors to excuse. Among them is generalizing to the entire species data collected from a small group of the authors’ peers. Coming up with ever more dramatic ways to make this mistake has become a kind of sport among AI researchers. If the field has a central pathology, it is to aggressively overindex on a mix of anecdotal data from power-users, alongside a long list of benchmarks even more compromised than METR’s. One hopes that as the field matures, its participants will learn to stop making these mistakes. The errors include: Some of the human baselines data is not actually measured or collected from any empirical source, rather, it is just guesstimated by the authors A key variable in the data is how long it takes humans to complete certain tasks, but — when METR did actually measure this — it paid its human benchmarkers hourly, meaning they were incentivized with cash to take longer The sample of human benchmarkers was biased toward METR employees’ friends, acquaintances, and former colleagues (who are likely unrepresentative and possibly biased) Humans familiar with a codebase and a specific coding task were 5-18x faster at completing it, but METR used data from humans who were much slower because they had t