Testing LLMs is not simple. Probabilistic output makes failures hard to identify, and running the models repeatedly quickly becomes expensive.
Google has acknowledged an issue with its AI model Gemini. The model injected inappropriate diversity into historical images, reflecting problems with bias in training data. The flaw sparked debate around diversity, equity, and inclusion in tech. Google implied that it will make improvements in the future but stopped short of a full apology for failing to properly contextualize historical figures in image generation.
OpenAI is working on a web search product to compete more directly with Google. It is unclear whether the product will be standalone or part of ChatGPT. The search space is getting crowded quickly with Copilot on Bing, newcomers like Perplexity, and Google's Gemini. A YouTube Short featuring Microsoft CEO Satya Nadella talking about competing with Google is available in the article.
REINFORCE is a simple, standard, and easily understood RL method, but it is hard to train stably in simulators. PPO is generally much more performant and stable. Gemini uses REINFORCE, while GPT-4 is believed to use PPO.
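For readers who have not seen REINFORCE written out, here is a minimal sketch on a two-armed bandit. Everything in it is an illustrative assumption rather than anything from the article: the function names, the payout probabilities, the learning rate, and the step count are all hypothetical, and this is a toy setup, not how Gemini or any production system is trained. It only shows the core update: nudge the policy's logits along the gradient of the log-probability of the sampled action, scaled by the reward.

```python
import math
import random


def softmax(logits):
    """Turn logits into probabilities (max is subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(steps=2000, lr=0.1, seed=0):
    """Run plain REINFORCE (no baseline) on a hypothetical two-armed bandit."""
    rng = random.Random(seed)
    win_prob = [0.2, 0.8]  # assumed payout probability of each arm
    logits = [0.0, 0.0]    # policy parameters, one logit per arm

    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = 0 if rng.random() < probs[0] else 1
        # Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < win_prob[action] else 0.0
        # For a softmax policy, d/d(logit_i) log pi(action) = 1[i == action] - pi_i.
        for i in range(2):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad_log_prob

    return softmax(logits)


if __name__ == "__main__":
    print(train_reinforce())
```

Because the update is scaled by a single noisy reward sample, the gradient estimate has high variance, which is one reason the method can be unstable in harder settings. PPO adds machinery such as clipped updates on top of the same policy-gradient idea.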
AlphaFold is used to predict the state of a protein after folding. By adding flow matching, which is invertible, you can dramatically improve modeling power on the entire landscape of proteins.
Researchers have developed a new way to make LLMs more efficient and easier to use by employing a method that focuses on 'expert-level sparsification', which reduces model size without losing performance. This is particularly useful for Mixture-of-Experts LLMs, which are powerful but usually too big to handle easily.
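To make the idea of dropping whole experts concrete, here is a rough sketch. The summary does not say how the researchers choose which experts to remove, so the selection rule below (keep the experts the router picks most often on some calibration data) is a stand-in assumption, and the function names and numbers are hypothetical.

```python
def prune_experts(routing_counts, keep):
    """Return the indices of the `keep` most frequently routed experts, in ascending order.

    `routing_counts[i]` is how often expert i was selected on calibration data
    (an assumed statistic, not necessarily the researchers' criterion).
    """
    ranked = sorted(range(len(routing_counts)),
                    key=lambda i: routing_counts[i],
                    reverse=True)
    return sorted(ranked[:keep])


def renormalize_gate(gate_probs, kept):
    """Restrict router probabilities to the kept experts and rescale them to sum to 1."""
    total = sum(gate_probs[i] for i in kept)
    return [gate_probs[i] / total for i in kept]


if __name__ == "__main__":
    kept = prune_experts([10, 500, 3, 120], keep=2)
    print(kept)
    print(renormalize_gate([0.1, 0.5, 0.1, 0.3], kept))
```

Removing an expert deletes its entire feed-forward block, so the memory footprint shrinks roughly in proportion to the number of experts dropped. Whether quality holds up depends on the selection rule and the model.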
How do you transition from experimental models to highly scalable applications? apply() is a free community conference, where you can learn from seasoned engineers and technical leaders. Don’t miss out on the chance to hone your AI/ML skills in these FREE practical sessions and tech workshops!
GeneOH Diffusion is a new technique that improves how models understand and interact with objects using hands. This method focuses on making these interactions more natural by correcting errors in hand movements and relations with objects.
A model based on CodeLlama and DeepSeek Coder scored above 85% on the HumanEval programming benchmark by training on a synthetic multi-turn dataset and using human feedback.
Anthropic's research scientists have been working on a method of understanding deep neural networks that uses Circuits. These Circuits aim to identify subparts of models that get used for certain tasks. The research team has released a monthly update on the experiments they attempted and the results.
INSTRUCTIR is a new benchmark aimed at making search engines smarter in understanding users' intentions. Unlike current methods, which mostly focus on the query itself, INSTRUCTIR evaluates how well search engines can follow user instructions and adapt to various and changing search needs.
Join this Kolena webinar to explore unique challenges in computer vision, including precise object detection and real-time decision-making, through the lens of autonomous vehicle systems.
Sam Altman's request for $7 trillion aims to support the rapidly escalating costs of advancing generative AI models like GPT, suggesting an exponential growth in resource needs for future iterations. This ambition underscores a pivotal moment in AI development, balancing between rapid technological progress and the broader implications of such swift advancement on safety and societal readiness.