Stable Diffusion creator leaves Stability AI 👋, foundation model for self-driving 🚗, diffusion-based video translation 📹

TLDR AI 2024-03-21

AI phone calls?! The world's fastest conversational AI was released and it sounds just like a human (Sponsor)

AI lab Bland AI has released a hyper-realistic-sounding AI phone agent, and it’s blowing everyone's minds.

It can be used for anything: sales, instantly calling and pre-qualifying leads, customer support…
It can handle over 1,000,000 business phone calls simultaneously.
It can respond at human-level speeds... with any voice.

Developers and companies are loving this. Impacts on the job market are coming soon...

Skeptical? Try calling it yourself >> Bland.ai

(P.S. TLDR readers can sign up here and access something even crazier...)

🚀

Headlines & Launches

Stable Diffusion Maker Leaves Stability AI (1 minute read)

Stability AI's research scientist Robin Rombach, crucial to developing the Stable Diffusion model, is leaving the company, marking a significant departure amidst a year of technical team changes.

Introducing Copilot4D: A Foundation Model For Self-Driving (3 minute read)

Waabi's Copilot4D is a pioneering foundation model that leverages LiDAR data to understand and predict the 3D dynamics of the environment over time, advancing the capabilities of autonomous machines.

NLX raises $15m series A (5 minute read)

NLX, an enterprise conversational AI platform, has raised additional funding from Cercano, Comcast, and others. The platform is used to build chat, voice, video, and conversational systems.

🧠

Research & Innovation

Data Augmentation with Diffusion Models (23 minute read)

DreamDA offers a new approach to data augmentation, utilizing diffusion models to synthesize diverse, high-quality images that closely match the original data distribution.

Vision-Language Models with Interactive Reasoning (4 minute read)

Chain-of-Spot (CoS) introduces an Interactive Reasoning technique that significantly enhances how Large Vision-Language Models (LVLMs) process and understand images. By focusing on key regions of interest within images in response to specific questions or instructions, CoS enables LVLMs to access detailed visual information without compromising image resolution.

Enhancing Virtual Try-On with Pre-Trained Diffusion Models (4 minute read)

StableVITON is a novel approach to image-based virtual try-on. This method focuses on maintaining clothing details while leveraging the generative power of pre-trained diffusion models. StableVITON learns semantic correspondences between clothes and the human body in a pre-trained model's latent space.

🧑‍💻

Engineering & Resources

Phospho: Use text analytics to detect issues in LLM apps (Sponsor)

Phospho is an open-source text analytics platform for LLM apps. Easily log user inputs and LLM app outputs to get an overview of what is happening in real time. Automatically detect and extract semantic events in each message. Build automated workflows via webhooks or API calls when specific events are detected. Self-deploy Phospho in 60 seconds or try Phospho Pro.

Triton Puzzles (GitHub Repo)

Triton is a language and compiler for writing GPU kernels in Python-like syntax, and it is gaining popularity. This repository has a set of puzzles of increasing difficulty that teach the tool.

Diffusion-based Video Translation (3 minute read)

FRESCO is a novel approach that combines intra-frame and inter-frame correspondences to significantly improve the spatial-temporal consistency in video translation tasks.

Enhanced Image Editing with Consistency Models (GitHub Repo)

This project enhances the capabilities of diffusion models for tasks like image editing and restoration by introducing Generalized Consistency Trajectory Models (GCTMs). These models streamline the process, making it possible to modify images with remarkable precision and efficiency by translating between any two distributions with just one step.

🎁

Miscellaneous

New Breakthrough Brings Matrix Multiplication Closer to Ideal (7 minute read)

Tsinghua University and UC Berkeley researchers have achieved a significant breakthrough in matrix multiplication, presenting an innovative technique that has already spurred further enhancements. This advancement in a core computational operation could lead to substantial time, power, and cost savings across various applications. This represents the most considerable progress in reducing the computational complexity of matrix multiplication since the previous milestone in 2010.

Stylized image binning algorithm (3 minute read)

This is a tutorial on creating a pixel-art-like image processing tool with interactive web elements like sliders for customization using a binning algorithm in JavaScript. The binning technique uses parameters such as bin size and gap to convert images into stylized, pixelated artwork by averaging pixel brightness within bins. The implementation involves manipulating pixel data on HTML canvas elements and optimizing looping structures for efficiency.
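The binning step described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's code: it is written in Python rather than JavaScript, it works on a plain 2D list of brightness values instead of canvas pixel data, and the names `bin_image`, `bin_size`, `gap`, and `background` are hypothetical. It also assumes that "gap" means a strip of background left along the bottom and right edge of every bin.

```python
from typing import List

Gray = List[List[float]]  # rows of brightness values (assumed representation)


def bin_image(pixels: Gray, bin_size: int, gap: int = 0,
              background: float = 0.0) -> Gray:
    """Replace each bin_size x bin_size block with its mean brightness.

    The last `gap` rows and columns of every bin are left at `background`,
    which separates the bins visually (an assumed reading of "gap").
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    if not 0 <= gap < bin_size:
        raise ValueError("gap must satisfy 0 <= gap < bin_size")

    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [[background] * width for _ in range(height)]

    for top in range(0, height, bin_size):
        for left in range(0, width, bin_size):
            # Clip the bin at the image border so partial bins still work.
            bottom = min(top + bin_size, height)
            right = min(left + bin_size, width)

            # Average brightness over every pixel inside the bin.
            total = sum(pixels[y][x]
                        for y in range(top, bottom)
                        for x in range(left, right))
            mean = total / ((bottom - top) * (right - left))

            # Paint the bin with its mean, skipping the gap strip.
            for y in range(top, min(top + bin_size - gap, height)):
                for x in range(left, min(left + bin_size - gap, width)):
                    out[y][x] = mean
    return out
```

A larger `bin_size` gives a coarser, more pixelated result, and a non-zero `gap` produces a tiled look. Computing the mean once per bin and then painting it keeps the work proportional to the number of pixels.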

Using LLMs to Generate Fuzz Generators (8 minute read)

LLMs like Claude can generate effective fuzzers for parsing code, automating a process that traditionally requires significant human effort. While LLMs are typically not precise enough for static analysis, they appear well-suited for creating fuzzers due to the stochastic nature of fuzzing. A hybrid approach that combines LLM-driven static analysis and targeted fuzzing could be promising for identifying and exploiting vulnerabilities in code.
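For readers unfamiliar with the term, a fuzz generator for parsing code is roughly the kind of program sketched below. This is a hand-written illustration, not output from an LLM and not code from the article: the target format (JSON), the helper names `random_json_text`, `mutate`, and `fuzz`, and the choice to treat `ValueError` as the parser's only expected failure are all assumptions made for the example.

```python
import json
import random
from typing import Callable, List, Tuple, Type


def random_json_text(rng: random.Random, depth: int = 0) -> str:
    """Grammar-aware generator: emit a small, syntactically valid JSON text."""
    kinds = ["null", "bool", "number", "string"]
    if depth < 3:  # cap nesting so inputs stay small
        kinds += ["array", "object"]
    kind = rng.choice(kinds)
    if kind == "null":
        return "null"
    if kind == "bool":
        return rng.choice(["true", "false"])
    if kind == "number":
        return str(rng.randint(-1000, 1000))
    if kind == "string":
        raw = "".join(rng.choice('ab\\"\n é') for _ in range(rng.randint(0, 5)))
        return json.dumps(raw)  # json.dumps handles the escaping
    if kind == "array":
        items = [random_json_text(rng, depth + 1) for _ in range(rng.randint(0, 3))]
        return "[" + ",".join(items) + "]"
    pairs = [f'"k{i}":' + random_json_text(rng, depth + 1)
             for i in range(rng.randint(0, 3))]
    return "{" + ",".join(pairs) + "}"


def mutate(rng: random.Random, text: str) -> str:
    """Apply up to three random deletions or insertions to break the syntax."""
    chars = list(text)
    for _ in range(rng.randint(0, 3)):
        pos = rng.randint(0, len(chars))
        if chars and rng.random() < 0.5:
            del chars[min(pos, len(chars) - 1)]
        else:
            chars.insert(pos, rng.choice('{}[]",:\\0e-'))
    return "".join(chars)


def fuzz(target: Callable[[str], object],
         expected: Tuple[Type[BaseException], ...] = (ValueError,),
         iterations: int = 1000,
         seed: int = 0) -> List[Tuple[str, BaseException]]:
    """Feed generated inputs to `target`; collect exceptions it should not raise."""
    rng = random.Random(seed)  # seeded so any finding can be reproduced
    crashes: List[Tuple[str, BaseException]] = []
    for _ in range(iterations):
        sample = mutate(rng, random_json_text(rng))
        try:
            target(sample)
        except expected:
            continue  # a clean rejection of bad input is not a bug
        except Exception as exc:
            crashes.append((sample, exc))
    return crashes


if __name__ == "__main__":
    findings = fuzz(json.loads)
    print(f"{len(findings)} unexpected exception(s)")
```

The generator encodes knowledge of the input grammar, which is the part that normally takes human effort to write for each new parser. The random mutation step reflects why imprecision is tolerable here: a fuzzer only needs to reach interesting inputs often enough, not to reason exactly about the code.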

⚡

Quick Links

Live webinar: Reducing model failures and boosting ML performance (Sponsor)

This webinar by Kolena and Rad AI will introduce cutting-edge testing methodologies that reduce time-to-market for AI/ML, while ensuring model reliability and performance. Register ↗️

JARS AI Launches Platform for Interactive AI Shows (Product)

JARS allows anyone to create episodes of their favorite shows with friends.

OpenAI Could Release GPT-5 In A Few Months (2 minute read)

OpenAI could release GPT-5 in the summer.

Beijing court rules AI-generated content covered by copyright, eschews US stand (3 minute read)

Beijing Internet Court ruled that an AI-created image was copyright-protected artwork due to the human creator's intellectual input, awarding damages in a landmark case with significant implications for the AI industry and copyright disputes worldwide.

Thanks for reading,
Andrew Tan & Andrew Carr