Channel: AI Coffee Break with Letitia

GaLore EXPLAINED: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Shapley Values Explained | Interpretability for AI models, even LLMs!

Stealing Part of a Production LLM | API protect LLMs no more

Genie explained 🧞 Generative Interactive Environments paper explained

MAMBA and State Space Models explained | SSM explained

Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained

Transformers explained | The architecture behind LLMs

Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained

LLM hallucinations discover new math solutions!? | FunSearch explained

DALL-E 3 is better at following Text Prompts! Here is why. — DALL-E 3 explained

Adversarial Attacks and Defenses. The Dimpled Manifold Hypothesis. David Stutz from DeepMind #HLF23

What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED

Are ChatBots their own death? | Training on Generated Data Makes Models Forget – Paper explained

The first law on AI regulation | The EU AI Act

Say that 3 times in a row. 😅

Author Interviews, Poster Highlights, Summary of the ACL 2023 Toronto NLP

ChatGPT is not an intelligent agent. It is a cultural technology. – Gopnik Keynote

Do LLMs understand? Jay Alammar's TLDR of Geoffrey Hinton ACL2023 Keynote

[Own work] MM-SHAP to measure modality contributions

Eight Things to Know about Large Language Models

Speaking about AI is hard, even for humans | AI Coffee Break Bloopers

Moral Self-Correction in Large Language Models | paper explained

AI beats us at another game: STRATEGO | DeepNash paper explained

Why ChatGPT fails | Language Model Limitations EXPLAINED

"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?

Training learned optimizers: VeLO paper EXPLAINED

ChatGPT vs Sparrow - Battle of Chatbots

Paella: Text to image FASTER than diffusion models | Paella paper explained

Generate long form video with Transformers | Phenaki from Google Brain explained

Movie Diffusion explained | Make-a-Video from MetaAI and Imagen Video from Google Brain

Beyond neural scaling laws – Paper Explained

How does Stable Diffusion work? – Latent Diffusion Models EXPLAINED

Machine Translation for a 1000 languages – Paper explained

DALLE-2 has a secret language!? | Theories and explanations

Imagen, the DALL-E 2 competitor from Google Brain, explained 🧠 | Diffusion models illustrated

A New Physics-Inspired Theory of Deep Learning | Optimal initialization of Neural Nets

[Own work] VALSE 💃: Benchmark for Vision and Language Models Centered on Linguistic Phenomena

PaLM Pathways Language Model explained | 540 Billion parameters can explain jokes!?

SEER explained: Vision Models more Robust & Fair when pretrained on UNCURATED images!?

[Quiz] Regularization in Deep Learning, Lipschitz continuity, Gradient regularization

Diffusion models explained. How does OpenAI's GLIDE work?

How do Vision Transformers work? – Paper explained | multi-head self-attention & convolutions

Announcement: ☕⚔️🍵 AMA with AI Coffee Break & Chai Time Data Science over @WeightsBiases #Shorts

ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations)

[Quiz] Interpretable ML, VQ-VAE w/o Quantization / infinite codebook, Pearson’s, PointClouds

[Quiz] Eigenfaces, Domain adaptation, Causality, Manifold Hypothesis, Denoising Autoencoder

Linear algebra with Transformers – Paper Explained

Masked Autoencoders Are Scalable Vision Learners – Paper explained and animated!

The efficiency misnomer | Size does not matter | What does the number of parameters mean in a model?

Do Transformers process sequences of FIXED or of VARIABLE length? | #AICoffeeBreakQuiz

Generalization – Interpolation – Extrapolation in Machine Learning: Which is it now!?

SimVLM explained | What the paper doesn’t tell you

Data BAD | What Will it Take to Fix Benchmarking for NLU?

Swin Transformer paper animated and explained

Eyes tell all: How to tell that an AI generated a face?

How modern search engines work – Vector databases explained! | Weaviate open-source

Foundation Models | On the opportunities and risks of calling pre-trained models “Foundation Models”

What is tokenization and how does it work? Tokenizers explained.

How to increase the receptive field in CNNs? | #AICoffeeBreakQuiz #Shorts

The convolution is not shift invariant. | Invariance vs Equivariance | ❓ #AICoffeeBreakQuiz #Shorts

Why do we care about cross-correlations vs convolutions | ❓ #AICoffeeBreakQuiz #Shorts

Convolution vs Cross-Correlation. How most CNNs do not compute convolutions. | ❓ #Shorts

Is today's AI smarter than YOU? #Shorts

Data leakage during data preparation? | Using AntiPatterns to avoid MLOps Mistakes

What is the model identifiability problem? | Explained in 60 seconds! | ❓ #AICoffeeBreakQuiz #Shorts

Saddle points vs. local minima in high dimensional spaces | ❓ #AICoffeeBreakQuiz #Shorts

Self-Attention with Relative Position Representations – Paper explained

Adding vs. concatenating positional embeddings & Learned positional encodings

Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization +Tokenizer explained

How cross-modal are vision and language models really? 👀 Seeing past words. [Own work]

Scaling Vision Transformers? How much data can a transformer get? #Shorts

"Please Commit More Blatant Academic Fraud" – A fellow PhD student's response.

Are Pre-trained Convolutions Better than Pre-trained Transformers? – Paper Explained

FNet: Mixing Tokens with Fourier Transforms – Paper Explained

Deep Learning for Symbolic Mathematics!? | Paper EXPLAINED

Pattern Exploiting Training explained! | PET, iPET, ADAPET

[RANT] Adversarial attack on OpenAI’s CLIP? Are we the fools or the foolers?

Transformer in Transformer: Paper explained and visualized | TNT

NVIDIA Jarvis (now NVIDIA Riva) meets Ms. Coffee Bean

UMAP explained | The best dimensionality reduction?

Transformers can do both images and text. Here is why.

OpenAI’s CLIP explained! | Examples, links to code and pretrained model

Leaking training data from GPT-2. How is this possible?

OpenAI's DALL-E explained. How GPT-3 creates images from descriptions.

Data-efficient Image Transformers EXPLAINED! Facebook AI's DeiT paper

PCA explained with intuition, a little math and code

AI Coffee Break with Letitia Parcalabescu Live Stream

The curse of dimensionality. Or is it a blessing?

"What Can We Do to Improve Peer Review in NLP?" 👀

GPT2 wrote this 1000 subscribers special!

AI understanding language!? A roadmap to natural language understanding.

An image is worth 16x16 words: ViT | Vision Transformer explained

Why Multimodal Machine Learning models do not work. Part 2/2 – The CAUSES

Multimodal Machine Learning models do not work. Here is why. Part 1/2 – The SYMPTOMS

What nobody tells you about MULTIMODAL Machine Learning! 🙊 THE definition.

GANs explained | Generative Adversarial Networks video with showcase!

Can language models understand? Bender and Koller argument.

The ultimate intro to Graph Neural Networks. Maybe.

Can a neural network tell if an image is mirrored? – Visual Chirality

BERTology meets Biology | Solving biological problems with Transformers

Adversarial Machine Learning explained! | With examples.

GPT-3 explained with examples. Possibilities, and implications.

Pre-training of BERT-based Transformer architectures explained – language and vision!

Transformer combining Vision and Language? ViLBERT - NLP meets Computer Vision

Preparing for Virtual Conferences – 7 Tips for recording a good conference talk

The Transformer neural network architecture EXPLAINED. “Attention is all you need”

Our paper at CVPR 2020 - MUL Workshop and ACL 2020 - ALVR Workshop

A brief history of the Transformer architecture in NLP

How to check if a neural network has learned a specific phenomenon?

AI Coffee Break - Channel Trailer
