#tokenization #transformers #nlp
Tokenization is the process of splitting text into smaller, meaningful lexical units. Byte Pair Encoding (BPE) is a popular subword-based tokenization algorithm used by state-of-the-art NLP models such as RoBERTa, BART, and GPT. In this video, we look at the pros and cons of other tokenization methods and work through BPE with an example.
⏩ OUTLINE:
0:00 - Tokenization in NLP and its types
01:10 - Subword-level Tokenization
01:43 - Byte Pair Encoding (BPE) Algorithm
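For readers who prefer code, here is a minimal sketch of how BPE merges can be learned. It is not taken from the video: the toy word counts, the `</w>` end-of-word marker, and the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`) are illustrative assumptions, and ties between equally frequent pairs are simply broken by first occurrence.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: count} mapping."""
    # Start from characters; "</w>" (an assumed marker) flags the end of a word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Hypothetical toy corpus, chosen only for illustration.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(corpus, 3)
print(merges)
```

With this toy corpus the frequent suffix "est" gets merged into a single subword within the first few steps, which is the behaviour that lets BPE handle rare words by composing them from common pieces.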
Enjoy reading articles? Then consider subscribing to a Medium membership; it is just $5 a month for unlimited access to all free and paid content.
Subscribe now - [ Link ]
*********************************************
If you want to support me financially, which is totally optional and voluntary :) ❤️
You can consider buying me chai (because I don't drink coffee :)) at [ Link ]
*********************************************
⏩ IMPORTANT LINKS
Tokenization methods in NLP: [ Link ]
Research Paper Summaries: [ Link ]
*********************************************
⏩ YouTube - [ Link ]
⏩ LinkedIn - [ Link ]
⏩ Medium - [ Link ]
⏩ GitHub - [ Link ]
*********************************************
⏩ Please feel free to share the content and subscribe to my channel - [ Link ]
Tools I use for making videos :)
⏩ iPad - [ Link ]
⏩ Apple Pencil - [ Link ]
⏩ GoodNotes - [ Link ]
#techviz #datascienceguy #naturallanguageprocessing #machinelearning #ai
About Me:
I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 3+ years of industry work experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).
Byte Pair Encoding Tokenization in NLP
Tags
byte pair encoding algorithm, byte pair encoding explained, byte pair encoding tokenization in nlp, bpe tokenization in nlp, subword tokenization method, byte pair encoding tokenization, bpe transformers, natural language processing, machine learning, transformers in nlp, tokenization in transformers, byte-pair encoding (bpe) algorithm, research, nlp concept, techviz data science, ai, nlproc, tokenization methods in nlp, different types of tokenization in nlp