In this tutorial, I'll showcase how to fine-tune PaliGemma, a new open vision-language model by Google on a receipt image to JSON use case. The goal for the model is to learn to output a JSON containing all key fields from a receipt, such as the product items, their prices and quantities.
Do note that PaliGemma is just one of many vision-language models released recently.
The notebook can be found here: [ Ссылка ]
Ещё видео!