Paper Review 2: MATCHA: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data.
By Beksultan Sagyndyk
The authors use Pix2Struct as the base model and further pretrain it with chart derendering and math reasoning tasks.