Today, building and training such solutions for the market is not very reliable, because it depends on the quality of external datasets. To improve current systems, we propose combining state-of-the-art few-shot classifiers such as CLIP with robust generative data augmentation based on Stable Diffusion and DreamBooth. This combination of technologies will make it possible to build more efficient, higher-quality image-based AI systems.
The main innovation of this project lies in integrating several state-of-the-art technologies into a methodology for building more robust search engines in an automatic and explainable way, especially for the scarce data of very specific domains that SMEs, as end users, may own. To build such systems, we propose a new methodology based on CLIP linear probing with robust data augmentation, initially based on two new generative models (Stable Diffusion and DreamBooth).
Today, the scarcity of cross-media data usually makes training unreliable for data scientists, so robust data augmentation may be required in order to rely on a better, more specific database. A company's highly domain-specific images cannot be retrieved with the CLIP zero-shot methodology, because CLIP has never seen those kinds of images (they are out-of-domain). CLIP linear probing is therefore required to retrain on these new-domain images (fine-tuning the last layer).
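As a rough illustration of the linear probing idea, the sketch below trains only a softmax layer on top of fixed feature vectors. It assumes the embeddings have already been produced by a frozen CLIP image encoder; here they are replaced by synthetic toy vectors, and all function names (`make_toy_embeddings`, `train_linear_probe`, `predict`) are hypothetical, not part of any CLIP library. A real setup would more likely use an optimised library classifier.

```python
import math
import random


def make_toy_embeddings(per_class, seed=0):
    """Synthetic stand-in for frozen CLIP embeddings (assumption: 4-D, 2 classes)."""
    rng = random.Random(seed)
    centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    features, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(per_class):
            features.append([c + rng.gauss(0.0, 0.1) for c in center])
            labels.append(label)
    return features, labels


def logits(weights, bias, x):
    """Linear scores, one per class."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit the last (linear) layer by batch gradient descent on cross-entropy.

    The encoder is never touched: only `weights` and `bias` are learned.
    """
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(logits(weights, bias, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            bias[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, bias


def predict(weights, bias, x):
    """Return the index of the highest-scoring class."""
    scores = logits(weights, bias, x)
    return max(range(len(scores)), key=scores.__getitem__)
```

With real data, `features` would hold one embedding per company image (original or generated by augmentation) and `labels` the corresponding item classes.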
If a domain drift is detected (that is, new contexts are identified in the user's query images), a supervised data augmentation method such as DreamBooth can be used to generate the company's items in these new scenarios or domains.
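The text does not specify how drift is detected. One possible approach, shown here purely as an assumption, is to compare the embedding of each query image with the centroid of the embeddings already in the database and to flag queries whose cosine similarity falls below a threshold. The function names and the `threshold` value of 0.8 are hypothetical and would need tuning on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(embeddings):
    """Component-wise mean of a list of embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def is_domain_drift(query_embedding, reference_embeddings, threshold=0.8):
    """Flag a query that lies far from the known domain.

    Assumption: the embeddings come from the same frozen image encoder
    used to index the company's images.
    """
    similarity = cosine_similarity(query_embedding, centroid(reference_embeddings))
    return similarity < threshold
```

Queries flagged this way could then be used to decide which new scenarios to request from the generative augmentation step.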
The team:
Raquel Espinosa
Javier Abellán
José Miguel Bolarín