*this article is a work in progress 🙂
In the context of generative text-to-image models (GANs, or diffusion models such as Stable Diffusion checkpoints), it is common to train on annotated images as a form of supervision. However, it is possible to train such models with visual information alone, without relying on text annotations. Text annotations provide additional context and guidance for generating images that match specific textual descriptions, but they are not a strict requirement.
Training such a generative model without text annotations falls under the umbrella of unsupervised or self-supervised learning.
Generative models often rely on techniques like conditional GANs (cGANs) to generate images based on given input conditions, which may include text descriptions. When training such a model without text, however, we can explore alternative methods:
Self-supervised learning is a clever way of training a machine learning model using the data itself to create its own training labels or tasks. It’s like a student teaching themselves by using the materials they already have, without needing a teacher to provide all the answers.
In self-supervised learning, the model doesn’t need externally provided labels. Instead, it creates its own learning tasks from the data. For instance, if the data is a collection of images, the model might (a minimal sketch of one such task follows the list):
Jigsaw Puzzles: Cut an image into pieces and shuffle them, then try to put them back together. This teaches the model about spatial relationships between different parts of an image.
Rotation Prediction: Rotate an image and then try to predict how much it was rotated. This helps the model understand the orientation of objects in images.
Context Prediction: Hide parts of an image and have the model predict what’s missing. This helps the model learn about the context and relationships between different objects.
Colorization: Show the model a grayscale image and have it predict the colors. This teaches the model about object appearances and textures.
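To make this concrete, here is a minimal sketch of the rotation-prediction pretext task in PyTorch. Everything in it is illustrative: `SmallConvNet` is a toy backbone and `loader` is random stand-in data, but the training loop shows the key point that the labels come from the data itself rather than from human annotation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallConvNet(nn.Module):
    """Tiny illustrative backbone ending in a 4-way rotation classifier."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def rotate_batch(images):
    """Rotate each image by a random multiple of 90 degrees.
    The rotation index (0..3) becomes the self-generated label."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(1, 2)) for img, k in zip(images, labels)]
    )
    return rotated, labels

# Stand-in for a real unlabeled image dataset: batches of (B, 3, H, W) tensors.
loader = [torch.rand(8, 3, 64, 64) for _ in range(10)]

model = SmallConvNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for images in loader:
    rotated, labels = rotate_batch(images)
    loss = F.cross_entropy(model(rotated), labels)  # labels came from the data itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```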
How to do non-discriminatory training
Self-supervised learning is not inherently discriminatory; it’s a training method that allows a machine learning model to learn from data without requiring explicit labels. It can be applied to a wide range of data types and domains, including artistic images, and it doesn’t necessarily involve discrimination: the model simply discovers patterns, features, and relationships within the data, which can then be used for various creative tasks.
Self-supervised learning could be used to extract common ideas or concepts connecting a set of artistic images. This is a form of feature learning, where the model learns to capture high-level semantic information from the images.
Use a self-supervised learning approach (such as jigsaw puzzles or rotation prediction) to train a model on a dataset of artistic images.
The model learns to capture underlying structures, shapes, colors, and other features present in the images.
The goal is to train the model to extract meaningful features that represent the artistic style or theme of the images.
After training, you can use the trained model as a feature extractor.
Feed new images (including existing ones) through the trained model, and it will produce a set of numerical features that capture the essence of the images’ artistic qualities.
You can use these extracted features to recreate images with a similar artistic style or theme. For example, you could generate new images that share the same underlying “idea” as the original artistic images.
You could also use the extracted features for post-processing tasks, enhancing existing images while preserving their artistic characteristics. A minimal feature-extraction sketch follows below.
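Once such a model is trained, the pretext head can be discarded and the rest of the network reused as a feature extractor. The sketch below is again illustrative: an untrained torchvision `resnet18` stands in for the trained backbone, and cosine similarity over the extracted features is one simple way to surface which images share a common “idea”.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Stand-in backbone: in practice this would be the network trained with a
# pretext task above. Stripping the classifier head leaves 512-d features.
backbone = resnet18(weights=None)
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    gallery = backbone(torch.rand(16, 3, 224, 224))  # e.g. the artistic dataset
    query = backbone(torch.rand(1, 3, 224, 224))     # a new image
    # Cosine similarity ranks gallery images by closeness in feature space.
    sims = F.cosine_similarity(query, gallery)       # shape: (16,)
    best = int(sims.argmax())
    print(f"most similar gallery image: {best} (score {sims[best]:.3f})")
```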
It’s important to note that self-supervised learning can capture general patterns in the data, but it might not perfectly capture every nuance of artistic intent. Artistic interpretation often involves complex and subjective elements that may not be fully captured by automated processes. However, using self-supervised learning can still provide valuable insights and creative capabilities.
Hands-on
train a common LoRA with discriminatory style + nice overview!
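The core of a LoRA is compact enough to sketch directly: freeze a pretrained layer’s weights and learn a low-rank update on top of them. The `LoRALinear` class below is a minimal illustrative implementation in plain PyTorch, not the API of any particular LoRA library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        # A is small random, B is zero, so the adapter initially changes nothing.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap one layer of a (hypothetical) pretrained model; only A and B train.
layer = LoRALinear(nn.Linear(128, 64))
out = layer(torch.rand(2, 128))  # same output shape as the base layer
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable, "trainable parameters")
```

Because `B` starts at zero, the wrapped layer initially behaves exactly like the frozen base layer; training then updates only the small `A` and `B` matrices, which is why LoRA files are so much smaller than full checkpoints.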