Unsupervised training of custom LoRA tools

*This article is a work in progress* 🙂

In the context of generative text-to-image models (GANs, or diffusion models such as Stable Diffusion checkpoints), it is common to train on annotated images as a form of supervision. However, such models can also be trained on visual information alone, without relying on text annotations. Text annotations provide additional context and guidance for generating images that match specific textual descriptions, but they are not a strict requirement.

When training a text-to-image GAN without text annotations, the approach falls under the umbrella of unsupervised or self-supervised learning.

Generative models, such as text-to-image GANs, often rely on techniques like conditional GANs (cGANs) to generate images based on given input conditions, which can include text descriptions. However, to train such a model without text, we can explore alternative methods:

  • Conditional Latent Variables: Instead of using text annotations, you could use other forms of conditioning information. For example, you might use categorical labels or other structured data that is available in your dataset to condition the image generation process.
  • Unconditional Generation: If your goal is to generate images without any specific conditioning, you can train an unconditional GAN. In this case, the generator produces images from random noise, and the discriminator tries to distinguish between real and generated images. Unconditional generation means a model creates something (images, music, text) without being told exactly what to create. It’s like asking an artist to paint something completely from their imagination, without any specific instructions or reference.
  • Self-Supervised Learning: As mentioned earlier, you could utilize self-supervised learning techniques, such as contrastive learning, to train a generative model without explicit labels or annotations. These methods encourage the model to learn meaningful features from the visual data itself.

    Self-supervised learning is a clever way of training a machine learning model using the data itself to create its own training labels or tasks. It’s like a student teaching themselves by using the materials they already have, without needing a teacher to provide all the answers.

    In self-supervised learning, the model doesn’t need human-provided labels. Instead, it creates its own learning tasks from the data. For instance, if the data is a collection of images, the model might:

    Jigsaw Puzzles: Cut an image into pieces and shuffle them, then try to put them back together. This teaches the model about spatial relationships between different parts of an image.

    Rotation Prediction: Rotate an image and then try to predict how much it was rotated. This helps the model understand the orientation of objects in images.

    Context Prediction: Hide parts of an image and have the model predict what’s missing. This helps the model learn about the context and relationships between different objects.

    Colorization: Show the model a grayscale image and have it predict the colors. This teaches the model about object appearances and textures.

  • Hybrid Approaches: You could consider hybrid approaches that combine unsupervised and supervised learning. For instance, you might pretrain a generator using unsupervised methods and then fine-tune it using labeled data to improve the quality of generated images for specific classes or concepts.
  • How to do non-discriminatory training

    Self-supervised learning is not inherently discriminatory; it’s a training method that allows a machine learning model to learn from data without requiring explicit labels. It can be applied to a wide range of data types and domains, including artistic images, and it doesn’t necessarily involve discrimination. It’s a way for the model to discover patterns, features, or relationships within the data, which can then be used for various creative tasks.

    Self-supervised learning could be used to extract common ideas or concepts connecting a set of artistic images. This is a form of feature learning, where the model learns to capture high-level semantic information from the images.

  • Training Phase:
    Use a self-supervised learning approach (like jigsaw puzzles, rotation prediction, etc.) to train a model on a dataset of artistic images.
    The model learns to capture underlying structures, shapes, colors, and other features present in the images.
    The goal is to train the model to extract meaningful features that represent the artistic style or theme of the images.
  • Feature Extraction:
    After training, you can use the trained model as a feature extractor.
    Feed new images (including existing ones) through the trained model, and it will produce a set of numerical features that capture the essence of the images’ artistic qualities.
  • Recreation or Post-Processing:
    You can use these extracted features to recreate images with a similar artistic style or theme. For example, you could generate new images that share the same underlying “idea” as the original artistic images.
    You could also use the extracted features for post-processing tasks, enhancing existing images while preserving their artistic characteristics.
  • It’s important to note that self-supervised learning can capture general patterns in the data, but it might not perfectly capture every nuance of artistic intent. Artistic interpretation often involves complex and subjective elements that may not be fully captured by automated processes. However, using self-supervised learning can still provide valuable insights and creative capabilities.
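To make the unconditional-generation idea above concrete, here is a minimal sketch of a GAN training loop in PyTorch. It is a toy example, not a production implementation: it uses tiny MLPs on 2-D points standing in for images, and all names (`G`, `D`, `latent_dim`) are illustrative assumptions, not from this article.

```python
# Toy sketch of unconditional GAN training: no text, no labels anywhere.
# The "real" data is a synthetic 2-D distribution standing in for images.
import torch
import torch.nn as nn

latent_dim = 8
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(256, 2) * 0.5 + 2.0  # stand-in "real" distribution

for step in range(50):
    # Discriminator step: tell real samples apart from generated ones.
    z = torch.randn(64, latent_dim)
    fake = G(z).detach()
    real = real_data[torch.randint(0, 256, (64,))]
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator.
    z = torch.randn(64, latent_dim)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(16, latent_dim))  # images would replace these points
```

For real images, `G` and `D` would be convolutional networks and the 2-D points would be pixel tensors, but the training loop keeps this shape.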
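The pretext tasks and the training/feature-extraction phases described above can also be sketched in code. Below is a minimal rotation-prediction example in PyTorch; the network (`SmallCNN`) and its sizes are illustrative assumptions. The model labels its own data by rotating each image and predicting the rotation.

```python
# Sketch of rotation prediction as a self-supervised pretext task.
# SmallCNN is a hypothetical tiny backbone, not from this article.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny backbone plus a 4-way head predicting 0/90/180/270 degrees."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 4)  # one logit per rotation class

    def forward(self, x):
        return self.head(self.backbone(x))

def rotation_batch(images):
    """Rotate each image by a random multiple of 90 degrees; the rotation
    index is the self-generated label -- no human annotation needed."""
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    return rotated, k

model = SmallCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 32, 32)  # stand-in for an unlabeled art dataset
x, y = rotation_batch(images)
loss = loss_fn(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```

After pretraining, `model.backbone` alone serves as the feature extractor from the steps above: `model.backbone(new_images)` yields one 32-dimensional feature vector per image, which can then be compared or reused for recreation and post-processing.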

    Hands on

    train a common LoRA with discriminatory style + nice overview!
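As a starting point for the hands-on part, here is a hedged sketch of the LoRA mechanism itself: a frozen base linear layer plus a trainable low-rank update (W + B·A), which is what LoRA fine-tuning adds to a pretrained model. The class name `LoRALinear` and the hyperparameters are illustrative assumptions.

```python
# Sketch of a LoRA layer: freeze the pretrained weight, learn a low-rank
# delta. Because B starts at zero, the wrapped layer initially behaves
# exactly like the base layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

base = nn.Linear(16, 8)
lora = LoRALinear(base, rank=4)
x = torch.randn(2, 16)
out = lora(x)  # identical to base(x) until A/B are trained
```

During fine-tuning only `A` and `B` receive gradients, which is why LoRA checkpoints stay small compared to the full model.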