Carlos Carbonell Images in the cloud

Prompt: Portrait of two women

In this exercise, participants used their smartphones to take 5 images each of marble, wood and other entropy-rich textures. They then wrote natural-language descriptions of the pareidolia they saw in those images, interpreting them as basic concepts such as human, woman, man, child, pet and home. The idea was to keep all the semantics close together so that they could overlap and create linked inferences.

We used these images to train a Stable Diffusion LoRA model that could forget the more strictly rational, figurative representations of these concepts and replace them with more subjective, biased representations.

Here are some examples of the dataset participants created:

	Female
	Face of dog smiling, good boy
	A plastic wolf with palatine syndrome
	Home

Training ran on an RTX 6000 Ada GPU with 48 GB of VRAM, of which only around 30 GB were used during the entire process. The model was trained for 9000 steps, which took 9 h with the SDXL model at 1024x1024 px and around 2 h with the SD 1.5 model at 512x512 px.
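As a rough sanity check on these timings, the average time per training step can be worked out from the figures above. This is only a sketch: it assumes both runs used the same 9000 steps, and it treats the "around 2 h" figure as exact.

```python
# Figures taken from the text; assumes both runs used the same 9000 steps.
TOTAL_STEPS = 9000
RUN_HOURS = {"SDXL @ 1024x1024": 9.0, "SD 1.5 @ 512x512": 2.0}


def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds spent on one training step."""
    return hours * 3600 / steps


for name, hours in RUN_HOURS.items():
    print(f"{name}: {seconds_per_step(hours, TOTAL_STEPS):.2f} s/step")
```

Under those assumptions the SDXL run averages about 3.6 s per step and the SD 1.5 run about 0.8 s per step.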

The SD 1.5 model can be trained on a consumer GPU with 8 GB of VRAM. The SDXL model can be trained on the same setup at a resolution of 768x768.

Here are the settings used on Kohya_ss for this purpose:
Train batch size: 5
Epoch: 10
Mixed precision/Save precision: fp16
Max resolution: 1024,1024 (for SDXL) 512,512 (for SD 1.5)
Network Rank (Dimension): 32

The settings were inspired by style-type training, with the aim of letting the LoRA emerge in the image results without a trigger word. A large number of training steps was used to make this happen (style-based LoRAs usually recommend around 6000 steps).
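To relate the step count to the settings listed above, here is a small arithmetic sketch. It assumes the 9000 steps are the total across all 10 epochs and that each step consumes one batch of 5 samples. The dataset size is not stated in the text, so the `num_images` argument and the value 150 used below are purely hypothetical.

```python
# Values from the text.
TOTAL_STEPS = 9000
EPOCHS = 10
BATCH_SIZE = 5

# Assumption: steps are spread evenly over the epochs,
# and every step processes exactly one full batch.
steps_per_epoch = TOTAL_STEPS // EPOCHS
samples_per_epoch = steps_per_epoch * BATCH_SIZE


def repeats_needed(num_images: int) -> float:
    """How often each image would be seen per epoch (hypothetical dataset size)."""
    return samples_per_epoch / num_images


print(steps_per_epoch, samples_per_epoch)
print(repeats_needed(150))  # 150 images is an illustrative guess, not a figure from the text
```

Under these assumptions each epoch has 900 steps and sees 4500 samples, so a dataset of 150 images would need 30 repeats per image per epoch.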

Every 2 steps, an image was generated with the same seed and the prompt "A man and a woman at home" to monitor how the training evolved.
The settings for this time-lapse generation were:
A portrait of a man and a woman at home --d 1234 --w 1024 --h 1024
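The line above packs the prompt and its generation options into a single string. The following sketch shows how such a line can be read, assuming `--d` is the seed, `--w` the width and `--h` the height. The function `parse_sample_line` is a hypothetical helper written for illustration, not part of Kohya_ss.

```python
import re

# Flag meanings assumed from the sample line: --d seed, --w width, --h height.
FLAG_NAMES = {"d": "seed", "w": "width", "h": "height"}


def parse_sample_line(line: str) -> dict:
    """Split a 'prompt --flag value ...' line into a prompt and its options."""
    parts = re.split(r"\s+--", line.strip())
    result = {"prompt": parts[0]}
    for part in parts[1:]:
        flag, _, value = part.partition(" ")
        value = value.strip()
        name = FLAG_NAMES.get(flag, flag)  # unknown flags keep their short name
        result[name] = int(value) if value.isdigit() else value
    return result


settings = parse_sample_line(
    "A portrait of a man and a woman at home --d 1234 --w 1024 --h 1024"
)
print(settings)
```

Keeping the seed fixed at 1234 is what makes the time lapse readable: changes between frames come from the training, not from a different starting noise.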

Here is a video showing the entire process of training.