Image model fine-tuning
Fine-tune FLUX and Stable Diffusion XL on your own images.
Soramai fine-tunes LoRA adapters for FLUX and SDXL on your image and caption pairs. Style LoRAs, character LoRAs, product LoRAs — all on managed GPU pods with autoscaling inference.
Supported base models
The catalog is curated for image quality and LoRA stability. New checkpoints are added once they pass internal evaluations.
FLUX.1 dev
High quality, 12B params, default for product photography and characters.
FLUX.1 schnell
Faster, lower cost. Good for style and concept LoRAs.
Stable Diffusion XL
Battle-tested base. Strong ecosystem and refiner support.
SDXL Turbo
Low-step inference. Useful when latency matters more than fidelity.
What you get
Every image fine-tuning run is wired to dataset validation, sample previews, retry, and one-click deployment.
Drop a ZIP
Upload a ZIP of paired image and caption files. Soramai validates pairing, resolution, and aspect ratios before the run starts.
Caption assist
If you skip captions, Soramai can auto-caption the dataset using a vision model. Captions stay editable before fine-tuning.
LoRA out of the box
PEFT LoRA on the denoiser (the U-Net for SDXL, the diffusion transformer for FLUX), and optionally on the text encoders. Rank, alpha, and resolution buckets are configurable per job.
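To make rank and alpha concrete, here is a small sketch of the arithmetic behind the standard LoRA formulation, where a frozen weight W gets a trainable low-rank update scaled by alpha / rank (the default convention in PEFT's `LoraConfig`). It assumes Soramai's rank and alpha settings follow that convention, and the 3072×3072 layer size is a hypothetical example, not a statement about any particular model.

```python
# Standard LoRA: a frozen weight W (d_out x d_in) is used as
#   W + (alpha / rank) * (B @ A)
# where A is (rank x d_in) and B is (d_out x rank). Only A and B are trained.


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters one LoRA adapter adds to a single linear layer."""
    return rank * d_in + d_out * rank  # entries in A plus entries in B


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier on the low-rank update under the alpha / rank convention."""
    return alpha / rank


if __name__ == "__main__":
    d = 3072  # hypothetical square projection, for illustration only
    alpha = 16
    for rank in (4, 16, 64):
        added = lora_trainable_params(d, d, rank)
        share = added / (d * d)
        print(f"rank={rank:>2}  params={added:,}  share={share:.2%}  scale={lora_scale(alpha, rank)}")
```

At rank 16 the adapter adds 98,304 parameters, about 1% of the 9,437,184 in the full layer. Holding alpha fixed while raising rank shrinks the per-update scale, which is why the two are usually tuned together. PEFT's optional rsLoRA mode scales by alpha / sqrt(rank) instead.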
Live samples
Configurable sample prompts run at fixed step intervals. Watch the style converge in the dashboard while the job is still running.
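A sketch of how "fixed step intervals" could translate into concrete preview points, assuming samples fire every `every` steps plus once at the final step. `every` and `include_final` are hypothetical names for illustration, not Soramai settings.

```python
def sample_steps(total_steps: int, every: int, include_final: bool = True) -> list[int]:
    """Training steps at which the sample prompts would be rendered."""
    if total_steps <= 0 or every <= 0:
        raise ValueError("total_steps and every must be positive")
    # Every `every`-th step, up to and including total_steps.
    steps = list(range(every, total_steps + 1, every))
    # Optionally add a final preview if the interval does not land on the last step.
    if include_final and (not steps or steps[-1] != total_steps):
        steps.append(total_steps)
    return steps


if __name__ == "__main__":
    print(sample_steps(1000, 250))
    print(sample_steps(1000, 300))
```

With four sample prompts and previews at steps 250, 500, 750, and 1000, the dashboard would show 16 preview images over the run; the total is simply prompts × preview points.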
FLUX or SDXL endpoints
Promoted adapters run on serverless inference endpoints with autoscaling. Generate from the playground or the Deploy API.
Per-second billing
Fine-tuning is billed per second. There are no minimums and no charges while the endpoint is idle.
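Per-second billing makes the cost of a run a simple product of duration and rate. The sketch below assumes partial seconds round up to a whole second and totals round to cents; both rounding rules and the hourly rate used in the example are hypothetical, not published Soramai pricing.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def run_cost(duration_seconds: float, hourly_rate: Decimal) -> Decimal:
    """Cost of a run billed per second with no minimum charge.

    Assumption: partial seconds round up, and the total rounds to cents.
    """
    billable = math.ceil(duration_seconds)  # whole seconds only, no minimum
    cost = hourly_rate * billable / Decimal(3600)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    rate = Decimal("2.40")  # hypothetical GPU rate per hour
    print(run_cost(1384.2, rate))  # a ~23-minute run
    print(run_cost(90, rate))      # a short 90-second run
```

Using `Decimal` rather than floats keeps currency arithmetic exact until the final rounding step.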
How a run flows
From upload to deployed endpoint in four steps. No GPU setup, no Diffusers boilerplate.
01
Prepare your images
Aim for 15–60 images for a style LoRA and 8–20 for a character LoRA. PNG or JPG, 512×512 or larger. Captions are optional.
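The guidance above can be checked locally before uploading. This is a minimal sketch that takes image dimensions you have already read (for example with an image library) and reports warnings. It reads "512×512 or larger" as both sides being at least 512 px and assumes `.jpeg` counts as JPG; both readings are assumptions, and Soramai's own validation may differ.

```python
import os
from dataclasses import dataclass

# Recommended counts from the guidance above.
RECOMMENDED_COUNTS = {"style": (15, 60), "character": (8, 20)}
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # .jpeg assumed to count as JPG
MIN_SIDE = 512  # "512×512 or larger", read here as both sides >= 512


@dataclass
class ImageInfo:
    filename: str
    width: int
    height: int


def preflight(images: list[ImageInfo], lora_type: str) -> list[str]:
    """Return human-readable warnings; an empty list means the set looks fine."""
    warnings: list[str] = []
    low, high = RECOMMENDED_COUNTS[lora_type]
    if not low <= len(images) <= high:
        warnings.append(f"{len(images)} images; {low}-{high} recommended for a {lora_type} LoRA")
    for img in images:
        ext = os.path.splitext(img.filename)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            warnings.append(f"{img.filename}: unsupported format {ext or '(none)'}")
        if min(img.width, img.height) < MIN_SIDE:
            warnings.append(f"{img.filename}: {img.width}x{img.height} is below {MIN_SIDE}px")
    return warnings


if __name__ == "__main__":
    imgs = [ImageInfo(f"{i:02d}.png", 1024, 1024) for i in range(1, 11)]
    print(preflight(imgs, "character"))  # []
    print(preflight(imgs, "style"))      # ['10 images; 15-60 recommended for a style LoRA']
```

Count warnings are advisory: the ranges are recommendations, so the sketch reports them rather than rejecting the set.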
02
Upload
Drop a ZIP into the dashboard. Soramai validates pairing, resolution, and aspect ratios before queueing the job.
03
Pick a base and start
Choose FLUX or SDXL, set steps and learning rate (or take the defaults), and confirm the cost estimate.
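Step 03 boils down to picking a base model and a few hyperparameters, then checking an estimate before committing. The sketch captures that as a local dataclass. The model identifiers, field names, defaults, and the seconds-per-step and hourly figures are all hypothetical; this is not Soramai's API or its default settings.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four catalog entries listed above.
SUPPORTED_BASES = {"flux.1-dev", "flux.1-schnell", "sdxl", "sdxl-turbo"}


@dataclass(frozen=True)
class FineTuneJob:
    """Local, hypothetical job spec; field names and defaults are illustrative."""

    base_model: str
    steps: int = 1000
    learning_rate: float = 1e-4
    lora_rank: int = 16
    lora_alpha: int = 16

    def __post_init__(self) -> None:
        # Reject settings that could not be submitted, before any cost is incurred.
        if self.base_model not in SUPPORTED_BASES:
            raise ValueError(f"unknown base model: {self.base_model!r}")
        if self.steps <= 0:
            raise ValueError("steps must be positive")
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate must be between 0 and 1")
        if self.lora_rank <= 0 or self.lora_alpha <= 0:
            raise ValueError("lora_rank and lora_alpha must be positive")

    def estimated_cost(self, seconds_per_step: float, hourly_rate: float) -> float:
        """Rough pre-run estimate: steps x seconds per step x per-second rate."""
        return round(self.steps * seconds_per_step * hourly_rate / 3600, 2)


if __name__ == "__main__":
    job = FineTuneJob("flux.1-dev", steps=1000)
    # 1.2 s/step and 2.40 per hour are placeholder numbers, not real pricing.
    print(job.estimated_cost(seconds_per_step=1.2, hourly_rate=2.40))
```

Validating in `__post_init__` means an invalid job can never be constructed, so the estimate is only ever computed for a configuration that could actually run.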
04
Generate from the playground
Watch sample images appear at fixed step intervals. Promote the adapter to a live endpoint when the style is right.
Dataset layout
A simple ZIP convention: each image can have one optional caption file with the same base name.
dataset.zip
├── 01.png
├── 01.txt   # caption (optional)
├── 02.png
├── 02.txt
├── 03.jpg
└── 03.txt
Captions are plain text. Soramai supports trigger tokens — see the dataset reference.
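The pairing rule can be checked locally before upload. This minimal sketch uses only Python's standard `zipfile` module. It groups entries by path without extension, treats `.png`, `.jpg`, and `.jpeg` as images (`.jpeg` is an assumption), and skips macOS `__MACOSX/` metadata. The report keys are hypothetical names, and Soramai's own validator may apply different rules.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def check_pairs(zip_bytes: bytes) -> dict[str, list[str]]:
    """Report uncaptioned images, captions without an image, and duplicate base names."""
    images: dict[str, str] = {}
    captions: dict[str, str] = {}
    duplicates: list[str] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # Skip directory entries and macOS resource-fork metadata.
            if name.endswith("/") or name.startswith("__MACOSX/"):
                continue
            path = PurePosixPath(name)
            key = str(path.with_suffix(""))  # e.g. "01" for both 01.png and 01.txt
            ext = path.suffix.lower()
            if ext in IMAGE_EXTS:
                if key in images:
                    duplicates.append(name)  # e.g. 01.png and 01.jpg together
                else:
                    images[key] = name
            elif ext == ".txt":
                captions[key] = name
    return {
        "uncaptioned": sorted(images[k] for k in images.keys() - captions.keys()),
        "orphan_captions": sorted(captions[k] for k in captions.keys() - images.keys()),
        "duplicates": sorted(duplicates),
    }


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("01.png", b"")  # placeholder bytes; only names are checked
        zf.writestr("01.txt", "a red ceramic mug, studio lighting")
        zf.writestr("02.png", b"")
        zf.writestr("03.txt", "a caption with no matching image")
    print(check_pairs(buf.getvalue()))
```

Because captions are optional, an uncaptioned image is not an error; those files are the ones Caption assist would fill in. A caption with no matching image, or two images sharing a base name, is more likely a packaging mistake.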