Introduction: The Evolution of Stable Diffusion
Stable Diffusion has seen several advancements, from the initial SD1.x releases to the more recent SDXL and SD3. The Flux models from Black Forest Labs, such as Flux Dev and Flux Schnell (plus quantised builds like BNB NF4), have emerged as a new contender that stands apart in both architecture and performance. They handle longer prompts, produce sharper outputs (especially text), and come in variants optimised for modest hardware. They also bring specific dependencies and limitations, which explain why they work seamlessly in SD WebUI Forge (ComfyUI also, but this post is aimed more at Auto1111 fanatics like myself) but not in Automatic1111.
Key Features of Flux Models
- Longer Prompts & Contextual Understanding: One of the primary enhancements in Flux models is their ability to handle longer and more complex prompts. Alongside CLIP, Flux models use the much larger T5XXL text encoder, which offers better semantic understanding of the prompt. This results in better adherence to detailed inputs like text, intricate objects, and precise scene settings. In traditional Stable Diffusion models, whose CLIP encoder reads only 77 tokens at a time, it was common for longer prompts to lose coherence, leading to the dreaded Prompt Collapse. Flux models, in contrast, excel at interpreting such complexity and rendering it in the image output.
- Improved Text Generation: Another notable improvement is the accurate rendering of text in images. Traditional Stable Diffusion models often struggled to display readable and coherent text, frequently producing gibberish or distorted words. Flux models pair the T5XXL encoder with a higher-capacity VAE (16 latent channels, against 4 in SD1.x and SDXL), making it possible to achieve clean, legible text outputs in specific scenarios. This is a breakthrough for generating images that require embedded typography or signage, a major limitation in earlier Stable Diffusion iterations.
- Optimised for Low VRAM & Hardware: Flux models are available in reduced-precision variants that make high-quality generative AI accessible to users with lower-end GPUs, and Flux Schnell is additionally distilled to produce an image in only a few sampling steps. Variants like FP8 (8-bit floating point precision) reduce the memory footprint while largely maintaining image fidelity. For example, the FP8 version of Flux Schnell is commonly run on GPUs with around 12GB VRAM, while the 16-bit Flux Dev checkpoint needs more robust hardware: its 12 billion parameters take roughly 24GB at 2 bytes each, so cards with around 16GB VRAM have to offload part of the model to system RAM.
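The memory figures above follow from simple arithmetic: weight memory is roughly the parameter count multiplied by the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate only. The 12-billion-parameter figure for the Flux transformer is an assumption brought in from outside this post, and the estimate ignores the text encoders, the VAE, activations, and the small per-block overhead that quantised formats carry.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: ~12 billion parameters for the Flux transformer (not stated in the post).
FLUX_PARAMS = 12_000_000_000

BYTES_PER_PARAM = {
    "FP16": 2.0,  # 16-bit floating point
    "FP8": 1.0,   # 8-bit floating point
    "NF4": 0.5,   # 4 bits per weight, ignoring per-block scale overhead
}


def weight_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * bytes_per_param / 1024**3


def estimate_all(num_params: int = FLUX_PARAMS) -> dict[str, float]:
    """Estimate weight memory for each precision, rounded to one decimal."""
    return {
        name: round(weight_gib(num_params, size), 1)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gib in estimate_all().items():
        print(f"{name}: ~{gib} GiB of weights")
```

Under these assumptions the weights alone come to about 22.4 GiB at FP16, 11.2 GiB at FP8, and 5.6 GiB at NF4, which is why the low-bit variants matter so much on consumer cards.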
Why Flux Models Don’t Run on Automatic1111
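Flux models are incompatible with Automatic1111 because of architectural differences and the specific optimisations Flux introduces. Where SD1.x and SDXL denoise with a UNet, Flux uses a 12-billion-parameter diffusion transformer, which Automatic1111's pipeline was never written to load. Beyond that, Flux models rely on: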
- Updated PyTorch and CUDA Libraries: These models use more recent versions of these libraries, which Automatic1111 doesn’t natively support without significant updates. Many users who tried running Flux on Automatic1111 experienced crashes or performance degradation due to this incompatibility.
- Custom Text Encoders & VAE Integration: Flux models come with their own custom VAE and text encoders, such as the T5XXL (FP8/FP16 variants), which are not compatible with Automatic1111’s workflow. These are integral to the Flux models’ ability to handle longer prompts and generate clearer text, features that Automatic1111’s current pipeline doesn’t fully support.
- No Negative Prompts: Flux Dev and Schnell are distilled models meant to run at a CFG scale of 1, where a negative prompt has no effect on the result. The traditional Automatic1111 workflow, which relies heavily on negative prompts for output refinement, therefore does not carry over.
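The negative-prompt point can be illustrated with the standard classifier-free guidance (CFG) formula, in which the negative prompt feeds the "unconditional" prediction. This is a toy sketch with made-up numbers standing in for model outputs. It assumes Flux Dev and Schnell are run at a CFG scale of 1, which is worth verifying against your UI's defaults.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy values standing in for the model's prediction at one step (illustrative only).
cond = [0.5, -1.0, 2.0]    # prediction for the positive prompt
neg_a = [0.25, 0.0, 1.0]   # prediction for one negative prompt
neg_b = [-3.0, 4.0, 0.5]   # prediction for a very different negative prompt

# At scale 1 the negative term cancels, so both results equal `cond`.
at_scale_1_a = cfg_combine(cond, neg_a, 1.0)
at_scale_1_b = cfg_combine(cond, neg_b, 1.0)

# At a typical SD1.x scale the negative prompt clearly changes the output.
at_scale_7 = cfg_combine(cond, neg_a, 7.0)

if __name__ == "__main__":
    print(at_scale_1_a, at_scale_1_b, at_scale_7)
```

With the scale at 1 the expression reduces to the positive prediction alone, whatever the negative prompt says. At higher scales the negative prompt matters, which is the behaviour SD1.x users are used to.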
Why SD WebUI Forge Supports Flux Models
SD WebUI Forge was designed to be more modular and adaptable compared to Automatic1111. While Forge wasn’t built specifically for Flux, the architecture it employs makes it more suited to handle these models.
- Support for Low-Bit Variants: Forge has integrated support for running low-bit precision models (like NF4 and FP8) effectively, something that Automatic1111 does not handle as well. This makes Forge a natural fit for the compressed versions of Flux models, which were optimised for hardware with lower VRAM.
- Modular Text Encoder Loading: Forge allows users to load custom text encoders like the T5XXL, which is essential for the Flux models to function correctly. This modularity is one of the primary reasons why Forge supports Flux natively without additional updates.
- Recent PyTorch and CUDA Libraries: Forge has recently updated its PyTorch and CUDA libraries, which makes it compatible with the requirements of Flux models. This update provides the necessary backend optimisations to run these models efficiently.
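Since library versions are part of the compatibility story, it can help to check what a given WebUI's Python environment actually contains. The sketch below only reports what it finds, because this post does not state minimum versions. The function name is hypothetical, and the script should be run with the Python interpreter from the WebUI's own virtual environment.

```python
def describe_torch_environment() -> dict:
    """Report the installed PyTorch build and, if present, the first GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter's environment.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        info["vram_gib"] = round(props.total_memory / 1024**3, 1)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_environment().items():
        print(f"{key}: {value}")
```

Running this in both the Automatic1111 and Forge environments makes it easy to compare the PyTorch and CUDA builds side by side.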
Future of Flux Models in Stable Diffusion
The Flux model architecture is a promising direction in the evolution of generative AI. The fact that these models can handle intricate prompts, render accurate text, and work efficiently on lower-end hardware means they could be the future of diffusion models for a wider audience. However, the incompatibility with Automatic1111 may limit their accessibility unless significant updates are made to that platform.
For those willing to adapt to new workflows, SD WebUI Forge offers a robust environment for utilising the full potential of Flux models. ComfyUI is a viable alternative for users who want to explore these models in different settings.
Conclusion
The Flux models represent a major leap forward in Stable Diffusion, offering improvements in prompt handling, text generation, and memory efficiency. While these models do not work with Automatic1111 due to pipeline differences and library requirements, SD WebUI Forge stands out as the preferred platform for users seeking to leverage the power of Flux.