Even without them, I feel this is a game changer for ComfyUI users.

Researchers discovered that Stable Diffusion v1 uses internal representations of 3D geometry when generating an image. This ability emerged during the training phase of the AI, and was not programmed by people. Paper: "Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model".

As a developer not specialized in this field, it sounds like the current way was "easier" to implement and is faster to execute, since the weights are right where they are needed and the processing does not need to search for them. This does result in faster generation speed, but it comes with a few downsides, such as having to lock in a resolution (or get diminishing returns for multi-resolution engines), as well as the inability to switch LoRAs on the fly — the biggest being that extra networks stopped working and nobody could convert models themselves.

Hi, I'm currently working on an LLM RAG application with speech recognition and TTS. I've now also added SadTalker for TTS talking avatars. Looking again, I am thinking I can add ControlNet to the TensorRT engine build just like the vae and unet models are here. Then I think I just have to add calls to the relevant method(s) I make for ControlNet to StreamDiffusion in wrapper.py, the same way they are called for unet, vae, etc., for when "tensorrt" is the configured accelerator.

It runs on Nvidia and AMD cards.

"Convert Stable Diffusion with ControlNet for diffusers repo, significant speed improvement" — it's not as big as one might think, because it didn't work when I tried it a few days ago. AITemplate provides for faster inference, in this case a 2.4x speed-up. Stable Diffusion does not run too shabby in the first place, so personally I've not tried this, so as to maintain overall compatibility with all available Stable Diffusion rendering packages and extensions. I'm not sure what led to the recent flurry of interest in TensorRT.

For using the refiner, choose it as the Stable Diffusion checkpoint, then proceed to build the engine as usual in the TensorRT tab. Once the engine is built, refresh the list of available engines. Next, select the base model for the Stable Diffusion checkpoint and the Unet profile for your base model. After that, enable the refiner in the usual…

Then I tried to create SDXL-Turbo with the same script, with a simple mod to allow downloading sdxl-turbo from Hugging Face.

Best way I see to use multiple LoRAs as it is would be to generate a lot of images that you like, using the LoRAs with exactly the same value/weight on each image. You still have to run any LoRAs through its baking process.

Some of the most noticeable changes is significantly faster image generation through HyperTile integration.

When using Kohya_ss I get the following warning every time I start creating a new LoRA, right below the accelerate launch command.

"Stable Diffusion Accelerated" API is software designed to improve the speed of your SD models by up to 4x using TensorRT. After that it just works, although it wasn't playing nicely with ControlNet for me.

DeepCache launched last week: a novel training-free and almost lossless paradigm that accelerates diffusion models from the perspective of the model architecture.
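For anyone who wants to try DeepCache from plain Python rather than a UI, here is a minimal sketch based on the DeepCache project's README — the helper class and method names are assumed from that README, and the model id is just an example:

```python
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper  # pip install DeepCache

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Reuse cached high-level U-Net features for 2 of every 3 steps;
# that feature reuse is where the "training-free" speed-up comes from.
helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)
helper.enable()

image = pipe("a castle above the clouds", num_inference_steps=25).images[0]
helper.disable()  # restore the original, uncached forward pass
image.save("deepcache.png")
```

Larger `cache_interval` values trade a little quality for more speed, which matches the "almost lossless" framing above.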
HOWTO: clean TensorRT engine profiles from "Unet-onnx" and "Unet-trt" (Question - Help). From your base SD webui folder (E:\Stable diffusion\SD\webui\ in your case), install the TensorRT fix.

"Stable Diffusion Gets A Major Boost With RTX Acceleration" — Oct 24, 2023: in today's Game Ready Driver, NVIDIA added TensorRT acceleration for Stable Diffusion Web UI, which boosts GeForce RTX performance by up to 2X.

For a little bit I thought that perhaps TRT produced less quality than PYT because it was dealing with a 16-bit float. But A1111 often uses FP16 and I still get good images.

I want to benchmark different cards and see the performance difference. Today I actually got VoltaML working with TensorRT, and for a 512x512 image at 25 s…

ControlNet, the most advanced extension of Stable Diffusion, on the DEV branch: "UnboundLocalError: local variable 'img2img_tabs' referenced before assignment", RESTART SERVER. It's supposed to work on the A1111 dev branch.

The image quality this model can achieve when you go up to 20+ steps is astonishing.

I am using SD 1.5, Automatic1111, and ControlNet v1.1.449, attempting to create a txt2img using a Normal Map created from a 512x512 PNG.

I've tried a brand new install of Auto1111 1.0, but when I go to add TensorRT I get "Processing" and the counter with no end in sight. System monitor says Python is idle. I ran it for an hour before giving up. Everything is as it is supposed to be in the UI — and I very obviously get a massive speedup when I switch to the appropriate generated "SD Unet".

You set TensorRT up on a per-model basis. This means that when you run your models on NVIDIA GPUs, you can expect a significant boost. Is TensorRT currently worth trying?

Two reasons: one, TensorRT is a complete bag of wank, and Nvidia should be ashamed for releasing it in the state that they have, along with shoddy documentation that doesn't cover all edge cases — and did I say it doesn't work? Because it doesn't work.

If you have the default option enabled and you run Stable Diffusion at close to maximum VRAM capacity, your model will start to get loaded into system RAM instead of GPU VRAM. This will make things run SLOW. If you disable the CUDA sysmem fallback it won't happen anymore, BUT your Stable Diffusion program might crash if you exceed memory limits.

Their Olive demo doesn't even run on Linux. It's not implemented into A1111 — let's hope it will be at some point. I just installed SDXL and it works fine. Microsoft Olive is another tool like TensorRT that also expects an ONNX model and runs optimizations; unlike TensorRT, it is not Nvidia-specific and can also do optimization for other hardware.
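Whichever optimizer you pick, the first step is the same: get the UNet out of PyTorch and into ONNX. A hedged sketch with diffusers follows — the model id, shapes, and opset are illustrative, and real exporters (like the A1111 extension) also register dynamic axes for batch size and resolution:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = pipe.unet.eval()

# Dummy inputs matching the SD1.5 UNet: 64x64 latents for 512x512 images,
# batch of 2 to account for classifier-free guidance.
sample = torch.randn(2, 4, 64, 64)
timestep = torch.tensor(999)
text_emb = torch.randn(2, 77, 768)

torch.onnx.export(
    unet,
    # A trailing dict is treated as keyword arguments; return_dict=False
    # makes the UNet return a plain tuple, which ONNX export handles cleanly.
    (sample, timestep, text_emb, {"return_dict": False}),
    "unet.onnx",
    input_names=["sample", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    opset_version=17,
)
```

The resulting unet.onnx is what TensorRT or Olive then consumes for hardware-specific optimization.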
It makes you generate a separate model per LoRA, but is there really no… I just installed TensorRT and it works great, but it doesn't seem to work with my hypernetworks. Switching LoRAs requires a recompile — like with pretty much any… but you never need to use a compiler at all.

TensorRT almost doubles speed. "Double Your Stable Diffusion Inference Speed with RTX Acceleration TensorRT: A Comprehensive Guide" — by optimizing the inference pipeline, images render up to 2x faster.

Here is a very good GUI 1-click install app that lets you run Stable Diffusion and other AI models using optimized Olive: Stackyard-AI/Amuse, a .NET application for Stable Diffusion. Leveraging OnnxStack, Amuse seamlessly integrates many Stable Diffusion capabilities all within the .NET eco-system (github.com).

Hi all, I'm in the market for a new laptop, specifically for generative AI like Stable Diffusion. But how much better? Asking as someone who wants to buy a gaming laptop (travelling, so want something portable) with a video card (GPU or eGPU) to do some rendering — mostly to make large amounts of cartoons and generate idea starting points, train it partially on my own data, etc.

The frontend sends audio and video streams to the server via WebRTC. The server takes an incoming frame, runs a TensorRT-accelerated pipeline to generate a new frame combining the original frame with the text prompt, and sends it back as a video stream to the frontend.

There is a guide on NVIDIA's site called "TensorRT Extension for Stable Diffusion Web UI". It covers the install and tweaks you need to make, and has a little tab interface for compiling for specific parameters on your GPU. Oct 17, 2023: this guide explains how to install and use the TensorRT extension for Stable Diffusion Web UI, using as an example Automatic1111, the most popular Stable Diffusion distribution. They are announcing official TensorRT support via an extension — GitHub: NVIDIA/Stable-Diffusion-WebUI-TensorRT, "TensorRT Extension for Stable Diffusion Web UI" — and showing that it supports all the existing models.

There are a lot of different ControlNet models that control the image in different ways, and a lot of them only work with SD1.5. The stick figure one you're talking about is the OpenPose model, which detects the pose of your ControlNet input and produces that pose in the result.

Someone tried that a couple days ago and the improvement seemed to be even larger: https://www.reddit.com/r/MachineLearning/comments/xa75km/p_pytorchs_newest_nvfuser_on_stable_diffusion_to/

Okay, ran several more batches to make sure I wasn't hallucinating. I'm not saying it's not viable, it's just too complicated currently. This has been an exciting couple of months for AI! This thing only works for Linux from what I understand; I run on Windows.

Installed the new driver, installed the extension, getting: "AssertionError: Was not able to find TensorRT directory. Looked in: J:\stable-diffusion-webui\extensions\stable-diffusion-webui-tensorrt\.git, J:\stable-diffusion-webui\extensions\stable-diffusion-webui-tensorrt\scripts, J:\stable-diffusion-webui\extensions\stable-diffusion-webui-tensorrt\__pycache__"

Others hit this after updating:

    File "C:\Stable Diffusion\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\scripts\trt.py", line 302, in process_batch
      if self.idx != sd_unet.current_unet.profile_idx:
    AttributeError: 'NoneType' object has no attribute 'profile_idx'

Hey, I found something that worked for me: go to your Stable Diffusion main folder, then to models, then to Unet-trt (\stable-diffusion-webui\models\Unet-trt), and delete the LoRAs you trained with TRT. For some reason the tab does not show up unless you delete the LoRAs, because the LoRAs don't work after the update!
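If you'd rather review those stale engines before deleting anything by hand, a small helper works too. This is a sketch: the paths assume a default A1111 layout, and the `*lora*` filename filter is an assumption — adjust both to whatever your install actually produced:

```python
from pathlib import Path

# Default A1111 layout; change this to your own install location.
unet_trt = Path("stable-diffusion-webui") / "models" / "Unet-trt"

for engine in sorted(unet_trt.glob("*lora*")):
    size_mb = engine.stat().st_size / 2**20
    print(f"{engine.name}  ({size_mb:.0f} MB)")  # review the list first...
    # engine.unlink()  # ...then uncomment to actually delete
```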
Stable Diffusion Latent Consistency Model running in TouchDesigner with live camera feed. 1: It's not u/DeJMan's product — he has nothing to do with the creation of TouchDesigner, he is neither advertising nor promoting it, it's not his product. 2: Yes, it works with the non-commercial version of TouchDesigner; the only limitations of non-commercial are a 1280x1280 resolution, a few very specific nodes, and the use of the TouchEngine component in Unreal Engine or other applications.

Not surprisingly, TensorRT is the fastest way to run Stable Diffusion XL right now. Configuration: Stable Diffusion XL 1.0 base model; image resolution 1024×1024; batch size 1; Euler scheduler for 50 steps; NVIDIA RTX 6000 Ada GPU.

Looking at a maxed-out ThinkPad P1 Gen 6, and noticed the RTX 5000 Ada Generation Laptop GPU 16GB GDDR6 is twice as expensive as the RTX 4090 Laptop GPU 16GB GDDR6, even though the 4090 has much higher benchmarks everywhere I look.

It's not going to bring anything more to the creative process.

Decided to try it out this morning, and doing a 6-step to 6-step hi-res image resulted in almost a 50% increase in speed! Went from 34 secs for a 5-image batch to 17 seconds!

Introduction: NeuroHub-A1111 is a fork of the original A1111, with built-in support for the Nvidia TensorRT plugin for SDXL models. This fork is intended primarily for those who want to use Nvidia TensorRT technology for SDXL models, as well as be able to install the A1111 in 1-click.

"Get TensorRT and/or start using LCM" — isn't LCM only useful for low CFGs and video diffusion? And TensorRT requires that you manually convert each model you have (but there's a lot). Me: "I should update my nvidia drivers, maybe I'll get an increase in—" — ends up slowing down training, and image generation time stays the same or slows.

SDXL models run around 6GB, and then you need room for LoRAs, ControlNet, etc., and some working space, as well as what the OS is using. I've read it can work on 6GB of Nvidia VRAM, but works best on 12 or more GB. Other cards will generally not run it well, and will pass the process onto your CPU. There are certain setups that can utilize non-Nvidia cards more efficiently, but still at a severe speed reduction.

I recently installed the TensorRT extension and it works perfectly, but I noticed that if I am using a LoRA model with TensorRT enabled, then the LoRA model doesn't get loaded. Without TensorRT the LoRA model works as intended — and I trained the LoRA with the LCM model in the TensorRT LoRA tab also.

In the extensions folder, delete the stable-diffusion-webui-tensorrt folder if it exists. Delete the venv folder. Open a command prompt, navigate to the base SD webui folder, and run webui.bat — this should rebuild the virtual environment (venv).

"The procedure entry point ?destroyTensorDescriptorEx@ops@cudnn… could not be located in the dynamic link library C:\Users\Admin\stable-diffusion-webui\venv\Lib\site-packages\nvidia\cudnn\bin\cudnn_adv_infer64_8.dll."

Apparently DirectML requires DirectX, and no instructions were provided for that, assuming it is even… Our goal is building the best platform for Stable Diffusion.

…think of TensorRT? Well, I like stable-fast more — easier to use, it compiles in 1/10th of the time and gets the same or better results. stable-fast is one (of the possible) compilers for the backend used by SD.Next. Minimal: stable-fast works as a plugin framework for PyTorch. Fast: stable-fast is specially optimized for HuggingFace Diffusers; it achieves high performance across many libraries, is significantly faster than torch.compile, TensorRT and AITemplate in compilation time, and it provides a very fast compilation speed within only a few seconds.
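For reference, compiling a diffusers pipeline with stable-fast looks roughly like this. This is a sketch following the project's README at the time of writing — the import path and config fields have changed between versions, so check your installed release:

```python
import torch
from diffusers import StableDiffusionPipeline
# Module path per the stable-fast README; older releases used a different name.
from sfast.compilers.diffusion_pipeline_compiler import (
    compile, CompilationConfig,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

config = CompilationConfig.Default()
config.enable_xformers = True    # optional accelerators, used only if installed
config.enable_triton = True
config.enable_cuda_graph = True
pipe = compile(pipe, config)     # compiles in seconds rather than minutes

image = pipe("a watercolor fox", num_inference_steps=25).images[0]
```

The "plugin framework" claim above is the design point: nothing about the pipeline changes, you just wrap it.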
…runs fine, but even after enabling various optimizations, my GUI still produces 512x512 images at less than 10 iterations per second. Should I just not use TensorRT, or is there a fix for…

Hadn't messed with A1111 in a bit and wanted to see if much had changed. Updated it and loaded it up like normal using --medvram, and my SDXL generations are only taking like 15 seconds.

It says it took 1 min and 18 seconds to do these 320 cat pics, but it took a bit of time afterwards to save them all to disk. Using a batch of 4.

Stable Diffusion 3 Medium TensorRT: …

The fact it works the first time but fails on the second makes me think there is something to improve, but I am definitely playing with the limit of my system (resolution around 1024x768 and other things in my workflow).

Does it support SDXL? Yes: https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT

Use of TensorRT boosts it from 40+ it/s to 60+ it/s, btw.

The TensorRT Extension git page says: "This extension enables the best performance on NVIDIA RTX GPUs for Stable Diffusion with TensorRT."

In that case, this is what you need to do: go to the Settings tab, select "show all pages", and search for "Quicksettings".

TLDR: How do I convince SD 1.5 and A1111 that all tensors ARE on the same device?

You can set launch conditions in a .bat file — there's one in the folder already called webui-user.bat. What I like to do is make a couple copies of that (or other) .bat files and set different launch parameters in each one for different things.

I installed it way back at the beginning of June, but due to the listed disadvantages and others (such as batch-size limits), I kind of gave up on it. I remember the hype around TensorRT before — it never went anywhere. There was no way, back when I tried it, to get it to work: on the dev branch, latest venv, etc. I'm running this on…

Other GUIs aside from A1111 don't seem to be rushing for it. The thing is, what's happened with 1.5 TensorRT SD is: while you get a bit of single-image generation acceleration, it hampers batch generations, LoRAs need to be baked into the model, and it's not compatible with ControlNet.

To be fair, with enough customization I have set up workflows via templates that automated those very things! It's actually great once you have the process down, and it helps you understand you can't run this upscaler with this correction at the same time; you set up segmentation and SAM with CLIP techniques to auto-mask and give you options on auto-corrected hands, but then you realize the…

The PhotoRoom team opened a PR on the diffusers repository to use the MemoryEfficientAttention from xformers. This yields a 2x speed-up on an A6000 with bare PyTorch (no nvfuser, no TensorRT).
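That PR eventually landed as a one-line opt-in in diffusers. A minimal example (model id and prompt are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swaps standard attention for xformers' memory-efficient kernels;
# requires the xformers package to be installed.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a lighthouse at dusk").images[0]
```

Unlike TensorRT, this needs no per-model compilation step, which is why it became the default advice for A1111 users via the --xformers flag.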
The extension doubles the performance of Stable Diffusion by leveraging the Tensor Cores in NVIDIA RTX GPUs.

I'm a bit familiar with the automatic1111 code, and it would be difficult to implement this there while supporting all the features, so it's unlikely to happen unless someone puts a bunch of effort into it.

No, it was distilled (compressed) and further trained. Turbo isn't just distillation though, and the merges between the turbo version and the baseline XL strike a good middle ground imo; with those you can do at 8 steps what used to need like 25, so it's just fast enough that you can iterate interactively over your prompts with low-end hardware, and not sacrifice on prompt adherence.

About 2-3 days ago there was a reddit post about a "Stable Diffusion Accelerated" API which uses TensorRT. As far as I know the models won't work with ControlNet still. The rabbit hole is pretty darn deep.

Exporting Photorealistic_realityvisionSDXL_v10 to TensorRT: {'sample': [(1, 4, 64, 64), (2, 4, 128, 128), (8, 4, 256, 256)], 'timesteps': [(1,), (2,), (8,)], 'encoder…

I created a TensorRT SD Unet model for a batch of 16 @ 512x512. Additionally, OneDiff has provided a new ComfyUI node named ModuleDeepCacheSpeedup (which is a compiled DeepCache module), enabling SDXL iteration speed 3.5x faster on RTX 3090 and 3x faster on A100. It sounds like you haven't chosen a TensorRT Engine/Unet.

This is using the de facto standard 20/7/512x512, no LoRAs, no upscaling, etc., no commandline args. My workflow is: 512x512, no additional networks/extensions, no hires fix, 20 steps, CFG 7, no refiner.

"Double Stable Diffusion performance on Nvidia with TensorRT" (Tutorial - Guide): I just found this by accident, and following it, using the generated unet I increased my SD1.5 performance from roughly 17 it/s to 30+ it/s :)
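When comparing it/s claims like these, it helps to measure the same way every time. A crude but honest sketch — end-to-end timing, so it slightly understates pure UNet it/s because the VAE decode is included:

```python
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

steps = 20
pipe("warmup", num_inference_steps=steps)  # exclude one-time setup cost

torch.cuda.synchronize()                   # GPU work is async; sync before timing
t0 = time.perf_counter()
pipe("a 512x512 benchmark prompt", num_inference_steps=steps)
torch.cuda.synchronize()
dt = time.perf_counter() - t0

print(f"{steps / dt:.1f} it/s end-to-end")
```

Run the identical script against the baseline and the TensorRT/compiled UNet to get a fair before/after comparison.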
Interesting to follow if compiled torch will catch up with TensorRT.

CPU is self-explanatory; you want Nvidia for most setups, since Stable Diffusion is primarily NVIDIA-based.

TensorRT INT8 quantization is available now, with FP8 expected soon. The benchmark for TensorRT FP8 may change upon release.

It basically "rebuilds" the model to make best use of Tensor Cores.

This gives you a realtime view of the activities of the diffusion engine, which includes all activities of Stable Diffusion itself, as well as any necessary downloads or longer-running processes like TensorRT engine builds. Note: this is a real-time view, and it will always show the most recent 100 log entries.

I don't see anything anywhere about running multiple LoRAs at once with it. The way it works is you go to the TensorRT tab, click TensorRT LoRA, select the LoRA you want to convert, and click Convert. It takes around 10s on a 3080 to convert a LoRA. Compiling 1.5 models takes 5-10 minutes, and the generation speed is so much faster afterwards that it really becomes "cheap" to use more steps.

The speed difference for a single end user really isn't that incredible. If it were bringing generation speeds from over a minute down to something manageable, end users could rejoice and be more empowered.

But you can try TensorRT in chaiNNer for upscaling, by installing ONNX in that and Nvidia's TensorRT for Windows package, then enabling RTX in the chaiNNer settings for ONNX execution after reloading the program so it can detect it.

Then in the Tiled Diffusion area I can set the width and height between 0-256 (I tried 256 because of TensorRT?!), and in the Tiled VAE area I can set the size to 768 for example (for TensorRT), but it's not working.

I remember TensorRT took several minutes to install on 1.0, and it was silent, so it looked stuck — but I think this really is stuck.

In automatic1111, AnimateDiff and TensorRT work fine on their own, but when I turn them both on, I get the following error: ValueError: No valid…

TensorRT/Olive/DirectML requires some adjustments to the diffusion pipelines maintained by diffusion gurus to offer complete…

Hi guys, I'm facing very bad performance with Stable Diffusion (through Automatic1111). CPU: 12th Gen Intel(R) Core(TM) i7-12700 2.10 GHz; MEM: 64.0 GB; GPU: MSI RTX 3060 12GB.

Here's mine: Card: 2070 8gb, Sampling method: k_euler_a…

Posted this on the main SD reddit, but very little reaction there, so :) I installed a second AUTOMATIC1111 version, just to try out the NVIDIA TensorRT speedup extension.

The A1111 extension for TensorRT will do all the checkpoint conversion work for you, once you specify the resolutions and batch sizes you need.
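Under the hood, that "conversion work" means parsing the exported ONNX and building an engine. A hedged sketch with the TensorRT Python API — these are TensorRT 8.x-era calls, and the extension additionally builds optimization profiles covering the resolution and batch ranges you choose (which is why engines are tied to those settings):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Parse the ONNX UNet produced earlier.
parser = trt.OnnxParser(network, logger)
with open("unet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 engines: ~half the memory
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)

engine = builder.build_serialized_network(network, config)
with open("unet.plan", "wb") as f:
    f.write(engine)
```

The multi-minute build times people report above are spent inside `build_serialized_network`, where TensorRT auto-tunes kernels for your specific GPU.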
The problem is, it is too slow. One of the most common ways to use Stable Diffusion — the popular generative AI tool that allows users to produce images from simple text descriptions — is through the Stable Diffusion Web UI by Automatic1111.

Hello fellas. LLMs became 10 times faster with recent architectures (Exllama), RVC became 40 times faster with its latest update, and now Stable Diffusion could be twice as fast. There is a way to train a model on the starting noise and end output of a model (basically AI-inception), and this makes things crazy fast.

Make sure you aren't mistakenly using slow compatibility modes like --no-half, --no-half-vae, --precision-full, --medvram, etc. (in fact, remove all commandline args other than --xformers); these are all going to slow you down, because they are intended for old GPUs which are incapable of half precision. Opt-sdp-attn is not going to be fastest for a 4080 — use --xformers.

Things DEFINITELY work with SD1.5: RTX 4090 Suprim, Ryzen 9 5950X, 32GB of RAM, Automatic1111, and TensorRT.

From a comment on the Stable Diffusion subreddit: "it takes about 4-10 minutes per model, per resolution, per batch size to set up, requires a 2GB file for every model/resolution/batch size combination, and only works for resolutions between 512 and 768."

The last part after the "generation" but before the result shows up is the VAE, which is not optimized by TensorRT currently (only the Unet is optimized in the extension for now). So I would guess it's something to do with having to send the data from the TensorRT engine back to wherever the VAE is.

The fix was that I had too many tensor models, since I would make a new one every time I wanted to make images with different sets of negative prompts (each negative prompt adds a lot to the total token count, which requires a high token count for a tensor model).

We at voltaML (an inference acceleration library) are testing some stable diffusion acceleration methods and we're getting some decent results. On NVIDIA A100 GPU, we're getting up to 2.5X acceleration in inference with TensorRT. Around 0.2 sec per image on a 3090 Ti.

I checked it out because I'm planning on maybe adding TensorRT to my own SD UI eventually, unless something better comes out in the meantime. TensorRT compiling is not working; when I had a look at the code, it seemed like too much work.

There's a new SegMoE method (mixture of experts for Stable Diffusion) that needs 24GB VRAM to load, depending on config.

We had a great time with Stability on the Stable Stage today running through 3.1! They mentioned they'll share a recording next week, but in the meantime, you can see above for major features of the release, and our traditional YT runthrough video.

They already have an implementation for Stable Diffusion, and I'm looking forward to it being added to our favorite implementations.

After Detailer to improve faces. "Become A Master Of SDXL Training With Kohya SS LoRAs — Combine Power Of Automatic1111 & SDXL LoRAs."

Install the TensorRT plugin TensorRT for A1111. Download a custom SDXL Turbo model — for example: Phoenix SDXL Turbo. Convert this model to TRT format in your A1111 (TensorRT tab, default preset). I was thinking that it might make more sense to manually load the sdxl-turbo-tensorrt model published by stability.ai.

If you want to see how these models perform first-hand, check out the Fast SDXL playground, which offers one of the most optimized SDXL implementations available.
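As a plain-diffusers baseline before any TensorRT conversion, SDXL Turbo runs in a single step with guidance disabled — a short sketch, with prompt and file names illustrative:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    "a cinematic photo of a lighthouse at dusk",
    num_inference_steps=1,   # Turbo is distilled for 1-4 steps
    guidance_scale=0.0,      # trained to run without classifier-free guidance
).images[0]
image.save("turbo.png")
```

Time this the same way as the earlier benchmark and you have the number any TensorRT engine has to beat.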
NVIDIA TensorRT allows you to optimize how you run an AI model for your specific NVIDIA RTX GPU. If you don't have TensorRT installed, the first thing to do is update your ComfyUI and get your latest graphics drivers, then go to the official Git page. Pull/clone, install requirements, etc. — or just use ComfyUI Manager to grab it.

There are tons of caveats to using the system. For the end user like you or me, it's cumbersome and unwieldy. TensorRT is tech that makes more sense for wide-scale deployment of services.

Essentially, with TensorRT you have: PyTorch model -> ONNX model -> TensorRT-optimized model. Edit: I have not tried setting up x-stable-diffusion here; I'm waiting on automatic1111 hopefully including it.

Hi, can the LCM model be converted to run on TensorRT? Been using TRT for some days now, and the 30% speed increase is nice (hope for LoRA, CNet, etc… support soon though).

I got my Unet TRT code for StreamDiffusion I/O working 100% finally though (holy shit, that took a serious bit of concentration), and now I have a generalized process for TensorRT acceleration of all/most Stable Diffusion diffusers pipelines.

Is this an issue on my end, or is it just an issue with TensorRT?

"The Segmind Stable Diffusion Model (SSD-1B) is a distilled, 50% smaller version of Stable Diffusion XL (SDXL), offering a 60% speedup while maintaining high-quality text-to-image generation capabilities. It has been trained on diverse datasets, including Grit and Midjourney scrape data…" There is also "distilled diffusion" that will make it 256 TIMES faster.

Yes, you can use whatever model you want when running img2img; the trick is how much you denoise. A 0 won't change the image at all, and a 1 will replace it completely.
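In diffusers that denoise knob is the `strength` argument. A short sketch — model id and file names are illustrative:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = load_image("input.png").resize((512, 512))
out = pipe(
    "same scene as an oil painting",
    image=init,
    strength=0.4,  # 0.0 = return input unchanged, 1.0 = ignore it entirely
).images[0]
out.save("repainted.png")
```

Internally, `strength` decides how far into the noise schedule the input image is pushed before denoising starts, which is why 0 leaves it untouched and 1 discards it.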