sdxl: add Neuron 2K/4K high-res via img2img upscale (57.94s/2K, 142.62s/4K) #1
jimburtoft wants to merge 1 commit into
Adds a working approach for SDXL high-resolution generation on Neuron that bypasses the monolithic compilation blockers (host RAM overflow at 2K, instruction count limit at 4K).

Approach: Generate at 1024x1024 -> upscale -> tiled VAE encode -> partial noise (strength=0.35) -> tiled denoise refinement (18 steps) -> tiled VAE decode. Uses existing 1K compiled NEFFs with no recompilation needed.

Results:
- 2048x2048: 57.94s +/- 0.02s (10/10 seeds pass)
- 4096x4096: 142.62s +/- 0.01s (3/3 seeds pass)

Neuron beats L4 FP8+compile at both resolutions:
- 2K: 1.29x faster (57.94s vs 74.85s)
- 4K: 3.86x faster (142.62s vs 550.21s)
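The speedup figures above are plain latency ratios: baseline time divided by Neuron time. A minimal sketch of how per-seed timings could be summarized and compared, assuming the reported "+/-" is the sample standard deviation across seeds (the PR does not say which estimator it uses); `summarize` and `speedup` are hypothetical helpers, not functions from benchmark_img2img.py.

```python
import statistics


def summarize(latencies_s: list[float]) -> tuple[float, float]:
    """Return (mean, sample std dev) of per-seed latencies in seconds.

    Assumption: "+/-" in the results means the sample standard deviation.
    """
    mean = statistics.mean(latencies_s)
    std = statistics.stdev(latencies_s) if len(latencies_s) > 1 else 0.0
    return mean, std


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Mean latencies reported in this PR (seconds): L4 FP8+compile vs Neuron.
print(f"2K: {speedup(74.85, 57.94):.2f}x faster")    # 2K: 1.29x faster
print(f"4K: {speedup(550.21, 142.62):.2f}x faster")  # 4K: 3.86x faster
```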
Summary
Adds a working approach for SDXL high-resolution generation on Neuron that bypasses the blockers hit when compiling the full-resolution pipeline monolithically (host RAM overflow at 2K, instruction count limit at 4K).
Approach
Generate at 1024x1024 (proven compiled NEFFs) -> bicubic upscale -> tiled VAE encode -> add partial noise (strength=0.35, i.e. only the last 18 of 50 denoising steps are run) -> tiled denoise refinement -> tiled VAE decode.
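The "18/50 steps" figure follows from running only the tail of a 50-step schedule: 0.35 x 50 = 17.5, rounded up to 18, so refinement starts at step index 32 and the upscaled latent is noised to that step's sigma. A minimal sketch, assuming round-up (ceil) semantics. Note that diffusers' img2img pipelines truncate with `int()`, which would give 17 steps, so the exact rounding rule used by benchmark_img2img.py is an assumption here; `refine_schedule` is a hypothetical helper.

```python
import math


def refine_schedule(num_inference_steps: int = 50, strength: float = 0.35) -> tuple[int, int]:
    """Return (start_index, steps_run) for an img2img refinement pass.

    Assumption: the step count is rounded up, which reproduces the 18/50 reported here.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    steps_run = min(math.ceil(num_inference_steps * strength), num_inference_steps)
    start_index = num_inference_steps - steps_run  # skip the early, high-noise steps
    return start_index, steps_run


start, steps = refine_schedule()
print(f"denoise from step {start} for {steps} steps")  # denoise from step 32 for 18 steps
```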
Generating at 1K and then refining with img2img is the standard SDXL high-res workflow, similar to the img2img pattern the SDXL Refiner uses. The 1K pass establishes global composition and coherence; the tiled refinement only adds local high-frequency detail.
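The tiled stages reuse the 1K NEFFs by slicing the larger latent into windows the compiled graphs accept. With SDXL's 8x VAE downsampling, the 1024x1024 UNet sees a 128x128 latent, while 2048x2048 and 4096x4096 correspond to 256x256 and 512x512 latents. Below is a minimal 1-D sketch of overlapping tiles with linear feathering at the seams; the 2-D case applies the same weights per axis as an outer product. The 32-cell overlap, the feathering scheme, and the names `tile_starts`, `ramp_weight`, and `blend_1d` are assumptions, since the PR does not specify how tiles are laid out or merged.

```python
from typing import Callable

VAE_SCALE = 8              # SDXL VAE downsamples 8x per side
TILE = 1024 // VAE_SCALE   # 128-cell window, matching the 1K compiled UNet (assumed)


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of `tile`-wide windows covering [0, length) with >= `overlap` overlap."""
    if not 0 < overlap < tile:
        raise ValueError("overlap must be in (0, tile)")
    if length < tile:
        raise ValueError("latent is smaller than one tile")
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last window sits flush with the far edge
    return starts


def ramp_weight(i: int, tile: int, overlap: int) -> float:
    """Linear feather: weight ramps up over `overlap` cells at each tile edge (always > 0)."""
    return min(i + 1, tile - i, overlap) / overlap


def blend_1d(length: int, tile: int, overlap: int,
             run_tile: Callable[[int, int], list[float]]) -> list[float]:
    """Run `run_tile(start, end)` on each window and feather-blend the overlapping results."""
    acc = [0.0] * length
    wsum = [0.0] * length
    for s in tile_starts(length, tile, overlap):
        for i, v in enumerate(run_tile(s, s + tile)):
            w = ramp_weight(i, tile, overlap)
            acc[s + i] += w * v
            wsum[s + i] += w
    return [a / w for a, w in zip(acc, wsum)]


for px in (2048, 4096):
    n = len(tile_starts(px // VAE_SCALE, TILE, overlap=32))
    print(f"{px}x{px}: {n}x{n} = {n * n} windows per denoising step")
```

With these assumed settings, a 2K image needs 3x3 = 9 windows per step and a 4K image 5x5 = 25. The same layout could serve the tiled VAE encode/decode stages, with windows measured in pixels instead of latent cells.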
Files Added
- sdxl-benchmark/highres_img2img/benchmark_img2img.py -- Self-contained script (compile + run + benchmark modes)
- sdxl-benchmark/highres_img2img/README.md -- Approach explanation, results, usage
- sdxl-benchmark/highres_img2img/results.json -- Structured benchmark data
- sdxl-benchmark/highres_img2img/results_2048/seed42.png -- Sample 2K output
- sdxl-benchmark/highres_img2img/results_4096/seed42.png -- Sample 4K output

README.en.md Updates
Key Technical Notes
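`scale_model_input()` is critical for EulerDiscreteScheduler correctness: it rescales the noisy latent by 1/sqrt(sigma^2 + 1) before every UNet call, and omitting it feeds the UNet inputs at the wrong magnitude.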
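Why `scale_model_input()` matters: in diffusers' `EulerDiscreteScheduler`, a noisy latent `x0 + sigma * noise` has roughly `sqrt(sigma^2 + 1)` times the scale of the clean one. `scale_model_input()` divides by that factor before the UNet call, while the Euler update itself operates on the unscaled latent. The sketch below reproduces that math with plain floats for an epsilon-prediction model, assuming no stochastic churn (`s_churn=0`); `unet` is a hypothetical stand-in for the compiled Neuron UNet, and `oracle_unet` is a toy used only to check the loop. Because the scaling is elementwise, applying it to the full latent or to each tile gives the same result.

```python
import math
from typing import Callable, Sequence


def scale_model_input(sample: float, sigma: float) -> float:
    """Euler input scaling: divide by sqrt(sigma^2 + 1) before calling the UNet."""
    return sample / math.sqrt(sigma ** 2 + 1)


def euler_step(sample: float, eps: float, sigma: float, sigma_next: float) -> float:
    """One deterministic Euler step for an epsilon-prediction model (sigma > 0)."""
    denoised = sample - sigma * eps             # predicted clean latent x0
    derivative = (sample - denoised) / sigma    # dx/dsigma, equal to eps here
    return sample + (sigma_next - sigma) * derivative  # update uses the UNSCALED sample


def refine(x0: float, noise: float, sigmas: Sequence[float],
           unet: Callable[[float, float], float]) -> float:
    """Noise x0 to sigmas[0] (as Euler's add_noise does), then integrate down to sigmas[-1]."""
    x = x0 + noise * sigmas[0]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        eps = unet(scale_model_input(x, sigma), sigma)  # only the UNet input is scaled
        x = euler_step(x, eps, sigma, sigma_next)
    return x


def oracle_unet(scaled_x: float, sigma: float, x0: float = 0.7) -> float:
    """Toy UNet that knows the clean value and, like a real one, expects scaled input."""
    x = scaled_x * math.sqrt(sigma ** 2 + 1)  # undo the input scaling
    return (x - x0) / sigma                   # the exact noise prediction


print(refine(0.7, 1.3, [2.0, 1.0, 0.5, 0.0], oracle_unet))  # approximately 0.7
```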