
Commit e0d6c95

linoytsaban, pcuenca, merveenoyan, and osanseviero authored
improve image-to-image task page (#867)
Some changes to improve clarity of the task description, plus general updates to improve the task page.

Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Merve Noyan <[email protected]>
Co-authored-by: Omar Sanseviero <[email protected]>
1 parent: 1701fac

File tree

2 files changed: +71 additions, −22 deletions


packages/tasks/src/tasks/image-to-image/about.md

Lines changed: 70 additions & 21 deletions
````diff
@@ -1,15 +1,10 @@
-## Use Cases
-
-### Style transfer
+Image-to-image pipelines can also be used in text-to-image tasks, to provide visual guidance to the text-guided generation process.
 
-One of the most popular use cases of image-to-image is style transfer. Style transfer models can convert a normal photography into a painting in the style of a famous painter.
-
-## Task Variants
+## Use Cases
 
 ### Image inpainting
 
-Image inpainting is widely used during photography editing to remove unwanted objects, such as poles, wires, or sensor
-dust.
+Image inpainting is widely used during photography editing to remove unwanted objects, such as poles, wires, or sensor dust.
 
 ### Image colorization
 
@@ -24,18 +19,27 @@ Super-resolution models increase the resolution of an image, allowing for higher
 You can use pipelines for image-to-image in 🧨diffusers library to easily use image-to-image models. See an example for `StableDiffusionImg2ImgPipeline` below.
 
 ```python
-from PIL import Image
-from diffusers import StableDiffusionImg2ImgPipeline
+import torch
+from diffusers import AutoPipelineForImage2Image
+from diffusers.utils import make_image_grid, load_image
 
-model_id_or_path = "runwayml/stable-diffusion-v1-5"
-pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
-pipe = pipe.to(cuda)
+pipeline = AutoPipelineForImage2Image.from_pretrained(
+    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
 
-init_image = Image.open("mountains_image.jpeg").convert("RGB").resize((768, 512))
-prompt = "A fantasy landscape, trending on artstation"
+# offloading the model to CPU reduces memory usage; since SDXL is fairly heavy,
+# this helps without hurting performance.
+pipeline.enable_model_cpu_offload()
 
-images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
-images[0].save("fantasy_landscape.png")
+# prepare image
+url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
+init_image = load_image(url)
+
+prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+
+# pass prompt and image to pipeline
+image = pipeline(prompt, image=init_image, strength=0.5).images[0]
+make_image_grid([init_image, image], rows=1, cols=2)
 ```
 
 You can use [huggingface.js](https://github.com/huggingface/huggingface.js) to infer image-to-image models on Hugging Face Hub.
@@ -53,13 +57,53 @@ await inference.imageToImage({
 });
 ```
 
-## ControlNet
+## Use Cases for Text-Guided Image Generation
 
-Controlling the outputs of diffusion models only with a text prompt is a challenging problem. ControlNet is a neural network model that provides image-based control to diffusion models. Control images can be edges or other landmarks extracted from a source image.
+### Style Transfer
+
+One of the most popular use cases of image-to-image is style transfer. With style transfer models:
 
-Many ControlNet models were trained in our community event, JAX Diffusers sprint. You can see the full list of the ControlNet models available [here](https://huggingface.co/spaces/jax-diffusers-event/leaderboard).
+- a regular photo can be transformed into a variety of artistic styles or genres, such as a watercolor painting, a comic book illustration, and more.
+- new images can be generated using a text prompt, in the style of a reference input image.
+
+See the 🧨diffusers example for style transfer with `AutoPipelineForText2Image` below.
+
+```python
+from diffusers import AutoPipelineForText2Image
+from diffusers.utils import load_image
+import torch
+
+# load pipeline
+pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
+pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
+
+# set the adapter scales; the IP-Adapter lets us add style control from an image to the text-to-image model
+scale = {
+    "down": {"block_2": [0.0, 1.0]},
+    "up": {"block_0": [0.0, 1.0, 0.0]},
+}
+pipeline.set_ip_adapter_scale(scale)
+
+style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
+
+generator = torch.Generator(device="cpu").manual_seed(26)
+image = pipeline(
+    prompt="a cat, masterpiece, best quality, high quality",
+    ip_adapter_image=style_image,
+    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
+    guidance_scale=5,
+    num_inference_steps=30,
+    generator=generator,
+).images[0]
+image
+```
+
+### ControlNet
+
+Controlling the outputs of diffusion models only with a text prompt is a challenging problem. ControlNet is a neural network model that provides image-based control to diffusion models. Control images can be edges or other landmarks extracted from a source image.
+![Examples](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/models/12-sdxl-text2img-controlnet.png)
 
-## Most Used Model for the Task
+## Pix2Pix
 
 Pix2Pix is a popular model used for image-to-image translation tasks. It is based on a conditional-GAN (generative adversarial network) where instead of a noise vector a 2D image is given as input. More information about Pix2Pix can be retrieved from this [link](https://phillipi.github.io/pix2pix/) where the associated paper and the GitHub repository can be found.
 
@@ -70,8 +114,13 @@ The images below show some examples extracted from the Pix2Pix paper. This model
 ## Useful Resources
 
 - [Image-to-image guide with diffusers](https://huggingface.co/docs/diffusers/using-diffusers/img2img)
+- Image inpainting: [inpainting with 🧨diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/inpaint), [demo](https://huggingface.co/spaces/diffusers/stable-diffusion-xl-inpainting)
+- Colorization: [demo](https://huggingface.co/spaces/modelscope/old_photo_restoration)
+- Super resolution: [image upscaling with 🧨diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/upscale#super-resolution), [demo](https://huggingface.co/spaces/radames/Enhance-This-HiDiffusion-SDXL)
+- [Style transfer and layout control with diffusers 🧨](https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#style--layout-control)
 - [Train your ControlNet with diffusers 🧨](https://huggingface.co/blog/train-your-controlnet)
 - [Ultra fast ControlNet with 🧨 Diffusers](https://huggingface.co/blog/controlnet)
+- [List of ControlNets trained in the community JAX Diffusers sprint](https://huggingface.co/spaces/jax-diffusers-event/leaderboard)
 
 ## References
 
````
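The ControlNet subsection added above describes image-based control in prose only. As a rough sketch of how that control is wired up with 🧨diffusers, using the canny ControlNet that data.ts below pins as the widget model (the edge-map URL and prompt here are illustrative assumptions, not part of the commit):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# load the canny-edge ControlNet and attach it to a Stable Diffusion base model
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()

# the control image is an edge map extracted from a source image (placeholder URL)
canny_image = load_image("https://example.com/canny-edges.png")

# the edge map constrains the layout while the prompt drives content and style
image = pipeline("a fantasy landscape", image=canny_image, num_inference_steps=20).images[0]
image.save("controlnet_output.png")
```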

packages/tasks/src/tasks/image-to-image/data.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -93,7 +93,7 @@ const taskData: TaskDataCustom = {
 	},
 	],
 	summary:
-		"Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Any image manipulation and enhancement is possible with image to image models.",
+		"Image-to-image is the task of transforming an input image through a variety of possible manipulations and enhancements, such as super-resolution, image inpainting, colorization, and more.",
 	widgetModels: ["lllyasviel/sd-controlnet-canny"],
 	youtubeId: "",
 };
```
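The rewritten summary calls out image inpainting among the supported manipulations. A minimal sketch of that workflow with 🧨diffusers' `AutoPipelineForInpainting` (the model choice and image/mask URLs are assumptions for illustration):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# an inpainting pipeline repaints only the masked region (assumed model choice)
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()

init_image = load_image("https://example.com/photo.png")       # placeholder source photo
mask_image = load_image("https://example.com/photo-mask.png")  # white pixels = area to repaint

image = pipeline(
    prompt="clean blue sky",  # what to paint into the masked region
    image=init_image,
    mask_image=mask_image,
).images[0]
image.save("inpainted.png")
```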
