cmmd-pytorch

(Unofficial) PyTorch implementation of CLIP Maximum Mean Discrepancy (CMMD) for evaluating image generation models, proposed in Rethinking FID: Towards a Better Evaluation Metric for Image Generation. The paper presents CMMD as a better metric than FID, designed to mitigate FID's long-standing issues.
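For intuition, CMMD is a squared maximum mean discrepancy (MMD) between two sets of CLIP embeddings under a Gaussian RBF kernel. The sketch below is a plain-Python illustration, not the code in this repository (which uses PyTorch). The function names are hypothetical, and the defaults `sigma=10` and `scale=1000` are assumptions based on my reading of the original codebase. It uses the simple biased estimator that averages the kernel over all pairs.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over every pair (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=10.0, scale=1000.0):
    """Scaled, biased estimate of the squared MMD between two embedding sets.

    sigma and scale are assumed defaults, not values confirmed by this README.
    """
    k_xx = mean_kernel(xs, xs, sigma)
    k_yy = mean_kernel(ys, ys, sigma)
    k_xy = mean_kernel(xs, ys, sigma)
    return scale * (k_xx + k_yy - 2.0 * k_xy)


# Toy 2-D "embeddings"; real CLIP embeddings have hundreds of dimensions.
reference = [[0.0, 1.0], [1.0, 0.0]]
generated = [[0.0, 1.0], [1.0, 0.0]]
shifted = [[3.0, 4.0], [4.0, 3.0]]

print(mmd_squared(reference, generated))      # identical sets give 0.0
print(mmd_squared(reference, shifted) > 0.0)  # differing sets give a positive value
```

The value is zero when both sets coincide and grows as the two embedding distributions move apart, which is why lower is better.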

This implementation is a simple PyTorch port of the original codebase. I only touched the JAX- and TensorFlow-specific bits and replaced them with PyTorch. Some differences:

The original codebase relies on scenic for computing CLIP embeddings. This repository uses transformers.
For data loading, the original codebase uses TensorFlow; this one uses PyTorch Dataset and DataLoader.

Setup

First, install PyTorch following instructions from the official website.

Then install the dependencies:

pip install -r requirements.txt

Running

python main.py /path/to/reference/images /path/to/eval/images --batch_size=32 --max_count=30000

A working example command:

python main.py reference_images generated_images --batch_size=1

It should output:

The CMMD value is:  7.696

This matches the output of the original codebase, which supports the correctness of this port 🤗
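The `--batch_size` and `--max_count` flags in the commands above control how many images are embedded per step and how many images are read in total. The sketch below illustrates that idea with the standard library only. The helper names, the accepted file extensions, and the convention that a negative `max_count` means "use everything" are assumptions for illustration, not the behavior of `main.py`.

```python
from pathlib import Path
import tempfile

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed set of image extensions


def list_images(directory, max_count=-1):
    """Return sorted image paths, keeping at most max_count (all if negative)."""
    paths = sorted(
        p for p in Path(directory).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return paths if max_count < 0 else paths[:max_count]


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Demo on a temporary directory with five empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        (Path(tmp) / f"img_{i}.png").touch()
    (Path(tmp) / "notes.txt").touch()  # ignored: not an image extension

    images = list_images(tmp, max_count=4)
    batch_sizes = [len(b) for b in batches(images, batch_size=3)]

print(batch_sizes)  # [3, 1]
```

A larger batch size mainly trades memory for throughput; under this sketch it does not change which images are read.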

Tip

GPU execution is supported when a GPU is available.
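A common PyTorch pattern for this kind of device selection is sketched below. It is not necessarily the exact code in `main.py`; `pick_device` is a hypothetical helper that also falls back to CPU when PyTorch cannot be imported.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU execution is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
# With PyTorch installed, a model and its inputs would be moved with .to(device).
```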

Results

Below, we report the CMMD metric for some popular pipelines on the COCO-30k dataset, as commonly used by the community. CMMD, like FID, is better when it's lower.

Pipeline	Inference Steps	Resolution	CMMD
`stabilityai/stable-diffusion-xl-base-1.0`	30	1024x1024	0.696
`segmind/SSD-1B`	30	1024x1024	0.669
`stabilityai/sdxl-turbo`	1	512x512	0.548
`runwayml/stable-diffusion-v1-5`	50	512x512	0.582
`PixArt-alpha/PixArt-XL-2-1024-MS`	20	1024x1024	1.140
`SPRIGHT-T2I/spright-t2i-sd2`	50	768x768	0.512

Notes:

For SDXL Turbo, guidance_scale is set to 0 following the official guide in diffusers.
For all other pipelines, the default guidance_scale was used. Refer to the official pipeline documentation pages here for more details.

Caution

As per the CMMD authors, with models producing high-quality/high-resolution images, COCO images don't seem to be a good reference set (they are of pretty small resolution). This might help explain why SD v1.5 has a better CMMD than SDXL.
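To make the comparison concrete, the short snippet below ranks the pipelines using the CMMD values copied from the results table above. It only sorts the reported numbers and does not recompute anything.

```python
# CMMD values copied from the results table (lower is better).
results = {
    "stabilityai/stable-diffusion-xl-base-1.0": 0.696,
    "segmind/SSD-1B": 0.669,
    "stabilityai/sdxl-turbo": 0.548,
    "runwayml/stable-diffusion-v1-5": 0.582,
    "PixArt-alpha/PixArt-XL-2-1024-MS": 1.140,
    "SPRIGHT-T2I/spright-t2i-sd2": 0.512,
}

# Sort ascending so the best (lowest) CMMD comes first.
ranked = sorted(results.items(), key=lambda item: item[1])
for name, cmmd in ranked:
    print(f"{cmmd:.3f}  {name}")

sd15 = results["runwayml/stable-diffusion-v1-5"]
sdxl = results["stabilityai/stable-diffusion-xl-base-1.0"]
print(f"SD v1.5 is lower than SDXL by {sdxl - sd15:.3f}")
```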

Obtaining CMMD for your pipelines

Refer to the generate_images.py script, which uses diffusers to generate images from the 30k randomly sampled COCO captions (COCO-30k).

Once the images are generated, run:

python main.py /path/to/reference/images /path/to/generated/images --batch_size=32 --max_count=30000

Reference images are COCO-30k images and can be downloaded from here.

Pre-computed embeddings for the COCO-30k images can be found here.

To use the pre-computed reference embeddings, run:

python main.py None /path/to/generated/images --ref_embed_file=ref_embs.npy --batch_size=32 --max_count=30000
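Caching reference embeddings avoids re-embedding the same reference images on every run. The round trip below is a minimal sketch with NumPy, assuming the embeddings are stored as a single 2-D array of shape (number of images, embedding dimension). The toy shape and random values are placeholders, and only the file name `ref_embs.npy` comes from the command above.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for real embeddings: 4 "images" with 8-dimensional features.
rng = np.random.default_rng(seed=0)
embeddings = rng.standard_normal((4, 8)).astype("float32")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ref_embs.npy"
    np.save(path, embeddings)  # write once after embedding the reference set
    loaded = np.load(path)     # reuse in later runs instead of re-embedding

print(loaded.shape, loaded.dtype)  # (4, 8) float32
```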

Acknowledgements

Thanks to Sadeep Jayasumana (first author of CMMD) for all the helpful discussions.
