Metric for Evaluating Stable Diffusion Models Using Attention Maps

 

Name: Haruno Fusa (房ハルノ)

Affiliation: Okayama University of Science

Abstract: As Stable Diffusion models capable of high-quality image generation attract increasing attention, evaluating the quality of their outputs has become important. Metrics for synthetic images fall into two categories: those that assess image quality and those that assess text-image alignment. For text-image alignment, the Text-Image Alignment Metric (TIAM) has been proposed. Based on a prompt template, TIAM checks whether the content specified in a prompt appears in the corresponding generated image. This enables a more comprehensive evaluation of text-image alignment in terms of the type, number, and color of the specified objects. To identify those objects, TIAM uses YOLOv8, a pre-trained object detection and segmentation model. However, such a pre-trained model cannot handle classes or image styles absent from its training data unless it is fine-tuned. Fine-tuning a model or preparing a training dataset is difficult for non-expert users. In this paper, we extend TIAM to support a wider range of classes and styles. The extension uses the attention maps obtained during image generation together with a vision-language model (e.g., BLIP-2). Experimental results indicate that the proposed method can evaluate diverse images without additional steps such as fine-tuning.
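The abstract does not specify how attention maps replace the detector. Below is a minimal Python sketch under assumed choices; it is not the authors' implementation. It does four things:

* fills a prompt template with object classes;
* binarizes each object token's attention map with min-max normalization and a threshold (the maps are assumed to be per-token cross-attention maps, and the 0.5 default is arbitrary);
* counts 4-connected regions as object instances;
* reports the fraction of images in which every object appears the expected number of times.

The helper names `build_prompts`, `attention_to_mask`, `count_regions` and `alignment_score` are hypothetical. Attention maps are plain nested lists rather than tensors extracted from a real pipeline. The color check that the abstract assigns to BLIP-2 is not modeled.

```python
# Hypothetical sketch: counting objects from per-token attention maps.
# Not the authors' implementation; names and the 0.5 threshold are assumptions.
from collections import deque
from itertools import permutations


def build_prompts(template, classes, n_objects=2):
    """Fill a template such as 'a photo of a {0} and a {1}' with every
    ordered n-tuple of distinct object classes."""
    return [template.format(*combo) for combo in permutations(classes, n_objects)]


def attention_to_mask(attn, threshold=0.5):
    """Min-max normalize one token's attention map (2D list of floats) and
    mark cells at or above `threshold` as foreground."""
    lo = min(min(row) for row in attn)
    hi = max(max(row) for row in attn)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[(v - lo) / span >= threshold for v in row] for row in attn]


def count_regions(mask):
    """Count 4-connected foreground regions with breadth-first search;
    each region is treated as one object instance."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def alignment_score(samples, expected_counts, threshold=0.5):
    """Fraction of generated images in which every prompted object token
    yields exactly the expected number of attention regions."""
    if not samples:
        return 0.0
    hits = 0
    for token_maps in samples:  # one {token: attention map} dict per image
        if all(count_regions(attention_to_mask(token_maps[tok], threshold)) == n
               for tok, n in expected_counts.items()):
            hits += 1
    return hits / len(samples)


# Usage with toy 4x4 maps for the prompt "a photo of a cat and a dog".
prompts = build_prompts("a photo of a {0} and a {1}", ["cat", "dog", "car"])
cat_map = [[0.9, 0.8, 0.0, 0.0],
           [0.7, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.1]]
dog_map = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.6, 0.9],
           [0.0, 0.0, 0.8, 0.7]]
score = alignment_score([{"cat": cat_map, "dog": dog_map}], {"cat": 1, "dog": 1})
print(len(prompts), score)  # prints: 6 1.0
```

Thresholding followed by connected-component counting is only one simple way to read a count off an attention map. Overlapping instances of the same class would merge into a single region, so a practical method may need a different localization strategy.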

 

Publications and Presentations:

(International Conference Proceedings)

  • Haruno Fusa, Chonho Lee, Sakuei Onishi, Hiromitsu Shiina, "Metric for Evaluating Stable Diffusion Models Using Attention Maps", Proc. of the International Conference on Foundation and Large Language Models (FLLM2024), pp. 535-541, Nov. 2024.

 

(Domestic Conference and Workshop Presentations)

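  • Haruno Fusa, Chonho Lee, Sakuei Onishi, Hiromitsu Shiina, "Text-to-Image モデルにおける多属性に対応したテンプレートベース評価手法" (A Template-Based Evaluation Method Supporting Multiple Attributes for Text-to-Image Models, in Japanese), Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2025), 2025.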

 




Posted: March 31, 2025