{
  "retrieved_examples": [
    "2601.15892v2",
    "2601.15165v2",
    "2601.09708v1",
    "2601.07055v1",
    "2601.07033v1",
    "2601.06411v1",
    "2601.05144v1",
    "2601.05110v1",
    "2601.03570v1",
    "2404.15806v1"
  ],
  "initial_description": "### Figure Description for Denoising Diffusion Probabilistic Model (DDPM) Architecture\n\n**Overall Layout**:  \nThe diagram is structured in a horizontal left-to-right flow, divided into three main sections: the Forward Diffusion Stage on the left, the U-Net Backbone in the center, and the Reverse Denoising Stage on the right. Each section is clearly delineated with dashed borders to visually separate the different processes.\n\n**Components**:  \n1. **Forward Diffusion Process**:\n   - **Label**: \"Forward Diffusion\"\n   - **Box**: A rectangular box with rounded corners, filled with a soft pastel blue.\n   - **Input**: Inside the box, label \"Real Image \\( x_0 \\)\" at the top, with an arrow pointing to the output.\n   - **Output**: At the bottom, label \"Noisy Image \\( x_T \\)\" using a bold font.\n   - **Operation**: Below the output, include the equation: \\( x_t = \\sqrt{\\alpha_t} \\cdot x_0 + \\sqrt{1 - \\alpha_t} \\cdot \\epsilon \\) in a smaller italic font.\n\n2. 
**U-Net Architecture**:\n   - **Label**: \"U-Net Architecture\"\n   - **Box**: A larger rectangular box with a light green background, containing sub-components.\n   - **Sections**:\n     - **Encoder Path** (left side): \n       - Label \"Encoder Path\" at the top, with small boxes representing convolution blocks (filled with light lavender) and downsampling layers (filled with light yellow).\n       - Use arrows to indicate the flow from convolution blocks to downsampling layers, converging towards the bottleneck.\n     - **Bottleneck Layer**: \n       - Draw the bottleneck as a distinctive shape (such as an hourglass) in the center, labeled \"Bottleneck\".\n     - **Decoder Path** (right side):\n       - Label \"Decoder Path\" at the bottom of this section, with upsampling layers (filled with light coral).\n       - Arrows should show the flow from the bottleneck to the decoder path.\n       - **Skip Connections**: Dashed lines should connect corresponding encoder and decoder components, labeled \"Skip Connections\".\n\n3. **Reverse Denoising Process**:\n   - **Label**: \"Reverse Denoising\"\n   - **Box**: A rectangular box with a soft pastel yellow background.\n   - **Input**: At the top, label \"Noisy Image \\( x_T \\)\".\n   - **Output**: At the bottom, label \"Clean Image\" in bold.\n   - **Operation**: Below the output, include the operation: \"Neural network predicts noise \\( \\epsilon_{\\theta}(x_t, t) \\)\" in a smaller font.\n\n4. **Time Embedding**:\n   - **Label**: \"Time Embedding\"\n   - **Box**: A small rectangular box in a pastel pink color located above the U-Net, connected with arrows to each block in the U-Net.\n   - **Input**: Label \"Timestep \\( t \\)\" inside the box.\n   - **Output**: Label \"Sinusoidal Embedding\" beneath the box, indicating its integration into U-Net blocks.\n\n5. 
**Training Objective**:\n   - **Label**: \"Training Objective\"\n   - **Box**: A rectangular box with a light peach background, placed at the bottom right of the U-Net section.\n   - **Operation**: Include the text \"Minimize Mean Squared Error (MSE)\" and the equation: \"\\( \\mathcal{L} = \\lVert \\epsilon - \\epsilon_{\\theta}(x_t, t) \\rVert^2 \\)\", where \\( \\epsilon \\) is the true noise, in small font.\n\n**Connections**:  \n- Use solid arrows to indicate the flow of data:\n  - From \\( x_0 \\) to \\( x_T \\) in the Forward Diffusion Process.\n  - From \\( x_T \\) to Clean Image in the Reverse Denoising Process.\n  - Arrows should also indicate how the output of the encoder feeds into the decoder via skip connections, and how the timestep \\( t \\) influences the U-Net.\n\n**Groupings**:  \n- The Forward Diffusion Stage, U-Net Backbone, and Reverse Denoising Stage should be grouped with dashed borders to emphasize their separation within the overall architecture.\n\n**Labels and Annotations**:  \n- Provide annotations for key operations and data flows, including the equations and explanations of each section.\n\n**Input/Output**:  \n- Clearly label inputs and outputs at respective sections:\n  - Input to the system: \"Real Image \\( x_0 \\)\" and \"Timestep \\( t \\)\".\n  - Output from the system: \"Clean Image\" and \"Loss Value\".\n\n**Styling**:  \n- The background of the entire diagram should be pure white to enhance clarity.\n- Lines should be of medium thickness, with arrows clearly pointing in the direction of data flow.\n- Use a consistent font style throughout for labels, with main labels in bold and equations in italic to differentiate them.\n\nThis detailed description should guide the creation of an illustrative figure that accurately represents the Denoising Diffusion Probabilistic Model architecture, capturing the necessary components, their interactions, and the overall flow of processes.",
  "optimized_description": "### Figure Description for Denoising Diffusion Probabilistic Model (DDPM) Architecture\n\n**Overall Layout**:  \nThe diagram is structured in a horizontal left-to-right flow, divided into three main sections: the Forward Diffusion Stage on the left, the U-Net Backbone in the center, and the Reverse Denoising Stage on the right. Each section is clearly delineated with soft dashed borders in a muted pastel color to visually separate the different processes.\n\n**Components**:  \n1. **Forward Diffusion Process**:\n   - **Label**: \"Forward Diffusion\"\n   - **Box**: A rounded rectangle with a soft pastel blue fill and a slightly darker blue border.\n   - **Input**: Inside the box, label \"Real Image \\( x_0 \\)\" at the top in bold sans-serif text, with an arrow pointing to the output.\n   - **Output**: At the bottom, label \"Noisy Image \\( x_T \\)\" using a bold font.\n   - **Operation**: Below the output, include the equation: \\( x_t = \\sqrt{\\alpha_t} \\cdot x_0 + \\sqrt{1 - \\alpha_t} \\cdot \\epsilon \\) in a smaller italic serif font.\n\n2. 
**U-Net Architecture**:\n   - **Label**: \"U-Net Architecture\"\n   - **Box**: A larger rounded rectangle with a light sage green background, containing sub-components.\n   - **Sections**:\n     - **Encoder Path** (left side): \n       - Label \"Encoder Path\" at the top, with small rounded rectangles representing convolution blocks (filled with light lavender) and downsampling layers (filled with warm peach).\n       - Use soft arrows to indicate the flow from convolution blocks to downsampling layers, converging towards the bottleneck.\n     - **Bottleneck Layer**: \n       - Draw the bottleneck as an hourglass shape in a slightly darker sage green, labeled \"Bottleneck\".\n     - **Decoder Path** (right side):\n       - Label \"Decoder Path\" at the bottom of this section, with upsampling layers (filled with light coral).\n       - Soft arrows should show the flow from the bottleneck to the decoder path.\n       - **Skip Connections**: Dashed lines in a soft grey should connect corresponding encoder and decoder components, labeled \"Skip Connections\".\n\n3. **Reverse Denoising Process**:\n   - **Label**: \"Reverse Denoising\"\n   - **Box**: A rounded rectangle with a soft pastel yellow background and a slightly darker yellow border.\n   - **Input**: At the top, label \"Noisy Image \\( x_T \\)\".\n   - **Output**: At the bottom, label \"Clean Image\" in bold sans-serif text.\n   - **Operation**: Below the output, include the operation: \"Neural network predicts noise \\( \\epsilon_{\\theta}(x_t, t) \\)\" in a smaller italic serif font.\n\n4. **Time Embedding**:\n   - **Label**: \"Time Embedding\"\n   - **Box**: A small rounded rectangle in a soft pastel pink color located above the U-Net, connected with soft arrows to each block in the U-Net.\n   - **Input**: Label \"Timestep \\( t \\)\" inside the box in bold sans-serif text.\n   - **Output**: Label \"Sinusoidal Embedding\" beneath the box in a smaller italic serif font, indicating its integration into U-Net blocks.\n\n5. 
**Training Objective**:\n   - **Label**: \"Training Objective\"\n   - **Box**: A rounded rectangle with a light peach background, placed at the bottom right of the U-Net section.\n   - **Operation**: Include the text \"Minimize Mean Squared Error (MSE)\" in bold sans-serif, and the equation: \"\\( \\mathcal{L} = \\lVert \\epsilon - \\epsilon_{\\theta}(x_t, t) \\rVert^2 \\)\", where \\( \\epsilon \\) is the true noise, in a smaller italic serif font.\n\n**Connections**:  \n- Use soft solid arrows to indicate the flow of data:\n  - From \\( x_0 \\) to \\( x_T \\) in the Forward Diffusion Process.\n  - From \\( x_T \\) to Clean Image in the Reverse Denoising Process.\n  - Arrows should also indicate how the output of the encoder feeds into the decoder via skip connections, and how the timestep \\( t \\) influences the U-Net.\n\n**Groupings**:  \n- The Forward Diffusion Stage, U-Net Backbone, and Reverse Denoising Stage should be grouped with soft dashed borders to emphasize their separation within the overall architecture.\n\n**Labels and Annotations**:  \n- Provide annotations for key operations and data flows, including the equations and explanations of each section in a consistent font style.\n\n**Input/Output**:  \n- Clearly label inputs and outputs at respective sections:\n  - Input to the system: \"Real Image \\( x_0 \\)\" and \"Timestep \\( t \\)\".\n  - Output from the system: \"Clean Image\" and \"Loss Value\".\n\n**Styling**:  \n- The background of the entire diagram should be pure white to enhance clarity.\n- Lines should be of medium thickness, with soft arrows clearly pointing in the direction of data flow.\n- Use a consistent font style throughout for labels, with main labels in bold sans-serif and equations in italic serif to differentiate them."
}