{
  "retrieved_examples": [
    "2601.15165v2",
    "2601.15892v2",
    "2601.07055v1",
    "2601.07033v1",
    "2601.06953v2",
    "2601.05110v1",
    "2601.06411v1",
    "2601.14724v2",
    "2601.09708v1",
    "2404.15806v1"
  ],
  "initial_description": "## Illustrative Figure Description: \u201cTimeline of Vision Architectures (CNN \u2192 Residual CNN \u2192 Transformers)\u201d\n\n### 1) Overall layout\n- **Canvas/background**: clean **white** (or very light warm gray) with generous margins; minimalist academic style.\n- **Main structure**: a **single left-to-right timeline/flowchart** spanning the width of the figure.\n- **Top row**: five **stage panels** (Stage 1 \u2192 Stage 5) aligned horizontally, each panel containing one or more model blocks.\n- **Bottom row (optional but recommended for clarity)**: a thin strip showing a repeated **Input \u2192 Output** motif under each stage (or small icons inside each stage) to reinforce common I/O.\n- **Primary flow direction**: left-to-right arrows connecting stages; within each stage, smaller arrows show internal data flow.\n\n### 2) Global elements (timeline + legend)\n- **Timeline axis**: a thin, medium-gray horizontal line running behind/under the stage panels.\n- **Stage markers**: five circular nodes on the timeline (soft gray outline), each aligned under its stage panel; label above each node: **\u201cStage 1\u201d \u2026 \u201cStage 5\u201d**.\n- **Legend (top-right corner)** in a small rounded rectangle with light-gray fill:\n  - **Blue blocks** = \u201cConvolution-centric\u201d\n  - **Purple blocks** = \u201cAttention-centric\u201d\n  - **Orange callouts** = \u201cKey properties/notes\u201d\n  - **Dashed outline** = \u201cAbsent / explicitly not used\u201d\n- **Color narrative to make the shift salient**:\n  - Stages 1\u20133 predominantly **cool blues/teals** (CNN era).\n  - Stage 4 predominantly **lavender/purple** (Transformer era).\n  - Stage 5 uses a **split palette** (ConvNeXt in blue; Swin in purple) to emphasize hybrid/efficient designs.\n\n### 3) Stage panels (grouping + contents)\n\nEach stage is enclosed in a **rounded rectangle panel** with a subtle pastel tint and a bold stage header.\n\n---\n\n## Stage 1 panel: Early 
Deep CNNs\n- **Panel styling**: rounded rectangle with **very light sky-blue tint**, thin blue border.\n- **Header (top-left inside panel)**:  \n  **\u201cStage 1: Early Deep CNNs\u201d**  \n  Subheader smaller text: **\u201cAlexNet (2012)\u201d**\n- **Inside the panel** place a single main model block labeled:\n  - **Main block** (medium blue fill, white text): **\u201cAlexNet (2012)\u201d**\n- **Internal subcomponents** shown as three smaller blocks inside the AlexNet block (or directly to its right in a mini-flow):\n  1. Small rounded rectangle: **\u201cDeep convolutional layers\u201d**\n  2. Small rounded rectangle: **\u201cReLU activation\u201d**\n  3. Small rounded rectangle: **\u201cLarge fully connected layers\u201d**\n- **Internal arrows**: black or dark-gray arrows (medium thickness) connecting:\n  - **Image \u2192 Deep convolutional layers \u2192 ReLU activation \u2192 Large fully connected layers \u2192 Classification**\n- **Input/Output icons**:\n  - On the far left edge of Stage 1 panel: a small **image icon** (stacked photo symbol) labeled **\u201cImage\u201d**.\n  - On the far right edge: a small **tag/probability bar icon** labeled **\u201cClassification output\u201d**.\n- **Key note callout**: a small **orange sticky-note callout** near the AlexNet block with a pointer line to the model:\n  - Text: **\u201cImageNet breakthrough\u201d**\n\n---\n\n## Stage 2 panel: Deeper CNNs\n- **Panel styling**: rounded rectangle with **light teal tint**, thin teal border.\n- **Header**:  \n  **\u201cStage 2: Deeper CNNs\u201d**  \n  Subheader: **\u201cVGGNet\u201d**\n- **Main block** (blue fill): **\u201cVGGNet\u201d**\n- **Internal depiction emphasizing repeated 3\u00d73 convs**:\n  - Show a **stack** of repeated small blocks (like a vertical or diagonal stack) each labeled **\u201c3\u00d73 Conv\u201d** (3\u20135 blocks to suggest depth without specifying counts).\n  - A bracket or label beside the stack: **\u201cIncreased depth\u201d**\n- **Annotation for 
parameters**:\n  - Orange callout bubble to the side: **\u201cHigh parameter count\u201d**\n- **Data flow**:\n  - **Image icon** at left of the stage feeding into the first \u201c3\u00d73 Conv\u201d block stack via an arrow.\n  - Arrow from the end of the stack to **\u201cClassification output\u201d** icon on the right.\n\n---\n\n## Stage 3 panel: Residual Learning\n- **Panel styling**: rounded rectangle with **light blue-gray tint**, thin blue-gray border.\n- **Header**:  \n  **\u201cStage 3: Residual Learning\u201d**  \n  Subheader: **\u201cResNet\u201d**\n- **Main block** (blue fill): **\u201cResNet\u201d**\n- **Core visual: one prominent residual unit** centered in the panel:\n  - Draw a **main path**: two or three sequential small blocks (blue) labeled generically (no invented specifics):\n    - **\u201cLayer(s)\u201d** \u2192 **\u201cLayer(s)\u201d** \u2192 **\u201cLayer(s)\u201d**\n  - Draw a **skip connection**:\n    - A thick curved arrow (darker blue) that **bypasses** the main-path blocks from the unit input to the unit output.\n    - Label above the curved arrow: **\u201cResidual / skip connection\u201d**\n  - At the rejoin point, show a small circular **merge node** (light gray circle). Since the merge op is not specified, label it neutrally:\n    - **\u201cRejoin\u201d** (small text under the circle)\n- **Properties/notes** (orange callouts):\n  1. **\u201c50+ layers\u201d** (placed near the overall ResNet block)\n  2. 
**\u201cImproved gradient flow\u201d** with a thin pointer line aimed at the skip connection arrow (to indicate causality).\n- **Data flow**:\n  - Left: **Image** icon \u2192 arrow into residual unit \u2192 arrow to **Classification output** icon.\n\n---\n\n## Stage 4 panel: Attention-based Models\n- **Panel styling**: rounded rectangle with **light lavender tint**, thin purple border.\n- **Header**:  \n  **\u201cStage 4: Attention-based Models\u201d**  \n  Subheader: **\u201cVision Transformer (ViT)\u201d**\n- **Main block** (purple fill): **\u201cVision Transformer (ViT)\u201d**\n- **Internal components shown sequentially**:\n  1. Small purple/white block: **\u201cPatch embedding\u201d**\n     - Optionally depict the image being split into a small grid of squares (like 4\u00d74) next to this block, with an arrow into \u201cPatch embedding\u201d.\n  2. Small purple/white block: **\u201cSelf-attention mechanism\u201d**\n     - Add a subtle icon: crisscross connecting lines between token dots to suggest global interactions.\n- **Explicit absence of convolution**:\n  - Place a **dashed-outline block** labeled **\u201cConvolution\u201d** with a red \u201cX\u201d over it (small, tasteful), and a label: **\u201cNo convolution\u201d**.\n  - Position this near ViT to clearly contrast with earlier CNN stages.\n- **Global receptive field annotation**:\n  - Orange callout: **\u201cGlobal receptive field\u201d** with pointer to the self-attention block.\n- **Data flow**:\n  - **Image \u2192 Patch embedding \u2192 Self-attention \u2192 Classification output** (arrows in dark gray).\n\n---\n\n## Stage 5 panel: Hybrid and Efficient Models\n- **Panel styling**: rounded rectangle with **very light neutral tint** (near-white), thin gray border; header text slightly bolder to indicate \u201ccurrent era\u201d.\n- **Header**:  \n  **\u201cStage 5: Hybrid and Efficient Models\u201d**  \n- **Inside this panel, show two parallel model blocks** side-by-side (or stacked vertically) to 
indicate coexisting designs:\n  - **Left block (blue fill)**: **\u201cConvNeXt\u201d**\n    - No internal subcomponents (per instruction). Optionally a tiny note under it in gray text: **\u201c(modernized CNN design)\u201d** without adding mechanisms.\n  - **Right block (purple fill)**: **\u201cSwin Transformer\u201d**\n    - Add two small annotation tags (small rounded pills) attached to the Swin block:\n      - **\u201cHierarchical vision transformers\u201d**\n      - **\u201cEfficient attention\u201d**\n    - Optionally depict hierarchy as 3 levels of small blocks labeled **\u201cLevel 1\u201d \u201cLevel 2\u201d \u201cLevel 3\u201d** *inside* Swin as a purely structural icon (no operations specified), with a label above them: **\u201cHierarchical\u201d**. Avoid \u201cStage\u201d labels here, since they would clash with the timeline's Stage 1\u20135 markers.\n- **Data flow**:\n  - One **Image** icon at the left of the Stage 5 panel splits into two arrows:\n    - Arrow to ConvNeXt \u2192 arrow to a shared **Classification output** icon on the far right\n    - Arrow to Swin Transformer \u2192 arrow to the same shared output icon\n  - The two outputs converge visually into one output (use a Y-shaped merge line).\n\n---\n\n### 4) Cross-stage transitions (major arrows)\nBetween the five stage panels, draw **large, thick, right-pointing arrows** (dark gray with slight gradient) labeled with short transition phrases placed above each arrow:\n\n1. **Stage 1 \u2192 Stage 2** arrow label: **\u201cDeeper CNNs; small 3\u00d73 filters\u201d**\n2. **Stage 2 \u2192 Stage 3** arrow label: **\u201cResidual learning (skip connections)\u201d**\n3. **Stage 3 \u2192 Stage 4** arrow label: **\u201cShift: convolution \u2192 attention\u201d**\n4. **Stage 4 \u2192 Stage 5** arrow label: **\u201cHybridization & efficiency\u201d**\n\nTo make the CNN\u2192Transformer shift salient:\n- Make the **Stage 3 \u2192 Stage 4 arrow** slightly thicker and/or add a subtle background highlight band behind it (very pale lavender) with a caption: **\u201cParadigm shift\u201d**.\n\n### 5) Typography and styling details\n- **Font**: clean sans-serif (paper-style), stage headers bold; internal block labels regular.\n- **Line weights**:\n  - Internal arrows: medium thickness.\n  - Cross-stage arrows: thicker.\n  - Panel borders: thin.\n  - Skip connection arrow in ResNet: thick and visually prominent.\n- **Icons**: simple flat icons (no 3D), minimal outlines.\n- **Consistency**: all stages show **Image** on left and **Classification output** on right (either repeated per stage or implied with small icons inside each panel).\n\n### 6) Final composition (what the viewer immediately reads)\n- A clear **chronological progression**: **AlexNet \u2192 VGGNet \u2192 ResNet \u2192 ViT \u2192 (ConvNeXt, Swin)**.\n- Visual emphasis that:\n  - Early stages are **convolution stacks** (blue).\n  - ResNet introduces **skip connections** and their training benefit (**improved gradient flow**).\n  - ViT is **patch embedding + self-attention** with **\u201cNo convolution\u201d** explicitly marked.\n  - Stage 5 shows **two branches** representing modern **efficient CNN-like** and **hierarchical transformer** approaches.",
  "optimized_description": "## Polished Figure Specification: \u201cTimeline of Vision Architectures (CNN \u2192 Residual CNN \u2192 Transformers)\u201d\n\n### 1) Overall layout\n- **Canvas/background**: clean white (or extremely light warm gray) with generous whitespace; crisp, publication-ready.\n- **Main structure**: a single **left-to-right timeline/flowchart** spanning the full width.\n- **Top row**: five **stage panels** aligned on a strict grid, evenly spaced, with consistent panel height and padding.\n- **Bottom row (optional but recommended)**: a subtle, thin strip that repeats a small **Image \u2192 Classification output** motif under each stage *or* small I/O icons inside each stage (keep consistent across all stages).\n- **Flow direction**: strictly left-to-right for cross-stage transitions; within-stage arrows also left-to-right.\n\n### 2) Global elements (timeline + legend)\n- **Timeline axis**: a thin, medium-light gray horizontal line running behind or just beneath the stage panels (visually secondary).\n- **Stage markers**: five small circular nodes on the timeline, centered under each stage panel; soft gray outline with white fill. Above each node, place **\u201cStage 1\u201d \u2026 \u201cStage 5\u201d** in bold sans-serif.\n- **Legend (top-right corner)**: a compact rounded rectangle with very light gray fill and a subtle border. 
Include:\n  - **Blue blocks** = \u201cConvolution-centric\u201d\n  - **Purple blocks** = \u201cAttention-centric\u201d\n  - **Warm orange callouts** = \u201cKey properties/notes\u201d\n  - **Dashed outline** = \u201cAbsent / explicitly not used\u201d\n- **Color narrative (to emphasize the shift)**:\n  - Stages 1\u20133: predominantly **soft blues/teals** (CNN era).\n  - Stage 4: predominantly **soft lavender/purple** (Transformer era).\n  - Stage 5: **split palette** (ConvNeXt in blue; Swin in purple) to signal hybrid/efficient coexistence.\n\n### 3) Stage panels (grouping + contents)\n**All stage panels**: rounded rectangles with a very light pastel tint (low saturation), thin slightly darker border, and a consistent header area. Use bold sans-serif for stage headers; regular sans-serif for internal labels. Keep internal blocks as rounded rectangles with gentle shadows (very subtle) to separate layers without heavy outlines.\n\n---\n\n## Stage 1 panel: Early Deep CNNs\n- **Panel styling**: very light **sky-blue wash** with a slightly darker sky-blue border.\n- **Header (top-left inside panel)**:\n  - Bold: **\u201cStage 1: Early Deep CNNs\u201d**\n  - Smaller subheader beneath: **\u201cAlexNet (2012)\u201d**\n- **Main model block**: rounded rectangle with **medium soft blue fill** and white label text: **\u201cAlexNet (2012)\u201d**.\n- **Internal subcomponents**: three smaller rounded rectangles arranged as a mini left-to-right flow inside the panel (either nested within the AlexNet block or placed immediately to its right within the panel, but keep them visually grouped with AlexNet):\n  1. **\u201cDeep convolutional layers\u201d**\n  2. **\u201cReLU activation\u201d**\n  3. 
**\u201cLarge fully connected layers\u201d**\n  Use lighter blue fills than the main block, with dark gray text for readability.\n- **Internal arrows**: dark gray, clean arrowheads:\n  - **Image \u2192 Deep convolutional layers \u2192 ReLU activation \u2192 Large fully connected layers \u2192 Classification output**\n- **Input/Output icons**:\n  - Left edge of the panel: a small flat **image icon** labeled **\u201cImage\u201d**.\n  - Right edge: a small flat **classification icon** (tag or bar chart) labeled **\u201cClassification output\u201d**.\n- **Key note callout**: a warm, soft **peach/orange** sticky-note style callout with a small pointer to the AlexNet block:\n  - Text: **\u201cImageNet breakthrough\u201d**\n\n---\n\n## Stage 2 panel: Deeper CNNs\n- **Panel styling**: very light **mint/teal tint** with a thin teal border.\n- **Header**:\n  - Bold: **\u201cStage 2: Deeper CNNs\u201d**\n  - Subheader: **\u201cVGGNet\u201d**\n- **Main block**: rounded rectangle with **soft blue fill** labeled **\u201cVGGNet\u201d** (white or very light text if the fill is darker; otherwise dark gray text).\n- **Repeated 3\u00d73 conv depiction**:\n  - Show a **stack** of small rounded rectangles (3\u20135) labeled **\u201c3\u00d73 Conv\u201d**. 
Arrange them as a tidy vertical stack or slight diagonal cascade to suggest depth (do not imply exact count).\n  - Place a neat bracket or side label next to the stack: **\u201cIncreased depth\u201d** (regular sans-serif).\n- **Parameter annotation**: warm peach/orange callout bubble with pointer:\n  - **\u201cHigh parameter count\u201d**\n- **Data flow**:\n  - Left: **Image** icon \u2192 arrow into the first **\u201c3\u00d73 Conv\u201d** block.\n  - Arrow from the end of the stack to the **Classification output** icon on the right.\n\n---\n\n## Stage 3 panel: Residual Learning\n- **Panel styling**: very light **blue-gray tint** with a thin blue-gray border (cooler and slightly more technical than Stage 2).\n- **Header**:\n  - Bold: **\u201cStage 3: Residual Learning\u201d**\n  - Subheader: **\u201cResNet\u201d**\n- **Main block**: rounded rectangle with **soft blue fill** labeled **\u201cResNet\u201d**.\n- **Core visual: one prominent residual unit** centered and larger than internal elements in other stages (to make the concept unmistakable):\n  - **Main path**: two or three sequential small rounded rectangles (soft blue) labeled **\u201cLayer(s)\u201d \u2192 \u201cLayer(s)\u201d \u2192 \u201cLayer(s)\u201d**.\n  - **Skip connection**: a visually prominent curved or arched arrow in a darker, richer blue that bypasses the main-path blocks from the unit input to the unit output.\n    - Label above the skip arrow: **\u201cResidual / skip connection\u201d** (bold or semi-bold).\n  - **Rejoin point**: a small light-gray circular node where the skip path meets the main path output.\n    - Under the circle, small text: **\u201cRejoin\u201d** (neutral, unobtrusive).\n- **Properties/notes (warm peach/orange callouts)**:\n  1. **\u201c50+ layers\u201d** near the ResNet label (no extra numbers).\n  2. 
**\u201cImproved gradient flow\u201d** with a thin pointer line aimed specifically at the skip connection arrow.\n- **Data flow**:\n  - Left: **Image** icon \u2192 arrow into the residual unit \u2192 arrow to **Classification output** icon.\n\n---\n\n## Stage 4 panel: Attention-based Models\n- **Panel styling**: very light **lavender tint** with a thin muted purple border.\n- **Header**:\n  - Bold: **\u201cStage 4: Attention-based Models\u201d**\n  - Subheader: **\u201cVision Transformer (ViT)\u201d**\n- **Main block**: rounded rectangle with **soft purple fill** labeled **\u201cVision Transformer (ViT)\u201d** (white text for contrast).\n- **Internal components (sequential)**:\n  1. Rounded rectangle labeled **\u201cPatch embedding\u201d** (light purple fill, dark gray text).\n     - Adjacent to it (or just before it), optionally depict the image as a small grid of squares to suggest patching; keep it minimal and flat.\n  2. Rounded rectangle labeled **\u201cSelf-attention mechanism\u201d** (light purple fill).\n     - Add a subtle, clean icon inside or above: small token dots with thin connecting lines to suggest global interactions (avoid clutter).\n- **Explicit absence of convolution**:\n  - A **dashed-outline** rounded rectangle labeled **\u201cConvolution\u201d** placed near the ViT flow.\n  - Overlay a small, tasteful **\u201cX\u201d** mark and add the label **\u201cNo convolution\u201d** nearby (keep the mark muted\u2014dark rose or deep gray-red\u2014so it reads clearly without looking alarmist).\n- **Global receptive field annotation**:\n  - Warm peach/orange callout: **\u201cGlobal receptive field\u201d** with pointer to the **Self-attention mechanism** block.\n- **Data flow**:\n  - **Image \u2192 Patch embedding \u2192 Self-attention mechanism \u2192 Classification output** with dark gray arrows.\n\n---\n\n## Stage 5 panel: Hybrid and Efficient Models\n- **Panel styling**: very light neutral tint (near-white) with a thin soft gray border; header 
slightly bolder to indicate the \u201ccurrent era\u201d without changing content.\n- **Header**:\n  - Bold: **\u201cStage 5: Hybrid and Efficient Models\u201d**\n- **Two parallel model blocks** inside the panel (aligned side-by-side with equal visual weight):\n  - **Left block (soft blue fill)**: **\u201cConvNeXt\u201d** (no internal subcomponents).\n    - Optional tiny gray subtitle directly beneath the block: **\u201c(modernized CNN design)\u201d** (keep subtle and secondary).\n  - **Right block (soft purple fill)**: **\u201cSwin Transformer\u201d**\n    - Attach two small rounded \u201cpill\u201d tags (light peach/orange or very light gray with orange outline) near the Swin block:\n      - **\u201cHierarchical vision transformers\u201d**\n      - **\u201cEfficient attention\u201d**\n    - Optional minimal hierarchy hint *as an icon only*: three small stacked mini-blocks inside the Swin block labeled **\u201cLevel 1\u201d \u201cLevel 2\u201d \u201cLevel 3\u201d**, with a small label **\u201cHierarchical\u201d** above them (keep extremely light and schematic; do not imply operations). Do not label these mini-blocks \u201cStage\u201d, since that would clash with the timeline's Stage 1\u20135 markers.\n- **Data flow**:\n  - One **Image** icon at the left of the Stage 5 panel splits into two arrows:\n    - Arrow to **ConvNeXt** \u2192 arrow to a shared **Classification output** icon on the far right.\n    - Arrow to **Swin Transformer** \u2192 arrow to the same shared output icon.\n  - The two outgoing lines converge cleanly into a Y-shaped merge before the output icon (thin, dark gray).\n\n---\n\n### 4) Cross-stage transitions (major arrows)\n- Between stage panels, draw **large, thick, right-pointing arrows** in dark gray (slightly softened, not pure black). Place labels above each arrow in regular sans-serif:\n  1. **Stage 1 \u2192 Stage 2**: **\u201cDeeper CNNs; small 3\u00d73 filters\u201d**\n  2. **Stage 2 \u2192 Stage 3**: **\u201cResidual learning (skip connections)\u201d**\n  3. **Stage 3 \u2192 Stage 4**: **\u201cShift: convolution \u2192 attention\u201d**\n  4. **Stage 4 \u2192 Stage 5**: **\u201cHybridization & efficiency\u201d**\n- To make the CNN\u2192Transformer transition most salient:\n  - Make the **Stage 3 \u2192 Stage 4** arrow slightly thicker than the others.\n  - Add a very pale lavender highlight band behind that arrow with a small caption above it: **\u201cParadigm shift\u201d** (subtle, not dominating).\n\n### 5) Typography and styling details\n- **Typography**:\n  - All module/stage labels: clean sans-serif (conference-standard).\n  - Stage headers: bold; subheaders: medium weight; internal labels: regular.\n- **Line/arrow styling**:\n  - Internal arrows: medium weight, dark gray.\n  - Cross-stage arrows: thicker, dark gray.\n  - Panel borders: thin and consistent.\n  - ResNet skip connection: thickest and most visually prominent within its panel.\n- **Icons**: flat, minimal, consistent stroke style; avoid 3D effects.\n- **Alignment & spacing**: snap all blocks, icons, and callouts to an implicit grid; keep consistent padding inside panels and consistent gaps between panels.\n\n### 6) Final composition (immediate readability)\n- The viewer should instantly read the chronological progression:\n  **AlexNet \u2192 VGGNet \u2192 ResNet \u2192 Vision Transformer (ViT) \u2192 (ConvNeXt, Swin Transformer)**.\n- The color shift and the emphasized Stage 3\u21924 transition make the move from **convolution-centric** to **attention-centric** visually unmistakable, while Stage 5 clearly communicates two coexisting modern directions."
}