{
  "retrieved_examples": [
    "2601.14724v2",
    "2404.15806v1",
    "2601.05110v1",
    "2601.06411v1",
    "2601.07033v1",
    "2601.07055v1",
    "2601.09259v1",
    "2601.09708v1",
    "2601.15892v2",
    "2601.15165v2"
  ],
  "initial_description": "### Figure Description: Computer Vision Dataset Creation Pipeline\n\n**Overall Layout**: The diagram will have a clean, minimalistic design with a **left-to-right flow direction**. It will consist of nine sequential stages, each represented by a distinct block that corresponds to one phase of the dataset creation process. The background will be pure white to ensure clarity and focus on the content.\n\n**Components**:\n1. **Data Collection Stage** (Left Section):\n   - A rectangular box labeled \"Data Collection\" with a soft pastel blue background.\n   - Inside, there are four smaller boxes:\n     - \"Image Acquisition\" (light blue)\n     - \"Video Frame Extraction\" (light blue)\n     - \"Sensor Metadata Collection\" (light blue)\n     - \"Multi-Environment Capture\" (light blue)\n   - Each smaller box will have rounded corners and a soft shadow for depth.\n\n2. **Data Storage Stage** (Next Section):\n   - A rectangular box labeled \"Data Storage\" with a light pastel green background.\n   - Inside, four smaller boxes:\n     - \"Cloud Storage\" (light green)\n     - \"Raw Image Repository\" (light green)\n     - \"Metadata Database\" (light green)\n     - \"Version Control System\" (light green)\n\n3. **Data Cleaning & Filtering Stage** (Middle Section):\n   - A rectangular box labeled \"Data Cleaning & Filtering\" with a soft pastel orange background.\n   - Inside, five smaller boxes:\n     - \"Corrupted Image Removal\" (light orange)\n     - \"Duplicate Removal\" (light orange)\n     - \"Blur Detection\" (light orange)\n     - \"Resolution Filtering\" (light orange)\n     - \"Noise Reduction\" (light orange)\n\n4. 
**Data Annotation Stage** (Middle-Right Section):\n   - A rectangular box labeled \"Data Annotation\" with a light pastel purple background.\n   - Inside, five smaller boxes:\n     - \"Bounding Box Labeling\" (light purple)\n     - \"Semantic Segmentation Masks\" (light purple)\n     - \"Keypoint Annotation\" (light purple)\n     - \"Human-in-the-Loop Labeling Tools\" (light purple)\n     - \"Automated Pre-Labeling\" (light purple)\n\n5. **Quality Control Stage** (Right Section):\n   - A rectangular box labeled \"Quality Control\" with a light pastel pink background.\n   - Inside, four smaller boxes:\n     - \"Annotation Review Process\" (light pink)\n     - \"Inter-Annotator Agreement\" (light pink)\n     - \"Error Correction Workflow\" (light pink)\n     - \"Dataset Auditing\" (light pink)\n\n6. **Data Augmentation Stage** (Right Section):\n   - A rectangular box labeled \"Data Augmentation\" with a light pastel yellow background.\n   - Inside, six smaller boxes:\n     - \"Random Crop\" (light yellow)\n     - \"Horizontal Flip\" (light yellow)\n     - \"Rotation\" (light yellow)\n     - \"Color Jitter\" (light yellow)\n     - \"CutMix / MixUp\" (light yellow)\n     - \"Synthetic Data Generation\" (light yellow)\n\n7. **Dataset Splitting Stage** (Right Section):\n   - A rectangular box labeled \"Dataset Splitting\" with a light pastel teal background.\n   - Inside, three smaller boxes:\n     - \"Train / Validation / Test Split\" (light teal)\n     - \"Stratified Sampling\" (light teal)\n     - \"Class Balance Verification\" (light teal)\n\n8. **Dataset Formatting Stage** (Final Right Section):\n   - A rectangular box labeled \"Dataset Formatting\" with a light pastel lavender background.\n   - Inside, four smaller boxes:\n     - \"COCO Format\" (light lavender)\n     - \"Pascal VOC Format\" (light lavender)\n     - \"TFRecord Conversion\" (light lavender)\n     - \"PyTorch Dataset Loaders\" (light lavender)\n\n9. 
**Monitoring & Maintenance Stage** (Final Right Section):\n   - A rectangular box labeled \"Monitoring & Maintenance\" with a light pastel peach background.\n   - Inside, four smaller boxes:\n     - \"Dataset Versioning\" (light peach)\n     - \"Bias Analysis\" (light peach)\n     - \"Distribution Shift Detection\" (light peach)\n     - \"Continuous Dataset Update Pipeline\" (light peach)\n\n**Connections**: \n- Wide arrows (medium thickness, gray) will flow horizontally from one stage to the next, indicating the progression of data through the pipeline.\n- Each arrow will have a label indicating the type of data being transferred, such as \"Raw Data,\" \"Stored Data,\" \"Cleaned Data,\" \"Annotated Data,\" etc.\n\n**Groupings**: \n- Each major component will be grouped within its respective colored rectangular box, visually distinguishing different stages of the pipeline.\n\n**Labels and Annotations**: \n- Each box and arrow will be clearly labeled with concise text. The labels will use a simple sans-serif font for readability.\n- Annotations may include brief descriptions or examples within the boxes, written in smaller text, to clarify each component's function.\n\n**Input/Output**:\n- At the far left, an \"Input\" arrow will label the entry point as \"Raw Data: Images, Videos, Sensor Metadata.\"\n- At the far right, an \"Output\" arrow will label the exit point as \"Model-Ready Dataset: Formatted for COCO, Pascal VOC, or TFRecord.\"\n\n**Styling**: \n- The entire diagram will maintain a polished and professional appearance, suitable for academic publication.\n- Line weights will vary slightly, with thicker lines for the main framework and thinner lines for internal connections.\n- The color palette will be soft and pastel, ensuring the diagram is visually appealing and not overwhelming.",
  "optimized_description": "### Figure Description: Computer Vision Dataset Creation Pipeline\n\n**Overall Layout**: The diagram will have a clean, minimalistic design with a **left-to-right flow direction**. It will consist of nine sequential stages, each represented by a distinct block that corresponds to one phase of the dataset creation process. The background will be a very light cream, ensuring clarity and focus on the content.\n\n**Components**:\n1. **Data Collection Stage** (Left Section):\n   - A rounded rectangle labeled \"Data Collection\" with a soft pastel blue fill and a slightly darker blue border.\n   - Inside, there are four smaller rounded rectangles:\n     - \"Image Acquisition\" (soft sky blue fill)\n     - \"Video Frame Extraction\" (soft sky blue fill)\n     - \"Sensor Metadata Collection\" (soft sky blue fill)\n     - \"Multi-Environment Capture\" (soft sky blue fill)\n   - Each smaller box will have a soft shadow for depth.\n\n2. **Data Storage Stage** (Next Section):\n   - A rounded rectangle labeled \"Data Storage\" with a light pastel green fill and a slightly darker green border.\n   - Inside, four smaller rounded rectangles:\n     - \"Cloud Storage\" (light sage green fill)\n     - \"Raw Image Repository\" (light sage green fill)\n     - \"Metadata Database\" (light sage green fill)\n     - \"Version Control System\" (light sage green fill)\n\n3. **Data Cleaning & Filtering Stage** (Middle Section):\n   - A rounded rectangle labeled \"Data Cleaning & Filtering\" with a soft pastel orange fill and a slightly darker orange border.\n   - Inside, five smaller rounded rectangles:\n     - \"Corrupted Image Removal\" (warm peach fill)\n     - \"Duplicate Removal\" (warm peach fill)\n     - \"Blur Detection\" (warm peach fill)\n     - \"Resolution Filtering\" (warm peach fill)\n     - \"Noise Reduction\" (warm peach fill)\n\n4. 
**Data Annotation Stage** (Middle-Right Section):\n   - A rounded rectangle labeled \"Data Annotation\" with a light pastel purple fill and a slightly darker purple border.\n   - Inside, five smaller rounded rectangles:\n     - \"Bounding Box Labeling\" (light lavender fill)\n     - \"Semantic Segmentation Masks\" (light lavender fill)\n     - \"Keypoint Annotation\" (light lavender fill)\n     - \"Human-in-the-Loop Labeling Tools\" (light lavender fill)\n     - \"Automated Pre-Labeling\" (light lavender fill)\n\n5. **Quality Control Stage** (Right Section):\n   - A rounded rectangle labeled \"Quality Control\" with a light pastel pink fill and a slightly darker pink border.\n   - Inside, four smaller rounded rectangles:\n     - \"Annotation Review Process\" (light pink fill)\n     - \"Inter-Annotator Agreement\" (light pink fill)\n     - \"Error Correction Workflow\" (light pink fill)\n     - \"Dataset Auditing\" (light pink fill)\n\n6. **Data Augmentation Stage** (Right Section):\n   - A rounded rectangle labeled \"Data Augmentation\" with a light pastel yellow fill and a slightly darker yellow border.\n   - Inside, six smaller rounded rectangles:\n     - \"Random Crop\" (light yellow fill)\n     - \"Horizontal Flip\" (light yellow fill)\n     - \"Rotation\" (light yellow fill)\n     - \"Color Jitter\" (light yellow fill)\n     - \"CutMix / MixUp\" (light yellow fill)\n     - \"Synthetic Data Generation\" (light yellow fill)\n\n7. **Dataset Splitting Stage** (Right Section):\n   - A rounded rectangle labeled \"Dataset Splitting\" with a light pastel teal fill and a slightly darker teal border.\n   - Inside, three smaller rounded rectangles:\n     - \"Train / Validation / Test Split\" (light teal fill)\n     - \"Stratified Sampling\" (light teal fill)\n     - \"Class Balance Verification\" (light teal fill)\n\n8. 
**Dataset Formatting Stage** (Final Right Section):\n   - A rounded rectangle labeled \"Dataset Formatting\" with a light pastel lavender fill and a slightly darker lavender border.\n   - Inside, four smaller rounded rectangles:\n     - \"COCO Format\" (light lavender fill)\n     - \"Pascal VOC Format\" (light lavender fill)\n     - \"TFRecord Conversion\" (light lavender fill)\n     - \"PyTorch Dataset Loaders\" (light lavender fill)\n\n9. **Monitoring & Maintenance Stage** (Final Right Section):\n   - A rounded rectangle labeled \"Monitoring & Maintenance\" with a light pastel peach fill and a slightly darker peach border.\n   - Inside, four smaller rounded rectangles:\n     - \"Dataset Versioning\" (light peach fill)\n     - \"Bias Analysis\" (light peach fill)\n     - \"Distribution Shift Detection\" (light peach fill)\n     - \"Continuous Dataset Update Pipeline\" (light peach fill)\n\n**Connections**: \n- Wide arrows (medium thickness, soft grey) will flow horizontally from one stage to the next, indicating the progression of data through the pipeline.\n- Each arrow will have a label indicating the type of data being transferred, such as \"Raw Data,\" \"Stored Data,\" \"Cleaned Data,\" \"Annotated Data,\" etc.\n\n**Groupings**: \n- Each major component will be grouped within its respective colored rounded rectangle, visually distinguishing different stages of the pipeline.\n\n**Labels and Annotations**: \n- Each box and arrow will be clearly labeled with concise text in bold sans-serif font for readability.\n- Annotations may include brief descriptions or examples within the boxes, written in smaller regular sans-serif text, to clarify each component's function.\n\n**Input/Output**:\n- At the far left, an \"Input\" arrow will label the entry point as \"Raw Data: Images, Videos, Sensor Metadata.\"\n- At the far right, an \"Output\" arrow will label the exit point as \"Model-Ready Dataset: Formatted for COCO, Pascal VOC, or TFRecord.\"\n\n**Styling**: \n- The 
entire diagram will maintain a polished and professional appearance, suitable for academic publication.\n- Line weights will vary slightly, with thicker lines for the main framework and thinner lines for internal connections.\n- The color palette will be soft and pastel, ensuring the diagram is visually appealing and not overwhelming."
}