Delete mesh-candidate_bestfit
- mesh-candidate_bestfit/README.md +0 -102
- mesh-candidate_bestfit/augmentation.py +0 -88
- mesh-candidate_bestfit/bestfit_generator.py +0 -143
- mesh-candidate_bestfit/combine_layouts.py +0 -32
- mesh-candidate_bestfit/map_dict.py +0 -48
- mesh-candidate_bestfit/rendering.py +0 -83
- mesh-candidate_bestfit/utils/base.py +0 -29
- mesh-candidate_bestfit/utils/process.py +0 -42
- mesh-candidate_bestfit/visualize.ipynb +0 -0
mesh-candidate_bestfit/README.md
DELETED
@@ -1,102 +0,0 @@

## Pretraining Data Generation via Mesh-candidate Bestfit

<p align="center">
<img src="../assets/Mesh-candidate Bestfit.png" width=100%> <br>
<i><small>Mesh-candidate Bestfit iteratively inserts elements from a small set of public datasets by searching for the best match between sampled candidates and the available grids in the current layout, ultimately achieving document synthesis.</small></i>
</p>
You can generate large-scale, diverse data for pretraining by applying our proposed method, Mesh-candidate Bestfit. Just follow the steps below:
### 1. Environment Setup

You need to install [PyMuPDF](https://pypi.org/project/PyMuPDF/1.23.7/) for the subsequent rendering step via pip:

```bash
cd mesh-candidate_bestfit
pip install pymupdf==1.23.7
```
### 2. Preprocessing

- **Data Preparation**

Two primary things need to be prepared before starting generation:
1\. **Original Annotation File of Initial Dataset**

* The annotation file follows the **COCO format**: a **JSON file** containing image and instance annotations.
* Each instance should have a **unique** `instance_id`.
* The file should be placed under `./`.
2\. **Element Pool**

The element pool is constructed from the annotation file. Specifically, crop all the instance images and organize them in a category-wise manner. The structure of the element pool is as follows (each folder is named by category, and each cropped image is named by its unique `instance_id`):
```bash
./element_pool
├── advertisement
│   ├── 727.jpg
│   ├── 919.jpg
│   ├── 1423.jpg
│   └── ...
├── algorithm
│   ├── 12653.jpg
│   ├── 17485.jpg
│   ├── 44364.jpg
│   └── ...
└── ...
```
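If you are building the element pool from your own dataset rather than downloading ours, the cropping step can be sketched as below. This is a minimal illustration, not part of the released scripts: `build_crop_plan` and its returned tuple layout are assumptions, and the actual pixel cropping (e.g. with PIL) is left as a comment.

```python
import os

def build_crop_plan(coco):
    """Map each COCO instance to its crop box and element-pool path.

    Returns a list of (image_id, bbox_xywh, save_path) tuples, where
    save_path follows element_pool/<category>/<instance_id>.jpg.
    """
    id2name = {c["id"]: c["name"] for c in coco["categories"]}
    plan = []
    for anno in coco["annotations"]:
        category = id2name[anno["category_id"]]
        save_path = os.path.join("element_pool", category, f"{anno['id']}.jpg")
        plan.append((anno["image_id"], anno["bbox"], save_path))
    return plan

# Each planned crop can then be executed with e.g. PIL:
#   Image.open(img_path).crop((x, y, x + w, y + h)).save(save_path)
```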
**Note:** For convenience, we provide the original annotation file and element pool for the M6Doc-test dataset, which can be downloaded from [annotation file](https://drive.google.com/file/d/1ua41Gs3UW8iuoJp21tZ4-lczVrcEm-gP/view?usp=sharing) and [element pool](https://drive.google.com/file/d/1MrIFObKr1bDGgZLBQM_c_Dvobkp6mjFE/view?usp=sharing), respectively. You can run the command below to decompress the element pool into place:

```bash
unzip /path/to/your/element_pool.zip -d ./element_pool/
```
- **Data Augmentation (Optional)**

If you want to apply our designed augmentation pipeline to your element pool, just run:

```bash
python augmentation.py --min_count 100 --aug_times 10
```

The script performs the augmentation pipeline `aug_times` times on each element of every category whose element count is less than `min_count`. If you want to generate a large amount of data, try a larger `aug_times`; conversely, if you want to shorten this process, try a smaller `aug_times`. During DocSynth300K generation, we use `--aug_times 50`.
- **Map Dict**

To facilitate the random selection of candidates during the rendering phase, it is necessary to establish a mapping from each candidate element to all of its candidate paths (pass `--use_aug` if augmentation was applied):

```bash
python map_dict.py --save_path ./map_dict.json --use_aug
```
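For reference, `map_dict.json` maps each `instance_id` to the list of all candidate image paths that can stand in for that element at rendering time: the original crop plus any augmented variants. The concrete paths below are illustrative only:

```python
import random

# Illustrative structure of map_dict.json (paths are made up):
map_dict = {
    "727": [
        "./element_pool/advertisement/727.jpg",
        "./element_pool/advertisement/aug/727/727_1700000000_123.jpg",
    ],
}

# During rendering, one candidate path is drawn uniformly at random:
sampled_path = random.choice(map_dict["727"])
```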
### 3. Layout Generation

Now you can generate diverse layouts using the Mesh-candidate Bestfit algorithm. To prevent process blocking, the generator saves the result of each layout in a timely manner; you can then use the [combine_layouts.py](./combine_layouts.py) script to combine them all together, like this:

```bash
python bestfit_generator.py --generate_num 100 --n_jobs 5 --json_path ./annotation_file.json --output_dir ./generated_layouts/seperate
python combine_layouts.py --seperate_layouts_dir ./generated_layouts/seperate --save_path ./generated_layouts/combined_layouts.json
```

Afterwards, feel free to delete the separate layouts since they are no longer needed.

**Note:** Due to the multiprocessing used in layout generation, set a proper `--n_jobs` to avoid process blocking.
### 4. Rendering

Finally, you can render the generated layouts and save the results in YOLO format via the script below:

```bash
python rendering.py --json_path ./generated_layouts/combined_layouts.json --n_jobs 5 --map_dict_path ./map_dict.json --save_dir ./generated_dataset
```
### Visualization

We provide [visualize.ipynb](./visualize.ipynb) to visualize the layouts generated by our proposed method. Here, we display some generation cases below:

<p align="center">
<img src="../assets/visualization.png" width=100%> <br>
</p>
mesh-candidate_bestfit/augmentation.py
DELETED
@@ -1,88 +0,0 @@

```python
import os
import cv2
import time
import argparse
import numpy as np
from tqdm import tqdm
import albumentations as A


class EdgeDetection(A.ImageOnlyTransform):
    """
    A class for edge extraction of images with the Sobel filter.
    """
    def apply(self, img, **params):
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        mag = np.hypot(sobelx, sobely)
        mag = mag / np.max(mag) * 255
        # Convert back to 3 channels so the later RGB->BGR conversion still works.
        return cv2.cvtColor(np.uint8(mag), cv2.COLOR_GRAY2RGB)


def pipeline(h, w):
    """
    Whole data augmentation pipeline with the input of image size.

    Args:
        h (float): Height of the image.
        w (float): Width of the image.
    """
    return A.Compose([
        A.RandomBrightnessContrast(p=0.5),
        A.RandomResizedCrop(height=h, width=w, scale=(0.5, 0.9), ratio=(w / h, w / h), p=0.7),  # keep h/w ratio the same
        EdgeDetection(p=0.2),
        A.ElasticTransform(alpha_affine=5, p=0.2),
        A.GaussNoise(var_limit=(100, 1200), p=1),
    ])


def perform_augmentation(img, transform, save_dir, prefix_id, aug_times):
    """
    Perform augmentation for a single element many times.

    Args:
        img (image): An element.
        transform (sequence): Data augmentation pipeline.
        save_dir (str): Root directory to save.
        prefix_id (str): Raw id for the element.
        aug_times (int): Augmentation times.
    """
    for _ in range(aug_times):
        transformed = transform(image=img)
        transformed_image = transformed["image"]
        transformed_image_bgr = cv2.cvtColor(transformed_image, cv2.COLOR_RGB2BGR)

        suffix_id = str(time.time()).replace(".", "_")
        prefix_id_dir = os.path.join(save_dir, prefix_id)
        os.makedirs(prefix_id_dir, exist_ok=True)
        aug_element_path = os.path.join(prefix_id_dir, f'{prefix_id}_{suffix_id}.jpg')
        cv2.imwrite(aug_element_path, transformed_image_bgr)


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Perform Image Augmentation")
    parser.add_argument("--min_count", type=int, default=100, help="Minimum number of elements for categories that do not require data augmentation")
    parser.add_argument("--aug_times", type=int, default=50, help="Number of augmentations per element")
    args = parser.parse_args()

    root_dir = './element_pool'

    for category in tqdm(os.listdir(root_dir), desc='Categories done'):
        category_dir = os.path.join(root_dir, category)
        if len(os.listdir(category_dir)) > args.min_count:
            continue
        save_dir = os.path.join(category_dir, 'aug')
        for raw_element in os.listdir(category_dir):
            raw_element_path = os.path.join(category_dir, raw_element)
            if not os.path.isfile(raw_element_path):
                continue  # skip the 'aug' subdirectory on reruns
            img = cv2.imread(raw_element_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            h, w, c = img.shape
            element_id = raw_element.split('.')[0]

            transform = pipeline(h, w)

            perform_augmentation(img=img, transform=transform, save_dir=save_dir, prefix_id=element_id, aug_times=args.aug_times)
```
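Setting OpenCV aside, the core of the `EdgeDetection` transform above is just a Sobel gradient magnitude rescaled to `[0, 255]`. A NumPy-only sketch of that computation (assuming a non-constant 2-D input so the normalization is well defined):

```python
import numpy as np

def sobel_magnitude(gray):
    """Sobel gradient magnitude of a 2-D array, scaled to uint8 [0, 255]."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal-gradient kernel
    ky = kx.T                                                          # vertical-gradient kernel
    h, w = gray.shape
    padded = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Correlate with both 3x3 kernels via shifted windows.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    return np.uint8(mag / mag.max() * 255)
```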
mesh-candidate_bestfit/bestfit_generator.py
DELETED
@@ -1,143 +0,0 @@

```python
import os
import json
import time
import torch
import random
import datetime
import argparse
import itertools
import torchvision
import multiprocessing
from utils.process import *

random.seed(datetime.datetime.now().timestamp())


def bestfit_generator(element_all):
    """
    Apply the Mesh-candidate Bestfit algorithm to generate diverse layouts.

    Args:
        element_all (dict): Loaded elements from the dataset json file.

    The generated layout is saved under the global OUTPUT_DIR and also returned.
    """
    # Default candidate_num = 500
    candidate_num = 500
    large_elements_idx = random.sample(list(range(len(element_all['large']))), int(candidate_num * 0.99))
    small_elements_idx = random.sample(list(range(len(element_all['small']))), int(candidate_num * 0.01))
    cand_elements = [element_all['large'][large_idx] for large_idx in large_elements_idx] + [element_all['small'][small_idx] for small_idx in small_elements_idx]

    # Initially, randomly put an element
    e0 = random.choice(cand_elements)
    cx = random.uniform(min(e0.w/2, 1-e0.w/2), max(e0.w/2, 1-e0.w/2))
    cy = random.uniform(min(e0.h/2, 1-e0.h/2), max(e0.h/2, 1-e0.h/2))
    e0.cx, e0.cy = cx, cy
    put_elements = [e0]
    cand_elements.remove(e0)
    small_cnt = 1 if e0.w < 0.05 or e0.h < 0.05 else 0

    # Iteratively insert elements
    while True:
        # Construct meshgrid based on current layout
        put_element_boxes = []
        xticks, yticks = [0, 1], [0, 1]
        for e in put_elements:
            x1, y1, x2, y2 = e.cx-e.w/2, e.cy-e.h/2, e.cx+e.w/2, e.cy+e.h/2
            xticks.append(x1)
            xticks.append(x2)
            yticks.append(y1)
            yticks.append(y2)
            put_element_boxes.append([x1, y1, x2, y2])
        xticks, yticks = list(set(xticks)), list(set(yticks))
        pticks = list(itertools.product(xticks, yticks))
        meshgrid = list(itertools.product(pticks, pticks))
        put_element_boxes = torch.Tensor(put_element_boxes)

        # Filter out invalid grids
        meshgrid = [grid for grid in meshgrid if grid[0][0] < grid[1][0] and grid[0][1] < grid[1][1]]
        meshgrid_tensor = torch.Tensor([p1 + p2 for p1, p2 in meshgrid])
        iou_res = torchvision.ops.box_iou(meshgrid_tensor, put_element_boxes)
        valid_grid_idx = (iou_res.sum(dim=1) == 0).nonzero().flatten().tolist()
        meshgrid = meshgrid_tensor[valid_grid_idx].tolist()

        # Search for the Mesh-candidate Bestfit pair
        max_fill, max_grid_idx, max_element_idx = 0, -1, -1
        for element_idx, e in enumerate(cand_elements):
            for grid_idx, grid in enumerate(meshgrid):
                if e.w > grid[2] - grid[0] or e.h > grid[3] - grid[1]:
                    continue
                element_area = e.w * e.h
                grid_area = (grid[2] - grid[0]) * (grid[3] - grid[1])
                if element_area / grid_area > max_fill:
                    max_fill = element_area / grid_area
                    max_grid_idx = grid_idx
                    max_element_idx = element_idx

        # Termination condition
        if max_element_idx == -1 or max_grid_idx == -1:
            break
        maxfit_element = cand_elements[max_element_idx]
        if maxfit_element.w < 0.05 or maxfit_element.h < 0.05:
            small_cnt += 1
            if small_cnt > 5:
                break

        # Put the candidate at the center of the grid
        cand_elements.remove(maxfit_element)
        maxfit_element.cx = (meshgrid[max_grid_idx][0] + meshgrid[max_grid_idx][2]) / 2
        maxfit_element.cy = (meshgrid[max_grid_idx][1] + meshgrid[max_grid_idx][3]) / 2
        put_elements.append(maxfit_element)

    # Apply a rescale transform to introduce more diversity
    for _, e in enumerate(put_elements):
        e.gen_real_bbox()
    layout = Layout(cand_elements=put_elements)

    # Convert the layout to json file format
    boxes, categories, relpaths = [], [], []
    for element in layout.cand_elements:
        cx, cy, w, h = element.get_real_bbox()
        x1, y1, x2, y2 = cx-w/2, cy-h/2, cx+w/2, cy+h/2
        boxes.append([x1, y1, x2, y2])
        categories.append(element.category - 1)  # Exclude the "__background__" category (category_id = 0)
        relpaths.append(element.filepath)

    output_layout = {
        "boxes": boxes,
        "categories": categories,
        "relpaths": relpaths
    }

    # To prevent process blocking, save the result of each layout in a timely manner.
    with open(os.path.join(OUTPUT_DIR, str(time.time()).replace(".", "_") + '.json'), 'w') as f:
        json.dump(output_layout, f)

    return output_layout


if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument('--generate_num', default=None, required=True, type=int, help='number of layouts to generate')
    parser.add_argument('--n_jobs', default=None, required=True, type=int, help='number of processes to use in multiprocessing')
    parser.add_argument('--json_path', default=None, required=True, type=str, help='original json file of the dataset')
    parser.add_argument('--output_dir', default='./generated_layouts/seperate', type=str, help='output directory of generated seperate layouts')
    args = parser.parse_args()

    element_all = read_data(args.json_path)
    OUTPUT_DIR = args.output_dir
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    # Using multiprocessing to accelerate generation
    n_jobs = args.n_jobs
    with multiprocessing.Pool(n_jobs) as p:
        generated_layout = p.starmap(
            bestfit_generator, [(element_all,) for _ in range(args.generate_num)]
        )
        p.close()
        p.join()
```
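Stripped of the torch/torchvision machinery, the search step above reduces to: among all (grid, candidate) pairs where the candidate fits inside the grid, pick the pair maximizing the fill ratio. A dependency-free sketch (`bestfit_pair` is illustrative, with grids as `(x1, y1, x2, y2)` and candidates as `(w, h)` in normalized page coordinates):

```python
def bestfit_pair(grids, candidates):
    """Return (grid_idx, cand_idx, fill) maximizing element_area / grid_area,
    or (-1, -1, 0.0) if no candidate fits any grid (the termination condition)."""
    best = (-1, -1, 0.0)
    for gi, (x1, y1, x2, y2) in enumerate(grids):
        gw, gh = x2 - x1, y2 - y1
        for ci, (w, h) in enumerate(candidates):
            if w > gw or h > gh:
                continue  # candidate does not fit this grid
            fill = (w * h) / (gw * gh)
            if fill > best[2]:
                best = (gi, ci, fill)
    return best
```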
mesh-candidate_bestfit/combine_layouts.py
DELETED
@@ -1,32 +0,0 @@

```python
import os
import json
import argparse
from tqdm import tqdm


def combine_layouts(seperate_layouts_dir):
    """
    Combine separate layout json files into one list.

    Args:
        seperate_layouts_dir (str): Directory holding the separate layout json files generated by bestfit_generator.py
    """
    combined_layouts = []
    for item in tqdm(os.listdir(seperate_layouts_dir), desc='Combining separate layouts'):
        abs_path = os.path.join(seperate_layouts_dir, item)
        json_file = json.load(open(abs_path))
        combined_layouts.append(json_file)
    return combined_layouts


if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument('--seperate_layouts_dir', default="./generated_layouts/seperate", type=str, help="directory holding the separate layouts")
    parser.add_argument('--save_path', default="./generated_layouts/combined_layouts.json", type=str, help='save path for the combined generated layouts')
    args = parser.parse_args()

    combined_layouts = combine_layouts(seperate_layouts_dir=args.seperate_layouts_dir)

    with open(args.save_path, 'w') as f:
        f.write(json.dumps(combined_layouts, indent=4))
```
mesh-candidate_bestfit/map_dict.py
DELETED
@@ -1,48 +0,0 @@

```python
import os
import json
import argparse
from tqdm import tqdm


def get_map_dict(use_aug):
    """
    Get a map from an element to its corresponding save paths.

    Args:
        use_aug (bool): Whether to include augmented elements or not.
    """
    instance2pathlist = {}
    root_dir = './element_pool'
    for category in os.listdir(root_dir):
        if category == '.DS_Store':
            continue
        category_dir = os.path.join(root_dir, category)
        filelist = os.listdir(category_dir)
        for filename in tqdm(filelist):
            if filename == 'aug':
                continue
            sin_id_pathlist = []
            start_id = filename.split('.')[0]
            origin_path = os.path.join(category_dir, filename)
            sin_id_pathlist.append(origin_path)
            if 'aug' in filelist and use_aug:
                bottom_dir = os.path.join(category_dir, f'aug/{start_id}')
                aug_paths = os.listdir(bottom_dir)
                aug_pathlist = [os.path.join(bottom_dir, path) for path in aug_paths]
                sin_id_pathlist += aug_pathlist
            instance2pathlist[start_id] = sin_id_pathlist
    return instance2pathlist


if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument('--use_aug', action='store_true', help="whether to use data augmentation")
    parser.add_argument('--save_path', default="./map_dict.json", type=str, help='save path for the map dict')
    args = parser.parse_args()

    map_dict = get_map_dict(use_aug=args.use_aug)

    with open(args.save_path, 'w') as f:
        f.write(json.dumps(map_dict))
```
mesh-candidate_bestfit/rendering.py
DELETED
@@ -1,83 +0,0 @@

```python
import os
import time
import fitz
import json
import random
import argparse
import datetime
import multiprocessing
from utils.process import *

random.seed(datetime.datetime.now().timestamp())


def render_layout(layout):
    """
    Render a generated layout to an image and save it in the YOLO format.

    Args:
        layout (dict): Generated layout information for a single page.
    """
    doc = fitz.open()
    w, h = sample_hw(
        width_range=[1200, 2000],
        ratio_range=[0.7, 1.5],
        max_height=3000,
    )
    page = doc.new_page(width=w, height=h)

    annotation_json = {"bbox": [], "labels": [], "width": w, "height": h}
    for bbox, category, relpath in zip(layout["boxes"], layout["categories"], layout["relpaths"]):
        bbox[0], bbox[2] = bbox[0]*w, bbox[2]*w
        bbox[1], bbox[3] = bbox[1]*h, bbox[3]*h
        rect = fitz.Rect([bbox[i] for i in range(4)])

        abs_filepath = os.path.join('./element_pool', relpath)
        start_str = abs_filepath.rsplit('/', 1)[1].split('.')[0]
        sampled_path = random.choice(INSTANCE2PATHLIST[start_str])

        page.insert_image(rect, filename=sampled_path, keep_proportion=False)

        annotation_json["bbox"].append(bbox)
        annotation_json["labels"].append(category)

    _id = str(time.time()).replace(".", "_")
    pix = page.get_pixmap()
    pix.save(os.path.join(IMAGE_DIR, f"{_id}.jpg"))
    anno_txt = open(os.path.join(ANNO_DIR, f"{_id}.txt"), "w")
    for bbox, category_id in zip(annotation_json["bbox"], annotation_json["labels"]):
        w, h = annotation_json["width"], annotation_json["height"]
        x0, y0, x1, y1 = bbox
        x0, y0 = x0/w, y0/h
        x1, y1 = x1/w, y1/h
        anno_txt.write(f"{category_id} {x0} {y0} {x1} {y0} {x1} {y1} {x0} {y1}\n")
    anno_txt.close()


if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument('--save_dir', default="./generated_dataset", type=str, help='planned root dir for the generated dataset')
    parser.add_argument('--n_jobs', default=None, required=True, type=int, help='number of processes to use in multiprocessing')
    parser.add_argument('--json_path', default=None, required=True, type=str, help='json path for layouts generated by the Mesh-candidate Bestfit algorithm')
    parser.add_argument('--map_dict_path', default=None, required=True, type=str, help='json path for the element-to-pathlist map dict')
    args = parser.parse_args()

    # Set save paths
    IMAGE_DIR = os.path.join(args.save_dir, "images")
    ANNO_DIR = os.path.join(args.save_dir, "labels")
    os.makedirs(IMAGE_DIR, exist_ok=True)
    os.makedirs(ANNO_DIR, exist_ok=True)

    # Load layout data
    layout_json = json.load(open(args.json_path))
    INSTANCE2PATHLIST = json.load(open(args.map_dict_path))

    # Using multiprocessing to accelerate rendering
    n_jobs = args.n_jobs
    with multiprocessing.Pool(n_jobs) as p:
        result = p.starmap(
            render_layout, [(layout,) for layout in layout_json]
        )
        p.close()
        p.join()
```
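Each label line written by `rendering.py` is a category id followed by the four box corners as a normalized polygon (top-left, top-right, bottom-right, bottom-left). A standalone sketch of that conversion (`yolo_polygon_line` is a hypothetical helper, not part of the script):

```python
def yolo_polygon_line(category_id, bbox_xyxy, width, height):
    """Format an absolute (x0, y0, x1, y1) box as a normalized 4-corner
    polygon label line: 'cat x0 y0 x1 y0 x1 y1 x0 y1'."""
    x0, y0, x1, y1 = bbox_xyxy
    # Normalize by the page size.
    x0, y0, x1, y1 = x0 / width, y0 / height, x1 / width, y1 / height
    return f"{category_id} {x0} {y0} {x1} {y0} {x1} {y1} {x0} {y1}"
```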
mesh-candidate_bestfit/utils/base.py
DELETED
@@ -1,29 +0,0 @@

```python
import random


class element(object):
    def __init__(self, cx, cy, h, w, category, filepath):
        self.cx = cx
        self.cy = cy
        self.h = h
        self.w = w
        self.category = category
        self.filepath = filepath
        self.ratio = h / w
        self.area = h * w

    def gen_real_bbox(self):
        self.real_cx, self.real_cy = self.cx, self.cy
        self.real_w, self.real_h = self.w * random.uniform(0.8, 0.95), self.h * random.uniform(0.8, 0.95)

    def get_real_bbox(self):
        return self.real_cx, self.real_cy, self.real_w, self.real_h

    def __repr__(self):
        return f'cx: {self.cx}, cy: {self.cy}, h: {self.h}, w: {self.w}, category: {self.category}'


class Layout(object):
    def __init__(self, cand_elements, align=None, fill=None):
        self.cand_elements = cand_elements
        self.align = align
        self.fill = fill
```
mesh-candidate_bestfit/utils/process.py
DELETED
@@ -1,42 +0,0 @@

```python
import os
import json
import random
from .base import *


def read_data(json_file):
    """
    Load elements from the dataset json file.

    Args:
        json_file (str): Path to a dataset json file in COCO format.
    """
    data = json.load(open(json_file))
    category_id2name = {item['id']: item['name'] for item in data['categories']}
    element_all = {'large': [], "small": []}
    image2anno = {image["id"]: image for image in data["images"]}
    for anno in data["annotations"]:
        H, W = image2anno[anno["image_id"]]["height"], image2anno[anno["image_id"]]["width"]
        w, h = anno["bbox"][2], anno["bbox"][3]
        if w/W < 0.01 or h/H < 0.01:
            continue
        anno_id, category_id = anno['id'], anno["category_id"]
        e = element(cx=None, cy=None, h=h/H, w=w/W, category=category_id, filepath=f'{category_id2name[category_id]}/{anno_id}.jpg')
        if w/W >= 0.05 and h/H >= 0.05:
            element_all['large'].append(e)
        else:
            element_all['small'].append(e)
    return element_all


def sample_hw(width_range, ratio_range, max_height):
    """
    Randomly sample a (w, h) size for rendering from a given range.

    Args:
        width_range (list): Given range of width.
        ratio_range (list): Given range of h/w ratio.
        max_height (int): Upper bound of height.
    """
    w = random.randint(width_range[0], width_range[1])
    ratio = random.uniform(ratio_range[0], ratio_range[1])
    h = min(max_height, int(w * ratio))
    return w, h
```
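The size thresholds in `read_data` can be summarized as a standalone predicate: boxes under 1% of the page in either dimension are dropped, boxes at least 5% in both dimensions are "large", and the rest are "small". A sketch (`size_bucket` is illustrative only, not part of the released code):

```python
def size_bucket(w, h, W, H):
    """Classify a bbox (w, h) on a page (W, H): 'drop', 'large', or 'small'."""
    rw, rh = w / W, h / H
    if rw < 0.01 or rh < 0.01:
        return "drop"     # too small in at least one dimension
    if rw >= 0.05 and rh >= 0.05:
        return "large"    # comfortably visible in both dimensions
    return "small"
```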
mesh-candidate_bestfit/visualize.ipynb
DELETED
The diff for this file is too large to render.