
Commit 18184366 authored by bernie wang

added Mask RCNN to repo

parent b28ff7a2
.ipynb_checkpoints/
__pycache__/
mask_rcnn_coco.h5
Mask R-CNN
The MIT License (MIT)
Copyright (c) 2017 Matterport, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
# Mask R-CNN for Object Detection and Segmentation
This is an implementation of [Mask R-CNN](https://arxiv.org/abs/1703.06870) on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.
![Instance Segmentation Sample](assets/street.png)
The repository includes:
* Source code of Mask R-CNN built on FPN and ResNet101.
* Training code for MS COCO
* Pre-trained weights for MS COCO
* Jupyter notebooks to visualize the detection pipeline at every step
* ParallelModel class for multi-GPU training
* Evaluation on MS COCO metrics (AP)
* Example of training on your own dataset
The code is documented and designed to be easy to extend. If you use it in your research, please consider referencing this repository. If you work on 3D vision, you might find our recently released [Matterport3D](https://matterport.com/blog/2017/09/20/announcing-matterport3d-research-dataset/) dataset useful as well.
This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples [here](https://matterport.com/gallery/).
# Getting Started
* [demo.ipynb](/demo.ipynb) is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images.
* [train_shapes.ipynb](train_shapes.ipynb) shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.
* ([model.py](model.py), [utils.py](utils.py), [config.py](config.py)): These files contain the main Mask RCNN implementation.
* [inspect_data.ipynb](/inspect_data.ipynb) This notebook visualizes the different pre-processing steps used to prepare the training data.
* [inspect_model.ipynb](/inspect_model.ipynb) This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.
* [inspect_weights.ipynb](/inspect_weights.ipynb) This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.
# Step by Step Detection
To help with debugging and understanding the model, there are 3 notebooks
([inspect_data.ipynb](inspect_data.ipynb), [inspect_model.ipynb](inspect_model.ipynb),
[inspect_weights.ipynb](inspect_weights.ipynb)) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:
## 1. Anchor sorting and filtering
Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.
![](assets/detection_anchors.png)
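The "anchor box refinement" shown above is commonly parameterized as in Faster R-CNN. The network predicts a center shift (dy, dx), measured as a fraction of the box height and width, plus log-scale changes to the height and width. The NumPy sketch below applies deltas of that form. It is a generic illustration and was not taken from model.py. It assumes boxes in (y1, x1, y2, x2) order and raw deltas, so any normalization the model applies during training is not handled here. `apply_box_deltas` is a hypothetical helper name.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Refine boxes with predicted deltas.

    boxes:  [N, 4] as (y1, x1, y2, x2)
    deltas: [N, 4] as (dy, dx, log(dh), log(dw)); dy/dx are fractions of the
            box height/width, dh/dw are scale factors applied in log space.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)
    # Convert corners to center/size form.
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the center proportionally to the box size, then rescale the size.
    center_y = center_y + deltas[:, 0] * height
    center_x = center_x + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    # Back to corner form.
    return np.stack([center_y - 0.5 * height, center_x - 0.5 * width,
                     center_y + 0.5 * height, center_x + 0.5 * width], axis=1)


refined = apply_box_deltas([[0, 0, 10, 10]], [[0.1, 0.0, np.log(2.0), 0.0]])
print(refined)  # approximately [[-4, 0, 16, 10]]
```

Here, dy = 0.1 on a 10-pixel-tall box moves its center down by 1 pixel, and dh = log 2 doubles its height.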
## 2. Bounding Box Refinement
This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
![](assets/detection_refinement.png)
## 3. Mask Generation
Examples of generated masks. These then get scaled and placed on the image in the right location.
![](assets/detection_masks.png)
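The step where small masks "get scaled and placed on the image in the right location" can be sketched as follows. This sketch makes several assumptions: the mask head outputs a low-resolution soft mask (28x28 in the Mask R-CNN paper), boxes are (y1, x1, y2, x2) with exclusive y2/x2, and masks are binarized with a 0.5 threshold. Nearest-neighbour resizing keeps the sketch dependency-free; a real implementation would more likely interpolate bilinearly. `paste_mask` is a hypothetical helper name.

```python
import numpy as np


def paste_mask(small_mask, box, image_shape, threshold=0.5):
    """Resize a low-resolution soft mask to its box and paste it into a
    full-image boolean mask. `box` is (y1, x1, y2, x2), y2/x2 exclusive."""
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    mh, mw = small_mask.shape
    # Nearest-neighbour resize: map each output row/col to a source row/col.
    rows = np.arange(h) * mh // h
    cols = np.arange(w) * mw // w
    resized = small_mask[rows][:, cols]
    # Binarize and place the box-sized mask at its location in the image.
    full = np.zeros(image_shape[:2], dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold
    return full


# Example: a 28x28 mask whose left half is "on", pasted into a 56x56 box.
small = np.zeros((28, 28))
small[:, :14] = 0.9
full = paste_mask(small, (10, 20, 66, 76), (100, 100))
print(full.sum())  # 1568 (56 rows x 28 columns)
```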
## 4. Layer activations
Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).
![](assets/detection_activations.png)
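A quick numeric check can complement the activation plots. The hypothetical helper below summarizes an activation array and crudely flags two kinds of trouble: output that is almost entirely zero (for example, dead ReLUs) and non-finite values. The thresholds are arbitrary assumptions. Activations that look like random noise are harder to detect numerically and are still best judged visually.

```python
import numpy as np


def activation_report(act, zero_tol=1e-6, dead_fraction=0.99):
    """Summarize one layer's activations (any shape) and flag obvious trouble.

    `zero_tol` and `dead_fraction` are arbitrary illustrative thresholds.
    """
    act = np.asarray(act, dtype=np.float64)
    finite = act[np.isfinite(act)]  # drop NaN/inf before computing statistics
    zero_fraction = float(np.mean(np.abs(finite) < zero_tol)) if finite.size else 0.0
    return {
        "zero_fraction": zero_fraction,
        "non_finite": int(act.size - finite.size),
        "std": float(finite.std()) if finite.size else float("nan"),
        "looks_dead": zero_fraction >= dead_fraction,
    }


print(activation_report(np.zeros((4, 8, 8))))  # all zeros, so looks_dead is True
```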
## 5. Weight Histograms
Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.
![](assets/detection_histograms.png)
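Outside the notebook, you can compute a layer's weight histogram with NumPy's `np.histogram`. The sketch below assumes you already have one layer's weights as an array; extracting them from the Keras model is not shown. It reports a few statistics that tend to reveal anomalies, such as non-finite values or an unusually large spread. `weight_summary` is a hypothetical helper name.

```python
import numpy as np


def weight_summary(weights, bins=20):
    """Histogram and basic statistics for one layer's weights (any shape)."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    finite = w[np.isfinite(w)]
    # np.histogram accepts an empty array, so this also works if every value is NaN/inf.
    counts, edges = np.histogram(finite, bins=bins)
    return {
        "non_finite": int(w.size - finite.size),
        "min": float(finite.min()) if finite.size else float("nan"),
        "max": float(finite.max()) if finite.size else float("nan"),
        "std": float(finite.std()) if finite.size else float("nan"),
        "counts": counts,
        "edges": edges,
    }


rng = np.random.default_rng(0)
summary = weight_summary(rng.normal(0.0, 0.05, size=(3, 3, 64, 64)))
print(summary["non_finite"], summary["std"])
```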
## 6. Logging to TensorBoard
TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.
![](assets/detection_tensorboard.png)
## 7. Composing the different pieces into a final result
![](assets/detection_final.png)
# Training on MS COCO
We're providing pre-trained weights for MS COCO to make it easier to start. You can
use those weights as a starting point to train your own variation on the network.
Training and evaluation code is in coco.py. You can import this
module in a Jupyter notebook (see the provided notebooks for examples) or you
can run it directly from the command line as follows:
```
# Train a new model starting from pre-trained COCO weights
python3 coco.py train --dataset=/path/to/coco/ --model=coco
# Train a new model starting from ImageNet weights
python3 coco.py train --dataset=/path/to/coco/ --model=imagenet
# Continue training a model that you had trained earlier
python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 coco.py train --dataset=/path/to/coco/ --model=last
```
You can also run the COCO evaluation code with:
```
# Run COCO evaluation on the last trained model
python3 coco.py evaluate --dataset=/path/to/coco/ --model=last
```
The training schedule, learning rate, and other parameters should be set in coco.py.
# Training on Your Own Dataset
To train the model on your own dataset you'll need to subclass two classes:
```Config```
This class contains the default configuration. Subclass it and modify the attributes you need to change.
```Dataset```
This class provides a consistent way to work with any dataset.
It allows you to use new datasets for training without having to change
the code of the model. It also supports loading multiple datasets at the
same time, which is useful if the objects you want to detect are not
all available in one dataset.
The ```Dataset``` class itself is the base class. To use it, create a new
class that inherits from it and adds functions specific to your dataset.
See the base `Dataset` class in utils.py and examples of extending it in train_shapes.ipynb and coco.py.
## Differences from the Official Paper
This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.
* **Image Resizing:** To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.
* **Bounding Boxes**: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply certain image augmentations that would otherwise be really hard to apply to bounding boxes, such as image rotation.
To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset.
We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more,
and only 0.01% differed by 10px or more.
* **Learning Rate:** The paper uses a learning rate of 0.02, but we found that to be
too high: it often causes the weights to explode, especially when using a small batch
size. This might be related to differences between how Caffe and TensorFlow compute
gradients (sum vs. mean across batches and GPUs), or the official model may use gradient
clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively.
We found that smaller learning rates converge faster anyway, so we go with that.
* **Anchor Strides:** The lowest level of the pyramid has a stride of 4px relative to the image, so anchors are created at 4-pixel intervals. To reduce computation and memory load we adopt an anchor stride of 2, which cuts the number of anchors by a factor of 4 and doesn't have a significant effect on accuracy.
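The image-resizing scheme described above (preserve the aspect ratio, then zero-pad to a fixed square) can be sketched as follows. This is an illustration, not the repository's resize code. It scales the longer side to exactly `target` (1024 by default, matching the MS COCO example), centers the image within the padding, and uses nearest-neighbour sampling so that it needs only NumPy. `resize_and_pad` is a hypothetical name. It also returns the window of real pixels, which is useful for mapping detections back to the original image.

```python
import numpy as np


def resize_and_pad(image, target=1024):
    """Scale `image` so its longer side equals `target` (aspect ratio kept),
    then zero-pad it to a target x target square.

    Returns (padded_image, scale, window), where window = (y1, x1, y2, x2)
    marks the region holding real (non-padding) pixels.
    """
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbour resize: pick the source pixel for each output pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]
    # Split the padding between both sides of each spatial axis.
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    padding = [(top, target - new_h - top), (left, target - new_w - left)]
    padding += [(0, 0)] * (image.ndim - 2)  # never pad the channel axis
    padded = np.pad(resized, padding, mode="constant", constant_values=0)
    return padded, scale, (top, left, top + new_h, left + new_w)


img = np.ones((375, 1242, 3), dtype=np.uint8)
padded, scale, window = resize_and_pad(img)
print(padded.shape, window)  # (1024, 1024, 3) (357, 0, 666, 1024)
```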
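Generating bounding boxes from masks, as described above, amounts to finding the first and last rows and columns that contain mask pixels. The NumPy sketch below is one way to do it. It assumes masks are stacked as a boolean array of shape [height, width, instances] and returns boxes as (y1, x1, y2, x2) with exclusive y2/x2. Returning an all-zero box for an empty mask is an arbitrary choice. The example also shows the augmentation benefit: after rotating the masks, the boxes are simply recomputed. `masks_to_boxes` is a hypothetical helper name.

```python
import numpy as np


def masks_to_boxes(masks):
    """Smallest box enclosing each instance mask.

    masks: bool array [height, width, num_instances].
    Returns int32 array [num_instances, 4] of (y1, x1, y2, x2), y2/x2 exclusive.
    """
    boxes = np.zeros((masks.shape[-1], 4), dtype=np.int32)
    for i in range(masks.shape[-1]):
        rows = np.flatnonzero(masks[:, :, i].any(axis=1))  # rows with any mask pixel
        cols = np.flatnonzero(masks[:, :, i].any(axis=0))  # columns with any mask pixel
        if rows.size:  # empty masks keep an all-zero box
            boxes[i] = rows[0], cols[0], rows[-1] + 1, cols[-1] + 1
    return boxes


masks = np.zeros((10, 12, 2), dtype=bool)
masks[2:5, 3:9, 0] = True  # instance 0; instance 1 is left empty
print(masks_to_boxes(masks).tolist())  # [[2, 3, 5, 9], [0, 0, 0, 0]]

# Rotate the masks 90 degrees and recompute, instead of transforming the boxes.
rotated = np.rot90(masks, k=1, axes=(0, 1))
print(masks_to_boxes(rotated).tolist())  # [[3, 2, 9, 5], [0, 0, 0, 0]]
```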
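The factor of 4 for anchor strides follows from simple counting. Anchors are placed at every `anchor_stride`-th cell of the feature map in both directions, so doubling the stride quarters the number of positions. The sketch below assumes a 1024x1024 input, the stride-4 lowest pyramid level from the text, and 3 anchor aspect ratios per position. The ratio count is an illustrative assumption and cancels out of the factor. `anchor_count` and its parameters are hypothetical names.

```python
def anchor_count(image_size, feature_stride, anchor_stride, num_ratios=3):
    """Anchors on one square pyramid level: positions x aspect ratios."""
    side = image_size // feature_stride  # feature-map cells per side
    positions_per_side = len(range(0, side, anchor_stride))
    return positions_per_side ** 2 * num_ratios


dense = anchor_count(1024, feature_stride=4, anchor_stride=1)   # every 4 px
sparse = anchor_count(1024, feature_stride=4, anchor_stride=2)  # every 8 px
print(dense, sparse, dense / sparse)  # 196608 49152 4.0
```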
## Contributing
Contributions to this repository are welcome. Examples of things you can contribute:
* Speed improvements, such as rewriting some Python code in TensorFlow or Cython.
* Training on other datasets.
* Accuracy Improvements.
* Visualizations and examples.
You can also [join our team](https://matterport.com/careers/) and help us build even more projects like this one.
## Requirements
* Python 3.4+
* TensorFlow 1.3+
* Keras 2.0.8+
* Jupyter Notebook
* Numpy, skimage, scipy, Pillow
### MS COCO Requirements:
To train or test on MS COCO, you'll also need:
* pycocotools (installation instructions below)
* [MS COCO Dataset](http://cocodataset.org/#home)
* Download the 5K [minival](https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0)
and the 35K [validation-minus-minival](https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0)
subsets. More details in the original [Faster R-CNN implementation](https://github.com/rbgirshick/py-faster-rcnn/blob/master/data/README.md).
If you use Docker, the code has been verified to work on
[this Docker container](https://hub.docker.com/r/waleedka/modern-deep-learning/).
## Installation
1. Clone this repository
2. Download pre-trained COCO weights (mask_rcnn_coco.h5) from the [releases page](https://github.com/matterport/Mask_RCNN/releases).
3. (Optional) To train or test on MS COCO, install `pycocotools` from one of these repos. They are forks of the original pycocotools with fixes for Python 3 and Windows (the official repo doesn't seem to be active anymore).
    * Linux: https://github.com/waleedka/coco
    * Windows: https://github.com/philferriere/cocoapi (you must have the Visual C++ 2015 build tools on your path; see the repo for additional details)
## More Examples
![Sheep](assets/sheep.png)
![Donuts](assets/donuts.png)
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import coco
import utils
import model as modellib
import visualize
#%matplotlib inline
# Root directory of the project
ROOT_DIR = os.getcwd()
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Local path to trained weights file
COCO_MODEL_PATH = "mask_rcnn_balloon_0020.h5"
# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # Assumption: the balloon weights loaded below were trained with one
    # object class, so the heads expect background + 1 class, not COCO's 1 + 80.
    NUM_CLASSES = 1 + 1
config = InferenceConfig()
config.display()
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
# Load the trained weights from COCO_MODEL_PATH (the balloon weights file above)
model.load_weights(COCO_MODEL_PATH, by_name=True)
# Class names. The index of a class in the list is its ID.
# Assumption: 'balloon' is the single trained class; every class ID the model
# can return needs an entry here, or the name lookup during visualization fails.
class_names = ['BG', 'balloon']
# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
#image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))
image = skimage.io.imread("testimage/ballon1.jpg")
# Run detection
results = model.detect([image], verbose=1)
# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])
print('OK')
import os
import numpy as np
root_dir = os.path.dirname(os.path.abspath(__file__))
data_dir = os.path.join(root_dir, 'data')
image_shape = 375, 1242
def get_drive_dir(drive, date='2011_09_26'):
    return os.path.join(data_dir, date, date + '_drive_%04d_sync' % drive)
def get_inds(path, ext='.png'):
    inds = [int(os.path.splitext(name)[0]) for name in os.listdir(path)
            if os.path.splitext(name)[1] == ext]
    inds.sort()
    return inds
def read_calib_file(path):
    float_chars = set("0123456789.e+- ")
    data = {}
    # print(path)
    with open(path, 'r') as f:
        for line in f.readlines():
            key, value = line.split(':', 1)
            value = value.strip()
            data[key] = value
            if float_chars.issuperset(value):
                # try to cast to float array
                try:
                    data[key] = np.array(list(map(float, value.split(' '))))
                except ValueError:
                    pass  # casting error: data[key] already eq. value, so pass
    # print(data)
    return data
def homogeneous_transform(points, transform, keep_last=False):
    """
    Parameters
    ----------
    points : (n_points, M) array-like
        The points to transform. If `points` is shape (n_points, M-1), a unit
        homogeneous coordinate will be added to make it (n_points, M).
    transform : (M, N) array-like
        The right-multiplying transformation to apply.
    keep_last : bool, optional
        If True, append the un-normalized last homogeneous coordinate as an
        extra column after normalizing the others.
    """
    points = np.asarray(points)
    transform = np.asarray(transform)
    n_points, D = points.shape
    M, N = transform.shape
    # do transformation in homogeneous coordinates
    if D == M - 1:
        points = np.hstack([points, np.ones((n_points, 1), dtype=points.dtype)])
    elif D != M:
        raise ValueError("Number of dimensions of points (%d) does not match "
                         "input dimensions of transform (%d)." % (D, M))
    new_points = np.dot(points, transform)
    # normalize homogeneous coordinates
    if not keep_last:
        new_points = new_points[:, :-1] / new_points[:, [-1]]
    else:
        new_points = np.hstack(
            [new_points[:, :-1]/new_points[:, [-1]], new_points[:, [-1]]])
    return new_points
def filter_disps(xyd, shape, max_disp=255, return_mask=False):
    x, y, d = xyd.T
    mask = ((x >= 0) & (x <= shape[1] - 1) &
            (y >= 0) & (y <= shape[0] - 1) &
            (d >= 0) & (d <= max_disp))
    xyd = xyd[mask]
    return (xyd, mask) if return_mask else xyd
def filter_depths(xyd, shape):
    x, y, d = xyd.T
    mask = ((x >= 0) & (x <= shape[1] - 1) &
            (y >= 0) & (y <= shape[0] - 1))
    xyd = xyd[mask]
    return xyd
class Calib(object):
    """Convert between coordinate frames.
    This class loads the calibration data from file, and creates the
    corresponding transformations to convert between various coordinate
    frames.
    Each `get_*` function returns a 3D transformation in homogeneous
    coordinates between two frames. All transformations are right-multiplying,
    and can be applied with `homogeneous_transform`.
    """
    def __init__(self, calib_dir, color=False):
        self.calib_dir = calib_dir
        self.imu2velo = read_calib_file(
            os.path.join(self.calib_dir, "calib_imu_to_velo.txt"))
        self.velo2cam = read_calib_file(
            os.path.join(self.calib_dir, "calib_velo_to_cam.txt"))
        self.cam2cam = read_calib_file(
            os.path.join(self.calib_dir, "calib_cam_to_cam.txt"))
        self.color = color
    def get_imu2velo(self):
        RT_imu2velo = np.eye(4)
        RT_imu2velo[:3, :3] = self.imu2velo['R'].reshape(3, 3)
        RT_imu2velo[:3, 3] = self.imu2velo['T']
        return RT_imu2velo.T
    def get_velo2rect(self):
        RT_velo2cam = np.eye(4)
        RT_velo2cam[:3, :3] = self.velo2cam['R'].reshape(3, 3)
        RT_velo2cam[:3, 3] = self.velo2cam['T']
        R_rect00 = np.eye(4)
        R_rect00[:3, :3] = self.cam2cam['R_rect_00'].reshape(3, 3)
        RT_velo2rect = np.dot(R_rect00, RT_velo2cam)
        return RT_velo2rect.T
    def get_rect2disp(self):
        cam0, cam1 = (0, 1) if not self.color else (2, 3)
        P_rect0 = self.cam2cam['P_rect_%02d' % cam0].reshape(3, 4)
        P_rect1 = self.cam2cam['P_rect_%02d' % cam1].reshape(3, 4)
        P0, P1, P2 = P_rect0
        Q0, Q1, Q2 = P_rect1
        # assert np.array_equal(P1, Q1), "\n%s\n%s" % (P1, Q1)
        # assert np.array_equal(P2, Q2), "\n%s\n%s" % (P2, Q2)
        # create disp transform
        T = np.array([P0, P1, P0 - Q0, P2])
        return T.T
    def get_imu2rect(self):
        return np.dot(self.get_imu2velo(), self.get_velo2rect())
    def get_imu2disp(self):
        return np.dot(self.get_imu2rect(), self.get_rect2disp())
    def get_velo2disp(self):
        return np.dot(self.get_velo2rect(), self.get_rect2disp())
    def get_disp2rect(self):
        return np.linalg.inv(self.get_rect2disp())
    def get_disp2imu(self):
        return np.linalg.inv(self.get_imu2disp())
    def get_rect2imu(self):
        # Inverse of the IMU -> rectified-camera transform (used by rect2imu).
        return np.linalg.inv(self.get_imu2rect())
    def rect2disp(self, points):
        return homogeneous_transform(points, self.get_rect2disp())
    def disp2rect(self, xyd):
        return homogeneous_transform(xyd, self.get_disp2rect())
    def velo2rect(self, points):
        return homogeneous_transform(points, self.get_velo2rect())
    def velo2disp(self, points):
        return homogeneous_transform(points, self.get_velo2disp())
    def imu2rect(self, points):
        return homogeneous_transform(points, self.get_imu2rect())
    def rect2imu(self, points):
        return homogeneous_transform(points, self.get_rect2imu())
    def filter_disps(self, xyd, max_disp=255, return_mask=False):
        return filter_disps(
            xyd, image_shape, max_disp=max_disp, return_mask=return_mask)
    def get_proj(self, cam_idx):
        assert cam_idx in (0, 1, 2, 3), 'cam_idx should be 0, 1, 2, or 3 only'
        P = self.cam2cam['P_rect_%02d' % cam_idx].reshape(3, 4)
        return P.T
    def get_velo2depth(self, cam_idx):
        return np.dot(self.get_velo2rect(), self.get_proj(cam_idx))
    def velo2depth(self, points, cam_idx):
        return homogeneous_transform(points, self.get_velo2depth(cam_idx),
                                     keep_last=True)
    def velo2img(self, points, cam_idx):
        return homogeneous_transform(points, self.get_velo2depth(cam_idx))
    def filter_depths(self, xyd, image_shape):
        return filter_depths(xyd, image_shape)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Aug 17 14:45:29 2018
@author: bc
"""
import matplotlib
matplotlib.use('TkAgg')
import tkinter as tk
from tkinter import filedialog
def openFile():
    root = tk.Tk()
    root.update()
    filename = filedialog.askopenfilename(title="Select image file")
    root.destroy()
    return filename
def openBin():
    binfile = tk.Tk()
    binfile.update()
    binfilename = filedialog.askopenfilename(title="Select bin file")
    binfile.destroy()
    return binfilename
"""
Mask R-CNN
Configurations and data loading code for MS COCO.
Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
------------------------------------------------------------
Usage: import the module (see Jupyter notebooks for examples), or run from
the command line as such:
# Train a new model starting from pre-trained COCO weights
python3 coco.py train --dataset=/path/to/coco/ --model=coco
# Train a new model starting from ImageNet weights
python3 coco.py train --dataset=/path/to/coco/ --model=imagenet
# Continue training a model that you had trained earlier
python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
# Continue training the last model you trained
python3 coco.py train --dataset=/path/to/coco/ --model=last
# Run COCO evaluation on the last model you trained
python3 coco.py evaluate --dataset=/path/to/coco/ --model=last
"""
import os
import time
import numpy as np
# Download and install the Python COCO tools from https://github.com/waleedka/coco
# That's a fork from the original https://github.com/pdollar/coco with a bug
# fix for Python 3.
# I submitted a pull request https://github.com/cocodataset/cocoapi/pull/50
# If the PR is merged then use the original repo.
# Note: Edit PythonAPI/Makefile and replace "python" with "python3".
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from pycocotools import mask as maskUtils
from config import Config
import utils
import model as modellib
# Root directory of the project
ROOT_DIR = os.getcwd()
# Path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Directory to save logs and model checkpoints, if not provided
# through the command line argument --logs
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
############################################################
# Configurations
############################################################
class CocoConfig(Config):
    """Configuration for training on MS COCO.
    Derives from the base Config class and overrides values specific
    to the COCO dataset.
    """
    # Give the configuration a recognizable name
    NAME = "coco"
    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 2
    # Uncomment to train on 8 GPUs (default is 1)
    # GPU_COUNT = 8
    # Number of classes (including background)
    NUM_CLASSES = 1 + 80  # COCO has 80 classes
############################################################
# Dataset
############################################################
class CocoDataset(utils.Dataset):