Object motion trajectory tracking using computer vision is an enabling technology for various smart manufacturing systems. However, existing tracking algorithms still estimate sub-pixel displacement unsatisfactorily. In this paper, we extend a popular computer vision task, optical flow, to solve the small displacement detection problem. Because conventional optical flow methods lack robustness and perform poorly in boundary regions, convolutional neural network (CNN) based approaches have been developed to solve optical flow problems. We construct a new multi-scale CNN, OF-NET, for sub-pixel optical flow estimation. In the model, we adopt an inverse-pyramid structure that enlarges small displacements onto larger-scale feature maps for motion detection. A novel data structure with multi-level ground truth is adopted to synthesize the training dataset. The results demonstrate competitive performance and efficiency compared with the existing state-of-the-art method, FlowNetC, and with the conventional optical flow method. In validation, our model reaches an end point error (EPE) at the level of 0.01 pixels. Our model also identifies the boundaries of moving objects better than the reference methods. The efficiency of the model is optimized by using anisotropic upscaling and independent learning in the two directions. Real-time tracking at 25–35 fps is achievable with the proposed model. The model is further verified experimentally, with good performance.
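For readers unfamiliar with the metric, the end point error (EPE) quoted above is conventionally defined as the mean Euclidean distance between estimated and ground-truth flow vectors. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `end_point_error` and the sample flow values are hypothetical and chosen only for illustration.

```python
import math


def end_point_error(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    pred, truth: equal-length sequences of (u, v) displacements in pixels.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("flow fields must be non-empty and of equal length")
    total = 0.0
    for (u, v), (u_gt, v_gt) in zip(pred, truth):
        # Per-pixel error is the length of the difference vector.
        total += math.hypot(u - u_gt, v - v_gt)
    return total / len(pred)


# Hypothetical sub-pixel flow values, for illustration only (not from the paper).
truth = [(0.10, 0.00), (0.25, -0.05), (0.00, 0.30)]
pred = [(0.11, 0.00), (0.25, -0.04), (0.00, 0.29)]
epe = end_point_error(pred, truth)
```

In this toy case each predicted vector is off by 0.01 pixels along one axis, so the mean error is about 0.01 pixels, the same order of magnitude as the validation figure reported here.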