This paper presents a platform for self-learning cochlear insertion using computer vision in a three-dimensional surrogate model. Self-learning and practice often improve the confidence of novice medical trainees in eventual real-world trials. The platform lets trainees practice electrode insertion to minimize the effects of suboptimal electrode placement, such as incomplete electrode insertion, electrode kinking, and electrode tip fold-over. Although existing mastoid fitting templates improve insertion trajectories, extensive training is still required. Current methods that use cadavers, virtual training, or physical models built from reconstructed images remain inadequate for training purposes. The model presented here simulates the dimensions and texture of the cochlea and the feel of inserting the electrode into it. The model does not currently include the temporal bone, so it is not meant for practicing drilling or other procedures to access the cochlea. The insertion process is observed in real time using a camera and a graphical user interface (GUI) that not only shows the video feed but also provides depth, trajectory, and speed measurements. In a trial with medical trainees, all four metrics improved after training on the hardware/software: a 14.20% improvement in insertion depth, a 44.24% reduction in insertion speed, a 52.90% reduction in back-outs, and a 64.89% reduction in kinks/fold-overs. The advantage of this model is that medical trainees can use it as many times as they like, because the whole setup is simple, economical, and reusable.