With the rise of machine learning and artificial intelligence, organizations are looking to adopt more GPUs (Graphics Processing Units), which can be orders of magnitude faster than standard CPUs. With recent advances in deep learning models for self-driving cars, in areas such as lane detection and perception, it is important to enable distributed deep learning on large-scale GPU clusters.
GPU-enabled clusters are usually either dedicated to a specific team or shared across teams. As a result, GPUs are either underutilized or, at peak times, overutilized, leading to delays that waste both the data science team's time and cloud resources. Existing tools do not allow dynamic allocation of resources while also guaranteeing performance and isolation.
This workshop will show how DC/OS supports allocating GPUs and machine learning frameworks to different services and teams.
Working hands-on with a pre-provisioned cluster, participants will learn about:
Setting up GPU isolation in DC/OS
Deploying different TensorFlow instances on DC/OS utilizing these GPU resources
Deploying a complete pipeline for Twitter sentiment analysis with TensorFlow on DC/OS
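
To give a rough idea of what requesting GPUs for a service can look like, the sketch below builds a Marathon-style app definition in Python. It is illustrative only and is not taken from the workshop material: the helper name build_gpu_app, the app ids, the resource sizes, the command and the container image are all assumptions. It relies on the "gpus" field of the Marathon app definition and on the Mesos (UCR) containerizer, which is the container type Marathon expects for GPU workloads.

```python
import json


def build_gpu_app(app_id, gpus, image="tensorflow/tensorflow:latest-gpu"):
    """Return a Marathon-style app definition that requests `gpus` GPUs.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    if not isinstance(gpus, int) or gpus < 0:
        raise ValueError("gpus must be a non-negative integer")
    return {
        "id": app_id,
        "cpus": 1,
        "mem": 4096,
        "gpus": gpus,  # number of GPUs reserved for each instance
        "instances": 1,
        # Placeholder command: list the GPUs visible to the task, then idle.
        "cmd": "nvidia-smi && sleep 3600",
        "container": {
            # GPU tasks are assumed to run under the Mesos (UCR) containerizer.
            "type": "MESOS",
            "docker": {"image": image},
        },
    }


if __name__ == "__main__":
    # Two teams, each with its own GPU allocation (ids are made up).
    for team_app in (build_gpu_app("/team-a/tensorflow", 1),
                     build_gpu_app("/team-b/tensorflow", 2)):
        print(json.dumps(team_app, indent=2))
```

The printed JSON could then be submitted through the DC/OS CLI or the Marathon API; the exact submission step depends on the cluster setup and is left out here.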