Today at DevFestNYC, Josh Gordon explained how TensorFlow is being used to detect cancer and diabetes. What's amazing is that it is not much different from training a neural network to recognize cats and dogs! Here is a brief summary of the talk:
- Data - Get a lot of data. You need millions of images of whatever you are training on.
- MORE Data - No, really, you need more. Sometimes your data can be bad, for example when the doctors on your panel disagree with each other, or worse still, disagree with themselves. (Humans, who needs them?)
- Setup - TensorFlow now comes with Keras built in (awesome), and Keras ships with built-in Applications (pretrained models), of which InceptionV3 is pretty good (awesomer). Josh also gave a shout-out to NASNet, a network whose architecture was itself designed by machine learning (AutoML) rather than by hand (galactic brain exploding).
- Fuzz - This is the art form, and an area of active research. Josh explained some key ideas, from using a sliding window to applying rotation, contrast, and other filters, to milk all you can out of the image data so that the model still recognizes what it was trained to recognize even when, for example, the image is flipped upside down. The most interesting part was how the team studied the way real doctors look at slides under a microscope, zooming in and out to get different context, and achieved amazing results by replicating that behavior: they simply copied and pasted their code 4 times and ran their model at 4 different zoom levels on the same dataset!!
- Train - This takes on the order of 2 days (on Google Cloud) to a week (on a local machine with 10 GPUs). If you are a researcher, Google offers you 1,000 TPUs FOR FREE if you apply here.
- Deploy - This is actually the hardest part: making your models useful to regular, nontechnical people.
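The Setup step (Keras plus a pretrained InceptionV3) usually means transfer learning: reuse the ImageNet-trained convolutional layers and train a small new classifier on top of them. Here is a minimal sketch, assuming images resized to InceptionV3's default 299x299 input and integer class labels. `build_classifier` and its parameters are my own hypothetical names; the talk did not show this code.

```python
from typing import Optional

import tensorflow as tf


def build_classifier(num_classes: int, weights: Optional[str] = "imagenet") -> tf.keras.Model:
    """Transfer-learning sketch: frozen InceptionV3 backbone plus a new softmax head."""
    inputs = tf.keras.Input(shape=(299, 299, 3))  # InceptionV3's default input size
    # Scale raw [0, 255] pixels to [-1, 1], the range InceptionV3 was trained on.
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    base = tf.keras.applications.InceptionV3(
        include_top=False,  # drop the original 1000-class ImageNet classifier
        weights=weights,  # "imagenet" downloads pretrained weights; None = random init
        input_shape=(299, 299, 3),
        pooling="avg",  # global average pooling -> one feature vector per image
    )
    base.trainable = False  # only the new head is trained at first
    x = base(x, training=False)  # keep BatchNorm layers in inference mode
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

Once the new head has converged, a common next step is to unfreeze some of the backbone layers and fine-tune them with a lower learning rate.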
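The Fuzz step is what is usually called data augmentation: each training image is randomly transformed so the model learns to ignore orientation and lighting. Below is a minimal NumPy sketch, assuming images are HxWxC float arrays scaled to [0, 1]. The flip probabilities and the 0.8 to 1.2 contrast range are my own illustrative choices, not numbers from the talk. In TensorFlow itself you would more likely reach for `tf.image` ops or Keras preprocessing layers such as `RandomFlip`, `RandomRotation` and `RandomContrast`.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly flipped, rotated and contrast-jittered copy of an HxWxC image in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)  # mirror left-right
    if rng.random() < 0.5:
        out = np.flipud(out)  # mirror top-bottom
    # Rotate by 0, 90, 180 or 270 degrees (swaps H and W for non-square images).
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    # Contrast jitter: stretch or squash pixel values around each channel's mean.
    factor = rng.uniform(0.8, 1.2)  # assumed range, tune per dataset
    mean = out.mean(axis=(0, 1), keepdims=True)
    return np.clip((out - mean) * factor + mean, 0.0, 1.0)
```

Calling `augment` with a fresh random draw every epoch means the network rarely sees exactly the same pixels twice, which is the "milk all you can out of the data" effect described above.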
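The sliding window and the "4 zoom levels" trick from the Fuzz step can be sketched in NumPy too. This assumes a slide image stored as an HxWxC array; zooming out is approximated by average-pooling the image by factors of 1, 2, 4 and 8, which is my assumption (the talk only said 4 zoom levels). Every patch has the same pixel size, so the zoomed-out ones cover more surrounding tissue. All function names here are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

import numpy as np


def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an HxWxC image by an integer factor (edge pixels that don't fit are dropped)."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    trimmed = image[: h2 * factor, : w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def sliding_windows(image: np.ndarray, size: int, stride: int) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (row, col, patch) for every size x size window, moving by `stride` pixels."""
    h, w = image.shape[:2]
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, image[r : r + size, c : c + size]


def multi_zoom_patches(
    image: np.ndarray,
    center: Tuple[int, int],
    size: int,
    factors: Sequence[int] = (1, 2, 4, 8),  # assumed zoom factors
) -> List[np.ndarray]:
    """Same-size patches centred on one location, one per zoom level."""
    half = size // 2
    patches = []
    for f in factors:
        level = downsample(image, f)
        # Map the centre into this level's coordinates, clamped to the image.
        r = min(center[0] // f, level.shape[0] - 1)
        c = min(center[1] // f, level.shape[1] - 1)
        # Zero-pad so windows near the border still come out full size.
        padded = np.pad(level, ((half, half), (half, half), (0, 0)), mode="constant")
        patches.append(padded[r : r + size, c : c + size])
    return patches
```

Each of the patches could then be fed to its own copy of the network, which mirrors the copy-and-paste-four-times approach described in the talk.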
The goal is for Machine Learning to be so routine that it is boring. Hopefully I've made that boringness interesting!