Google can tell dogs from mops. Can you?
Bret McGowen presented on serverless machine learning at Google. You can watch his full talk here, but here are my notes.
Serverless
Four principles:
- no need to manage/think about servers
- no upfront provisioning; scales with use (you can't get capacity planning wrong)
- pay per use
- stateless/ephemeral
Serverless at Google:
- Background functions: Cloud Storage, Cloud Pub/Sub
- HTTP functions: API, Webhooks, Browser
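As a minimal sketch of the HTTP-function flavor, here is what a Python Cloud Function could look like. On Google Cloud the framework passes in a Flask request object; the `FakeRequest` stub below is a hypothetical stand-in so the sketch runs locally with no server to manage:

```python
def hello_http(request):
    """HTTP-triggered function: no server to manage, pay per invocation."""
    # Only request.args is used here, so a simple stub can mimic Flask's request.
    name = request.args.get("name", "world")
    return f"Hello, {name}!"


class FakeRequest:
    """Hypothetical local stand-in for the Flask request (illustration only)."""
    def __init__(self, args=None):
        self.args = args or {}


print(hello_http(FakeRequest({"name": "serverless"})))  # Hello, serverless!
```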
Machine Learning
Machine learning is using many examples to answer questions.
Machine Learning at Google:
- Use your own data: TensorFlow and Cloud Machine Learning Engine
- Pretrained ML models: Cloud (Vision, Speech, Natural Language, Translation) API, Cloud Video Intelligence
Specifics on capabilities of Cloud Vision API:
- Label detection (dog or mop?)
- Face detection (within the photo, here is the location of the face)
- OCR (read text from photos)
- Explicit content detection (violence/adult)
- Landmark detection (that's the Eiffel tower!)
- Logo detection (identify brand and product logos)
Other Cloud Vision features:
- crop hints - suggested crop dimensions
- web annotations - suggests related metadata from the web - e.g. from a photo of an iconic car, it can identify the model, the film it appeared in, and where it probably is, and return other matching images to back it up.
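The Vision API features above map to feature types in a single `images:annotate` request. Below is a sketch of the JSON request body; the bucket path is a placeholder, and authentication is omitted:

```python
import json

def vision_request(image_uri, features, max_results=5):
    """Build the JSON body for a Cloud Vision images:annotate call."""
    return {
        "requests": [{
            # Point the API at an image already in Cloud Storage.
            "image": {"source": {"imageUri": image_uri}},
            # One entry per capability, e.g. LABEL_DETECTION, WEB_DETECTION.
            "features": [{"type": f, "maxResults": max_results} for f in features],
        }]
    }

body = vision_request("gs://my-bucket/dog-or-mop.jpg",
                      ["LABEL_DETECTION", "CROP_HINTS", "WEB_DETECTION"])
print(json.dumps(body, indent=2))
```

The same body shape works for the other detections (faces, OCR, landmarks, safe search) by swapping the feature type names.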
Cloud event trigger walkthrough
Cloud Storage -> Cloud Functions -> Cloud Vision API
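The pipeline above can be sketched as a background Cloud Function: it fires when a file lands in Cloud Storage and forwards the file's `gs://` URI to the Vision API. `call_vision_api` is a hypothetical stand-in for the real client call (e.g. via the `google-cloud-vision` library), stubbed so the sketch runs locally:

```python
def call_vision_api(gcs_uri):
    """Stand-in for a real Vision API label-detection call (stubbed)."""
    return {"labels": ["dog"], "source": gcs_uri}


def on_upload(data, context=None):
    """Cloud Storage trigger: `data` carries the bucket and object name."""
    gcs_uri = f"gs://{data['bucket']}/{data['name']}"
    result = call_vision_api(gcs_uri)
    print(f"Labels for {gcs_uri}: {result['labels']}")
    return result


on_upload({"bucket": "uploads", "name": "photo.jpg"})
```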
NLP: entity extraction, sentiment analysis, and syntax analysis (parses a sentence down to lemmas so you can see the parts-of-speech dependency graph)
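All three NLP operations (`analyzeEntities`, `analyzeSentiment`, `analyzeSyntax`) take the same document payload; a sketch of that shared request body:

```python
def nl_request(text):
    """Request body shared by the Natural Language API analyze* endpoints."""
    return {
        # Inline plain text; HTML and Cloud Storage sources are also supported.
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }

body = nl_request("Google can tell dogs from mops.")
```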
Speech API
Speech-to-text transcription in 110 languages.
Azar - a video chat app that uses the Cloud Speech API and Cloud Translation API so users who speak different languages can talk to each other.
The API can also return a timestamp for each word, on top of the transcript.
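A sketch of a Speech API `recognize` request body; `enableWordTimeOffsets` is the flag that asks for per-word timestamps alongside the transcript. The audio URI is a placeholder:

```python
def speech_request(gcs_uri, language="en-US"):
    """Request body for a Cloud Speech API recognize call."""
    return {
        "config": {
            "languageCode": language,
            # Ask for a timestamp on each recognized word.
            "enableWordTimeOffsets": True,
        },
        "audio": {"uri": gcs_uri},
    }

req = speech_request("gs://my-bucket/talk.flac")
```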
Video Intelligence API
Analyzes an entire video and labels the entities that appear in it.
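A sketch of the corresponding `videos:annotate` request body; the video URI is a placeholder, and `LABEL_DETECTION` asks the API to label what appears across the whole video:

```python
def video_request(gcs_uri):
    """Request body for a Video Intelligence API videos:annotate call."""
    return {
        "inputUri": gcs_uri,
        # Other features (e.g. shot change detection) can be added here.
        "features": ["LABEL_DETECTION"],
    }

req = video_request("gs://my-bucket/clip.mp4")
```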