This post compiles notes from Max Firtman's course on Frontend Masters.

What is a capability? It is the ability of a web browser to perform an action, typically through a built-in API. Compatibility can be an issue, so some capabilities may not be available in every browser.
Maturity of the capabilities: green (mature), light green (may not be available in every browser, typically Chromium-based browsers vs. Firefox/Safari), yellow (not mature and only available in some browsers), red (not usable yet; may be added in the long term).
Resources to check the capabilities: MDN, caniuse.com, web.dev, webkit.org/blog (specifically for Safari), chromestatus.com, and web.dev/baseline (a multi-vendor effort that defines a list of stable capabilities for the web, with one list per year starting in 2022).
Core capabilities NOT covered in this course: fetch, web workers (for threading), WebAssembly, WebSocket, WebRTC, WebPerformance APIs, Network information (see course here), Device memory, WebOTP (one-time password), Web Crypto (cryptography), storage (another course), Web Components, CSS, 2D Canvas, WebGL, Pointer Lock (for gaming), screen capture, PWA (course), page visibility, background sync, background fetch, web push, and notifications, media session (all these in the course), Web Authentication, passkeys, credential management, (all these in the course).
Permissions: some capabilities are harmless, while others carry a privacy risk or a cost. When there is a cost or risk, browsers may limit the capability through user engagement requirements or a permission dialog. Most require HTTPS. Some need a user interaction before they can be enabled (that is, you can't trigger them on page load). Permissions are granted per origin (domain). If the user denies a permission, the API won't be able to ask again; the user has to re-enable it manually. A permission may have no time limit, or it may be limited by time, session, or usage. Permissions are enabled for the main navigation (the HTML document that was loaded) and are not available to iframes by default. To turn a capability off, use the Permissions Policy spec: an HTTP header (Permissions-Policy) for the document, or the `allow` HTML attribute for an iframe.
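
As a rough illustration of the Permissions-Policy header, the sketch below builds a header value from a plain object. The helper name `buildPermissionsPolicy` and the example origin are hypothetical and not from the course; the output follows the header's `feature=(allowlist)` syntax, where an empty list disables the feature everywhere.

```typescript
// Hypothetical helper: turns { feature: allowlist } into a Permissions-Policy value.
// Allowlist entries: "self", "*", or an origin such as "https://maps.example.com".
function buildPermissionsPolicy(policy: Record<string, string[]>): string {
  return Object.entries(policy)
    .map(([feature, allowlist]) => {
      if (allowlist.includes("*")) return `${feature}=*`;
      // Keywords stay bare; origins are double-quoted. An empty list yields "()".
      const items = allowlist.map((entry) => (entry === "self" ? "self" : `"${entry}"`));
      return `${feature}=(${items.join(" ")})`;
    })
    .join(", ");
}

// Allow geolocation for this origin and one partner; turn the camera off entirely.
const headerValue = buildPermissionsPolicy({
  geolocation: ["self", "https://maps.example.com"],
  camera: [],
});
// headerValue: geolocation=(self "https://maps.example.com"), camera=()
```

A server would send this value in the `Permissions-Policy` response header. For an iframe, the counterpart is the `allow` attribute, for example `<iframe allow="geolocation">`.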

Capabilities

Permissions API

1. Permissions API.

MDN Docs here
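
A minimal sketch of querying a permission state with `navigator.permissions.query()`. The function name `getPermissionState`, the local `PermissionsLike` interface, and the `"unsupported"` fallback label are assumptions made for this example; the interface exists only so the snippet type-checks outside a browser.

```typescript
// Minimal shape of navigator.permissions, declared locally for this sketch.
interface PermissionsLike {
  query(descriptor: { name: string }): Promise<{ state: string }>;
}

// Resolves to "granted", "denied", "prompt", or "unsupported" (our own fallback label).
async function getPermissionState(
  permissionName: string,
  permissions: PermissionsLike | undefined = (globalThis as any).navigator?.permissions,
): Promise<string> {
  if (!permissions) return "unsupported";
  try {
    const result = await permissions.query({ name: permissionName });
    return result.state;
  } catch {
    // Browsers reject permission names they do not recognise.
    return "unsupported";
  }
}
```

In a page you might call `getPermissionState("geolocation")` before showing a "use my location" button, so the UI can explain what happens when the state is `"prompt"` or `"denied"`.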

Sensors, Geolocation, and Input Devices

Sensors on mobile devices: accelerometer (3 axes: x, y, z), gyroscope (turning the phone: alpha, beta, gamma), magnetometer (compass), proximity (to the user), and light sensor.

2. There are two ways to consume them on the web. The old APIs are global DOM APIs used through event listeners, i.e. `devicemotion` and `deviceorientation` (MDN docs here and here). This is how you need to do it on iPhones and iPads. They only support the magnetometer, gyroscope, and accelerometer, and they need user permission. The newer Sensor API is not yet in Safari or Firefox (see MDN).
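
The old event-based approach could be sketched as follows. `startMotionUpdates` and its `env` parameter are hypothetical names for this example (`env` defaults to the global object and is a parameter only so the sketch can run outside a browser). The static `DeviceMotionEvent.requestPermission()` call is the iOS Safari permission step; other browsers do not expose it.

```typescript
type MotionHandler = (x: number | null, y: number | null, z: number | null) => void;

// On iOS this must be called from a user gesture, such as a click handler.
async function startMotionUpdates(onMotion: MotionHandler, env: any = globalThis): Promise<boolean> {
  const MotionEventCtor = env.DeviceMotionEvent;
  if (!MotionEventCtor) return false; // no motion events in this environment

  // iOS 13+ Safari asks for permission explicitly; elsewhere this method is absent.
  if (typeof MotionEventCtor.requestPermission === "function") {
    const answer = await MotionEventCtor.requestPermission();
    if (answer !== "granted") return false;
  }

  env.addEventListener("devicemotion", (motion: any) => {
    const a = motion.accelerationIncludingGravity;
    onMotion(a?.x ?? null, a?.y ?? null, a?.z ?? null);
  });
  return true;
}
```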

3. Geolocation API (one of the first capabilities on the web).

Google, Apple, etc. know your location because of Wi-Fi: your devices know which Wi-Fi networks are around, and these companies can locate you because they have a database of SSIDs and their locations. The API gives you the location and is provider-agnostic (Wi-Fi, GPS, etc.). It works only in the foreground; it won't work in the background or in workers. Since the API is old, it's callback-based (no promises). MDN docs here.
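
Because the API is callback-based, a common pattern is to wrap it in a promise. This is a minimal sketch: `getPosition` and the local `PositionLike` and `GeolocationLike` interfaces are names invented for this example, trimmed to the fields used here.

```typescript
interface PositionLike {
  coords: { latitude: number; longitude: number; accuracy: number };
}
interface GeolocationLike {
  getCurrentPosition(
    onSuccess: (position: PositionLike) => void,
    onError?: (error: { code: number; message: string }) => void,
    options?: { enableHighAccuracy?: boolean; timeout?: number; maximumAge?: number },
  ): void;
}

// Promise wrapper around the callback-style getCurrentPosition().
function getPosition(
  options: { enableHighAccuracy?: boolean; timeout?: number; maximumAge?: number } = {},
  geolocation: GeolocationLike | undefined = (globalThis as any).navigator?.geolocation,
): Promise<PositionLike> {
  return new Promise((resolve, reject) => {
    if (!geolocation) {
      reject(new Error("Geolocation is not available"));
      return;
    }
    geolocation.getCurrentPosition(resolve, reject, options);
  });
}
```

Usage would look like `const position = await getPosition({ timeout: 5000 })`, after which `position.coords.latitude` and `position.coords.longitude` hold the result.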

4. Screen orientation API.

Green availability: detecting whether the device is in portrait or landscape mode. Other features, such as locking the screen orientation, are not yet widely available.
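
The widely available part (reading the orientation and reacting to changes) might look like this. `watchOrientation`, `isLandscape`, and the `OrientationLike` interface are hypothetical names; the sketch relies on `screen.orientation.type` and its `change` event.

```typescript
interface OrientationLike {
  type: string; // e.g. "portrait-primary" or "landscape-secondary"
  addEventListener(eventName: "change", listener: () => void): void;
  removeEventListener(eventName: "change", listener: () => void): void;
}

function isLandscape(orientationType: string): boolean {
  return orientationType.startsWith("landscape");
}

// Calls onChange immediately and on every rotation; returns a function that stops watching.
function watchOrientation(
  onChange: (landscape: boolean) => void,
  orientation: OrientationLike | undefined = (globalThis as any).screen?.orientation,
): () => void {
  if (!orientation) return () => {};
  const target = orientation;
  const listener = () => onChange(isLandscape(target.type));
  listener();
  target.addEventListener("change", listener);
  return () => target.removeEventListener("change", listener);
}
```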

5. Touch events API.

They work with touch screens: `touchstart`, `touchend`, `touchcancel`, etc. They started out as an Apple proprietary API. MDN docs here. That is why the following API was created:

6. Pointer events API.

MDN docs here. Based on mouse events, and multi-pointer (they work with mouse, trackpad, touch, pen, stylus, etc.). You receive one event per pointer (e.g. with 3 fingers you get 3 events, vs. 1 event in the Touch events API above).
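
Since every pointer fires its own events, a typical pattern is to keep a map keyed by `pointerId`. The sketch below is one way to do that; `PointerTracker`, `attachTracker`, and the trimmed `PointerLike` interface are names made up for this example.

```typescript
interface PointerLike {
  pointerId: number;
  pointerType: string; // "mouse", "pen", or "touch"
  clientX: number;
  clientY: number;
}

// Keeps the latest event for every pointer that is currently down.
class PointerTracker {
  private readonly active = new Map<number, PointerLike>();

  handle(eventType: string, pointer: PointerLike): void {
    if (eventType === "pointerdown") {
      this.active.set(pointer.pointerId, pointer);
    } else if (eventType === "pointermove" && this.active.has(pointer.pointerId)) {
      this.active.set(pointer.pointerId, pointer);
    } else if (eventType === "pointerup" || eventType === "pointercancel") {
      this.active.delete(pointer.pointerId);
    }
  }

  get count(): number {
    return this.active.size;
  }
}

// Wiring in a browser; `target` would be an element such as a canvas.
function attachTracker(target: any, tracker: PointerTracker): void {
  for (const eventType of ["pointerdown", "pointermove", "pointerup", "pointercancel"]) {
    target.addEventListener(eventType, (e: PointerLike) => tracker.handle(eventType, e));
  }
}
```

With three fingers on the screen, `tracker.count` would be 3, one entry per `pointerId`.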

7. VirtualKeyboard API.

MDN docs here. This API lets us show/hide the virtual keyboard and know how much space it is taking on the screen.

8. Gamepad API.

MDN docs here. Green API. It is a high-level API (you don't need to know the specifics of the gamepad; it covers the minimum that all gamepads have, so an uncommon feature such as a built-in speaker is not supported via the API) with low latency. You create a requestAnimationFrame loop (typically 60 fps) and check the status of each button, which is exposed as a boolean.
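
The polling loop described above can be sketched like this. `pressedButtons` and `startGamepadLoop` are hypothetical helpers; the browser calls involved are `navigator.getGamepads()`, the `pressed` boolean on each button, and `requestAnimationFrame`.

```typescript
interface GamepadLike {
  index: number;
  buttons: ReadonlyArray<{ pressed: boolean }>;
}

// For each connected gamepad, lists the indices of the buttons currently held down.
function pressedButtons(gamepads: ArrayLike<GamepadLike | null>): Map<number, number[]> {
  const result = new Map<number, number[]>();
  for (const pad of Array.from(gamepads)) {
    if (!pad) continue; // empty slots are null
    const held: number[] = [];
    pad.buttons.forEach((button, i) => {
      if (button.pressed) held.push(i);
    });
    result.set(pad.index, held);
  }
  return result;
}

// Browser-only loop: poll the gamepads once per animation frame.
function startGamepadLoop(
  onFrame: (state: Map<number, number[]>) => void,
  env: any = globalThis,
): void {
  const tick = () => {
    onFrame(pressedButtons(env.navigator.getGamepads()));
    env.requestAnimationFrame(tick);
  };
  env.requestAnimationFrame(tick);
}
```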

9. Web HID API (HID = Human Interface Device).

MDN docs here. Limited availability (light-green).

Speech, voice, and camera

10. Web Speech API.

Speech recognition API

MDN docs here.

Speech synthesis API.

We can make the web app speak.
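
A minimal sketch of speech synthesis, assuming the standard `speechSynthesis` object and `SpeechSynthesisUtterance` constructor. The function name `sayAloud` and the `env` parameter (which defaults to the global object) are made up for this example.

```typescript
// Returns false when speech synthesis is not available in this environment.
function sayAloud(text: string, lang: string = "en-US", env: any = globalThis): boolean {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) return false;
  const utterance = new env.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // BCP 47 language tag, e.g. "en-US" or "es-AR"
  env.speechSynthesis.speak(utterance); // queued and spoken asynchronously
  return true;
}
```

In a page, `sayAloud("Your order is ready")` would queue the sentence using a default voice for the given language.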

11. Shape detection API. Between yellow and red.

Barcode detector API. MDN docs here. Experimental, it is only available for Chrome on Android as of this writing. Chromestatus here
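
Because support is so limited, feature detection matters here. The sketch below assumes the `BarcodeDetector` constructor and its `detect()` method as documented on MDN; `readBarcodes`, the `env` parameter, and the chosen formats are assumptions for illustration.

```typescript
// Returns the decoded values, or an empty array when BarcodeDetector is missing.
// `image` would be an <img>, <video>, canvas, ImageBitmap, or similar image source.
async function readBarcodes(image: unknown, env: any = globalThis): Promise<string[]> {
  if (!("BarcodeDetector" in env)) return [];
  const detector = new env.BarcodeDetector({ formats: ["qr_code", "ean_13"] });
  const results = await detector.detect(image);
  return results.map((result: { rawValue: string }) => result.rawValue);
}
```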

Face detector API. Experimental, it is only available for Chrome on Android as of this writing.

Text detector API (OCR). Experimental, it is only available for Chrome on Android as of this writing.

12. Media devices. It lets you open the camera and microphone, and get the stream.
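
Opening the camera and microphone could look like the following sketch, built on `navigator.mediaDevices.getUserMedia()`. The `openCamera` name and the trimmed local interfaces are assumptions for this example.

```typescript
interface StreamLike {
  getTracks(): Array<{ stop(): void }>;
}
interface MediaDevicesLike {
  getUserMedia(constraints: { video?: boolean; audio?: boolean }): Promise<StreamLike>;
}

// Opens camera and microphone; returns the stream plus a function that releases the devices.
async function openCamera(
  mediaDevices: MediaDevicesLike | undefined = (globalThis as any).navigator?.mediaDevices,
): Promise<{ stream: StreamLike; stop: () => void }> {
  if (!mediaDevices) {
    throw new Error("mediaDevices is not available (for example, on a non-HTTPS page)");
  }
  // Triggers the permission dialog the first time it is called.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
  const stop = () => stream.getTracks().forEach((track) => track.stop());
  return { stream, stop };
}
```

To preview the camera in a page, the stream is usually assigned to a video element with `video.srcObject = stream`.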

13. Advanced camera control (aka Camera PTZ: Pan, Tilt, Zoom). Lets you manipulate the camera.

13. Augmented reality (AR, XR).

14. Screen Wake Lock API. MDN docs here. Light green (not supported on Safari on iOS).
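
A small sketch of requesting a screen wake lock with `navigator.wakeLock.request("screen")`. The `keepScreenAwake` name and the `WakeLockLike` interface are hypothetical and exist only for this example.

```typescript
interface WakeLockLike {
  request(type: "screen"): Promise<{ release(): Promise<void> }>;
}

// Returns a release function, or null when the lock is unsupported or refused.
async function keepScreenAwake(
  wakeLock: WakeLockLike | undefined = (globalThis as any).navigator?.wakeLock,
): Promise<(() => Promise<void>) | null> {
  if (!wakeLock) return null;
  try {
    const sentinel = await wakeLock.request("screen");
    return () => sentinel.release();
  } catch {
    // The browser may refuse, e.g. in battery-saver mode or when the page is hidden.
    return null;
  }
}
```

The browser releases the lock on its own when the page becomes hidden, so apps typically request it again on `visibilitychange`.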

External hardware and Devices

15. Web Bluetooth API. Limited availability. MDN docs here. Firefox and Safari don't implement it and, for their own reasons, it looks like they never will, so it is only available on Chromium-based browsers for now. It only works with BLE (Bluetooth Low Energy) devices.
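
As an illustration, the sketch below reads the battery level from a BLE device that exposes the standard `battery_service`. The function name `readBatteryLevel` is made up, and the choice of the battery service is an assumption for the example; the calls (`requestDevice`, `gatt.connect`, `getPrimaryService`, `getCharacteristic`, `readValue`) are the Web Bluetooth ones documented on MDN.

```typescript
// Resolves to the battery percentage, or null when Web Bluetooth is unavailable.
async function readBatteryLevel(
  bluetooth: any = (globalThis as any).navigator?.bluetooth,
): Promise<number | null> {
  if (!bluetooth) return null;
  // Shows the browser's device chooser; must be triggered by a user gesture.
  const device = await bluetooth.requestDevice({
    filters: [{ services: ["battery_service"] }],
  });
  const server = await device.gatt.connect();
  const service = await server.getPrimaryService("battery_service");
  const characteristic = await service.getCharacteristic("battery_level");
  const value: DataView = await characteristic.readValue();
  return value.getUint8(0); // a single byte, 0 to 100
}
```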

16. Web Audio API. MDN docs here. Green. A low-level API that lets you generate dynamic audio, 3D audio, and ultrasound communication with devices. See the Sonic socket library for ultrasound communication between devices.
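
To give a feel for the low-level style, here is a minimal sketch that plays a sine tone with an oscillator node. `playTone` and the `env` parameter are hypothetical; the `webkitAudioContext` fallback is an assumption that only matters for older Safari versions.

```typescript
// Plays a sine tone; returns false when the Web Audio API is not available.
// Browsers usually require a user gesture before audio is allowed to start.
function playTone(frequencyHz: number, seconds: number, env: any = globalThis): boolean {
  const AudioContextCtor = env.AudioContext ?? env.webkitAudioContext;
  if (!AudioContextCtor) return false;
  const context = new AudioContextCtor();
  const oscillator = context.createOscillator();
  oscillator.type = "sine";
  oscillator.frequency.value = frequencyHz;
  oscillator.connect(context.destination); // route the oscillator to the speakers
  oscillator.start();
  oscillator.stop(context.currentTime + seconds);
  return true;
}
```

For example, `playTone(440, 1)` would play an A4 note for one second.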

A tour of web capabilities