I want to add client-side image classification to a web app — the user picks a photo and the app labels it without any server round-trip. I’ve heard TensorFlow.js can do this with pre-trained models like MobileNet, but I’m not sure how to set it up.
Specifically:
How do I load a pre-trained model in the browser?
What format does the image need to be in for inference?
What kind of latency should I expect on a typical laptop?
Are there gotchas around memory or model size?
I’d prefer a vanilla JS approach (no React required) so I can integrate it into any stack.
This is seed content posted by the DevForums team to help get our community started. Have a better answer or want to add context? Jump in!
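mobilenet.load() downloads the model weights (~16 MB at alpha 1.0; with no arguments the package loads MobileNet v1). On later visits the browser's HTTP cache usually serves them again. TensorFlow.js does not write them to IndexedDB unless you save a model there explicitly.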
model.classify(imgElement) accepts an <img>, <canvas>, or <video> element directly (ImageData and 3D tensors also work), so no manual tensor conversion is needed. It resolves to the top 3 predictions by default; pass a second argument to change that count.
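TensorFlow.js picks the highest-priority backend that is registered and initializes successfully: WebGL on most devices, with CPU as the fallback. WebGPU (Chrome 113+) is only available if you also load the separate @tensorflow/tfjs-backend-webgpu package; call tf.setBackend('webgpu') to select it explicitly.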
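A minimal vanilla-JS sketch of the steps above. It assumes the `@tensorflow/tfjs` and `@tensorflow-models/mobilenet` script bundles are already on the page so that the globals `tf` and `mobilenet` exist. The element ids (`photo-input`, `preview`, `result`) and the helper names are hypothetical placeholders, not part of either library.

```javascript
// Assumes two <script> tags have already loaded the libraries and exposed
// the globals `tf` and `mobilenet`:
//   https://cdn.jsdelivr.net/npm/@tensorflow/tfjs
//   https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet
// Hypothetical markup:
//   <input type="file" id="photo-input" accept="image/*">
//   <img id="preview"> <pre id="result"></pre>

// Turn predictions ([{ className, probability }, ...]) into display text.
function formatPredictions(predictions) {
  return predictions
    .map((p) => `${p.className}: ${(p.probability * 100).toFixed(1)}%`)
    .join('\n');
}

// Load the model once and reuse the same promise for later classifications.
let modelPromise = null;
function getModel(mobilenetLib) {
  if (!modelPromise) {
    modelPromise = mobilenetLib.load();
  }
  return modelPromise;
}

// Classify an <img>, <canvas>, or <video> element (top 3 labels by default).
async function classifyElement(mobilenetLib, element) {
  const model = await getModel(mobilenetLib);
  return model.classify(element);
}

// Browser wiring; skipped where there is no DOM.
if (typeof document !== 'undefined') {
  const input = document.getElementById('photo-input');
  const preview = document.getElementById('preview');
  const result = document.getElementById('result');

  input.addEventListener('change', () => {
    const file = input.files[0];
    if (!file) return;
    const url = URL.createObjectURL(file);
    preview.onload = async () => {
      URL.revokeObjectURL(url); // image is decoded; release the blob URL
      try {
        const predictions = await classifyElement(mobilenet, preview);
        result.textContent = formatPredictions(predictions);
      } catch (err) {
        result.textContent = `Classification failed: ${err.message}`;
      }
    };
    preview.src = url;
  });
}
```

Because the photo comes from a file input and is shown through a blob URL, it never leaves the device and no CORS headers are involved.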
4. Performance expectations
| Device | Backend | Typical latency |
| --- | --- | --- |
| Modern laptop (discrete GPU) | WebGL | 30-80 ms |
| Mid-range laptop (integrated) | WebGL | 80-200 ms |
| Mobile phone (recent) | WebGL | 100-300 ms |
| WebGPU-capable browser | WebGPU | 15-50 ms |
First inference is always slower because the GPU shaders need to compile. Subsequent runs are much faster.
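One way to keep that first-run cost out of the user's first click is to run a throwaway prediction right after the model loads. The sketch below assumes `model` is the object returned by `mobilenet.load()`. `warmUp` and `timeClassify` are hypothetical helper names, `tfLib` is the `tf` global passed in as a parameter, and the 224×224 zero tensor is only a stand-in input.

```javascript
// Run one throwaway classification so shader compilation happens before
// the user picks a photo.
async function warmUp(tfLib, model) {
  const blank = tfLib.zeros([224, 224, 3]); // stand-in 224x224 RGB image
  try {
    await model.classify(blank); // result is discarded
  } finally {
    blank.dispose(); // we created this tensor, so we release it
  }
}

// Time a single classification in milliseconds. `now` is injectable so the
// helper can be exercised without a real clock.
async function timeClassify(model, element, now = () => performance.now()) {
  const start = now();
  const predictions = await model.classify(element);
  return { predictions, ms: now() - start };
}

// Browser usage (assumes the globals `tf` and `mobilenet`):
//   const model = await mobilenet.load();
//   console.log('backend:', tf.getBackend());
//   await warmUp(tf, model);
//   const { predictions, ms } = await timeClassify(model, imgElement);
```

Timing a few runs on your own target devices after the warm-up gives a more reliable picture than any published range.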
5. Gotchas to watch out for
Memory leaks: If you’re doing repeated inference (e.g., video frames), make sure to call tf.dispose() on any intermediate tensors or use tf.tidy() to auto-clean.
Model size: The initial download is ~16 MB. Use alpha: 0.5 for a smaller, faster model (~3.4 MB) with slightly lower accuracy.
CORS: The pixels of an image loaded from another origin can only be read if the image server sends CORS headers and the <img> element sets crossOrigin="anonymous" before its src. A file input avoids the problem entirely.
Mobile memory: On low-end mobile devices, loading multiple models simultaneously can cause OOM crashes. Load one at a time and dispose when switching.
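The memory and CORS points above can be sketched as follows. As far as I know, `model.classify()` releases the tensors it creates internally, so explicit cleanup matters mainly for tensors handed back to you, such as the result of `model.infer()`. The helper names `embeddingFor`, `countLeakedTensors`, and `loadCorsImage` are hypothetical, and `tfLib` is the `tf` global passed in as a parameter.

```javascript
// model.infer() returns a tensor that the caller owns. Copy the values out
// and dispose it, otherwise every video frame leaks one tensor.
async function embeddingFor(model, element) {
  const embedding = model.infer(element, true); // true = return embedding
  try {
    return Array.from(await embedding.data());
  } finally {
    embedding.dispose();
  }
}

// Compare tf.memory().numTensors before and after a call to spot leaks.
async function countLeakedTensors(tfLib, fn) {
  const before = tfLib.memory().numTensors;
  await fn();
  return tfLib.memory().numTensors - before;
}

// Request a cross-origin image with CORS so its pixels can be read.
// The server must still answer with Access-Control-Allow-Origin.
function loadCorsImage(url, createImage = () => new Image()) {
  return new Promise((resolve, reject) => {
    const img = createImage();
    img.crossOrigin = 'anonymous'; // must be set before src
    img.onload = () => resolve(img);
    img.onerror = () => reject(new Error(`Could not load ${url}`));
    img.src = url;
  });
}

// Browser usage (assumes the globals `tf` and `mobilenet`):
//   const model = await mobilenet.load({ version: 2, alpha: 0.5 }); // smaller
//   const leaked = await countLeakedTensors(tf, () => embeddingFor(model, img));
//   console.log('tensors leaked per call:', leaked);
```

Running the leak check once during development is usually enough; a steadily growing `numTensors` across frames is the sign that something is not being disposed.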
This pattern scales nicely — once you’re comfortable with MobileNet, you can swap in other TF.js models (COCO-SSD for object detection, PoseNet for body tracking, etc.) using the same general approach.