Magika is an open-source file-type detection tool from Google that replaces classic signature-based utilities with a compact deep-learning model. Its custom model weighs only a few megabytes, was trained on roughly 100 million samples spanning more than 200 content types, and identifies both binary and textual formats in milliseconds on a single CPU. The project reports around 99% accuracy on its test set and already powers file-type routing at Google scale: Gmail, Drive, and Safe Browsing rely on it to route hundreds of billions of samples per week to the right security and content-policy scanners.
The tool is deliberately polyglot. A Rust-powered CLI lets you run `magika file.bin` on a server or developer machine; the Python package exposes a `Magika` class with streaming APIs and scoring thresholds; a JavaScript/TypeScript package targets Node and browsers for client-side detection; and the underlying model, trained in Keras and exported to ONNX, can be embedded in any language that has an ONNX runtime. Magika has been integrated with VirusTotal and abuse.ch. It is commonly used as a pre-filter in malware-analysis pipelines, data-lake ingestion, DLP tools, and forensic triage, where GNU `file` and libmagic fall short on obfuscated or renamed inputs.
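As a rough illustration of the Python entry point, the sketch below wraps the `Magika` class in a small helper. It assumes the `magika` package is installed (`pip install magika`) and that results expose `output.label` and `score`, as in recent releases; older releases are believed to use `output.ct_label` and `output.score` instead, so check the version you have. The helper names `detect` and `format_detection` are hypothetical and not part of the library.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def _detector():
    """Build the detector once; reloading the model on every call is wasteful."""
    # Imported lazily so this module still loads where magika is not installed.
    from magika import Magika

    return Magika()


def detect(path: str) -> tuple[str, float]:
    """Return (label, score) for one file on disk."""
    result = _detector().identify_path(Path(path))
    # Assumed attribute names (recent releases). Older releases are believed
    # to expose result.output.ct_label and result.output.score instead.
    return str(result.output.label), float(result.score)


def format_detection(path: str, label: str, score: float) -> str:
    """Render one detection as a single log-friendly line."""
    return f"{path}: {label} ({score:.2f})"


# Example use, given a real file on disk:
#     label, score = detect("file.bin")
#     print(format_detection("file.bin", label, score))
```

Caching the detector keeps the model load out of the per-file path, which matters when the helper sits inside an ingestion loop.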
For AI-infrastructure teams, Magika slots in wherever you need fast, language-agnostic content detection without external calls. It is Apache-2.0 licensed, so it can ship inside commercial products; it runs offline, which suits regulated environments; and it returns a rich label plus a confidence score that you can threshold per use case. Typical deployments put Magika in front of virus scanners, attachment filters, LLM upload pipelines, and automated reverse-engineering workflows, that is, anywhere a wrong file-type guess would send a file to the wrong processor. The project is actively maintained by Google's security team on GitHub with regular model and dataset updates.
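One way to picture per-use-case thresholding is a small routing table keyed by detected label. The sketch below is a minimal illustration rather than anything Magika ships: the handler names, the threshold values, and the `choose_handler` function are all hypothetical, and the function takes a label and score as plain values so it stays independent of any particular Magika version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    handler: str      # name of the downstream processor (hypothetical)
    min_score: float  # minimum confidence required to trust the label


# Hypothetical routing table; labels and thresholds are illustrative only.
ROUTES = {
    "pdf": Route("document_scanner", 0.80),
    "pebin": Route("malware_sandbox", 0.50),
    "javascript": Route("script_analyzer", 0.90),
}

# Anything unknown or low-confidence goes to a conservative default.
FALLBACK_HANDLER = "manual_review"


def choose_handler(label: str, score: float) -> str:
    """Pick a processor for a detected label, falling back when unsure."""
    route = ROUTES.get(label)
    if route is None or score < route.min_score:
        return FALLBACK_HANDLER
    return route.handler


if __name__ == "__main__":
    for label, score in [("pdf", 0.95), ("pdf", 0.40), ("zip", 0.99)]:
        print(label, score, "->", choose_handler(label, score))
```

Keeping a separate threshold per label lets a pipeline be strict where a misroute is costly and lenient where the fallback is expensive.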
