
Magika

AI-powered file-type detection at Google scale

Free, Open Source

Open-source AI-powered file-type detection tool from Google that uses a custom deep-learning model under a few megabytes to identify more than 200 binary and textual content types in milliseconds, even on a single CPU. Magika ships as a CLI, Python package, JavaScript/TypeScript library, and an ONNX model, achieves around 99% accuracy on its test set, and is already used at Google scale across Gmail, Drive, and Safe Browsing as well as by VirusTotal and abuse.ch.

Magika is an open-source file-type detection tool from Google that replaces classic signature-based utilities with a compact deep-learning model. The custom model weighs only a few megabytes, was trained on roughly 100 million samples across more than 200 content types, and identifies binary and textual formats in milliseconds on a single CPU. The project reports around 99% accuracy on its test set and already powers file-type routing at Google scale: Gmail, Drive, and Safe Browsing rely on it to route hundreds of billions of samples per week to the right security and content-policy scanners.
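For contrast, the signature-based approach mentioned above can be sketched in a few lines. This is an illustrative toy, not how GNU `file` or libmagic is implemented; the `SIGNATURES` table and the `sniff_signature` helper are hypothetical names, and only a handful of well-known magic numbers are included.

```python
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
# Hypothetical, deliberately tiny table for illustration only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\x7fELF", "elf"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff_signature(data: bytes) -> Optional[str]:
    """Return a label if the data starts with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


print(sniff_signature(b"%PDF-1.7\n..."))          # pdf
print(sniff_signature(b"def main():\n    pass"))  # None
```

Textual formats such as source code have no fixed leading bytes, so a prefix table like this returns nothing for them. That is the kind of gap a learned model is meant to close.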

The tool is deliberately polyglot. A Rust-powered CLI lets you run `magika file.bin` on a server or developer machine; the Python package exposes a `Magika` class with streaming APIs and scoring thresholds; a JavaScript/TypeScript package targets Node and browsers for client-side detection; and the underlying Keras-trained ONNX model can be embedded in any language with an ONNX runtime. Magika has been integrated with VirusTotal and abuse.ch and is commonly used as a pre-filter in malware-analysis pipelines, data-lake ingestion, DLP tools, and forensic triage where GNU `file` and libmagic fall short on obfuscated or renamed inputs.
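A minimal sketch of calling the Python package might look like the following. It assumes the `magika` package is installed and that `Magika().identify_bytes(...)` returns a result whose `output` carries the label; the attribute name is assumed to be `label` in newer releases and `ct_label` in older ones, so check the version you have installed. The `detect_label` helper and its `detector` parameter are hypothetical, added so the function can also be exercised with a stand-in object.

```python
from typing import Any, Optional


def detect_label(data: bytes, detector: Optional[Any] = None) -> str:
    """Return the content-type label a Magika-like detector assigns to bytes."""
    if detector is None:
        # Imported lazily so the helper can be tested without the package.
        from magika import Magika

        detector = Magika()
    output = detector.identify_bytes(data).output
    # Assumption: newer releases expose `label`, older ones `ct_label`.
    label = getattr(output, "label", None)
    if label is None:
        label = getattr(output, "ct_label")
    return str(label)
```

With the package installed, `detect_label(b"# Title\n\nSome *markdown* text.\n")` would run the bundled model on the given bytes and return its label as a string.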

For AI-infrastructure teams, Magika slots in wherever you need fast, language-agnostic content detection without external calls. It is Apache-2.0 licensed, so it can ship inside commercial products; it runs offline, which suits regulated environments; and it returns a content-type label plus a confidence score that you can threshold per use case. Typical deployments put Magika in front of virus scanners, attachment filters, LLM upload pipelines, and automated reverse-engineering workflows, anywhere a wrong file-type guess would send a file to the wrong processor. The project is actively maintained by Google's security team on GitHub with regular model and dataset updates.
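Per-use-case thresholding of the label and confidence score could be sketched as follows. Everything here is hypothetical: `Detection` is a stand-in for a detector result rather than Magika's own result type, and the `THRESHOLDS` values and use-case names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for a detector result: a label and a confidence."""
    label: str
    score: float


# Hypothetical minimum confidence required by each downstream consumer.
THRESHOLDS = {"malware_scan": 0.50, "llm_upload": 0.90}


def route(det: Detection, use_case: str, fallback: str = "unknown") -> str:
    """Return the label if it is confident enough for this use case."""
    if det.score >= THRESHOLDS[use_case]:
        return det.label
    return fallback


det = Detection(label="javascript", score=0.72)
print(route(det, "malware_scan"))  # javascript
print(route(det, "llm_upload"))    # unknown
```

The same detection is accepted by a permissive consumer and rejected by a strict one, so a low-confidence guess falls back to a generic handler instead of reaching the wrong processor.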

Pricing

Free and open source (Apache-2.0); no paid tier, no hosted SaaS

Platforms

CLI (cross-platform), Python package, JavaScript/TypeScript package, and ONNX model embeddable in any language

Related Tools

Deep Lake

AI data runtime for multimodal datasets and vector search

Deep Lake is an open-source AI data runtime from Activeloop for storing, versioning, and querying multimodal data and embeddings. It fits teams building RAG, training, evaluation, or dataset-heavy agent workflows that need a bridge between vector search, structured metadata, and large image, text, audio, or video collections.

Open Source

SeekDB

AI-native state store with hybrid vector and full-text search

SeekDB is an open-source AI-native state store from the OceanBase ecosystem that combines MySQL-compatible data access with hybrid vector and full-text retrieval. It targets agent and AI application teams that need embedded or server deployment, copy-on-write style sandboxes, and searchable state without gluing together several separate storage layers.

Open Source

Agent Governance Toolkit

Microsoft’s open-source toolkit for adding policy enforcement, identity, sandboxing, and audit controls to production AI agents.

Agent Governance Toolkit is an open-source Microsoft project for teams moving AI agents from demos into controlled production workflows. It focuses on runtime policy enforcement, zero-trust identity, sandboxed execution, and reliability patterns around autonomous agents, giving security and platform teams a governance layer around tool calls and agent actions rather than another prompt-only guardrail.

Open Source, Telemetry

Baz

Telemetry-aware AI code reviewer that checks how pull requests may affect real services.

Baz is an AI code-review platform focused on production-aware pull requests. Instead of only reading the diff, Baz connects code changes to application telemetry so reviewers can understand what endpoints, services, and runtime behavior may be affected. That makes it a useful complement to existing AI PR bots when the question is not just whether a change looks correct, but whether it could break a live system.

Freemium, Telemetry

Rampart

Microsoft’s pytest-native red teaming framework for turning AI agent safety findings into CI tests.

RAMPART is an open-source Microsoft framework for safety and security testing of agentic AI applications. It brings red-team findings into a pytest-native workflow so teams can turn prompt injection, unsafe tool use, and behavioral boundary failures into repeatable regression tests. Its main strength is developer workflow: RAMPART makes agent safety part of CI/CD instead of a one-off security review.

Open Source

Statewright

State-machine guardrails for controlling which tools AI coding agents can use at each phase.

Statewright is a guardrail layer for AI coding agents that uses explicit state machines to control what an agent can do at each stage of a workflow. Instead of relying only on prompt instructions, teams can model phases such as plan, implement, test, and review, then constrain tool access for clients like Claude Code, Codex, Cursor, opencode, and related MCP workflows.

Open Source