
Audit your code. Or attack it.

Two modes in one CLI. Security Scan reads the code. Pen Test attempts the exploits against systems you authorise.

$ brew install CosineAI/tap/cos

Pick modules, set the agent’s permissions, run. Output is a markdown report — location, severity, cause, and fix direction for every finding it could ground in your code.

Free install. Running scans requires an active $20/month Cosine subscription — the same login that runs Cosine’s coding agent.

$ cos login
$ cd path/to/your/repo
$ cos --security-scan

Before the scan runs.

cos v2.0.19 · Security Scan · setup
Scan Scope — 5 of 8 active
[×] Dependency Vulnerability Analysis
[×] Secret & Credential Detection
[×] SQL Injection / XSS Vectors
[ ] Authentication & Session Flows
[×] Input Validation & Sanitisation
[ ] CORS & CSP Misconfigurations
[ ] Cryptographic Weakness Scan
[×] File Permission & Access Controls
Agent Permissions
Terminal Access ( ) Enabled (•) Disabled ( ) Sandboxed
Network Requests ( ) Enabled (•) Disabled ( ) Sandboxed
File Write ( ) Enabled ( ) Disabled (•) Sandboxed
scroll 0% · a start · tab next · shift+tab prev · q quit

See the output.

Read a sample report
.cos/scan-2026-06-05.md
# Bank of Anthos — Security Audit Report
29,846 LOC / 391 files · 6 of 8 modules

## 1. Executive Summary

Overall risk rating: CRITICAL

Multiple critical and high-severity vulnerabilities:

  • Forgeable tokens across every ledger service: balancereader, transactionhistory, and ledgerwriter verify JWTs against a single shared RSA public key with no issuer or audience claim binding. Combined with the hardcoded private key in the repo (see below), a token signed off-cluster passes verification at every service and authorises any account; per-service trust collapses to “do you have the repo.”
  • Disabled JWT signature verification in the frontend authentication helper
  • Integer overflow in financial transaction validation allowing balance bypass
  • SSRF and open redirect in the OAuth consent flow
  • Credentials transmitted in URL query strings on the login flow
  • Hardcoded secrets in version control, including an RSA private key used to sign JWTs

[ trimmed — full report includes per-module findings ]
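The first finding in the sample turns on tokens that carry no issuer or audience binding. Below is a minimal sketch of what that binding could look like once a token's signature has already been verified, using only the Python standard library. The function name `check_claims` and the issuer and audience strings are hypothetical; nothing here is taken from Bank of Anthos or from cos.

```python
import time


def check_claims(claims, expected_issuer, expected_audience, now=None):
    """Return True only if already-verified claims are bound to this service.

    Assumes the signature was checked beforehand; this covers claims only.
    """
    now = time.time() if now is None else now

    # Reject tokens minted by anyone other than the expected issuer.
    if claims.get("iss") != expected_issuer:
        return False

    # "aud" may be a single string or a list of strings (RFC 7519).
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        audiences = []
    if expected_audience not in audiences:
        return False

    # Reject tokens with a missing or past expiry.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False

    return True
```

With a check like this in each service, a token issued for one service's audience would be rejected by the others even when they all share a verification key.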


Won’t do.

Quick answers.

How long does a scan take?
Two data points: a 6-module scan of Bank of Anthos (~30k LOC) finished in ~10 minutes; a full scan of Symfony (~1.5M LOC) took ~40 minutes. Time scales sub-linearly with codebase size because modules run as a parallel swarm; the TUI shows a live estimate before you start.
What’s the output file?
A single markdown file at .cos/scan-<date>.md with an executive summary and per-module findings, each with location, severity, cause, and fix direction. The file stays on your machine.
What does it cost?
Install is free. Running scans requires an active $20/month Cosine subscription — the same login that runs Cosine’s coding agent. One account, both products.
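The two timing data points quoted under "How long does a scan take?" can be turned into a rough scaling exponent. This is back-of-the-envelope arithmetic on the approximate figures given there (~30k LOC in ~10 minutes, ~1.5M LOC in ~40 minutes). It assumes a power-law fit, which two points cannot confirm, and the function name is illustrative.

```python
import math


def scaling_exponent(loc_a, minutes_a, loc_b, minutes_b):
    """Fit time ~ LOC**k through two points and return k."""
    return math.log(minutes_b / minutes_a) / math.log(loc_b / loc_a)


# Figures quoted in the FAQ answer (both approximate).
k = scaling_exponent(30_000, 10, 1_500_000, 40)
print(f"{1_500_000 / 30_000:.0f}x the code, {40 / 10:.0f}x the time, k = {k:.2f}")
# prints: 50x the code, 4x the time, k = 0.35
```

An exponent below 1 is what sub-linear means here; linear scaling would give k = 1.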

Same CLI, second tab. The swarm goes offensive against systems you authorise: not just reading the code, but attempting the exploits. It is gated because the security implications are real; access is via booking, with scope and authorisation written down before anything runs.

Before the pen test runs.

cos vnightly-906 · Pen Test · setup
Targets
Only add systems you are authorised to test. Press a to add a host or URL.
No targets added yet
Effort
Passive Aggressive
Recon Light ▲ Moderate Deep Aggressive
Active probing with crafted payloads. May trigger WAF rules or rate limits. No destructive actions. Suitable for staging environments.
[×] Port & service fingerprinting
[×] Header & TLS analysis
[×] Directory & endpoint enumeration
[×] Payload injection (SQLi, XSS, SSTI)
[ ] Brute-force credential spraying · Deep
[ ] Exploit chain construction · Aggressive
[ ] Denial-of-service resilience testing · Aggressive
Agent Permissions
Terminal Access (•) Enabled ( ) Disabled ( ) Sandboxed
Network Requests (•) Enabled ( ) Disabled ( ) Sandboxed
File Write ( ) Enabled ( ) Disabled (•) Sandboxed
Objective / Instructions
Estimate
Targets: 0 hosts
Effort: Moderate (4 technique classes)
Est. Time: ~1 min
Agent Cycles: ~2–3 iterations
▶ Start Pentest Cancel
s start · tab next · shift+tab prev · 1/2 mode · a add target · q quit

See the output.

Read a sample engagement summary
.cos/pentest-2026-06-08.md
# api.your-app.com — Pen Test Engagement
booking 2A4F · 2026-06-08 · 4h22m · Moderate effort

## Executive Summary

Status: 2 critical, 1 high, 3 medium — all reproducible.

Scope: 2 hosts, 47 endpoints. Out-of-scope items deferred and flagged for next engagement.


## Confirmed Exploits

1. JWT signature bypass (CRITICAL · CVSS 8.6)
POST /v1/sessions/refresh — forged token with disabled signature verification, returned 200 OK with admin scope. Reproduction script included.

2. SSRF via OAuth consent redirect (HIGH · CVSS 7.4)
Open redirect on /oauth/authorize resolved arbitrary internal URLs. Reproduction included.

[ trimmed — full summary includes evidence and remediation per finding ]

Won’t do.

Quick answers.

How is this different from the scan?
The scan reads code and infers from what’s there. The pen test actually attempts the exploits against running systems you authorise — different binary mode, different agent behaviour, different deliverable (engagement summary, not audit report).
How does scoping work?
You provide hosts/endpoints plus written consent at booking. The agent’s network is scoped to that list — it can’t reach anything else, even if a finding suggests it should.
What does it cost?
Decided per engagement at booking. Scope and effort level determine the time-box; the time-box determines the price.
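The scoping answer above says the agent can only reach hosts on the authorised list. A minimal sketch of that kind of allowlist check follows, assuming exact host matching. `in_scope` is a hypothetical name, the host is the placeholder from the sample engagement, and this is not how cos itself implements scoping.

```python
from urllib.parse import urlsplit


def in_scope(url, authorised_hosts):
    """Return True only if the URL's host exactly matches an authorised host."""
    host = urlsplit(url).hostname  # lower-cased by urlsplit; None if absent
    if host is None:
        return False
    allowed = {h.lower().rstrip(".") for h in authorised_hosts}
    return host.rstrip(".") in allowed


hosts = ["api.your-app.com"]  # placeholder target
print(in_scope("https://api.your-app.com/v1/sessions/refresh", hosts))  # True
print(in_scope("https://api.your-app.com.evil.example/", hosts))        # False
print(in_scope("http://169.254.169.254/latest/meta-data/", hosts))      # False
```

Exact matching is deliberate: suffix or substring checks would let a lookalike domain through. A URL-level check like this would still need backing at the network layer, since redirects and DNS can point an in-scope name elsewhere.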

It’s a closed binary, built on Cosine’s own model.

cos runs on a model Cosine post-trained for offensive security, not an off-the-shelf API behind a prompt wrapper. We trained it because off-the-shelf models refuse the work this product does — a security scanner that won’t read the parts of your code worth attacking isn’t a security scanner.

Safety isn’t a layer of refusals you can talk the model out of. It’s a Go harness sitting below the model that intercepts every tool call before execution. In Security Scan mode, the harness deterministically blocks mutating tools (file writes, command execution) regardless of what the model wants — read-only is a guard, not a flag. In Pen Test mode, the same harness limits network egress to the targets you authorised at booking.
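The idea of a guard that sits below the model can be sketched as a single choke point that every tool call must pass through. The real harness is described as written in Go and its internals are not public, so the following is only a Python illustration: the tool names, mode labels, and the `Harness` class are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical tool names; the real harness's tool set is not documented here.
MUTATING_TOOLS = frozenset({"write_file", "run_command"})


class ToolCallBlocked(Exception):
    """Raised when the guard refuses a tool call."""


@dataclass
class Harness:
    mode: str    # "security-scan" or "pen-test" (assumed labels)
    tools: dict  # tool name -> callable
    audit_log: list = field(default_factory=list)

    def call(self, name, **kwargs):
        """Single choke point: every tool call from the model goes through here."""
        # The decision depends only on mode and tool name, never on model output.
        if self.mode == "security-scan" and name in MUTATING_TOOLS:
            self.audit_log.append(("blocked", name))
            raise ToolCallBlocked(f"{name} is not allowed in {self.mode} mode")
        if name not in self.tools:
            raise ToolCallBlocked(f"unknown tool: {name}")
        self.audit_log.append(("allowed", name))
        return self.tools[name](**kwargs)


written = []
harness = Harness(
    mode="security-scan",
    tools={
        "read_file": lambda path: f"<contents of {path}>",
        "write_file": lambda path, data: written.append(path),
    },
)
print(harness.call("read_file", path="app.py"))
try:
    harness.call("write_file", path="app.py", data="x")
except ToolCallBlocked as err:
    print("blocked:", err)
```

Because the check runs before the tool is invoked and ignores the arguments, no prompt can talk it into a write: the blocked call never reaches the underlying function.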

The binary you install with brew, curl, or winget is the same one we run internally. It is not open source. It runs locally on your machine. You can run cos behind a firewall and tcpdump what it does before trusting it on real code.