SottoASR

Features

Everything you need, nothing you don't

Completely Private

All audio capture and transcription happen on-device. No cloud APIs, no telemetry, no data collection. Your words stay on your Mac.

Blazing Fast

Transcribes at roughly 44x real-time speed on Apple Silicon. Powered by CoreML and the Apple Neural Engine for hardware-accelerated inference.
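To put the ~44x figure in perspective, the arithmetic below estimates how long a clip would take to process. It is a minimal sketch that assumes the speedup holds uniformly regardless of clip length; actual throughput will vary with hardware and audio. The function name is illustrative and not part of SottoASR.

```python
def estimated_transcription_seconds(audio_seconds: float, speedup: float = 44.0) -> float:
    """Estimate processing time for a clip at a given real-time speedup.

    A speedup of 44 means 44 seconds of audio are processed per second.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # Print estimates for a few clip lengths (in seconds).
    for clip in (10, 60, 600):
        estimate = estimated_transcription_seconds(clip)
        print(f"{clip:>4} s of audio -> about {estimate:.2f} s to transcribe")
```

Under that assumption, a 60-second recording would take roughly 1.4 seconds to transcribe.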

Push-to-Talk

Press Cmd+Shift+Space to talk, release to transcribe. Or use Cmd+Shift+D to toggle hands-free recording. All shortcuts are fully customizable.

AI Transcript Cleanup

An on-device LLM automatically removes filler words, fixes punctuation, and polishes your transcriptions, all locally with a custom fine-tuned model.
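To give a feel for what this kind of cleanup does to a raw transcript, here is a rough rule-based stand-in. It is not how SottoASR works: the app uses a fine-tuned LLM, while this sketch swaps in regular expressions and a hypothetical filler list purely for illustration.

```python
import re

# Hypothetical filler list for illustration only; the real cleanup is done
# by a language model, not by fixed rules.
FILLERS = ("um", "uh", "erm", "you know")

# Match a filler as a whole word, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def rough_cleanup(text: str) -> str:
    """Strip common fillers, tidy spacing, and add basic sentence punctuation."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()     # collapse whitespace
    cleaned = re.sub(r"\s+([,.!?])", r"\1", cleaned)   # no space before punctuation
    if cleaned:
        cleaned = cleaned[0].upper() + cleaned[1:]     # capitalize first letter
        if cleaned[-1] not in ".!?":
            cleaned += "."                             # ensure a closing mark
    return cleaned


if __name__ == "__main__":
    print(rough_cleanup("um, so I think, uh, we should ship it"))
```

A fixed list like this misses context (for example, "you know" used literally), which is one reason a model-based approach is attractive for this task.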

25 Languages

Multilingual speech recognition powered by NVIDIA Parakeet TDT v3, a 600M-parameter model supporting 25 languages out of the box.

Menu Bar App

Lives quietly in your menu bar. No dock icon, no distractions. It appears only when you need it, as a beautiful floating overlay.

Transcription History

Browse, search, and copy past transcriptions. Every recording is saved locally so you never lose what you said.

Paste at Cursor

Transcribed text is automatically pasted wherever your cursor is. Works in any app: editors, browsers, terminals, chat windows.

Lightweight

The app bundle is just 14 MB. Models are downloaded once on first launch (~500 MB) and cached locally. Minimal resource usage when idle.

Download

Get started in seconds

v0.7.4 Latest Release

Requirements

macOS 14 Sonoma or later
Apple Silicon (M1, M2, M3, M4)
~500 MB disk space for models
Accessibility permission (for paste-at-cursor)
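The hardware and OS requirements above can be checked ahead of a download with a short script. This is a minimal sketch, not an official installer check: the function name and messages are made up for illustration, the 500 MB figure is treated as approximate, and Accessibility permission is not checked here because it is granted through System Settings.

```python
import platform
import shutil

MIN_MACOS_MAJOR = 14            # macOS 14 Sonoma, per the requirements list
MODEL_BYTES = 500 * 1024 ** 2   # ~500 MB for models, treated as approximate


def meets_requirements(system: str, machine: str, macos_version: str, free_bytes: int) -> list:
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if system != "Darwin":
        problems.append("not macOS")
    else:
        try:
            major = int(macos_version.split(".")[0])
        except ValueError:
            major = 0  # version string missing or unparseable
        if major < MIN_MACOS_MAJOR:
            problems.append("macOS older than 14")
    if machine != "arm64":
        problems.append("not Apple Silicon")
    if free_bytes < MODEL_BYTES:
        problems.append("less than ~500 MB free")
    return problems


if __name__ == "__main__":
    # Gather values from the current machine and report any problems.
    found = meets_requirements(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
        shutil.disk_usage("/").free,
    )
    print("OK" if not found else "Unmet: " + ", ".join(found))
```

Note that a Python interpreter running under Rosetta reports `x86_64` from `platform.machine()`, so the architecture check can give a false negative in that case.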

Download for macOS

Download the .dmg from the GitHub release page

Acknowledgements

Built on the shoulders of giants

SottoASR relies on outstanding open-source libraries and models. All 660+ dependencies use permissive or weak-copyleft licenses (MIT, Apache-2.0, BSD, MPL-2.0, Unicode-3.0, ISC, Zlib, CC-BY-4.0). See THIRD_PARTY_LICENSES for the full list.

Speech Recognition

NVIDIA Parakeet TDT v3: ASR model, 600M params, 25 languages (CC-BY-4.0)
FluidAudio: CoreML/ANE inference engine (Swift) (Apache-2.0)
parakeet-rs: ONNX Runtime Rust bindings (MIT)
cpal: Cross-platform audio capture (Apache-2.0)
hound: WAV encoding/decoding (Apache-2.0)
rubato: Audio resampling (MIT)

AI Transcript Cleanup

LiquidAI LFM2.5-350M: Base model for fine-tuned transcript cleanup (LFM-1.0)
Apple MLX: Metal-native ML framework (MIT)
mlx-lm: MLX language model inference (MIT)
huggingface_hub: Model download and caching (Apache-2.0)

Application Framework

Tauri v2: Native app shell (Rust + Web) (MIT)
Svelte 5: Reactive UI framework (MIT)
tauri-nspanel: macOS NSPanel overlay windows (MIT)
Vite: Frontend build tool (MIT)
Tokio: Async Rust runtime (MIT)
serde: Serialization framework (MIT)

Author

Juan Villa

Contributors

Ian Scofield
Young Park

Built with significant assistance from Claude Code by Anthropic.

Want to contribute? Check out the GitHub repository.
