Audio and video transcription tools for converting speech to text
Find transcription tools to convert audio and video into accurate text quickly. Browse, compare, and discover top AI transcriber platforms on AICavo.

Create ebooks, flipbooks, audiobooks, podcasts, and designs by converting ideas, URLs, videos, files, or voice notes into structured, publish-ready content with AI.

Notable AI is a tool that summarizes key points from videos, articles, and other content, enabling users to capture, vote on, search, and manage takeaways in one place.

Telnyx provides a cloud communications platform that enables AI agents and applications to make, receive, and manage carrier-grade voice calls over global IP networks.

Writeout AI is a web-based tool that automatically transcribes and translates uploaded audio files into text, supporting multiple languages and fast processing.

Recast is an AI-powered video editor that enables marketing teams to edit, repurpose, and adapt existing video and audio content without advanced video editing skills.

VOMO AI is a platform that analyzes user feedback and behavior to automatically generate prioritized product insights, feature ideas, and data-driven recommendations for product teams.

Free Subtitles Generator is a web-based tool that automatically generates, edits, and downloads subtitles for videos in multiple languages using AI-powered speech recognition.

Verbit AI is a platform that uses artificial intelligence to provide transcription, captioning, and related speech-to-text services for media, education, and enterprise content.

Laborai is an AI-powered productivity tool that automates repetitive digital tasks, such as data entry, content drafting, and routine communication, to streamline personal and professional workflows.

Auden is an AI tool that records meetings, lectures, and conversations and generates organized, context-aware summaries across Mac, Windows, iOS, and Android.

Civils AI is a construction-focused AI platform that extracts quantities and measurements from PDF drawings and CAD files for takeoffs, estimation, and quantity surveying.

Summary: AI Meeting Note is an iOS app that records, transcribes, and summarizes meetings, generating organized notes and action items from spoken discussions.

TwinMind is a real-time AI meeting assistant that records conversations, generates structured notes and summaries, and analyzes discussions in over 140 languages via app and browser extension.

Ango is a platform for creating, managing, and automating AI data labeling and annotation workflows to efficiently build and scale machine learning training datasets.

Kolabrya provides AI tools for workplace investigation, insurance, personal injury, and arbitration teams to transcribe interviews, analyze case data, and generate structured reports.

Neiro AI is a no-code generative AI platform that enables users to create multilingual text and voice content in over 140 languages and multiple voices.

Beey is an online tool that converts spoken audio into text and enables users to create and edit captions and subtitles through a web-based editor.

Convert spoken ideas into accurately transcribed, tone-adapted, and properly formatted text, then insert it directly into emails, documents, and messages across devices.

Macwhisper is a macOS and iOS application that locally records, transcribes, searches, and exports multilingual audio and video using Whisper, Parakeet, and integrated AI services.

Tencentcloud is a cloud computing platform that provides scalable infrastructure, AI services, and integrated tools for building, deploying, and managing applications across Tencent's digital ecosystem.

Trint is an AI-powered transcription platform that converts audio and video files into editable text, enabling search, editing, collaboration, and export for content workflows.

Spokenly converts spoken words into text on Mac and iPhone using local Whisper models, supporting 100+ languages, offline dictation, and usage without account registration.

Run large language, multimodal, speech recognition, and text-to-speech models directly on mobile, desktop, automotive, and IoT devices, optimized for NPUs, GPUs, and CPUs.