Browser-recognizer

  • A speech recognizer built on Vosk that runs in the browser. It is inspired by vosk-browser but written from scratch, with no code taken from it!
  • Browser-recognizer can run both in the browser's main thread and in web workers
  • The API is also designed with strong exception safety

Global and all objects' common interface

Function signature (global) | Description
Promise<Model> makeModel(path: string, url: string, id: string)
Promise<SpkModel> makeSpkModel(path: string, url: string, id: string) | Make a Model or SpkModel:
    - If path contains valid model files stored under the same id, nothing is fetched from url.
    - If path contains no valid model files, or contains valid model files stored under a different id, the model is fetched from url and stored under id.
Promise<Recognizer> makeRecognizer(model: Model, sampleRate: float) | Make a Recognizer. Each Recognizer runs recognition on its own thread.
setLogLevel(lvl: int) | Set Vosk's log level (default: -1):
    -2: Error
    -1: Warning
    0: Info
    1: Verbose
    2: More verbose
    3: Debug
deleteAll() | Call delete() on all objects. Running this once you are done with the API is recommended, since it cleans everything up automatically. See why.

Function signature (all objects) | Description
delete() | Delete this object
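The fetch-or-reuse rule for makeModel and makeSpkModel boils down to a single check. The sketch below models it with a plain Map standing in for the library's real storage and a caller-supplied download function; loadModelFiles, the Map store and download are hypothetical names used only for illustration, not part of the Browser-recognizer API.

```javascript
// Hypothetical illustration of the makeModel/makeSpkModel caching rule.
// `store` is a Map standing in for the real on-device storage: path -> { id, files }.
// An entry exists only when valid model files are present at that path.
// `download(url)` is a caller-supplied async function that fetches the model files.
async function loadModelFiles(store, path, url, id, download) {
  const entry = store.get(path);

  // Reuse only when valid files exist at `path` AND they were stored under the same id.
  if (entry !== undefined && entry.id === id) {
    return { files: entry.files, fetched: false };
  }

  // Missing files, or files stored under a different id: fetch again and re-tag with `id`.
  const files = await download(url);
  store.set(path, { id, files });
  return { files, fetched: true };
}
```

Under this rule, changing id is how an app would force clients to pick up an updated model at the same path, while keeping id unchanged avoids repeated downloads.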

Recognizer object

Function signature | Description
Promise<AudioWorkletNode> getNode(ctx: AudioContext, channelIndex = 0: int) | Get a pass-through node that recognizes audio and can be connected to a processing graph. It has 1 input and 1 output; channelIndex must point to a 16-bit mono channel of the input.
recognize(buf: AudioBuffer, channelIndex = 0: int) | Recognize an AudioBuffer, usually obtained from something like BaseAudioContext.decodeAudioData(); channelIndex must point to a 16-bit mono channel of buf.
setPartialWords(partialWords: bool) | Include word information in partialResult events (default: false)
setWords(words: bool) | Include word information in result events (default: false)
setNLSML(nlsml: bool) | Return result and partialResult in NLSML form (default: false)
setMaxAlternatives(alts: int) | Set the maximum number of alternatives for result events (default: 0)
setGrm(grm: string) | Add a grammar to the recognizer (default: none)
setSpkModel(mdl: SpkModel) | Set the recognizer's speaker model (default: none)

Event | Description
partialResult | A partial recognition result is available; check the event's "details" property
result | A full recognition result is available; check the event's "details" property
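For recognize(), the AudioBuffer typically comes from decoding a compressed audio file. A minimal sketch, assuming recognizer was created with makeRecognizer() using the same sample rate as ctx (decodeAudioData() resamples its output to the context's sample rate); recognizeEncodedAudio is a hypothetical helper name, and the results still arrive through the result and partialResult events.

```javascript
// Hypothetical helper: decode encoded audio (e.g. the bytes of a WAV/MP3/OGG file)
// and hand one channel of it to a Browser-recognizer Recognizer.
async function recognizeEncodedAudio(recognizer, ctx, encoded, channelIndex = 0) {
  // decodeAudioData() returns a Promise<AudioBuffer> resampled to ctx.sampleRate.
  const buf = await ctx.decodeAudioData(encoded);

  // Fail early instead of passing an out-of-range channel to the recognizer.
  if (channelIndex < 0 || channelIndex >= buf.numberOfChannels) {
    throw new RangeError(
      `channelIndex ${channelIndex} out of range (buffer has ${buf.numberOfChannels} channel(s))`
    );
  }

  // Results are delivered asynchronously via "result"/"partialResult" events.
  recognizer.recognize(buf, channelIndex);
  return buf;
}
```

In a page, encoded could come from `await (await fetch("speech.wav")).arrayBuffer()`, where speech.wav is a placeholder file name.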

Compilation

Changing any setting from its default value requires recompiling:

git clone --depth=1 https://github.com/msqr1/Browser-recognizer &&
cd Browser-recognizer &&
[Options] ./compile.sh
Option | Description | Default value
MAX_MEMORY | Maximum memory; valid suffixes: kb, mb, gb, tb, or none (bytes) | 300mb (recommended)
MAX_THREADS | Maximum number of threads (minimum 2) | 2 (1 OPFS thread + 1 recognizer thread)
COMPILE_JOBS | Number of jobs (threads) used when compiling | $(nproc)
EMSDK | Path to EMSDK (if unset, EMSDK is installed in the root folder) | .
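The MAX_THREADS default can be read as a thread budget: one thread goes to OPFS and, since every Recognizer runs on its own thread, the remainder bounds how many recognizers can exist at once. The helper below only spells out that arithmetic; it is a hypothetical illustration based on this reading of the table, not a function the project provides.

```javascript
// Hypothetical helper: recognizers that can be alive at the same time for a given
// MAX_THREADS, assuming 1 thread is reserved for OPFS and each Recognizer holds 1 thread.
function maxConcurrentRecognizers(maxThreads) {
  if (!Number.isInteger(maxThreads) || maxThreads < 2) {
    throw new RangeError("MAX_THREADS must be an integer of at least 2");
  }
  const OPFS_THREADS = 1;
  return maxThreads - OPFS_THREADS;
}
```

Under that assumption, the default build (MAX_THREADS=2) allows one recognizer at a time, and an app needing three simultaneous recognizers would need MAX_THREADS=4.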

Response headers

Browser-recognizer requires SharedArrayBuffer, so these response headers must be set:

  • Cross-Origin-Embedder-Policy: require-corp
  • Cross-Origin-Opener-Policy: same-origin

If you can't set them, you may use a VERY HACKY workaround at src/addCOI.js.
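For local testing, a small Node.js static server can send both headers on every response. This is a minimal sketch that is not part of the project; createIsolatedServer, isolationHeaders and the content-type table are hypothetical names chosen for illustration.

```javascript
// Minimal static file server for local testing (not part of Browser-recognizer).
// Every response carries the two headers that enable cross-origin isolation,
// which browsers require before exposing SharedArrayBuffer.
const http = require("http");
const fs = require("fs");
const path = require("path");

const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "require-corp",
  "Cross-Origin-Opener-Policy": "same-origin",
};

// Small, hypothetical content-type table; extend it for your own files.
const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm",
  ".json": "application/json",
};

// Merge the isolation headers into any other response headers.
function isolationHeaders(extra = {}) {
  return { ...extra, ...ISOLATION_HEADERS };
}

function createIsolatedServer(root) {
  const base = path.resolve(root);
  const prefix = base.endsWith(path.sep) ? base : base + path.sep;

  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, "http://localhost").pathname);
    } catch {
      res.writeHead(400, isolationHeaders());
      res.end("Bad request");
      return;
    }
    if (urlPath.endsWith("/")) urlPath += "index.html";

    // Resolve inside `base` and refuse anything that escapes it.
    const file = path.resolve(base, "." + urlPath);
    if (!file.startsWith(prefix)) {
      res.writeHead(403, isolationHeaders());
      res.end("Forbidden");
      return;
    }

    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404, isolationHeaders());
        res.end("Not found");
        return;
      }
      const type = CONTENT_TYPES[path.extname(file)] || "application/octet-stream";
      res.writeHead(200, isolationHeaders({ "Content-Type": type }));
      res.end(data);
    });
  });
}
```

Calling `createIsolatedServer(".").listen(8080)` would then serve the current folder at http://localhost:8080 with both headers set.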

Additions compared to vosk-browser:

  • Download multiple models
  • Model storage path management (when many models are required)
  • Model ID management (when model updates are required)

Usage

<!-- Load the library from a script tag -->
<script src="BrowserRecognizer.js"></script>
<script type="module">
  // type="module" allows top-level await; the variable name is your choice
  const BrRec = await loadBR()

  // Prepare (placeholder arguments: use your own model path, URL and ID)
  const model = await BrRec.makeModel("MODEL_PATH", "MODEL_URL", "MODEL_ID")
  // The sample rate must match the audio that will be fed to the recognizer
  const recognizer = await BrRec.makeRecognizer(model, 16000)
  recognizer.addEventListener("result", e => {
    console.log("Result: ", e.details)
  })
  recognizer.addEventListener("partialResult", e => {
    console.log("Partial result: ", e.details)
  })

  // Process audio
  const media = await navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      channelCount: 1,
      sampleRate: 16000
    },
  });

</script>
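The usage example stops right after getUserMedia(). One way to finish it is to route the microphone through the node returned by getNode(). This is a minimal sketch, assuming recognizer is a Recognizer created with the same sample rate as ctx; attachMicrophone and the returned stop() are hypothetical names, not part of the library.

```javascript
// Hypothetical helper: feed a microphone MediaStream into a Recognizer.
async function attachMicrophone(recognizer, ctx, stream, channelIndex = 0) {
  // Wrap the stream as a source node in the audio graph.
  const source = ctx.createMediaStreamSource(stream);

  // Pass-through recognition node (1 input, 1 output) from the library.
  const node = await recognizer.getNode(ctx, channelIndex);
  source.connect(node);

  return {
    node,
    // Disconnect the graph and release the microphone.
    stop() {
      source.disconnect();
      for (const track of stream.getTracks()) track.stop();
    },
  };
}
```

Inside the module script, this could be called as `const mic = await attachMicrophone(recognizer, new AudioContext({ sampleRate: 16000 }), media)`. Browsers may start an AudioContext in the "suspended" state until a user gesture, in which case `ctx.resume()` has to be called before audio flows.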

License

MIT