From 76c5dbb130e1d28e97f0e8d38cea2e4366b0e1e0 Mon Sep 17 00:00:00 2001
From: msqr1
Date: Wed, 7 Feb 2024 10:25:02 -0800
Subject: [PATCH] Important changes

---
 README.md                             | 66 +-------------------------
 src/genericModel.cc                   | 16 ++++---
 src/genericModel.h                    |  3 +-
 src/global.cc                         | 38 +++++++--------
 src/global.h                          | 16 ++++---
 src/model.cc                          |  2 +-
 src/pre.js                            | 35 +++++++-------
 src/recognizer.cc                     |  9 +++-
 src/spkModel.cc                       |  2 +-
 usage/README.md                       | 63 ++++++++++++++++++++++++
 {examples => usage}/en-model.tgz      | Bin
 {examples => usage}/example.wav       | Bin
 {examples => usage}/fromMic.html      |  0
 {examples => usage}/fromWav.html      |  0
 {examples => usage}/withSpkModel.html |  0
 15 files changed, 130 insertions(+), 120 deletions(-)
 create mode 100644 usage/README.md
 rename {examples => usage}/en-model.tgz (100%)
 rename {examples => usage}/example.wav (100%)
 rename {examples => usage}/fromMic.html (100%)
 rename {examples => usage}/fromWav.html (100%)
 rename {examples => usage}/withSpkModel.html (100%)

diff --git a/README.md b/README.md
index e299b6e..4a184cf 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Overview
 - A speech recognizer built on Vosk that can be run on the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser), but built from scratch and no code taken!
 - Designed with strong exception safety
-- See the examples folder for ways to use the API
+- See the usage folder's README.md for API documentation and important notes
 - See the devel folder for the newest build (not guaranteed to work) and the JS build script

 # Additions to vosk-browser:
@@ -11,67 +11,5 @@
 - Smaller JS size
 - Doesn't need another file when using AudioWorkletNode

-# User agent notes
-## SharedArrayBuffer
-Browser-recognizer require SharedArrayBuffer to share thread's data, so these response headers must be set:
-- ***Cross-Origin-Embedder-Policy*** ---> ***require-corp***
-- ***Cross-Origin-Opener-Policy*** ---> ***same-origin***
-
-If you can't set them, you may use a HACKY workaround at *src/addCOI.js*.
-
-## Origin Private Filesystem (OPFS)
-Browser-recognizer needs the Emscripten WASMFS' OPFS to store its model, IDBFS was considered, but dropped because there is no direct way to read from IDBFS to C++ without copying to MEMFS (basically RAM). For safety with this, always:
-- Try catch ```window.loadBR()``` to to check for OPFS availability.
-- Check if there is enough space via ```navigator.storage.estimate()``` for TWICE THE MODEL SIZE before calling Module.makeModel
-
-# API interface
-## JS ```window``` object
-| Function signature | Description |
-|---|---|
-|```Promise loadBR()``` | Load Emscripten's Module |
-
-## Shared interface
-| Function signature | Description |
-|---|---|
-| ```delete()``` | Delete this object, see [why](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run) this is neccessary.
-
-## ```Module``` object
-| Function signature | Description |
-|---|---|
-| ```Promise makeModel(path: string, url: string, id: string)```

```Promise makeSpkModel(path: string, url: string, id: string)``` | Make a ```Model``` or ```SpkModel```
- If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.
- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. Model files must be directly under the model root folder, and compressed model must be in .tgz format. |
-| ```Promise makeRecognizer(model: Model, sampleRate: float)``` | Make a ```Recognizer```, it will use **model**'s thread if it's the first user of **model**, else it will use a new thread.
-| ```setLogLevel(lvl: int)``` | Set Vosk's log level (default: ```0```: Info)
```-2```: Error
```-1```: Warning
```1```: Verbose
```2```: More verbose
```3```: Debug |
-| ```revokeURLs()``` | Revoke the Blob URLs of pthread worker and worklet processor |
-| ```cleanUp()``` | A convenience function that call ```revokeURLs()``` and ```delete()``` on all objects. You should put this at the end of your program! |
-
-## ```Recognizer``` object
-| Function signature | Description |
-|---|---|
-| ```Promise getNode(ctx: AudioContext, channelIndex = 0: int)``` | Get a pass-through node that recognize audio and is connectable to a processing graph. It has 1 input and 1 output, **channelIndex** must point to a 16-bit mono channel of the input |
-| ```recognize(buf: AudioBuffer, channelIndex = 0: int)``` | Recognize an AudioBuffer, usually from something like ```BaseAudioContext.decodeAudioData()```, **channelIndex** must point to a 16-bit mono channel of **buf**
-| ```setPartialWords(partialWords: bool)``` | Return words' information in a partialResult event (default: false) |
-| ```setWords(words: bool)``` | Return words' information in a result event (default: false) |
-| ```setNLSML(nlsml: bool)``` | Return result and partialResult in NLSML form (default: false) |
-| ```setMaxAlternatives(alts: int)``` | Set the max number of alternatives for result event (default: false) |
-| ```setGrm(grm: string)``` | Add grammar to the recognizer (default: none) |
-| ```setSpkModel(mdl: SpkModel)``` | Set the speaker model of the recognizer (default: none) |
-
-| Event | Description |
-|---|---|
-| ```partialResult``` | There is a partial recognition result, check the event's "details" property |
-| ```result``` | There is a full recognition result, check the event's "details" property |
-
-# Compilation
-Changing any option to non-default values requires recompilation
-```
-git clone --depth=1 https://github.com/msqr1/Browser-recognizer &&
-cd Browser-recognizer &&
-[Options] make
-```
-| Option | Description | Default value |
-|---|---|---|
-| MAX_MEMORY | Set max memory, valid suffixes: kb, mb, gb, tb or none (bytes) | ```300mb```, as [recommended](https://alphacephei.com/vosk/models) |
-| MAX_THREADS | Set the max number of thread (2 min) | ```2``` (1 OPFS thread + 1 model/recognizer thread) |
-| COMPILE_JOBS | Set the number of jobs (threads) when compiling | ```$(nproc)``` |
-| EMSDK | Set EMSDK's path (will install EMSDK in root folder if unset) | ```../emsdk``` |
+# Basic usage
diff --git a/src/genericModel.cc b/src/genericModel.cc
index 910b989..c04726a 100644
--- a/src/genericModel.cc
+++ b/src/genericModel.cc
@@ -1,18 +1,22 @@
 #include "genericModel.h"
 genericModel::genericModel(const std::string& storepath, const std::string &id, int index) : storepath(storepath), id(id), index(index) {
+  if(!OPFSOk) {
+    fireEv("_continue", "OPFS hasn't been initialized or not available", index);
+    return;
+  }
   fs::current_path("/opfs", tank);
   if(tank.value() != 0) {
-    throwJS("Unable to cd OPFS root");
+    fireEv("_continue","Unable to cd OPFS root",index);
     return;
   }
   fs::create_directories(storepath, tank);
   if(tank.value() != 0) {
-    throwJS("Unable to create storepath");
+    fireEv("_continue","Unable to create storepath", index);
   }
   fs::current_path(storepath, tank);
   if(tank.value() != 0) {
-    throwJS("Unable to cd storepath");
+    fireEv("_continue", "Unable to cd storepath", index);
   }
 }
 bool genericModel::checkModel() {
@@ -27,7 +31,7 @@ bool genericModel::checkModel() {
   return id.compare(oldid) == 0 ? true : false;
 }
 void genericModel::afterFetch() {
-  thrd.setTask1([this](){
+  thrd.addTask([this](){
     if(!extractModel()) {
       fs::remove("/opfs/m0dEl.tar",tank);
       fs::current_path("/opfs", tank);
@@ -39,8 +43,8 @@ void genericModel::afterFetch() {
     fs::remove("README",tank);
     std::ofstream idFile("id");
     if(!idFile.is_open()) {
-      fs::current_path("/opfs");
-      fs::remove_all(storepath);
+      fs::current_path("/opfs", tank);
+      fs::remove_all(storepath, tank);
       fireEv("_continue", "Unable to write model ID", index);
       return;
     }
diff --git a/src/genericModel.h b/src/genericModel.h
index 3e40e90..fe50710 100644
--- a/src/genericModel.h
+++ b/src/genericModel.h
@@ -14,7 +14,8 @@ namespace fs = std::filesystem;
 struct genericModel {
   const std::string storepath{};
   const std::string id{};
-  twiceThrd thrd{};
+  reusableThrd thrd{};
+  bool recognizerUsedThrd{};
   int index{};
   static bool extractModel();
   virtual bool checkModelFiles() = 0;
diff --git a/src/global.cc b/src/global.cc
index 0970c04..1a3cd81 100644
--- a/src/global.cc
+++ b/src/global.cc
@@ -1,13 +1,5 @@
 #include "global.h"
-void throwJS(const char* msg, bool err) {
-  EM_ASM({
-    if($1) {
-      throw Error(UTF8ToString($0));
-      return;
-    }
-    throw UTF8ToString($0);
-  },msg, err);
-}
+
 void fireEv(const char *type, const char *content, int index) {
   static ProxyingQueue pq{};
   auto proxy{[index, type, content](){
@@ -23,22 +15,26 @@ void fireEv(const char *type, const char *content, int index) {
 }
 int main() {
   std::thread t{[](){
-    wasmfs_create_directory("/opfs", 0777, wasmfs_create_opfs_backend());
+    OPFSOk = (wasmfs_create_directory("/opfs", 0777, wasmfs_create_opfs_backend()) == 0 ? true : false);
   }};
   t.detach();
   emscripten_exit_with_live_runtime();
 }
-void twiceThrd::setTask1(std::function task1) {
-  blocker.lock();
-  std::thread t{[this, task1](){
-    task1();
-    blocker.lock();
-    task2();
+ProxyingQueue reusableThrd::pq{};
+reusableThrd::reusableThrd() {
+  thrd = std::thread{[this](){
+    while(!done.test()) {
+      static ProxyingQueue pq{};
+      pq.execute();
+      blocker.wait(done.test(), std::memory_order_relaxed);
+    }
   }};
-  t.detach();
+  thrd.detach();
 }
-void twiceThrd::setTask2(std::function task2) {
-  this->task2 = task2;
-  blocker.unlock();
-  reusable = false;
+void reusableThrd::addTask(std::function task) {
+  pq.proxyAsync(thrd.native_handle(), std::move(task));
+}
+reusableThrd::~reusableThrd() {
+  done.test_and_set(std::memory_order_relaxed);
+  done.notify_one();
 }
diff --git a/src/global.h b/src/global.h
index 58b618f..6b3fb98 100644
--- a/src/global.h
+++ b/src/global.h
@@ -10,14 +10,16 @@ using namespace emscripten;
 static pthread_t selfTID{pthread_self()};
 static std::error_code tank{};
-void throwJS(const char* msg, bool err = false);
+static bool OPFSOk{};
 void fireEv(const char *type, const char *content, int index);
 int main();
-struct twiceThrd { // A minimal std::thread wrapper to run exactly 2 tasks
-  bool reusable{true};
-  std::mutex blocker{};
-  std::function task2{};
-  void setTask1(std::function task1);
-  void setTask2(std::function task2);
+struct reusableThrd { // A minimal std::thread wrapper that runs queued tasks on one reusable thread
+  static ProxyingQueue pq;
+  std::thread thrd;
+  std::atomic_flag blocker{};
+  std::atomic_flag done{};
+  reusableThrd();
+  void addTask(std::function task);
+  ~reusableThrd();
 };
diff --git a/src/model.cc b/src/model.cc
index ddb1321..c54fc4b 100644
--- a/src/model.cc
+++ b/src/model.cc
@@ -23,7 +23,7 @@ void model::load(bool newThrd) {
     main();
     return;
   }
-  thrd.setTask1(main);
+  thrd.addTask(main);
 }
 bool model::checkModelFiles() {
   return fs::exists("am/final.mdl", tank) &&
diff --git a/src/pre.js b/src/pre.js
index cd06ac4..e122dd3 100644
--- a/src/pre.js
+++ b/src/pre.js
@@ -11,11 +11,7 @@ Module.cleanUp = () => {
 class Recognizer extends EventTarget {
   constructor() {
     super()
-  }
-  _init(model, sampleRate) {
-    this.obj = new Module.recognizer(model, sampleRate, objs.length)
     objs.push(this)
-    this.ptr = Module._malloc(512)
   }
   async getNode(ctx, channelIndex = 0) {
     if(typeof this.node === "undefined") {
@@ -59,9 +55,8 @@ class Recognizer extends EventTarget {
   }
 }
 class Model extends EventTarget {
-  constructor(storepath, id) {
+  constructor() {
     super()
-    this.obj = new Module.model(storepath, id, objs.length)
     objs.push(this)
   }
   delete() {
@@ -69,9 +64,8 @@ class Model extends EventTarget {
   }
 }
 class SpkModel extends EventTarget {
-  constructor(storepath, id) {
+  constructor() {
     super()
-    this.obj = new Module.spkModel(storepath, id, objs.length)
     objs.push(this)
   }
   delete() {
@@ -79,7 +73,7 @@ class SpkModel extends EventTarget {
   }
 }
 Module.makeModel = async (url, storepath, id) => {
-  let mdl = new Model(storepath, id)
+  let mdl = new Model()
   return new Promise((resolve, reject) => {
     mdl.addEventListener("_continue", (ev) => {
       if(ev.detail === ".") {
@@ -88,6 +82,7 @@ Module.makeModel = async (url, storepath, id) => {
       mdl.delete()
       return reject(ev.detail)
     }, {once : true})
+    mdl.obj = new Module.model(storepath, id, objs.length)
     if(mdl.obj.checkModel()) {
       mdl.obj.load(true)
       return;
@@ -110,7 +105,7 @@ Module.makeModel = async (url, storepath, id) => {
   })
 }
 Module.makeSpkModel = async (url, storepath, id) => {
-  let mdl = new SpkModel(storepath, id)
+  let mdl = new SpkModel()
   return new Promise((resolve, reject) => {
     mdl.addEventListener("_continue", (ev) => {
       if(ev.detail === ".") {
@@ -119,6 +114,7 @@ Module.makeSpkModel = async (url, storepath, id) => {
       mdl.delete()
       reject(ev.detail)
     }, {once : true})
+    mdl.obj = new Module.spkModel(storepath, id, objs.length)
     if(mdl.obj.checkModel()) {
       mdl.obj.load(true)
       return
@@ -128,16 +124,21 @@ Module.makeSpkModel = async (url, storepath, id) => {
       if(!res.ok) {
         return reject("Unable to download model")
       }
-      let arr = await res.arrayBuffer()
-      let mdlMem = Module._malloc(arr.byteLength) // Will free in C++
-      Module.HEAP8.set(new Int8Array(arr), mdlMem)
-      mdl.obj.afterFetch(mdlMem, arr.byteLength)
+      let wStream = await (await (await navigator.storage.getDirectory()).getFileHandle("m0dEl.tar", {create : true})).createWritable()
+      let tarReader = res.body.pipeThrough(dStream).getReader()
+      while(true) {
+        let readRes = await tarReader.read()
+        if(!readRes.done) await wStream.write(readRes.value)
+        else break
+      }
+      await wStream.close()
+      mdl.obj.afterFetch()
     })()
   })
 }
 Module.makeRecognizer = (model, sampleRate) => {
   let rec = new Recognizer()
-  let retval = new Promise((resolve, reject) => {
+  return new Promise((resolve, reject) => {
     rec.addEventListener("_continue", (ev) => {
       if(ev.detail == ".") {
         objs.push(rec)
@@ -146,9 +147,9 @@ Module.makeRecognizer = (model, sampleRate) => {
       rec.delete()
       reject(ev.detail)
     }, {once : true})
+    rec.obj = new Module.recognizer(model.obj, sampleRate, objs.length)
+    rec.ptr = Module._malloc(512)
   })
-  rec._init(model.obj, sampleRate)
-  return retval
 }
 let processorUrl = URL.createObjectURL(new Blob(['(', (() => {
diff --git a/src/recognizer.cc b/src/recognizer.cc
index d467b84..d807e72 100644
--- a/src/recognizer.cc
+++ b/src/recognizer.cc
@@ -1,5 +1,9 @@
 #include "recognizer.h"
 recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
+  if(!OPFSOk) {
+    fireEv("_continue", "OPFS hasn't been initialized or not available", index);
+    return;
+  }
   auto main{[this, mdl, sampleRate](){
     rec = vosk_recognizer_new(mdl->mdl,sampleRate);
     if(rec == nullptr) {
@@ -21,8 +25,9 @@ recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
       }
     }
   }};
-  if(mdl->thrd.reusable) {
-    mdl->thrd.setTask2(main);
+  if(!mdl->recognizerUsedThrd) {
+    mdl->thrd.addTask(main);
+    mdl->recognizerUsedThrd = true;
     return;
   }
   std::thread t{main};
diff --git a/src/spkModel.cc b/src/spkModel.cc
index 141ad8b..a2bcbd6 100644
--- a/src/spkModel.cc
+++ b/src/spkModel.cc
@@ -27,7 +27,7 @@ void spkModel::load(bool newThrd) {
     main();
     return;
   }
-  thrd.setTask1(main);
+  thrd.addTask(main);
 }
 bool spkModel::checkModelFiles() {
   return fs::exists("mfcc.conf", tank) &&
diff --git a/usage/README.md b/usage/README.md
new file mode 100644
index 0000000..ab2d12e
--- /dev/null
+++ b/usage/README.md
@@ -0,0 +1,63 @@
+# API interface
+## JS ```window``` object
+| Function signature | Description |
+|---|---|
+|```Promise loadBR()``` | Load Emscripten's Module |
+
+## Shared interface
+| Function signature | Description |
+|---|---|
+| ```delete()``` | Delete this object, see [why](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run) this is necessary. |
+
+## ```Module``` object
+| Function signature | Description |
+|---|---|
+| ```Promise makeModel(url: string, path: string, id: string)```

```Promise makeSpkModel(url: string, path: string, id: string)``` | Make a ```Model``` or ```SpkModel```
- If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.
- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. Model files must be directly under the model root folder, and a compressed model must be in .tgz format. |
+| ```Promise makeRecognizer(model: Model, sampleRate: float)``` | Make a ```Recognizer```. It uses **model**'s thread if it is the first user of **model**; otherwise it uses a new thread. |
+| ```setLogLevel(lvl: int)``` | Set Vosk's log level (default: ```0```: Info)
```-2```: Error
```-1```: Warning
```1```: Verbose
```2```: More verbose
```3```: Debug |
+| ```revokeURLs()``` | Revoke the Blob URLs of the pthread worker and worklet processor |
+| ```cleanUp()``` | A convenience function that calls ```revokeURLs()``` and ```delete()``` on all objects. You should put this at the end of your program! |
+
+## ```Recognizer``` object
+| Function signature | Description |
+|---|---|
+| ```Promise getNode(ctx: AudioContext, channelIndex = 0: int)``` | Get a pass-through node that recognizes audio and can be connected to a processing graph. It has 1 input and 1 output; **channelIndex** must point to a 16-bit mono channel of the input |
+| ```recognize(buf: AudioBuffer, channelIndex = 0: int)``` | Recognize an AudioBuffer, usually from something like ```BaseAudioContext.decodeAudioData()```; **channelIndex** must point to a 16-bit mono channel of **buf** |
+| ```setPartialWords(partialWords: bool)``` | Return words' information in a partialResult event (default: false) |
+| ```setWords(words: bool)``` | Return words' information in a result event (default: false) |
+| ```setNLSML(nlsml: bool)``` | Return result and partialResult in NLSML form (default: false) |
+| ```setMaxAlternatives(alts: int)``` | Set the max number of alternatives for result event (default: false) |
+| ```setGrm(grm: string)``` | Add grammar to the recognizer (default: none) |
+| ```setSpkModel(mdl: SpkModel)``` | Set the speaker model of the recognizer (default: none) |
+
+| Event | Description |
+|---|---|
+| ```partialResult``` | There is a partial recognition result, check the event's ```detail``` property |
+| ```result``` | There is a full recognition result, check the event's ```detail``` property |
+
+# User agent notes
+## SharedArrayBuffer
+Browser-recognizer requires SharedArrayBuffer to share data between threads, so these response headers must be set:
+- ***Cross-Origin-Embedder-Policy*** ---> ***require-corp***
+- ***Cross-Origin-Opener-Policy*** ---> ***same-origin***
+
+If you can't set them, you may use a HACKY workaround at *src/addCOI.js*.
+
+## Origin Private Filesystem (OPFS)
+Browser-recognizer needs the OPFS backend of Emscripten's WASMFS to store its model. IDBFS was considered but dropped because there is no direct way to read from IDBFS in C++ without copying to MEMFS (basically RAM). For safety, always:
+- Wrap ```window.loadBR()``` in a try/catch to check for OPFS availability.
+- Check via ```navigator.storage.estimate()``` that there is enough space for TWICE THE MODEL SIZE before calling ```Module.makeModel```
+
+# Compilation
+Changing any option to a non-default value requires recompilation
+```
+git clone --depth=1 https://github.com/msqr1/Browser-recognizer &&
+cd Browser-recognizer &&
+[Options] make
+```
+| Option | Description | Default value |
+|---|---|---|
+| MAX_MEMORY | Set max memory, valid suffixes: kb, mb, gb, tb or none (bytes) | ```300mb```, as [recommended](https://alphacephei.com/vosk/models) |
+| MAX_THREADS | Set the max number of threads (minimum 2) | ```2``` (1 OPFS thread + 1 model/recognizer thread) |
+| COMPILE_JOBS | Set the number of jobs (threads) when compiling | ```$(nproc)``` |
+| EMSDK | Set EMSDK's path (will install EMSDK in root folder if unset) | ```../emsdk``` |
\ No newline at end of file
diff --git a/examples/en-model.tgz b/usage/en-model.tgz
similarity index 100%
rename from examples/en-model.tgz
rename to usage/en-model.tgz
diff --git a/examples/example.wav b/usage/example.wav
similarity index 100%
rename from examples/example.wav
rename to usage/example.wav
diff --git a/examples/fromMic.html b/usage/fromMic.html
similarity index 100%
rename from examples/fromMic.html
rename to usage/fromMic.html
diff --git a/examples/fromWav.html b/usage/fromWav.html
similarity index 100%
rename from examples/fromWav.html
rename to usage/fromWav.html
diff --git a/examples/withSpkModel.html b/usage/withSpkModel.html
similarity index 100%
rename from examples/withSpkModel.html
rename to usage/withSpkModel.html
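
Note on the OPFS guidance in usage/README.md: the check for "twice the model size" can be sketched in a few lines of caller-side JavaScript. This is only an illustration and is not part of the patch. The helper names `hasRoomForModel` and `canStoreModel` are hypothetical, and it assumes the caller already knows the model size in bytes. The only browser API it relies on is `navigator.storage.estimate()`, which resolves to an object with `quota` and `usage` fields.

```javascript
// Hypothetical helpers, not part of Browser-recognizer.

// `estimate` has the shape returned by navigator.storage.estimate():
// { quota, usage }, both in bytes. Missing fields are treated as 0.
function hasRoomForModel(estimate, modelBytes) {
  const free = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  // The README asks for twice the model size to be available.
  return free >= 2 * modelBytes;
}

// `storage` defaults to navigator.storage when it exists; passing it in
// explicitly keeps the function usable outside a browser.
async function canStoreModel(modelBytes, storage = globalThis.navigator?.storage) {
  if (!storage || typeof storage.estimate !== "function") {
    return false; // Storage API unavailable, so assume there is no room
  }
  return hasRoomForModel(await storage.estimate(), modelBytes);
}
```

A caller would await `canStoreModel(sizeInBytes)` and only then call `Module.makeModel(...)`. Returning `false` when the Storage API is missing is a conservative choice of this sketch, not something the project prescribes.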