Important changes

msqr1
2024-02-07 10:25:02 -08:00
parent 7951917c63
commit 76c5dbb130
15 changed files with 130 additions and 120 deletions

@@ -1,7 +1,7 @@
# Overview # Overview
- A speech recognizer built on Vosk that can be run on the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser), but built from scratch and no code taken! - A speech recognizer built on Vosk that can be run on the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser), but built from scratch and no code taken!
- Designed with strong exception safety - Designed with strong exception safety
- See the examples folder for ways to use the API - See the usage folder's README.md for API documentation and important notes
- See the devel folder for the newest build (not guaranteed to work) and the JS build script - See the devel folder for the newest build (not guaranteed to work) and the JS build script
# Additions to vosk-browser: # Additions to vosk-browser:
@@ -11,67 +11,5 @@
- Smaller JS size - Smaller JS size
- Doesn't need another file when using AudioWorkletNode - Doesn't need another file when using AudioWorkletNode
# User agent notes # Basic usage
## SharedArrayBuffer
Browser-recognizer require SharedArrayBuffer to share thread's data, so these response headers must be set:
- ***Cross-Origin-Embedder-Policy*** ---> ***require-corp***
- ***Cross-Origin-Opener-Policy*** ---> ***same-origin***
If you can't set them, you may use a HACKY workaround at *src/addCOI.js*.
## Origin Private Filesystem (OPFS)
Browser-recognizer needs the Emscripten WASMFS' OPFS to store its model, IDBFS was considered, but dropped because there is no direct way to read from IDBFS to C++ without copying to MEMFS (basically RAM). For safety with this, always:
- Try catch ```window.loadBR()``` to to check for OPFS availability.
- Check if there is enough space via ```navigator.storage.estimate()``` for TWICE THE MODEL SIZE before calling Module.makeModel
# API interface
## JS ```window``` object
| Function signature | Description |
|---|---|
|```Promise<Module> loadBR()``` | Load Emscripten's Module |
## Shared interface
| Function signature | Description |
|---|---|
| ```delete()``` | Delete this object, see [why](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run) this is neccessary.
## ```Module``` object
| Function signature | Description |
|---|---|
| ```Promise<Model> makeModel(path: string, url: string, id: string)```<br><br>```Promise<SpkModel> makeSpkModel(path: string, url: string, id: string)``` | Make a ```Model``` or ```SpkModel```<br>- If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.<br>- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. Model files must be directly under the model root folder, and compressed model must be in .tgz format. |
| ```Promise<Recognizer> makeRecognizer(model: Model, sampleRate: float)``` | Make a ```Recognizer```, it will use **model**'s thread if it's the first user of **model**, else it will use a new thread.
| ```setLogLevel(lvl: int)``` | Set Vosk's log level (default: ```0```: Info) <br>```-2```: Error<br>```-1```: Warning<br>```1```: Verbose<br>```2```: More verbose<br>```3```: Debug |
| ```revokeURLs()``` | Revoke the Blob URLs of pthread worker and worklet processor |
| ```cleanUp()``` | A convenience function that call ```revokeURLs()``` and ```delete()``` on all objects. You should put this at the end of your program! |
## ```Recognizer``` object
| Function signature | Description |
|---|---|
| ```Promise<AudioWorkletNode> getNode(ctx: AudioContext, channelIndex = 0: int)``` | Get a pass-through node that recognize audio and is connectable to a processing graph. It has 1 input and 1 output, **channelIndex** must point to a 16-bit mono channel of the input |
| ```recognize(buf: AudioBuffer, channelIndex = 0: int)``` | Recognize an AudioBuffer, usually from something like ```BaseAudioContext.decodeAudioData()```, **channelIndex** must point to a 16-bit mono channel of **buf**
| ```setPartialWords(partialWords: bool)``` | Return words' information in a partialResult event (default: false) |
| ```setWords(words: bool)``` | Return words' information in a result event (default: false) |
| ```setNLSML(nlsml: bool)``` | Return result and partialResult in NLSML form (default: false) |
| ```setMaxAlternatives(alts: int)``` | Set the max number of alternatives for result event (default: false) |
| ```setGrm(grm: string)``` | Add grammar to the recognizer (default: none) |
| ```setSpkModel(mdl: SpkModel)``` | Set the speaker model of the recognizer (default: none) |
| Event | Description |
|---|---|
| ```partialResult``` | There is a partial recognition result, check the event's "details" property |
| ```result``` | There is a full recognition result, check the event's "details" property |
# Compilation
Changing any option to non-default values requires recompilation
```
git clone --depth=1 https://github.com/msqr1/Browser-recognizer &&
cd Browser-recognizer &&
[Options] make
```
| Option | Description | Default value |
|---|---|---|
| MAX_MEMORY | Set max memory, valid suffixes: kb, mb, gb, tb or none (bytes) | ```300mb```, as [recommended](https://alphacephei.com/vosk/models) |
| MAX_THREADS | Set the max number of thread (2 min) | ```2``` (1 OPFS thread + 1 model/recognizer thread) |
| COMPILE_JOBS | Set the number of jobs (threads) when compiling | ```$(nproc)``` |
| EMSDK | Set EMSDK's path (will install EMSDK in root folder if unset) | ```../emsdk``` |

@@ -1,18 +1,22 @@
#include "genericModel.h" #include "genericModel.h"
genericModel::genericModel(const std::string& storepath, const std::string &id, int index) : storepath(storepath), id(id), index(index) { genericModel::genericModel(const std::string& storepath, const std::string &id, int index) : storepath(storepath), id(id), index(index) {
if(!OPFSOk) {
fireEv("_continue", "OPFS hasn't been initialized or not available", index);
return;
}
fs::current_path("/opfs", tank); fs::current_path("/opfs", tank);
if(tank.value() != 0) { if(tank.value() != 0) {
throwJS("Unable to cd OPFS root"); fireEv("_continue","Unable to cd OPFS root",index);
return; return;
} }
fs::create_directories(storepath, tank); fs::create_directories(storepath, tank);
if(tank.value() != 0) { if(tank.value() != 0) {
throwJS("Unable to create storepath"); fireEv("_continue","Unable to create storepath", index);
} }
fs::current_path(storepath, tank); fs::current_path(storepath, tank);
if(tank.value() != 0) { if(tank.value() != 0) {
throwJS("Unable to cd storepath"); fireEv("_continue", "Unable to cd storepath", index);
} }
} }
bool genericModel::checkModel() { bool genericModel::checkModel() {
@@ -27,7 +31,7 @@ bool genericModel::checkModel() {
return id.compare(oldid) == 0 ? true : false; return id.compare(oldid) == 0 ? true : false;
} }
void genericModel::afterFetch() { void genericModel::afterFetch() {
thrd.setTask1([this](){ thrd.addTask([this](){
if(!extractModel()) { if(!extractModel()) {
fs::remove("/opfs/m0dEl.tar",tank); fs::remove("/opfs/m0dEl.tar",tank);
fs::current_path("/opfs", tank); fs::current_path("/opfs", tank);
@@ -39,8 +43,8 @@ void genericModel::afterFetch() {
fs::remove("README",tank); fs::remove("README",tank);
std::ofstream idFile("id"); std::ofstream idFile("id");
if(!idFile.is_open()) { if(!idFile.is_open()) {
fs::current_path("/opfs"); fs::current_path("/opfs", tank);
fs::remove_all(storepath); fs::remove_all(storepath, tank);
fireEv("_continue", "Unable to write model ID", index); fireEv("_continue", "Unable to write model ID", index);
return; return;
} }

@@ -14,7 +14,8 @@ namespace fs = std::filesystem;
struct genericModel { struct genericModel {
const std::string storepath{}; const std::string storepath{};
const std::string id{}; const std::string id{};
twiceThrd thrd{}; reusableThrd thrd{};
bool recognizerUsedThrd{};
int index{}; int index{};
static bool extractModel(); static bool extractModel();
virtual bool checkModelFiles() = 0; virtual bool checkModelFiles() = 0;

@@ -1,13 +1,5 @@
#include "global.h" #include "global.h"
void throwJS(const char* msg, bool err) {
EM_ASM({
if($1) {
throw Error(UTF8ToString($0));
return;
}
throw UTF8ToString($0);
},msg, err);
}
void fireEv(const char *type, const char *content, int index) { void fireEv(const char *type, const char *content, int index) {
static ProxyingQueue pq{}; static ProxyingQueue pq{};
auto proxy{[index, type, content](){ auto proxy{[index, type, content](){
@@ -23,22 +15,26 @@ void fireEv(const char *type, const char *content, int index) {
} }
int main() { int main() {
std::thread t{[](){ std::thread t{[](){
wasmfs_create_directory("/opfs", 0777, wasmfs_create_opfs_backend()); OPFSOk = (wasmfs_create_directory("/opfs", 0777, wasmfs_create_opfs_backend()) == 0 ? true : false);
}}; }};
t.detach(); t.detach();
emscripten_exit_with_live_runtime(); emscripten_exit_with_live_runtime();
} }
void twiceThrd::setTask1(std::function<void()> task1) { ProxyingQueue reusableThrd::pq{};
blocker.lock(); reusableThrd::reusableThrd() {
std::thread t{[this, task1](){ thrd = std::thread{[this](){
task1(); while(!done.test()) {
blocker.lock(); static ProxyingQueue pq{};
task2(); pq.execute();
blocker.wait(done.test(), std::memory_order_relaxed);
}
}}; }};
t.detach(); thrd.detach();
} }
void twiceThrd::setTask2(std::function<void()> task2) { void reusableThrd::addTask(std::function<void()> task) {
this->task2 = task2; pq.proxyAsync(thrd.native_handle(), std::move(task));
blocker.unlock(); }
reusable = false; reusableThrd::~reusableThrd() {
done.test_and_set(std::memory_order_relaxed);
done.notify_one();
} }

@@ -10,14 +10,16 @@ using namespace emscripten;
static pthread_t selfTID{pthread_self()}; static pthread_t selfTID{pthread_self()};
static std::error_code tank{}; static std::error_code tank{};
void throwJS(const char* msg, bool err = false); static bool OPFSOk{};
void fireEv(const char *type, const char *content, int index); void fireEv(const char *type, const char *content, int index);
int main(); int main();
struct twiceThrd { // A minimal std::thread wrapper to run exactly 2 tasks struct reusableThrd { // A minimal std::thread wrapper to run exactly 2 tasks
bool reusable{true}; static ProxyingQueue pq;
std::mutex blocker{}; std::thread thrd;
std::function<void()> task2{}; std::atomic_flag blocker{};
void setTask1(std::function<void()> task1); std::atomic_flag done{};
void setTask2(std::function<void()> task2); reusableThrd();
void addTask(std::function<void()> task);
~reusableThrd();
}; };

@@ -23,7 +23,7 @@ void model::load(bool newThrd) {
main(); main();
return; return;
} }
thrd.setTask1(main); thrd.addTask(main);
} }
bool model::checkModelFiles() { bool model::checkModelFiles() {
return fs::exists("am/final.mdl", tank) && return fs::exists("am/final.mdl", tank) &&

@@ -11,11 +11,7 @@ Module.cleanUp = () => {
class Recognizer extends EventTarget { class Recognizer extends EventTarget {
constructor() { constructor() {
super() super()
}
_init(model, sampleRate) {
this.obj = new Module.recognizer(model, sampleRate, objs.length)
objs.push(this) objs.push(this)
this.ptr = Module._malloc(512)
} }
async getNode(ctx, channelIndex = 0) { async getNode(ctx, channelIndex = 0) {
if(typeof this.node === "undefined") { if(typeof this.node === "undefined") {
@@ -59,9 +55,8 @@ class Recognizer extends EventTarget {
} }
} }
class Model extends EventTarget { class Model extends EventTarget {
constructor(storepath, id) { constructor(d) {
super() super()
this.obj = new Module.model(storepath, id, objs.length)
objs.push(this) objs.push(this)
} }
delete() { delete() {
@@ -69,9 +64,8 @@ class Model extends EventTarget {
} }
} }
class SpkModel extends EventTarget { class SpkModel extends EventTarget {
constructor(storepath, id) { constructor() {
super() super()
this.obj = new Module.spkModel(storepath, id, objs.length)
objs.push(this) objs.push(this)
} }
delete() { delete() {
@@ -79,7 +73,7 @@ class SpkModel extends EventTarget {
} }
} }
Module.makeModel = async (url, storepath, id) => { Module.makeModel = async (url, storepath, id) => {
let mdl = new Model(storepath, id) let mdl = new Model()
return new Promise((resolve, reject) => { return new Promise((resolve, reject) => {
mdl.addEventListener("_continue", (ev) => { mdl.addEventListener("_continue", (ev) => {
if(ev.detail === ".") { if(ev.detail === ".") {
@@ -88,6 +82,7 @@ Module.makeModel = async (url, storepath, id) => {
mdl.delete() mdl.delete()
return reject(ev.detail) return reject(ev.detail)
}, {once : true}) }, {once : true})
mdl.obj = new Module.model(storepath, id, objs.length)
if(mdl.obj.checkModel()) { if(mdl.obj.checkModel()) {
mdl.obj.load(true) mdl.obj.load(true)
return; return;
@@ -110,7 +105,7 @@ Module.makeModel = async (url, storepath, id) => {
}) })
} }
Module.makeSpkModel = async (url, storepath, id) => { Module.makeSpkModel = async (url, storepath, id) => {
let mdl = new SpkModel(storepath, id) let mdl = new SpkModel()
return new Promise((resolve, reject) => { return new Promise((resolve, reject) => {
mdl.addEventListener("_continue", (ev) => { mdl.addEventListener("_continue", (ev) => {
if(ev.detail === ".") { if(ev.detail === ".") {
@@ -119,6 +114,7 @@ Module.makeSpkModel = async (url, storepath, id) => {
mdl.delete() mdl.delete()
reject(ev.detail) reject(ev.detail)
}, {once : true}) }, {once : true})
mdl.obj = new Module.model(storepath, id, objs.length)
if(mdl.obj.checkModel()) { if(mdl.obj.checkModel()) {
mdl.obj.load(true) mdl.obj.load(true)
return return
@@ -128,16 +124,21 @@ Module.makeSpkModel = async (url, storepath, id) => {
if(!res.ok) { if(!res.ok) {
return reject("Unable to download model") return reject("Unable to download model")
} }
let arr = await res.arrayBuffer() let wStream = await (await (await navigator.storage.getDirectory()).getFileHandle("m0dEl.tar", {create : true})).createWritable()
let mdlMem = Module._malloc(arr.byteLength) // Will free in C++ let tarReader = res.body.pipeThrough(dStream).getReader()
Module.HEAP8.set(new Int8Array(arr), mdlMem) while(true) {
mdl.obj.afterFetch(mdlMem, arr.byteLength) let readRes = await tarReader.read()
if(!readRes.done) await wStream.write(readRes.value)
else break
}
await wStream.close()
mdl.obj.afterFetch()
})() })()
}) })
} }
Module.makeRecognizer = (model, sampleRate) => { Module.makeRecognizer = (model, sampleRate) => {
let rec = new Recognizer() let rec = new Recognizer()
let retval = new Promise((resolve, reject) => { return new Promise((resolve, reject) => {
rec.addEventListener("_continue", (ev) => { rec.addEventListener("_continue", (ev) => {
if(ev.detail == ".") { if(ev.detail == ".") {
objs.push(rec) objs.push(rec)
@@ -146,9 +147,9 @@ Module.makeRecognizer = (model, sampleRate) => {
rec.delete() rec.delete()
reject(ev.detail) reject(ev.detail)
}, {once : true}) }, {once : true})
rec.obj = new Module.recognizer(model, sampleRate, objs.length)
rec.ptr = Module._malloc(512)
}) })
rec._init(model.obj, sampleRate)
return retval
} }
let processorUrl = URL.createObjectURL(new Blob(['(', let processorUrl = URL.createObjectURL(new Blob(['(',
(() => { (() => {

@@ -1,5 +1,9 @@
#include "recognizer.h" #include "recognizer.h"
recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) { recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
if(!OPFSOk) {
fireEv("_continue", "OPFS hasn't been initialized or not available", index);
return;
}
auto main{[this, mdl, sampleRate](){ auto main{[this, mdl, sampleRate](){
rec = vosk_recognizer_new(mdl->mdl,sampleRate); rec = vosk_recognizer_new(mdl->mdl,sampleRate);
if(rec == nullptr) { if(rec == nullptr) {
@@ -21,8 +25,9 @@ recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
} }
} }
}}; }};
if(mdl->thrd.reusable) { if(mdl->recognizerUsedThrd) {
mdl->thrd.setTask2(main); mdl->thrd.addTask(main);
mdl->recognizerUsedThrd = true;
return; return;
} }
std::thread t{main}; std::thread t{main};

@@ -27,7 +27,7 @@ void spkModel::load(bool newThrd) {
main(); main();
return; return;
} }
thrd.setTask1(main); thrd.addTask(main);
} }
bool spkModel::checkModelFiles() { bool spkModel::checkModelFiles() {
return fs::exists("mfcc.conf", tank) && return fs::exists("mfcc.conf", tank) &&

63
usage/README.md Normal file
@@ -0,0 +1,63 @@
# API interface
## JS ```window``` object
| Function signature | Description |
|---|---|
|```Promise<Module> loadBR()``` | Load Emscripten's Module |
## Shared interface
| Function signature | Description |
|---|---|
| ```delete()``` | Delete this object, see [why](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run) this is necessary. |
## ```Module``` object
| Function signature | Description |
|---|---|
| ```Promise<Model> makeModel(url: string, path: string, id: string)```<br><br>```Promise<SpkModel> makeSpkModel(url: string, path: string, id: string)``` | Make a ```Model``` or ```SpkModel```<br>- If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.<br>- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. Model files must be directly under the model root folder, and the compressed model must be in .tgz format. |
| ```Promise<Recognizer> makeRecognizer(model: Model, sampleRate: float)``` | Make a ```Recognizer```. It uses **model**'s thread if it is the first user of **model**; otherwise it uses a new thread. |
| ```setLogLevel(lvl: int)``` | Set Vosk's log level (default: ```0```: Info) <br>```-2```: Error<br>```-1```: Warning<br>```1```: Verbose<br>```2```: More verbose<br>```3```: Debug |
| ```revokeURLs()``` | Revoke the Blob URLs of pthread worker and worklet processor |
| ```cleanUp()``` | A convenience function that calls ```revokeURLs()``` and ```delete()``` on all objects. You should put this at the end of your program! |
## ```Recognizer``` object
| Function signature | Description |
|---|---|
| ```Promise<AudioWorkletNode> getNode(ctx: AudioContext, channelIndex = 0: int)``` | Get a pass-through node that recognizes audio and is connectable to a processing graph. It has 1 input and 1 output. **channelIndex** must point to a 16-bit mono channel of the input |
| ```recognize(buf: AudioBuffer, channelIndex = 0: int)``` | Recognize an AudioBuffer, usually from something like ```BaseAudioContext.decodeAudioData()```. **channelIndex** must point to a 16-bit mono channel of **buf** |
| ```setPartialWords(partialWords: bool)``` | Return words' information in a partialResult event (default: false) |
| ```setWords(words: bool)``` | Return words' information in a result event (default: false) |
| ```setNLSML(nlsml: bool)``` | Return result and partialResult in NLSML form (default: false) |
| ```setMaxAlternatives(alts: int)``` | Set the max number of alternatives for result event (default: 0) |
| ```setGrm(grm: string)``` | Add grammar to the recognizer (default: none) |
| ```setSpkModel(mdl: SpkModel)``` | Set the speaker model of the recognizer (default: none) |

| Event | Description |
|---|---|
| ```partialResult``` | There is a partial recognition result, check the event's ```detail``` property |
| ```result``` | There is a full recognition result, check the event's ```detail``` property |
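
The tables above can be tied together in a short sketch. It assumes that `recognize()` eventually fires a `result` event and that the payload sits on `ev.detail`, as the `_continue` handlers in this commit suggest. The `makeModel` argument order follows the `(url, storepath, id)` definition in the JS glue and should be checked against your build. `recognizeOnce` and its option names are hypothetical, not part of the API.

```javascript
// Hypothetical helper: load the module, recognize one AudioBuffer, then clean up.
// `loadBR` is passed in (normally window.loadBR) so the sketch can be tried with a stub.
async function recognizeOnce({ loadBR, url, path, id, sampleRate, audioBuffer }) {
  const Module = await loadBR()
  try {
    // Assumed order: (url, storepath, id), as in this commit's Module.makeModel.
    const model = await Module.makeModel(url, path, id)
    const rec = await Module.makeRecognizer(model, sampleRate)
    // Listen before calling recognize() so the event cannot be missed.
    const result = new Promise((resolve) => {
      rec.addEventListener("result", (ev) => resolve(ev.detail), { once: true })
    })
    rec.recognize(audioBuffer)
    return await result
  } finally {
    // cleanUp() revokes the Blob URLs and calls delete() on every object.
    Module.cleanUp()
  }
}
```

Putting `cleanUp()` in `finally` means the objects are released even when `makeModel` or `makeRecognizer` rejects.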
# User agent notes
## SharedArrayBuffer
Browser-recognizer requires SharedArrayBuffer to share data between threads, so these response headers must be set:
- ***Cross-Origin-Embedder-Policy*** ---> ***require-corp***
- ***Cross-Origin-Opener-Policy*** ---> ***same-origin***
If you can't set them, you may use a HACKY workaround at *src/addCOI.js*.
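
For a server you control, the two headers can be attached to every response. This is a minimal sketch for Node's built-in `http` module. `applyIsolationHeaders` is a hypothetical helper name, and the commented usage is only one possible setup.

```javascript
// The two response headers required for SharedArrayBuffer, as listed above.
const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "require-corp",
  "Cross-Origin-Opener-Policy": "same-origin",
}

// Hypothetical helper: set both headers on a Node http.ServerResponse-like object.
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value)
  }
  return res
}

// Possible usage (port and handler are placeholders):
// require("node:http").createServer((req, res) => {
//   applyIsolationHeaders(res)
//   res.end("ok")
// }).listen(8080)
```

Once the headers take effect, `self.crossOriginIsolated` should report `true` in the page.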
## Origin Private Filesystem (OPFS)
Browser-recognizer needs Emscripten WASMFS's OPFS backend to store its model. IDBFS was considered but dropped, because there is no direct way to read from IDBFS in C++ without first copying to MEMFS (basically RAM). For safety, always:
- Wrap ```window.loadBR()``` in a try/catch to check for OPFS availability.
- Check via ```navigator.storage.estimate()``` that there is enough space for TWICE THE MODEL SIZE before calling ```Module.makeModel```.
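
The space check above could look roughly like this. `hasRoomForModel` is a hypothetical helper, and the caller is assumed to know the model size in bytes. The values returned by `estimate()` are approximations, so treat the result as a hint rather than a guarantee.

```javascript
// Hypothetical helper: true if the origin's remaining quota can hold twice the model size.
// `storage` defaults to navigator.storage; pass a stub to try it outside a browser.
async function hasRoomForModel(modelBytes, storage = globalThis.navigator?.storage) {
  if (!storage || typeof storage.estimate !== "function") return false
  const { quota = 0, usage = 0 } = await storage.estimate()
  return quota - usage >= 2 * modelBytes
}
```

A caller would check `await hasRoomForModel(sizeInBytes)` before calling `Module.makeModel`, and skip the download when it returns `false`.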
# Compilation
Changing any option to a non-default value requires recompilation:
```
git clone --depth=1 https://github.com/msqr1/Browser-recognizer &&
cd Browser-recognizer &&
[Options] make
```
| Option | Description | Default value |
|---|---|---|
| MAX_MEMORY | Set max memory, valid suffixes: kb, mb, gb, tb or none (bytes) | ```300mb```, as [recommended](https://alphacephei.com/vosk/models) |
| MAX_THREADS | Set the max number of threads (minimum 2) | ```2``` (1 OPFS thread + 1 model/recognizer thread) |
| COMPILE_JOBS | Set the number of jobs (threads) when compiling | ```$(nproc)``` |
| EMSDK | Set EMSDK's path (will install EMSDK in root folder if unset) | ```../emsdk``` |