Change base structure to use errors instead

This commit is contained in:
msqr1
2024-01-20 00:41:38 -08:00
parent 91a21271d5
commit 2a426b983c
19 changed files with 147 additions and 198 deletions

View File

@@ -1,68 +1,34 @@
# Browser-recognizer # Browser-recognizer
- A from-microphone speech recognizer built on Vosk that can be run in the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser), but built from scratch and no code taken! - A speech recognizer built on Vosk that can be run in the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser), but built from scratch and no code taken!
- Browser-recognizer can run both in the browser main thread and web workers. - Browser-recognizer can run both in the browser main thread and web workers.
## Interface ## Global and all objects' common interface
- setLogLevel: set Kaldi's log level (default: -1) | Function signature (global) | Description |
- -2: Error |---|---|
- -1: Warning | ```Promise makeModel(url, path, id)```<br>```Promise makeSpkModel(url, storepath, id)``` | - If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.<br>- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. |
- 0: Info | ```setLogLevel(level)``` | Set Vosk's log level (default: -1)<br>-2: Error<br>-1: Warning<br>0: Info<br>1: Verbose<br>2: More verbose<br>3: Debug |
- 1: Verbose | ```deleteAll()``` | Call ```delete()``` on all objects. It is recommended to call this at the end of the program to clean up automatically. See [here](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run).|
- 2: More verbose
- 3: Debug
### Model and SpkModel
```
model = new Model()
spkModel = new SpkModel()
// Add events listeners
model.init(url, storepath, id)
spkModel.init(url, storepath, id)
```
#### Functions
- ***constructor***: Construct the EventTarget part to enable addEventListener
- ***init*** : Initialize the internal object with an URL, storage path, and an ID.
- If **storepath** contains valid model files and **id** is the same, there will not be a fetch from **url**.
- If **storepath** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**.
- ***delete***: Delete self and free resources
#### Events
- ***ready***: The model is ready to be put into a recognizer via the constructor, or setSpkModel() for SpkModel.
- ***error***: An error occurred, check the event's "details" property.
### Recognizer
```
recognizer = new Recognizer()
// Add event listeners
recognizer.init(model)
```
#### Functions
- ***constructor***: Construct the EventTarget part to enable addEventListener
- ***init***: Construct the real internal object from a model
- ***start***: Start recognizing
- ***stop***: Stop recognizing
- ***setWords***: Return words' information in a result event (default: false)
- ***setPartialWords***: Return words' information in a partialResult event (default: false)
- ***setNLSML***: Return result and partialResult in NLSML form (default: false)
- ***setMaxAlternatives***: Set the max number of alternatives for result event (default: false)
- ***setGrm***: Add grammar to the recognizer (default: none)
- ***setSpkModel***: Set the speaker model of the recognizer (default: none)
- ***delete***: Call stop, delete self and free all resource
#### Events
- ***partialResult***: There is a partial recognition result, check the event's "details" property
- ***result***: There is a full recognition result, check the event's "details" property
- ***error***: An error occurred, check the event's "details" property.
## Other key points
### IMPORTANT
- You MUST call delete() on objects at the end of its usage. Or put:
``` | Function signature (all objects) | Description
__genericObj__.objects.forEach(obj => obj.delete()) |---|---|
``` | ```delete()``` | Delete this Object
at the end of your program to automatically do that. We have to do this because Emscripten doesn't call destructors. See [here](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run). ## ```Recognizer``` object
- To be safe, always handle the API through events by adding all event listener before calling init() on objects. | Function signature | Description |
- Always call init on the regular Model object before calling init on the recognizer. SpkModel can be init and set later. |---|---|
### Guarantees | ```setPartialWords(partialWords)``` | Return words' information in a partialResult event (default: false) |
- If an error occurs (error event is fired), no changes are made, and no other dependent events will fire. For example, if an error occurs while loading the model, the "ready" event won't fire in order to prevent executing code on a nonexistent model. | ```setWords(words)``` | Return words' information in a result event (default: false) |
### Limitations compared to vosk-browser: | ```setNLSML(nlsml)``` | Return result and partialResult in NLSML form (default: false) |
- Microphone only | ```setMaxAlternatives(alts)``` | Set the max number of alternatives for result event (default: 0) |
- Fixed memory size at 300MB, changing it require recompilation | ```setGrm(grm)``` | Add grammar to the recognizer (default: none) |
| ```setSpkModel(spkmodel)``` | Set the speaker model of the recognizer (default: none) |
| Event | Description |
|---|---|
| ```partialResult``` | There is a partial recognition result, check the event's "details" property |
| ```result``` | There is a full recognition result, check the event's "details" property |
| ```error``` | A recognition error occurred, check the event's "details" property |
## Other key points
- If an error occurs, no changes are made.
- Fixed memory size at 300MB, changing it requires recompilation (because the use of pthread will lead)
### Additions to vosk-browser: ### Additions to vosk-browser:
- Multiple models support - Multiple models support
- Speaker model (SpkModel) support - Speaker model (SpkModel) support
@@ -79,6 +45,11 @@ recognizer.init(model)
<script src="BrowserRecognizer.js"></script> <script src="BrowserRecognizer.js"></script>
<!--> <!-->
<script> <script>
// Choose a nice, non-conflicting name for the module
const BrRec = await loadBR()
// Prepare
const model = BrRec.makeModel()
const spkmodel = BrRec.
</script> </script>
``` ```
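The fetch rule described for `makeModel` and `makeSpkModel` in the README table above can be sketched as a small predicate. This is only an illustration of the stated rule: `needsFetch` and its parameters are hypothetical names and not part of BrowserRecognizer, and the real check happens in the C++ side of the library.

```javascript
// Hypothetical helper (not part of the library): mirrors the README rule for
// when makeModel/makeSpkModel fetch from the URL.
//   hasValidFiles: the storage path already holds valid model files
//   storedId:      the id saved alongside those files (if any)
//   requestedId:   the id passed by the caller
function needsFetch(hasValidFiles, storedId, requestedId) {
  // Reuse the stored model only when the files are valid AND the id matches.
  return !(hasValidFiles && storedId === requestedId);
}

console.log(needsFetch(true, "v1", "v1"));       // valid files, same id
console.log(needsFetch(true, "v1", "v2"));       // valid files, different id
console.log(needsFetch(false, undefined, "v1")); // nothing usable stored
```

Under this reading, changing the id is the way to force a re-download of a model that is already cached.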

View File

@@ -1,11 +1,7 @@
<!DOCTYPE html> <!DOCTYPE html>
<html> <html>
<head> <head>
<script src="BrowserRecognizer.js" defer></script> <script src="BrowserRecognizer.js" type="module"></script>
<script type="module" src="src/genericObj.js" defer></script>
<script src="src/model.js" defer></script>
<script src="src/spkModel.js" defer></script>
<script src="src/recognizer.js" defer></script>
</head> </head>
<script> <script>

View File

@@ -63,4 +63,4 @@ em++ -pthread -O3 -flto -I. -I$KALDI/src -I$OPENFST/include $VOSK_FILES -c &&
emar -rcs vosk.a ${VOSK_FILES//.cc/.o} && emar -rcs vosk.a ${VOSK_FILES//.cc/.o} &&
cd $SRC cd $SRC
em++ -O3 genericObj.cc genericModel.cc model.cc spkModel.cc recognizer.cc bindings.cc -sWASMFS -sWASM_BIGINT -sSUPPORT_BIG_ENDIAN -sSINGLE_FILE -sMODULARIZE -sASYNCIFY -sEXPORT_NAME=loadBR -sENVIRONMENT=web,worker -sINITIAL_MEMORY=300mb -sPTHREAD_POOL_SIZE=2 -pthread -flto -I. -I$LIBARCHIVE/include -I$VOSK/src -L$LIBARCHIVE/lib -larchive -L$ZSTD/lib -lzstd -L$KALDI/src -l:online2/kaldi-online2.a -l:decoder/kaldi-decoder.a -l:ivector/kaldi-ivector.a -l:gmm/kaldi-gmm.a -l:tree/kaldi-tree.a -l:feat/kaldi-feat.a -l:cudamatrix/kaldi-cudamatrix.a -l:lat/kaldi-lat.a -l:lm/kaldi-lm.a -l:rnnlm/kaldi-rnnlm.a -l:hmm/kaldi-hmm.a -l:nnet3/kaldi-nnet3.a -l:transform/kaldi-transform.a -l:matrix/kaldi-matrix.a -l:fstext/kaldi-fstext.a -l:util/kaldi-util.a -l:base/kaldi-base.a -L$OPENFST/lib -l:libfst.a -l:libfstngram.a -L$CLAPACK_WASM -l:CBLAS/lib/cblas.a -l:CLAPACK-3.2.1/lapack.a -l:CLAPACK-3.2.1/libcblaswr.a -l:f2c_BLAS-3.8.0/blas.a -l:libf2c/libf2c.a -L$VOSK/src -l:vosk.a -lopfs.js -lembind -lopenal -o ../BrowserRecognizer.js em++ -O3 genericObj.cc genericModel.cc model.cc spkModel.cc recognizer.cc bindings.cc -sWASMFS -sWASM_BIGINT -sSUPPORT_BIG_ENDIAN -sSINGLE_FILE -sMODULARIZE -sEXPORT_ES6 -sASYNCIFY -sEXPORT_NAME=loadBR -sENVIRONMENT=web,worker -sINITIAL_MEMORY=300mb -sPTHREAD_POOL_SIZE=2 --pre-js pre.js --extern-post-js post.js -pthread -flto -I. 
-I$LIBARCHIVE/include -I$VOSK/src -L$LIBARCHIVE/lib -larchive -L$ZSTD/lib -lzstd -L$KALDI/src -l:online2/kaldi-online2.a -l:decoder/kaldi-decoder.a -l:ivector/kaldi-ivector.a -l:gmm/kaldi-gmm.a -l:tree/kaldi-tree.a -l:feat/kaldi-feat.a -l:cudamatrix/kaldi-cudamatrix.a -l:lat/kaldi-lat.a -l:lm/kaldi-lm.a -l:rnnlm/kaldi-rnnlm.a -l:hmm/kaldi-hmm.a -l:nnet3/kaldi-nnet3.a -l:transform/kaldi-transform.a -l:matrix/kaldi-matrix.a -l:fstext/kaldi-fstext.a -l:util/kaldi-util.a -l:base/kaldi-base.a -L$OPENFST/lib -l:libfst.a -l:libfstngram.a -L$CLAPACK_WASM -l:CBLAS/lib/cblas.a -l:CLAPACK-3.2.1/lapack.a -l:CLAPACK-3.2.1/libcblaswr.a -l:f2c_BLAS-3.8.0/blas.a -l:libf2c/libf2c.a -L$VOSK/src -l:vosk.a -lopfs.js -lembind -lopenal -o ../BrowserRecognizer.js

View File

@@ -11,20 +11,19 @@ int main() {
} }
EMSCRIPTEN_BINDINGS() { EMSCRIPTEN_BINDINGS() {
function("setLogLevel", &vosk_set_log_level, allow_raw_pointers()); function("setLogLevel", &vosk_set_log_level, allow_raw_pointers());
class_<model>("__model__") class_<model>("model")
.constructor<std::string, std::string, std::string, int>(allow_raw_pointers()); .constructor<std::string, std::string, std::string, int>(allow_raw_pointers());
class_<spkModel>("__spkModel__") class_<spkModel>("spkModel")
.constructor<std::string, std::string, std::string, const int>(allow_raw_pointers()); .constructor<std::string, std::string, std::string, int>(allow_raw_pointers());
class_<recognizer>("__recognizer__") class_<recognizer>("recognizer")
.constructor<model*, int, int>(allow_raw_pointers()) .constructor<model*, int, int>(allow_raw_pointers())
.function("start", &recognizer::start, allow_raw_pointers())
.function("stop", &recognizer::stop, allow_raw_pointers())
.function("setWords", &recognizer::setWords, allow_raw_pointers()) .function("setWords", &recognizer::setWords, allow_raw_pointers())
.function("setPartialWords", &recognizer::setPartialWords, allow_raw_pointers()) .function("setPartialWords", &recognizer::setPartialWords, allow_raw_pointers())
.function("setGrm", &recognizer::setGrm, allow_raw_pointers()) .function("setGrm", &recognizer::setGrm, allow_raw_pointers())
.function("setNLSML", &recognizer::setNLSML, allow_raw_pointers()) .function("setNLSML", &recognizer::setNLSML, allow_raw_pointers())
.function("setSpkModel", &recognizer::setSpkModel, allow_raw_pointers()) .function("setSpkModel", &recognizer::setSpkModel, allow_raw_pointers())
.function("setMaxAlternatives", &recognizer::setMaxAlternatives, allow_raw_pointers()); .function("setMaxAlternatives", &recognizer::setMaxAlternatives, allow_raw_pointers())
.function("acceptWaveForm", &recognizer::acceptWaveForm, allow_raw_pointers());
}; };

View File

@@ -1,6 +1,6 @@
#include "genericModel.h" #include "genericModel.h"
genericModel::genericModel(const std::string &url, const std::string& storepath, const std::string &id, int index) : url(url), id(id), genericObj(index) { genericModel::genericModel(const std::string &url, const std::string& storepath, const std::string &id) : url(url), id(id) {
fs::current_path("/opfs"); fs::current_path("/opfs");
fs::create_directories(storepath); fs::create_directories(storepath);
fs::current_path(storepath); fs::current_path(storepath);
@@ -21,23 +21,23 @@ bool genericModel::loadModel(const std::string& storepath) {
char filename[] {"/opfs/XXXXXX.tzst"}; char filename[] {"/opfs/XXXXXX.tzst"};
close(mkostemps(filename, 5, O_PATH)); close(mkostemps(filename, 5, O_PATH));
if(emscripten_wget(url.c_str(),filename) == 1) { if(emscripten_wget(url.c_str(),filename) == 1) {
fireEv("error", "Unable to fetch model"); throwErr("Unable to fetch model");
return false; return false;
} }
if(!extractModel(filename)) { if(!extractModel(filename)) {
fireEv("error", "Unable to extract model"); throwErr("Unable to extract model");
return false; return false;
} }
fs::remove(filename); fs::remove(filename);
if(!checkModel()) { if(!checkModel()) {
fireEv("error", "Model URL contains invalid model files"); throwErr("Model URL contains invalid model files");
fs::current_path("/opfs"); fs::current_path("/opfs");
fs::remove_all(storepath); fs::remove_all(storepath);
return false; return false;
} }
std::ofstream idFile("id"); std::ofstream idFile("id");
if(!idFile.is_open()) { if(!idFile.is_open()) {
fireEv("error", "Unable to write new id"); throwErr("Unable to write new id");
fs::remove_all(storepath); fs::remove_all(storepath);
return false; return false;
} }

View File

@@ -15,12 +15,12 @@
namespace fs = std::filesystem; namespace fs = std::filesystem;
struct genericModel : genericObj { struct genericModel {
const std::string url{}; const std::string url{};
const std::string id{}; const std::string id{};
static bool extractModel(char *name); static bool extractModel(char *name);
static bool checkId(const std::string& id); static bool checkId(const std::string& id);
virtual bool checkModel() = 0; virtual bool checkModel() = 0;
bool loadModel(const std::string& storepath); bool loadModel(const std::string& storepath);
genericModel(const std::string &url, const std::string &storepath, const std::string &id, int index); genericModel(const std::string &url, const std::string &storepath, const std::string &id);
}; };

View File

@@ -1,11 +0,0 @@
#include "genericObj.h"
void genericObj::fireEv(const char *type, const char *content) {
EM_ASM({
if($0 === 0) {
__genericObj__.objects[$0].dispatchEvent(new Event(UTF8ToString($1)));
return;
}
__genericObj__.objects[$0].dispatchEvent(new CustomEvent(UTF8ToString($1), {"details" : UTF8ToString($2)}));
},this->index, type, content);
}

View File

@@ -2,11 +2,11 @@
#include <emscripten.h> #include <emscripten.h>
#include <emscripten/console.h> #include <emscripten/console.h>
void throwErr(const char* msg) {
struct genericObj { EM_ASM({
const int index{}; throw Error(UTF8ToString($0))
genericObj(int index) : index(index) {}; },msg);
void fireEv(const char *type, const char *content = nullptr); }
};

View File

@@ -1,2 +0,0 @@
class __genericObj__ {static objects = []}
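The static `objects` registry removed here is replaced in this commit by a module-level array plus `deleteAll()` in `src/pre.js`. A minimal sketch of that pattern follows; `Handle` is a hypothetical stand-in for an Embind-backed object, and clearing the array after deletion is an extra step added in this sketch, not something the commit does.

```javascript
// Module-level registry of every object that must be freed explicitly.
const objs = [];

// Hypothetical stand-in for an Embind object exposing delete().
class Handle {
  constructor(name) {
    this.name = name;
    this.deleted = false;
    objs.push(this); // register on construction
  }
  delete() {
    this.deleted = true; // a real handle would free WebAssembly memory here
  }
}

// Free everything that was registered, then drop the references so that a
// second call does not delete the same handle twice (assumption of this sketch).
function deleteAll() {
  objs.forEach(obj => obj.delete());
  objs.length = 0;
}

const a = new Handle("model");
const b = new Handle("recognizer");
deleteAll();
console.log(a.deleted, b.deleted, objs.length);
```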

View File

@@ -1,13 +1,12 @@
#include "model.h" #include "model.h"
model::model(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, id, storepath, index) { model::model(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, storepath, id) {
if(!loadModel(storepath)) return; if(!loadModel(storepath)) return;
mdl = vosk_model_new("."); mdl = vosk_model_new(".");
if(mdl == nullptr) { if(mdl == nullptr) {
fireEv("error", "Unable to initialize model"); throwErr("Unable to initialize model");
return; return;
} }
fireEv("ready");
}; };
model::~model() { model::~model() {
vosk_model_free(mdl); vosk_model_free(mdl);

View File

@@ -1,12 +0,0 @@
class Model extends EventTarget{
constructor() {
super()
}
init(url, storepath, id) {
this.obj = new BrowserRecognizer.__model__(url, storepath, id, __genericObj__.objects.length);
__genericObj__.objects.push(this)
}
delete() {
this.obj.delete()
}
}

1
src/post.js Normal file
View File

@@ -0,0 +1 @@
// globalThis instead of window so this also works inside web workers
globalThis.loadBR = loadBR

62
src/pre.js Normal file
View File

@@ -0,0 +1,62 @@
var objs = []
class recognizer extends EventTarget {
constructor(rec) {
super()
this.obj = rec
objs.push(this)
}
delete() {
this.obj.delete()
}
setWords(words) {
this.obj.setWords(words)
}
setPartialWords(partialWords) {
this.obj.setPartialWords(partialWords)
}
setGrm(grm) {
this.obj.setGrm(grm)
}
setSpkModel(model) {
this.obj.setSpkModel(model.obj)
}
setNLSML(nlsml) {
this.obj.setNLSML(nlsml)
}
setMaxAlternatives(alts) {
this.obj.setMaxAlternatives(alts)
}
}
Module.deleteAll = () => objs.forEach(obj => obj.delete())
Module.makeModel = async (url, path, id) => {
let mdl
try {
mdl = new Module.model(url, path, id)
objs.push(mdl)
}
catch(e) {
return Promise.reject(e.message)
}
return mdl
}
Module.makeSpkModel = async (url, path, id) => {
let mdl
try {
mdl = new Module.spkModel(url, path, id)
objs.push(mdl)
}
catch(e) {
return Promise.reject(e.message)
}
return mdl
}
Module.makeRecognizer = async (model, sampleRate) => {
let rec
try {
rec = new recognizer(new Module.recognizer(model, sampleRate, objs.length))
}
catch(e) {
return Promise.reject(e.message)
}
return rec
}
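The `make*` wrappers above turn a constructor that throws into a rejected promise. A self-contained sketch of that pattern is shown here, assuming the error raised by `throwErr` reaches the JavaScript caller of the constructor; `FakeModel` and `makeFakeModel` are hypothetical stand-ins, since the real Embind classes need the compiled module.

```javascript
// Hypothetical stand-in for an Embind class whose C++ constructor may call throwErr().
class FakeModel {
  constructor(url) {
    if (!url) throw new Error("Unable to fetch model");
    this.url = url;
  }
}

// Same shape as the wrappers in pre.js: construct inside try/catch and
// reject with the error message when construction fails.
async function makeFakeModel(url) {
  try {
    return new FakeModel(url);
  } catch (e) {
    return Promise.reject(e.message);
  }
}

makeFakeModel("")
  .catch(msg => console.log("rejected:", msg));
makeFakeModel("https://example.com/model.tzst")
  .then(mdl => console.log("loaded:", mdl.url));
```

Callers can therefore use `await` with `try`/`catch` instead of registering `ready` and `error` listeners before construction.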

View File

@@ -1,33 +1,30 @@
#include "./recognizer.h" #include "./recognizer.h"
void recognizer::start() { recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
controller.test_and_set(std::memory_order_relaxed); rec = vosk_recognizer_new(mdl->mdl,sampleRate);
controller.notify_all();
}
void recognizer::stop() {
controller.clear(std::memory_order_relaxed);
controller.notify_all();
}
recognizer::recognizer(model* mdl, int sampleRate, int index) : genericObj(index) {
mic = alcCaptureOpenDevice("Emscripten OpenAL capture",sampleRate, AL_FORMAT_MONO16, 22480);
if(alcGetError(mic) != 0) {
fireEv("error", "Unable to initialize microphone");
return;
}
rec = vosk_recognizer_new(mdl->mdl,static_cast<float>(sampleRate));
if(rec == nullptr) { if(rec == nullptr) {
fireEv("error", "Unable to construct recognizer"); throwErr("Unable to initialize recognizer");
return; return;
} }
main(); }
void recognizer::fireEv(const char *type, const char *content) {
EM_ASM({
objs[$0].dispatchEvent(new CustomEvent(UTF8ToString($1), {"details" : UTF8ToString($2)}));
},this->index, type, content);
} }
recognizer::~recognizer() { recognizer::~recognizer() {
done.test_and_set(std::memory_order_relaxed);
done.notify_all();
stop();
vosk_recognizer_free(rec); vosk_recognizer_free(rec);
alcCaptureCloseDevice(mic);
} }
void recognizer::acceptWaveForm() { void recognizer::acceptWaveForm(float* data, int len) {
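switch(vosk_recognizer_accept_waveform_f(rec, data, len)) {
// 1: silence detected, a full result is ready
case 1:
fireEv("result", vosk_recognizer_result(rec));
break;
// 0: decoding continues, only a partial result is available
case 0:
fireEv("partialResult", vosk_recognizer_partial_result(rec));
break;
// -1: an exception occurred inside Vosk
default:
fireEv("_error", "Recognition error, unable to recognize");
}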
} }
void recognizer::setGrm(const std::string& grm) { void recognizer::setGrm(const std::string& grm) {
vosk_recognizer_set_grm(rec, grm.c_str()); vosk_recognizer_set_grm(rec, grm.c_str());
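For reference, the Vosk C API documents the return value of the accept-waveform functions as 1 when an utterance has ended and a full result can be retrieved, 0 while decoding continues, and -1 on an exception. The JavaScript sketch below shows how such a code could map to the `result`, `partialResult` and `error` events listed in the README. `eventFor` and `DetailEvent` are hypothetical names; the sketch attaches the payload under `detail`, which is the property name the standard `CustomEvent` uses.

```javascript
// Minimal event with a payload; a plain Event subclass is used here so the
// sketch does not depend on a global CustomEvent being available.
class DetailEvent extends Event {
  constructor(type, detail) {
    super(type);
    this.detail = detail;
  }
}

// Hypothetical mapping from a Vosk accept-waveform return code to an event name.
function eventFor(code) {
  switch (code) {
    case 1: return "result";        // utterance finished, full result ready
    case 0: return "partialResult"; // still decoding
    default: return "error";        // -1 or anything unexpected
  }
}

const target = new EventTarget();
const seen = [];
for (const type of ["result", "partialResult", "error"]) {
  target.addEventListener(type, ev => seen.push(`${ev.type}:${ev.detail}`));
}

// Simulated sequence of return codes with made-up payloads.
target.dispatchEvent(new DetailEvent(eventFor(0), '{"partial":"hello"}'));
target.dispatchEvent(new DetailEvent(eventFor(1), '{"text":"hello world"}'));
target.dispatchEvent(new DetailEvent(eventFor(-1), "Recognition error"));
console.log(seen);
```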

View File

@@ -16,14 +16,13 @@
#include <archive_entry.h> #include <archive_entry.h>
namespace fs = std::filesystem; namespace fs = std::filesystem;
struct recognizer : genericObj { struct recognizer {
int index{};
VoskRecognizer* rec{}; VoskRecognizer* rec{};
ALCdevice* mic{}; void acceptWaveForm(float* data, int len);
void acceptWaveForm(); recognizer(model* model, float sampleRate, int index);
recognizer(model* model, int sampleRate, int index);
~recognizer(); ~recognizer();
void start(); void fireEv(const char* type, const char* content);
void stop();
void setSpkModel(spkModel* model); void setSpkModel(spkModel* model);
void setGrm(const std::string& grm); void setGrm(const std::string& grm);
void setWords(bool words); void setWords(bool words);

View File

@@ -1,38 +0,0 @@
class Recognizer extends EventTarget {
constructor() {
super()
}
init(model) {
ctx = new (AudioContext || webkitAudioContext)()
new BrowserRecognizer.__recognizer__(model.obj,ctx.sampleRate,__genericObj__.objects.length)
ctx.close()
__genericObj__.objects.push(this)
}
start() {
this.obj.start()
}
stop() {
this.obj.stop()
}
delete() {
this.obj.delete()
}
setWords(words) {
this.obj.setWords(words)
}
setPartialWords(partialWords) {
this.obj.setPartialWords(partialWords)
}
setGrm(grm) {
this.obj.setGrm(grm)
}
setSpkModel(model) {
this.obj.setSpkModel(model.obj)
}
setNLSML(nlsml) {
this.obj.setNLSML(nlsml)
}
setMaxAlternatives(alts) {
this.obj.setMaxAlternatives(alts)
}
}

View File

@@ -1,11 +1,11 @@
#include "spkModel.h" #include "spkModel.h"
spkModel::spkModel(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, storepath, id, index) { spkModel::spkModel(const std::string &url, const std::string& storepath, const std::string& id) : genericModel(url, storepath, id) {
if(!loadModel(storepath)) return; if(!loadModel(storepath)) return;
mdl = vosk_spk_model_new("."); mdl = vosk_spk_model_new(".");
if(mdl == nullptr) { if(mdl == nullptr) {
fireEv("error", "Unable to initialize speaker model"); throwErr("Unable to initialize speaker model");
return;
} }
fireEv("ready");
}; };
spkModel::~spkModel() { spkModel::~spkModel() {
vosk_spk_model_free(mdl); vosk_spk_model_free(mdl);

View File

@@ -4,7 +4,7 @@
struct spkModel : genericModel { struct spkModel : genericModel {
bool checkModel(); bool checkModel();
VoskSpkModel* mdl{}; VoskSpkModel* mdl{};
spkModel(const std::string &url, const std::string& storepath, const std::string& id, const int index); spkModel(const std::string &url, const std::string& storepath, const std::string& id);
~spkModel(); ~spkModel();
}; };

View File

@@ -1,12 +0,0 @@
class SpkModel extends EventTarget{
constructor() {
super()
}
init(url, storepath, id) {
this.obj = new BrowserRecognizer.__spkModel__(url, storepath, id, __genericObj__.objects.length)
__genericObj__.objects.push(this)
}
delete() {
this.obj.delete()
}
}