Change base structure to use errors instead

msqr1
2024-01-20 00:41:38 -08:00
parent 91a21271d5
commit 2a426b983c
19 changed files with 147 additions and 198 deletions


@@ -1,68 +1,34 @@
# Browser-recognizer
- A from-microphone speech recognizer built on Vosk that runs in the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser) but built from scratch, with no code taken from it.
- A speech recognizer built on Vosk that runs in the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser) but built from scratch, with no code taken from it.
- Browser-recognizer can run both in the browser main thread and web workers.
## Interface
- setLogLevel: set Kaldi's log level (default: -1)
- -2: Error
- -1: Warning
- 0: Info
- 1: Verbose
- 2: More verbose
- 3: Debug
### Model and SpkModel
```
model = new Model()
spkModel = new SpkModel()
// Add events listeners
model.init(url, storepath, id)
spkModel.init(url, storepath, id)
```
#### Functions
- ***constructor***: Construct the EventTarget part to enable addEventListener
- ***init*** : Initialize the internal object with a URL, a storage path, and an ID.
- If **storepath** contains valid model files and **id** is the same, there will not be a fetch from **url**.
- If **storepath** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**.
- ***delete***: Delete self and free resources
#### Events
- ***ready***: The model is ready to be put into a recognizer via the constructor, or setSpkModel() for SpkModel.
- ***error***: An error occurred, check the event's "details" property.
### Recognizer
```
recognizer = new Recognizer()
// Add event listeners
recognizer.init(model)
```
#### Functions
- ***constructor***: Construct the EventTarget part to enable addEventListener
- ***init***: Construct the real internal object from a model
- ***start***: Start recognizing
- ***stop***: Stop recognizing
- ***setWords***: Return words' information in a result event (default: false)
- ***setPartialWords***: Return words' information in a partialResult event (default: false)
- ***setNLSML***: Return result and partialResult in NLSML form (default: false)
- ***setMaxAlternatives***: Set the maximum number of alternatives for the result event (default: 0)
- ***setGrm***: Add grammar to the recognizer (default: none)
- ***setSpkModel***: Set the speaker model of the recognizer (default: none)
- ***delete***: Call stop, delete self and free all resources
#### Events
- ***partialResult***: There is a partial recognition result, check the event's "details" property
- ***result***: There is a full recognition result, check the event's "details" property
- ***error***: An error occurred, check the event's "details" property.
## Other key points
### IMPORTANT
- You MUST call delete() on each object once you are done with it. Alternatively, put:
## Global and all objects' common interface
| Function signature (global) | Description |
|---|---|
| ```Promise makeModel(url, path, id)```<br>```Promise makeSpkModel(url, path, id)``` | - If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.<br>- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. |
| ```setLogLevel(level)``` | Set Vosk's log level (default: -1)<br>-2: Error<br>-1: Warning<br>0: Info<br>1: Verbose<br>2: More verbose<br>3: Debug |
| ```deleteAll()``` | Call ```delete()``` on all objects. It is recommended to put this at the end of the program to clean up automatically. See [here](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run). |
```
__genericObj__.objects.forEach(obj => obj.delete())
```
at the end of your program to automatically do that. We have to do this because Emscripten doesn't call destructors. See [here](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run).
- To be safe, always handle the API through events by adding all event listeners before calling init() on objects.
- Always call init() on the regular Model object before calling init() on the recognizer. A SpkModel can be initialized and set later.
### Guarantees
- If an error occurs (the error event is fired), no changes are made and no other dependent events will fire. For example, if an error occurs while loading the model, the "ready" event won't fire, which prevents code from running on a nonexistent model.
### Limitations compared to vosk-browser:
- Microphone only
- Fixed memory size at 300MB; changing it requires recompilation
| Function signature (all objects) | Description |
|---|---|
| ```delete()``` | Delete this object |
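
The cleanup contract described above (every object exposes `delete()`, and `deleteAll()` calls it on everything created so far) can be sketched with a plain registry. This is a minimal illustration, not the library's code: `Handle`, `registry`, and the `freed` flag are hypothetical stand-ins for the Emscripten-backed objects, and clearing the registry after deletion is an assumption made here so that a second `deleteAll()` call is harmless.

```javascript
// Hypothetical registry of live objects; stands in for the library's internal list.
const registry = []

// Hypothetical stand-in for a model or recognizer handle.
class Handle {
  constructor(name) {
    this.name = name
    this.freed = false
    registry.push(this) // remember the object so deleteAll() can reach it
  }
  delete() {
    this.freed = true // the real object would release its WebAssembly memory here
  }
}

// Delete every registered object, then forget them (assumption: not in the source).
function deleteAll() {
  registry.forEach(obj => obj.delete())
  registry.length = 0
}

const a = new Handle("model")
const b = new Handle("recognizer")
deleteAll()
console.log(a.freed, b.freed, registry.length) // true true 0
```

Emptying the registry is a design choice of this sketch; whether the real `deleteAll()` tolerates being called twice is not stated in the source.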
## ```Recognizer``` object
| Function signature | Description |
|---|---|
| ```setPartialWords(partialWords)``` | Return words' information in a partialResult event (default: false) |
| ```setWords(words)``` | Return words' information in a result event (default: false) |
| ```setNLSML(nlsml)``` | Return result and partialResult in NLSML form (default: false) |
| ```setMaxAlternatives(alts)``` | Set the maximum number of alternatives for the result event (default: 0) |
| ```setGrm(grm)``` | Add grammar to the recognizer (default: none) |
| ```setSpkModel(spkmodel)``` | Set the speaker model of the recognizer (default: none) |
| Event | Description |
|---|---|
| ```partialResult``` | There is a partial recognition result, check the event's "detail" property |
| ```result``` | There is a full recognition result, check the event's "detail" property |
| ```error``` | A recognition error occurred, check the event's "detail" property |
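
A listener for these events might look like the sketch below. It assumes the payload is the JSON string Vosk produces (`{"text": ...}` for a final result, `{"partial": ...}` for a partial one) and that it arrives on the standard `detail` field that `CustomEvent` defines. `DetailEvent` and the bare `EventTarget` are hypothetical stand-ins so the snippet runs without the library.

```javascript
// Hypothetical stand-in for CustomEvent so the sketch also runs outside a browser.
class DetailEvent extends Event {
  constructor(type, detail) {
    super(type)
    this.detail = detail
  }
}

// Stand-in for a recognizer object; the real wrapper also extends EventTarget.
const recognizer = new EventTarget()
const finals = []
let latestPartial = ""

recognizer.addEventListener("partialResult", ev => {
  latestPartial = JSON.parse(ev.detail).partial
})
recognizer.addEventListener("result", ev => {
  finals.push(JSON.parse(ev.detail).text)
  latestPartial = "" // a final result supersedes the pending partial
})
recognizer.addEventListener("error", ev => console.error(ev.detail))

// Simulate what a recognizer would dispatch while someone speaks.
recognizer.dispatchEvent(new DetailEvent("partialResult", '{"partial": "hello wor"}'))
recognizer.dispatchEvent(new DetailEvent("result", '{"text": "hello world"}'))
console.log(finals) // [ 'hello world' ]
```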
## Other key points
- If an error occurs, no changes are made.
- Fixed memory size at 300MB; changing it requires recompilation (because the use of pthread will lead)
### Additions to vosk-browser:
- Multiple models support
- Speaker model (SpkModel) support
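
The fetch rule documented for `makeModel` and `makeSpkModel` reduces to a single condition. The helper below is a hypothetical illustration of that rule only (the name `needsFetch`, its parameters, and the example ids are not part of the library): a download happens unless the path already holds valid model files stored under the same id.

```javascript
// Hypothetical helper: decides whether the model must be downloaded again.
//   hasValidFiles: whether the storage path already contains valid model files
//   storedId:      the id saved alongside those files (undefined if none)
//   requestedId:   the id passed by the caller
function needsFetch(hasValidFiles, storedId, requestedId) {
  return !hasValidFiles || storedId !== requestedId
}

console.log(needsFetch(true, "en-small-1", "en-small-1")) // false: the stored copy is reused
console.log(needsFetch(true, "en-small-1", "en-small-2")) // true: the id changed, fetch again
console.log(needsFetch(false, undefined, "en-small-1"))   // true: nothing valid is stored
```

In practice this means changing **id** is the way to force a refresh of a model that is already stored.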
@@ -79,6 +45,11 @@ recognizer.init(model)
<script src="BrowserRecognizer.js"></script>
<!-->
<script>
// Choose a nice, non-conflicting name for the module
const BrRec = await loadBR()
// Prepare
const model = BrRec.makeModel()
const spkmodel = BrRec.
</script>
```


@@ -1,11 +1,7 @@
<!DOCTYPE html>
<html>
<head>
<script src="BrowserRecognizer.js" defer></script>
<script type="module" src="src/genericObj.js" defer></script>
<script src="src/model.js" defer></script>
<script src="src/spkModel.js" defer></script>
<script src="src/recognizer.js" defer></script>
<script src="BrowserRecognizer.js" type="module"></script>
</head>
<script>


@@ -63,4 +63,4 @@ em++ -pthread -O3 -flto -I. -I$KALDI/src -I$OPENFST/include $VOSK_FILES -c &&
emar -rcs vosk.a ${VOSK_FILES//.cc/.o} &&
cd $SRC
em++ -O3 genericObj.cc genericModel.cc model.cc spkModel.cc recognizer.cc bindings.cc -sWASMFS -sWASM_BIGINT -sSUPPORT_BIG_ENDIAN -sSINGLE_FILE -sMODULARIZE -sASYNCIFY -sEXPORT_NAME=loadBR -sENVIRONMENT=web,worker -sINITIAL_MEMORY=300mb -sPTHREAD_POOL_SIZE=2 -pthread -flto -I. -I$LIBARCHIVE/include -I$VOSK/src -L$LIBARCHIVE/lib -larchive -L$ZSTD/lib -lzstd -L$KALDI/src -l:online2/kaldi-online2.a -l:decoder/kaldi-decoder.a -l:ivector/kaldi-ivector.a -l:gmm/kaldi-gmm.a -l:tree/kaldi-tree.a -l:feat/kaldi-feat.a -l:cudamatrix/kaldi-cudamatrix.a -l:lat/kaldi-lat.a -l:lm/kaldi-lm.a -l:rnnlm/kaldi-rnnlm.a -l:hmm/kaldi-hmm.a -l:nnet3/kaldi-nnet3.a -l:transform/kaldi-transform.a -l:matrix/kaldi-matrix.a -l:fstext/kaldi-fstext.a -l:util/kaldi-util.a -l:base/kaldi-base.a -L$OPENFST/lib -l:libfst.a -l:libfstngram.a -L$CLAPACK_WASM -l:CBLAS/lib/cblas.a -l:CLAPACK-3.2.1/lapack.a -l:CLAPACK-3.2.1/libcblaswr.a -l:f2c_BLAS-3.8.0/blas.a -l:libf2c/libf2c.a -L$VOSK/src -l:vosk.a -lopfs.js -lembind -lopenal -o ../BrowserRecognizer.js
em++ -O3 genericObj.cc genericModel.cc model.cc spkModel.cc recognizer.cc bindings.cc -sWASMFS -sWASM_BIGINT -sSUPPORT_BIG_ENDIAN -sSINGLE_FILE -sMODULARIZE -sEXPORT_ES6 -sASYNCIFY -sEXPORT_NAME=loadBR -sENVIRONMENT=web,worker -sINITIAL_MEMORY=300mb -sPTHREAD_POOL_SIZE=2 --pre-js pre.js --extern-post-js post.js -pthread -flto -I. -I$LIBARCHIVE/include -I$VOSK/src -L$LIBARCHIVE/lib -larchive -L$ZSTD/lib -lzstd -L$KALDI/src -l:online2/kaldi-online2.a -l:decoder/kaldi-decoder.a -l:ivector/kaldi-ivector.a -l:gmm/kaldi-gmm.a -l:tree/kaldi-tree.a -l:feat/kaldi-feat.a -l:cudamatrix/kaldi-cudamatrix.a -l:lat/kaldi-lat.a -l:lm/kaldi-lm.a -l:rnnlm/kaldi-rnnlm.a -l:hmm/kaldi-hmm.a -l:nnet3/kaldi-nnet3.a -l:transform/kaldi-transform.a -l:matrix/kaldi-matrix.a -l:fstext/kaldi-fstext.a -l:util/kaldi-util.a -l:base/kaldi-base.a -L$OPENFST/lib -l:libfst.a -l:libfstngram.a -L$CLAPACK_WASM -l:CBLAS/lib/cblas.a -l:CLAPACK-3.2.1/lapack.a -l:CLAPACK-3.2.1/libcblaswr.a -l:f2c_BLAS-3.8.0/blas.a -l:libf2c/libf2c.a -L$VOSK/src -l:vosk.a -lopfs.js -lembind -lopenal -o ../BrowserRecognizer.js


@@ -11,20 +11,19 @@ int main() {
}
EMSCRIPTEN_BINDINGS() {
function("setLogLevel", &vosk_set_log_level, allow_raw_pointers());
class_<model>("__model__")
class_<model>("model")
.constructor<std::string, std::string, std::string, int>(allow_raw_pointers());
class_<spkModel>("__spkModel__")
.constructor<std::string, std::string, std::string, const int>(allow_raw_pointers());
class_<spkModel>("spkModel")
.constructor<std::string, std::string, std::string, int>(allow_raw_pointers());
class_<recognizer>("__recognizer__")
class_<recognizer>("recognizer")
.constructor<model*, int, int>(allow_raw_pointers())
.function("start", &recognizer::start, allow_raw_pointers())
.function("stop", &recognizer::stop, allow_raw_pointers())
.function("setWords", &recognizer::setWords, allow_raw_pointers())
.function("setPartialWords", &recognizer::setPartialWords, allow_raw_pointers())
.function("setGrm", &recognizer::setGrm, allow_raw_pointers())
.function("setNLSML", &recognizer::setNLSML, allow_raw_pointers())
.function("setSpkModel", &recognizer::setSpkModel, allow_raw_pointers())
.function("setMaxAlternatives", &recognizer::setMaxAlternatives, allow_raw_pointers());
.function("setMaxAlternatives", &recognizer::setMaxAlternatives, allow_raw_pointers())
.function("acceptWaveForm", &recognizer::acceptWaveForm, allow_raw_pointers());
};


@@ -1,6 +1,6 @@
#include "genericModel.h"
genericModel::genericModel(const std::string &url, const std::string& storepath, const std::string &id, int index) : url(url), id(id), genericObj(index) {
genericModel::genericModel(const std::string &url, const std::string& storepath, const std::string &id) : url(url), id(id) {
fs::current_path("/opfs");
fs::create_directories(storepath);
fs::current_path(storepath);
@@ -21,23 +21,23 @@ bool genericModel::loadModel(const std::string& storepath) {
char filename[] {"/opfs/XXXXXX.tzst"};
close(mkostemps(filename, 5, O_PATH));
if(emscripten_wget(url.c_str(),filename) == 1) {
fireEv("error", "Unable to fetch model");
throwErr("Unable to fetch model");
return false;
}
if(!extractModel(filename)) {
fireEv("error", "Unable to extract model");
throwErr("Unable to extract model");
return false;
}
fs::remove(filename);
if(!checkModel()) {
fireEv("error", "Model URL contains invalid model files");
throwErr("Model URL contains invalid model files");
fs::current_path("/opfs");
fs::remove_all(storepath);
return false;
}
std::ofstream idFile("id");
if(!idFile.is_open()) {
fireEv("error", "Unable to write new id");
throwErr("Unable to write new id");
fs::remove_all(storepath);
return false;
}


@@ -15,12 +15,12 @@
namespace fs = std::filesystem;
struct genericModel : genericObj {
struct genericModel {
const std::string url{};
const std::string id{};
static bool extractModel(char *name);
static bool checkId(const std::string& id);
virtual bool checkModel() = 0;
bool loadModel(const std::string& storepath);
genericModel(const std::string &url, const std::string &storepath, const std::string &id, int index);
genericModel(const std::string &url, const std::string &storepath, const std::string &id);
};


@@ -1,11 +0,0 @@
#include "genericObj.h"
void genericObj::fireEv(const char *type, const char *content) {
EM_ASM({
if($0 === 0) {
__genericObj__.objects[$0].dispatchEvent(new Event(UTF8ToString($1)));
return;
}
__genericObj__.objects[$0].dispatchEvent(new CustomEvent(UTF8ToString($1), {"details" : UTF8ToString($2)}));
},this->index, type, content);
}


@@ -2,11 +2,11 @@
#include <emscripten.h>
#include <emscripten/console.h>
struct genericObj {
const int index{};
genericObj(int index) : index(index) {};
void fireEv(const char *type, const char *content = nullptr);
};
// inline: this header is included by several translation units
inline void throwErr(const char* msg) {
EM_ASM({
throw Error(UTF8ToString($0))
},msg);
}


@@ -1,2 +0,0 @@
class __genericObj__ {static objects = []}


@@ -1,13 +1,12 @@
#include "model.h"
model::model(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, id, storepath, index) {
model::model(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, storepath, id) {
if(!loadModel(storepath)) return;
mdl = vosk_model_new(".");
if(mdl == nullptr) {
fireEv("error", "Unable to initialize model");
throwErr("Unable to initialize model");
return;
}
fireEv("ready");
};
model::~model() {
vosk_model_free(mdl);


@@ -1,12 +0,0 @@
class Model extends EventTarget{
constructor() {
super()
}
init(url, storepath, id) {
this.obj = new BrowserRecognizer.__model__(url, storepath, id, __genericObj__.objects.length);
__genericObj__.objects.push(this)
}
delete() {
this.obj.delete()
}
}

1
src/post.js Normal file

@@ -0,0 +1 @@
window.loadBR = loadBR

62
src/pre.js Normal file

@@ -0,0 +1,62 @@
var objs = []
class recognizer extends EventTarget {
constructor(rec) {
super()
this.obj = rec
objs.push(this)
}
delete() {
this.obj.delete()
}
setWords(words) {
this.obj.setWords(words)
}
setPartialWords(partialWords) {
this.obj.setPartialWords(partialWords)
}
setGrm(grm) {
this.obj.setGrm(grm)
}
setSpkModel(model) {
this.obj.setSpkModel(model.obj)
}
setNLSML(nlsml) {
this.obj.setNLSML(nlsml)
}
setMaxAlternatives(alts) {
this.obj.setMaxAlternatives(alts)
}
}
Module.deleteAll = () => objs.forEach(obj => obj.delete())
Module.makeModel = async (url, path, id) => {
let mdl
try {
mdl = new Module.model(url, path, id)
objs.push(mdl)
}
catch(e) {
return Promise.reject(e.message)
}
return mdl
}
Module.makeSpkModel = async (url, path, id) => {
let mdl
try {
mdl = new Module.spkModel(url, path, id)
objs.push(mdl)
}
catch(e) {
return Promise.reject(e.message)
}
return mdl
}
Module.makeRecognizer = async (model, sampleRate) => {
let rec
try {
rec = new recognizer(new Module.recognizer(model, sampleRate, objs.length))
}
catch(e) {
return Promise.reject(e.message)
}
return rec
}
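
The three factories above share one pattern: run a constructor that may throw, and surface the failure as a rejected promise carrying the error message. The sketch below isolates that pattern. `makeWith`, `FakeModel`, and the example URL are hypothetical names used only for illustration, not part of this commit.

```javascript
// Hypothetical helper: builds an object and turns a thrown Error into a rejection
// whose reason is the error message (a string), as the factories above do.
async function makeWith(build) {
  try {
    return build()
  } catch (e) {
    return Promise.reject(e.message)
  }
}

// Hypothetical constructor that fails the way a model load might.
class FakeModel {
  constructor(url) {
    if (!url) throw Error("Unable to fetch model")
    this.url = url
  }
}

// Success path: the promise resolves with the constructed object.
makeWith(() => new FakeModel("https://example.com/model.tzst"))
  .then(mdl => console.log("loaded", mdl.url))

// Failure path: the promise rejects with the message string.
makeWith(() => new FakeModel(""))
  .catch(msg => console.log("failed:", msg))
```

Because the rejection reason is a plain string rather than an `Error`, a caller using `await` inside `try`/`catch` receives the message directly and has no stack trace to inspect.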


@@ -1,33 +1,30 @@
#include "./recognizer.h"
void recognizer::start() {
controller.test_and_set(std::memory_order_relaxed);
controller.notify_all();
}
void recognizer::stop() {
controller.clear(std::memory_order_relaxed);
controller.notify_all();
}
recognizer::recognizer(model* mdl, int sampleRate, int index) : genericObj(index) {
mic = alcCaptureOpenDevice("Emscripten OpenAL capture",sampleRate, AL_FORMAT_MONO16, 22480);
if(alcGetError(mic) != 0) {
fireEv("error", "Unable to initialize microphone");
return;
}
rec = vosk_recognizer_new(mdl->mdl,static_cast<float>(sampleRate));
recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
rec = vosk_recognizer_new(mdl->mdl,sampleRate);
if(rec == nullptr) {
fireEv("error", "Unable to construct recognizer");
throwErr("Unable to initialize recognizer");
return;
}
main();
}
void recognizer::fireEv(const char *type, const char *content) {
EM_ASM({
objs[$0].dispatchEvent(new CustomEvent(UTF8ToString($1), {"detail" : UTF8ToString($2)}));
},this->index, type, content);
}
recognizer::~recognizer() {
done.test_and_set(std::memory_order_relaxed);
done.notify_all();
stop();
vosk_recognizer_free(rec);
alcCaptureCloseDevice(mic);
}
void recognizer::acceptWaveForm() {
void recognizer::acceptWaveForm(float* data, int len) {
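switch(vosk_recognizer_accept_waveform_f(rec, data, len)) {
case 0: // decoding continues: only a partial result is available
fireEv("partialResult", vosk_recognizer_partial_result(rec));
break;
case 1: // end of utterance: a full result is available
fireEv("result", vosk_recognizer_result(rec));
break;
default: // -1: an exception occurred inside Vosk
fireEv("_error", "Recognition error, unable to recognize");
}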
}
void recognizer::setGrm(const std::string& grm) {
vosk_recognizer_set_grm(rec, grm.c_str());


@@ -16,14 +16,13 @@
#include <archive_entry.h>
namespace fs = std::filesystem;
struct recognizer : genericObj {
struct recognizer {
int index{};
VoskRecognizer* rec{};
ALCdevice* mic{};
void acceptWaveForm();
recognizer(model* model, int sampleRate, int index);
void acceptWaveForm(float* data, int len);
recognizer(model* model, float sampleRate, int index);
~recognizer();
void start();
void stop();
void fireEv(const char* type, const char* content);
void setSpkModel(spkModel* model);
void setGrm(const std::string& grm);
void setWords(bool words);


@@ -1,38 +0,0 @@
class Recognizer extends EventTarget {
constructor() {
super()
}
init(model) {
ctx = new (AudioContext || webkitAudioContext)()
new BrowserRecognizer.__recognizer__(model.obj,ctx.sampleRate,__genericObj__.objects.length)
ctx.close()
__genericObj__.objects.push(this)
}
start() {
this.obj.start()
}
stop() {
this.obj.stop()
}
delete() {
this.obj.delete()
}
setWords(words) {
this.obj.setWords(words)
}
setPartialWords(partialWords) {
this.obj.setPartialWords(partialWords)
}
setGrm(grm) {
this.obj.setGrm(grm)
}
setSpkModel(model) {
this.obj.setSpkModel(model.obj)
}
setNLSML(nlsml) {
this.obj.setNLSML(nlsml)
}
setMaxAlternatives(alts) {
this.obj.setMaxAlternatives(alts)
}
}


@@ -1,11 +1,11 @@
#include "spkModel.h"
spkModel::spkModel(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, storepath, id, index) {
spkModel::spkModel(const std::string &url, const std::string& storepath, const std::string& id) : genericModel(url, storepath, id) {
if(!loadModel(storepath)) return;
mdl = vosk_spk_model_new(".");
if(mdl == nullptr) {
fireEv("error", "Unable to initialize speaker model");
throwErr("Unable to initialize speaker model");
return;
}
fireEv("ready");
};
spkModel::~spkModel() {
vosk_spk_model_free(mdl);


@@ -4,7 +4,7 @@
struct spkModel : genericModel {
bool checkModel();
VoskSpkModel* mdl{};
spkModel(const std::string &url, const std::string& storepath, const std::string& id, const int index);
spkModel(const std::string &url, const std::string& storepath, const std::string& id);
~spkModel();
};


@@ -1,12 +0,0 @@
class SpkModel extends EventTarget{
constructor() {
super()
}
init(url, storepath, id) {
this.obj = new BrowserRecognizer.__spkModel__(url, storepath, id, __genericObj__.objects.length)
__genericObj__.objects.push(this)
}
delete() {
this.obj.delete()
}
}