Fix licence and restructure, prepare to add AudioWorklet

msqr1
2024-01-20 17:37:47 -08:00
parent 6bc5f4061d
commit 6da9a662a1
14 changed files with 94 additions and 104 deletions

25
LICENSE
View File

@@ -174,28 +174,3 @@
 of your accepting any such warranty or additional liability.
 END OF TERMS AND CONDITIONS
-APPENDIX: How to apply the Apache License to your work.
-To apply the Apache License to your work, attach the following
-boilerplate notice, with the fields enclosed by brackets "[]"
-replaced with your own identifying information. (Don't include
-the brackets!) The text should be enclosed in the appropriate
-comment syntax for the file format. We also recommend that a
-file or class name and description of purpose be included on the
-same "printed page" as the copyright notice for easier
-identification within third-party archives.
-Copyright [yyyy] [name of copyright owner]
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-http://www.apache.org/licenses/LICENSE-2.0
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.

15
NOTICE Normal file
View File

@@ -0,0 +1,15 @@
+Browser Recognizer
+Copyright 2024 Rylex Phan
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

View File

@@ -1,12 +1,12 @@
-# Browser-recognizer
+# Browser-recognizer-
 - A speech recognizer built on Vosk that can be run on the browser, inspired by [vosk-browser](https://github.com/ccoreilly/vosk-browser), but built from scratch and no code taken!
 - Browser-recognizer can run both in the browser main thread and web workers.
 ## Global and all objects' common interface
 | Function signature (global) | Description |
 |---|---|
-| ```Promise makeModel(url, path, id)```<br>```Promise makeSpkModel(url, storepath, id)``` | - If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.<br>- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. |
-| ```setLogLevel(level)``` | Set Vosk's log level (default: -1) <br>- 2: Error<br>- 1: Warning<br>- 0: Info <br>- 1: Verbose<br>- 2: More verbose<br>- 3: Debug |
-| ```deleteAll()``` | Call ```delete()``` on all objects, it is recommended to put this at the end of the program to automatically clean up. See [here](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run).|
+| ```Promise makeModel(path: string, url: string, id: string)```<br><br>```Promise makeSpkModel(path: string, url: string, id: string)``` | - If **path** contains valid model files and **id** is the same, there will not be a fetch from **url**.<br>- If **path** doesn't contain valid model files, or if it contains valid model files but **id** is different, there will be a fetch from **url**, and the model is stored with **id**. |
+| ```setLogLevel(lvl: int)``` | Set Vosk's log level (default: -1) <br>- 2: Error<br>- 1: Warning<br>- 0: Info <br>- 1: Verbose<br>- 2: More verbose<br>- 3: Debug |
+| ```deleteAll()``` | Call ```delete()``` on all objects, it is recommended to put this at the end of the program to automatically clean up. See [why](https://emscripten.org/docs/getting_started/FAQ.html#what-does-exiting-the-runtime-mean-why-don-t-atexit-s-run).|
 | Function signature (all objects) | Description
 |---|---|
@@ -14,18 +14,18 @@
 ## ```Recognizer``` object
 | Function signature | Description |
 |---|---|
-| ```setPartialWords(partialWords)``` | Return words' information in a partialResult event (default: false) |
-| ```setWords(words)``` | Return words' information in a result event (default: false) |
-| ```setNLSML(nlsml)``` | Return result and partialResult in NLSML form (default: false) |
-| ```setMaxAlternatives(alts)``` | Set the max number of alternatives for result event (default: false) |
-| ```setGrm(grm)``` | Add grammar to the recognizer (default: none) |
-| ```setSpkModel(spkmodel)``` | Set the speaker model of the recognizer (default: none) |
+| ```processAudio(audio: AudioBuffer)``` | Recognize an audio chunk,
+| ```setPartialWords(partialWords: bool)``` | Return words' information in a partialResult event (default: false) |
+| ```setWords(words: bool)``` | Return words' information in a result event (default: false) |
+| ```setNLSML(nlsml: bool)``` | Return result and partialResult in NLSML form (default: false) |
+| ```setMaxAlternatives(alts: int)``` | Set the max number of alternatives for result event (default: false) |
+| ```setGrm(grm: string)``` | Add grammar to the recognizer (default: none) |
+| ```setSpkModel(mdl: spkmodel)``` | Set the speaker model of the recognizer (default: none) |
 | Event | Description |
 |---|---|
 | ```partialResult``` | There is a partial recognition result, check the event's "details" property |
 | ```result``` | There is a full recognition result, check the event's "details" property |
-| ```error``` | An recognition occurred, check the event's "details" property |
 ## Other key points
 - If an error occurs, no changes was made.
 - Fixed memory size at 300MB, changing it require recompilation (because the use of pthread will lead)
@@ -45,21 +45,21 @@
 <script src="BrowserRecognizer.js" type="module"></script>
 <!-->
 <script>
 // Select name
 const BrRec = await loadBR()
 // Prepare
 const model = await BrRec.makeModel(")
 const recognizer = await BrRec.makeRecognizer(model)
 recognizer.addEventListener("result", e => {
 console.log("Result: ",e.details)
-}
+})
 recognizer.addEventListener("partialResult", e => {
 console.log("Partial result: ",e.details)
-}
+})
 // Recognize
 media = await navigator.mediaDevices.getUserMedia({
 video: false,
 audio: {
 echoCancellation: true,
@@ -67,7 +67,7 @@ media = await navigator.mediaDevices.getUserMedia({
 channelCount: 1,
 sampleRate: 16000
 },
 });
 </script>
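The README example reads `e.details`, and `fireEv()` in the C++ code builds `new CustomEvent(type, {"details": ...})`. Per the DOM specification, the `CustomEvent` init dictionary member is `detail` (singular): an unknown key such as `details` is ignored, so listeners would see `e.detail === null` and `e.details === undefined`. A minimal sketch, assuming a recognizer-like `EventTarget` (the `FakeRecognizer` class and the payload strings are hypothetical, not the library's code), of dispatching and reading the payload through `detail`:

```javascript
// Use the built-in CustomEvent when present; otherwise a tiny stand-in that
// follows the DOM rule: the payload lives in init.detail and defaults to null.
const CustomEventImpl = globalThis.CustomEvent ?? class extends Event {
  constructor(type, init = {}) {
    super(type, init);
    this.detail = init.detail ?? null;
  }
};

// Hypothetical recognizer-like object; fire() plays the role of fireEv() in C++.
class FakeRecognizer extends EventTarget {
  fire(type, payload) {
    this.dispatchEvent(new CustomEventImpl(type, { detail: payload }));
  }
}

const rec = new FakeRecognizer();
const seen = [];
rec.addEventListener("partialResult", e => seen.push(["partialResult", e.detail]));
rec.addEventListener("result", e => seen.push(["result", JSON.parse(e.detail).text]));

// Example payloads only; real output is whatever JSON string Vosk produces.
rec.fire("partialResult", '{"partial": "hello"}');
rec.fire("result", '{"text": "hello world"}');

// A misspelled init key is silently dropped, which is why `details` never arrives.
const dropped = new CustomEventImpl("result", { details: "lost" });
console.log(seen, dropped.detail); // dropped.detail is null
```

Keeping the payload a JSON string (as `fireEv()` does) leaves parsing to the listener, which can pick the fields it needs.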

View File

@@ -62,5 +62,5 @@ VOSK_FILES="recognizer.cc language_model.cc model.cc spk_model.cc vosk_api.cc" &
 em++ -pthread -O3 -flto -I. -I$KALDI/src -I$OPENFST/include $VOSK_FILES -c &&
 emar -rcs vosk.a ${VOSK_FILES//.cc/.o} &&
-cd $SRC
-em++ -O3 genericObj.cc genericModel.cc model.cc spkModel.cc recognizer.cc bindings.cc -sWASMFS -sWASM_BIGINT -sSUPPORT_BIG_ENDIAN -sSINGLE_FILE -sMODULARIZE -sEXPORT_ES6 -sASYNCIFY -sEXPORT_NAME=loadBR -sENVIRONMENT=web,worker -sINITIAL_MEMORY=300mb -sPTHREAD_POOL_SIZE=2 --pre-js pre.js --extern-post-js post.js -pthread -flto -I. -I$LIBARCHIVE/include -I$VOSK/src -L$LIBARCHIVE/lib -larchive -L$ZSTD/lib -lzstd -L$KALDI/src -l:online2/kaldi-online2.a -l:decoder/kaldi-decoder.a -l:ivector/kaldi-ivector.a -l:gmm/kaldi-gmm.a -l:tree/kaldi-tree.a -l:feat/kaldi-feat.a -l:cudamatrix/kaldi-cudamatrix.a -l:lat/kaldi-lat.a -l:lm/kaldi-lm.a -l:rnnlm/kaldi-rnnlm.a -l:hmm/kaldi-hmm.a -l:nnet3/kaldi-nnet3.a -l:transform/kaldi-transform.a -l:matrix/kaldi-matrix.a -l:fstext/kaldi-fstext.a -l:util/kaldi-util.a -l:base/kaldi-base.a -L$OPENFST/lib -l:libfst.a -l:libfstngram.a -L$CLAPACK_WASM -l:CBLAS/lib/cblas.a -l:CLAPACK-3.2.1/lapack.a -l:CLAPACK-3.2.1/libcblaswr.a -l:f2c_BLAS-3.8.0/blas.a -l:libf2c/libf2c.a -L$VOSK/src -l:vosk.a -lopfs.js -lembind -lopenal -o ../BrowserRecognizer.js
+cd $SRC &&
+em++ -O3 genericModel.cc model.cc spkModel.cc recognizer.cc bindings.cc -sWASMFS -sWASM_BIGINT -sSUPPORT_BIG_ENDIAN -sSINGLE_FILE -sEMBIND_AOT -sWASM_WORKER -sAUDIO_WORKLET -sEMBIND_STD_STRING_IS_UTF8 -sSUPPORT_LONGJMP=0 -sMODULARIZE -sEXPORT_NAME=loadBR -sEXPORT_ES6 -sENVIRONMENT=web,worker -sINITIAL_MEMORY=300mb -sASYNCIFY -sPTHREAD_POOL_SIZE=2 --pre-js pre.js --extern-post-js post.js -pthread -flto -I. -I$LIBARCHIVE/include -I$VOSK/src -L$LIBARCHIVE/lib -larchive -L$ZSTD/lib -lzstd -L$KALDI/src -l:online2/kaldi-online2.a -l:decoder/kaldi-decoder.a -l:ivector/kaldi-ivector.a -l:gmm/kaldi-gmm.a -l:tree/kaldi-tree.a -l:feat/kaldi-feat.a -l:cudamatrix/kaldi-cudamatrix.a -l:lat/kaldi-lat.a -l:lm/kaldi-lm.a -l:rnnlm/kaldi-rnnlm.a -l:hmm/kaldi-hmm.a -l:nnet3/kaldi-nnet3.a -l:transform/kaldi-transform.a -l:matrix/kaldi-matrix.a -l:fstext/kaldi-fstext.a -l:util/kaldi-util.a -l:base/kaldi-base.a -L$OPENFST/lib -l:libfst.a -l:libfstngram.a -L$CLAPACK_WASM -l:CBLAS/lib/cblas.a -l:CLAPACK-3.2.1/lapack.a -l:CLAPACK-3.2.1/libcblaswr.a -l:f2c_BLAS-3.8.0/blas.a -l:libf2c/libf2c.a -L$VOSK/src -l:vosk.a -lopfs.js -lembind -lopenal -o ../BrowserRecognizer.js

View File

@@ -1,7 +1,17 @@
 #include "spkModel.h"
 #include "model.h"
 #include "recognizer.h"
+#include <emscripten/bind.h>
 using namespace emscripten;
+void throwJS(const char* msg, bool err = false) {
+EM_ASM({
+if($1) {
+throw Error(UTF8ToString)
+return
+}
+throw UTF8ToString($0)
+},msg, err);
+}
 int main() {
 //vosk_set_log_level(-1);
 std::thread t{[](){
@@ -12,13 +22,13 @@ int main() {
 EMSCRIPTEN_BINDINGS() {
 function("setLogLevel", &vosk_set_log_level, allow_raw_pointers());
 class_<model>("model")
-.constructor<std::string, std::string, std::string, int>(allow_raw_pointers());
+.constructor<std::string, std::string, std::string>(allow_raw_pointers());
 class_<spkModel>("spkModel")
-.constructor<std::string, std::string, std::string, int>(allow_raw_pointers());
+.constructor<std::string, std::string, std::string>(allow_raw_pointers());
 class_<recognizer>("recognizer")
-.constructor<model*, int, int>(allow_raw_pointers())
+.constructor<model*, float, int>(allow_raw_pointers())
 .function("setWords", &recognizer::setWords, allow_raw_pointers())
 .function("setPartialWords", &recognizer::setPartialWords, allow_raw_pointers())
 .function("setGrm", &recognizer::setGrm, allow_raw_pointers())
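The new `throwJS(msg, err)` helper throws an `Error` object when `err` is true and a bare string otherwise. The difference matters on the JavaScript side: the old `post.js` rejected with `e.message`, which is `undefined` for a thrown string, while the new code rejects with `e` itself. (As committed, the `err` branch is `throw Error(UTF8ToString)`, which passes the function rather than `UTF8ToString($0)`, so its message would be a string form of the function instead of `msg`.) A minimal sketch of the two throwing modes in plain JavaScript; `throwLike()`, `reasonOf()` and `caught()` are hypothetical helpers, not part of the library:

```javascript
// Hypothetical JS analogue of throwJS(msg, err): an Error object when err is
// true, otherwise the bare message string (the C++ default is err = false).
function throwLike(msg, err = false) {
  if (err) throw new Error(msg);
  throw msg;
}

// Read a human-readable reason from either kind of thrown value.
function reasonOf(e) {
  return e instanceof Error ? e.message : String(e);
}

// Return whatever a call throws, so both modes can be compared side by side.
function caught(fn) {
  try {
    fn();
  } catch (e) {
    return e;
  }
  return undefined;
}

const asString = caught(() => throwLike("Unable to fetch model"));
const asError = caught(() => throwLike("acceptWaveForm error (from C++)", true));
console.log(asString.message, reasonOf(asString)); // logs: undefined Unable to fetch model
console.log(reasonOf(asError)); // logs: acceptWaveForm error (from C++)
```

A thrown `Error` also carries a stack trace, which is usually the more useful choice for errors surfaced to application code.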

View File

@@ -21,23 +21,23 @@ bool genericModel::loadModel(const std::string& storepath) {
 char filename[] {"/opfs/XXXXXX.tzst"};
 close(mkostemps(filename, 5, O_PATH));
 if(emscripten_wget(url.c_str(),filename) == 1) {
-throwErr("Unable to fetch model");
+throwJS("Unable to fetch model");
 return false;
 }
 if(!extractModel(filename)) {
-throwErr("Unable to extract model");
+throwJS("Unable to extract model");
 return false;
 }
 fs::remove(filename);
 if(!checkModel()) {
-throwErr("Model URL contains invalid model files");
+throwJS("Model URL contains invalid model files");
 fs::current_path("/opfs");
 fs::remove_all(storepath);
 return false;
 }
 std::ofstream idFile("id");
 if(!idFile.is_open()) {
-throwErr("Unable to write new id");
+throwJS("Unable to write new id");
 fs::remove_all(storepath);
 return false;
 }
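The README describes `makeModel()` as reusing the files under **path** when they are valid and were stored with the same **id**, and fetching from **url** otherwise; the `loadModel()` hunk shows the fetch path (download to a temporary `.tzst` under `/opfs`, extract, validate, remove the store path on failure, record the id). A minimal in-memory sketch of that rule, assuming a `Map` in place of OPFS and hypothetical `fetchArchive`/`extract`/`check` callbacks in place of `emscripten_wget()`, libarchive and `checkModel()`:

```javascript
// Hypothetical sketch of the reuse-or-fetch rule. `store` maps a path to
// { id, files }; the callbacks stand in for download, extraction and validation.
function loadModel({ store, path, url, id, fetchArchive, extract, check }) {
  const cached = store.get(path);
  // Valid files stored under the same id: reuse them, no network request.
  if (cached && cached.id === id && check(cached.files)) {
    return { fetched: false, files: cached.files };
  }
  const archive = fetchArchive(url);
  if (archive == null) throw new Error("Unable to fetch model");
  const files = extract(archive);
  if (files == null) throw new Error("Unable to extract model");
  if (!check(files)) {
    store.delete(path); // like fs::remove_all(storepath): leave nothing half-written
    throw new Error("Model URL contains invalid model files");
  }
  store.set(path, { id, files }); // the C++ code records the id in an "id" file
  return { fetched: true, files };
}

// Usage with in-memory fakes: the second call is served from the store,
// the third refetches because the id changed.
const store = new Map();
let fetches = 0;
const deps = {
  fetchArchive: url => { fetches++; return url.endsWith(".tzst") ? ["am", "conf"] : null; },
  extract: archive => archive.slice(),
  check: files => files.includes("am"),
};
loadModel({ store, path: "/opfs/model", url: "https://example.com/model.tzst", id: "v1", ...deps });
loadModel({ store, path: "/opfs/model", url: "https://example.com/model.tzst", id: "v1", ...deps });
loadModel({ store, path: "/opfs/model", url: "https://example.com/model.tzst", id: "v2", ...deps });
console.log(fetches, store.get("/opfs/model").id); // logs: 2 v2
```

Treating the id as a version tag lets a page ship a new model by changing one string, without the user clearing storage.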

View File

@@ -1,6 +1,4 @@
 #pragma once
-#include "genericObj.h"
 #include <string>
 #include <filesystem>
 #include <fstream>
@@ -11,8 +9,8 @@
 #include <archive.h>
 #include <archive_entry.h>
 #include <emscripten/wasmfs.h>
-#include <emscripten/bind.h>
+#include <emscripten.h>
+extern void throwJS(const char* msg, bool err = false);
 namespace fs = std::filesystem;
 struct genericModel {

View File

@@ -1,12 +0,0 @@
-#pragma once
-#include <emscripten.h>
-#include <emscripten/console.h>
-void throwErr(const char* msg) {
-EM_ASM({
-throw Error(UTF8ToString($0))
-},msg);
-}

View File

@@ -1,11 +1,10 @@
 #include "model.h"
-model::model(const std::string &url, const std::string& storepath, const std::string& id, int index) : genericModel(url, id, storepath) {
+model::model(const std::string &url, const std::string& storepath, const std::string& id) : genericModel(url, storepath, id) {
 if(!loadModel(storepath)) return;
 mdl = vosk_model_new(".");
 if(mdl == nullptr) {
-throwErr("Unable to initialize model");
-return;
+throwJS("Unable to initialize model");
 }
 };
 model::~model() {

View File

@@ -4,7 +4,7 @@
 struct model : genericModel {
 bool checkModel();
 VoskModel* mdl{};
-model(const std::string &url, const std::string& storepath, const std::string& id, int index);
+model(const std::string &url, const std::string& storepath, const std::string& id);
 ~model();
 };

View File

@@ -5,6 +5,11 @@ class recognizer extends EventTarget {
 this.obj = rec
 objs.push(this)
 }
+processAudio(buffer) {
+if(buffer.numberOfChannels < 1) throw Error("Buffer has ",buffer.numberOfChannels, " channel")
+let data = buffer.getChannelData(0);
+if(!(data instanceof Float32Array)) throw Error("Channel data isn't a Float32Array");
+}
 delete() {
 this.obj.delete()
 }
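The new `processAudio()` stub pulls channel 0 out of an `AudioBuffer`. `Error()` takes a single message (plus an optional options object), so `Error("Buffer has ", n, " channel")` yields the message `"Buffer has "` and drops the count; and per the Web Audio specification `getChannelData()` returns a `Float32Array`. A minimal sketch of the same checks, assuming only the `numberOfChannels` and `getChannelData()` members of `AudioBuffer`; `firstChannel()` and `fakeBuffer()` are hypothetical, since Node has no `AudioBuffer`:

```javascript
// Hypothetical helper mirroring the checks in processAudio(): return the first
// channel of an AudioBuffer-like object as a Float32Array.
function firstChannel(buffer) {
  if (buffer.numberOfChannels < 1) {
    // A template literal keeps the count in the message.
    throw new Error(`Buffer has ${buffer.numberOfChannels} channels`);
  }
  const data = buffer.getChannelData(0);
  if (!(data instanceof Float32Array)) {
    throw new TypeError("Channel data isn't a Float32Array");
  }
  return data;
}

// Stand-in for a Web Audio AudioBuffer exposing only the two members used above.
function fakeBuffer(channels) {
  return {
    numberOfChannels: channels.length,
    getChannelData: i => channels[i],
  };
}

const mono = firstChannel(fakeBuffer([new Float32Array([0, 0.5, -0.5])]));
console.log(mono.length); // logs: 3

// Extra arguments to Error() are not concatenated into the message:
console.log(JSON.stringify(new Error("Buffer has ", 0, " channel").message)); // logs: "Buffer has "
```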
@@ -32,31 +37,34 @@ Module.makeModel = async (url, path, id) => {
 let mdl
 try {
 mdl = new Module.model(url, path, id)
-objs.push(mdl)
 }
 catch(e) {
-return Promise.reject(e.message)
+mdl.delete()
+return Promise.reject(e)
 }
+objs.push(mdl)
 return mdl
 }
 Module.makeSpkModel = async (url, path, id) => {
 let mdl
 try {
 mdl = new Module.spkModel(url, path, id)
-objs.push(mdl)
 }
 catch(e) {
-return Promise.reject(e.message)
+mdl.delete()
+return Promise.reject(e)
 }
+objs.push(mdl)
 return mdl
 }
-Module.makeRecognizer = async (model, sampleRate) => {
+Module.makeRecognizer = async (model, sampleRate, ctx) => {
 let rec
 try {
-rec = recognizer(new Module.recognizer(model,sampleRate, objs.length))
+rec = new Module.recognizer(model,sampleRate, objs.length)
 }
 catch(e) {
-return Promise.reject(e.message)
+rec.delete()
+return Promise.reject(e)
 }
-return rec
+return new recognizer(rec)
 }
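In the new `makeModel()`, `makeSpkModel()` and `makeRecognizer()`, the `catch` block calls `mdl.delete()` (or `rec.delete()`). If the Embind constructor throws, the assignment never happens and the variable is still `undefined`, so that call would itself throw a `TypeError` and mask the original error. A minimal sketch of a construct-then-register pattern that avoids this, assuming an Embind-style class with a `delete()` method; `makeTracked()`, `makeTrackedAsync()`, `FakeModel` and this `deleteAll()` are hypothetical stand-ins, not the library's code:

```javascript
// Every successfully constructed object is tracked so deleteAll() can free it.
const objs = [];

// Construct first, register second: if the constructor throws, nothing was
// created or registered, so the error path has nothing to delete.
function makeTracked(Ctor, ...args) {
  const obj = new Ctor(...args);
  objs.push(obj);
  return obj;
}

// Promise-returning flavour like makeModel(): a throw inside an async
// function becomes a rejection carrying the original error.
async function makeTrackedAsync(Ctor, ...args) {
  return makeTracked(Ctor, ...args);
}

// Free everything that is still registered, newest first.
function deleteAll() {
  while (objs.length > 0) objs.pop().delete();
}

// Hypothetical stand-in for an Embind class such as Module.model.
let deletedCount = 0;
class FakeModel {
  constructor(url, path, id) {
    if (!url.startsWith("https://")) throw new Error("Unable to fetch model");
    this.path = path;
    this.id = id;
  }
  delete() {
    deletedCount++;
  }
}

makeTracked(FakeModel, "https://example.com/model.tzst", "/opfs/model", "v1");
makeTrackedAsync(FakeModel, "not-a-url", "/opfs/model", "v1")
  .catch(e => console.log("rejected:", e.message)); // logs: rejected: Unable to fetch model
```

Because the async wrapper already converts throws into rejections, no explicit `Promise.reject()` is needed.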

View File

@@ -1,19 +1,18 @@
-#include "./recognizer.h"
+#include "recognizer.h"
 recognizer::recognizer(model* mdl, float sampleRate, int index) : index(index) {
 rec = vosk_recognizer_new(mdl->mdl,sampleRate);
 if(rec == nullptr) {
-throwErr("Unable to initialize recognizer");
-return;
+throwJS("Unable to initialize recognizer");
 }
 }
+recognizer::~recognizer() {
+vosk_recognizer_free(rec);
+}
 void recognizer::fireEv(const char *type, const char *content) {
 EM_ASM({
 recognizers[$0].dispatchEvent(new CustomEvent(UTF8ToString($1), {"details" : UTF8ToString($2)}));
 },this->index, type, content);
 }
-recognizer::~recognizer() {
-vosk_recognizer_free(rec);
-}
 void recognizer::acceptWaveForm(float* data, int len) {
 switch(vosk_recognizer_accept_waveform_f(rec, data, len)) {
 case 0:
@@ -23,7 +22,7 @@ void recognizer::acceptWaveForm(float* data, int len) {
 fireEv("partialResult", vosk_recognizer_partial_result(rec));
 break;
 default:
-fireEv("_error", "Recognition error, unable to recognize");
+throwJS("acceptWaveForm error (from C++)", true);
 }
 }
 void recognizer::setGrm(const std::string& grm) {
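`acceptWaveForm()` switches on the return value of `vosk_recognizer_accept_waveform_f()`. Vosk's `vosk_api.h` documents that the accept-waveform functions return 1 when silence was detected and a new utterance result can be read with the result function, 0 while decoding continues, and -1 if an exception occurred. A minimal sketch of mapping those codes to this project's event names; `eventForStatus()` is hypothetical, and since the diff hides the lines after `case 0:`, it is not a transcription of the C++ switch:

```javascript
// Hypothetical mapping from Vosk's accept_waveform return code to an event name.
function eventForStatus(status) {
  switch (status) {
    case 1:
      return "result"; // endpoint reached: a final utterance result is ready
    case 0:
      return "partialResult"; // still decoding: only a partial result is available
    default:
      // The new C++ code throws an Error at this point via throwJS(..., true).
      throw new Error(`acceptWaveForm error (status ${status})`);
  }
}

const codes = [0, 0, 1];
console.log(codes.map(eventForStatus)); // logs: [ 'partialResult', 'partialResult', 'result' ]
```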

View File

@@ -1,27 +1,26 @@
 #pragma once
 #include "model.h"
 #include "spkModel.h"
-#include "genericObj.h"
 #include <filesystem>
 #include <atomic>
 #include <thread>
-#include <emscripten/bind.h>
 #include <emscripten/wasmfs.h>
-#include <emscripten/console.h>
+#include <emscripten/webaudio.h>
 #include <AL/al.h>
 #include <AL/alc.h>
 #include <archive.h>
 #include <archive_entry.h>
+extern void throwJS(const char* msg, bool err = false);
 namespace fs = std::filesystem;
 struct recognizer {
 int index{};
 VoskRecognizer* rec{};
-void acceptWaveForm(float* data, int len);
 recognizer(model* model, float sampleRate, int index);
 ~recognizer();
+void acceptWaveForm(float* data, int len);
 void fireEv(const char* type, const char* content);
 void setSpkModel(spkModel* model);
 void setGrm(const std::string& grm);

View File

@@ -3,8 +3,7 @@ spkModel::spkModel(const std::string &url, const std::string& storepath, const s
 if(!loadModel(storepath)) return;
 mdl = vosk_spk_model_new(".");
 if(mdl == nullptr) {
-throwErr("Unable to initialize speaker model");
-return;
+throwJS("Unable to initialize speaker model");
 }
 };
 spkModel::~spkModel() {