Speaker Recognition System based on Triplet State Loss Function
DOI:
https://doi.org/10.54691/sjt.v5i8.5496
Keywords:
ResCNN; Triplet Loss Function; Fbank Features; Cosine Similarity.
Abstract
This paper builds a speaker recognition model and designs a speaker recognition system around it, drawing on a survey of speaker recognition research published in China and abroad and using methods from deep learning. Its main contents and methods are as follows. For data processing, a public dataset is downloaded from its official website, each utterance is preprocessed, and its Fbank features are extracted and stored in a .npy file, so that the speech is in a format suitable for input to the model. The model is built on a ResCNN architecture based on convolutional neural networks and is trained with a triplet loss function that maps speech onto a hypersphere, so cosine similarity can be used directly to measure the distance between two speakers. The speaker verification function offers three different ways to obtain speech: the two acquired utterances are fed into the model, which judges their similarity and reports the result. The speaker recognition model likewise offers three ways to obtain speech and determines which speaker in the corpus the utterance belongs to. For the speaker confirmation model, an utterance is played at random and a speaker is selected at random, and the system judges whether the utterance is that speaker's voice.
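The scoring side of the pipeline described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the model has already produced fixed-length embeddings, uses plain NumPy, and every function name, the margin of 0.1, and the decision threshold of 0.7 are hypothetical values chosen for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def cosine_similarity(a, b):
    """Cosine similarity between matching vectors of a and b."""
    return np.sum(l2_normalize(a) * l2_normalize(b), axis=-1)


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Cosine triplet loss: mean of max(0, cos(a, n) - cos(a, p) + margin).

    The loss is zero once the anchor is closer to the same-speaker
    embedding than to the different-speaker embedding by at least `margin`.
    The margin value here is an assumption for illustration.
    """
    sim_ap = cosine_similarity(anchor, positive)
    sim_an = cosine_similarity(anchor, negative)
    return float(np.mean(np.maximum(0.0, sim_an - sim_ap + margin)))


def verify(emb_a, emb_b, threshold=0.7):
    """Verification: do two embeddings belong to the same speaker?

    The threshold is hypothetical; in practice it would be tuned on held-out data.
    """
    return bool(cosine_similarity(emb_a, emb_b) >= threshold)


def identify(query, enrolled):
    """Identification: return the enrolled speaker most similar to the query.

    `enrolled` maps a speaker name to that speaker's embedding.
    """
    scores = {name: float(cosine_similarity(query, emb))
              for name, emb in enrolled.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for model outputs.
    alice = np.array([1.0, 0.0])
    bob = np.array([0.0, 1.0])
    query = np.array([0.9, 0.1])
    print(triplet_loss(query, alice, bob))
    print(verify(query, alice), verify(query, bob))
    print(identify(query, {"alice": alice, "bob": bob}))
```

Verification and identification share the same similarity function and differ only in the decision rule: a threshold on one pair versus the best match over all enrolled speakers.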
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




