Speaker Recognition System based on Triplet State Loss Function

Authors

  • Tingyu Li

DOI:

https://doi.org/10.54691/sjt.v5i8.5496

Keywords:

ResCNN; Triplet Loss Function; Fbank Features; Cosine Similarity.

Abstract

This paper builds a speaker recognition model and designs a speaker recognition system around it, drawing on a survey of speaker recognition research in China and abroad and on deep learning methods. The main contents and methods are as follows. For data processing, a public data set is downloaded from its official website, each utterance is preprocessed, Fbank features are extracted, and the features are saved as .npy files in a format suitable for later input to the model. The model is built on a ResCNN architecture based on a convolutional neural network. It is trained with a triplet loss function that maps speech to a hypersphere, so cosine similarity can be used directly to characterize the distance between two speakers. The speaker verification function offers three different ways to obtain speech; the two acquired utterances are fed to the model, which judges their similarity and reports the result. The speaker recognition function likewise offers three ways to obtain speech and determines which speaker in the corpus the utterance belongs to. For the speaker confirmation function, an utterance is played at random and a speaker is selected at random, and the system judges whether the utterance is that speaker's voice.
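
The way cosine similarity ties together training, verification, and recognition can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the margin of 0.1, the decision threshold of 0.75, and the toy two-dimensional embeddings are all assumptions chosen for illustration, and the triplet loss is written in the cosine-similarity form used by Deep Speaker (the first reference).

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two speaker embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin: float = 0.1) -> float:
    """Hinge on cosine similarities: the anchor should be closer to the
    positive (same speaker) than to the negative (other speaker) by at
    least `margin`. The margin value is an assumption."""
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


def same_speaker(emb_a, emb_b, threshold: float = 0.75) -> bool:
    """Verification: accept if the similarity reaches a threshold.
    The threshold is hypothetical and would be tuned on held-out data."""
    return cosine_similarity(emb_a, emb_b) >= threshold


def identify(query, enrolled: Dict[str, Sequence[float]]) -> str:
    """Recognition: return the enrolled speaker whose embedding is most
    similar to the query embedding."""
    return max(enrolled, key=lambda name: cosine_similarity(query, enrolled[name]))


# Toy embeddings standing in for the model's output.
enrolled = {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0]}
query = [0.9, 0.1]
print(identify(query, enrolled), same_speaker(query, enrolled["speaker_a"]))
```

In this sketch the loss is zero once a triplet is separated by the margin, so only triplets the model still confuses contribute to training.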

References

Li C, Ma X, Jiang B, et al. Deep Speaker: an End-to-End Neural Speaker Embedding System [J]. arXiv, 2017. DOI:10.48550/arXiv.1705.02304.

Li, Mu, and Others. Dive into Deep Learning, 2020, http://d2l.ai/.

He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Published

2023-08-22

Section

Articles

How to Cite

Li, T. (2023). Speaker Recognition System based on Triplet State Loss Function. Scientific Journal of Technology, 5(8), 39-46. https://doi.org/10.54691/sjt.v5i8.5496