Speaker recognition

Speaker recognition is the identification of a person from characteristics of voices.^[1] It is used to answer the question "Who is speaking?" The term voice recognition^[2]^[3]^[4]^[5]^[6] can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).

Recognizing the speaker can simplify the task of

anatomy

and learned behavioral patterns.

Verification versus identification

There are two major applications of speaker recognition technologies and methodologies. If the speaker claims to be of a certain identity and the voice is used to verify this claim, this is called verification or authentication. On the other hand, identification is the task of determining an unknown speaker's identity. In a sense, speaker verification is a 1:1 match where one speaker's voice is matched to a particular template whereas speaker identification is a 1:N match where the voice is compared against multiple templates.
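The 1:1 versus 1:N distinction can be illustrated with a minimal sketch. It assumes each voice template is already available as a fixed-length numeric vector and that cosine similarity is the matching score. The function names, the example vectors, and the 0.8 threshold are all hypothetical choices made for illustration, not part of any particular system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, claimed_template, threshold=0.8):
    """1:1 match: compare the sample with the single claimed template."""
    return cosine_similarity(sample, claimed_template) >= threshold


def identify(sample, templates):
    """1:N match: compare the sample with every template, return the best."""
    scored = ((name, cosine_similarity(sample, t)) for name, t in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical templates and sample, used only for illustration.
templates = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
sample = [0.9, 0.1, 0.2]

claim_accepted = verify(sample, templates["alice"])  # one comparison
best_name, best_score = identify(sample, templates)  # one comparison per template
```

Verification needs the claimed identity as input and answers yes or no, while identification takes only the sample and returns the closest enrolled speaker.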

From a security perspective, identification is different from verification. Speaker verification is usually employed as a "gatekeeper" in order to provide access to a secure system. These systems operate with the users' knowledge and typically require their cooperation. Speaker identification systems can also be implemented covertly without the user's knowledge to identify talkers in a discussion, alert automated systems of speaker changes, check if a user is already enrolled in a system, etc.

In forensic applications, it is common to first perform a speaker identification process to create a list of "best matches" and then perform a series of verification processes to determine a conclusive match. Comparing the samples from the speaker against the list of best matches helps determine whether they come from the same person, based on the number of similarities or differences. The prosecution and defense use this as evidence to determine if the suspect is actually the offender.^[7]

Training

One of the earliest training technologies to be commercialized was implemented in Worlds of Wonder's 1987 Julie doll. At that point, speaker independence was an intended breakthrough, and systems required a training period. A 1987 ad for the doll carried the tagline "Finally, the doll that understands you.", even though it was described as a product "which children could train to respond to their voice."^[8] The term voice recognition, even a decade later, referred to speaker independence.^[9]^{[clarification needed]}

Variants of speaker recognition

Each speaker recognition system has two phases: enrollment and verification. During enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model. In the verification phase, a speech sample or "utterance" is compared against a previously created voice print. For identification systems, the utterance is compared against multiple voice prints in order to determine the best match(es), while verification systems compare an utterance against a single voice print. Because verification requires only one comparison whereas identification requires one per enrolled voice print, verification is faster than identification.
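The two phases can be sketched as follows. This is a simplified illustration, assuming that feature extraction has already turned each recording into a fixed-length list of numbers and that a voice print is just the per-feature average over the enrollment utterances. The class name `VoicePrintStore`, the Euclidean distance measure, and the `max_distance` value are hypothetical.

```python
import math
from statistics import fmean


class VoicePrintStore:
    """Toy store of voice prints built from pre-extracted feature vectors."""

    def __init__(self):
        self.prints = {}

    def enroll(self, speaker_id, utterance_features):
        """Enrollment: average several feature vectors into one voice print."""
        self.prints[speaker_id] = [fmean(col) for col in zip(*utterance_features)]

    def verify(self, speaker_id, features, max_distance=0.5):
        """Verification: a single comparison against the claimed voice print."""
        return math.dist(features, self.prints[speaker_id]) <= max_distance

    def best_matches(self, features, n=1):
        """Identification: compare against every voice print, closest first."""
        ranked = sorted(self.prints, key=lambda s: math.dist(features, self.prints[s]))
        return ranked[:n]


store = VoicePrintStore()
store.enroll("alice", [[1.0, 2.0], [1.2, 1.8]])
store.enroll("bob", [[3.0, 0.5], [2.8, 0.7]])

utterance = [1.0, 2.0]
accepted = store.verify("alice", utterance)
shortlist = store.best_matches(utterance, n=1)
```

In this sketch `verify` computes one distance regardless of how many speakers are enrolled, while `best_matches` computes one distance per enrolled speaker.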

Speaker recognition systems fall into two categories: text-dependent and text-independent.

speech analysis techniques are used.^[12]

Technology

Speaker recognition is a

speech verification.^{[citation needed]}

two-factor authentication products is expected to increase. Voice changes due to ageing may impact system performance over time. Some systems adapt the speaker models after each successful verification to capture such long-term changes in the voice, though there is debate regarding the overall security impact imposed by automated adaptation.^{[citation needed]}
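One simple way such adaptation could work is to nudge the stored template toward each accepted sample. The sketch below assumes vector templates, cosine similarity scoring, and an exponential moving average update. The function name `verify_and_adapt`, the threshold, and the adaptation `rate` are hypothetical, and deployed systems may adapt their models quite differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_and_adapt(template, sample, threshold=0.8, rate=0.1):
    """Verify a sample and, only if accepted, blend it into the template."""
    accepted = cosine_similarity(template, sample) >= threshold
    if accepted:
        # Exponential moving average: keep 90% of the old template by default.
        template = [(1 - rate) * t + rate * s for t, s in zip(template, sample)]
    return accepted, template


template = [1.0, 0.0]

# A sample close to the template is accepted and shifts the template slightly.
ok, template_after = verify_and_adapt(template, [0.9, 0.3])

# A very different sample is rejected and leaves the template untouched.
rejected, unchanged = verify_and_adapt(template, [0.0, 1.0])
```

In this sketch any accepted sample moves the template, including one that was accepted in error, which gives a concrete sense of why the security impact of automated adaptation is debated.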

Legal implications

Due to the introduction of legislation like the General Data Protection Regulation in the European Union and the California Consumer Privacy Act in the United States, there has been much discussion about the use of speaker recognition in the workplace. In September 2019, Irish speech recognition developer Soapbox Labs warned about the legal implications that may be involved.^[14]

Applications

The first international patent was filed in 1983, coming from the telecommunication research at CSELT^[15] (Italy) by Michele Cavazza and Alberto Ciaramella, as a basis both for future telco services to end customers and for improving noise-reduction techniques across the network.

Between 1996 and 1998, speaker recognition technology was used at the Scobey–Coronach Border Crossing to enable enrolled local residents with nothing to declare to cross the Canada–United States border when the inspection stations were closed for the night.^[16] The system was developed for the U.S. Immigration and Naturalization Service by Voice Strategies of Warren, Michigan.^{[citation needed]}

In 2013

Siri technology. 93% of customers rated the system at "9 out of 10" for speed, ease of use and security.^[18]

Speaker recognition may also be used in criminal investigations, such as those of the 2014 executions of, amongst others, James Foley and Steven Sotloff.^[19]

In February 2016, UK high-street bank HSBC and its internet-based retail bank First Direct announced that they would offer 15 million customers biometric banking software to access online and phone accounts using their fingerprint or voice.^[20]

In 2023, Vice News and The Guardian separately demonstrated that they could defeat standard financial speaker-authentication systems using AI-generated voices created from about five minutes of the target's voice samples.^[21]^[22]

Notes

[1] ISSN 2047-4938.

[2] ISBN 978-0-8422-5149-5.

[3] ISSN 0095-4470.

[4] "VOICE RECOGNITION (noun) definition and synonyms". macmillandictionary.com. January 23, 2010. Archived from the original on March 27, 2023. Retrieved October 13, 2023.

[5] "What is voice recognition? definition and meaning". businessdictionary.com. October 6, 2008. Archived from the original on December 3, 2011.

[6] "The Mailbag LG #114". Linux Gazette. March 28, 2005.

[7] ISSN 1748-8893.

[8] Pinola, Melanie (November 2, 2011). "Speech Recognition Through the Decades: How We Ended Up With Siri". PCWorld.

[9] Rosen, Cheryl (March 3, 1997). "Voice Recognition To Ease Travel Bookings". Business Travel News. The earliest applications of speech recognition software were dictation ... Four months ago, IBM introduced a "continual dictation product" designed to ... debuted at the National Business Travel Association trade show in 1994.

[10] "Speaker Verification: Text-Dependent vs. Text-Independent". Microsoft Research. June 19, 2017. text-dependent and text-independent speaker .. both equal error rate and detection ..

[11] ISSN 2522-8692. task .. verification or identification

[12] Myers, Lisa (July 25, 2004). "An Exploration of Voice Biometrics". SANS Institute.

[13] ISSN 1051-2004.

[14] "Speech recognition expert raises concerns around voice technology in the workplace". Independent.ie. September 29, 2019. Retrieved September 30, 2019.

[15] US4752958 A, Michele Cavazza, Alberto Ciaramella, "Device for speaker's verification" http://www.google.com/patents/US4752958?hl=it&cl=en

[16] Meyer, Barb (June 12, 1996). "Automated Border Crossing". Television news report. Meyer Television News.

[17] International Banking (December 27, 2013). "Voice Biometric Technology in Banking | Barclays". Wealth.barclays.com. Retrieved February 21, 2016.

[18] Matt Warman (May 8, 2013). "Say goodbye to the pin: voice recognition takes over at Barclays Wealth". Retrieved June 5, 2013.

[19] Ewen MacAskill. "Did 'Jihadi John' kill Steven Sotloff? | Media". The Guardian. Retrieved February 21, 2016.

[20] Julia Kollewe (February 19, 2016). "HSBC rolls out voice and touch ID security for bank customers | Business". The Guardian. Retrieved February 21, 2016.

[21] "How I Broke into a Bank Account with an AI-Generated Voice". February 23, 2023.

[22] Evershed, Nick; Taylor, Josh (March 16, 2023). "AI can fool voice recognition used to verify identity by Centrelink and Australian tax office". The Guardian. Retrieved June 16, 2023.

References

Homayoon Beigi (2011), "Fundamentals of Speaker Recognition", Springer-Verlag, Berlin, ISBN 978-0-387-77591-3.

"Biometrics from the movies" – National Institute of Standards and Technology

Elisabeth Zetterholm (2003), Voice Imitation. A Phonetic Study of Perceptual Illusions and Acoustic Success, PhD thesis, Lund University.

Md Sahidullah (2015), Enhancement of Speaker Recognition Performance Using Block Level, Relative and Temporal Information of Subband Energies, PhD thesis, Indian Institute of Technology Kharagpur.

External links

Circumventing Voice Authentication (archived June 10, 2008, at the Wayback Machine). The PLA Radio podcast featured a simple way to fool rudimentary voice authentication systems.

Speaker recognition – Scholarpedia

Voice recognition benefits and challenges in access control

Software

bob.bio.spear

ALIZE




