See how I used Python to build a magical child-minding tool (one go can keep a kid busy all day)

TrueDei 2020-12-05 01:19:34

One. Finally, a reliable spoken-English teacher for the kids

"However poor we are, we can't skimp on education; however hard life gets, the children shouldn't suffer." Being a parent is not only about hard work and material support. Parents also need to pay close attention to how their children are learning, and they worry constantly that the kids will "lose at the starting line". The trouble is that children now have too many starting lines: English, all kinds of artistic specialties, even rope skipping. Everyone is run off their feet, and parents are not all-rounders. My sister, for example, has recently been worrying about her daughter's spoken English. Her own pronunciation is not accurate, she doesn't know which classes are reliable, and the child is about to fall behind her friends. When I heard about this, I took out my English textbook, remembered how I scraped through every exam with 60 points, and put it back again. Then I picked up my real weapon: code.


In recent years natural language processing has been applied in many fields, and the cost of intelligent speech evaluation has long been affordable for ordinary users. To correct the child's pronunciation, I settled on a reliable provider and used the Youdao Zhiyun API to build a simple speech-evaluation program. Or call it an intelligent speaking teacher!

Two. Preparation

First, on your Youdao Zhiyun personal page, create an instance, create an application, and bind the application to the instance to obtain the application's ID and key. The registration and application-creation steps are described in detail in the article "Share a batch file translation development process".


Three. The development process in detail

The following describes the code development process step by step.

First, study the API's input and output specifications in the official documentation. The API communicates over HTTPS. Put simply, you Base64-encode the pre-recorded audio file, sign the request, submit it to the API, and parse the JSON the API returns to get the scores.

Interface address:

HTTPS interface:

The API's required input parameters are listed in the table below:

| Field name | Type | Meaning | Required | Remarks |
| --- | --- | --- | --- | --- |
| q | text | Base64-encoded string of the audio file to be evaluated | True | Must be Base64-encoded |
| text | text | The text corresponding to the audio file to be evaluated | True | have a good day |
| langType | text | Source language | True | Supported languages |
| appKey | text | Application ID | True | Can be viewed under Application management |
| salt | text | UUID | True | UUID |
| curtime | text | Timestamp (seconds) | True | TimeStamp |
| sign | text | Signature, generated as sha256(application ID + input + salt + curtime + application key); see the note under the table for how input is generated | True | sha256(application ID + input + salt + curtime + application key) |
| signType | text | Signature type | True | v2 |
| format | text | Format of the voice file, wav | True | wav |
| rate | text | Sampling rate; 16000 is recommended | True | 16000 |
| channel | text | Number of channels; only mono is supported, so use the fixed value 1 | True | 1 |
| type | text | Upload type; only Base64 upload is supported, so use the fixed value 1 | True | 1 |
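Because the API only accepts mono WAV audio and recommends a 16000 Hz sampling rate, it may be worth checking a recording before uploading it. The following is a minimal sketch using Python's standard wave module; the helper name wav_params_ok is hypothetical and not part of the original demo.

```python
import wave


def wav_params_ok(path, expected_rate=16000, expected_channels=1):
    """Return True if the WAV file at `path` has the expected rate and channel count.

    Hypothetical helper for illustration; not part of the original demo.
    """
    with wave.open(path, 'rb') as wav_info:
        return (wav_info.getframerate() == expected_rate
                and wav_info.getnchannels() == expected_channels)
```

A caller could run this check right after recording and ask the user to record again if it fails.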

The signature sign is generated as follows:
signType=v2; sign=sha256(application ID + input + salt + curtime + application key).
Note how input is computed: input = first 10 characters of q + length of q + last 10 characters of q (when q is longer than 20 characters), or input = q (when the length of q is 20 or less).
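The signing rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the demo's actual code: the function name make_sign and the placeholder credentials are assumptions, and it assumes the API expects the hexadecimal form of the SHA-256 digest.

```python
import hashlib
import time
import uuid


def truncate(q):
    """Build the `input` part of the signature from the Base64 string q."""
    if len(q) <= 20:
        return q
    # First 10 characters + total length + last 10 characters
    return q[:10] + str(len(q)) + q[-10:]


def make_sign(app_key, app_secret, q, salt, curtime):
    """Return sha256(appKey + input + salt + curtime + appSecret) as hex.

    Hypothetical helper name; assumes the hex digest is what the API expects.
    """
    sign_str = app_key + truncate(q) + salt + curtime + app_secret
    return hashlib.sha256(sign_str.encode('utf-8')).hexdigest()


# Usage with placeholder credentials (not real values)
salt = str(uuid.uuid1())
curtime = str(int(time.time()))
sign = make_sign('my-app-id', 'my-app-secret', 'QUJD' * 10, salt, curtime)
```

Keeping truncate() separate from the hashing step makes the only unusual part of the rule, the shortened input, easy to check on its own.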

The output parameters of the interface are listed below. Fields prefixed with "-" belong to each element of words, and fields prefixed with "--" belong to each element of phonemes:

| Field | Meaning |
| --- | --- |
| errorCode | Error code of the recognition result; always present. See the error code list for details |
| refText | The text of the request |
| start | Sentence start time in the audio, in seconds |
| end | Sentence end time in the audio, in seconds |
| integrity | Sentence integrity score |
| fluency | Sentence fluency score |
| pronunciation | Sentence accuracy score |
| speed | Speaking speed, in words per minute |
| overall | Overall sentence score |
| words | Array of word scores |
| -word | The word |
| -start | Word start time, in seconds |
| -end | Word end time, in seconds |
| -pronunciation | Word accuracy score |
| -phonemes | Array of phonemes |
| --phoneme | Phonetic symbol |
| --start | Phoneme start time, in seconds |
| --end | Phoneme end time, in seconds |
| --judge | Whether the phoneme was pronounced correctly: true means correct, false means mispronounced, in which case calibration gives a hint |
| --calibration | If the pronunciation is wrong, tells the user what the pronunciation sounded like |
| --prominence | Degree of stress; the higher the score, the more likely the phoneme is stressed. Range [0, 100] |
| --stress_ref | Reference (standard answer) for vowel stress: if true, the vowel should be stressed. Meaningless for consonants |
| --stress_detect | Whether the user pronounced this phoneme as stressed within the word |
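To show how these nested fields might be used, here is a small sketch that walks a parsed response and lists the phonemes the API judged as mispronounced. The function name mispronounced_phonemes and the sample data are made up for illustration; only the field names come from the list above.

```python
def mispronounced_phonemes(result):
    """Collect (word, phoneme, calibration) for every phoneme with judge == False."""
    problems = []
    for word in result.get('words', []):
        for ph in word.get('phonemes', []):
            if not ph.get('judge', True):
                problems.append(
                    (word.get('word'), ph.get('phoneme'), ph.get('calibration')))
    return problems


# Made-up sample in the shape described above (not real API output)
sample = {
    'words': [
        {'word': 'ok', 'phonemes': [
            {'phoneme': 'o', 'judge': True, 'calibration': 'o'},
            {'phoneme': 'k', 'judge': False, 'calibration': 'g'},
        ]},
    ],
}
problems = mispronounced_phonemes(sample)
```

A list like this could be shown next to the sentence scores so the learner knows which sounds to practise.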

(One) Demo development:

This demo is written in Python 3 and consists of three files, which respectively implement the demo's interface, the recording and other processing logic, and a wrapper for calling the intelligent speech-evaluation API.

**1. Interface part:**

The UI is divided into three areas: the article selection area, the recording area, and the score display area.


The layout code is as follows:

import tkinter as tk

# get_file, start_rec, stop_rec and start_score are callbacks defined elsewhere in the demo
root = tk.Tk()
root.title("youdao ise test")
frm = tk.Frame(root)
frm.grid(padx=50, pady=50)
# Select the article
btn_get_file_path = tk.Button(frm, text='Choose text:', command=get_file)
text1 = tk.Text(frm, width=70, height=2)
# Article content display
text2 = tk.Text(frm, width=70, height=5)
# Start and stop recording
btn_start_rec = tk.Button(frm, text='Record', command=start_rec, width=10)
lb_Status = tk.Label(frm, text='Ready', anchor='w', fg='green')
btn_stop_rec = tk.Button(frm, text='Stop recording', command=stop_rec)
# Scoring button and result display
btn_score = tk.Button(frm, text='Score', command=start_score, width=10)
text3 = tk.Text(frm, width=70, height=10)

The scoring button btn_score is bound to start_score(), which collects all the text files, starts the evaluation, and prints the results:

def start_score():
    ...
    for r in result:
        ...

**2.**
This part mainly implements file handling, recording, and processing of the API's return values. First, define a class Audio_model:

class Audio_model():
    def __init__(self, audio_path, is_recording):
        self.current_file = ''  # Original path of the current recording
        self.is_recording = is_recording  # Recording status flag
        self.audio_chunk_size = 1600  # This and the following are parameters required for recording

The record_and_save() method records audio and saves it under the project's record directory. The recording's file name is the same as the original text file's name, so the two are easy to match up.

    def record_and_save(self):
        self.is_recording = True
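The saving half of this step can be sketched with the standard wave module. The sketch assumes the capture loop (not shown in the excerpt) has already collected raw 16-bit PCM chunks into a list; the function name save_wav and its defaults are illustrative assumptions chosen to match the API's mono, 16000 Hz requirement.

```python
import wave


def save_wav(frames, path, rate=16000, channels=1, sample_width=2):
    """Write a list of raw PCM byte chunks to a WAV file.

    Hypothetical helper: `frames` is assumed to hold 16-bit PCM chunks
    captured by whatever audio library does the recording.
    """
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wf.setframerate(rate)
        wf.writeframes(b''.join(frames))
```

Writing the header with these values keeps the file consistent with the rate and channel parameters sent to the API later.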

The get_score() method calls the functions wrapped in the API helper file and parses the return values:

    def get_score(self, dict):
        for path in dict:
            # Process the result and add it to the result set
            result.append(score_result)
        return result

3. This file contains the methods directly related to calling the Youdao Zhiyun API. The core is the connect() method, which assembles the parameters the API requires, calls do_request() to execute the request, and then, according to what the UI needs to display, processes the API's response and concatenates the result string.

import base64
import json
import os
import time
import uuid
import wave

# APP_KEY, APP_SECRET, truncate(), encrypt() and do_request() are defined elsewhere in this file

def connect(audio_file_path, audio_text):
    lang_type = 'en'  # Currently only English is supported
    extension = audio_file_path[audio_file_path.rindex('.') + 1:]
    if extension != 'wav':
        print('Unsupported audio type')
    # Read the sampling rate and channel count from the WAV header
    wav_info = wave.open(audio_file_path, 'rb')
    sample_rate = wav_info.getframerate()
    nchannels = wav_info.getnchannels()
    wav_info.close()
    # Base64-encode the content of the audio file
    with open(audio_file_path, 'rb') as file_wav:
        q = base64.b64encode(file_wav.read()).decode('utf-8')
    data = {}
    data['text'] = audio_text
    curtime = str(int(time.time()))
    data['curtime'] = curtime
    salt = str(uuid.uuid1())
    # Signature: sha256(appKey + input + salt + curtime + appSecret)
    signStr = APP_KEY + truncate(q) + salt + curtime + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['q'] = q
    data['salt'] = salt
    data['sign'] = sign
    data['signType'] = "v2"
    data['langType'] = lang_type
    data['rate'] = sample_rate
    data['format'] = 'wav'
    data['channel'] = nchannels
    data['type'] = 1
    # Process the return value
    response = do_request(data)
    j = json.loads(str(response.content, encoding="utf-8"))
    # Assumption: recordname is not defined in this excerpt; the file name is used here
    recordname = os.path.basename(audio_file_path)
    # Build the one-line summary shown in the UI
    contextIntegrity = "Sentence integrity: " + str(round(j["integrity"], 2)) + " "
    pronunciation = "Pronunciation accuracy: " + str(round(j["pronunciation"], 2)) + " "
    fluency = "Fluency: " + str(round(j["fluency"], 2)) + " "
    speed = "Speed: " + str(round(j["speed"], 2)) + " "
    recordAndResult = recordname + " " + contextIntegrity + pronunciation + fluency + speed + "\n"
    return recordAndResult
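The result handling at the end of connect() reads the score fields directly, so a failed request would raise a KeyError. A more defensive variant might check errorCode first. The sketch below is an illustration only: the function name format_scores is hypothetical, and it assumes that errorCode '0' means success, as the sample output in this article suggests.

```python
def format_scores(recordname, j):
    """Build the one-line summary for the UI, or an error line if the call failed.

    Hypothetical helper; assumes errorCode '0' indicates success.
    """
    if j.get('errorCode') != '0':
        return recordname + "  request failed, errorCode: " + str(j.get('errorCode')) + "\n"
    labels = [
        ('Sentence integrity', 'integrity'),
        ('Pronunciation accuracy', 'pronunciation'),
        ('Fluency', 'fluency'),
        ('Speed', 'speed'),
    ]
    parts = ["{}: {}".format(label, round(j[key], 2)) for label, key in labels]
    return recordname + "  " + "  ".join(parts) + "\n"
```

Separating the formatting from the network call also makes this part easy to test with a hand-written dictionary.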

(Two) Results

Here is how the program performs on a recording of my pure "Chinglish" (the score itself doesn't matter; what matters is the objective evaluation :P)

First, how to operate it:

  • 1) Click "Choose text" and select the article to be evaluated;

  • 2) Click the "Record" and "Stop recording" buttons to make the voice recording;

  • 3) To evaluate more than one article, repeat steps 1) and 2);

  • 4) Click "Score" to run the intelligent speech evaluation and display the scores. The detailed scoring results are also saved in the result directory under the code's path.



Interface part: it shows the sentence integrity, pronunciation accuracy, and fluency scores, as well as the speaking speed:


File part: each recording is evaluated separately, and the detailed results returned are saved in JSON form in the result folder.
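Saving each response to the result folder could look roughly like the sketch below. It is an assumption about how this step might be written, not the demo's actual code; the function name save_result and the one-file-per-recording naming are illustrative.

```python
import json
from pathlib import Path


def save_result(record_name, response_json, result_dir='result'):
    """Save one recording's detailed evaluation as <result_dir>/<name>.json.

    Hypothetical helper; returns the path of the file it wrote.
    """
    out_dir = Path(result_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (Path(record_name).stem + '.json')
    with open(out_path, 'w', encoding='utf-8') as f:
        # ensure_ascii=False keeps phonetic symbols readable in the file
        json.dump(response_json, f, ensure_ascii=False, indent=2)
    return out_path
```

Using the recording's base name for the JSON file keeps the text, the recording, and the result easy to match up.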

The output looks like this:

{
    'integrity': 100, // Sentence integrity
    'refText': "Are you ok? ", // Text corresponding to the speech being evaluated
    'pronunciation': 67.108101, // Sentence pronunciation accuracy
    'start': 0.030000, // Sentence start time in the audio, in seconds
    'words': [{ // List of word information
        'pronunciation': 50.640327, // Word accuracy score
        'start': 0.73, // Word start time, in seconds
        'end': 0.76, // Word end time, in seconds
        'word': 'Are', // Word text
        'phonemes': [{ // List of phoneme information
            'stress_ref': False, // Vowel stress reference (standard stress): if true, the vowel should be stressed; meaningless for consonants
            'pronunciation': 50.640331, // Phoneme accuracy score
            'stress_detect': False, // Whether the user pronounced this phoneme as stressed within the word
            'phoneme': 'ɝ', // Phonetic symbol
            'start': 0.73, // Phoneme start time, in seconds
            'end': 0.76, // Phoneme end time, in seconds
            'judge': True, // Whether the phoneme is correct: true means pronounced correctly, false means mispronounced, in which case calibration gives a hint
            'calibration': 'ɝ', // If the pronunciation is wrong, tells the user what the pronunciation sounded like
            'prominence': 1 // Degree of stress: the higher the score, the more likely the phoneme is stressed; range [0, 100]
        }]
    }, {
        'pronunciation': 76.810608,
        'start': 0.77,
        'end': 1.08,
        'word': 'you',
        'phonemes': [{
            'stress_ref': False,
            'pronunciation': 79.084282,
            'stress_detect': False,
            'phoneme': 'j',
            'start': 0.77,
            'end': 0.86,
            'judge': True,
            'calibration': 'j',
            'prominence': 0.944885
        }, {
            'stress_ref': True,
            'pronunciation': 74.536934,
            'stress_detect': True,
            'phoneme': 'u',
            'start': 0.87,
            'end': 1.08,
            'judge': True,
            'calibration': 'u',
            'prominence': 1
        }]
    }, {
        'pronunciation': 66.129013,
        'start': 1.14,
        'end': 1.8,
        'word': 'ok',
        'phonemes': [{
            'stress_ref': True,
            'pronunciation': 69.046341,
            'stress_detect': True,
            'phoneme': 'o',
            'start': 1.14,
            'end': 1.27,
            'judge': True,
            'calibration': 'o',
            'prominence': 1
        }, {
            'stress_ref': False,
            'pronunciation': 65.357841,
            'stress_detect': False,
            'phoneme': 'k',
            'start': 1.28,
            'end': 1.42,
            'judge': True,
            'calibration': 'k',
            'prominence': 0.838557
        }, {
            'stress_ref': True,
            'pronunciation': 63.982838,
            'stress_detect': True,
            'phoneme': 'e',
            'start': 1.43,
            'end': 1.8,
            'judge': True,
            'calibration': 'e',
            'prominence': 0.956448
        }]
    }],
    'fluency': 83.554047, // Sentence fluency
    'overall': 83.885124, // Overall sentence score
    'errorCode': '0', // Error code of the recognition result; always present
    'end': 1.8, // Sentence end time, in seconds
    'speed': 55.555557 // Speaking speed (words per minute)
}

Four. Summary

The documentation for Youdao Zhiyun's intelligent speech-evaluation API is clear, I hit no pitfalls in the calling process, and the development experience was very friendly. The scoring results are objective and fair and make a valuable reference, so I plan to study and improve together with my little niece!

Project address :


