Emotion analysis of EEG data with Python

AI technology base 2021-08-09 16:56:35

author | Li Qiujian

Produced by | AI Technology Base (ID: rgznai100)


An electroencephalogram (EEG) is a wave of brain electrical activity with spatiotemporal characteristics, formed by the synchronous discharge of local neurons in the brain. The German physician Hans Berger first recorded an EEG (electroencephalography) from the human scalp in 1924. Psychological research shows that human cognition and perception can be expressed through brain waves: when the olfactory, auditory, visual, gustatory, or tactile nerves are stimulated, the stimulus-response signal appears in the brain waves, revealing the psychological correlation between the senses and the person. A large number of studies have demonstrated the feasibility of using EEG signals to continuously assess personal comfort, and more objective data can be obtained this way. Recent studies show that tactile stimulation is related to the θ, α, and β frequency bands of brain waves.

Traditional emotion recognition mainly studies facial features, body movements, and speech. These external features are easy to disguise and do not necessarily reflect real emotions. EEG signals, by contrast, reflect the neuro-electrophysiological activity of the brain as it processes emotion, and can compensate for the shortcomings of the traditional approaches.

This project therefore uses the Python language to build a KNN machine-learning classifier for emotion analysis of EEG data. The final effect is shown in the figure below:

No.1 “Basic introduction”

Environment requirements

The environment is Python 3.6.5 on the Windows platform. The main libraries used are:

csv module. The csv library is used here to read the CSV dataset files. A CSV (comma-separated values) file is a common text format for storing tabular data consisting of numbers or characters; many programs encounter CSV files when processing data, and the format is very widely used.
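As a minimal sketch of how the csv module reads such a file (the data here is an in-memory stand-in, not the project's dataset):

```python
import csv
import io

# A tiny in-memory CSV standing in for a dataset file
csv_text = "ch1,ch2,ch3\n0.1,0.2,0.3\n0.4,0.5,0.6\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]
print(header)   # ['ch1', 'ch2', 'ch3']
print(data[0])  # ['0.1', '0.2', '0.3'] -- csv yields strings, convert as needed
```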

scipy module. SciPy is an advanced scientific-computing library closely tied to NumPy: it typically manipulates NumPy arrays to perform scientific computation and statistical analysis, so it can be said to be built on top of NumPy, which is also why SciPy depends on NumPy to install and run. SciPy has many submodules for different applications, such as interpolation and optimization algorithms, making it a more powerful scientific-computing package with a wider range of applications than NumPy alone. Being built on Python brings another benefit: a powerful programming language is available for developing complex programs and specialized applications, and scientific applications using SciPy benefit from additional modules developed by developers around the world in many niche areas of software.
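The prediction step later relies on the Canberra distance from scipy.spatial.distance; as a sketch of what that metric computes (a hand-rolled version for illustration only):

```python
def canberra(a, b):
    """Canberra distance: sum over coordinates of |a_i - b_i| / (|a_i| + |b_i|)."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:  # skip coordinates where both values are zero
            total += abs(x - y) / denom
    return total

# Only the middle coordinate differs: |2 - 4| / (2 + 4) = 1/3
print(canberra([1.0, 2.0, 3.0], [1.0, 4.0, 3.0]))
```

scipy.spatial.distance.canberra computes the same quantity and is what the project's prediction code actually calls.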

pathlib module. This module provides classes that represent file-system paths using semantics appropriate for a variety of operating systems.

pickle module. Python's pickle module implements basic serialization and deserialization of data. Through pickle serialization we can save the objects of a running program to a file for permanent storage; through pickle deserialization we can recreate, from that file, the objects the program last saved.
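A minimal round-trip sketch (the file name and dictionary contents are made up for illustration):

```python
import os
import pickle
import tempfile

# A record shaped loosely like one entry of the EEG dataset
record = {b'labels': [7.2, 3.1], b'data': [[0.1, 0.2], [0.3, 0.4]]}

path = os.path.join(tempfile.gettempdir(), "demo.dat")
with open(path, "wb") as f:
    pickle.dump(record, f)        # serialize to disk
with open(path, "rb") as f:
    restored = pickle.load(f)     # deserialize back into an object

print(restored == record)  # True
os.remove(path)
```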

Other common modules, such as OpenCV, are used as well but will not be introduced one by one.

Introduction to the KNN algorithm

The k-nearest neighbour algorithm (k Nearest Neighbor, kNN) is a theoretically mature machine-learning algorithm, first proposed by Cover and Hart in 1968. The kNN algorithm is simple and intuitive: for a given test sample, find the k closest training samples under some distance measure, and make a prediction based on those k "neighbours".

kNN can perform both classification and regression tasks. In classification, a majority vote is generally used: the most frequent class label among the k "neighbours" becomes the prediction. In regression, an averaging rule is generally used: the mean of the k samples' real-valued output labels becomes the prediction. Once a similarity measure has been defined, weighted voting or weighted averaging can also be applied, with more similar samples receiving larger weights. The flow chart of KNN classification is as follows:
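The voting rule described above can be sketched in a few lines (toy 2-D data and squared Euclidean distance; the project itself uses the Canberra distance on EEG features):

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among the k nearest training samples."""
    # Squared Euclidean distance to every training sample
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    # Indices of the k closest samples
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Majority vote over their labels
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["low", "low", "low", "high", "high", "high"]
print(knn_predict(train_x, train_y, (0.5, 0.5)))  # "low"
```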

No.2 “Model construction”

Data set preparation

First, we take the official EEG dataset and place it under the data folder:

Data feature extraction


Using pickle, we read each .dat data file, obtain the feature vector in each file, and apply an FFT to each channel.

Input: channel data of dimension N × M, where N is the number of channels and M is the number of EEG samples per channel.

Output: FFT result of dimension N × M, where N is the number of channels and M is the number of FFT values per channel.
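A minimal sketch of the per-channel FFT described above, using numpy (the project's do_fft presumably does something similar):

```python
import numpy as np

def do_fft(all_channel_data):
    """Apply the FFT along each channel row of an N x M array."""
    return np.fft.fft(all_channel_data, axis=1)

# Two channels, eight samples each
channels = np.vstack([np.arange(8.0), np.ones(8)])
spectrum = do_fft(channels)
print(spectrum.shape)  # (2, 8): same N x M shape, now complex coefficients
```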

The key code is as follows:

num = 0  # counts trials across all data files
for file in os.listdir("../data"):
    fname = Path("../data/" + file)
    x = cPickle.load(open(fname, 'rb'), encoding="bytes")
    for i in range(40):
        eeg_realtime = x[b'data'][i]
        label = x[b'labels'][i]
        # Map the valence score to 3 classes: >6 high, <4 low, otherwise neutral
        if label[0] > 6:
            val_v = 3
        elif label[0] < 4:
            val_v = 1
        else:
            val_v = 2
        # Same 3-class mapping for the arousal score
        if label[1] > 6:
            val_a = 3
        elif label[1] < 4:
            val_a = 1
        else:
            val_a = 2
        num += 1
        if num < 1280:  # not the last trial: write with a comma separator
            va_label.write(str(val_v) + ",")
            ar_label.write(str(val_a) + ",")
        else:           # last trial (32 files x 40 trials): no trailing comma
            va_label.write(str(val_v))
            ar_label.write(str(val_a))
        eeg_raw = np.reshape(eeg_realtime, (40, 8064))
        eeg_raw = eeg_raw[:32, :]  # keep only the 32 EEG channels
        eeg_feature_arr = self.get_feature(eeg_raw)
        for f in range(160):
            sep = "\n" if f == 159 else ","
            fout_data.write(str(eeg_feature_arr[f]) + sep)
    print(file + " processed")

Then compute the FFT to obtain the frequency content of all channels. Input: channel data of dimension N × M (N channels, M EEG samples per channel). Output: the frequency bands of each channel: Delta, Theta, Alpha, Beta, and Gamma.

# Length of each channel's data
L = len(all_channel_data[0])
# Sampling frequency
Fs = 128
# Get the FFT of every channel
data_fft = self.do_fft(all_channel_data)
# Compute the normalized magnitude spectrum
frequency = map(lambda x: abs(x) / L, data_fft)
frequency = map(lambda x: x[: L // 2 + 1] * 2, frequency)
f1, f2, f3, f4, f5 = itertools.tee(frequency, 5)
# Slice the spectrum into the standard bands (bin index = Hz * L / Fs)
delta = np.array(list(map(lambda x: x[L * 1 // Fs - 1: L * 4 // Fs], f1)))    # 1-4 Hz
theta = np.array(list(map(lambda x: x[L * 4 // Fs - 1: L * 8 // Fs], f2)))    # 4-8 Hz
alpha = np.array(list(map(lambda x: x[L * 8 // Fs - 1: L * 13 // Fs], f3)))   # 8-13 Hz
beta = np.array(list(map(lambda x: x[L * 13 // Fs - 1: L * 30 // Fs], f4)))   # 13-30 Hz
gamma = np.array(list(map(lambda x: x[L * 30 // Fs - 1: L * 50 // Fs], f5)))  # 30-50 Hz
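These bands feed the features used for prediction below (standard deviation and mean, per channel and band). A sketch of stacking those statistics into one feature vector, using random arrays in place of real band data (band widths here are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the delta..gamma arrays: 32 channels each, varying band widths
bands = [rng.random((32, w)) for w in (24, 32, 40, 136, 160)]

# Standard deviation and mean of every band for every channel,
# concatenated into a single feature vector
feature = np.hstack([np.hstack((b.std(axis=1), b.mean(axis=1))) for b in bands])
print(feature.shape)  # (320,) = 5 bands x 32 channels x 2 statistics
```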

Model prediction

From the features we derive the arousal and valence values. Valence and arousal represent, respectively, how positive or negative an emotion is and how excited or calm it is. These two dimensions form an emotional plane: through its specific valence-arousal values, any emotional state can be mapped to a particular point in this VA plane.

Input: features from all frequency bands and channels (standard deviation and mean), of dimension 1 × M (number of features).

Output: an emotion level from 1 to 3 for each of arousal and valence, where 1 means low, 2 means neutral, and 3 means high.

# Restrict training data to the current subject's 40 trials
self.train_arousal = self.train_arousal[40*(index-1):40*index]
self.train_valence = self.train_valence[40*(index-1):40*index]
self.class_arousal = np.array([self.class_arousal[0][40*(index-1):40*index]])
self.class_valence = np.array([self.class_valence[0][40*(index-1):40*index]])
# Compute Canberra distance to the arousal training data
distance_ar = list(map(lambda x: ss.distance.canberra(x, feature), self.train_arousal))
# Compute Canberra distance to the valence training data
distance_va = list(map(lambda x: ss.distance.canberra(x, feature), self.train_valence))
# Indices and distances of the 3 nearest arousal neighbours
idx_nearest_ar = np.array(np.argsort(distance_ar)[:3])
val_nearest_ar = np.array(np.sort(distance_ar)[:3])
# Indices and distances of the 3 nearest valence neighbours
idx_nearest_va = np.array(np.argsort(distance_va)[:3])
val_nearest_va = np.array(np.sort(distance_va)[:3])
# Compare the first and second nearest distances. If the ratio is <= 0.7,
# take the class of the nearest sample; otherwise take the most frequent class.
# Arousal
comp_ar = val_nearest_ar[0] / val_nearest_ar[1]
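The comment above describes the remaining step: if the ratio of the first to second nearest distance is at most 0.7, take the class of the single nearest neighbour; otherwise take the most frequent class among the three. A sketch of that rule in isolation (function and argument names are hypothetical):

```python
import numpy as np

def pick_class(val_nearest, cls_nearest, threshold=0.7):
    """Choose a class from the 3 nearest neighbours' distances and labels."""
    comp = val_nearest[0] / val_nearest[1]
    if comp <= threshold:
        # First neighbour is markedly closer: trust it alone
        return cls_nearest[0]
    # Otherwise fall back to the majority class among the three
    values, counts = np.unique(cls_nearest, return_counts=True)
    return values[np.argmax(counts)]

print(pick_class(np.array([0.2, 1.0, 1.1]), np.array([3, 1, 1])))  # 3
print(pick_class(np.array([0.9, 1.0, 1.1]), np.array([3, 1, 1])))  # 1
```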

Then obtain the emotion class from the features.

Input: features from all frequency bands and channels (standard deviation), of size 1 × M (number of features).

Output: an emotion category between 1 and 5, according to the circumplex model.

class_ar, class_va = self.predict_emotion(feature, fname)
print(class_ar, class_va)
# Map the (arousal, valence) classes to one of 5 emotion categories;
# a neutral value on either axis maps to the neutral category 5
if class_ar == 2.0 or class_va == 2.0:
    emotion_class = 5
elif class_ar == 3.0 and class_va == 1.0:
    emotion_class = 1
elif class_ar == 3.0 and class_va == 3.0:
    emotion_class = 2
elif class_ar == 1.0 and class_va == 3.0:
    emotion_class = 3
elif class_ar == 1.0 and class_va == 1.0:
    emotion_class = 4

The prediction results are shown in the figure below :

Complete code

Link:

Extraction code: rwpe

About the author

Li Qiujian, CSDN blogger and author of a CSDN talent course. He holds a master's degree from China University of Mining and Technology and has won development awards, including in TapTap competitions.

This article is from the WeChat official account AI Technology Base (rgznai100); author: Li Qiujian.


Original publication time : 2021-08-05


