little girl 2022-08-06 06:33:04
The data I collected together with a senior colleague is in CSV format, and I had never dealt with CSV data before. I hit a lot of pitfalls when using it to train a neural network, so I am recording them here. Hopefully it also makes things easier for people who come later.
There should be quite a few packages for handling CSV files; this is a pandas tutorial (I haven't used the others, hhhh). Here I take one of my own datasets as an example to demonstrate some common processing methods.
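import pandas as pd

origin_data = pd.read_csv("origin_data.csv", na_values=" NaN")
The na_values parameter tells pandas which strings in the file to treat as missing values (here " NaN", with a leading space), so those cells are read in as NaN. In this way, if you need to manually filter out missing values later, you can index their positions. I tried before: if this parameter is not set, the missing values are not equal to any of False, 0, or "NaN", which makes them hard to locate.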
pandas wraps the CSV data it reads into a format called a DataFrame, which can be converted to a NumPy array. Let's first see how a DataFrame works.
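To make the reading step concrete, here is a minimal sketch that reads a small in-memory CSV instead of the real origin_data.csv. The column names and values are made up for illustration, and it assumes the file marks missing cells with the string " NaN" (leading space included), as in the read_csv call above.

```python
import io
import pandas as pd

# Hypothetical stand-in for origin_data.csv; one cell is missing and written as " NaN".
csv_text = "Age,Sex,Weight change\n30,M,1.5\n41,F, NaN\n25,M,-0.3\n"

# na_values adds " NaN" to the strings that pandas reads as missing (NaN).
data = pd.read_csv(io.StringIO(csv_text), na_values=" NaN")

print(type(data))   # the result is a pandas DataFrame
print(data.dtypes)  # per-column data types

# NaN never compares equal to anything, so locate it with isna(), not with ==.
missing_mask = data["Weight change"].isna()
missing_rows = data.index[missing_mask].tolist()
print(missing_rows)  # index labels of the rows with a missing value
```

Using isna() rather than an equality test is the usual way to find missing entries, since a comparison such as value == "NaN" is False for a real NaN.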
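Use data.name to index a column by its label. For a label that contains a space, such as "Weight change", use data["Weight change"] instead.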
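A short sketch of column access and of the NumPy conversion mentioned earlier. The DataFrame here is hypothetical sample data, not the author's dataset.

```python
import pandas as pd

# Hypothetical sample data for illustration.
data = pd.DataFrame({
    "Age": [30, 41, 25],
    "Sex": ["M", "F", "M"],
    "Weight change": [1.5, 0.2, -0.3],
})

ages = data.Age                  # attribute access works for simple labels
changes = data["Weight change"]  # bracket access is needed when the label has a space

# Select the numeric columns and convert them to a NumPy array.
features = data[["Age", "Weight change"]].to_numpy()
print(features.shape)  # (3, 2)
```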
Use the del keyword to remove a column:
del origin_data["Weight change"]
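For comparison, a small sketch on hypothetical sample data: del removes the column from the DataFrame in place, while the drop method returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data for illustration.
data = pd.DataFrame({
    "Age": [30, 41, 25],
    "Sex": ["M", "F", "M"],
    "Weight change": [1.5, 0.2, -0.3],
})

del data["Weight change"]             # modifies data in place
trimmed = data.drop(columns=["Sex"])  # returns a copy without "Sex"

print(list(data.columns))     # ['Age', 'Sex']
print(list(trimmed.columns))  # ['Age']
```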
For missing values, you can generally either fill them in by interpolation or simply discard the data. Here, deleting the rows that contain NaN values is used as an example.
The .dropna() method deletes rows that contain NaN values by default. You can set
.dropna(axis=1) to delete the columns that contain NaN values instead. You can look up the other usages yourself; this one is the most common.
origin_data = origin_data.dropna()
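A minimal sketch of the options mentioned above, on hypothetical sample data: dropping rows, dropping columns, and, as one possible way to fill gaps, linear interpolation with Series.interpolate().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
data = pd.DataFrame({
    "Age": [30, 41, 25],
    "Weight change": [1.5, np.nan, -0.3],
})

rows_dropped = data.dropna()        # drops the row that contains NaN
cols_dropped = data.dropna(axis=1)  # drops the column that contains NaN

# Alternative to deleting: fill the gap by linear interpolation.
filled = data["Weight change"].interpolate()

print(rows_dropped.index.tolist())  # [0, 2]
print(list(cols_dropped.columns))   # ['Age']
print(filled.tolist())
```

Note that dropna() returns a new DataFrame, which is why the result is assigned back to origin_data in the line above.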
After some processing, the index of the data is likely to be messed up. For example, here we deleted some rows, so the index is no longer continuous. If we then traverse the data by index, for example with origin_data.loc[i, ...] for i in range(len(origin_data)), a KeyError is raised as soon as i hits a deleted label. Therefore, it is generally necessary to reset the index after the data has been processed.
Use the .reset_index() method and its drop parameter. drop=True discards the old index and renumbers the rows from 0. drop=False (the default) also resets the index, but keeps the old index as a new column.
origin_data = origin_data.reset_index(drop=True)
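The following sketch, on hypothetical sample data, shows the gap left by dropna(), the error it causes, and the two settings of drop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dropna() removes the middle row.
data = pd.DataFrame({
    "Age": [30, 41, 25],
    "Weight change": [1.5, np.nan, -0.3],
}).dropna()
print(data.index.tolist())  # [0, 2], label 1 is gone

# Looking up the deleted label raises KeyError.
try:
    data.loc[1, "Age"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

kept = data.reset_index(drop=False)  # old index is kept as a column named "index"
reset = data.reset_index(drop=True)  # old index is discarded

print(list(kept.columns))    # ['index', 'Age', 'Weight change']
print(reset.index.tolist())  # [0, 1]
```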
During data preprocessing, we need to convert some non-numeric values to numbers, such as gender or province/city. Taking gender as an example, I want to convert M/F to 0/1 so that the neural network can process it.
Use .loc[row, column] to get the cell to be indexed, then modify its value with a conditional expression:
for i in range(len(origin_data)):
    origin_data.loc[i, 'Sex'] = 1 if origin_data.loc[i, 'Sex'] == "F" else 0
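As an alternative to the row-by-row loop above, the same conversion can be sketched with vectorized operations. This is not the author's method, just a common pandas idiom shown on hypothetical sample data; it also does not depend on the index being continuous.

```python
import pandas as pd

# Hypothetical sample data for illustration.
data = pd.DataFrame({"Sex": ["M", "F", "F", "M"]})

# Same rule as the loop: "F" becomes 1, everything else becomes 0.
as_comparison = (data["Sex"] == "F").astype(int)

# Explicit mapping: values missing from the dict would become NaN instead of 0.
as_map = data["Sex"].map({"M": 0, "F": 1})

data["Sex"] = as_map
print(data["Sex"].tolist())  # [0, 1, 1, 0]
```

The comparison form matches the loop exactly, while the mapping form makes unexpected values visible as NaN instead of silently turning them into 0.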
Copyright notice: this article was written by [little girl]. Please include a link to the original when reposting. Thank you. https://pythonmana.com/2022/218/202208060519291274.html