После выполнения этого кода:
col_names = ['Project', 'OrderDate', 'orderid', 'ClientID','IsRepeat','IsBlocked','IsManual','AutoDecision','ManualApprove','IsLoan','ShortTermAmount','ShortTermPeriod','LongTermAmount','LongTermPeriod','RequestedAmount','RequestedPeriod','LoanSum','Period','ShortTermScore','LongTermScore']
#dtypes={"Project": bool, "OrderDate": 'str', "orderid": int, "ClientID" : int,"IsRepeat" :bool,"IsBlocked":bool,"IsManual":bool,"AutoDecision":bool,"ManualApprove":bool,"IsLoan":bool,"ShortTermAmount":"Int64","ShortTermPeriod":"Int64","LongTermAmount":"Int64","LongTermPeriod":"Int64","RequestedAmount":"Int64","RequestedPeriod":"Int64","LoanSum":"Int64","Period":"Int64","ShortTermScore":"float64","LongTermScore":"float64"}
test = pd.read_csv("/home/man/Test_task.csv",sep=',', thousands =',', header = 0, decimal='.', names=col_names, usecols=col_names).apply(pd.to_numeric, errors='coerce')#.astype('Float64')#,'Period': lambda x: np.float64(x) if x != '-' else np.nan})#'orderid': lambda x: int(x.replace(',','')) if x != '-' else np.nan, .fillna(0)
df = pd.DataFrame(data=test)
df['ShortTermAmount'] = pd.read_csv("/home/man/Test_task.csv",sep=',', thousands =',', dtype = {'ShortTermAmount':"str"}, converters = {'ShortTermAmount': lambda x: x if x != '-' else np.nan}).fillna(0).replace(',', '').apply(pd.to_numeric, errors='coerce')
df.head()
Я получил такую таблицу:
Project OrderDate orderid ClientID IsRepeat IsBlocked IsManual AutoDecision ManualApprove IsLoan ShortTermAmount ShortTermPeriod LongTermAmount LongTermPeriod RequestedAmount RequestedPeriod LoanSum Period ShortTermScore LongTermScore
0 1.0 NaN 1794004 1040307 NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN 6700 12 NaN NaN 0.085 0.085
1 1.0 NaN 1794005 1305335 NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN 1900 26 NaN NaN 0.017 0.017
2 1.0 NaN 1794021 1174614 1.0 NaN NaN 1.0 NaN 1.0 1.0 40.0 NaN NaN 4500 20 NaN 20.0 0.926 0.926
3 1.0 NaN 1794032 1356306 1.0 NaN NaN 1.0 NaN 1.0 1.0 40.0 NaN NaN 6000 40 NaN 40.0 0.679 0.679
4 1.0 NaN 1794057 1120819 1.0 NaN NaN 1.0 NaN 1.0 1.0 40.0 NaN NaN 70000 168 NaN 40.0 0.737
При том, что в моём изначальном csv файле (
https://drive.google.com/file/d/1Oseh4KnE98tC3-jRy...) в некоторых колонках стоят другие значения. Так, в колонке ShortTermAmount, стоят целые числа, у которых тысячный разряд выделен запятой. При этом, числа из колонки RequestedAmount грузятся нормально, без запятой, и им потом присвайвается тип int64, а ShortTermAmount присваивается тип float64.
Как это исправить?