+ 1

indexing a column with same values -> ERROR

Hi everyone! I am having a problem using a column, date, as the index, because it contains two dates that are equal. How can I solve this? -> ValueError: cannot reindex from a duplicate axis. PS: I want to turn the date column into the index because I want to interpolate some missing dates in that column, and if I don't have it indexed I get another error -> ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex

31st Mar 2021, 7:22 PM
Bruno
6 Answers
+ 1
OK, now I understand. That shouldn't be too hard. I did a quick Excel calculation, but it should be straightforward in Python/pandas as well. Steps:
1. Define the date when the KMS were read out. I took the current date, but it is probably another date, or even a different one for each entry. Let's call it readingDate.
2. Take the difference between that date and the service entry date for each entry and put it in a new column. Let's call it serviceDays. In days it is between 5254 and 8369.
3. Divide kms by serviceDays for each entry and put the result into a new column, kmPerDay. The values are between 20.6 and 27.2, with mean 24.76 and standard deviation 2.91. Good correlation, well usable for interpolation, I'd say.
4. Calculate the estimated service entry date as readingDate - kms / meanKmPerDay. I got 2009-07-29 for the missing date.
I hope that is understandable.
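The steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the exact Excel calculation: the column names `service_entry_date` and `kms` are assumptions, only the sample rows quoted in this thread are used, and the reading date of 2021-04-01 is an assumed value standing in for "the current date".

```python
import pandas as pd

# Sample rows quoted in the thread; column names are assumed for illustration.
df = pd.DataFrame({
    "service_entry_date": pd.to_datetime(
        ["2006-11-12", "1998-05-03", None, "2005-01-02"]
    ),
    "kms": [142723, 221382, 105567, 122595],
})

# Assumed reading date; replace with the real date the odometers were read.
reading_date = pd.Timestamp("2021-04-01")

# Days in service for every car with a known entry date (NaN where missing).
service_days = (reading_date - df["service_entry_date"]).dt.days

# Average kilometres per day over the cars with a known date.
km_per_day = df["kms"] / service_days
mean_km_per_day = km_per_day.mean()  # NaN rows are skipped by default

# Estimated entry date = reading date - kms / mean km per day.
estimated_days = (df["kms"] / mean_km_per_day).round()
estimated_date = reading_date - pd.to_timedelta(estimated_days, unit="D")

# Only fill the rows where the date is actually missing.
df["service_entry_date"] = df["service_entry_date"].fillna(estimated_date)
print(df)
```

Using a single mean rate is the simplest choice; with more data, a median or a per-model rate might be more robust against outliers.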
31st Mar 2021, 11:13 PM
Benjamin Jürgens
+ 1
A column can't serve as a unique index if its values aren't unique: pandas will let you set it, but operations that have to reindex then fail with the error you saw. Can you show your code or explain in more detail what you mean by interpolating missing dates? If the column also has missing values, that is another reason it won't work as the index here.
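A quick way to check both conditions before calling `set_index` is sketched below. The data and the column names `date` and `value` are made up for illustration, and averaging the duplicate rows is only one possible choice (keeping the first row per date would be another).

```python
import pandas as pd

# Hypothetical data: one duplicated date and one missing date.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-02", None]),
    "value": [1.0, 2.0, 4.0, 5.0],
})

# Rows whose date occurs more than once, and the number of missing dates.
dup_mask = df["date"].duplicated(keep=False)
n_missing = df["date"].isna().sum()
print(df[dup_mask])
print("missing dates:", n_missing)

# One way to get a unique DatetimeIndex: drop missing dates,
# then collapse duplicates by averaging their values.
clean = (
    df.dropna(subset=["date"])
      .groupby("date")["value"]
      .mean()
      .to_frame()
)
print(clean.index.is_unique)
```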
31st Mar 2021, 7:52 PM
Benjamin Jürgens
+ 1
First of all, thank you for your answer. So, I have a dataset with missing values and I have to complete it. In the date column I have one empty value, and I was trying to fill it by interpolating with a value from another column. For example:

     service entry date   KMS      (...)
0    2006-11-12           142723
1    1998-05-03           221382
2    NaT                  105567
3    2005-01-02           122595
4    (...)                (...)

As you can see, there is a missing value in row 2, and I want to interpolate it based on the KMS value. How can I do it?
31st Mar 2021, 9:06 PM
Bruno
+ 1
Thank you, Benjamin. I understood your suggestion and have already applied it successfully. I was trying to avoid that route and do something more "automatic" with df.interpolate(method='time'), but I couldn't make it work. I was wasting too much time and had to move on. By the way, just for learning purposes, do you know if I could have applied df.interpolate(method='time') to this data? That is why I was trying to use the date column as the index in the first place (and getting all the errors mentioned above).
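For reference, a minimal sketch of what `interpolate(method='time')` operates on, using made-up data: it fills missing values in the data, weighted by the spacing of a complete DatetimeIndex. It does not fill missing entries in the index itself, so a missing date would have to be estimated some other way.

```python
import numpy as np
import pandas as pd

# Hypothetical series: complete, unevenly spaced dates; one missing value.
s = pd.Series(
    [0.0, np.nan, 30.0],
    index=pd.to_datetime(["2021-01-01", "2021-01-11", "2021-01-31"]),
)

# Time-weighted: the gap is 10 of 30 days in, so the result is about 10.0.
by_time = s.interpolate(method="time")

# Plain linear interpolation ignores the index spacing and gives 15.0.
by_position = s.interpolate(method="linear")

print(by_time)
print(by_position)
```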
1st Apr 2021, 1:26 PM
Bruno
0
So what should be the value for 2 in that example? Same as 1 because of same KMS? What does KMS stand for and is there a mathematical relation between entry date and KMS?
31st Mar 2021, 9:16 PM
Benjamin Jürgens
0
Sorry, having the same values was a copying mistake. KMS -> kilometers. These values correspond to different cars (so each row represents a different car). The date tells when the car started "working" and KMS is the distance travelled. I was hoping to find an approximate date based on the KMS value.
31st Mar 2021, 9:29 PM
Bruno