Indexing a column with duplicate values -> ERROR
Hi everyone! I am having a problem setting a column, date, as the index, because two of its dates are equal. How can I solve this? The error is: ValueError: cannot reindex from a duplicate axis
PS: I want the date column as the index because I want to interpolate some missing dates in that column, and if I don't index it I get another error: ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex
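A minimal sketch on hypothetical toy data (not the poster's dataset) that reproduces both errors with pandas. It suggests that duplicated labels only fail once something reindexes on them, and that method="time" rejects the default integer index. The exact wording of the first message varies between pandas versions.

```python
import pandas as pd

# Hypothetical toy data: the first two rows share the same date.
dup = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-03"]),
)

# Having a duplicated index is allowed; reindexing on it is what fails.
reindex_error = None
try:
    dup.reindex(pd.to_datetime(["2020-01-01", "2020-01-02"]))
except ValueError as err:
    # Wording differs between pandas versions, but it mentions duplicates.
    reindex_error = str(err)

# method="time" needs a DatetimeIndex; the default RangeIndex is rejected.
time_error = None
try:
    pd.Series([1.0, None, 3.0]).interpolate(method="time")
except ValueError as err:
    time_error = str(err)

print(reindex_error)
print(time_error)
```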
6 answers
OK, now I understand. That shouldn't be too hard. I did a quick Excel calculation, but it should be straightforward in Python/pandas as well. Steps:
1. Define the date when the KMS were read out. I took the current date, but it is probably another date, or even a different one for each entry. Let's call it readingDate.
2. Take the difference between that date and the service entry date for each entry. In days, it is between 5254 and 8369. Put it in a new column; let's call it serviceDays.
3. Divide KMS by serviceDays for each entry and put the result in a new column, kmPerDay. The values are between 20.6 and 27.2 (mean 24.76, standard deviation 2.91). The spread is small, so the mean is well usable for the estimate, I'd say.
4. Calculate the estimated service entry date as readingDate - KMS / meanKmPerDay. I got 2009-07-29 for the missing date.
I hope this is understandable.
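The steps above could be sketched in pandas roughly like this, using the four sample rows posted in the question. The reading date (2021-01-01) is an assumption, as is using one reading date for every car; with different inputs the numbers will not match the ones quoted above.

```python
import pandas as pd

# Sample rows from the question; row 2 has the missing date.
df = pd.DataFrame({
    "service entry date": pd.to_datetime(["2006-11-12", "1998-05-03", None, "2005-01-02"]),
    "KMS": [142723, 221382, 105567, 122595],
})

# Step 1: date the odometer readings were taken (assumed, same for all rows).
reading_date = pd.Timestamp("2021-01-01")

# Step 2: days in service per car (NaN where the entry date is missing).
df["serviceDays"] = (reading_date - df["service entry date"]).dt.days

# Step 3: average kilometers per day per car; mean() skips the NaN row.
df["kmPerDay"] = df["KMS"] / df["serviceDays"]
mean_km_per_day = df["kmPerDay"].mean()

# Step 4: readingDate - KMS / meanKmPerDay for the rows without a date.
missing = df["service entry date"].isna()
estimated_days = df.loc[missing, "KMS"] / mean_km_per_day
df.loc[missing, "service entry date"] = reading_date - pd.to_timedelta(
    estimated_days.round(), unit="D"
)

print(df)
```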
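You can't reliably use a column as the index if its values aren't unique: pandas lets you set it, but any operation that has to reindex or align on it fails with that error.
Can you show your code or explain in more detail what you mean by interpolating missing dates?
If the column has missing values, that's another reason it can't be used as the index for interpolation.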
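A quick check before calling set_index, sketched with pandas on a hypothetical frame shaped like the question's data (the column name date and all values are made up). It flags the two problems mentioned above: repeated dates and missing dates.

```python
import pandas as pd

# Hypothetical sample: one repeated date and one missing date (NaT).
df = pd.DataFrame({
    "date": pd.to_datetime(["2006-11-12", "2006-11-12", None, "2005-01-02"]),
    "KMS": [142723, 150000, 105567, 122595],
})

# Repeated dates break anything that reindexes on the index.
has_duplicates = df["date"].dropna().duplicated().any()
# NaT in the index breaks interpolation.
has_missing = df["date"].isna().any()

# Show the offending rows (keep=False marks every copy of a repeated value).
print(df[df["date"].duplicated(keep=False) & df["date"].notna()])
print(df[df["date"].isna()])
```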
First of all, thank you for your answer.
So, I have a dataset with missing values and I have to complete it. In the date column one value is empty, and I was trying to fill it by interpolating with the values from another column. For example:
   service entry date     KMS  (...)
0          2006-11-12  142723
1          1998-05-03  221382
2                 NaT  105567
3          2005-01-02  122595
4               (...)   (...)
As you can see, the value in row 2 is missing, and I want to interpolate it based on the KMS value. How can I do it?
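An alternative to the rate-based approach, sketched here as an assumption rather than anything suggested in the thread: fit a straight line of entry date against KMS with an ordinary least-squares fit (np.polyfit) on the rows that have a date, then read the missing date off that line. With only three known points, and a relationship that isn't even monotonic in this sample, the estimate is very rough.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; row 2 has the missing date.
df = pd.DataFrame({
    "service entry date": pd.to_datetime(["2006-11-12", "1998-05-03", None, "2005-01-02"]),
    "KMS": [142723, 221382, 105567, 122595],
})

known = df["service entry date"].notna()

# Convert dates to float days since the Unix epoch so they can be fitted.
epoch = pd.Timestamp("1970-01-01")
days = (df["service entry date"] - epoch) / pd.Timedelta(days=1)

# Least-squares line: days = slope * KMS + intercept (known rows only).
slope, intercept = np.polyfit(df.loc[known, "KMS"], days[known], deg=1)

# Predict the missing dates from their KMS and convert back to timestamps.
pred_days = slope * df.loc[~known, "KMS"] + intercept
df.loc[~known, "service entry date"] = epoch + pd.to_timedelta(pred_days.round(), unit="D")

print(df)
```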
Thank you, Benjamin. I understood your suggestion and have already applied it successfully.
I was trying to avoid that route and do something more "automatic" with df.interpolate(method='time'), but I couldn't make it work. I was wasting too much time and had to move on.
By the way, just for learning purposes, do you know if I could have applied df.interpolate(method='time') to this data? That is why I was trying to set the date column as the index in the first place (and got all the errors mentioned above).
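As far as I understand pandas, interpolate(method='time') fills missing values in the columns using the DatetimeIndex as the x-axis; it does not fill gaps in the index itself. So it could estimate a missing KMS from known dates, but not a missing date from KMS. A small sketch on hypothetical data showing what the time weighting does compared to the default linear method:

```python
import pandas as pd

# Hypothetical series: the index holds the dates, the values get filled.
s = pd.Series(
    [0.0, None, 3.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
)

# Weights by the time gap between index dates.
by_time = s.interpolate(method="time")
# Ignores the dates and treats the rows as evenly spaced.
by_position = s.interpolate(method="linear")

print(by_time.iloc[1])      # about 1.0
print(by_position.iloc[1])  # 1.5
```

The time-based result is about 1.0 because 2020-01-02 is one day into a three-day gap, while the linear result sits halfway between the two neighbouring rows.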
So what should the value for row 2 in that example be? The same as row 1 because of the same KMS? What does KMS stand for, and is there a mathematical relation between entry date and KMS?
Sorry, the identical values were a copy-paste mistake. KMS -> kilometers.
These values correspond to different cars (each row is a different car). The date tells when the car started "working", and KMS is the distance it has travelled. I was hoping to find an approximate date based on the KMS value.