0
How do I deal with a non-numeric column in Python?
I'm new to Python and machine learning. I'm trying to make use of data that I have. So far all of my columns have been numerical, so there was no problem handling them, but now I have a column with non-numeric data. It's a list describing a layered structure, so each row in that column may contain a different number of layers with different materials. Does anyone have any idea how to deal with this? Should I give each of them a unique number?
5 Answers
+ 6
Then it might be wise to treat them separately. You can split all entries into separate columns - pvk, ito, al, etc. - and mark each of them True/False, depending on whether that particular material is present or not.
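A minimal sketch of the True/False columns idea in plain Python. The sample rows and the names `rows`, `materials` and `flags` are made up for illustration, and it assumes layers are separated by '/' as in the asker's example:

```python
# Hypothetical sample rows; each string lists layers separated by '/'.
rows = ['ito/tapc/pvk/al', 'pvk/met', 'ito/al']

# Collect every material that appears, in a stable (sorted) order.
materials = sorted({m for row in rows for m in row.split('/')})

# One True/False flag per material for each row.
flags = [{m: (m in row.split('/')) for m in materials} for row in rows]

for row, f in zip(rows, flags):
    print(row, f)
```

Each dictionary in `flags` can then become one row of presence columns. Note that this only records which materials are present, not their order or how many times they occur.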
Or, instead of making so many columns, put all the materials in a fixed order, in a list for example, and for each material present add 2**i to the numeric value in the 'layers' column, where i is the index of that material in the list, like this:
mat = ['pvk', 'ito', 'al', 'met']
If you spot an entry of 'pvk/met', the number representing this combination will be: 2**mat.index('pvk') + 2**mat.index('met') = 2**0 + 2**3 = 1 + 8 = 9
No other combination returns 9, so you are safe :)
And this way you only store integers in a single column, instead of tons of redundant data.
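The 2**i scheme above could be sketched roughly like this. The helper names `encode` and `decode` are made up for illustration, and the sketch assumes every material in an entry is already listed in `mat`:

```python
mat = ['pvk', 'ito', 'al', 'met']

def encode(entry, materials=mat):
    """Turn 'pvk/met' into an integer with one bit set per material present."""
    code = 0
    for name in entry.split('/'):
        # 1 << i equals 2**i; OR-ing keeps a repeated material from being counted twice.
        code |= 1 << materials.index(name)
    return code

def decode(code, materials=mat):
    """Recover the materials present (in list order, not layer order)."""
    return [m for i, m in enumerate(materials) if code & (1 << i)]

print(encode('pvk/met'))  # 9
print(decode(9))          # ['pvk', 'met']
```

Keep in mind that 'pvk/met' and 'met/pvk' both give 9 here, so this encoding records which materials are present but not their order or any repeated layers.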
+ 2
Is this descriptive data, or something that can be turned into a numeric value?
+ 2
hmm... I'd assign a bitnum to every material possible and store those entries as decimal representations, to save some space.
Unless, of course, the layers are not unique...
First try to determine if this field's composition may affect the data in each record - that is, for example, if the number and combination of layers affects the material strength or density or similar parameter?
0
It's not a description. It's a layered structure of materials. Here's an example: ito/tapc/pvk/Aluminum. ito, tapc, pvk and Al are all material names, but the structure can have a different number of layers, from 4 to 13.
0
Yes, the number and arrangement of layers do have some effect on the performance. By the way, can you explain more about the bitnum you mentioned earlier?