+ 1

I have a module task that I do not know how to solve. Could you give an example in Python, on any dataset?

Task: Data preprocessing and selection of significant attributes. For each data set (by year), apply cluster analysis methods to create groups of similar objects. Determine which attributes have the greatest influence on the definition of these cluster groups, and keep only those for later training. Also justify the choice of any additional attributes and the reason for excluding any data from the original set.

18th Mar 2024, 9:51 AM
Алексей Чигинцев
6 Answers
+ 2
I recommend looking at the scikit-learn project documentation, which includes code examples as well. https://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods To solve a task like this, you should probably take a data science course first and practice using these libraries. Sololearn did have a DS course, but I am not sure if it is still available for new users. Supposedly the course content is undergoing modernization. https://www.sololearn.com/en/learn/courses/le-data-science
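As a starting point for the task in the question, here is a minimal sketch using scikit-learn. It is one possible approach, not the required method: the Iris data stands in for one year's data set, k-means is just one of the clustering methods in the linked overview, the candidate range of 2 to 6 clusters and the cut-off of two attributes are arbitrary assumptions, and the ANOVA F-statistic is used only as a descriptive ranking of how strongly each attribute separates the clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Iris stands in for one year's data set; its class labels are ignored on purpose.
data = load_iris()
feature_names = list(data.feature_names)

# Standardize so that no attribute dominates the distance just because of its scale.
X = StandardScaler().fit_transform(data.data)

# Choose the number of clusters by silhouette score over a small candidate range.
scores = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels_k)
best_k = max(scores, key=scores.get)

labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# Rank attributes by how strongly they differ between clusters (ANOVA F-statistic).
f_values, _ = f_classif(X, labels)
ranking = sorted(zip(feature_names, f_values), key=lambda p: p[1], reverse=True)

# Keep the top-ranked attributes; the cut-off of 2 is an arbitrary illustration.
selected = [name for name, _ in ranking[:2]]

print("chosen number of clusters:", best_k)
for name, f in ranking:
    print(f"{name}: F = {f:.1f}")
print("kept for later training:", selected)
```

The printed ranking gives something concrete to cite when justifying which attributes were kept and which were excluded. Because the clusters were built from the same attributes, the F-values describe the clustering rather than prove significance.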
18th Mar 2024, 3:18 PM
Tibor Santa
+ 2
I don't really get your point. You are trying to compete against others while aware that you don't have the necessary knowledge, and you are trying to save yourself from learning by asking for ready-made solutions? In my interpretation that sounds a lot like cheating. This forum is not really suited to give you a rapid "power-up" and suddenly make you a pro data scientist. If you could not digest the subject in two months, it just means you need some more time and a bit more effort. If you have a specific question about a piece of code, maybe someone will be able to help you, but what you are asking seems unreasonable.
19th Mar 2024, 9:14 PM
Tibor Santa
+ 2
Sorry if I misunderstood your intent. Rather than relying on forums and YouTube, you should really base your research on understanding the established methodology and scientific principles. I think this is a really good summary of clustering, and its 3rd step is about selecting the variables used for the analysis. In truth, such a decision is not mathematically exact. It is more of an informed choice based on a general understanding of the business domain or subject area, and on analysis of the statistical attributes of the variables and the correlations between them. https://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions45/ClusterAnalysisReading.html
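To illustrate the point about correlations between variables, here is a rough sketch that flags attributes which are nearly duplicates of an earlier one, so that one of each pair can be considered for exclusion. The wine data set from scikit-learn and the 0.8 threshold are assumptions chosen only for illustration; the final decision should still rest on domain knowledge.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine

# The wine data stands in for the real data set (an assumption for illustration).
df = load_wine(as_frame=True).data

# Absolute pairwise Pearson correlations between all attributes.
corr = df.corr().abs()

# Keep only the upper triangle so that each pair is inspected once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# Flag an attribute when it correlates strongly with any earlier attribute.
threshold = 0.8  # arbitrary cut-off, to be tuned for the data at hand
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

reduced = df.drop(columns=to_drop)
print("candidates for exclusion:", to_drop)
print("remaining attributes:", list(reduced.columns))
```

Each flagged attribute carries almost the same information as one that is kept, which is a reason for exclusion that can be written into the justification.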
20th Mar 2024, 9:03 AM
Tibor Santa
+ 1
Tibor Santa Perhaps I explained the essence a little wrong. In our country, this championship is held among professionals. Schoolchildren and students gather from all over the country to brainstorm, so to speak, and to test whether a person can prepare for such a complex task from scratch in such a time frame. The tasks in this championship are such that even an experienced person can get stuck on them. So I am here to at least be told how to solve this, where to start, and so on. Many forums do not explain this, and the training videos on YouTube contain some kind of nonsense. Sorry, I am communicating through a translator, so my point may not come across clearly.
20th Mar 2024, 4:13 AM
Алексей Чигинцев
+ 1
Tibor Santa Thank you very much for the advice and sources provided, it was a pleasure talking with you.
21st Mar 2024, 8:19 AM
Алексей Чигинцев
0
Tibor Santa Unfortunately, the problem is not even that. I have a championship that I had to prepare for in 2 months. I have come quite far, but it is almost impossible to learn such a large topic in that time, so I am asking for help here.
19th Mar 2024, 8:37 PM
Алексей Чигинцев