+ 1

Server core and thread allocation

Hello, I'm making a server for data aggregation. I have the majority of the server's functionality done, but I'm unsure how many cores/threads to allocate to it. The server downloads data from different sources into JSONs. Ideally, it should be able to scale up to downloads from at least 500 sources asynchronously, all of which need to be checked constantly, forever; however, I'm unsure how to split up the workload.

Should I allocate a core per 100 sources? In that case I'd need 106 threads for one core, and that feels like too much (I know the number is weird, but I need at least 4-6 threads per core for different programs to run per section; the 100 are the threads per source). How many threads can a core actually work with in practice? I don't know, it's my first serious project. Just throw out an idea and I'll know how to make it dynamic; I won't use 5 cores for just 3 sources or anything like that. I just want to know how the workload should be split for that 500-source workload, because I have no idea.

12th Sep 2022, 7:49 PM
Odyel
4 Answers
+ 2
What you do after the download, or how many threads you spawn for processing the downloaded data, should be a secondary concern. You can run the downloader in just 4 threads and spawn many more afterwards; it really depends on bandwidth and latency, and less parallel downloading is often better.

Also, without knowing exactly what you are doing, have a look at the producer/consumer pattern. If you're smart about it you can get 100% CPU usage with very little code. With ready-made libraries you might not need to manually spawn threads at all: spawning threads takes time, so pooling them usually makes a lot of sense, and many libraries can do that for you.
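To make the producer/consumer idea concrete, here's a minimal sketch in Python (the thread isn't tied to any one language, so treat this as illustrative; the URL list and worker counts are hypothetical). A few downloader threads produce payloads onto a queue, and a separate pool of consumer threads processes them:

```python
import queue
import threading
import urllib.request

# Hypothetical list of source URLs, standing in for the 500 sources.
SOURCES = [f"https://example.com/feed/{i}.json" for i in range(500)]

DOWNLOAD_WORKERS = 4   # few parallel downloads, as suggested above
PROCESS_WORKERS = 8    # processing can fan out more widely

url_queue = queue.Queue()   # producer side: URLs waiting to be fetched
data_queue = queue.Queue()  # consumer side: downloaded payloads to process

def downloader():
    # Each download worker pulls URLs until it sees a shutdown sentinel.
    while True:
        url = url_queue.get()
        if url is None:
            url_queue.task_done()
            break
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data_queue.put(resp.read())
        except OSError:
            pass  # real code would log and retry
        url_queue.task_done()

def processor():
    # Consumers parse/aggregate payloads as they arrive.
    while True:
        payload = data_queue.get()
        if payload is None:
            data_queue.task_done()
            break
        # ... parse the JSON, aggregate, write to disk, etc. ...
        data_queue.task_done()

threads = [threading.Thread(target=downloader) for _ in range(DOWNLOAD_WORKERS)]
threads += [threading.Thread(target=processor) for _ in range(PROCESS_WORKERS)]
for t in threads:
    t.start()

for url in SOURCES:
    url_queue.put(url)
for _ in range(DOWNLOAD_WORKERS):
    url_queue.put(None)   # stop the downloaders once the URLs run out
url_queue.join()          # wait until every URL has been fetched
for _ in range(PROCESS_WORKERS):
    data_queue.put(None)  # then stop the processors
for t in threads:
    t.join()
```

The point is that the downloader pool and the processor pool are sized independently: 12 threads here comfortably cover all 500 sources, no core-per-100-sources split needed.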
12th Sep 2022, 9:18 PM
Schindlabua
+ 2
When downloading many small files it makes sense to do it in parallel, because a lot of the download time is spent not actually downloading any data (rather, setting up a connection and whatnot). Keep in mind that you'll probably have a gigabit of bandwidth max (idk what server you are on), so you can't just download 500 files in parallel either. Your biggest limiting factor is how much bandwidth the destination servers will give you (and maybe your hard drive). If the downloads happen at 2 MB/s (== 16 Mbit/s) and you have a gigabit downstream, then you can probably do 50-100 files in parallel. Maths it out :p

You don't need to think in terms of cores either. If you're running C# code you can just do Parallel.ForEach or something and let the runtime figure it out. Even running 100 asynchronous tasks in a single JavaScript thread will probably be fine. You aren't trying to saturate cores, just the network.

The tl;dr here is probably: just try it and see what gives you the fastest results. :)
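As an illustration of that maths, here's a sketch of the single-threaded async approach in Python (again hypothetical: the aiohttp library is a third-party assumption, and the concurrency cap of 60 comes straight from the 1 Gbit / 16 Mbit estimate above):

```python
import asyncio
import aiohttp  # third-party: pip install aiohttp

# Hypothetical source list again.
SOURCES = [f"https://example.com/feed/{i}.json" for i in range(500)]

# ~1 Gbit/s downstream at ~16 Mbit/s per download suggests
# roughly 60 concurrent fetches before the pipe is saturated.
MAX_CONCURRENT = 60

async def fetch(session, sem, url):
    async with sem:  # at most MAX_CONCURRENT downloads at once
        async with session.get(url) as resp:
            return await resp.read()

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, sem, url) for url in SOURCES]
        # Payloads (or exceptions) for all 500 sources, handled
        # on a single thread -- no thread-per-source required.
        return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(main())
```

The semaphore is the whole trick: all 500 tasks exist at once, but only 60 are ever mid-download, which keeps you inside the bandwidth budget without touching thread or core counts.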
12th Sep 2022, 9:11 PM
Schindlabua
+ 1
Schindlabua I forgot I posted this here. I asked this in a programming Discord server as well, figuring there'd be a greater abundance of seasoned programmers, and actually got some ideas on how to make it work using far, FAR fewer than 500 dedicated threads, and much less code. You actually hit on a different point by making me think about the network bandwidth and implementing thread pooling. Sorry for being vague in my post, I want to stay anonymous about my project. Thanks for the help!
13th Sep 2022, 2:08 AM
Odyel
0
Also, the downloads can be up to 100-200 MB per source, so I can't just quickly go through them, which is why I'm thinking of dedicating a thread per source.
12th Sep 2022, 7:50 PM
Odyel