I love challenges, and thanks to a new Perl script I now have one.
In this case I have a script which checks about 600k lines, each a URL to a folder or a file on a webserver. The file was created via shell with "find ." on a storage volume with a lot of folders (empty and non-empty) and different types of data (mostly mp3, mp4, aac and pdf).
For most of these files I have a database entry with a path on our webserver, but unfortunately not for about 1 TB of data. So now I want to use multithreading to check whether these files are in the database (AV files only) and on our webserver (we periodically delete files, but only if we know they exist).
For the HTTP check I could use multi-curl, but that doesn't query my database, so why not multithreading? My script works perfectly single-threaded, but one run takes about five days.
Unfortunately my programming skills are rather moderate and I haven't done anything like this before.
I've found something like an introduction to the topic here: http://wiki.bc.net/atl-conf/pages/viewpage.action?pageId=20548191
This doesn't work well for me, because the webserver I want to check only allows 1023 active connections per IP (I don't want to change that), and I really don't want to start 600k (600,000) threads at the same time.
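A fixed worker pool avoids both problems: a small, bounded number of threads pulls paths from a shared queue, so you never exceed the connection limit and never spawn anywhere near 600k threads. Here is a minimal sketch using Perl's threads and Thread::Queue. The worker count, the check_path stub and the sample paths are placeholders, not my real script; in practice check_path would do the DBI lookup and the HTTP check.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Placeholder for the real work: DB lookup + HTTP check for one path.
sub check_path {
    my ($path) = @_;
    return length($path);    # dummy result, stands in for "exists / missing"
}

my $queue   = Thread::Queue->new;
my $results = Thread::Queue->new;
my $workers = 8;    # stays far below the server's 1023-connection limit

# Fixed pool: each worker dequeues paths until the queue is ended.
my @pool = map {
    threads->create(sub {
        while (defined(my $path = $queue->dequeue)) {
            $results->enqueue("$path\t" . check_path($path));
        }
    });
} 1 .. $workers;

# Feed the work list (in the real script: the 600k lines from "find .").
$queue->enqueue($_) for qw(/music/a.mp3 /video/b.mp4 /docs/c.pdf);
$queue->end;        # makes dequeue return undef once the queue is drained
$_->join for @pool;

$results->end;
my @lines;
while (defined(my $line = $results->dequeue)) {
    push @lines, $line;
    print "$line\n";
}
```

The point of the two queues is that only the main thread touches the result list, so the workers never need to share anything beyond the thread-safe queues.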
That led me to this site (whose author had the same thoughts as I do):
So, wish me luck that I get this into my head and the code working.
If you'd like the code for your own use, feel free to ask. I don't bite 😉