Much longer Runtime?!
Author | Message |
---|---|
Send message Joined: 17 Feb 21 Posts: 11 Credit: 108,090 RAC: 0 |
Name: PERF_TESTS_0D_T1615356901_2595467_2_30_0; Workunit: 81274067; Created: 10 Mar 2021, 6:15:19 UTC; Sent: 10 Mar 2021, 9:20:09 UTC; Report deadline: 13 Mar 2021, 9:20:09 UTC; Received: 10 Mar 2021, 10:00:43 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 41151; Run time: 39 min 57 sec; CPU time: [blank]; Validate state: Valid; Credit: 2.00; Device peak FLOPS: 5.05 GFLOPS; Application version: iThena PERF v1.03 windows_intelx86; Peak working set size: 10.41 MB; Peak swap size: 6.25 MB; Peak disk usage: 10.87 MB |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
I have a CNODE Linux work unit that has been running for over 10 1/2 hours; it is at 22.5% and still has over 36 hours to go. If it ever finishes, I hope to get a lot more than 2 credits for it; a few hundred would be closer to what it is worth. My PERF work units have gone from 136 seconds to 2,400 seconds for the same 2.0 credits. That needs to change. I decided to abort the work unit and hope the next one works better. Conan |
Send message Joined: 25 Aug 19 Posts: 409 Credit: 1,240,185 RAC: 0 |
This is most likely related to the incident reported by OVHcloud on Twitter: https://twitter.com/OVHcloud/status/1369609720005267460/ Please be patient... |
Send message Joined: 4 Apr 20 Posts: 25 Credit: 746,554 RAC: 444 |
I can understand how the fire would affect the CNODE processes, but I don't understand how it affects PERF run times. I, too, am experiencing a factor of 10 increase in run times for PERF WUs. |
Send message Joined: 12 Nov 19 Posts: 34 Credit: 114,631,636 RAC: 63,365 |
I agree with the other commenters. Between 1 AM and 2 AM UTC on March 10, 2021, run times on PERF tasks jumped from around 133 seconds to around 2400 seconds, but they still give the same 2 credits per task. This was an instant change in run times (not a gradual increase); they have been consistently high on all tasks since then, and it is happening on all of my computers that I checked. Also, my run times for CNODE tasks on Linux machines have NOT changed. So maybe there was a mistake in the outage/fire announcement and it actually affected the PERF server and not the CNODE server? |
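To put the numbers from this post in perspective, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above (about 133 s before, about 2400 s after, 2 credits per task); the function names are made up for illustration and are not part of any iThena or BOINC tool.

```python
def slowdown(before_s: float, after_s: float) -> float:
    """Return how many times longer a task takes after the change."""
    return after_s / before_s


def credits_per_hour(credit: float, runtime_s: float) -> float:
    """Return credit earned per hour of run time for one task slot."""
    return credit * 3600.0 / runtime_s


# Figures reported in the post above (approximate run times for PERF tasks).
before_s, after_s, credit = 133.0, 2400.0, 2.0

print(f"slowdown: {slowdown(before_s, after_s):.1f}x")
print(f"credit/hour before: {credits_per_hour(credit, before_s):.1f}")
print(f"credit/hour after:  {credits_per_hour(credit, after_s):.1f}")
```

With these reported values the slowdown works out to roughly 18x, so the credit earned per hour of run time drops by the same factor.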
Send message Joined: 25 Aug 19 Posts: 409 Credit: 1,240,185 RAC: 0 |
Yes. After analysis, we found that the SBG incident also negatively impacted the iThena PERF app. The issues in iThena CNODE are also there but they are not necessarily very visible to the user. |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
Yes. Well, they are visible to this user. I aborted a previous CNODE work unit, as stated earlier, after 10 1/2 hours; the replacement work unit has now been running for over 9 hours and is at 100%, but it is still running with no end in sight. The PERF work units now take over 1 hour to run, on Windows or Linux. I have not returned any valid CNODE work for a day now. Conan |
Send message Joined: 22 May 20 Posts: 11 Credit: 1,701,186 RAC: 264 |
Quoting the earlier post: "I agree with the other commenters. Between 1 AM and 2 AM UTC on March 10, 2021, run times on PERF tasks jumped from around 133 seconds to around 2400 seconds. But still only giving the same 2 credits per task." So the 2 sub-projects are now in the same ballgame. Still cheap credits! |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
Yes. My CNODE work unit has now passed 12 hours run time and my PERF work unit on the same Linux machine has passed 3 hours. Both are at 100% and have been for quite a while now. I have my doubts about them finishing. The Windows PERF work units do finish with much shorter run times than Linux. Should I abort these work units? Conan |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
Yes. The PERF finished after about 3 hours 15 minutes and then downloaded another one. All the WUs I have had since then run for over 3 hours but do finish; this is on Linux. The CNODE is not going to finish, and I am aborting this one as well after nearly 16 1/2 hours of run time. I will see what the next one does. Conan |
Send message Joined: 25 Aug 19 Posts: 409 Credit: 1,240,185 RAC: 0 |
We will try to eliminate the CNODE problem as soon as possible. According to OVH predictions, SBG3 will not be back in operation until March 19 (next Friday?). However, I hope to make the required fixes sooner. Quick local tests will probably be required. We did not anticipate such a big problem with SBG at all. It is a good learning experience for everyone (another such example in history)... |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
Yes. After more than 18 hours I will abort another CNODE as well; I have not returned a valid one in 2 days now. The PERFs are still taking 3 hours, but they finish and get validated for just 2 credits. Again, I will see how the next one runs. Conan |
Send message Joined: 25 Aug 19 Posts: 409 Credit: 1,240,185 RAC: 0 |
I have already done the first local tests. This weekend I will try to solve the problem (app iThena CNODE). |
Send message Joined: 13 Nov 19 Posts: 4 Credit: 1,390,900 RAC: 1,917 |
I have three workunits on my Linux machines which have been running for nearly three days now, and which I suspect will finish when they can make contact with the datacenter again. To see what will happen, I will just leave them running. Rysiu, all the best for your efforts in holding the project together during the recent problems! - - - - - - - - - - Greetings, Jens |
Send message Joined: 25 Aug 19 Posts: 409 Credit: 1,240,185 RAC: 0 |
Today, I will publish a new version, iThena CNODE v1.18, here for local testing. Version v1.18 of iThena CNODE introduces a fix for the consequences of the major fire incident at the SBG datacenters, and for similar incidents in the future. I'll post the links for testing on the forum ;) |
Send message Joined: 25 Aug 19 Posts: 409 Credit: 1,240,185 RAC: 0 |
I am attaching a new version of iThena CNODE for local testing, v1.18b1. Arch x86_64-pc-linux-gnu: http://cybercomplex.net/bins/ithena_cnode_v1.18b1/ithena_cnode_v1.18b1_x86_64-pc-linux-gnu.zip Arch i686-pc-linux-gnu: http://cybercomplex.net/bins/ithena_cnode_v1.18b1/ithena_cnode_v1.18b1_i686-pc-linux-gnu.zip Extract the ZIP file into a directory, then run, on x86_64-pc-linux-gnu: time ./ithena_cnode_v1.18_x86_64-pc-linux-gnu or, on i686-pc-linux-gnu: time ./ithena_cnode_v1.18_i686-pc-linux-gnu The program generates the following files: stderr.txt, ithena_cnode_out0, ithena_cnode_out1. These result files can be emailed to me: lswierczewski at cybercomplex dot net. I am waiting for info ;) |
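For anyone running the test build, a small Python sketch like the one below could be used to confirm that the three result files named in the post were actually produced before emailing them. The helper name `missing_outputs` and the idea of checking the current directory are assumptions for illustration; this is not part of the iThena distribution.

```python
from pathlib import Path

# Result file names as listed in the post above.
EXPECTED_OUTPUTS = ("stderr.txt", "ithena_cnode_out0", "ithena_cnode_out1")


def missing_outputs(run_dir: str) -> list[str]:
    """Return the expected result files that are not present in run_dir."""
    base = Path(run_dir)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]


# Check the current directory (assumed to be where the binary was run).
missing = missing_outputs(".")
if missing:
    print("Missing result files:", ", ".join(missing))
else:
    print("All result files are present.")
```

An empty list means all three files exist; anything else names the files that the run did not produce.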
Send message Joined: 22 May 20 Posts: 11 Credit: 1,701,186 RAC: 264 |
My CNODE tasks run for about 24 hours with only a couple of minutes of CPU time. Is this normal? |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
This problem is back. My PERF tasks are taking from 40 minutes to over an hour to run, not 3 to 4 minutes as normal. My CNODE tasks have been running for nearly 2 hours so far and none have finished and reported yet. Who knows how long an OONI work unit will run, seeing as 6 hours is normal. Conan |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
"This problem is back." Back to normal again now; some bad parameters, perhaps? Conan |
Send message Joined: 24 Sep 19 Posts: 108 Credit: 931,910 RAC: 1,082 |
It's back again: all my CNODE work units are taking 2 hours or more to run, not 30 minutes. OONI Probe does not seem to be affected; an occasional PERF does run long as well, but not all of them. This has been happening for the last few days. Conan |
© 2019-2024 iThena. All rights reserved. | Private Policy