Much longer Runtime?!

Supracore

Joined: 17 Feb 21
Posts: 11
Credit: 108,090
RAC: 0
Germany
Message 1069 - Posted: 10 Mar 2021, 10:04:42 UTC

Name PERF_TESTS_0D_T1615356901_2595467_2_30_0
Workunit 81274067
Created 10 Mar 2021, 6:15:19 UTC
Sent 10 Mar 2021, 9:20:09 UTC
Report deadline 13 Mar 2021, 9:20:09 UTC
Received 10 Mar 2021, 10:00:43 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 41151
Run time 39 min 57 sec
CPU time
Validate state Valid
Credit 2.00
Device peak FLOPS 5.05 GFLOPS
Application version iThena PERF v1.03
windows_intelx86
Peak working set size 10.41 MB
Peak swap size 6.25 MB
Peak disk usage 10.87 MB

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1070 - Posted: 10 Mar 2021, 11:47:32 UTC
Last modified: 10 Mar 2021, 11:58:31 UTC

I have a CNODE Linux work unit that has been running for over 10 1/2 hours; it is at 22.5% and still has over 36 hours to go.
If it ever finishes, I hope to get a lot more than 2 credits for it. A few hundred would be closer to what it is worth.

My PERF work units have gone from 136 seconds to 2,400 seconds for the same 2.0 credits. That needs to change.

I decided to abort the work unit and hope the next one works better.

Conan

Rysiu
Project administrator
Project developer
Project tester
Project scientist
Joined: 25 Aug 19
Posts: 409
Credit: 1,240,185
RAC: 0
Poland
Message 1071 - Posted: 10 Mar 2021, 12:39:46 UTC

Most likely this is related to the following incident:

Twitter: https://twitter.com/OVHcloud/status/1369609720005267460/

Please be patient...

Steve Dodd
Joined: 4 Apr 20
Posts: 25
Credit: 607,418
RAC: 501
United States
Message 1073 - Posted: 10 Mar 2021, 18:19:53 UTC - in response to Message 1071.  

I can understand how the fire would affect the CNODE processes, but I don't understand how it affects PERF run times. I, too, am seeing a factor of 10 increase in run times for PERF WUs.

Fardringle
Joined: 12 Nov 19
Posts: 34
Credit: 110,739,392
RAC: 21,155
United States
Message 1074 - Posted: 10 Mar 2021, 19:10:52 UTC

I agree with the other commenters. Between 1 AM and 2 AM UTC on March 10, 2021, run times on PERF tasks jumped from around 133 seconds to around 2400 seconds, but each task still awards only the same 2 credits.

The change in run times was instant, not gradual. Run times have been consistently high on all tasks since then, on every computer I checked.

Also, my run times for CNODE tasks on Linux machines have NOT changed. So maybe there was a mistake in the outage/fire announcement and it actually affected the PERF server and not the CNODE server?
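
To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It assumes the award stays fixed at 2 credits per task and takes 133 s and 2400 s as representative run times from this thread; the function name is only for illustration.

```python
def credits_per_hour(credits_per_task: float, runtime_seconds: float) -> float:
    """Credits earned per hour of run time by one task slot."""
    return credits_per_task * 3600.0 / runtime_seconds


CREDITS_PER_TASK = 2.0   # fixed award reported in this thread
BEFORE_SECONDS = 133.0   # typical PERF run time before the change
AFTER_SECONDS = 2400.0   # typical PERF run time after the change

slowdown = AFTER_SECONDS / BEFORE_SECONDS
rate_before = credits_per_hour(CREDITS_PER_TASK, BEFORE_SECONDS)
rate_after = credits_per_hour(CREDITS_PER_TASK, AFTER_SECONDS)

print(f"slowdown: {slowdown:.1f}x")                  # slowdown: 18.0x
print(f"credits/hour before: {rate_before:.1f}")     # credits/hour before: 54.1
print(f"credits/hour after: {rate_after:.1f}")       # credits/hour after: 3.0
```

So with these figures one task slot drops from roughly 54 credits per hour to 3, which is the same factor of about 18 as the slowdown itself.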

Rysiu
Project administrator
Project developer
Project tester
Project scientist
Joined: 25 Aug 19
Posts: 409
Credit: 1,240,185
RAC: 0
Poland
Message 1076 - Posted: 10 Mar 2021, 19:49:34 UTC
Last modified: 10 Mar 2021, 20:01:10 UTC

Yes.
After analysis, we found that the SBG incident also negatively impacted the iThena PERF app.

The issues in iThena CNODE are also there but they are not necessarily very visible to the user.

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1077 - Posted: 10 Mar 2021, 21:12:22 UTC - in response to Message 1076.  
Last modified: 10 Mar 2021, 21:13:47 UTC

Well, they are visible to this user. I aborted a previous CNODE work unit after 10 1/2 hours, as stated below; the replacement work unit has now been running for over 9 hours and is at 100%, but it is still running with no end in sight.

The PERF work units now take over 1 hour to run, on Windows or Linux.

I have not returned any valid CNODE work for a day now.

Conan

JohnMD
Joined: 22 May 20
Posts: 11
Credit: 1,656,620
RAC: 209
Denmark
Message 1078 - Posted: 10 Mar 2021, 21:12:44 UTC - in response to Message 1074.  

So the 2 sub-projects are now in the same ballgame. Still cheap credits!

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1079 - Posted: 11 Mar 2021, 0:09:39 UTC - in response to Message 1077.  

My CNODE work unit has now passed 12 hours run time and my PERF work unit on the same Linux machine has passed 3 hours. Both are at 100% and have been for quite a while now.
I have my doubts about them finishing.

The Windows PERF work units do finish with much shorter run times than Linux.

Should I abort these work units?

Conan

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1080 - Posted: 11 Mar 2021, 4:23:01 UTC - in response to Message 1079.  

The PERF work unit finished after about 3 hours 15 minutes and then downloaded another one (all the work units I have had since then run for over 3 hours but do finish); this is on Linux.

The CNODE work unit is not going to finish, so I am aborting this one as well after nearly 16 1/2 hours of run time.

I will see what the next one does.

Conan

Rysiu
Project administrator
Project developer
Project tester
Project scientist
Joined: 25 Aug 19
Posts: 409
Credit: 1,240,185
RAC: 0
Poland
Message 1081 - Posted: 11 Mar 2021, 6:09:24 UTC
Last modified: 11 Mar 2021, 6:11:04 UTC

We will try to eliminate the CNODE problem as soon as possible.

According to OVH predictions, SBG3 will not be back in operation until March 19 (next Friday?).

However, I hope to make the required fixes sooner.

Quick local tests will probably be required.

We did not anticipate such a big problem with SBG at all.
It is a good learning experience for everyone (another such example in history).

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1082 - Posted: 11 Mar 2021, 22:33:05 UTC - in response to Message 1080.  

After more than 18 hours I am aborting another CNODE work unit as well; I have not returned a valid one in 2 days now.
The PERF work units are still taking 3 hours, but they finish and get validated for just 2 credits.

Again I will see how the next one runs.

Conan

Rysiu
Project administrator
Project developer
Project tester
Project scientist
Joined: 25 Aug 19
Posts: 409
Credit: 1,240,185
RAC: 0
Poland
Message 1083 - Posted: 12 Mar 2021, 6:02:26 UTC

I have already done the first local tests.
This weekend I will try to solve the problem in the iThena CNODE app.

gemini8
Joined: 13 Nov 19
Posts: 4
Credit: 1,243,086
RAC: 897
Germany
Message 1084 - Posted: 12 Mar 2021, 22:47:13 UTC

I have three work units on my Linux machines which have been running for nearly three days now, and which I suspect will finish once they can make contact with the datacenter again.
To see what will happen I will just leave them running.

Rysiu, all the best for your efforts in holding the project together during the recent problems!
- - - - - - - - - -
Greetings, Jens

Rysiu
Project administrator
Project developer
Project tester
Project scientist
Joined: 25 Aug 19
Posts: 409
Credit: 1,240,185
RAC: 0
Poland
Message 1085 - Posted: 13 Mar 2021, 8:59:56 UTC

Today I will publish a new version, iThena CNODE v1.18, here for local testing.

Version v1.18 of iThena CNODE introduces a fix for the effects of the major fire incident in the SBG datacenter area, and for the consequences of similar incidents in the future.

I'll post the links for testing on the forum ;)

Rysiu
Project administrator
Project developer
Project tester
Project scientist
Joined: 25 Aug 19
Posts: 409
Credit: 1,240,185
RAC: 0
Poland
Message 1086 - Posted: 14 Mar 2021, 7:29:19 UTC

I am attaching a new version of iThena CNODE for local testing:

v1.18b1:

Arch x86_64-pc-linux-gnu:
http://cybercomplex.net/bins/ithena_cnode_v1.18b1/ithena_cnode_v1.18b1_x86_64-pc-linux-gnu.zip

Arch i686-pc-linux-gnu:
http://cybercomplex.net/bins/ithena_cnode_v1.18b1/ithena_cnode_v1.18b1_i686-pc-linux-gnu.zip

You can extract the ZIP file in some directory.

You can run:

on x86_64-pc-linux-gnu:

time ./ithena_cnode_v1.18_x86_64-pc-linux-gnu


or on i686-pc-linux-gnu:

time ./ithena_cnode_v1.18_i686-pc-linux-gnu


The program generates the following files:

stderr.txt
ithena_cnode_out0
ithena_cnode_out1


The above result files can be emailed to me: lswierczewski at cybercomplex dot net.

I am waiting for info ;)
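
For anyone who prefers to script the test, the steps above could be wrapped roughly like this. It is only a sketch under assumptions: that the binary needs no arguments and writes its result files into the directory it is run from, which the post does not state explicitly. The helper names and the archive name are made up for illustration and are not part of the iThena package.

```python
import subprocess
import time
import zipfile
from pathlib import Path
from typing import List

# Result file names as listed in the post above.
RESULT_FILES = ("stderr.txt", "ithena_cnode_out0", "ithena_cnode_out1")


def run_and_time(binary: Path, workdir: Path) -> float:
    """Run the test binary inside workdir and return wall-clock seconds.

    Assumption: the binary takes no arguments, as in the commands above.
    """
    start = time.monotonic()
    subprocess.run([str(binary.resolve())], cwd=workdir, check=False)
    return time.monotonic() - start


def collect_results(workdir: Path, archive: Path) -> List[str]:
    """Zip whichever result files exist and return the names that were found."""
    found = [name for name in RESULT_FILES if (workdir / name).is_file()]
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in found:
            zf.write(workdir / name, arcname=name)
    return found


# Example use (hypothetical archive name), run from the extraction directory:
#   elapsed = run_and_time(Path("ithena_cnode_v1.18_x86_64-pc-linux-gnu"), Path("."))
#   print(f"finished in {elapsed:.0f} s")
#   print(collect_results(Path("."), Path("cnode_test_results.zip")))
```

The returned list makes it easy to spot a missing output file before sending the archive.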

JohnMD
Joined: 22 May 20
Posts: 11
Credit: 1,656,620
RAC: 209
Denmark
Message 1143 - Posted: 17 Apr 2021, 23:40:39 UTC

My CNODE tasks run for about 24 hours with only a couple of minutes of CPU time. Is this normal?

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1371 - Posted: 8 Nov 2021, 11:45:57 UTC
Last modified: 8 Nov 2021, 11:46:28 UTC

This problem is back.

My PERF tasks are taking from 40 minutes to over an hour to run, instead of the normal 3 to 4 minutes.

My CNODE tasks have been running for nearly 2 hours so far, and none have finished and reported yet.

Who knows how long an OONI work unit will run, seeing as 6 hours is normal.

Conan

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1373 - Posted: 8 Nov 2021, 21:54:50 UTC - in response to Message 1371.  

Back to normal again now. Some bad parameters, perhaps?

Conan

Conan
Joined: 24 Sep 19
Posts: 106
Credit: 827,542
RAC: 422
Australia
Message 1377 - Posted: 15 Nov 2021, 22:17:13 UTC

It's back again: all my CNODE work units are taking 2 hours or more to run instead of 30 minutes.

OONI Probe does not seem to be affected. An occasional PERF task runs long as well, but not all of them.

This has been happening for the last few days.

Conan

© 2019-2024 iThena. All rights reserved. | Private Policy

Page generated on 28 Mar 2024, 22:49:34 UTC in 0.1381 seconds.