Technical Coordination Weekly
Europe/Stockholm
Skype
Description
Technical coordination group members and invited persons
26 August 2016, 14:10 - 15:12
Present: Balazs, Mattias, Jon, Aleksandr, Anders (last 2 min), Oxana
Apologies: David
= Urgent issues
Unexplained crashes at NSC (segfaults) and LUNARC (huge infoprovider logs), probably caused by corrupted files. Still internal to NeIC; no bug reports filed yet.
= Bugs
- 3468 arex excessive logging when infoproviders timeout expires - no progress yet
- 3210 CPU time isn't measured correctly for some jobs (e.g. ALICE) - Ake Sandgren had an idea on how to address it some time ago
- 3470 Watchdog did not restart arched after segfault - no reason found yet
- 3163 Infosystem showing incorrect info on multicore jobs with condor backend - testbed is set up, but no tests have been run yet
- 2036 infosys not scalable for ~100k jobs - requires a serious rewrite
- 3384 Support for per-queue authorisation configuration and publishing - a dramatic change that would trigger a major release
- 3433 Publish authorised VOs per queue - related to 3384 above
- 3486 External helper log file location is hardcoded to controldir/job.helper.errors - Aleksandr can easily fix it
- 3432 bdii-update.log fills up with complaints about dn suffix (REOPENED) - Mattias still to look into it, not easy to reproduce
- 3457 Accounting problem with PBS/torque for multi-core jobs (REOPENED) - no progress yet
- 3503 PBS scan does not parse node information - probably related to 3457 above, to be clarified
- 3505 ACIX produces not only acix-cache.log, but also twistd.log - another specimen for the log zoo
- 3506 PBS scan does not handle job IDs without suffix - patch exists, some disagreements on style
- 3504 openldap 2.4.40 crashes after a few minutes with ARC 5.x (MAJOR) - probably not our problem
- 3497 Skip heavily loaded delivery servers - David's todo list
- 3499 SGE LRMS infoprovider should properly detect GLUE2 OSName, OSVersion, OSFamily - a minor feature request
- 3500 LL LRMS infoprovider should properly detect GLUE2 OSName, OSVersion, OSFamily - twin brother of the above
- 3502 bulk arcls - David's todo list
= Release status
A bugfix release will be needed. Candidate fix: 3470 (watchdog; would be nice to include), but it is not clear when it will be ready.
Jon will send a warning that a bugfix release is being planned.
The next major release will require a dedicated meeting.
= Coming meetings
September 10: back-ends FTF in Copenhagen
September 29: NeIC NT1 all-hands in Ljubljana
NordForsk project kick-off some time in October
= A.O.B.
Anders has some issues with lintian warnings (e.g. man pages for the old arcproxy and canl++); these need to be addressed.