Gentoo Forums
load average far exceeds allowed value

javeree (Guru)
Posted: Tue Jun 23, 2020 3:38 pm    Post subject: load average far exceeds allowed value

I am running an emerge -udU @world on an Intel Atom CPU N280 with 1 GB RAM, so fairly old hardware.
I don't mind that some emerges may take several days, as long as the PC does not get completely bogged down. Normally this all works fine.

In the last few emerges, however, I have seen the load average climb to values like "load average: 25.26, 25.65, 24.82", and even higher, even though I emerge with --jobs=2 --load-average=2. I could understand emerge itself getting bogged down by swapping, but this indicates CPU load, not disk.
In the last run I saw this behaviour while it was emerging rclone, and I had just broken off the emerge of opencv, so we're not talking about the infamous 'huge' packages such as firefox, qtwebkit, ...

For information, here is emerge --info: https://pastebin.com/dfKShXub

eccerr0r (Watchman)
Posted: Tue Jun 23, 2020 3:49 pm

Load average is not CPU load specifically; it's an average, over a period of time, of how many processes are currently requesting or using CPU time.

With 1GB RAM I suspect you may be running out of RAM and thrashing the disk. Each one of those processes that is waiting for its pages to come back from swap adds one to the load average count.

You may need to do updates outside of the GUI if you're using one. I have an N270 with 2GiB RAM, and it proceeds through most merges at a fairly consistent rate, as I've tried to make sure it minimizes swapping. Programs have simply gotten bigger and bigger, using more and more RAM.

However, it looks like you have 2GB RAM and 0 anonymous swap. Try adding swap. This will speed things up considerably.
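
A quick way to check whether a load spike is really swapping rather than CPU (standard tools, nothing Portage-specific):

Code:
uptime                # 1/5/15-minute load averages
free -m               # RAM and swap usage
vmstat 5              # watch si/so (swap in/out) and the wa (iowait) column;
                      # high si/so with the CPU mostly in wa means thrashing,
                      # not genuine CPU load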

javeree (Guru)
Posted: Tue Jun 23, 2020 4:20 pm

OK, that makes sense; I had always misinterpreted the meaning of load-average. However, if I have load-average set to 2, would my case here actually mean that there are 23 non-emerge-related processes requesting CPU time? Or does the figure shown during emerge relate only to the emerge itself?

You are right: free shows that I currently don't have any swap, even though /etc/init.d/swap reports it is started.
On closer inspection, I found that after a recent disk reorganization the swap is now on a different drive than the one fstab points to, so I can fix that, retry the emerge and see the effect.
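
Something like the following should confirm whether the kernel actually picked the swap up again (the device name below is just a placeholder; the fstab entry is best done by UUID):

Code:
swapon --show                 # what the kernel is using right now
blkid -t TYPE=swap            # find the partition carrying the swap signature
swapon /dev/sdb2              # placeholder device: enable it by hand for this run
free -m                       # swap total should now be non-zero
# and make /etc/fstab match, e.g.:
# UUID=<swap partition uuid>  none  swap  sw  0 0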

eccerr0r (Watchman)
Posted: Tue Jun 23, 2020 4:36 pm

Correct: everything the computer needs to run (your GUI, daemons, interactive shells, etc.) counts towards the load average if it wants to run at the moment of the snapshot. The load average shown by 'top' or 'uptime' covers all processes, not just emerge.

The tough part is when emerging with --jobs 2 -- both jobs will contribute to the load average, and anything else running will too. Having the maximum load average set at 2 will probably keep the build throughput really low...

I use jobs and load mostly to try to optimize distcc use, though even that is very situational.
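
For what it's worth, the distcc case usually ends up looking roughly like this in /etc/portage/make.conf (illustrative numbers only, sized to the local box plus the remote cores; adjust to taste):

Code:
FEATURES="distcc"
MAKEOPTS="-j8 -l4"            # -j sized for local + remote cores, -l capped at what
                              # the local box tolerates, since preprocessing and
                              # linking stay local
EMERGE_DEFAULT_OPTS="--jobs=2 --load-average=4"
                              # a second package mainly keeps the remote cores
                              # busy during serial configure phases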

Hu (Moderator)
Posted: Wed Jun 24, 2020 1:38 am

You have jobs=2 and load-average=2 on both emerge and MAKEOPTS, so you may have two packages building in parallel, each of which gets that MAKEOPTS value. Not all build systems understand --load-average, so it is possible that you built something which saw only the jobs value and spawned 2 jobs regardless of the impact on load. Further, some badly behaved build systems get confused and pass these options down to nested build tasks inside the package, which can then create even more jobs. To understand what went wrong, it would help to see which packages were active at the time and what they were running.
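
To put numbers on that (paraphrasing the settings described above; the badly behaved case is the part that varies per package):

Code:
EMERGE_DEFAULT_OPTS="--jobs=2 --load-average=2"   # up to 2 packages at once
MAKEOPTS="-j2 -l2"                                # up to 2 compile jobs per package
# well-behaved case:  2 packages x 2 jobs = at most ~4 compilers,
#                     throttled once the load average reaches 2
# badly-behaved case: a build system that ignores -l, or re-passes -j
#                     to a nested build, can push well past that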

Goverp (l33t)
Posted: Wed Jun 24, 2020 9:41 am

IIUC, --jobs only controls whether a job is spawned; once a job has started, it gets -j threads. As far as I can see, portage will start more jobs as long as the current load is below --load-average, so if one job is mostly waiting (e.g. fetching a huge distfile) it won't count towards load, and portage foolishly starts another job. Then the wait finishes, and kaboom. AFAIK portage can't suspend a job once it's started, so you just have to wait until enough jobs finish to get the load back down to something reasonable.

eccerr0r (Watchman)
Posted: Wed Jun 24, 2020 3:13 pm

--jobs (emerge) and -j (make) are opportunistic. The scheduler will create more jobs if:

* it (emerge or make) has a job that it needs to run,
* that job has no unmet dependencies, and
* starting it would not break the load-average restriction specified by --load-average (emerge) or -l (make).

Indeed you need to be careful when using --jobs and -j at the same time. However, I have run into situations where you must use both in order to fully utilize your system**; but if you use both, you can end up with more jobs than your computer can handle, and your load average will shoot up.

On the other hand, --load-average and -l should be used to limit your load average when that makes sense. Then again, I have found situations where --load-average and -l bottleneck your system unnecessarily.

What would be really nice is if one could modify all four of --jobs/-j and --load-average/-l on the fly while make/emerge are running, depending on what you see happening on your machine. But this is a pipe dream, as setting that up would be extremely ugly, code-wise...

** "your system" means "the multicore/multithread computer you want to run emerge on, and the distccd machines you have access to" ... Now this can get complicated :)

Rad (Guru)
Posted: Wed Jun 24, 2020 7:39 pm

eccerr0r wrote:
But this is a pipe dream, as setting that up would be extremely ugly, code-wise...

It's possible that you could do this in a somewhat less ugly way with cgroups.

AFAIK there is no particular helper functionality for configuring cgroup resource management in Portage itself, so you'd have to do something like use cgexec, as in this bug report, or use a wrapper that changes settings in /sys/fs/cgroup/.
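
For illustration, the cgexec route looks something like this (group name and controller list are made up; the group itself would be configured beforehand, e.g. in /etc/cgconfig.conf). The wrapper approach is the same idea done by hand against /sys/fs/cgroup, as sketched further down the thread:

Code:
cgexec -g cpu,memory,blkio:portage emerge -uDU @world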

Ant P. (Watchman)
Posted: Wed Jun 24, 2020 9:30 pm

eccerr0r wrote:
What would be really nice is if one could modify all four of --jobs/-j and --load-average/-l on the fly while make/emerge are running, depending on what you see happening on your machine. But this is a pipe dream, as setting that up would be extremely ugly, code-wise...

What emerge should really be doing is running a global jobserver that uses the /proc/pressure interface to rate-limit, instead of loadavg, which lags by up to a minute. You wouldn't need to mess with MAKEOPTS at all, it could be distcc-aware, and everything would work close to optimally.
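
For reference, the PSI interface is trivial to poll (needs CONFIG_PSI, and on some kernels of that era psi=1 on the kernel command line):

Code:
cat /proc/pressure/cpu /proc/pressure/memory /proc/pressure/io
# each file reports lines of the form
#   some avg10=... avg60=... avg300=... total=...
#   full avg10=... avg60=... avg300=... total=...   (memory and io)
# avg10/avg60/avg300 are the percentage of the last 10/60/300 seconds during
# which some (or all) non-idle tasks were stalled on that resource, and
# total is the cumulative stall time in microseconds.  A job dispatcher could
# simply stop handing out work while memory "some" avg10 sits above a few
# percent, instead of waiting for loadavg to catch up.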

eccerr0r (Watchman)
Posted: Wed Jun 24, 2020 9:42 pm

Will cgroups actually stop a make or emerge process from submitting more jobs?

As far as I know, it will continue to run jobs as long as the conditions allow it (it has a job, no unmet dependencies, the load average permits). However, if the job is submitted, it will start allocating memory... will cgroups kill that job? Then make/emerge will have to start that job over from scratch. Will it automatically suspend the new job instead? This sort of thing can cause deadlock conditions, so it is not a proper solution compared to actually controlling whether the job supervisor submits more jobs. More likely it will just make the job run slowly, but the job still needs its pages in RAM, which will still cause memory pressure...

eccerr0r (Watchman)
Posted: Wed Jun 24, 2020 9:47 pm

Ant P. wrote:
What emerge should really be doing is running a global jobserver that uses the /proc/pressure interface to rate-limit, instead of loadavg, which lags by up to a minute. You wouldn't need to mess with MAKEOPTS at all, it could be distcc-aware, and everything would work close to optimally.

Nice, so we want another scheduler working alongside the existing schedulers, which could still submit more jobs because make still relies on load averages... The point is that we still need to somehow force the underlying make/ninja/waf/... to stop submitting jobs, plain and simple, when under pressure - without completely suspending it, which may cause deadlocks...

Hu (Moderator)
Posted: Thu Jun 25, 2020 12:57 am

The Make jobserver concept seems well-suited here. The top-level dispatcher process would monitor load average, free space in the build filesystem, free RAM, etc. and only place tokens in the jobserver queue when it decides the system can handle more load. Individual package-level job dispatchers (make, ninja, waf, etc.) would request a token from the queue before starting a job. Whenever the top-level dispatcher process decides the system is too heavily loaded, it stops adding tokens to the queue (and may even start trying to steal tokens from the queue to discourage the packages from replacing tasks that have finished). Any attempt to start new jobs blocks until the top-level dispatcher reverses course and adds more tokens.

This does have the potential for some suboptimal results, such as running one job each in several different packages, rather than several jobs all in one package. The latter could be more efficient due to greater locality. Jobs from the same package would be more likely to read the same files, and running jobs all in one package would bring that package to completion sooner. Once it was completed, it could be installed and its temporary area deleted.

The big question is whether all the major build tools can agree on a protocol (whether Make jobserver or something else), and whether they are all implemented in a way that failures cannot accidentally cause starvation.
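
As a toy sketch of the token protocol (this is not GNU make's actual jobserver pipe, just the same idea acted out with a FIFO; a real dispatcher would control concurrency purely by how many tokens it keeps in the pipe):

Code:
#!/bin/sh
# dispatcher: pre-load N tokens; each worker takes a byte before starting
# a job and puts it back when the job finishes.
mkfifo /tmp/tokens
exec 3<>/tmp/tokens
printf 'xx' >&3                            # 2 tokens => at most 2 concurrent jobs

run_job() {
    dd bs=1 count=1 <&3 >/dev/null 2>&1    # block until a token is available
    "$@"                                   # the actual build step
    printf 'x' >&3                         # return the token
}

# a throttling dispatcher would stop refilling (or would drain) the pipe
# whenever loadavg or /proc/pressure says the box is overloaded
for i in 1 2 3 4 5; do run_job sleep 2 & done
wait
exec 3>&-
rm /tmp/tokens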

Ant P. (Watchman)
Posted: Thu Jun 25, 2020 1:37 am

Maybe this'll all end up obsoleted by something that can manage cgroups by having fork() block in them. No need to fix all the build systems separately if you can leverage the OS.

eccerr0r (Watchman)
Posted: Thu Jun 25, 2020 3:09 am

Now the question is: should it completely block when the fork() occurs, breaking POSIX, since in normal operation fork() should return right away? Or should it not block, as POSIX says, and silently queue the fork() - and if that fork then actually fails, the state is messed up between the parent and child?

I don't know; there are a lot of issues with faking out userland at the OS level. This is a lot like "hey, let's fix this software problem in hardware!"...

Ant P. (Watchman)
Posted: Thu Jun 25, 2020 3:22 am

Unwavering worship of standards and toolchains written for hardware designed 40 years ago is why we have loadavg 26 on a single-core machine, no? It's time to do better.

eccerr0r (Watchman)
Posted: Thu Jun 25, 2020 4:04 am

Yeah, we need to replace the almost 30-year-old Linux kernel, and the even older SysV/BSD, with something different...

Ant P. (Watchman)
Posted: Thu Jun 25, 2020 4:27 am

That's exactly what things like BPF and io_uring are going to do. Linux is transitioning to a message-passing microkernel; eventually glibc will just be an emulation layer on top of that.

In the meantime, Gentoo could stand to solve the real actual problem this thread is about.

eccerr0r (Watchman)
Posted: Thu Jun 25, 2020 1:26 pm

Except that this is the real problem. Are we asking the kernel to actually limit jobs being submitted, or are we simply trying to throttle jobs after they're submitted? Throttling jobs does not decrease the load average, and stopping jobs from forking the way they have for years is breaking userspace.

The only other option is for cgroups to lie to the process group about what the true load average is -- THIS perhaps is a "kernel" solution to the userspace problem.

IMHO this should be solved in userspace, not the kernel; the catch is that it involves breaking pretty much every build tool out there.

BTW, unless your system has a *lot* of cores and RAM, including distcc cores - say at least 10, with enough RAM for each core - I'd suggest not using emerge's --jobs option and leaving it at 1. The load-average options and the MAKEOPTS -j X option should still be adjusted to your local machine's configuration.
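
On a single-core box like the one in the opening post, that boils down to something like this (illustrative values; the exact numbers are a judgement call):

Code:
# /etc/portage/make.conf
EMERGE_DEFAULT_OPTS="--jobs=1"     # one package at a time
MAKEOPTS="-j1"                     # at most "-j2 -l2", and only once swap works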

Rad (Guru)
Posted: Fri Jun 26, 2020 8:29 pm

eccerr0r wrote:
Will cgroups actually stop a make or emerge process from submitting more jobs?

No. It will just make sure they don't take more than the allocated ratio of resources, without relying on emerge, make, go, ghc, scalac, ninja, cmake, [...] cooperating in any particular way.

eccerr0r wrote:
However if the job is submitted, it will start allocating memory... will cgroups kill that job?

Configurable. It can also suspend the process. This is obviously mainly meaningful if you can wait for swap/RAM shares to be freed by other processes or such.

eccerr0r wrote:
This sort of thing can cause deadlock conditions, so it is not a proper solution compared to actually controlling whether the job supervisor submits more jobs.

I think you already trivially avoid deadlocks with regards to your usage by just having ratios [and upper limits if needed] for all the resources (CPU, IO, RAM, swap, network, ...) that need to be managed.

A job supervisor - apart from having to support all these build tools and languages - still has the issue that individual large jobs can already be very bad for anything else that is supposed to run.

eccerr0r wrote:
Are we asking the kernel to actually limit jobs being submitted, or are we simply trying to throttle jobs after they're submitted? Throttling jobs does not decrease the load average.

While the discussion has certainly branched out, I was thinking of how to achieve "I don't mind that some emerges may take several days, as long as the PC does not get completely bogged down." Resource ratios/absolute limits and throttling with cgroups v2 can generally do this. Basically, put Portage and everything it runs [and maybe any other compiles] into a cgroup with limits on IO / memory / CPU, so that neither swapping nor memory nor CPU use really interferes with whatever else you do.
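
A minimal cgroup v2 sketch of that idea (assumes a unified hierarchy at /sys/fs/cgroup and root privileges; group name and limits are placeholders, and the io.max line uses the disk's major:minor, 8:0 being the usual /dev/sda):

Code:
echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/portage
echo 50   > /sys/fs/cgroup/portage/cpu.weight      # default weight is 100
echo 700M > /sys/fs/cgroup/portage/memory.high     # reclaim/throttle above this
echo max  > /sys/fs/cgroup/portage/memory.max      # but don't OOM-kill the build
echo "8:0 rbps=52428800 wbps=52428800" > /sys/fs/cgroup/portage/io.max   # ~50 MB/s
echo $$   > /sys/fs/cgroup/portage/cgroup.procs    # move this shell into the group
emerge -uDU @world                                 # children inherit the cgroup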

Rad (Guru)
Posted: Fri Jun 26, 2020 8:56 pm

Ant P. wrote:
Maybe this'll all end up obsoleted by something that can manage cgroups by having fork() block in them.

Isn't that just the existing PID controller (CONFIG_CGROUP_PIDS in the kernel)?

I think if you wanted to use this, you might set a memory limit at which the PID limiting kicks in. Then you'd watch memory.oom_control via eventfd, set the maximum PID limit to the current value, and raise the memory limit to the actual maximum so the process continues (maybe also enabling OOM killing at the same time).

That said, it's another thing I haven't done yet - I just remembered that there is a PIDs module, the rest is from the docs.
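
The knob itself is simple enough (reusing the hypothetical portage group from the earlier sketch); the catch, as far as I know, is that once the limit is hit, fork()/clone() inside the group fails with EAGAIN rather than blocking, so the build tool still has to handle that gracefully:

Code:
echo "+pids" > /sys/fs/cgroup/cgroup.subtree_control
echo 64 > /sys/fs/cgroup/portage/pids.max       # cap on tasks in the group
cat /sys/fs/cgroup/portage/pids.current         # tasks currently in the group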

eccerr0r (Watchman)
Posted: Fri Jun 26, 2020 9:54 pm

I think I answered my own questions in that post; they were meant as thought-provoking questions.

Rad wrote:
eccerr0r wrote:
However if the job is submitted, it will start allocating memory... will cgroups kill that job?

Configurable. It can also suspend the process. This is obviously mainly meaningful if you can wait for swap/RAM shares to be freed by other processes or such.

That is the problem. Cgroups must NOT kill the job. If it kills the job, the underlying scheduler will abort the build/make/emerge/...
Rad wrote:
eccerr0r wrote:
This sort of thing can cause deadlock conditions, so it is not a proper solution compared to actually controlling whether the job supervisor submits more jobs.

I think you already trivially avoid deadlocks with regards to your usage by just having ratios [and upper limits if needed] for all the resources (CPU, IO, RAM, swap, network, ...) that need to be managed.

A job supervisor - apart from having to support all these build tools and languages - still has the issue that individual large jobs can already be very bad for anything else that is supposed to run.

The kernel scheduler/superscheduler/cgroups does not deadlock, but what about the user-level scheduler (make, ninja, emerge, waf, whatnot)? If you start messing with the running of these tasks, you may end up deadlocking the user-level jobs. Granted, it won't take down the machine, but forward progress can be lost ("livelock" to the computer, "deadlock" to the make job).

Still think the solution involves having to feed back to the user level scheduler that it may not continue to submit jobs when under load pressure. So I think lying to 'make' about the CPU status is probably the easiest way to feed back this information, but if it decides to not honor it, we're back to square 1.

Rad (Guru)
Posted: Sat Jun 27, 2020 6:35 am

eccerr0r wrote:
That is the problem. Cgroups must NOT kill the job. If it kills the job, the underlying scheduler will abort the build/make/emerge/...

Preferences about whether or not to OOM kill already go either way without cgroups. But the ability to choose any of the old options isn't affected if you use cgroups - you just get more options, such as suspending tasks until other groups use fewer resources and the chosen ratio becomes possible, or keeping a minimum amount / ratio of all relevant resources free for uses other than compiling, so the compiles effectively never really interfere with how well the system runs otherwise.

You seem to be more on the side of "get the compiles done no matter what", in which case I guess you wouldn't OOM kill. You could still enforce ratio constraints so you have some IO and RAM left when the compiles start swapping (or, equivalently, repeatedly loading more from disk into RAM and then freeing it). That should still maintain far better responsiveness than the usual really bad situation where full RAM leads to a ton of swapping/IO/CPU iowait and everything becomes slow. Of course you're taking this away from compiling faster, and whatever resources your compile is allowed to have will be occupied longer on average, but that's typically not even "the problem".

eccerr0r wrote:
Still think the solution involves having to feed back to the user level scheduler that it may not continue to submit jobs when under load pressure.

I don't. Too many build systems, compilers, linkers and so on don't really support much of this. And upstream definitely generally won't care to reorganize code and build tools to keep individual threads/tasks small so they're even actually responsively controllable with just launching more or less jobs or "lying about CPU status" or IO status or anything else that you can already actually quite reliably handle with cgroups.

eccerr0r (Watchman)
Posted: Sat Jun 27, 2020 1:21 pm

Rad wrote:
Preferences about whether or not to OOM kill already go either way without cgroups. But the ability to choose any of the old options isn't affected if you use cgroups - you just get more options, such as suspending tasks until other groups use fewer resources and the chosen ratio becomes possible, or keeping a minimum amount / ratio of all relevant resources free for uses other than compiling, so the compiles effectively never really interfere with how well the system runs otherwise.

You seem to be more on the side of "get the compiles done no matter what", in which case I guess you wouldn't OOM kill. You could still enforce ratio constraints so you have some IO and RAM left when the compiles start swapping (or, equivalently, repeatedly loading more from disk into RAM and then freeing it). That should still maintain far better responsiveness than the usual really bad situation where full RAM leads to a ton of swapping/IO/CPU iowait and everything becomes slow. Of course you're taking this away from compiling faster, and whatever resources your compile is allowed to have will be occupied longer on average, but that's typically not even "the problem".

The whole idea is that you want both:
- Interactive session still usable
- Highest job throughput
I mean, why not just do --jobs 1 and not bother with parallelization? Your load average will not go very high, which solves all the "apparent" problems -- except that the CPU is NOT being used most effectively.
Rad wrote:
eccerr0r wrote:
Still think the solution involves having to feed back to the user level scheduler that it may not continue to submit jobs when under load pressure.

I don't. Too many build systems, compilers, linkers and so on don't really support much of this. And upstream definitely generally won't care to reorganize code and build tools to keep individual threads/tasks small so they're even actually responsively controllable with just launching more or less jobs or "lying about CPU status" or IO status or anything else that you can already actually quite reliably handle with cgroups.

So you're saying the idea is to let user level job schedulers go ahead and submit ALL their jobs at once and let the OS choose which ones between them to run and which ones shouldn't - and it should always know which processes need to be run to prevent user-level deadlocks? Or are you saying it's the underlying job scheduler's responsibility to not overload the kernel with jobs, which defeats the purpose of the kernel trying to manage multiple processes?

We're not talking about fork bombs here; this is a real set of jobs that the computer must complete. Really, the idea of load average is actually meaningless - it's just a number, after all. The point is that the computer needs to finish all the tasks it needs to do, ideally automatically, with the highest throughput and without killing the performance of the user interface. The problem is that the kernel cannot tell which processes the user interface is serialized on. This is the underlying problem, and all of these "workarounds" are just trying to mitigate the fact that the kernel cannot tell.
This problem has been around for ages; it's not a new problem. Here I think people are abusing cgroups as a bandage to ensure the user interface responsiveness does not suffer when someone (or multiple persons) tells the computer to do more than what the computer can possibly do, where it was meant to more fairly share the resources among sets of jobs.

Seems you're just wanting the computer to fail the job if it even smells like it could overload the computer, and force the user to restart possibly long-running processes, wasting the time they already spent running - forcing the user to checkpoint, which again hints at user-level work to cope with the kernel randomly killing their work - and that's not even counting that all the CPU time spent before the process was killed is wasted.

Rad (Guru)
Posted: Sun Jun 28, 2020 12:40 pm

eccerr0r wrote:
I mean, why not just do --jobs 1 and not bother with parallelization? Your load average will not go very high, which solves all the "apparent" problems -- except that the CPU is NOT being used most effectively.

Yes, with -j1 your CPU and IO resources are probably not being used effectively. Plus there can be single jobs that are too taxing to retain interactivity anyhow. And on top of all this you rely on the build system, compiler, etc. actually accepting some "--jobs 1" parameter - which, again, really isn't the case.

eccerr0r wrote:
So you're saying the idea is to let user level job schedulers go ahead and submit ALL their jobs at once and let the OS choose which ones between them to run and which ones shouldn't - and it should always know which processes need to be run to prevent user-level deadlocks?

Nothing here should deadlock though unless you completely misconfigure it?

There is no deadlock if you have an 80MB/s read limit on emerge's group hierarchy. There is no deadlock if the same hierarchy can have 1GB RAM and 3GB swap or else gets OOM killed. [In either case, the rest of the system gets the remainder and the usual scheduling applies.]

eccerr0r wrote:
the point is that the computer needs to finish all the tasks it needs to do, ideally automatically with the highest throughput and without killing the performance of the user interface.

Yes, and this is a pretty reliable way of doing that in terms of keeping the performance of the user interface. You just observe roughly how many resources the system needs to stay interactive enough, and then you simply limit portage so that it cannot take that amount of resources when they're needed elsewhere.

eccerr0r wrote:
tells the computer to do more than what the computer can possibly do, where it was meant to more fairly share the resources among sets of jobs.

Yep, it's not mainly about "fairly" sharing every resource but about telling Portage to yield to more important things such as interactive user web browser forum posts if these other uses are claiming shares. Or even telling it to never claim x resources.

eccerr0r wrote:
Seems you're just wanting the computer to fail the job if it even smells like it could overload the computer

Users would set the actual limits. Maybe you OOM kill within the cgroup at 800MB RAM usage after swap is full. Or never at all. Sure enough, I have my own preferences for individual machines and for some I do not want portage to noticeably interfere with anything else, yes.

eccerr0r wrote:
force the user to restart possibly long-running processes, wasting the time they already spent running [...] randomly killing their work

Given that the enabled settings would have been explicitly configured by you, you wanted it that way.

eccerr0r (Watchman)
Posted: Sun Jun 28, 2020 1:57 pm

Rad wrote:
Users would set the actual limits. Maybe you OOM kill within the cgroup at 800MB RAM usage after swap is full. Or never at all. Sure enough, I have my own preferences for individual machines and for some I do not want portage to noticeably interfere with anything else, yes.

Well, you just summed up this whole argument: it's totally up to the USER, and the user-level schedulers, not to overload the computer, and there's no special software, cgroups or whatnot, that can prevent overloading if the user chooses limits poorly or does not know what the appropriate limit is.

It really sounds like you're coming at this more from a sysadmin POV, where you don't care about the user's jobs and "saving the computer"/GUI is the most important thing. I view the computer as a tool: saving progress on the job given to the computer is the computer's main priority, and killing (or even suspending - which is the main deadlock concern) is a last resort. I've frequently killed the GUI, whenever possible, to save forward progress of actual work jobs, as the GUI is of lesser value, and I sure do not want the kernel to make that choice for me, especially if the job is a child of the GUI and not properly disowned. But once again, this is a user choice and nothing a kernel should ever choose.

A key point that gets neglected: there's nothing wrong with swapping. People have a bad connotation with swapping for some reason. Thrashing, on the other hand, is a problem, and historically kernels have an equally hard time knowing (without a huge overhead that could instead go towards giving the user more memory for real work) when they are thrashing and what an "appropriate" level of swapping is. Having a job arbitrarily killed just because it's swapping is a horrible idea, and people do themselves a disservice by deliberately not having anonymous swap simply to force the computer to kill their emerge jobs the moment it comes under the slightest memory pressure - and then having the Linux kernel foil their plans when it does text page swapping anyway...