It has come to light that GPU RAM is sometimes not released immediately after exiting Ollama. Users have noted that it can take several minutes for the memory to be freed, which can be frustrating if you’re trying to conserve resources for other applications. One workaround is to configure the Ollama service with a shorter idle timeout via the `OLLAMA_KEEP_ALIVE` environment variable, which controls how long a model remains loaded in memory after its last request (the default is five minutes). Even better, more recent releases give you direct control over this: you can override the timeout per request with the `keep_alive` API parameter, check what is currently loaded with `ollama ps`, and unload a model on demand with `ollama stop <model>`.
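If you’d rather not change the service configuration, the same control is exposed per request. Here is a minimal Python sketch, assuming a default local Ollama install at `localhost:11434` and a model named `llama3` (a placeholder; substitute any model you have pulled), that uses the documented `keep_alive` parameter to unload the model as soon as the response completes:

```python
import requests

# Ask Ollama to free the model's GPU memory right after this response.
# keep_alive on /api/generate overrides the service-wide default
# (OLLAMA_KEEP_ALIVE) for this one request; 0 means "unload immediately".
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",            # hypothetical model name
        "prompt": "Why is the sky blue?",
        "stream": False,              # return a single JSON object
        "keep_alive": 0,              # unload as soon as we're done
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Passing `keep_alive: 0` is handy for one-off queries; a duration string such as `"10m"`, or `-1` to keep the model loaded indefinitely, works the same way.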