Linux Kernel Parameters and Tuning
Types of options/parameters:
- Kernel Configuration Options - They control which features and modules are included in the kernel binary.
- Sysctl Parameters - Used for adjusting kernel behavior at runtime.
- Kernel Command Line Parameters - Configure the kernel during the boot process. Passed by bootloader and can’t be changed at runtime.
Sysctl Parameters
sysctl -a # show all kernel parameters
sysctl net.ipv4.tcp_fastopen # check specific parameter
sudo sysctl -w net.ipv4.tcp_fastopen=3 # temporarily change param, lost after reboot
Persistently set kernel parameters:
/etc/sysctl.conf:
net.ipv4.tcp_fastopen = 3
vm.nr_hugepages = 1024
Apply changes in config file:
sudo sysctl -p
Separate file for custom settings:
echo "net.core.somaxconn = 1024" | sudo tee /etc/sysctl.d/99-custom.conf
Apply changes:
sudo sysctl --system
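Each sysctl key corresponds to a file under /proc/sys, with the dots replaced by slashes, which is why sysctl and cat on /proc/sys report the same value. A minimal sketch of that mapping follows; the helper names sysctl_path and sysctl_read are made up for illustration, and keys whose components themselves contain dots (such as some VLAN interface names) are not handled.

```shell
#!/bin/sh
# sysctl_path: hypothetical helper that turns a sysctl key into its /proc/sys path.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# sysctl_read: hypothetical helper that prints the running value of a key
# without needing the sysctl binary.
sysctl_read() {
    cat "$(sysctl_path "$1")"
}

sysctl_path net.core.somaxconn    # prints /proc/sys/net/core/somaxconn
```

On a live system, sysctl_read net.core.somaxconn should print the same number as sysctl -n net.core.somaxconn.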
Network
fs.file-max = 2097152 # File Descriptor Limits - files, sockets, and pipes
# system wide, not per process like ulimit settings
net.core.rmem_max = 268435456 # Maximum receive buffer size ( bytes ) a socket can request
net.core.wmem_max = 268435456 # Maximum send buffer size ( bytes ) a socket can request
net.core.rmem_default = 67108864 # Default receive buffer size ( bytes ) for new sockets
net.core.wmem_default = 67108864 # Default send buffer size ( bytes ) for new sockets
# increase for bandwidth, decrease to save memory
net.ipv4.tcp_rmem = 4096 87380 6291456 # Defines the minimum, default, and maximum buffer sizes for TCP receive buffers.
net.ipv4.tcp_wmem = 4096 65536 16777216 # Defines the minimum, default, and maximum buffer sizes for TCP send buffers.
net.core.somaxconn = 1024 # max connections queued for acceptance by a listening socket
# ( good for web or app server ex. 65535 )
net.ipv4.tcp_max_syn_backlog = 8192 # max syn backlog ( half open connections waiting )
# ( good for web or app server ex. 65535 )
net.ipv4.tcp_fin_timeout=30 # seconds an orphaned connection stays in FIN-WAIT-2, reduce to free resources faster
net.ipv4.tcp_keepalive_time=600 # seconds a connection must be idle before keepalive probes start
net.ipv4.tcp_fastopen = 3 # Enable TCP Fast Open for client ( 1 ) and server ( 2 ) roles, reduce connection setup delays
net.ipv4.tcp_low_latency = 1 # Enable low-latency mode for TCP connections ( ignored on kernels 4.14 and newer )
net.ipv4.tcp_timestamps = 0 # Disable TCP Timestamps ( slightly less per-packet overhead, but loses RTT measurement and PAWS protection )
net.ipv4.ip_local_port_range = 1024 65000 # Larger range of ephemeral ports, allows more simultaneous outbound connections
net.ipv4.ip_forward=1 # enable IP forwarding ( when setting up a router )
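How large the buffer maximums above need to be depends on the link. A common rule of thumb is the bandwidth-delay product (bandwidth times round-trip time). The sketch below only does that arithmetic; bdp_bytes is a hypothetical helper, and the 1 Gbit/s and 50 ms figures are example inputs, not measurements.

```shell
#!/bin/sh
# bdp_bytes: hypothetical helper, bandwidth-delay product in bytes.
# $1 = bandwidth in Mbit/s, $2 = round-trip time in ms
# bytes = (Mbit/s * 1,000,000 / 8) * (ms / 1000) = Mbit/s * ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

bdp_bytes 1000 50    # 1 Gbit/s at 50 ms RTT -> 6250000 (about 6 MB)
```

That result is close to the 6291456 maximum used for net.ipv4.tcp_rmem above, so that value roughly suits a 1 Gbit/s path with 50 ms of latency.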
Security
net.ipv4.tcp_syncookies = 1 # Enable SYN Flood Protection, Helps mitigate DDoS attacks.
net.ipv4.conf.all.rp_filter = 1 # Enable Reverse Path Filtering (Prevents IP Spoofing)
net.ipv4.conf.all.accept_redirects = 0 # Disable ICMP Redirects
net.ipv4.conf.all.send_redirects = 0 # Disable ICMP Redirects
kernel.randomize_va_space=2 # ASLR - randomize address space to protect memory location attacks ( 0: disable, 1: conservative, 2: full )
CPU / Process / other
kernel.sched_min_granularity_ns=15000000 # min time slice a process can run for before switching
kernel.pid_max=65535 # max value a PID can have
kernel.sched_wakeup_granularity_ns=25000000 # how far ahead a waking task must be before it preempts the running task
fs.inotify.max_user_watches=524288 # max files that inotify can watch, per user
Memory
kernel.shmmax=2147483648 # max size for single shared memory segment
kernel.sem="250 32000 32 128" # parameters for semaphore arrays - SEMMSL SEMMNS SEMOPM SEMMNI
# SEMMSL: Maximum number of semaphores per semaphore set.
# SEMMNS: Total number of semaphores system-wide.
# SEMOPM: Maximum number of operations per semop system call.
# SEMMNI: Maximum number of semaphore sets.
vm.dirty_background_ratio=20 # percentage of memory that can fill with dirty pages before background writeback starts
vm.dirty_ratio=30 # percentage of memory that can fill with dirty pages before writing processes are blocked to flush them
vm.max_map_count=262144 # max number of memory map areas that a process can have
vm.overcommit_memory=1 # memory overcommit behavior - 0: heuristic, 1: always, 2: never
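The four kernel.sem fields listed above are related: the system-wide total (SEMMNS) is usually kept at or below SEMMSL times SEMMNI, since that product is the most semaphores the sets could hold. A small sketch of that check, where sem_capacity is a hypothetical helper name:

```shell
#!/bin/sh
# sem_capacity: hypothetical helper; prints SEMMSL * SEMMNI for a kernel.sem value.
# $1 = the four fields as one string: "SEMMSL SEMMNS SEMOPM SEMMNI"
sem_capacity() {
    set -- $1                 # split the string into $1..$4
    echo $(( $1 * $4 ))
}

sem_capacity "250 32000 32 128"    # 250 per set * 128 sets -> 32000, equal to SEMMNS
```

On a live system the same check can be fed from the kernel: sem_capacity "$(cat /proc/sys/kernel/sem)".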
Swappiness
- Swappiness is one of the factors in the swap algorithm that determines when to swap.
Minimizing swap ( swappiness=0 ):
- Don’t swap unless necessary
- Reduces reliance on swap memory if you have enough RAM.
- May use OOM Killer instead of swapping
- May fail to allocate memory and crash
- Does not turn swap off; to disable swap devices entirely use swapoff -a
Minimize swapping:
sudo sysctl -w vm.swappiness=0
Check swappiness:
cat /proc/sys/vm/swappiness
Swappiness values:
0 | Avoid swapping as much as possible. |
100 | Aggressively swap data to disk. |
60 | on my Ubuntu desktop, default on many systems, balanced approach for general-purpose servers and desktops, mix of work loads |
10 | closer to this for large mem intensive apps ( ex. Redis ) |
80-100 | lot of swap but little RAM |
Increase Swappiness (e.g., 80-100):
- when preventing OOM errors is critical
- ample swap space and minimal RAM
Decrease Swappiness (e.g., 10-30):
- memory-intensive applications like databases or VMs
- minimize disk I/O and swapping for performance reasons
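The guidance above can be turned into a quick check of the current setting. This is only a sketch: swappiness_hint is a hypothetical helper, and the cut-offs at 30 and 80 are this example's reading of the ranges in these notes, not kernel-defined thresholds.

```shell
#!/bin/sh
# swappiness_hint: hypothetical helper mapping a swappiness value to the
# rough guidance in these notes. The cut-offs (30, 80) are this sketch's choice.
swappiness_hint() {
    if [ "$1" -eq 0 ]; then
        echo "avoid swapping as much as possible"
    elif [ "$1" -le 30 ]; then
        echo "low: memory-intensive apps, minimize disk I/O"
    elif [ "$1" -lt 80 ]; then
        echo "balanced: general-purpose default range"
    else
        echo "high: swap aggressively, little RAM"
    fi
}

swappiness_hint 60
# On a live system: swappiness_hint "$(cat /proc/sys/vm/swappiness)"
```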
Kernel Boot Parameters
Example config:
/etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash console=tty1"
Apply Config:
sudo update-grub
sudo reboot
Some boot parameters that could be used:
quiet | Suppresses most boot messages to show a cleaner boot process. |
splash | Displays a graphical boot splash screen. |
console=tty1 | Specify console for kernel messages |
debug | Enables kernel debug messages ( verbose boot output ) |
noht | Disable hyper threading ( intel only ) - OLDER |
nosmt | Disable Simultaneous Multithreading ( multiple arch like Intel, ARM, RISC-V ) - NEWER |
nomodeset | Disables kernel mode setting, useful for avoiding display issues during boot (particularly useful for graphics-related issues). |
noapic | Disables the Advanced Programmable Interrupt Controller (APIC), used for interrupt management. This may help resolve boot issues related to multi-processor systems. |
acpi=off | Disables the ACPI (Advanced Configuration and Power Interface). This may be helpful if you’re facing issues related to power management or hardware compatibility. |
nolapic | Disables Local APIC support, which can be useful for troubleshooting on some systems with multi-core processors. |
selinux=0 | Disable SELinux |
apparmor=0 | Disable AppArmor |
init=/bin/bash | Emergency recovery |
fastboot | |
transparent_hugepage=always | Transparent Huge Pages (THP) |
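After a reboot, the parameters the kernel actually received can be read from /proc/cmdline. The sketch below checks whether a given parameter is present; has_boot_param is a hypothetical helper, and the example string stands in for the real command line.

```shell
#!/bin/sh
# has_boot_param: hypothetical helper; succeeds if parameter $1 appears as a
# whole word (bare, or as name=value) in the command line string $2.
has_boot_param() {
    for word in $2; do
        case "$word" in
            "$1"|"$1"=*) return 0 ;;
        esac
    done
    return 1
}

cmdline="quiet splash console=tty1"    # example; normally "$(cat /proc/cmdline)"
has_boot_param console "$cmdline" && echo "console is set"
has_boot_param nosmt "$cmdline" || echo "nosmt is not set"
```

Matching whole words avoids false positives, for example treating noapic as present when only nolapic was passed.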
Disable Hyper-Threading (HT)
Avoids interference between threads that share a physical core ( contention for caches and execution units ).
- Can also be disabled in BIOS/Firmware
- Use nosmt or noht kernel parameters ( see above )
Temporarily at run time:
echo off | sudo tee /sys/devices/system/cpu/smt/control
Check:
cat /sys/devices/system/cpu/smt/active # 0 disabled, 1 enabled
Huge Pages ( Static and Transparent )
Transparent Huge Pages (THP)
- Probably improves performance if you have the memory, can cause issues if you don’t.
- Reduces TLB misses, improving memory access speed.
- Pages will be 2 MB or larger instead of the standard 4 KB
- Great for things that use a lot of memory
- Great for databases, HFT, AI, VMs and containerized environments
- May cause (or avoid) memory fragmentation
- Might negatively affect latency-sensitive applications
- Can waste memory
- Excessive compaction when searching for pages can increase CPU usage
- Managed by OS
Configure at boot with kernel parameter:
transparent_hugepage=always
Configure in runtime:
echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
always | THP is enabled and used when possible. |
madvise | THP is used only when explicitly requested by applications using madvise(). |
never | THP is disabled. |
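Reading the enabled file back lists all of the modes, with the active one in square brackets. A minimal sketch for pulling out just the active mode; thp_active is a hypothetical helper name, and the sample string imitates the file's format.

```shell
#!/bin/sh
# thp_active: hypothetical helper; prints the bracketed (active) word from a
# string in the format of /sys/kernel/mm/transparent_hugepage/enabled.
thp_active() {
    printf '%s\n' "$1" | sed 's/.*\[\([^]]*\)\].*/\1/'
}

thp_active "always [madvise] never"    # prints madvise
# On a live system: thp_active "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```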
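Defragmentation with THP
- Two settings control whether or not defragmentation ( compaction ) will be used with THP.
- Both are sysfs files, not sysctl parameters, and the transparent_hugepage= boot parameter only accepts always, madvise or never.
Defrag at allocation time ( always, defer, defer+madvise, madvise, never ):
echo defer | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
Defrag by the khugepaged background thread ( 0 or 1 ):
echo 1 | sudo tee /sys/kernel/mm/transparent_hugepage/khugepaged/defrag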
Huge Pages ( Static Huge Pages )
- Pre-allocated pool - Allocated by administrator
- Requested by application
- Consistent / Predictable
- Great for DBs
This belongs above in the sysctl section:
sudo sysctl -w vm.nr_hugepages=1024 # reserve this many huge pages ( no spaces around = with -w )
sudo sysctl -w vm.nr_hugepages=1 # allocate immediately
sudo sysctl -w vm.nr_hugepages=0 # deallocate immediately
grep -i huge /proc/meminfo # check
- NOTE:
- Can’t set hugepagesz or the default size at runtime, only at boot. Can still control the number of pages of each size with the /sys FS
- 2MB and 1GB are common, 2 MB is most common and controlled with nr_hugepages
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # 2 MB
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages # 1 GB
echo 1024 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # 2 MB
echo 4 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages # 1 GB
Static huge pages requested from pre-allocated pool with something like this:
- shmget() with SHM_HUGETLB
- mmap() with MAP_HUGETLB
Kernel boot parameters for static huge pages:
default_hugepagesz=2M # default size when app doesn't specify
hugepagesz=2M # specific size to allocate
hugepages=1024 # number of pages to allocate
Allocate different sizes:
default_hugepagesz=2M hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024
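Static huge pages are taken out of general use as soon as they are reserved, so it helps to work out how much RAM a boot line like the one above sets aside. A small sketch of the arithmetic, where hugepage_mb is a hypothetical helper:

```shell
#!/bin/sh
# hugepage_mb: hypothetical helper; memory held by a static huge page pool, in MB.
# $1 = number of pages, $2 = page size in kB (2048 for 2 MB, 1048576 for 1 GB)
hugepage_mb() {
    echo $(( $1 * $2 / 1024 ))
}

small=$(hugepage_mb 1024 2048)       # 1024 x 2 MB pages -> 2048 MB
large=$(hugepage_mb 4 1048576)       # 4 x 1 GB pages    -> 4096 MB
echo "total reserved: $(( small + large )) MB"    # total reserved: 6144 MB
```

So the combined example reserves about 6 GB, which is unavailable to ordinary allocations whether or not any application uses it.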