kernel

To manipulate the kernel modules we need to make sure the kmod package is installed in our system:

# [ yum | dnf ] install kmod

The lsmod command parses the contents of /proc/modules in a more human readable way.
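As a rough illustration of what lsmod does, the sketch below reformats /proc/modules-style lines into a similar table. The helper name lsmod_like is made up for this example, and the column widths only approximate the real tool's output.

```shell
# lsmod_like [FILE]: print name, size, use count and users of each module,
# reading /proc/modules by default (hypothetical helper, not part of kmod).
lsmod_like() {
    awk 'BEGIN { printf "%-24s %8s  %s\n", "Module", "Size", "Used by" }
    {
        users = $4
        sub(/,$/, "", users)          # drop the trailing comma of the user list
        if (users == "-") users = ""  # "-" means no other module uses it
        line = sprintf("%-24s %8s  %s", $1, $2, $3)
        if (users != "") line = line " " users
        print line
    }' "${1:-/proc/modules}"
}
```

Called without arguments it reads the live /proc/modules; passing a file name makes it easy to try on a saved copy.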

We can get more information about a particular module with the modinfo command:

# modinfo ip_tables

filename: /lib/modules/3.10.0-123.el7.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko

description: IPv4 packet filter

author: Netfilter Core Team <coreteam@netfilter.org>

license: GPL

srcversion: 44A16130862F8CA2ECA59D9

depends:

intree: Y

vermagic: 3.10.0-123.el7.x86_64 SMP mod_unload modversions

signer: Fermi National Accelerator Laboratory: Scientific Linux kernel signing key

sig_key: 0E:88:DF:6B:94:F4:EB:C4:DC:8D:B7:7E:13:B0:6F:6C:C5:18:30:C6

sig_hashalgo: sha256

.

# modinfo e1000e | grep "^parm:"

parm: debug:Debug level (0=none,...,16=all) (int)

parm: copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)

parm: TxIntDelay:Transmit Interrupt Delay (array of int)

parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)

parm: RxIntDelay:Receive Interrupt Delay (array of int)

parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)

.

# modinfo e1000e | grep "^depends:"

depends: ptp
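When only the parameter names are needed, the grep output above can be trimmed further. The sketch below is a small filter (parm_names is a hypothetical name) that reads modinfo output on standard input. modinfo's own -F option, for example modinfo -F parm e1000e, prints a single field without the label and may be all that is needed.

```shell
# parm_names: read `modinfo` output on stdin and print one parameter
# name per line (hypothetical helper).
parm_names() {
    sed -n 's/^parm:[[:space:]]*\([^:]*\):.*/\1/p'
}

# Typical use (requires the module to be installed on the system):
#   modinfo e1000e | parm_names
```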

We can also dump the effective modprobe configuration (options, aliases and blacklist entries) with:

# modprobe -c                → for all modules

Sometimes we need to load kernel modules that have not yet been loaded (e.g. to run some binary). modprobe loads the module together with any modules it depends on:

# modprobe -v fcoe

insmod /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/scsi/scsi_tgt.ko

insmod /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/scsi/scsi_transport_fc.ko

insmod /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/scsi/libfc/libfc.ko

insmod /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/scsi/fcoe/libfcoe.ko

insmod /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/scsi/fcoe/fcoe.ko
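The insmod lines above come from the dependency list that depmod records in modules.dep under /lib/modules/<kernel version>/. As a sketch, the helper below (module_deps is a made-up name) prints the dependencies listed there for one module. It assumes the plain-text "path: dep dep ..." layout and matches on the file name, so a module whose file name uses dashes must be given with dashes. modprobe --show-depends gives similar information directly.

```shell
# module_deps NAME [DEPFILE]: list the modules NAME depends on, as recorded
# in a depmod-generated modules.dep file (hypothetical helper).
module_deps() {
    depfile=${2:-/lib/modules/$(uname -r)/modules.dep}
    awk -v mod="$1" -F': *' '
    {
        name = $1
        sub(/.*\//, "", name)      # strip the directory part
        sub(/\.ko.*$/, "", name)   # strip .ko, .ko.xz, ...
        if (name != mod) next
        n = split($2, deps, " ")
        for (i = 1; i <= n; i++) {
            d = deps[i]
            sub(/.*\//, "", d)
            sub(/\.ko.*$/, "", d)
            print d
        }
    }' "$depfile"
}
```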

And very rarely we might need to unload certain modules:

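# modprobe -r fcoe  → unload the FCoE module if it is not in use, plus any dependencies left unused

# rmmod fcoe        → unload only the fcoe module itself; its dependencies stay loaded

# rmmod -w fcoe     → if in use, waits until it isn't and then unloads it (deprecated; newer kernels and kmod releases no longer support waiting)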
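Before unloading, it can help to check whether anything still holds a reference to the module. A minimal sketch, assuming the third column of /proc/modules is the use count (module_refcnt is a hypothetical helper name):

```shell
# module_refcnt NAME [FILE]: print the use count of a loaded module from
# /proc/modules-style input; exit status 1 if the module is not listed.
module_refcnt() {
    awk -v mod="$1" '$1 == mod { print $3; found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Example: only try to unload when nothing uses the module.
#   [ "$(module_refcnt fcoe)" = 0 ] && modprobe -r fcoe
```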

If we need to change the value of some input parameter for a module, we can do so as follows:

# lsmod | grep e1000e      → make sure module is not loaded and, if it is, unload it!

# modprobe e1000e InterruptThrottleRate=3000,3000,3000 debug=1
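After loading a module with explicit parameters, one way to confirm what the kernel actually holds is to read /sys/module/<module>/parameters/. The sketch below (show_params is a made-up helper) prints every readable parameter file in a directory. Only parameters the driver exports with read permission appear there, so an empty or missing directory does not mean the module has no parameters.

```shell
# show_params DIR: print "name=value" for every readable parameter file in
# DIR, e.g. DIR=/sys/module/<module>/parameters (hypothetical helper).
show_params() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```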

If we need kernel module settings to persist across reboots (e.g. load some modules at startup or change the values of their parameters), we use two directories. Files in /etc/modules-load.d/ list the modules to load at startup, one name per line, while files in /etc/modprobe.d/ hold options, aliases and blacklist entries. For example, if we need the virtio-net kernel module loaded at startup, we can achieve that simply with the command:

# echo "virtio-net" > /etc/modules-load.d/virtio-net.conf

# cat /etc/modules-load.d/virtio-net.conf

virtio-net

At boot time, systemd scans all the files /etc/modules-load.d/*.conf and loads the kernel modules listed in them. Whenever a module is loaded, at boot or with modprobe, any parameters given for it in "options" lines in /etc/modprobe.d/*.conf are applied.
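For module parameters, the files in /etc/modprobe.d use lines of the form "options <module> <name>=<value> ...". A minimal sketch that generates such a file follows. The helper write_module_options and the staging directory are assumptions made for this example, and the e1000e values are the ones used earlier, not recommendations.

```shell
# write_module_options DIR MODULE OPTION...: write DIR/MODULE.conf containing
# a single "options" line in modprobe.d(5) syntax (hypothetical helper).
write_module_options() {
    dir=$1 mod=$2
    shift 2
    printf 'options %s %s\n' "$mod" "$*" > "$dir/$mod.conf"
}

# Stage the file somewhere harmless first, then review and copy it as root:
#   mkdir -p /tmp/staging
#   write_module_options /tmp/staging e1000e InterruptThrottleRate=3000,3000,3000
#   cp /tmp/staging/e1000e.conf /etc/modprobe.d/
```

Writing to a staging directory keeps the example safe to run as an ordinary user. The options take effect the next time the module is loaded.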

As regards kernel parameters, the categories vary from system to system but the main ones are: dev, fs, kernel, net, sunrpc & vm. In most systems the default kernel parameters are set to acceptable values. But we might want to change some of them to tighten up the security of the system or to enable it to run specific software (e.g. databases, web servers, etc).

To view all the kernel parameters and their current values we use:

# sysctl -a                   → list all parameters and values

abi.vsyscall32 = 1

crypto.fips_enabled = 0

debug.exception-trace = 1

debug.kprobes-optimization = 1

dev.cdrom.autoclose = 1

dev.cdrom.autoeject = 0

dev.cdrom.check_media = 0

dev.cdrom.debug = 0

[...]

.

# sysctl -a -r "^sunrpc"      → list parameters matching regexp

sunrpc.max_resvport = 1023

sunrpc.min_resvport = 665

sunrpc.nfs_debug = 0

sunrpc.nfsd_debug = 0

[...]

.

# sysctl -a -N -r "^crypto"   → just list the parameters, no values shown

crypto.fips_enabled

.

# sysctl -n kernel.hostname   → just list the value

orap1.company.net

The default values for kernel parameters are either hard-coded, determined at compilation time or set in the files underneath the directory /usr/lib/sysctl.d/.

# ls -l /usr/lib/sysctl.d/

total 12

-rw-r--r--. 1 root root 466 Mar 5 2015 00-system.conf

-rw-r--r--. 1 root root 710 Sep 15 15:12 50-default.conf

-rw-r--r--. 1 root root 499 Sep 15 15:15 libvirtd.conf

If we need to change any parameter permanently, we should add the key-value pair to /etc/sysctl.conf or, even better, to its own file in /etc/sysctl.d/.

# cat /etc/sysctl.conf

# System default settings live in /usr/lib/sysctl.d/00-system.conf.

# To override those settings, enter new settings here, or in an /etc/sysctl.d/<name>.conf file

#

# For more information, see sysctl.conf(5) and sysctl.d(5).

# cat /etc/sysctl.d/oracle.conf

# Oracle settings

kernel.shmmni = 4096

kernel.shmmax = 4398046511104

kernel.shmall = 1073741824

kernel.sem = 250 32000 100 128

fs.aio-max-nr = 1048576

fs.file-max = 6815744

net.ipv4.ip_local_port_range = 9000 65500

net.core.rmem_default = 262144

net.core.rmem_max = 4194304

net.core.wmem_default = 262144

net.core.wmem_max = 1048586

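When the same key is set in more than one file, the last assignment read wins. The files in /etc/sysctl.d/ are processed in the lexical order of their names, and on systems such as RHEL 7 /etc/sysctl.conf is also linked into that directory as 99-sysctl.conf, so whether a file in /etc/sysctl.d/ overrides /etc/sysctl.conf depends on where its name sorts. We should make sure the key-value pairs do not conflict across the different configuration files; a numeric prefix on each file name makes the order explicit.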
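To see which file would win for a given key, the "last assignment read wins" rule can be simulated. The sketch below (effective_value is a hypothetical helper) applies the files in the order they are passed. With a shell glob such as /etc/sysctl.d/*.conf that is the sorted order of the file names, which may differ slightly from the order the system tools use.

```shell
# effective_value KEY FILE...: print the value KEY ends up with when the files
# are applied in the order given (later assignments override earlier ones).
effective_value() {
    key=$1
    shift
    awk -v key="$key" -F= '
    /^[ \t]*[#;]/ { next }                     # skip comment lines
    NF >= 2 {
        k = $1
        gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim the key
        if (k != key) next
        v = substr($0, index($0, "=") + 1)     # everything after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", v)         # trim the value
        value = v; found = 1
    }
    END { if (found) print value; exit !found }' "$@"
}

# Example:
#   effective_value net.core.rmem_max /etc/sysctl.d/*.conf
```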

Changing the kernel values in the configuration files has no effect on a running system, as the files are only read at boot. If we need the changes to take effect immediately, we have three options.

We can update the kernel values by reloading sysctl.conf or any file underneath /etc/sysctl.d/:

# sysctl -p                 → reloads and enforces the settings in /etc/sysctl.conf
# sysctl -p /etc/sysctl.d/oracle.conf

We can use sysctl -w …

# sysctl -w net.core.rmem_max=10485760

… or we can do it directly (a bit less safe) …

# echo "10485760" > /proc/sys/net/core/rmem_max

But we have to remember that values set with sysctl -w or written directly to /proc/sys do not survive a reboot. So if we need the changes persisted, we should add the key-value pairs to the configuration files.
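Because runtime changes and configuration files can drift apart, a quick comparison is sometimes useful. The sketch below reports keys whose running value differs from the one in a configuration file. The name sysctl_drift and the optional second argument (an alternative root instead of /proc/sys, handy for testing) are assumptions of this example.

```shell
# sysctl_drift CONF [PROCROOT]: report keys in CONF whose running value under
# PROCROOT (default /proc/sys) differs from the configured one.
sysctl_drift() {
    root=${2:-/proc/sys}
    grep -v '^[[:space:]]*[#;]' "$1" | grep '=' |
    while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        # collapse tabs and repeated blanks so "250  32000" equals "250 32000"
        want=$(printf '%s' "$want" | tr -s '\t ' '  ' | sed 's/^ //; s/ $//')
        path=$root/$(printf '%s' "$key" | tr . /)
        [ -r "$path" ] || { echo "$key: not present"; continue; }
        have=$(tr -s '\t ' '  ' < "$path" | sed 's/^ //; s/ $//')
        [ "$have" = "$want" ] || echo "$key: configured '$want', running '$have'"
    done
}

# Example:
#   sysctl_drift /etc/sysctl.d/oracle.conf
```

No output means every key in the file matches the running kernel.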

All the kernel parameters can be read/written in the /proc/sys pseudo filesystem:

# ls -l /proc/sys/

total 0

dr-xr-xr-x. 1 root root 0 Oct 14 11:14 abi

dr-xr-xr-x. 1 root root 0 Oct 13 21:03 crypto

dr-xr-xr-x. 1 root root 0 Oct 14 11:14 debug

dr-xr-xr-x. 1 root root 0 Oct 14 11:14 dev

dr-xr-xr-x. 1 root root 0 Oct 13 21:03 fs

dr-xr-xr-x. 1 root root 0 Oct 13 21:03 kernel

dr-xr-xr-x. 1 root root 0 Oct 13 21:03 net

dr-xr-xr-x. 1 root root 0 Oct 14 11:14 sunrpc

dr-xr-xr-x. 1 root root 0 Oct 13 21:03 vm

# tree fs

fs

├── aio-max-nr

├── aio-nr

├── binfmt_misc

│ ├── register

│ └── status

├── dentry-state

├── dir-notify-enable

├── epoll

│ └── max_user_watches

├── file-max

├── file-nr

[...]

# sysctl -a -r "^fs"

fs.aio-max-nr = 1048576

fs.aio-nr = 0

fs.binfmt_misc.kshcomp = enabled

fs.binfmt_misc.kshcomp = interpreter /bin/ksh93

fs.binfmt_misc.kshcomp = flags:

fs.binfmt_misc.kshcomp = offset 0

fs.binfmt_misc.kshcomp = magic 0b1308

fs.binfmt_misc.status = enabled

fs.dentry-state = 217546 203735 45 0 0 0

fs.dir-notify-enable = 1

fs.epoll.max_user_watches = 791162

fs.file-max = 6815744

fs.file-nr = 5440 0 6815744

[...]

We can see the tree structure above in which the actual parameters are always at the end.
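The correspondence between the two views is mechanical: the dots in a parameter name become directory separators under /proc/sys. A minimal sketch of the mapping in both directions (both function names are made up for this example):

```shell
# sysctl_to_path NAME: map a dotted sysctl name to its file under /proc/sys.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# path_to_sysctl PATH: the reverse mapping.
path_to_sysctl() {
    printf '%s\n' "${1#/proc/sys/}" | tr / .
}

# Example:
#   cat "$(sysctl_to_path fs.file-max)"
```

This naive translation breaks when a path component itself contains a dot, such as a VLAN interface named eth0.100 under net.ipv4.conf. In that case the slash-separated form, which sysctl also accepts, avoids the ambiguity.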

Let's go through some kernel parameters that could be changed to increase performance or tighten security (the values shown are for guidance only!):

Network tuning

net.core.rmem_max = 10485760                → max OS receive buffer size to 10MB

net.core.wmem_max = 10485760                → max OS send buffer to 10MB

net.core.netdev_max_backlog = 5000          → max of packets queued for kernel processing

net.core.somaxconn = 1024                   → max connections backlogged waiting for socket accept

net.ipv4.tcp_syncookies = 1                 → avoid SYN flood DoS attacks

net.ipv4.tcp_fastopen = 1                   → enable TCP Fast Open for outgoing connections (speeds up successive connections between 2 end-points)

net.ipv4.tcp_window_scaling = 1              → enable window scaling if system can take it

net.ipv4.tcp_timestamps = 1                  → enable better measurement of RTT

net.ipv4.tcp_max_tw_buckets = 1000000        → pool size of time-wait sockets

net.ipv4.udp_rmem_min = 16384               → min size in bytes of UDP socket read buffer

net.ipv4.udp_wmem_min = 16384                → min size in bytes of UDP socket write buffer

net.ipv4.ip_local_port_range = 9000 65500    → port range available for outgoing network connections
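As a worked example of where a figure like 10MB for the maximum socket buffers can come from, one common rule of thumb is to allow at least the bandwidth-delay product of the path. That rule is an assumption of this sketch, not something the list above prescribes, and bdp_bytes is a made-up helper:

```shell
# bdp_bytes MBIT_PER_S RTT_MS: bandwidth-delay product in bytes.
# (MBIT_PER_S * 1000000 / 8) bytes/s * (RTT_MS / 1000) s = MBIT_PER_S * RTT_MS * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Example: a 1000 Mbit/s link with an 80 ms round-trip time
#   bdp_bytes 1000 80
```

For that example the result is 10000000 bytes, which is in the same range as the 10485760 used for net.core.rmem_max and net.core.wmem_max above.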

Network security

net.ipv4.tcp_max_syn_backlog = 4096         → max TCP connections awaiting acceptance

net.ipv4.conf.*.rp_filter = 1               → drop packets that come from “impossible” places

net.ipv4.conf.*.log_martians = 1            → log packets with impossible addresses (martians) to the kernel log

net.ipv4.ip_forward = 0                     → drop all forward packets (might break tunnels,VPN,etc)

net.ipv4.conf.all.forwarding = 0            → disable forwarding in all existing interfaces

net.ipv4.conf.default.forwarding = 0        → disable forwarding in all future interfaces

net.ipv4.conf.<interface>.forwarding = 0     → disable forwarding for a specific interface

net.ipv4.conf.all.send_redirects = 0        → not needed unless acting as router/gateway

net.ipv4.conf.all.accept_local = 0          → rejects packets with local source addresses

net.ipv4.conf.all.accept_redirects = 0      → redirects are a security risk unless they're secure

net.ipv4.conf.all.secure_redirects = 1       → accept redirects only from the specified gateways

net.ipv4.conf.all.accept_source_route = 0   → safer to ignore source route requests

net.ipv4.icmp_echo_ignore_broadcasts = 1    → ignore ICMP echo requests sent via broadcast

net.ipv4.icmp_echo_ignore_all = 1           → ignore all ICMP echo requests

net.ipv6.conf.all.router_solicitations = 0  → number of router solicitations to send; 0 sends none

net.ipv6.conf.all.accept_ra_defrtr = 0      → do not accept default routes sent by RAs

net.ipv6.conf.all.accept_ra_pinfo = 0        → do not accept Prefix Information sent by RAs

net.ipv6.conf.all.autoconf = 0              → do not use PI sent by RAs for device autoconfig

Disk I/O tuning

fs.aio-max-nr = 1048576                     → max async I/O concurrent ops

fs.file-max = 4194304                       → max number of entries of the system-wide file handle table

fs.nr_open = 1048576                         → max number of concurrent file handles for single process

fs.inode-max = 12582912                      → max number of inodes system-wide (older kernels only; current kernels no longer provide it)

fs.mqueue.queues_max = 256                  → max number of mqueues system-wide

fs.mqueue.msg_max = 1024                     → max number of messages in a mqueue

fs.mqueue.msgsize_max = 8192                → max size in bytes of a single mqueue message

fs.pipe-max-size = 1048576                  → max size in bytes of a pipe

Kernel security

kernel.dmesg_restrict = 1        → prevent non-privileged users from using dmesg

kernel.exec-shield = 1           → prevents execution in non-executable memory regions (only on kernels carrying the Red Hat Exec Shield patch)

kernel.kptr_restrict = 1         → kernel ptr addresses are hidden unless user has CAP_SYSLOG privs

kernel.msgmax = 8192             → max size in bytes of SysV queue message

kernel.msgmnb = 819200           → max size in bytes of SysV queue

kernel.msgmni = 32000             → max number of SysV queues system-wide

kernel.shmmax = 4294967295       → max size in bytes of a shared memory segment

kernel.shmmni = 4096             → max number of shared memory segments

kernel.shmall = 268435456        → max number of shared memory pages

kernel.sem = 512 32000 512 128   → SEMMSL (max semaphores per set), SEMMNS (max semaphores total),

.                                  SEMOPM (max ops per call), SEMMNI (max semaphore sets)

kernel.randomize_va_space = 2    → enable Address Space Layout Randomization to

.                                  prevent certain buffer overflow attacks

kernel.threads-max = 125810      → max number of threads system-wide (will be automatically reduced if more

.                                  than 1/8th of RAM would be consumed)

The optimal values for a given system might differ a lot from the ones suggested above. Test thoroughly before applying any of them in production.
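One detail worth checking when setting the shared memory limits is the unit: kernel.shmmax is in bytes while kernel.shmall is in pages. A minimal sketch of the conversion (bytes_to_pages is a hypothetical helper; the page size comes from getconf unless given):

```shell
# bytes_to_pages BYTES [PAGE_SIZE]: number of pages needed to hold BYTES,
# rounding up; PAGE_SIZE defaults to the value reported by getconf.
bytes_to_pages() {
    page=${2:-$(getconf PAGE_SIZE)}
    echo $(( ($1 + page - 1) / page ))
}

# Example with the shmmax value from the Oracle file shown earlier:
#   bytes_to_pages 4398046511104 4096
```

With 4096-byte pages the example yields 1073741824, which is exactly the kernel.shmall value in that Oracle file, so the two limits there describe the same amount of memory.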
